Lab 5
Lab 5
Overview
Spelling Suggestions In this assignment, you will implement an operation for an alphabetic trie to make spelling suggestions: given a target word, the operation will return a set of words that are “close” in spelling to the target.
Input:
- A trie with all the words in the vocabulary
- A target word
- A maximum distance
Output:
- List of all the words in the vocabulary with a distance less than or equal to the maximum specified distance.
For this lab, we will work with TrieNode
s that have the following structure:
TrieNode{
boolean isWord; //True if the current node is the end of a word
HashMap<character, TrieNode> children = new HashMap<Character, TrieNode>();
}
Additionally, every Trie has a root node, which can be initialized as follows:
TrieNode root = new TrieNode();
Measuring closeness
We measure “closeness” using the Levenshtein distance: the distance between two strings is the minimum number of the following transformations required to change one string into another. There are three possible transformations:
- Insert a single character
- Delete a single character
- Substitute one character for another in its place
For example, the Wikipedia article gives the example of “kitten” and “sitting” as strings at distance 3. The following operations transform the first string into the second:
substitute s
for k
at index 0
substitute i
for e
at index 4
insert g
at index 6 (the end of the string)
and there is no shorter sequence of operations to do so.
Strings that are equal are at distance 0. Distance is symmetrical, meaning that the number of operations to go from string s to string t is the same as from t to s: in particular, insertions can be reversed with deletions, and substitutions are reversible.
Step 0: Creating a markdown file for writeups
-
One person in your team should visit https://demo.hedgedoc.org/new to create an empty document
-
Share the URL of the document created with the others in your team (e.g., email them the link in the address bar). When someone else visits the URL, they are able to collaboratively edit the document.
-
Each person should copy and paste the URL of the document into the Moodle submission box for the “Lab 1” assignment. Do this right away, even before writing in the document, so that your course account has a link to your work.
Step 1: Trie review
Here is a toy example of a trie.
Include the answers to the following questions in your writeup:
- What are all the words contained in this trie (i.e., vocabulary)?
- For each word in the trie, identify a sequence of operations that can get you from the target word
loss
to the word in the trie.
What sequence of transformations can get you to any of the words in this trie from the target word loss
? What is the minimum distance between a word in the Trie and this target? What is the maximum distance?
Step 2: Building a system capable of only deletion
Let us start by assuming that the spelling suggestion system can only handle deletion. For example here are some transformations it should be able to handle:
- class $\rightarrow$ lass (delete c from beginning)
- blast $\rightarrow$ last (delete b from beginning)
- claps $\rightarrow$ clap (delete s from end)
- blasts $\rightarrow$ last (delete b from beginning and s from end)
Write pseudocode for a recursive function that can handle deletions. Your function should take the same input parameters as in the previous step.
Your function should take the following input parameters:
TrieNode node
: the current node in the Trie you are onString target
: the target wordString prefix
: the current string you have built up so farint index
: the current index of the target word you are onint distance
: the current distance from the target word to the word being built.int tolerance
: the maximum number of edits that can be made before an origin word is discarded as a suggestion.HashMap<String, Integer> words
: A HashMap of words to the distance from target. Since there are many possible ways of sequence of transformations that can get you from an origin word to target word, in your final HashMap, for each origin word you only want to store the smallest distance.
The function should not return anything, but instead modify the HashMap words.
Hint: a part of your function should be able to detect when there is an exact match. For example, once you’ve deleted c from the target class, you will need to know that the remainder of the word (i.e., lass) is in the trie.
Step 3: Building a system capable of only insertion
Now assume that the spelling suggestion system can only handle insertion. For example here are some transformations it should be able to handle:
- loss $\rightarrow$ gloss (insert g at beginning)
- loss $\rightarrow$ floss (insert f at beginning)
- class $\rightarrow$ classy (insert y at end)
- ass $\rightarrow$ glass (insert g and l at beginning)
- gas $\rightarrow$ glass (insert l in middle and s at end)
Write pseudocode for a recursive function that can handle insertions. Your function should take the same input parameters as in the previous step.
Step 4: Building a system capable of only substitution
Now assume that the spelling suggestion system can only handle substitution. For example here are some transformations it should be able to handle:
- list $\rightarrow$ last (change i to a)
- glassy $\rightarrow$ classy (change g to c)
- glassy $\rightarrow$ glossy (change a to o)
- crap $\rightarrow$ clap (change r to l)
- claps $\rightarrow$ crass (change l to r and p to s)
Write pseudocode for a recursive function that can handle substitutions. Your function should take the same input parameters as in the previous step.
Step 5: Building a system capable of handling all transformations
Combine the pseudocode from the previous three functions to create one suggestions
function that can handle all three transformations.
In your writeup also include an explanation of why this function works.
Step 6: Submit the writeup
- Each group member should submit the same markdown file to Moodle. In the submission comment add a link to the hedgedoc file and also indicate who you worked with.
Step 7: Understanding the program requirements
Your task is to complete the implementation of the suggestions method, which has the following signature:
public HashMap<String, Integer> suggestions(String target, int dist)
This method returns a collection of strings in the trie that are within a Levenshtein distance of dist from the given target string (which itself may or may not be in the trie and thus may or may not be included in the returned collection).
You can assume that target consists only of lowercase alphabetic characters.
The collection is returned as a map, where keys are strings in the trie, and the associated value for each string is its Levenshtein distance from the target.
Starter Files
The starter file, Trie.java
, contains a partial implementation of a standard alphabetic trie containing an add method that inserts a given String into the trie. When adding, all words are converted to lowercase, and only alphabetic characters are processed.
Upload your completed Trie.java source-code file, with an implementation of a suggestions method declared exactly as above, to Moodle, along with any helper files.
Your uploaded Trie.java file must have the following methods to compile and run after executing “java ComputeScores”:
- a no-argument constructor
- a working add(String) method
- a suggestions method as described above
You are free to otherwise modify the implementation of the Trie class.
Step 8 (homework): Coding, Testing, Program Submission
The Moodle page has a link to submit your code. Please submit your Trie.java
file only.
Scoring
To get scores call “java ComputeScores” after compiling your source code. Remember the score provided by the autograder for a given assignment will make up 90% of your homework score for that assignment. The other 10% will be based on the structure of the code, test cases and comments that explain both the logic of your code and the test cases.
Notes and hints
- Do not try to generate strings at some particular distance away from the target and then check if the string is in the trie. For tries with large alphabets but a small number of words or nodes, this is extremely inefficient.
- You should navigate the trie, recursively, keeping track of your location (node) and your current distance from the target string. You can trim your search when you have exceeded the distance.
- Don’t change the declaration of the public suggestions method. If you need a recursive version, create a private helper method. Of course, you can add as many additional methods as you want.
- You can make changes to the trie or node implementation, as long as the add method remains with the same signature and works correctly.
- Don’t forget to check the boolean markers in the trie to determine if nodes correspond to words that were added to the trie.
- Remember that there may be multiple sequences of single-character transformations to get from target to any string s. Strings should be included in the returned map at their Levenshtein distance, which is the minimum length of such a sequence.
- Multiple calls to suggestions for the same trie should work correctly, and the map returned should reflect the words currently in the trie at the time the method is called (so, there could be intervening add operations). Be careful with any global or class variables or side-effects caused by your implementation of suggestions.