Lab 5

Lab 5

Overview

Spelling Suggestions In this assignment, you will implement an operation for an alphabetic trie to make spelling suggestions: given a target word, the operation will return a set of words that are “close” in spelling to the target.

Input:

Output:

For this lab, we will work with TrieNodes that have the following structure:

TrieNode{
    boolean isWord; //True if the current node is the end of a word
    HashMap<character, TrieNode> children = new HashMap<Character, TrieNode>();
}

Additionally, every Trie has a root node, which can be initialized as follows:

TrieNode root = new TrieNode();

Measuring closeness

We measure “closeness” using the Levenshtein distance: the distance between two strings is the minimum number of the following transformations required to change one string into another. There are three possible transformations:

For example, the Wikipedia article gives the example of “kitten” and “sitting” as strings at distance 3. The following operations transform the first string into the second:

substitute s for k at index 0 substitute i for e at index 4 insert g at index 6 (the end of the string) and there is no shorter sequence of operations to do so.

Strings that are equal are at distance 0. Distance is symmetrical, meaning that the number of operations to go from string s to string t is the same as from t to s: in particular, insertions can be reversed with deletions, and substitutions are reversible.

Step 0: Creating a markdown file for writeups

Step 1: Trie review

Here is a toy example of a trie.

Include the answers to the following questions in your writeup:

  1. What are all the words contained in this trie (i.e., vocabulary)?
  2. For each word in the trie, identify a sequence of operations that can get you from the target word loss to the word in the trie.

What sequence of transformations can get you to any of the words in this trie from the target word loss? What is the minimum distance between a word in the Trie and this target? What is the maximum distance?

Step 2: Building a system capable of only deletion

Let us start by assuming that the spelling suggestion system can only handle deletion. For example here are some transformations it should be able to handle:

Write pseudocode for a recursive function that can handle deletions. Your function should take the same input parameters as in the previous step.

Your function should take the following input parameters:

The function should not return anything, but instead modify the HashMap words.

Hint: a part of your function should be able to detect when there is an exact match. For example, once you’ve deleted c from the target class, you will need to know that the remainder of the word (i.e., lass) is in the trie.

Step 3: Building a system capable of only insertion

Now assume that the spelling suggestion system can only handle insertion. For example here are some transformations it should be able to handle:

Write pseudocode for a recursive function that can handle insertions. Your function should take the same input parameters as in the previous step.

Step 4: Building a system capable of only substitution

Now assume that the spelling suggestion system can only handle substitution. For example here are some transformations it should be able to handle:

Write pseudocode for a recursive function that can handle substitutions. Your function should take the same input parameters as in the previous step.

Step 5: Building a system capable of handling all transformations

Combine the pseudocode from the previous three functions to create one suggestions function that can handle all three transformations.

In your writeup also include an explanation of why this function works.

Step 6: Submit the writeup

Step 7: Understanding the program requirements

Your task is to complete the implementation of the suggestions method, which has the following signature:

public HashMap<String, Integer> suggestions(String target, int dist)

This method returns a collection of strings in the trie that are within a Levenshtein distance of dist from the given target string (which itself may or may not be in the trie and thus may or may not be included in the returned collection).

You can assume that target consists only of lowercase alphabetic characters.

The collection is returned as a map, where keys are strings in the trie, and the associated value for each string is its Levenshtein distance from the target.

Starter Files

The starter file, Trie.java, contains a partial implementation of a standard alphabetic trie containing an add method that inserts a given String into the trie. When adding, all words are converted to lowercase, and only alphabetic characters are processed.

Upload your completed Trie.java source-code file, with an implementation of a suggestions method declared exactly as above, to Moodle, along with any helper files.

Your uploaded Trie.java file must have the following methods to compile and run after executing “java ComputeScores”:

You are free to otherwise modify the implementation of the Trie class.

Step 8 (homework): Coding, Testing, Program Submission

The Moodle page has a link to submit your code. Please submit your Trie.java file only.

Scoring

To get scores call “java ComputeScores” after compiling your source code. Remember the score provided by the autograder for a given assignment will make up 90% of your homework score for that assignment. The other 10% will be based on the structure of the code, test cases and comments that explain both the logic of your code and the test cases.

Notes and hints