Wednesday, June 17, 2009

One of the advantages of doing things this way is that now we can build the tree containing the pieces of the anagram in such a way as to guarantee that there are no duplicates. Consider "staringstaring" again. This is sorted before searching for the anagrams giving "aaggiinnrrsstt". Now look up in the trie the first character of that string. Any anagram of the string must contain that first character, so, of course, if we find no lookup for "a" then there are no anagrams. If we find all the words in the subtrie of the trie containing "a" and any other characters of the string, then we can look up all the anagrams for them in the top level of the trie using the same procedure and building subtrees along the way. These then get stitched together into the complete tree. Even better, we don't need to even bother with the second "a" when we start at the top again because we already have all the anagrams that use an "a" and the multiplicity doesn't matter.

Even better, if you look at the structure of the tree of anagrams we're building, it is (drum roll please) just a trie again but with the indices being strings, so if we paramaterize the Trie we get the second tree we're using essentially for free. In this trie though, we don't need to remember values at the end points (though having something there doesn't hurt), we use the path down through the trie as the information we need. Indeed, if we construct it correctly, every node in the trie corresponds to a partial anagram, so the path to every leaf node describes the anagram (of sorted words, remember).

Of course, we're not using this trie the same way - the first trie is used as a search device, the second as a storage device, but the shape of the trie is the same in both cases. Also, the first trie is built from the top down - starting with an empty top node and adding things to it as we go, and the second from the bottom up. Also in the latter case, we need to prune empty subtries (which correspond to "failure to anagram").

No comments: