Understanding a Trie

 

Imperial Palace Grounds, Tokyo

According to Wikipedia

In computer science, a trie, also called digital tree or prefix tree, is a type of search tree, a tree data structure used for locating specific keys from within a set.

Misconceptions

These were some of my misconceptions before reading up on a Trie -

  • A Trie is meant to optimize for storage.
  • A Trie cannot have repeated elements / characters
  • A Trie is always supposed to be a binary Tree

Populating a Trie

For the following example we have a set of words

  1. prsst
  2. pratt
  3. prank
  4. paid
  5. pain
  6. painful
  7. cats

A Trie is also called a prefix tree. The prefix in this case is each starting character(s) of the word. The first word "prsst" can be converted into a tree with "p" as its root. At this point it looks like a linked list

The 2nd and 3rd words "pratt" and "prank" share a common prefix - "pra", and they both in turn share the prefix "pr" with the first word. These characters of both respective words can be added to the tree -

The 4th and 5th words, "paid" and "pain" share a common prefix - "pai", and their respective characters can also be added to the tree -

At this point we have a tree where all the leaf nodes are the last characters in each of the words in our list. But what if we add a word that builds on a word that already exists. 

Example - "painful" that builds on the word "pain". How do we distinguish the two words in our tree ? Here we have a choice of adding a flag or property to each node to indicate that it is the last character in a word.

Alternatively, we can also add a counter to each character indicating how many times it is a last word. This is helpful if our word list contains dupes

Finally, What if we have a word that does not start with 'p' , like "cats" ? We need a different root that indicates the start of the Trie

Use cases

The most common use cases for a Trie involve strings and here are two of them

Autocomplete

A common use case for a Trie is string autocompletion. When given a set of words, a Trie should be able to optimally return all the words that start with a give prefix. This involves traversing the tree with the given prefix characters and then returning the sub tree. The sub tree now contains all the words required for auto completion

Lexicographical Sort

If all child nodes are sorted during population, then the Trie is automatically lexicographically sorted and the words can be returned in sorted order by doing a pre-order traversal of the tree.

Sources

https://en.wikipedia.org/wiki/Trie

https://www.youtube.com/watch?v=YG6iX28hmd0

https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014