/ SpaceNotes /
| The Research Platform |

Data Structures Demystified: Arrays
An Anatomy of the Least Recently Used Cache
A step-by-step guide to building your own basic LRU cache, with code examples.
Abstract
Hash Tables > Linked List > Doubly Linked List > Build your own LRU Cache
This article covers two key data structures, the doubly linked list and the hash table, that are essential for building a basic LRU cache. It walks through the code involved and guides you through the thought process of building one on your own. The purpose of a cache is to enable fast data retrieval by keeping recently used resources close at hand, reducing reliance on the computer's main memory.
Hash Tables
They're like a Book Library That Is Searchable Using the Book's Title
This data structure is built on the foundation of an array. It deserves an article on its own for greater depth about hash functions and collision handling, but we will cover the basics to help you understand why it's a necessary component in building an LRU cache.

A hash table organizes data so you can efficiently look up the value for a given key. Imagine you would like to develop a system that stores employee IDs and associates each with a residential address. Wouldn't it be amazing if, given an employee ID, we could know exactly what the residential address, or any other data associated with that particular employee, is? A hash table allows us to do exactly that, in constant time on average (and O(n) in worst-case situations due to hash collisions). In this scenario, the key would be the employee ID, while the value would be the employee's residential address.
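As a quick illustration, Python's built-in dict is itself a hash table; the employee IDs and addresses below are made up for the example:

```python
# A Python dict is a hash table under the hood.
employees = {}

# Insert key-value pairs: employee ID -> residential address.
employees["E1001"] = "12 Orchard Road"
employees["E1002"] = "7 Maple Avenue"

# Lookup by key runs in O(1) time on average.
print(employees["E1001"])  # -> 12 Orchard Road

# A missing key can be handled with .get(), which returns None by default.
print(employees.get("E9999"))  # -> None
```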


Linked List (aka Singly Linked List)
They're Like Paperclips Chained Together


A linked list is a linear data structure that stores data sequentially, with each node containing a pointer to the next node; each node is only aware of the node that follows it. The first node in the linked list is referred to as the "head", while the last node is known as the "tail".

Unlike an array, there is no index to reference your position in the linked list. This is why looking up or editing a node takes O(n) time: you have to perform a linear search to reach the node you want. However, you can add nodes at either end of the linked list in constant time, provided you keep pointers to both the head and the tail. Due to its sequential nature, you can only traverse the linked list in one direction.

If you're wondering what the use of a linked list is, here's one answer: stacks and queues can be implemented using a linked list, simply because they only need fast operations at the ends of a sequential data structure. Below is a code implementation of a queue using a linked list.
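Here is a minimal sketch of such a queue in Python (the Node and Queue class names are my own choices for the example):

```python
class Node:
    """A singly linked list node holding a value and a pointer to the next node."""
    def __init__(self, value):
        self.value = value
        self.next = None


class Queue:
    """FIFO queue: enqueue at the tail, dequeue at the head, both O(1)."""
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, value):
        node = Node(value)
        if self.tail:            # non-empty: link the new node after the tail
            self.tail.next = node
        else:                    # empty: the new node is also the head
            self.head = node
        self.tail = node

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:    # queue became empty
            self.tail = None
        return node.value


q = Queue()
q.enqueue("page1")
q.enqueue("page2")
print(q.dequeue())  # -> page1
```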

Doubly Linked List
Like a Linked List but with a Twist


A doubly linked list's node, on the other hand, has two pointers, referencing the next and previous nodes, which allows us to traverse the list both backward and forward.

Knowing both neighboring nodes lets you remove a node in constant time when given a pointer to it: all you have to do is update the two neighbors' pointers to reflect the new sequential order. The same operation takes O(n) time in a singly linked list, because you must first search for the previous node. This makes doubly linked lists more efficient for certain operations.
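To make the constant-time removal concrete, here is a minimal sketch in Python (the DNode and remove names are my own):

```python
class DNode:
    """Doubly linked list node: pointers to both neighbors."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def remove(node):
    """Unlink `node` in O(1) by rewiring its two neighbors."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
    node.prev = node.next = None


# Build the list 1 <-> 2 <-> 3, then remove the middle node.
a, b, c = DNode(1), DNode(2), DNode(3)
a.next, b.prev = b, a
b.next, c.prev = c, b
remove(b)  # a and c are now directly linked
```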

Building your own basic LRU Cache
Here's the fun part


The recipe for an LRU cache consists of a Doubly Linked List and a Hash Table. All operations of the LRU cache can be achieved in constant time.

A doubly linked list helps you keep track of the recency of the items in the cache: it lets you update the most recently visited item and evict the least recently visited item in constant time. On top of that, a hash table (a dictionary in Python) lets you access any node in the cache by key in constant time on average.

Let's go through a scenario with picture illustrations and code to see how it all works when these two data structures are pieced together.

Scenario 1:
  • Imagine "A" is a web page you have just visited, and it was previously the least recently visited page in the cache.

Node "A" will be updated to the most recently used key, while nodes "B" and "C" shift accordingly, as shown in the code below:
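A minimal sketch of this step in Python, assuming the cache holds three nodes ordered from most to least recently used (the variable names are my own):

```python
class Node:
    """Doubly linked cache node."""
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None


# Build the list C <-> B <-> A, with "C" most recent and "A" least recent.
c, b, a = Node("C", 3), Node("B", 2), Node("A", 1)
c.next, b.prev = b, c
b.next, a.prev = a, b
head, tail = c, a

# Visiting "A": unlink it from the tail ...
tail = a.prev
tail.next = None
a.prev = None

# ... and relink it at the head as the most recently used node.
a.next = head
head.prev = a
head = a
# The list is now A <-> C <-> B.
```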



Scenario 2:
  • You visit a brand-new web page, node "D", that you have not visited before, while your cache is full.

When we attempt to insert another key-value pair (D-4), node "B" will be evicted because the cache has reached its maximum size, as shown in the code below:
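A minimal sketch of this step, assuming a full three-node cache plus a dictionary mapping keys to nodes (the variable names are my own):

```python
class Node:
    """Doubly linked cache node."""
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None


capacity = 3

# A full cache ordered A <-> C <-> B, with "B" least recently used.
a, c, b = Node("A", 1), Node("C", 3), Node("B", 2)
a.next, c.prev = c, a
c.next, b.prev = b, c
head, tail = a, b
table = {"A": a, "C": c, "B": b}   # the hash-table half of the cache

# Inserting D-4 into a full cache: first evict the tail node ("B") ...
if len(table) >= capacity:
    evicted = tail
    tail = evicted.prev
    tail.next = None
    del table[evicted.key]

# ... then link the new node in at the head as most recently used.
d = Node("D", 4)
d.next = head
head.prev = d
head = d
table["D"] = d
# The list is now D <-> A <-> C, and "B" is gone from the table.
```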



In situations where the cache is not full, new nodes are simply added to the cache.
Full Implementation of the LRU Cache
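Here is one possible implementation in Python that combines the two structures; the class and method names (LRUCache, get, put) are my own choices for this sketch:

```python
class Node:
    """Doubly linked list node holding one cache entry."""
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    """LRU cache backed by a hash table and a doubly linked list.

    The dict gives O(1) average lookup by key; the list keeps entries
    ordered from most (head) to least (tail) recently used, so both
    promotion and eviction are O(1).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = {}      # key -> Node
        self.head = None     # most recently used
        self.tail = None     # least recently used

    def _unlink(self, node):
        """Remove `node` from the list in O(1)."""
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        """Insert `node` at the head (most recently used) in O(1)."""
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def get(self, key):
        if key not in self.table:
            return None
        node = self.table[key]
        self._unlink(node)       # promote to most recently used
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.table:    # update existing entry and promote it
            node = self.table[key]
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.table) >= self.capacity:
            lru = self.tail      # evict the least recently used entry
            self._unlink(lru)
            del self.table[lru.key]
        node = Node(key, value)
        self.table[key] = node
        self._push_front(node)
```

Replaying the scenarios above: filling a cache of capacity 3 with A, B, C, then reading "A" promotes it, so a subsequent insert of "D" evicts "B".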