LRU cache with Elixir


First steps

What is the cache? Let’s check Wikipedia page for the answer: https://en.wikipedia.org/wiki/Cache_(computing)

When the cache is full, the algorithm must choose which items to discard to make room for the new ones https://en.wikipedia.org/wiki/Cache_replacement_policies.

At this article we will implement LRU caching policy with Elixir: https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)

GenServer

What is the GenServer? https://hexdocs.pm/elixir/GenServer.html

A GenServer is a process like any other Elixir process and it can be used to keep state, execute code asynchronously and so on.

The advantage of using a generic server process (GenServer) implemented using this module is that it will have a standard set of interface functions and include functionality for tracing and error reporting.

So, in other words - GenServer just provide us something like a ‘rules’ or ‘skeleton’ which can be used for the structure of our application.

Agent

Agents are a simple abstraction around state https://hexdocs.pm/elixir/Agent.html

In most cases, you can use Agent instead of GenServer, it just provides you API. In fact the Agent module lets you easily create a process to hold state without implementing a bunch of stuff for a GenServer.

ETS

What is ETS?

In fact ets module is an interface to the Erlang built-in term storage.

These provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data.

Some additional information: https://openmymind.net/Learning-Elixir-ETS/

Most interesting part here - you can choose different ways to store data into tables:

set

A set table will tell you that each key instance must be unique.

ordered_set

There can still only be one key instance per table, but ordered_set adds a few other interesting properties. The first is that elements in an ordered_set table will be ordered. The first element of the table is the smallest one, and the last element is the largest one.

bag

A bag table can have multiple entries with the same key, as long as the tuples themselves are different. This means that the table can have {key, some, values} and {key, other, values} inside of it without a problem, which would be impossible with sets (they have the same key). However, you couldn’t have {key, some, values} twice in the table as they would be entirely identical.

duplicate_bag

The tables of this type work like bag tables, with the exception that they do allow entirely identical tuples to be held multiple time within the same table.

Implementing LRU caching policy

We will use two tables in our implementation:

  • table: for storing our cache values
  • time_table: for storing cache items expiration data

Let’s start from basic module and struct for the table, time_table and cache size

defmodule RequestCache do
  defstruct table: nil, time_table: nil, size: 0
end

We will initialize our storage with SIZE number, so it will be max number of cached elements in our cache.

Let’s start Agent with :init function as a param and SIZE param.

def start_link(size) do
  Agent.start_link(__MODULE__, :init, [size], name: :cache)
end

We should define a init function for this Agent:

def init(size) do
  time_table = :cache_time_table # time table name
  :ets.new(time_table, [:named_table, :ordered_set]) # new ets table, we're providing time table, type and :named_table for the access from everywhere
  :ets.new(:cache, [:named_table, :public]) # new ets table for cache, and public access rights
  %RequestCache{time_table: time_table, table: :cache, size: size} # our struct
end

Implementing put and get

We will use Agent interface here, let’s call Agent.get with our params, key and value:

def put(key, value) do
  Agent.get(:cache, __MODULE__, :put_func, [key, value])
end

def put_func(state = %{table: table}, key, value) do
  delete_time_entry(state, key) # first we're deleting old time entry from table
  position = insert_time_entry(state, key) # inserting time entry and get a position
  :ets.insert(table, {key, position, value}) # inserting cache entry
  reset_values(state) # removing old values, we will discuss implementation later
  :ok # return ok status
end

Getting results from ETS store:

def get(key) do
  case :ets.lookup(:cache, key) do # by key from cache records
    [{_, _, value}] ->
      Agent.get(:cache, __MODULE__, :get_func, [key]) # get value by key
      value
    [] ->
      nil # nil on empty
  end
end

def get_func(state = %{table: table}, key) do
  delete_time_entry(state, key) # deleting time entry for the key
  position = insert_time_entry(state, key) # inserting time entry and get a position
  :ets.update_element(table, key, [{2, position}]) # updating element on position
  :ok # return ok status
end

Implementing cleaning and inserting time entries

We will use Agent interface to put and get values, let’s call Agent.get with our params, key and value:


defp delete_time_entry(%{time_table: time_table, table: table}, key) do
  case :ets.lookup(table, key) do 
    [{_, old_position, _}] -> # getting position for item
      :ets.delete(time_table, old_position) # doing actual delete
    _ ->
      nil # nil on empty
  end
end

defp insert_time_entry(%{time_table: time_table}, key) do
  position = :erlang.unique_integer([:monotonic]) # getting uniq integer and this integer is always larger than previously returned integers on the current runtime system instance
  :ets.insert(time_table, {position, key}) # doing insert
  position # return position
end

Resetting values

defp reset_values(%{time_table: time_table, table: table, size: size}) do
  if :ets.info(table, :size) > size do # get table size
    oldest_timestamp = :ets.first(time_table) # find oldest timestamp
    [{_, old_key}] = :ets.lookup(time_table, oldest_timestamp) # find item by oldest item
    :ets.delete(time_table, oldest_timestamp) # deleting oldest item from time table
    :ets.delete(table, old_key) # deleting oldest key from cache
    true
  else
    nil
  end
end

Final results

RequestCache.start_link(2) # new cache process with 2 items max limit
RequestCache.put(:a, 1) # put a->1 { a: 1 }
RequestCache.put(:b, 2) # put b->2 { a: 1, b: 2 }
RequestCache.put(:c, 3) # can't put c, deleting a, and putting c { b: 2, c: 3}
RequestCache.put(:d, 4) # can't put d, deleting b, and putting d { c: 3, d: 4}
RequestCache.put(:d, 5) # updating d { c: 3, d: 5}

IO.inspect(RequestCache.get(:c))
IO.inspect(RequestCache.get(:d))

Result:

3
5

Stay tuned!