Caching strategies according to Amazon (FAANG)

Hello everyone!

In this post, we’ll cover which caching strategies exist and which ones you can or should use in your project, following the approach of Amazon, one of the most important tech companies in the world (a FAANG classic).

We’ll see:

  • Why this is important
  • Things to consider
  • L1 VS L2 caching, when to use which
  • Real-life examples
  • Different implementations for L1 caching: Scheduled updates vs Guava Caches
  • L1 Caches with Kafka
  • Conclusions


Why this is important


As your system scales, your data is spread across different tables and different services.

A single request might need data from multiple sources, so querying all of them on every request increases latency and degrades performance. That’s where caching helps: storing the data in a fast, easily accessible place reduces latency.

In addition to improving latency, caching can also improve availability by storing the last known state of entities.

If a downstream service goes down or becomes unreliable, the cache reduces the number of calls to that service and can keep serving its last known data, which improves availability.
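To make the "last known state" idea concrete, here is a minimal sketch assuming a hypothetical `LastKnownValueCache` wrapper around a downstream call (the class and its names are illustrative, not from any library). Successful responses overwrite the stored value; when the call throws, the last known value is served instead of an error.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical wrapper: serves the last known value when the downstream call fails.
class LastKnownValueCache<K, V> {
    private final Map<K, V> lastKnown = new ConcurrentHashMap<>();
    private final Function<K, V> downstream; // e.g. a call to another service

    LastKnownValueCache(Function<K, V> downstream) {
        this.downstream = downstream;
    }

    // Try the downstream service first; on failure fall back to the stored value, if any.
    Optional<V> get(K key) {
        try {
            V fresh = downstream.apply(key);
            if (fresh != null) {
                lastKnown.put(key, fresh); // remember the latest successful answer
            }
            return Optional.ofNullable(fresh);
        } catch (RuntimeException e) {
            return Optional.ofNullable(lastKnown.get(key)); // possibly stale, but available
        }
    }
}
```

This version still calls the downstream service on every request; combining it with a TTL (so fresh entries are served without a call) is what also reduces the number of calls.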


Things to consider

When thinking about which caching solution to use, you need to consider the following:

  • How often the data is changing.
  • What staleness of data is acceptable by your clients.
  • In which flows the cached data will be used.


L1 VS L2 caching, when to use which

L1 Caching

L1 cache refers to in-memory caching (HashMap, Guava Cache).

Pros
  • Easy to use: the keys and values are the same models you already use in the system.
  • Performance: reading from memory is O(1) and no networking is involved.

Cons
  • Consistency across hosts: since the data is stored per host, a specific item can have different values on different hosts of the service, depending on when the item was loaded. How quickly the hosts converge (eventual consistency) depends on cache parameters such as TTL and refresh rate.
  • Limited in size by the memory of each host.
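A minimal L1 sketch, assuming a plain `ConcurrentHashMap` with a per-entry TTL instead of Guava; `TtlCache` and its injectable clock are illustrative names. Because each host holds its own instance, two hosts that loaded the same key at different times can return different values until their entries expire, which is the consistency trade-off described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical per-host (L1) cache: entries live in this JVM's heap and expire after a TTL.
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;

        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database query
    private final long ttlMillis;
    private final LongSupplier clock;    // injectable so expiry can be tested

    TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAt >= ttlMillis) {
            // Miss or expired: reload from the source of truth with a fresh timestamp.
            // Two threads may load the same key concurrently; Guava's LoadingCache avoids that.
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        return e.value;
    }
}
```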


L2 Caching

L2 cache refers to distributed caching (Memcached, Redis).

Pros
  • Size: while an L1 cache is limited to the JVM memory of a specific process and consumes part of the application's memory, an L2 cache can scale horizontally and store the data across multiple hosts.
  • Consistency across all hosts: we deploy our services across multiple hosts, and a request can be routed to any of them. With an L2 cache, all of them read the cached data from the same source.

Cons
  • Latency: unlike an in-memory L1 cache, an L2 cache adds network latency plus the cost of serialising and deserialising the data.
  • Cost and maintenance: an L2 cache is a separate cluster that requires its own set of hosts plus operational maintenance.

L2 caching can scale by adding more hosts to cache the data. It is a centralised cache, so evicting specific entries is easy compared to trying to clean L1 caches across multiple hosts of the service. Entries in an L2 cache are usually managed or evicted in one of two ways:

  • TTL: every item has an expiration time, after which it is no longer accessible.
  • Active invalidation: you add logic to your services that actively evicts items when their state changes. For example, after updating an object, you will want to remove or update its cache entry even though its TTL has not yet expired.
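Both modes can be combined in a cache-aside pattern. This is a sketch only: `RemoteCache` is a hypothetical interface standing in for a distributed cache client (with Redis, for instance, a write with an expiry and a delete would play these roles), and `InMemoryRemoteCache` is a single-process fake so the example runs without a cluster. `SupplierService` and its Map-backed "database" are illustrative too.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical interface standing in for a distributed (L2) cache client.
interface RemoteCache {
    void put(String key, String value, long ttlMillis); // write with an expiry (TTL mode)
    Optional<String> get(String key);
    void delete(String key);                            // explicit eviction (active invalidation)
}

// Single-process fake so the example runs without a cache cluster.
class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final Map<String, Long> expiresAt = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    InMemoryRemoteCache(LongSupplier clock) { this.clock = clock; }

    public void put(String key, String value, long ttlMillis) {
        values.put(key, value);
        expiresAt.put(key, clock.getAsLong() + ttlMillis);
    }

    public Optional<String> get(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline == null || clock.getAsLong() >= deadline) {
            return Optional.empty(); // missing or expired
        }
        return Optional.ofNullable(values.get(key));
    }

    public void delete(String key) {
        values.remove(key);
        expiresAt.remove(key);
    }
}

// Cache-aside: read through the cache and invalidate the entry whenever the data changes.
class SupplierService {
    private static final long TTL_MILLIS = 60_000; // example value
    private final RemoteCache cache;
    private final Map<String, String> database = new HashMap<>(); // stand-in for the real store

    SupplierService(RemoteCache cache) { this.cache = cache; }

    String getSupplier(String id) {
        return cache.get(id).orElseGet(() -> {
            String fromDb = database.get(id);
            if (fromDb != null) cache.put(id, fromDb, TTL_MILLIS);
            return fromDb;
        });
    }

    void updateSupplier(String id, String data) {
        database.put(id, data);
        cache.delete(id); // don't wait for the TTL: the next read reloads the new value
    }
}
```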


Real-Life examples

  • List of valid countries. We maintain a list of supported countries for a supplier. Every time we create/update a supplier, we need to verify the country. Since the data almost never changes, we can store it in an L1 cache and refresh it every X minutes.
  • List of bookings. You store the list of bookings (the getAllBookings response) in L1 and refresh it every few minutes. The cache is used by getAllBookings for the list view, where latency is highest; on a listing page, slightly stale booking data is usually acceptable. The cache is not used in the detail view of a booking, where we want to see the latest data and potentially update it, because trying to update a stale (not latest) booking will fail. With this approach, the single-booking page always shows the latest data, while the general listing page may lag slightly behind.
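The bookings example can be sketched as follows, assuming a hypothetical `BookingService` with a Map-backed store (all names are illustrative). The list view reads a snapshot that a background job would refresh every few minutes (here `refreshListCache()` is called directly to keep the example deterministic), while the detail view always reads the source of truth.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service: cached list view, uncached detail view.
class BookingService {
    private final Map<String, String> database = new ConcurrentHashMap<>(); // stand-in for the bookings store
    // Snapshot used only by the list view; replaced wholesale on each refresh.
    private volatile List<String> cachedList = List.of();

    void save(String id, String booking) { database.put(id, booking); }

    // Meant to be called every few minutes by a scheduler.
    void refreshListCache() {
        cachedList = List.copyOf(database.values());
    }

    // List view: served from the L1 snapshot, may be a few minutes stale.
    List<String> getAllBookings() { return cachedList; }

    // Detail view: always reads the latest data, so updates start from the current state.
    String getBooking(String id) { return database.get(id); }
}
```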


Different implementations for L1 caching

There are different possible implementations for L1 caching. The main ones are:

Scheduled updates

The cache is refreshed periodically, even when no requests are coming in at all.

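import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final ScheduledExecutorService executorCache = Executors.newScheduledThreadPool(1);
// Run refreshCache() immediately (initial delay 0), then 10 minutes after each run finishes.
executorCache.scheduleWithFixedDelay(() -> {
   try {
      refreshCache(); // load all the items.
   } catch (Exception e) {
      // Catch everything: an uncaught exception would cancel all future runs.
      log.error("Failed to refresh cache", e);
   }
}, 0, 10, TimeUnit.MINUTES);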

A simple executor that reloads the whole cache every X minutes. This approach is useful when you want to cap data staleness at roughly the refresh delay and your dataset is small enough to load entirely into memory.
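The snippet leaves refreshCache() undefined. One common way to implement it, sketched here under the assumption that the whole dataset fits in memory, is to build a complete new map and publish it with a single reference swap, so readers never see a half-loaded cache. `CurrencyCache` and the `loadAllFromDb` supplier are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical scheduled-refresh cache: readers always see a complete, immutable snapshot.
class CurrencyCache {
    private final AtomicReference<Map<String, String>> snapshot = new AtomicReference<>(Map.of());
    private final Supplier<Map<String, String>> loadAllFromDb; // hypothetical "load all currencies" query

    CurrencyCache(Supplier<Map<String, String>> loadAllFromDb) {
        this.loadAllFromDb = loadAllFromDb;
    }

    // Called by the scheduled executor. If loading throws, the previous snapshot stays in place.
    void refreshCache() {
        Map<String, String> fresh = Map.copyOf(loadAllFromDb.get()); // immutable copy
        snapshot.set(fresh); // atomic swap: readers see the old or the new map, never a mix
    }

    // Reads never touch the database once the first refresh has run.
    String get(String code) {
        return snapshot.get().get(code);
    }
}
```

New rows only become visible after the next refresh, which is exactly the bounded staleness this approach trades for zero cache misses.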

  • This approach is useful when you don’t have a lot of calls to the same key but you want to keep the cache always warm and avoid even the first cache miss.
  • This approach should only be considered when you have a known amount of data that is not expected to grow significantly.
  • For example: currency data, where the number of currencies is naturally limited, or country-level data.
Major drawbacks
  • The cache is always being refreshed, even on weekends and outside working hours when there is little to no traffic.
  • Periodically reloading lots of objects can significantly increase your costs.
  • The cache loads everything into memory, so there is a limit to how much data it can store.


Guava Cache

This type of cache gives you more control and more ways to customise its behaviour. It supports advanced options such as asynchronous reloading, which keeps the data up to date without making requests wait for the reload.

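LoadingCache<String, Set<T>> cache = CacheBuilder.newBuilder()
      .maximumSize(1_000)                      // example cap; evicts entries not used recently
      .refreshAfterWrite(10, TimeUnit.MINUTES) // example: reload entries older than 10 min on access
      .build(new CacheLoader<String, Set<T>>() {
         public Set<T> load(final String key) {
            return reloadCache(key); // invoked on a cache miss
         }
      });
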
  • Unlike the scheduler approach, you don’t need to load all the data in advance. You can build the cache as requests come in and see latency improve as the cache fills up. You can set a maximum size, and Guava will evict entries that haven't been used recently, making sure you don't exceed your memory capacity.
  • Unlike the scheduler, the cache is refreshed only when requests trigger it. This approach is useful when you expect multiple calls for the same key and it is acceptable for the first call to pay the penalty of loading the item from the DB.
  • Another benefit compared to the scheduler is that the cache can be limited by size, evicting the least recently used (LRU) entries. This keeps the memory footprint lower than with the scheduler approach.
Major drawbacks
  • The cache does not help flows that fetch each item only once: every such request is a cache miss.
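Guava handles size-based eviction for you. To show what the LRU policy itself does, here is a small standard-library sketch (not Guava's internal implementation) using `LinkedHashMap` in access order; the `LruCache` name and the capacity are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU map: with accessOrder=true every get/put moves the entry to the end,
// so the eldest entry is always the least recently used one.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order instead of insertion order
        this.maxEntries = maxEntries;
    }

    // Called by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", then putting "c" evicts "b", because "a" was used more recently. This map is not thread-safe, which is one reason to prefer Guava in a multi-threaded service.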


Kafka Global Tables

Another option worth mentioning, if Kafka is part of your architecture, is an L1 solution with auto-update capabilities.

Kafka Streams and its global tables allow each service instance to consume all the data from a topic, materialise it, and cache it locally.

For topics with a small amount of data, you can use the stream to get all the data instead of relying on schedulers and getAll REST APIs.
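Conceptually, a global table is a local copy built by replaying a compacted topic: for each key the latest record wins, and a record with a null value (a tombstone) deletes the key. The sketch below mimics that in plain Java so it runs without a broker. It is not the Kafka Streams API (there you would use `StreamsBuilder#globalTable` and query the resulting store); `ChangeRecord` and `LocalTable` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical change record, standing in for a Kafka record's key and value.
final class ChangeRecord {
    final String key;
    final String value; // null = tombstone (the key was deleted)

    ChangeRecord(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Local materialised view: applies records in order, keeping only the latest value per key.
class LocalTable {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    void apply(ChangeRecord record) {
        if (record.value == null) {
            store.remove(record.key);            // tombstone: drop the key
        } else {
            store.put(record.key, record.value); // upsert: the latest record wins
        }
    }

    void applyAll(List<ChangeRecord> records) {
        records.forEach(this::apply);
    }

    String get(String key) {
        return store.get(key);
    }
}
```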



Conclusions

As you can see, there are many different types of caches you can use in your system, so make sure you evaluate the situation and context of every use case before choosing which cache type to use.


Hope you found this content helpful! What are your thoughts? Let me know in the comments! 🙂
