Algorithm of Log Structure Merge Tree (LSM-Tree)

Shiv Iyer
Posted on January 16, 2023

How does the Log Structure Merge Tree (LSM-Tree) work?

Log Structured Merge Tree (LSM-Tree) is a data structure that is used to implement high-performance, disk-based storage systems such as RocksDB. The algorithm of LSM-Tree is based on the following key concepts:

  1. A log-structured data structure: Instead of directly modifying the data on disk, LSM-Tree stores the data in a log-structured format, where new data is appended to the end of the log and existing data is not modified in-place. This allows for more efficient disk usage and better performance when writing data.
  2. A merge process: LSM-Tree uses a merge process to periodically combine smaller SSTables (sorted string tables) into larger ones. This process is called compaction, and it helps to reduce the number of SSTables that need to be read when querying the data.
  3. A memory-based index: LSM-Tree uses a memory-based index, typically a Skip List or a B-Tree, to quickly locate the data in the SSTables. This index is called the Memtable, and it is used to quickly locate the data in the SSTables.
  4. A tiered storage: LSM-Tree uses a tiered storage system where data is first written to the Memtable, then to the lowest level of SSTables, and eventually to higher levels as data is merged. This tiered storage system allows for faster read operations by searching through the lower levels of SSTables first.
  5. A bloom filter: LSM-Tree uses a bloom filter to check whether a key exists in the SSTables or not. This helps to avoid unnecessary disk accesses, improving the performance of read operations.

The LSM-Tree algorithm is designed to handle high write loads