Dev Genius

Coding, Tutorials, News, UX, UI and much more related to development

Follow publication

B-trees: The Key to Many Database Indexes

--

B-trees are more generalized, self-balancing binary search trees that include multiple values within a single node and having more than 2 children. In this article, let’s dive deep into this ubiquitous data structure that powers most database indexes: B-trees.

Introduction

Before getting into the details of B-tree, let us review what a Binary Search Tree is. Following are the properties of a Binary Search Tree.

  • It is a binary tree — each node can have at most 2 children.
  • All the nodes in the left subtree of a node will have values less than the node’s value, and all the nodes in the right subtree will have values greater than the node’s value.

Below are 2 examples of BSTs,

As we can see in the above examples, the search operation can be achieved in O(log n) complexity in a balanced binary search tree. On the other hand, if the tree is not balanced, as shown on the right side, the search could take O(n) time.

To avoid the aforementioned problem and get faster search, insert and delete operations with O(log n) complexity, we need self-balancing trees. A B-tree is a self-balancing tree which is more generalized than a BST.

Properties

Let us understand some of the fundamental properties of the B-tree using an example.

A node can contain multiple keys and children.

As can be seen in the above image,

  1. A node in a B-tree can contain multiple keys(100 and 200 are keys) and references to children nodes(represented by dots).
  2. A node which can have k child node references will have k — 1 keys.
  3. The child node referenced from between the keys 100 and 200 will always hold keys in the range of 100–200.
  4. The number of children that a node could have is determined using another property — order.

Order of a B-tree

All B-trees have a fundamental property called order. The order of a B-tree determines the structure of a B-tree. In simple terms, a B-tree of order m can have at most m children. The example shown above is a tree of order 3.

All leaf nodes should be at the same level.

All the leaf nodes in a B-tree should appear in the same level. This ensures the self-balancing nature of B-trees and helps us achieve optimal time complexity.

Summarizing all the properties with some additional detailed properties below,

A B-tree of order m has the following properties,

  1. Maximum Children: Every node has at most m children
  2. Minimum Children for root: Root node should have at least 2 children, if it is not a leaf node itself.
  3. Minimum Children for Non-Leaf Nodes: Every non-leaf node except root has at least ⌈m/2⌉ children.
  4. Keys in Non-Leaf Nodes: A non-leaf node with k children will have k-1 keys.
  5. Leaf Node Level: All leaf nodes should be in the same level from the root.

Operations in B-tree

Let us try and understand how the operations like search and insertions work in a B-tree taking the below tree as the starting point.

Searching in a B-tree

Let us try to search for a value 140 in the B-tree shown in the above B-tree. Since we are looking for 140 which lies in between 100–200, we know that we should follow the pointer between the keys 100 and 200. Following this, we would get to the node which contains the key 140 and its value.

Inserting a new key in a B-tree

Let us try to insert a new value 145 in the same tree,

  1. 145 lies in between 100 and 200, so we have to follow the pointer between the keys 100 and 200.

2. There is no space to insert 145 in the node, since it violates the property of B-tree.

3. Hence the node gets split into two, and the median value is pushed up to the parent. But the parent node also is full and cannot accommodate the new value.

4. Hence the parent node is also split into two and the median value pushed up again, increasing the height of the tree by 1.

Summary

In this article, we explored the basic properties of B-tree and how it achieves logarithmic performance with respect to search and update operations.

References

Book: Designing Data Intensive Applications

Interesting visualization tool for playing around with B-trees

--

--

No responses yet

Write a response