ESE's B-tree Database Structure

Last week, I opened "Pandora's box" by beginning a discussion about Exchange's Extensible Storage Engine (ESE). Many of you sent feedback indicating that this topic is long overdue and that you were looking forward to digging deeper. Therefore, without further delay, let's jump in and discuss more details about ESE. This week, I focus on the internal structure of ESE's key database file (.edb) and look at B-tree structures and how ESE represents them to the Exchange Information Store (IS) and users.

The fundamental construct of an Exchange database file is the B-tree structure. Having been around for 20+ years, B-trees aren't rocket science but a proven database structure that allows fast and efficient data access. The key to B-tree technology is a hierarchical arrangement of data pages that enforces certain structural constraints. This structure is similar to an upside-down tree, starting with a root node and building from there. The root page (node) is the first parent of the tree, and all other nodes flow from there. In B-tree technology, each node can have only one parent but parent nodes can have from zero to N child nodes under them. In this way, ESE's 4KB pages are arranged into tables that form a large database file containing Exchange data.

Microsoft has made some key design changes to the B-tree structure to ensure that data access is fast and efficient. B-tree technology typically doesn't specify the depth and width of these B-tree structures (other than they must be balanced), which means that a B-tree can extend to an unlimited number of levels (called tree depth) and width (called the fan-out, degree, or branching factor). ESE limits the depth and fan-out of the B-tree so that every 4KB page of data in the database can be accessed with a consistent number of I/Os to the database. For example, by allowing for a high fan-out (about 200 pointers per page) and low tree depth, ESE can guarantee that users can access any page of data (called a leaf node) within four I/Os or less. Tree depth has the greatest effect on performance. A uniform tree depth across the entire structure (every leaf node or data page is equidistant from the root node) means database performance is consistent and predictable. (By the way, the B in B-tree refers to a "Balanced" tree for this very reason.) You can enhance the B-tree by implementing the B+tree variant, which adds horizontal relationships between nodes in the tree structure. This feature lets pages point not only to parent and child nodes but to adjacent nodes as well. Although these extra pointers can add overhead, the benefits of increased database navigation outweigh the downside.

The B+tree technology in Exchange's ESE (as well as in Microsoft Active Directory—AD, WINS, and DNS) might seem a bit technical, but the important thing to remember is that ESE stores all information in the .edb files in this manner (We'll discuss .stm files later; they store data in a different manner and ESE doesn't access them directly). To the Exchange IS process (Store.exe), everything appears in the form of database tables that are joined and linked to represent what the user sees as the inbox. Each table comprises one B-tree that contains the data, but there might also be many secondary index B-trees that provide data views. As we move further into our discussion of the Exchange database internals next week, I'll forgo digging any deeper into the physical structure of the database and switch to the logical view of Exchange data. We'll look at how your Outlook inbox is compiled and presented from the server's point of view.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.