TAO: Facebook’s Distributed Data Store for the Social Graph


Facebook puts an extremely demanding workload on its data backend. Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends — the list goes on. The high degree of output customization, combined with a high update rate of a typical user’s News Feed, makes it impossible to generate the views presented to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds.


This challenge is made more difficult because the data set is not easily partitionable, and by the tendency of some items, such as photos of celebrities, to have request rates that can spike significantly. Multiply this by the millions of times per second this kind of highly customized data set must be delivered to users, and you have a constantly changing, read-dominated workload that is incredibly challenging to serve efficiently.

Memcache and MySQL

Facebook has always realized that even the best relational database technology available is a poor match for this challenge unless it is supplemented by a large distributed cache that offloads the persistent store. Memcache has played that role since Mark Zuckerberg installed it on Facebook’s Apache web servers back in 2005. As efficient as MySQL is at managing data on disk, the assumptions built into the InnoDB buffer pool algorithms don’t match the request pattern of serving the social graph. The spatial locality on ordered data sets that a block cache attempts to exploit is not common in Facebook workloads. Instead, what we call creation time locality dominates the workload — a data item is likely to be accessed if it has been recently created. Another source of mismatch between our workload and the design assumptions of a block cache is the fact that a relatively large percentage of requests are for relations that do not exist — e.g., “Does this user like that story?” is false for most of the stories in a user’s News Feed. Given the overall lack of spatial locality, pulling several kilobytes of data into a block cache to answer such queries just pollutes the cache and contributes to a lower overall hit rate in the block cache of a persistent store.

The use of memcache vastly improved the memory efficiency of caching the social graph and allowed us to scale in a cost-effective way. However, the code that product engineers had to write for storing and retrieving their data became quite complex. Even though memcache has “cache” in its name, it’s really a general-purpose networked in-memory data store with a key-value data model. It will not automatically fill itself on a cache miss or maintain cache consistency. Product engineers had to work with two data stores and two very different data models: a large cluster of MySQL servers for storing data persistently in relational tables, and an equally large collection of memcache servers for storing and serving flat key-value pairs derived (some indirectly) from the results of SQL queries. Even with most of the common chores encapsulated in a data access library, using the memcache-MySQL combination efficiently as a data store required quite a bit of knowledge of system internals on the part of product engineers. Inevitably, some made mistakes that led to bugs, user-visible inconsistencies, and site performance issues. In addition, changing table schemas as products evolved required coordination between engineers and MySQL cluster operators. This slowed down the change-debug-release cycle and didn’t fit well with Facebook’s “move fast” development philosophy.
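To make the chore concrete, here is a minimal sketch of the look-aside caching pattern that product engineers had to implement by hand on top of memcache and MySQL. The `cache` and `db` dictionaries and the key scheme are stand-ins invented for illustration, not Facebook’s actual data access library.

```python
cache = {}                            # stand-in for a memcache cluster
db = {"user:42:name": "Alice"}        # stand-in for MySQL-backed storage

def get(key):
    """Read through the cache, filling it from the database on a miss."""
    value = cache.get(key)
    if value is None:                 # cache miss
        value = db.get(key)           # fall back to the persistent store
        if value is not None:
            cache[key] = value        # fill the cache for later readers
    return value

def update(key, value):
    """Write to the database, then invalidate the stale cached copy."""
    db[key] = value
    cache.pop(key, None)              # delete rather than update, to avoid races

print(get("user:42:name"))            # miss, fills cache: Alice
update("user:42:name", "Alice Liddell")
print(get("user:42:name"))            # miss again after invalidation: Alice Liddell
```

Every product feature had to get each of these steps right; forgetting the invalidation, or racing a fill against a concurrent write, produced exactly the user-visible inconsistencies described above.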

Objects and associations

In 2007, a few Facebook engineers set out to define new data storage abstractions that would fit the needs of all but the most demanding features of the site while hiding most of the complexity of the underlying distributed data store from product engineers. The Objects and Associations API that they created was based on the graph data model. It was initially implemented in PHP and ran on Facebook’s web servers. It represented data items as nodes (objects), and relationships between them as edges (associations). The API was an immediate success, with several high-profile features, such as likes, pages, and events, implemented entirely on objects and associations, with no direct memcache or MySQL calls.

As adoption of the new API grew, several limitations of the client-side implementation became apparent. First, small incremental updates to a list of edges required invalidation of the entire item that stored the list in cache, reducing hit rate. Second, requests operating on a list of edges had to transfer the entire list from the memcache servers over to the web servers, even if the final result contained only a few edges or was empty. This wasted network bandwidth and CPU cycles. Third, cache consistency was difficult to maintain. Finally, avoiding thundering herds in a purely client-side implementation required a form of distributed coordination that was not available for memcache-backed data at the time.

All those problems could be solved directly by writing a custom distributed service designed around objects and associations. In early 2009, a team of Facebook infrastructure engineers started to work on TAO (“The Associations and Objects”). TAO has now been in production for several years. It runs on a large collection of geographically distributed server clusters. TAO serves thousands of data types and handles over a billion read requests and millions of write requests every second. Before we take a look at its design, let’s quickly go over the graph data model and the API that TAO implements.

TAO data model and API


This simple example shows a subgraph of objects and associations that is created in TAO after Alice checks in at the Golden Gate Bridge and tags Bob there, while Cathy comments on the check-in and David likes it. Every data item, such as a user, check-in, or comment, is represented by a typed object containing a dictionary of named fields. Relationships between objects, such as “liked by” or “friend of,” are represented by typed edges (associations) grouped in association lists by their origin. Multiple associations may connect the same pair of objects as long as the types of all those associations are distinct. Together, objects and associations form a labeled directed multigraph.
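A toy in-memory version of this data model makes the shapes concrete: typed objects holding a dictionary of fields, and typed directed edges grouped into association lists by their origin. The ids and type names below are invented for illustration; they are not TAO’s real schema.

```python
from collections import defaultdict

objects = {}                      # id -> (object type, {field: value})
assoc_lists = defaultdict(list)   # (id1, association type) -> [id2, ...]

def add_object(oid, otype, **fields):
    objects[oid] = (otype, fields)

def add_assoc(id1, atype, id2):
    assoc_lists[(id1, atype)].append(id2)

# The check-in subgraph: Alice checks in at the Golden Gate Bridge and
# tags Bob there, Cathy comments on the check-in, and David likes it.
add_object(1, "user", name="Alice")
add_object(2, "user", name="Bob")
add_object(3, "user", name="Cathy")
add_object(4, "user", name="David")
add_object(5, "location", name="Golden Gate Bridge")
add_object(6, "checkin")
add_object(7, "comment", text="Wish we were there!")
add_assoc(6, "authored_by", 1)    # the check-in was made by Alice
add_assoc(6, "located_at", 5)
add_assoc(6, "tagged", 2)         # Bob is tagged in the check-in
add_assoc(7, "comment_on", 6)     # Cathy's comment targets the check-in
add_assoc(6, "liked_by", 4)       # David likes the check-in

print(assoc_lists[(6, "tagged")])   # [2]
```

Note how the multigraph property falls out naturally: the pair (6, 2) could carry both a “tagged” and a “liked_by” edge, because the association lists are keyed by type.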

For every association type a so-called inverse type can be specified. Whenever an edge of the direct type is created or deleted between objects with unique IDs id1 and id2, TAO will automatically create or delete an edge of the corresponding inverse type in the opposite direction (id2 to id1). The intent is to help the application programmer maintain referential integrity for relationships that are naturally mutual, like friendship, or where support for graph traversal in both directions is performance critical, as for example in “likes” and “liked by.”
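The inverse-type mechanism can be sketched as a registry consulted on every edge create and delete. The registry contents and function names here are hypothetical; the point is only that the mirror edge is maintained by the store, not by application code.

```python
INVERSE = {"likes": "liked_by", "liked_by": "likes",
           "friend_of": "friend_of"}        # a naturally mutual relationship

edges = set()   # set of (id1, association type, id2) triples

def assoc_add(id1, atype, id2):
    edges.add((id1, atype, id2))
    inv = INVERSE.get(atype)
    if inv is not None:
        edges.add((id2, inv, id1))          # mirror edge, opposite direction

def assoc_delete(id1, atype, id2):
    edges.discard((id1, atype, id2))
    inv = INVERSE.get(atype)
    if inv is not None:
        edges.discard((id2, inv, id1))      # keep the pair in sync

assoc_add(100, "likes", 200)
print((200, "liked_by", 100) in edges)      # True: inverse created automatically
```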

The set of operations on objects is of the fairly common create / set-fields / get / delete variety. All objects of a given type have the same set of fields. New fields can be registered for an object type at any time and existing fields can be marked deprecated by editing that type’s schema. In most cases product engineers can change the schemas of their types without any operational work.

Associations are created and deleted as individual edges. If the association type has an inverse type defined, an inverse edge is created automatically. The API helps the data store exploit the creation-time locality of the workload by requiring every association to have a special time attribute that is commonly used to represent the creation time of the association. TAO uses the association time value to optimize the working set in cache and to improve the hit rate.


There are three main classes of read operations on associations:

- Point queries look up specific associations identified by their (id1, type, id2) triples. Most often they are used to check whether two objects are connected by an association or not, or to fetch data for an association.
- Range queries find outgoing associations given an (id1, type) pair. Associations are ordered by time, so these queries are commonly used to answer questions like “What are the 50 most recent comments on this piece of content?” Cursor-based iteration is provided as well.
- Count queries give the total number of outgoing associations for an (id1, type) pair. TAO optionally keeps track of counts as association lists grow and shrink, and can report them in constant time.
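The three read classes can be sketched over a time-ordered association list. This is a toy model under invented names, but it shows why the time attribute matters: range queries return the newest edges first, which is exactly the recently created, frequently accessed part of the working set.

```python
from collections import defaultdict

assoc_lists = defaultdict(list)   # (id1, atype) -> [(time, id2), ...] newest first

def assoc_add(id1, atype, id2, time):
    lst = assoc_lists[(id1, atype)]
    lst.append((time, id2))
    lst.sort(reverse=True)        # keep newest-first order (fine for a sketch)

def assoc_get(id1, atype, id2):
    """Point query: fetch one association, or None if it does not exist."""
    for t, other in assoc_lists[(id1, atype)]:
        if other == id2:
            return (t, other)
    return None

def assoc_range(id1, atype, offset=0, limit=50):
    """Range query: the newest associations, e.g. the latest 50 comments."""
    return assoc_lists[(id1, atype)][offset:offset + limit]

def assoc_count(id1, atype):
    """Count query: total outgoing associations for (id1, type)."""
    return len(assoc_lists[(id1, atype)])

assoc_add(8, "comment", 71, time=1)
assoc_add(8, "comment", 72, time=2)
print(assoc_range(8, "comment", limit=1))   # [(2, 72)] -- most recent first
print(assoc_count(8, "comment"))            # 2
```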

We have kept the TAO API simple on purpose. For instance, it does not offer any operations for complex traversals or pattern matching on the graph. Executing such queries while responding to a user request is almost always a suboptimal design decision. TAO does not offer a server-side set intersection primitive. Instead we provide a client library function. The lack of clustering in the data set virtually guarantees that having the client orchestrate the intersection through a sequence of simple point and range queries on associations will require about the same amount of network bandwidth and processing power as doing such intersections entirely on the server side. The simplicity of the TAO API helps product engineers find an optimal division of labor between application servers, data store servers, and the network connecting them.
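A client-side intersection can be composed from nothing but the point, range, and count primitives. The `Store` class and the scenario (friends of a user intersected with likers of a story) are invented for illustration; a common strategy, used here, is to range-scan the shorter list and point-check each candidate against the longer one.

```python
class Store:
    def __init__(self):
        self.lists = {}   # (id1, atype) -> [(time, id2), ...]

    def add(self, id1, atype, id2, t):
        self.lists.setdefault((id1, atype), []).append((t, id2))

    def count(self, id1, atype):                 # count query
        return len(self.lists.get((id1, atype), []))

    def range(self, id1, atype, limit):          # range query, newest first
        return sorted(self.lists.get((id1, atype), []), reverse=True)[:limit]

    def get(self, id1, atype, id2):              # point query
        for t, other in self.lists.get((id1, atype), []):
            if other == id2:
                return (t, other)
        return None

def intersect(store, a, type_a, b, type_b):
    """Scan the shorter association list; point-check against the longer."""
    if store.count(a, type_a) > store.count(b, type_b):
        a, type_a, b, type_b = b, type_b, a, type_a
    return [id2
            for _, id2 in store.range(a, type_a, limit=store.count(a, type_a))
            if store.get(b, type_b, id2) is not None]

s = Store()
for t, friend in enumerate([20, 21, 22]):
    s.add(10, "friend_of", friend, t)       # user 10's friends
for t, liker in enumerate([21, 30, 22]):
    s.add(99, "liked_by", liker, t)         # likers of story 99
print(sorted(intersect(s, 10, "friend_of", 99, "liked_by")))   # [21, 22]
```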


The TAO service runs across a collection of geographically distributed server clusters organized logically as a tree. Separate clusters are used for storing objects and associations persistently, and for caching them in RAM and flash memory. This separation allows us to scale different types of clusters independently and to make efficient use of the server hardware.

Client requests are always sent to caching clusters running TAO servers. In addition to satisfying most read requests from a write-through cache, TAO servers orchestrate the execution of writes and maintain cache consistency among all TAO clusters. We continue to use MySQL to manage persistent storage for TAO objects and associations.

The data set managed by TAO is partitioned into hundreds of thousands of shards. All objects and associations in the same shard are stored persistently in the same MySQL database, and are cached on the same set of servers in each caching cluster. Individual objects and associations can optionally be assigned to specific shards at creation time. Controlling the degree of data collocation proved to be an important optimization technique for reducing communication overhead and avoiding hot spots.
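Shard-aware id assignment can be sketched as follows. The id layout (shard number embedded in the low digits of the id) is a hypothetical scheme chosen for readability, not TAO’s actual encoding; the point is that the shard is recoverable from the id, and that a new object can opt in to the shard of a related object.

```python
NUM_SHARDS = 100_000   # illustrative; TAO uses hundreds of thousands of shards

def shard_of(object_id):
    # Every id maps to exactly one shard; the shard determines both the
    # MySQL database and the cache servers responsible for that id.
    return object_id % NUM_SHARDS

def new_id(sequence, shard=None):
    # Embed the shard number in the id so shard_of() can recover it.
    if shard is None:
        shard = sequence % NUM_SHARDS        # default placement
    return sequence * NUM_SHARDS + shard

post = new_id(12345)
# Collocate a comment with the post it belongs to, so queries touching
# both stay on one shard and avoid cross-server communication.
comment = new_id(67890, shard=shard_of(post))
print(shard_of(comment) == shard_of(post))   # True
```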

Shards can be migrated or cloned among servers in the same cluster to equalize the load and to smooth out load spikes. Load spikes are common and happen when a handful of objects or associations become extremely popular as they appear in the News Feeds of tens of millions of users at the same time.

There are two tiers of caching clusters in each geographical region. Clients talk to the first tier, called followers. If a cache miss occurs on a follower, the follower attempts to fill its cache from the second tier, called a leader. Leaders talk directly to a MySQL cluster in that region. All TAO writes go through followers to leaders. Caches are updated as the reply to a successful write propagates back down the chain of clusters. Leaders are responsible for maintaining cache consistency within a region. They also act as secondary caches, with an option to cache objects and associations in flash. Last but not least, they provide an additional safety net to protect the persistent store during planned or unplanned outages.
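A minimal model of the two tiers shows both paths: a read miss fills downward through follower and leader to MySQL, and a write is forwarded up the chain with each cache updated as the reply comes back. Class names and keys are illustrative only.

```python
class Database:
    """Stand-in for the regional MySQL cluster."""
    def __init__(self, rows):
        self.rows = rows
    def read(self, key):
        return self.rows.get(key)
    def write(self, key, value):
        self.rows[key] = value

class CacheTier:
    """One caching tier (a follower or a leader) backed by the next tier."""
    def __init__(self, backing):
        self.cache = {}
        self.backing = backing
    def read(self, key):
        if key in self.cache:
            return self.cache[key]           # cache hit
        value = self.backing.read(key)       # miss: fill from the next tier
        self.cache[key] = value
        return value
    def write(self, key, value):
        self.backing.write(key, value)       # forward toward the database
        self.cache[key] = value              # update as the reply propagates back

db = Database({"obj:1": "v0"})
leader = CacheTier(db)
follower = CacheTier(leader)

print(follower.read("obj:1"))       # v0 -- fills the leader, then the follower
follower.write("obj:1", "v1")       # write-through: db, leader, follower updated
print(db.read("obj:1"))             # v1
```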


We chose eventual consistency as the default consistency model for TAO. Our choice was driven by both performance considerations and the inescapable consequences of the CAP theorem for practical distributed systems, where machine failures and network partitioning (even within the data center) are a virtual certainty. For many of our products, TAO losing consistency is a lesser evil than losing availability. TAO tries hard to guarantee with high probability that users always see their own updates. For the few use cases requiring strong consistency, TAO clients may override the default policy at the expense of higher processing cost and potential loss of availability.

We run TAO as a single primary region per shard and rely on MySQL replication to propagate updates from the region where the shard is primary to all other regions (secondary regions). A secondary region cannot update the shard in its regional persistent store. It forwards all writes to the shard’s primary region. The write-through design of the cache simplifies maintaining read-after-write consistency for writes that are made in a secondary region for the affected shard. If necessary, the primary region can be switched to another region at any time. This is an automated procedure that is commonly used for restoring availability when a hardware failure brings down a MySQL instance.
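The single-primary-per-shard rule can be sketched as below. Region and shard names are hypothetical, and replication is modeled as an immediate synchronous copy purely for simplicity; in reality MySQL replication is asynchronous, which is exactly why the model is eventually consistent.

```python
class Region:
    def __init__(self, name):
        self.name = name
        self.store = {}   # stand-in for the region's MySQL databases

regions = {"us-west": Region("us-west"), "europe": Region("europe")}
primary_region = {"shard-7": regions["us-west"]}   # one primary per shard

def write(region, shard, key, value):
    primary = primary_region[shard]
    if region is not primary:
        # A secondary region never updates its own persistent copy;
        # it forwards the write to the shard's primary region.
        write(primary, shard, key, value)
        return
    primary.store[key] = value
    for r in regions.values():        # stand-in for MySQL replication
        if r is not primary:
            r.store[key] = value

write(regions["europe"], "shard-7", "obj:9", "hello")
print(regions["us-west"].store["obj:9"])   # hello -- applied at the primary
print(regions["europe"].store["obj:9"])    # hello -- arrived via replication
```

Switching the primary for a shard amounts to repointing the `primary_region` entry, which matches the automated failover procedure described above.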

A massive amount of effort has gone into making TAO the easy to use and powerful distributed data store that it is today. TAO has become one of the most important data stores at Facebook — the power of the graph helps us tame the demanding and dynamic social workload. For more details on the design, implementation, and performance of TAO, I invite you to read our technical paper published in the USENIX ATC ’13 proceedings.


Thanks to all the engineers who worked on building TAO: Zach Amsden, Nathan Bronson, George Cabrera, Prasad Chakka, Tom Conerly, Peter Dimov, Hui Ding, Mark Drayton, Jack Ferris, Anthony Giardullo, Sathya Gunasekar, Sachin Kulkarni, Nathan Lawrence, Bo Liu, Sarang Masti, Jim Meyering, Dmitri Petrov, Hal Prince, Lovro Puzar, Terry Shen, Tony Savor, David Goode, and Venkat Venkataramani.

In an effort to be more inclusive in our language, we have edited this post to replace the terms “master” and “slave” with “primary” and “secondary.”