What Database Does Twitter Use?

Many people would like to know about the databases behind different social media sites, either out of curiosity or to get a sense of where the data gets stored.

Different sites use different types of databases, such as centralized databases, personal databases, NoSQL databases, and more.

In this article, we explain which databases Twitter uses and how it stores the millions of tweets sent every single day.

We will start with a short answer to the question of what database Twitter uses, and then explain each one in detail.

So, without any further intro, let’s get into the main discussion.

What Database Does Twitter Use?

Twitter uses different databases for storing different data. For instance, Hadoop is used for analyzing social graphs, trends, recommendations, ad targeting, ad analytics, and user engagement, as well as for MySQL backups and tweet impression processing.

MySQL and Manhattan, on the other hand, are the primary data stores that keep user data safe.

For caching, Memcached and Redis are the two most important systems Twitter uses. Redis in particular supports several kinds of abstract data structures.

To store social graphs, Twitter has used FlockDB, while MetricsDB stores platform metrics.

To store users’ images, videos, and other large binary objects, Twitter uses BlobStore. This is where your uploaded images and videos are kept.

Functions of The Databases in Detail

The section above gave a brief overview of the databases Twitter uses. Here, the function of each one is described in simple words.

1. Hadoop

Twitter initially used Hadoop to take MySQL backups, but its role has changed over time. It now stores the data used to run analytics on the actions users perform on the platform.

It is also used for analyzing social graphs, trends, recommendations, API analytics, user engagement, and ad targeting and analytics. Moreover, it is used for tweet impression processing, for MySQL and Manhattan backups, and for storing front-end Scribe logs.

Twitter’s Hadoop clusters store more than 500 petabytes of data. They run over 150k services and 130 million containers every single day.

2. Manhattan NoSQL

Manhattan is a distributed database that Twitter uses to serve millions of queries per second with low latency while remaining highly available.

Manhattan stores users’ tweets, direct messages, account details, and more. Manhattan clusters handle over 10 million queries per second.

Manhattan was designed with several goals in mind: reliable performance, availability, extensibility, scalability, and simple operation across clusters of hundreds of nodes.

Manhattan is divided into four layers: interfaces, storage services, storage engines, and the core.

3. BlobStore

BlobStore is a scalable storage system used to store users’ images, videos, and other binary objects. It has enabled Twitter to reduce the cost of storing the pictures users upload with tweets.

BlobStore is a high-performance system that can serve images in the low tens of milliseconds while handling thousands of requests per second.

When an image is uploaded to BlobStore, it is synchronized across Twitter’s data centers with the help of an asynchronous queue server.
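The idea of writing a blob locally and then copying it to other data centers through an asynchronous queue can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Twitter’s implementation: the data center names, the in-memory `stores` dictionary, and the `upload` and `replication_worker` functions are all hypothetical.

```python
import queue
import threading
from typing import Dict, List

# Hypothetical data center names, used only for illustration.
DATA_CENTERS: List[str] = ["dc-a", "dc-b", "dc-c"]

# One in-memory "blob store" per data center (a stand-in for real storage nodes).
stores: Dict[str, Dict[str, bytes]] = {dc: {} for dc in DATA_CENTERS}

# Queue of pending (data center, blob id, data) copies.
replication_queue: queue.Queue = queue.Queue()


def upload(blob_id: str, data: bytes, local_dc: str = "dc-a") -> None:
    """Write the blob locally first, then enqueue copies for the other data centers."""
    stores[local_dc][blob_id] = data
    for dc in DATA_CENTERS:
        if dc != local_dc:
            replication_queue.put((dc, blob_id, data))


def replication_worker() -> None:
    """Drain the queue in the background and copy blobs to the remote stores."""
    while True:
        dc, blob_id, data = replication_queue.get()
        stores[dc][blob_id] = data
        replication_queue.task_done()


threading.Thread(target=replication_worker, daemon=True).start()

upload("img-1", b"fake-image-bytes")
replication_queue.join()  # block until every queued copy has been applied
```

The upload returns as soon as the local write and the enqueue are done, so the caller does not wait for the remote copies. The `join()` call at the end is there only so the example can check the result deterministically.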

4. Memcached & Redis

Memcached and Redis are used as caching layers that improve performance when processing large amounts of data.

Memcached is a distributed in-memory cache system that is designed for ease of use and works well as a cache or session store.

Redis, on the other hand, is an in-memory data structure store that offers a rich set of features. It can serve as a cache, database, queue, or message broker.
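To show how a cache in front of a database improves performance, here is a minimal sketch of the common cache-aside pattern in Python. It uses a small in-process class as a stand-in for Memcached or Redis, so it runs without any server. The names `TTLCache`, `get_with_cache`, and `fake_db_lookup` are hypothetical and chosen for illustration only.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for a cache such as Memcached or Redis."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (time.monotonic() + self._ttl, value)


def get_with_cache(cache: TTLCache, key: str, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache first, fall back to the slower store."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)
    cache.set(key, value)
    return value


# Hypothetical slow backing store that records how often it is actually hit.
db_calls: List[str] = []


def fake_db_lookup(key: str) -> str:
    db_calls.append(key)
    return f"profile-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = get_with_cache(cache, "user:42", fake_db_lookup)   # miss, goes to the "database"
second = get_with_cache(cache, "user:42", fake_db_lookup)  # hit, served from the cache
```

The second read never reaches the backing store, which is the whole point of placing a cache in front of a database.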

5. MySQL

MySQL was Twitter’s first database and served as its primary data store. Twitter has run one of the largest MySQL deployments from the beginning, with clusters of hundreds or thousands of nodes serving millions of queries per second.

MySQL is used in two ways:

  • It serves as the storage node for the distributed data store within Twitter’s sharding framework. The MySQL storage nodes provide performance and reliability for the distributed data as a whole.
  • It powers services including ads, trends, authentication, and other internal services.
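The role of a storage node inside a sharding framework is easier to picture with a small routing example. The Python sketch below uses plain hash-based routing, which is an assumption made for illustration and not a description of Twitter’s actual framework. The shard names and the `shard_for` function are hypothetical.

```python
import hashlib
from typing import List

# Hypothetical shard names; in a real deployment each would map to a MySQL host.
SHARDS: List[str] = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Pick a shard by hashing the key, so the same key always lands on the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Each user id is routed to exactly one storage node.
placements = {user: shard_for(user) for user in ("alice", "bob", "carol")}
```

Simple modulo routing like this moves most keys when the number of shards changes, so production systems typically rely on a lookup table or consistent hashing instead.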

6. MetricsDB

MetricsDB stores metrics at Twitter. The metric ingestion rate is over 5 billion metrics per minute, with more than 25 thousand queries per minute.

Previously, Manhattan was used for storing metrics, but Twitter ran into scaling issues, and Manhattan did not support additional tags on minutely metrics.

MetricsDB uses the compression algorithm from Gorilla, Facebook’s in-memory time-series database.
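Gorilla-style compression is known for storing timestamps as a delta of deltas and values as XORs against the previous value. The Python sketch below shows only the timestamp part, and it works on plain integers instead of packed bits. How MetricsDB applies the algorithm internally is not covered here, and the function names are hypothetical.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Return [first timestamp, first delta, then each change between consecutive deltas]."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode by accumulating deltas back into timestamps."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for change in encoded[1:]:
        delta += change
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Metrics reported roughly every 60 seconds, with one sample arriving a second late.
samples = [1000, 1060, 1120, 1180, 1241, 1301]
encoded = delta_of_delta_encode(samples)   # [1000, 60, 0, 0, 1, -1]
decoded = delta_of_delta_decode(encoded)
```

Because metrics usually arrive at a fixed interval, most encoded entries are zero or very small, and a bit-level encoder can store them in very few bits.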

MetricsDB offers multi-zone support, partitioning of metrics, and better compression efficiency than other data stores at Twitter.

7. FlockDB

FlockDB is a distributed graph data store designed for storing adjacency lists, sustaining a high rate of add, remove, and update operations, paging through result sets with millions of entries, and scaling horizontally. It targets fast operations on these lists rather than multi-hop graph-walking queries.

Twitter has used FlockDB to store its social graph, meaning data such as who follows whom. As far as we know, Twitter may have since stopped using it for the social graph, but we could not confirm this.
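An adjacency list for a follow graph can be sketched in Python as follows. This is a single-process illustration of the data structure only, with no distribution or persistence, and the `FollowGraph` class and its methods are hypothetical names that do not come from FlockDB’s API.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class FollowGraph:
    """In-memory adjacency lists for 'who follows whom'."""

    def __init__(self) -> None:
        self._following: DefaultDict[int, Set[int]] = defaultdict(set)
        self._followers: DefaultDict[int, Set[int]] = defaultdict(set)

    def follow(self, source: int, target: int) -> None:
        # Store the edge in both directions so either side can be listed quickly.
        self._following[source].add(target)
        self._followers[target].add(source)

    def unfollow(self, source: int, target: int) -> None:
        self._following[source].discard(target)
        self._followers[target].discard(source)

    def followers_page(self, user: int, after: int = 0, limit: int = 2) -> List[int]:
        """Return up to `limit` follower ids greater than `after`, in ascending order."""
        ids = sorted(i for i in self._followers.get(user, ()) if i > after)
        return ids[:limit]


graph = FollowGraph()
graph.follow(1, 9)
graph.follow(2, 9)
graph.follow(3, 9)
graph.unfollow(2, 9)

page_one = graph.followers_page(9, after=0, limit=2)   # [1, 3]
page_two = graph.followers_page(9, after=3, limit=2)   # []
```

Paging with an `after` cursor instead of an offset keeps each request cheap even when a user has millions of followers, because the next page starts from the last id already seen.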

The Infrastructure Behind Twitter: Scale

To deliver the best possible experience, Twitter has changed how its hardware is distributed, as shown in the pie chart below.

Final Thought

Hopefully, this has answered the question: what database does Twitter use? Twitter uses multiple databases for different functions, as described above.

MySQL was the first database Twitter used to store data. Later, Twitter added Manhattan, as well as Hadoop to run analytics on the actions users perform on the platform.

We have covered all the databases used at Twitter. If you still have any questions about them, let us know in the comment section below.
