Programing

페이스 북 데이터베이스 디자인?

lottogame 2020. 7. 2. 07:47
반응형

페이스 북 데이터베이스 디자인?


나는 Facebook이 어떻게 친구 <-> 사용자 관계를 어떻게 설계했는지 궁금했습니다.

사용자 테이블이 다음과 같다고 생각합니다.

user_email PK
user_id PK
password 

나는 사용자의 데이터 (성별, 연령 등이 사용자 이메일로 연결되어 있음)로 테이블을 계산합니다.

모든 친구를이 사용자와 어떻게 연결합니까?

이 같은?

user_id
friend_id_1
friend_id_2
friend_id_3
friend_id_N 

아마 아닙니다. 사용자 수는 알 수 없으며 확장되기 때문입니다.


UserID를 보유한 친구 테이블과 그 다음 친구의 UserID를 유지하십시오 (이를 FriendID라고 함). 두 열 모두 사용자 테이블에 대한 외래 키입니다.

다소 유용한 예 :

Table Name: User
Columns:
    UserID PK
    EmailAddress
    Password
    Gender
    DOB
    Location

TableName: Friends
Columns:
    UserID PK FK
    FriendID PK FK
    (This table features a composite primary key made up of the two foreign 
     keys, both pointing back to the user table. One ID will point to the
     logged in user, the other ID will point to the individual friend
     of that user)

사용법 예 :

Table User
--------------
UserID EmailAddress Password Gender DOB      Location
------------------------------------------------------
1      bob@bob.com  bobbie   M      1/1/2009 New York City
2      jon@jon.com  jonathan M      2/2/2008 Los Angeles
3      joe@joe.com  joseph   M      1/2/2007 Pittsburgh

Table Friends
---------------
UserID FriendID
----------------
1      2
1      3
2      3

이것은 Bob이 Jon과 Joe의 친구이고 Jon이 Joe 와도 친구임을 나타냅니다. 이 예에서는 우정이 항상 두 가지 방법이라고 가정하므로 (2,1) 또는 (3,2)와 같이 테이블에 이미 다른 방향으로 표시되어 있으므로 행이 필요하지 않습니다. 우정 또는 기타 관계가 명시 적으로 양방향이 아닌 예의 경우 양방향 관계를 나타 내기 위해 해당 행도 있어야합니다.


Anatoly Lubarsky가 리버스 엔지니어링 한 다음 데이터베이스 스키마를 살펴보십시오 .

페이스 북 스키마


TL; DR :

이들은 스택의 MySQL 맨 위에있는 모든 것에 대해 캐시 된 그래프가있는 스택 아키텍처를 사용합니다.

긴 답변 :

I did some research on this myself because I was curious how they handle their huge amount of data and search it in a quick way. I've seen people complaining about custom made social network scripts becoming slow when the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections - not even trying to bother about group permissions and likes and wall posts - it quickly turned out that this approach is flawed. So I've spent some time searching the web on how to do it better and came across this official Facebook article:

I really recommend you to watch the presentation of the first link above before continue reading. It's probably the best explanation of how FB works behind the scenes you can find.

The video and article tells you a few things:

  • They're using MySQL at the very bottom of their stack
  • Above the SQL DB there is the TAO layer which contains at least two levels of caching and is using graphs to describe the connections.
  • I could not find anything on what software / DB they actually use for their cached graphs

Let's take a look at this, friend connections are top left:

여기에 이미지 설명을 입력하십시오

Well, this is a graph. :) It doesn't tell you how to build it in SQL, there are several ways to do it but this site has a good amount of different approaches. Attention: Consider that a relational DB is what it is: It's thought to store normalised data, not a graph structure. So it won't perform as good as a specialised graph database.

Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.

I can't tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.

Here is my disappointing test for just findings friends of friends:

DB Schema:

CREATE TABLE IF NOT EXISTS `friends` (
`id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `friend_id` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Friends of Friends Query:

(
        select friend_id
        from friends
        where user_id = 1
    ) union (
        select distinct ff.friend_id
        from
            friends f
            join friends ff on ff.user_id = f.friend_id
        where f.user_id = 1
    )

I really recommend you to create you some sample data with at least 10k user records and each of them having at least 250 friend connections and then run this query. On my machine (i7 4770k, SSD, 16gb RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized, I'm not a DB genius (suggestions are welcome). However, if this scales linear you're already at 1.8 seconds for just 100k users, 18 seconds for 1 million users.

This might still sound OKish for ~100k users but consider that you just fetched friends of friends and didn't do any more complex query like "display me only posts from friends of friends + do the permission check if I'm allowed or NOT allowed to see some of them + do a sub query to check if I liked any of them". You want to let the DB do the check on if you liked a post already or not or you'll have to do in code. Also consider that this is not the only query you run and that your have more than active user at the same time on a more or less popular site.

I think my answer answers the question how Facebook designed their friends relationship very well but I'm sorry that I can't tell you how to implement it in a way it will work fast. Implementing a social network is easy but making sure it performs well is clearly not - IMHO.

I've started experimenting with OrientDB to do the graph-queries and mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.


My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.

Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.


It's most likely a many to many relationship:

FriendList (table)

user_id -> users.user_id
friend_id -> users.user_id
friendVisibilityLevel

EDIT

The user table probably doesn't have user_email as a PK, possibly as a unique key though.

users (table)

user_id PK
user_email
password

Take a look at these articles describing how LinkedIn and Digg are built:

There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:

http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html

Also, there's this article that talks about non-relational databases and how they're used by some companies:

http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher level concepts than most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

There are a lot of links on the first two articles that should give you some more insight.

UPDATE 10/20/2014

Murat Demirbas wrote a summary on

  • TAO: Facebook's distributed data store for the social graph (ATC'13)
  • F4: Facebook's warm BLOB storage system (OSDI'14)

http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html

HTH


It's not possible to retrieve data from RDBMS for user friends data for data which cross more than half a billion at a constant time so Facebook implemented this using a hash database (no SQL) and they opensourced the database called Cassandra.

So every user has its own key and the friends details in a queue; to know how cassandra works look at this:

http://prasath.posterous.com/cassandra-55


This recent June 2013 post goes into some detail into explaining the transition from relationship databases to objects with associations for some data types.

https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920

There's a longer paper available at https://www.usenix.org/conference/atc13/tao-facebook’s-distributed-data-store-social-graph


You're looking for foreign keys. Basically you can't have an array in a database unless it has it's own table.


Example schema:

    Users Table
        userID PK
        other data
    Friends Table
        userID   -- FK to users's table representing the user that has a friend.
        friendID -- FK to Users' table representing the user id of the friend

Its a type of graph database: http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html

Its not related to Relational databases.

Google for graph databases.


Keep in mind that database tables are designed to grow vertically (more rows), not horizontally (more columns)


Regarding the performance of a many-to-many table, if you have 2 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300GB.

Obviously, you would need some partitioning and indexing and you're not going to keep that in memory for all users.


Probably there is a table, which stores the friend <-> user relation, say "frnd_list", having fields 'user_id','frnd_id'.

Whenever a user adds another user as a friend, two new rows are created.

For instance, suppose my id is 'deep9c' and I add a user having id 'akash3b' as my friend, then two new rows are created in table "frnd_list" with values ('deep9c','akash3b') and ('akash3b','deep9c').

이제 특정 사용자에게 친구 목록을 표시 할 때 간단한 sql은 다음과 같이 수행합니다. "frnd_list에서 frnd_id 선택.

참고 URL : https://stackoverflow.com/questions/1009025/facebook-database-design

반응형