Define the Schema

Data Set

We will use the LDBC Social Network Benchmark (LDBC SNB) data set. This data set models a twitter-like social forum. It comes with a data generator, which allows you to generate data at different scale factors. Scale factor 1 generates roughly 1GB raw data, scale factor 10 generates roughly 10GB raw data, etc.

Figure 1 shows the schema (from the LDBC SNB specification). It models the activities and relationships of social forum participants. For example, a forum Member can publish Posts on a Forum, and other Members of the Forum can make a Comment on the Post or on someone else's Comment. A Person's home location is a hierarchy (Continent>Country>City), and a person can be affiliated with a University or a Company. Tags can be used to classify a Forum and a Person's interests. Tags can belong to a TagClass. The relationships between entities are modeled as directed edges. For example, Person connects to Tag by the hasInterest edge. Forum connects to Person by two different edges, hasMember and hasModerator.

LDBC SNB schema uses inheritance to model certain entity type relationships:

  • Message is the superclass of Post and Comment.

  • Place is the superclass of City, Country, and Continent.

  • Organization is the superclass of University and Company.

We do not use the superclasses in our graph model. When there is an edge type connecting an entity to a superclass, we instead create an edge type from the entity to each of the subclasses of the superclass. For example, Message has an isLocatedIn relationship to Country. Since Message has two subclasses, Post and Comment, we create two edge types to Country:

  • Post_IS_LOCATED_IN_Country

  • Comment_IS_LOCATED_IN_Country

Schema Naming Conventions

Vertex Type

For each entity in Figure 1 (the rectangular boxes), we create a vertex type with the entity's name.

  • Person is a person who participates in a forum.

  • Forum is a place where persons discuss topics.

  • City, Country, and Continent are geographic locations of other entities.

  • Company and University are organizations related to a person's affiliation.

  • Comment and Post are the interaction messages created by persons in a forum.

  • Tag is a topic or a concept.

  • TagClass is a class or a category. TagClass can form a hierarchy of tags.

Edge Type

For each relationship in Figure 1, we create an edge type whose name consists of the source entity name, the edge name (all capitalized), and the target entity name. The three parts are connected by underscores.

  • SourceEntityName_EDGENAME_TargetEntityName

For example,

  • Person_KNOWS_Person: Person is the source and target entity names, and Knows is the edge name.

  • Person_LIKES_Comment: Person is the source entity name, Comment is the target entity name, and Likes is the edge name.

When the edge name has two or more words, we separate words by an underscore as well. For example:

  • Tag_HAS_TYPE_TagClass: Tag is the source entity name, TagClass is the target entity name, and hasType is the edge name (which is written as HAS_TYPE).

  • Forum_HAS_MODERATOR_Person: Forum is the source entity name, Person is the target entity name, and hasModerator is the edge name (which is written as HAS_MODERATOR).

GSQL Schema DDL

The GSQL script below can be downloaded from this link.

GSQL script
//clear the current catalog. 
// It may take a while since it restarts the subsystem services. 
DROP ALL

//vertex types
CREATE VERTEX Comment (PRIMARY_ID id UINT, id UINT, creationDate DATETIME, locationIP STRING, browserUsed STRING, content STRING, length UINT)
CREATE VERTEX Post (PRIMARY_ID id UINT, id UINT, imageFile STRING, creationDate DATETIME, locationIP STRING, browserUsed STRING, lang STRING, content STRING, length UINT) 
CREATE VERTEX Company (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX University (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX City (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX Country (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX Continent (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX Forum (PRIMARY_ID id UINT, id UINT, title STRING, creationDate DATETIME)
CREATE VERTEX Person (PRIMARY_ID id UINT, id UINT, firstName STRING, lastName STRING, gender STRING, birthday DATETIME, creationDate DATETIME, locationIP STRING, browserUsed STRING, speaks set<STRING>, email set<STRING>)
CREATE VERTEX Tag (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX TagClass (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)

//edge types
CREATE DIRECTED EDGE Forum_CONTAINER_OF_Post (FROM Forum, TO Post) WITH REVERSE_EDGE="Forum_CONTAINER_OF_Post_REVERSE"
CREATE DIRECTED EDGE Comment_HAS_CREATOR_Person (FROM Comment, TO Person) WITH REVERSE_EDGE="Comment_HAS_CREATOR_Person_REVERSE"
CREATE DIRECTED EDGE Post_HAS_CREATOR_Person (FROM Post, TO Person) WITH REVERSE_EDGE="Post_HAS_CREATOR_Person_REVERSE"
CREATE DIRECTED EDGE Person_HAS_INTEREST_Tag (FROM Person, TO Tag) WITH REVERSE_EDGE="Person_HAS_INTEREST_Tag_REVERSE"
CREATE DIRECTED EDGE Forum_HAS_MEMBER_Person (FROM Forum, TO Person, joinDate DATETIME) WITH REVERSE_EDGE="Forum_HAS_MEMBER_Person_REVERSE"
CREATE DIRECTED EDGE Forum_HAS_MODERATOR_Person (FROM Forum, TO Person) WITH REVERSE_EDGE="Forum_HAS_MODERATOR_Person_REVERSE"
CREATE DIRECTED EDGE Comment_HAS_TAG_Tag (FROM Comment, TO Tag) WITH REVERSE_EDGE="Comment_HAS_TAG_Tag_REVERSE"
CREATE DIRECTED EDGE Post_HAS_TAG_Tag (FROM Post, TO Tag) WITH REVERSE_EDGE="Post_HAS_TAG_Tag_REVERSE"
CREATE DIRECTED EDGE Forum_HAS_TAG_Tag (FROM Forum, TO Tag) WITH REVERSE_EDGE="Forum_HAS_TAG_Tag_REVERSE"
CREATE DIRECTED EDGE Tag_HAS_TYPE_TagClass (FROM Tag, TO TagClass) WITH REVERSE_EDGE="Tag_HAS_TYPE_TagClass_REVERSE"
CREATE DIRECTED EDGE Company_IS_LOCATED_IN_Country (FROM Company, TO Country) WITH REVERSE_EDGE="Company_IS_LOCATED_IN_Country_REVERSE"
CREATE DIRECTED EDGE Comment_IS_LOCATED_IN_Country (FROM Comment, TO Country) WITH REVERSE_EDGE="Comment_IS_LOCATED_IN_Country_REVERSE"
CREATE DIRECTED EDGE Post_IS_LOCATED_IN_Country (FROM Post, TO Country) WITH REVERSE_EDGE="Post_IS_LOCATED_IN_Country_REVERSE"
CREATE DIRECTED EDGE Person_IS_LOCATED_IN_City (FROM Person, TO City) WITH REVERSE_EDGE="Person_IS_LOCATED_IN_City_REVERSE"
CREATE DIRECTED EDGE University_IS_LOCATED_IN_City (FROM University, TO City) WITH REVERSE_EDGE="University_IS_LOCATED_IN_City_REVERSE"
CREATE DIRECTED EDGE City_IS_PART_OF_Country (FROM City, TO Country) WITH REVERSE_EDGE="City_IS_PART_OF_Country_REVERSE"
CREATE DIRECTED EDGE Country_IS_PART_OF_Continent (FROM Country, TO Continent) WITH REVERSE_EDGE="Country_IS_PART_OF_Continent_REVERSE"
CREATE DIRECTED EDGE TagClass_IS_SUBCLASS_OF_TagClass (FROM TagClass, TO TagClass) WITH REVERSE_EDGE="TagClass_IS_SUBCLASS_OF_TagClass_REVERSE"
CREATE DIRECTED EDGE Person_KNOWS_Person (FROM Person, TO Person, creationDate DATETIME) WITH REVERSE_EDGE="Person_KNOWS_Person_REVERSE"
CREATE DIRECTED EDGE Person_LIKES_Comment (FROM Person, TO Comment, creationDate DATETIME) WITH REVERSE_EDGE="Person_LIKES_Comment_REVERSE"
CREATE DIRECTED EDGE Person_LIKES_Post (FROM Person, TO Post, creationDate DATETIME) WITH REVERSE_EDGE="Person_LIKES_Post_REVERSE"
CREATE DIRECTED EDGE Comment_REPLY_OF_Comment (FROM Comment, TO Comment) WITH REVERSE_EDGE="Comment_REPLY_OF_Comment_REVERSE"
CREATE DIRECTED EDGE Comment_REPLY_OF_Post (FROM Comment, TO Post) WITH REVERSE_EDGE="Comment_REPLY_OF_Post_REVERSE"
CREATE DIRECTED EDGE Person_STUDY_AT_University (FROM Person, TO University, classYear INT) WITH REVERSE_EDGE="Person_STUDY_AT_University_REVERSE"
CREATE DIRECTED EDGE Person_WORK_AT_Company (FROM Person, TO Company, workFrom INT) WITH REVERSE_EDGE="Person_WORK_AT_Company_REVERSE"

//LDBC SNB graph schema 
CREATE GRAPH ldbc_snb (*)

Last updated