Define the Schema

Data Set

We will use the LDBC Social Network Benchmark (LDBC SNB) data set. This data set models a twitter-like social forum. It comes with a data generator, which allows you to generate data at different scale factors. Scale factor 1 generates roughly 1GB raw data, scale factor 10 generates roughly 10GB raw data, etc.

Figure 1 shows the schema (from the LDBC SNB specification). It models the activities and relationships of social forum participants. For example, a forum Member can publish Posts on a Forum, and other Members of the Forum can make a Comment on the Post or on someone else's Comment. A Person's home location is a hierarchy (Continent>Country>City), and a person can be affiliated with a University or a Company. Tags can be used to classify a Forum and a Person's interests. Tags can belong to a TagClass. The relationships between entities are modeled as directed edges. For example, Person connects to Tag by the hasInterest edge. Forum connects to Person by two different edges, hasMember and hasModerator.

LDBC SNB schema uses inheritance to model certain entity type relationships:

  • Message is the superclass of Post and Comment.

  • Place is the superclass of City, Country, and Continent.

  • Organization is the superclass of University and Company.

We do not use the superclasses in our graph model. When there is an edge type connecting an entity to a superclass, we instead create an edge type from the entity to each of the subclasses of the superclass. For example, Message has an isLocatedIn relationship to Country. Since Message has two subclasses, Post and Comment, we create two edge types to Country:

  • Post_IS_LOCATED_IN_Country

  • Comment_IS_LOCATED_IN_Country

Schema Naming Conventions

Vertex Type

For each entity in Figure 1 (the rectangular boxes), we create a vertex type with the entity's name.

  • Person is a person who participates in a forum.

  • Forum is a place where persons discuss topics.

  • City, Country, and Continent are geographic locations of other entities.

  • Company and University are organizations related to a person's affiliation.

  • Comment and Post are the interaction messages created by persons in a forum.

  • Tag is a topic or a concept.

  • TagClass is a class or a category. TagClass can form a hierarchy of tags.

Edge Type

For each relationship in Figure 1, we create an edge type whose name consists of the source entity name, the edge name (all capitalized), and the target entity name. The three parts are connected by underscores.

  • SourceEntityName_EDGENAME_TargetEntityName

For example,

  • Person_KNOWS_Person: Person is the source and target entity names, and Knows is the edge name.

  • Person_LIKES_Comment: Person is the source entity name, Comment is the target entity name, and Likes is the edge name.

When the edge name has two or more words, we separate words by an underscore as well. For example:

  • Tag_HAS_TYPE_TagClass: Tag is the source entity name, TagClass is the target entity name, and hasType is the edge name (which is written as HAS_TYPE).

  • Forum_HAS_MODERATOR_Person: Forum is the source entity name, Person is the target entity name, and hasModerator is the edge name (which is written as HAS_MODERATOR).

GSQL Schema DDL

Two GSQL scripts for defining the LDBC-SNB schema are shown below. Choose the one that serve your needs. They are not equivalent if you have different organizations using the same graph database instance.

Method 1. Bottom-up DDL

In this method, global vertex/edge type containers are created first. Next, graphs are created to group them. In other words, the global vertex/edge type containers can be shared across graphs.

GSQL script
//clear the current catalog. 
// It may take a while since it restarts the subsystem services. 
DROP ALL

//vertex types
CREATE VERTEX Comment (PRIMARY_ID id UINT, id UINT, creationDate DATETIME, locationIP STRING, browserUsed STRING, content STRING, length UINT)
CREATE VERTEX Post (PRIMARY_ID id UINT, id UINT, imageFile STRING, creationDate DATETIME, locationIP STRING, browserUsed STRING, lang STRING, content STRING, length UINT) 
CREATE VERTEX Company (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX University (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX City (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX Country (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX Continent (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX Forum (PRIMARY_ID id UINT, id UINT, title STRING, creationDate DATETIME)
CREATE VERTEX Person (PRIMARY_ID id UINT, id UINT, firstName STRING, lastName STRING, gender STRING, birthday DATETIME, creationDate DATETIME, locationIP STRING, browserUsed STRING, speaks set<STRING>, email set<STRING>)
CREATE VERTEX Tag (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)
CREATE VERTEX TagClass (PRIMARY_ID id UINT, id UINT, name STRING, url STRING)

//edge types
CREATE DIRECTED EDGE Forum_CONTAINER_OF_Post (FROM Forum, TO Post) WITH REVERSE_EDGE="Forum_CONTAINER_OF_Post_REVERSE"
CREATE DIRECTED EDGE Comment_HAS_CREATOR_Person (FROM Comment, TO Person) WITH REVERSE_EDGE="Comment_HAS_CREATOR_Person_REVERSE"
CREATE DIRECTED EDGE Post_HAS_CREATOR_Person (FROM Post, TO Person) WITH REVERSE_EDGE="Post_HAS_CREATOR_Person_REVERSE"
CREATE DIRECTED EDGE Person_HAS_INTEREST_Tag (FROM Person, TO Tag) WITH REVERSE_EDGE="Person_HAS_INTEREST_Tag_REVERSE"
CREATE DIRECTED EDGE Forum_HAS_MEMBER_Person (FROM Forum, TO Person, joinDate DATETIME) WITH REVERSE_EDGE="Forum_HAS_MEMBER_Person_REVERSE"
CREATE DIRECTED EDGE Forum_HAS_MODERATOR_Person (FROM Forum, TO Person) WITH REVERSE_EDGE="Forum_HAS_MODERATOR_Person_REVERSE"
CREATE DIRECTED EDGE Comment_HAS_TAG_Tag (FROM Comment, TO Tag) WITH REVERSE_EDGE="Comment_HAS_TAG_Tag_REVERSE"
CREATE DIRECTED EDGE Post_HAS_TAG_Tag (FROM Post, TO Tag) WITH REVERSE_EDGE="Post_HAS_TAG_Tag_REVERSE"
CREATE DIRECTED EDGE Forum_HAS_TAG_Tag (FROM Forum, TO Tag) WITH REVERSE_EDGE="Forum_HAS_TAG_Tag_REVERSE"
CREATE DIRECTED EDGE Tag_HAS_TYPE_TagClass (FROM Tag, TO TagClass) WITH REVERSE_EDGE="Tag_HAS_TYPE_TagClass_REVERSE"
CREATE DIRECTED EDGE Company_IS_LOCATED_IN_Country (FROM Company, TO Country) WITH REVERSE_EDGE="Company_IS_LOCATED_IN_Country_REVERSE"
CREATE DIRECTED EDGE Comment_IS_LOCATED_IN_Country (FROM Comment, TO Country) WITH REVERSE_EDGE="Comment_IS_LOCATED_IN_Country_REVERSE"
CREATE DIRECTED EDGE Post_IS_LOCATED_IN_Country (FROM Post, TO Country) WITH REVERSE_EDGE="Post_IS_LOCATED_IN_Country_REVERSE"
CREATE DIRECTED EDGE Person_IS_LOCATED_IN_City (FROM Person, TO City) WITH REVERSE_EDGE="Person_IS_LOCATED_IN_City_REVERSE"
CREATE DIRECTED EDGE University_IS_LOCATED_IN_City (FROM University, TO City) WITH REVERSE_EDGE="University_IS_LOCATED_IN_City_REVERSE"
CREATE DIRECTED EDGE City_IS_PART_OF_Country (FROM City, TO Country) WITH REVERSE_EDGE="City_IS_PART_OF_Country_REVERSE"
CREATE DIRECTED EDGE Country_IS_PART_OF_Continent (FROM Country, TO Continent) WITH REVERSE_EDGE="Country_IS_PART_OF_Continent_REVERSE"
CREATE DIRECTED EDGE TagClass_IS_SUBCLASS_OF_TagClass (FROM TagClass, TO TagClass) WITH REVERSE_EDGE="TagClass_IS_SUBCLASS_OF_TagClass_REVERSE"
CREATE DIRECTED EDGE Person_KNOWS_Person (FROM Person, TO Person, creationDate DATETIME) WITH REVERSE_EDGE="Person_KNOWS_Person_REVERSE"
CREATE DIRECTED EDGE Person_LIKES_Comment (FROM Person, TO Comment, creationDate DATETIME) WITH REVERSE_EDGE="Person_LIKES_Comment_REVERSE"
CREATE DIRECTED EDGE Person_LIKES_Post (FROM Person, TO Post, creationDate DATETIME) WITH REVERSE_EDGE="Person_LIKES_Post_REVERSE"
CREATE DIRECTED EDGE Comment_REPLY_OF_Comment (FROM Comment, TO Comment) WITH REVERSE_EDGE="Comment_REPLY_OF_Comment_REVERSE"
CREATE DIRECTED EDGE Comment_REPLY_OF_Post (FROM Comment, TO Post) WITH REVERSE_EDGE="Comment_REPLY_OF_Post_REVERSE"
CREATE DIRECTED EDGE Person_STUDY_AT_University (FROM Person, TO University, classYear INT) WITH REVERSE_EDGE="Person_STUDY_AT_University_REVERSE"
CREATE DIRECTED EDGE Person_WORK_AT_Company (FROM Person, TO Company, workFrom INT) WITH REVERSE_EDGE="Person_WORK_AT_Company_REVERSE"

//LDBC SNB graph schema 
CREATE GRAPH ldbc_snb (*)

Method 2. Top-down DDL

In this method, an empty graph is created first. Next, local vertex/edge type containers are added to the empty graph via a schema change job. The vertex/egde type containers added this way will be private to the graph, no other graph can see them.

GSQL script
//clear the current catalog.
// It may take a while since it restarts the subsystem services.
DROP ALL

# 1. Create graph
CREATE GRAPH ldbc_snb ()

# 2. Create schema_change job to include all vertex/edge types
CREATE SCHEMA_CHANGE JOB change_schema_of_ldbc  FOR GRAPH ldbc_snb {

  ## Post and Comment
  ADD VERTEX Comment (PRIMARY_ID id UINT, creationDate DATETIME, locationIP STRING,
    browserUsed STRING, content STRING, length UINT) WITH primary_id_as_attribute="TRUE";

  ADD VERTEX Post (PRIMARY_ID id UINT, imageFile STRING, creationDate DATETIME,
    locationIP STRING, browserUsed STRING, lang STRING, content STRING,
    length UINT) WITH primary_id_as_attribute="TRUE";
  ## organisation
  ADD VERTEX Company (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX University (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ## place
  ADD VERTEX City (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX Country (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX Continent (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ## etc
  ADD  VERTEX Forum (PRIMARY_ID id UINT, title STRING, creationDate DATETIME) WITH primary_id_as_attribute="TRUE";
  ADD  VERTEX Person (PRIMARY_ID id UINT, firstName STRING, lastName STRING, gender STRING, birthday DATETIME,
   creationDate DATETIME, locationIP STRING, browserUsed STRING, speaks set<STRING>, email set<STRING>)
   WITH primary_id_as_attribute="TRUE";
  ADD VERTEX Tag (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";
  ADD VERTEX TagClass (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE";

  // create edge types
  ADD DIRECTED EDGE CONTAINER_OF (FROM Forum, TO Post) WITH REVERSE_EDGE="CONTAINER_OF_REVERSE";
  ADD  DIRECTED EDGE HAS_CREATOR (FROM Comment|Post, TO Person) WITH REVERSE_EDGE="HAS_CREATOR_REVERSE";
  ADD  DIRECTED EDGE HAS_INTEREST (FROM Person, TO Tag) WITH REVERSE_EDGE="HAS_INTEREST_REVERSE";
  ADD DIRECTED EDGE HAS_MEMBER (FROM Forum, TO Person, joinDate DATETIME) WITH REVERSE_EDGE="HAS_MEMBER_REVERSE";
  ADD DIRECTED EDGE HAS_MODERATOR (FROM Forum, TO Person) WITH REVERSE_EDGE="HAS_MODERATOR_REVERSE";
  ADD DIRECTED EDGE HAS_TAG (FROM Comment|Post|Forum, TO Tag) WITH REVERSE_EDGE="HAS_TAG_REVERSE";
  ADD DIRECTED EDGE HAS_TYPE (FROM Tag, TO TagClass) WITH REVERSE_EDGE="HAS_TYPE_REVERSE";
  ADD  DIRECTED EDGE IS_LOCATED_IN (FROM Comment, TO Country
                                  | FROM Company, TO Country
                                  | FROM Person, TO City
                                  | FROM University, TO City) WITH REVERSE_EDGE="IS_LOCATED_IN_REVERSE";
  ADD DIRECTED EDGE IS_PART_OF (FROM City, TO Country
                               | FROM Country, TO Continent) WITH REVERSE_EDGE="IS_PART_OF_REVERSE";
  ADD DIRECTED EDGE IS_SUBCLASS_OF (FROM TagClass, TO TagClass) WITH REVERSE_EDGE="IS_SUBCLASS_OF_REVERSE";
  ADD UNDIRECTED EDGE KNOWS (FROM Person, TO Person, creationDate DATETIME)
  ADD DIRECTED EDGE LIKES (FROM Person, TO Comment|Post, creationDate DATETIME) WITH REVERSE_EDGE="LIKES_REVERSE";
  ADD DIRECTED EDGE REPLY_OF (FROM Comment, TO Comment|Post) WITH REVERSE_EDGE="REPLY_OF_REVERSE";
  ADD DIRECTED EDGE STUDY_AT (FROM Person, TO University, classYear INT) WITH REVERSE_EDGE="STUDY_AT_REVERSE";
  ADD DIRECTED EDGE WORK_AT (FROM Person, TO Company, workFrom INT) WITH REVERSE_EDGE="WORK_AT_REVERSE";
}

# 3. Run schema_change job
RUN SCHEMA_CHANGE JOB change_schema_of_ldbc

# 4. Drop schema_change job
DROP JOB change_schema_of_ldbc

Last updated