Define the Schema
Data Set
We will use the LDBC Social Network Benchmark (LDBC SNB) data set, which models a Twitter-like social forum. It comes with a data generator that produces data at different scale factors: scale factor 1 generates roughly 1 GB of raw data, scale factor 10 generates roughly 10 GB, and so on.
Figure 1 shows the schema (from the LDBC SNB specification). It models the activities and relationships of social forum participants. For example, a forum Member can publish Posts on a Forum, and other Members of the Forum can make a Comment on the Post or on someone else's Comment. A Person's home location is a hierarchy (Continent>Country>City), and a person can be affiliated with a University or a Company. Tags can be used to classify a Forum and a Person's interests. Tags can belong to a TagClass. The relationships between entities are modeled as directed edges. For example, Person connects to Tag by the hasInterest edge. Forum connects to Person by two different edges, hasMember and hasModerator.
The LDBC SNB schema uses inheritance to model certain relationships between entity types:
Message is the superclass of Post and Comment.
Place is the superclass of City, Country, and Continent.
Organization is the superclass of University and Company.
We do not use the superclasses in our graph model. When there is an edge type connecting an entity to a superclass, we instead create an edge type from the entity to each of the subclasses of the superclass. For example, Message has an isLocatedIn relationship to Country. Since Message has two subclasses, Post and Comment, we create two edge types to Country:
Post_IS_LOCATED_IN_Country
Comment_IS_LOCATED_IN_Country
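The expansion rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the actual schema script. The `SUBCLASSES` table and the `expand_edge` helper are hypothetical names introduced here, and the edge name is assumed to be already written in upper case.

```python
# Superclass -> subclasses, as listed in the text above.
# SUBCLASSES and expand_edge are illustrative names, not part of GSQL.
SUBCLASSES = {
    "Message": ["Post", "Comment"],
    "Place": ["City", "Country", "Continent"],
    "Organization": ["University", "Company"],
}


def expand(entity):
    """Return the subclasses of a superclass, or the entity itself."""
    return SUBCLASSES.get(entity, [entity])


def expand_edge(source, edge_name, target):
    """Create one edge type name per (source subclass, target subclass) pair."""
    return [
        f"{s}_{edge_name}_{t}"
        for s in expand(source)
        for t in expand(target)
    ]


print(expand_edge("Message", "IS_LOCATED_IN", "Country"))
# ['Post_IS_LOCATED_IN_Country', 'Comment_IS_LOCATED_IN_Country']
```

An entity that is not a superclass expands to itself, so edges between two ordinary entities produce exactly one edge type.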
Schema Naming Conventions
Vertex Type
For each entity in Figure 1 (the rectangular boxes), we create a vertex type with the entity's name.
Person is a person who participates in a forum.
Forum is a place where persons discuss topics.
City, Country, and Continent are geographic locations of other entities.
Company and University are organizations related to a person's affiliation.
Comment and Post are the interaction messages created by persons in a forum.
Tag is a topic or a concept.
TagClass is a class or category of tags. TagClasses can be organized into a hierarchy.
Edge Type
For each relationship in Figure 1, we create an edge type whose name consists of the source entity name, the edge name (in all capital letters), and the target entity name. The three parts are joined by underscores:
SourceEntityName_EDGENAME_TargetEntityName
For example:
Person_KNOWS_Person: Person is both the source and the target entity name, and knows is the edge name.
Person_LIKES_Comment: Person is the source entity name, Comment is the target entity name, and likes is the edge name.
When the edge name has two or more words, we also separate the words with underscores. For example:
Tag_HAS_TYPE_TagClass: Tag is the source entity name, TagClass is the target entity name, and hasType is the edge name (which is written as HAS_TYPE).
Forum_HAS_MODERATOR_Person: Forum is the source entity name, Person is the target entity name, and hasModerator is the edge name (which is written as HAS_MODERATOR).
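As a rough sketch of the naming convention, the Python helper below builds an edge type name from the three parts. The function name `edge_type_name` is hypothetical, and it assumes the edge name is given in camelCase as in Figure 1 (for example, hasType).

```python
import re


def edge_type_name(source, edge_name, target):
    """Build SourceEntityName_EDGENAME_TargetEntityName.

    Assumes edge_name is camelCase (e.g. "hasModerator"); word
    boundaries are taken to be lower-to-upper case transitions.
    """
    # Insert an underscore between a lowercase letter/digit and the
    # uppercase letter that follows it, then upper-case everything.
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", edge_name)
    return f"{source}_{words.upper()}_{target}"


print(edge_type_name("Person", "knows", "Person"))
print(edge_type_name("Tag", "hasType", "TagClass"))
print(edge_type_name("Forum", "hasModerator", "Person"))
```

The three calls reproduce the names Person_KNOWS_Person, Tag_HAS_TYPE_TagClass, and Forum_HAS_MODERATOR_Person from the examples above.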
GSQL Schema DDL
The GSQL script below can be downloaded from this link.