Defining a Graph Schema

Before data can be loaded into the graph store, the user must define a graph schema. A graph schema is a "dictionary" that defines the types of entities, vertices and edges , in the graph and how those types of entities are related to one another. Each vertex or edge type has a name and a set of attributes (properties) associated with it. For example, a Book vertex could have title, author, publication year, genre, and language attributes.

In the figure below, circles represent vertex types, and lines represent edge types. The labeling text shows the name of each type. This example has four types of vertices: User, Occupation, Book, and Genre . Also, the example has 3 types of edges: user_occupation, user_book_rating, and book_genre . Note that this diagram does not say anything about how many users or books are in the graph database. It also does not indicate the cardinality of the relationship. For example, it does not specify whether a User may connect to multiple occupations.

An edge connects two vertices; in TigerGraph terminology these two vertices are the source vertex and the target vertex . An edge type can be either directed or undirected . A directed edge has a clear semantic direction, from the source vertex to the target vertex. For example, if there is an edge type that represents a plane flight segment, each segment needs to distinguish which airport is the origin (source vertex) and which airport is the destination (target vertex). In the example schema below, all of the edges are undirected. A useful test to decide whether an edge should be directed or undirected is the following: "An edge type is directed if knowing there is a relationship from A to B does not tell me whether there is a relationship from B to A." Having nonstop service from Chicago to Shanghai does not automatically imply there is nonstop service from Shanghai to Chicago.

Figure 1 - A schema for a User-Book-Rating graph

An expanded schema is shown below, containing all the original vertex and edge types plus three additional edge types: friend_of, sequel_of, and user_book_read . Note that friend_of joins a User to a User. The friendship is assumed to be bidirectional, so the edge type is undirected. Sequel_of joins a Book to a Book but it is directed, as evidenced by the arrowhead. The Two Towers is the sequel of The Fellowship of the Ring , but the reverse is not true. User_book_read is added to illustrate that there may be more than one edge type between a pair of vertex types.

Figure 2 - Expanded-User-Book-Rating schema with additional edges

The TigerGraph system user designs a graph schema to fit the source data and the user's needs and interests. The TigerGraph system user should consider what type of relationships are of interest and what type of analysis is needed. The TigerGraph system lets the user modify an existing schema, so the user is not locked into the initial design decision.

In the first schema diagram above, there are seven entities: four vertex types and three edge types.You may wonder why it was decided to make Occupation a separate vertex type instead of an attribute of User. Likewise, why is Genre a vertex type instead of an attribute of Book? These are examples of design choices. Occupation and Genre were separated out as vertex types because in graph analysis, if an attribute will be used as a query variable, it is often easier to work with as a vertex type.

Once the graph designer has chosen a graph schema, the schema is ready to be formalized into a series of GSQL statements.

Graph Creation and Modification Privileges

Only superusers and globaldesigners can define global vertex types. global edge types, and graphs, using CREATE VERTEX / EDGE / GRAPH. However, once a graph has been created, its admin and designer users can customize its schema, including adding new local vertex types and local edge types, by using a SCHEMA_CHANGE JOB, described in the next section.

CREATE VERTEX

Available to superuser and globaldesigner roles only.

The CREATE VERTEX statement defines a new global vertex type, with a name and an attribute list. At a high level of abstraction, the format is

CREATE VERTEX vertex_type_name (id_and_attribute_list) [ vertex_options ]

More specifically, the syntax is as follows, assuming that the vertex ID is listed first:

CREATE VERTEX Syntax
CREATE VERTEX vertex_type_name "(" primary_id_name_type
["," attribute_name type [DEFAULT default_value] ]* ")"
[WITH [STATS="none"|"outdegree_by_edgetype"][primary_id_as_attribute="true"]]

Keys and Attributes

Beginning with v2.3, there are two syntaxes for specifying the primary id/key:

Legacy PRIMARY_ID syntax: The legacy syntax remains valid, but there are additional options and additional flexibility:

  1. PRIMARY_ID and WITH primary_id_as attribute

  2. PRIMARY KEY syntax. This syntax is modeled after SQL.

PRIMARY_ID and WITH primary_id_as_attribute

The primary_id is a required field whose purpose is to uniquely identify each vertex instance. GSQL creates a hash index on the primary id with O(1) time complexity. Its data type may be STRING, INT, or UINT. The syntax for the primary_id_name_type term is as follows:

primary_id_name_type := PRIMARY_ID id_name id_type

NOTE: In default mode, the primary_id field is not one of the attribute fields. The purpose of this distinction is to minimize storage space for vertices. The functional consequence of this difference is that a query cannot read the primary_id or use it as part of an expression.

Example:

CREATE VERTEX movie (PRIMARY_ID id UINT, name STRING, year UINT)
WITH primary_id_as_attribute="true"

PRIMARY KEY

Instead of the legacy PRIMARY_ID syntax, starting with v2.3, GSQL now offers another option for specifying the primary key. The keyword phrase PRIMARY KEY may be appended to any one of the attributes in the attribute list, though it is conventional for it to be the first attribute. Each vertex instance must have a unique value for the primary key attribute. GSQL creates a hash index on the PRIMARY KEY attribute with O(1) time complexity. It is recommended that the primary key data type be STRING, INT, or UINT.

primary_id_name_type := id_name_id_type PRIMARY KEY

Note the differences between PRIMARY_ID and PRIMARY KEY:

  1. "PRIMARY_ID" precedes the (name, type) pair. "PRIMARY KEY" follows the (name, type) pair.

  2. In default mode, a PRIMARY_ID is not an attribute, but the WITH primary_id_as_attribute="true" clause can be used to make it an attribute. Alternately, the PRIMARY KEY is always an attribute; the WITH option is unneeded.

Example:

CREATE VERTEX movie (id UINT PRIMARY KEY, name STRING, year UINT)

PRIMARY KEY is not supported in GraphStudio. If you decide to use this feature, you will only be able to use command line interface.

COMPOSITE KEY

Beginning with v2.4, GSQL PRIMARY KEY supports composite keys - grouping multiple attributes to create a primary key for a specific vertex. Composite Key usage is similar to a single PRIMARY KEY, but rather than appending "PRIMARY KEY" after an attribute, the syntax is a bit different.

composite_id_name_type := PRIMARY KEY "(" attribute_name ("," attribute_name)* ")"

Example:

CREATE VERTEX movie (id UINT, title STRING, year UINT, PRIMARY KEY (title,year,id) )

COMPOSITE KEY is not supported in GraphStudio. If you decide to use this feature, you will only be able to use command line interface.

Vertex Attribute List

The attribute list, enclosed in parentheses, is a list of one or more id definitions and attribute descriptions separated by commas:

primary_id_name_type
[, attribute_name type [DEFAULT default_value ] ]*

The available attribute types, including user-defined types, are listed in the section Attribute Data Types.

  1. Every attribute data type has a built-in default value (e.g., the default value for INT type is 0). The DEFAULT default_value option overrides the built-in value.

  2. Any number of additional attributes may be listed after the primary_id attribute. Each attribute has a name, type, and optional default value (for primitive-type, DATETIME, or STRING COMPRESS attributes only)

Example:

  • Create vertex types for the graph schema of Figure 1.

Vertex definitions for User-Book-Rating graph
CREATE VERTEX User (PRIMARY_ID user_id UINT, name STRING, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX Occupation (PRIMARY_ID occ_id UINT, occ_name STRING)
WITH STATS="outdegree_by_edgetype"
CREATE VERTEX Book (PRIMARY_ID bookcode UINT, title STRING, pub_year UINT)
WITH STATS="none"
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)

Unlike the tables in a relational database, vertex types do not need to have a foreign key attribute for one vertex type to have a relationship to another vertex type. Such relationships are handled by edge types.

WITH STATS

By default, when the loader stores a vertex and its attributes in the graph store, it also stores some statistics about the vertex's outdegree – how many connections it has to other vertices. The optional WITH STATS clause lets the user control how much information is recorded. Recording the information in the graph store will speed up queries which need degree information, but it increases the memory usage. There are two* options. If "outdegree_by_edgetype" is chosen, then each vertex records a list of degree count values, one value for each type of edge in the schema. If "none" is chosen, then no degree statistics are recorded with each vertex. If the WITH STATS clause is not used, the loader acts as if "outdegree_by_edgetype" were selected.

The graph below has two types of edges between persons: phone_call and text. For Bobby, the "outdegree_by_edgetype" option records how many phone calls Bobby made (1) and how many text messages Bobby sent (2). This information can be retrieved using the built-in vertex function outdegree(). To get the outdegree of a specific edge type, provide the edgetype name as a string parameter. To get the total outdegree, omit the parameter.

Figure 3 - Outdegree stats illustration

WITH STATS option (case insensitive)

Bobby.outdegree()

Bobby.outdegree("text")

Bobby.outdegree("phone_call")

"none"

not available

not available

not available

"outdegree_by_edgetype" (default)

3

2

1

CREATE EDGE

Available to superuser and globaldesigner roles only.

The CREATE EDGE statement defines a new global edge type. There are two forms of the CREATE EDGE statement, one for directed edges and one for undirected edges. Each edge type must specify that it connects FROM one vertex type TO another vertex type. Additional pairs of FROM,To vertex types may be added. Then additional attributes may be added. Each attribute follows the same requirements as described in the Attribute List subsection for the "CREATE VERTEX" section.

CREATE UNDIRECTED EDGE
CREATE UNDIRECTED EDGE edge_type_name "("
FROM vertex_type_name "," TO vertex_type_name
["|" FROM vertex_type_name, TO vertex_type_name]*
["," attribute_name type [DEFAULT default_value]]* ")"
CREATE DIRECTED EDGE
CREATE DIRECTED EDGE edge_type_name "("
FROM vertex_type_name "," TO vertex_type_name
["|" FROM vertex_type_name, TO vertex_type_name]*
["," attribute_name type [DEFAULT default_value]]* ")"
[WITH REVERSE_EDGE="rev_name"]

As of v3.0, a single edge type can be defined between multiple pairs of vertex types, e.g.

CREATE DIRECTED EDGE member_of (FROM Person, TO Org | FROM Org, TO Org,
joined DATETIME)

Note that edges do not have a PRIMARY_ID field. Instead, each edge is uniquely identified by a FROM vertex, a TO vertex, and optionally other attributes. The edge type may also be a distinguishing characteristic. For example, as shown in Figure 2 above, there are two types of edges between User and Book. Therefore, both types would have attribute lists which begin (FROM User, To Book,...).

Creating an Edge from or to Any Vertex Type

An edge type can be defined which connects FROM and/or TO any of the currently defined types of vertices. Use the wildcard symbol * to indicate "any vertex type". For example, the any_edge type below can connect from any vertex to any other vertex:

Wildcard edge type
CREATE DIRECTED EDGE any_edge (FROM *, TO *, label STRING)

Note: If new vertex types are added after a wildcard edge type is defined, the new vertex types are NOT included in the wildcard. That is, "*" is an alias for the vertex types that existed at the point in time that the CREATE EDGE statement is executed.

WITH REVERSE_EDGE

If a CREATE DIRECTED EDGE statement includes the WITH REVERSE_EDGE=" rev_name " optional clause, then an additional directed edge type called " rev_name " is automatically created, with the FROM and TO vertices swapped. Moreover, whenever a new edge is created, a reverse edge is also created. The reverse edge will have the same attributes, and whenever the principal edge is updated, the corresponding reverse edge is also updated.

In a TigerGraph system, reverse edges provide the most efficient way to perform graph queries and searches that need to look "backwards". For example, referring to the schema of Figure 2, the query "What is the sequel of Book X, if it has one?" is a forward search, usingsequel_of edges. However, the query "Is Book X a sequel? If so, what Book came before X?" requires examining reverse edges.

Example:

Create undirected edges for the three edge types in Figure 1.

Edge definitions for User-Book-Rating graph
CREATE UNDIRECTED EDGE user_occupation (FROM User, TO Occupation)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE UNDIRECTED EDGE user_book_rating (FROM User, TO Book, rating UINT, date_time UINT)

The user_occupation and book_genre edges have no attributes. A user_book_rating edge symbolizes that a user has assigned a rating to a book. Therefore it includes an additional attribute rating . In this case the rating attribute is defined to be an integer, but it could just as easily have been set to be a float attribute.

Example :

Create the additional edges depicted in Figure 2.

Additional Edge definitions for Expanded-User-Book-Rating graph
CREATE UNDIRECTED EDGE friend_of (FROM User, TO User, on_date UINT)
CREATE UNDIRECTED EDGE user_book_read (FROM User, To Book, on_date UINT)
CREATE DIRECTED EDGE sequel_of (FROM Book, TO Book) WITH REVERSE_EDGE="preceded_by"

Every time the GSQL loader creates a sequel_of edge, it will also automatically create a preceded_by edge, pointing in the opposite direction.

Special Options

Sharing a Compression Dictionary

The STRING COMPRESS data type achieve compression by mapping each unique attribute value to a small integer. The mapping table ("this string" = "this integer") is called the dictionary. If two such attributes have the same or similar sets of possible values, then it is desirable to have them share one dictionary because it uses less storage space.

When a STRING COMPRESS attribute is declared in a vertex or edge, the user can optionally provide a name for the dictionary. Any attributes which share the same dictionary name will share the same dictionary. For example, v1.attr1, v1.attr2, and e.attr1 below share the same dictionary named "e1".

Shared STRING COMPRESS dictionaries
CREATE VERTEX v1 (PRIMARY_ID main_id STRING, att1 STRING COMPRESS e1, att2 STRING COMPRESS e1)
CREATE UNDIRECTED EDGE e (FROM v1, TO v2, att1 STRING COMPRESS e1)

CREATE INDEX

User-defined indexes (or secondary indexes. as they are called commonly called in the database industry) are a valuable feature that enhances the performance of a database system. Indexes allow users to perform fast lookups on non-key columns or attributes without a full-fledged scan.

The TigerGraph database allows users to define on Vertex attributes. User has the flexibility to create Indexes in an empty graph initially or to add Indexes later when the database is running. If the index is added on an existing Vertex, Index data is built in the background.

Indexes can be created on vertexes on a single attribute of the following data types only: STRING, UINT, INT, DATETIME, and STRING COMPRESS. Indexes will used to optimize queries with all predicate types. However, if a predicate uses an in-built function, then Index will not be used to optimize the query. Also, Built-In Queries are not optimized using Indexes.

Indexes are very important for data retrieval performance. However, adding indexes will affect write performance. For this reason, users should be judicious about adding indexes. Users should review the querying patterns to decide where Indexes can help.

Users can create and drop Indexes using ALTER VERTEX command as shown below.

Syntax:
CREATE GLOBAL SCHEMA_CHANGE job <job-name>
{
ALTER VERTEX object_type_name ADD INDEX index_type_name ON (attribute_name);
ALTER VERTEX vertex_type_name DROP INDEX index_type_name;
};

Example:

ALTER VERTEX User ADD INDEX user_country_index ON country;

CREATE GRAPH

Available to superuser and globaldesigner roles only.

Multiple Graph support

If the optional MultiGraph service is enabled, CREATE GRAPH can be invoked multiple times to define multiple graphs, and vertex types and edge types may be re-used (shared) among multiple graphs. There is an option to assign an admin use for the new graph.

After all the required vertex and edge types are created, the CREATE GRAPH command defines a graph schema which contains the given vertex types and edge types, and prepares the graph store to accept data. The vertex types and edge types may be listed in any order.

CREATE GRAPH syntax
CREATE GRAPH gname "(" vertex_or_edge_type "," vertex_or_edge_type...)
[WITH ADMIN username] ")"

The optional WITH ADMIN clause sets the named user to be the admin for the new graph.

As a convenience, executing CREATE GRAPH will set the new graph to be the working graph.

Instead of providing a list of specific vertex types and edge types, it is also possible to define a graph type which includes all the available vertex types and edge types. It is also legal to create a graph with an empty domain. A SCHEMA_CHANGE can be used later to add vertex and edge types.

Examples of CREATE GRAPH with all vertex & edge types and with an empty domain.
CREATE GRAPH everythingGraph (*)
CREATE GRAPH emptyGraph ()

Examples :

Create graph Book_rating for the edge and vertex types defined for Figure 1:

Graph definition for User-Book-Rating graph
CREATE GRAPH Book_rating (*)

The following code example shows the full set of statements to define the expanded user-book-rating graph:

Full definition for the Expanded User-Book-Rating graph
CREATE VERTEX User (PRIMARY_ID user_id UINT, name STRING, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX Occupation (PRIMARY_ID occ_id UINT, occ_name STRING)
WITH STATS="outdegree_by_edgetype"
CREATE VERTEX Book (PRIMARY_ID bookcode UINT, title STRING, pub_year UINT)
WITH STATS="none"
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)
CREATE UNDIRECTED EDGE user_occupation (FROM User, TO Occupation)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE UNDIRECTED EDGE user_book_rating (FROM User, TO Book, rating UINT, date_time UINT)
CREATE UNDIRECTED EDGE friend_of (FROM User, TO User, on_date UINT)
CREATE UNDIRECTED EDGE user_book_read (FROM User, To Book, on_date UINT)
CREATE DIRECTED EDGE sequel_of (FROM Book, TO Book) WITH REVERSE_EDGE="preceded_by"
CREATE GRAPH Book_rating (*)

USE GRAPH

New requirement for MultiGraph support. Applies even if only one graph exists.

Before a user can make use of a graph, first the user must be granted a role on that graph by an admin user of that graph or by a superuser. (Superusers are automatically granted the admin role on every graph). Second, for each GSQL session, the user must set a working graph. The USE GRAPH command sets or changes the user's working graph, for the current session.

For more about roles and privileges, see the document Managing User Privileges and Authentication.

USE GRAPH syntax
USE GRAPH gname

Instead of the USE GRAPH command, gsql can be invoked with the -g <graph_name> option.

DROP GRAPH

Available to superuser and globaldesigner roles only. The effect of this command takes into account shared domains.

DROP GRAPH syntax
DROP GRAPH gname

The DROP GRAPH deletes the logical definition of the named graph. Furthermore, it will also delete all local vertex or edge types. Local vertex and edge types are created by an ADD VERTEX/EDGE statement within a SCHEMA_CHANGE JOB and so belong only to that graph. Any shared types are unaffected. To delete only selected vertex types or edge types, see DROP VERTEX | EDGE in the Section "Modifying a Graph Schema".

SHOW - View Parts of the Catalog

The SHOW command can be used to show certain aspects of the graph, instead of manually filtering through the entire graph schema when using the ls command. You can either type the exact identifier or use regular expression / Linux globbing to search.

SHOW <VERTEX> | <EDGE> | <JOB> | <QUERY> | <GRAPH> [ <name> | <glob> | -r <regex> ]

This feature supports the ? and * from linux globbing operations, and also regular expression matching. Usage of the feature is limited to the scope of the graph the user is currently in - if you are using a global graph, you will not be able to see vertices that are not included in your current graph.

Regular expression searching will not work with escaping characters.

To use regular expressions, you will need to use the -r flag after the part of the schema you wish to show. If you wish to dive deeper into regular expressions, visit "Java Patterns". The following are a few examples of what is supported by the SHOW command.

Linux Globbing examples
SHOW VERTEX us* //shows all vertices that start with the letters "Us"
SHOW VERTEX co?*y //shows the vertex that starts with co and ends with y
SHOW VERTEX ????? //shows all vertices that are 5 letters long
Regular Expression Examples
SHOW VERTEX -r "skil{2}" //match the pattern "skill"
SHOW EDGE -r "test[1][13579]*" //match pattern that only contains odd numbers after "test"
SHOW JOB -r "[a-zA-Z]*" //match all jobs that contain only letters