Data Types

This section describes the data types that are native to and supported by the GSQL Query Language. Most of the data objects used in queries come from one of three sources: (1) the query's input parameters, (2) the vertices, edges, and their attributes that are encountered when traversing the graph, or (3) variables defined within the query to assist in its computational work.

This section covers the following subset of the EBNF language definitions:

EBNF for Data Types
lowercase          := [a-z]
uppercase          := [A-Z]
letter             := lowercase | uppercase
digit              := [0-9]
integer            := ["-"]digit+
real               := ["-"]("." digit+) | ["-"](digit+ "." digit*)
numeric            := integer | real
stringLiteral      := '"' [~["] | '\\' ('"' | '\\')]* '"'

name := (letter | "_") [letter | digit | "_"]*   // Can be a single "_" or start with "_"
graphName := name
queryName := name
paramName := name
vertexType := name
edgeType := name
accumName := name
vertexSetName := name
attrName := name
varName := name
tupleType := name
fieldName := name
funcName := name

type := baseType | tupleType | accumType | STRING COMPRESS

baseType := INT
          | UINT
          | FLOAT
          | DOUBLE
          | STRING
          | BOOL
          | VERTEX ["<" vertexType ">"]
          | EDGE
          | JSONOBJECT
          | JSONARRAY
          | DATETIME

filePath := paramName | stringLiteral

typedef := TYPEDEF TUPLE "<" tupleFields ">" tupleType

tupleFields := (baseType fieldName) | (fieldName baseType)
           ["," (baseType fieldName) | (fieldName baseType)]*

parameterType := baseType
               | [ SET | BAG ] "<" baseType ">"
               | FILE

Identifiers

An identifier is the name for an instance of a language element. In the GSQL query language, identifiers are used to name elements such as a query, a variable, or a user-defined function. In the EBNF syntax, an identifier is referred to as a name. It is a sequence of letters, digits, and underscores ("_"). Other punctuation characters are not supported. The initial character must be a letter or an underscore.

name (identifier)
name := (letter | "_") [letter | digit | "_"]* 
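
As a hedged illustration of the naming rule, the sketch below declares a few variables inside a hypothetical query. The query name identifierEx, the variable names, and the choice of the workNet graph are assumptions made for this example only. The commented-out lines show names that the rule above rejects.

```gsql
CREATE QUERY identifierEx() FOR GRAPH workNet {
  INT _count = 0;          # valid: starts with an underscore
  INT depth2 = 2;          # valid: digits may follow the first character
  STRING first_name = "";  # valid: underscores may appear anywhere

  # INT 2ndDepth = 0;      # invalid: starts with a digit
  # INT max-depth = 0;     # invalid: "-" is not a letter, digit, or underscore

  PRINT _count, depth2, first_name;
}
```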

Overview of Types

Different types of data can be used in different contexts. The EBNF syntax defines several classes of data types. The most basic is called baseType. The other independent types are FILE and STRING COMPRESS. The remaining types are either compound data types built from the independent data types, or supersets of other types. The table below gives an overview of their definitions and their uses.

| EBNF term | Description | Use Case |
|---|---|---|
| baseType | INT, UINT, FLOAT, DOUBLE, STRING, BOOL, DATETIME, VERTEX, EDGE, JSONOBJECT, or JSONARRAY | global variable; query return value |
| tupleType | sequence of baseType | user-defined tuple |
| accumType | family of specialized data objects which support accumulation operations | |
| FILE | FILE object | global sequential data object, linked to a text file |
| parameterType | baseType (except EDGE or JSONOBJECT), a SET or BAG of baseType, or FILE object | query parameter |
| STRING COMPRESS | STRING COMPRESS | more compact storage of STRING, if there is a limited number of different values and the value is rarely accessed. Otherwise, it may use more memory. |
| elementType | baseType, STRING COMPRESS, or identifier | element for most types of container accumulators: SetAccum, BagAccum, GroupByAccum, key of a MapAccum element |
| type | baseType, STRING COMPRESS, identifier, or accumType | element of a ListAccum, value of a MapAccum element; local variable |

Base Types

The query language supports the following base types, which can be declared and assigned anywhere within their scope. Any of these base types may be used when defining a global variable, a local variable, a query return value, part of a tuple, or an element of a container accumulator. All of them except EDGE and JSONOBJECT may also be used as query parameters. Accumulators are described in detail in a later section.

EBNF
baseType := INT
          | UINT
          | FLOAT
          | DOUBLE
          | STRING
          | BOOL
          | VERTEX ["<" vertexType ">"]
          | EDGE
          | JSONOBJECT
          | JSONARRAY
          | DATETIME

The default value of each base type is shown in the table below. The default value is the initial value of a base type variable (see Section "Variable Types" for more details), or the default return value for some functions (see Section "Operators, Functions, and Expressions" for more details).

Seven of these types (INT, UINT, FLOAT, DOUBLE, BOOL, STRING, and DATETIME) are the same ones described in the "Attribute Data Types" section of the GSQL Language Reference, Part 1.

| type | default value |
|---|---|
| INT, UINT, FLOAT, DOUBLE (see note below) | 0 |
| BOOL | false |
| STRING | An empty string "" |
| DATETIME | 1970-01-01 00:00:00 |
| VERTEX | "Unknown" |
| EDGE | No edge: {} |
| JSONOBJECT | An empty object: {} |
| JSONARRAY | An empty array: [] |

FLOAT and DOUBLE input values must be in fixed point d.dddd format, where d is a digit. Output values will be printed in either fixed point or exponential notation, whichever is more compact.

The GSQL Loader can read FLOAT and DOUBLE values with exponential notation (e.g., 1.25 E-7).
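
One way to check the default values is to declare variables without an initial value and print them. The following is a minimal sketch, assuming a hypothetical query named defaultValueEx on the workNet graph. The comments restate the defaults listed in the table above and are not captured output.

```gsql
CREATE QUERY defaultValueEx() FOR GRAPH workNet {
  INT i;        # documented default: 0
  DOUBLE d;     # documented default: 0
  BOOL b;       # documented default: false
  STRING s;     # documented default: empty string
  DATETIME dt;  # documented default: 1970-01-01 00:00:00

  PRINT i, d, b, s, dt;
}
```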

VERTEX and EDGE

VERTEX and EDGE are the two types of objects which form a graph. A variable can be declared as either of these two types; a query parameter can be a VERTEX but not an EDGE (see "Query Parameter Types"). In addition, the schema for the graph defines specific vertex and edge types (e.g., CREATE VERTEX person). The parameter or variable type can be restricted by giving the vertex/edge type in angle brackets < > after the keyword VERTEX/EDGE. A VERTEX or EDGE variable declared without a specifier is called a generic type. Below are examples of generic and typed vertex and edge variable declarations:

Examples of generic and typed VERTEX and EDGE declarations
VERTEX anyVertex;
VERTEX<person> owner;
EDGE anyEdge;
EDGE<friendship> friendEdge;

Vertex and Edge Attribute Types

The following table maps vertex or edge attribute types in the Data Definition Language (DDL) to GSQL query language types. Accumulators are introduced in Section "Accumulators".

| DDL | GSQL Query |
|---|---|
| INT | INT |
| UINT | UINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| BOOL | BOOL |
| STRING | STRING |
| STRING COMPRESS | STRING |
| SET<type> | SetAccum<type> |
| LIST<type> | ListAccum<type> |
| DATETIME | DATETIME |

JSONOBJECT and JSONARRAY

These two base types allow users to pass a complex data object or to write output in a customized format. These types follow the industry standard definition of JSON at www.json.org. A JSONOBJECT instance's external representation (as input and output) is a string, starting and ending with curly braces "{" and "}", which enclose an unordered list of string:value pairs. A JSONARRAY is represented as a string, starting and ending with square brackets "[" and "]", which enclose an ordered list of values. Since a value can be an object or an array, JSON supports hierarchical, nested data structures.

More details are introduced in the Section entitled "JSONOBJECT and JSONARRAY Functions".

A JSONOBJECT or JSONARRAY value is immutable. No operator is allowed to modify its value.
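
Because JSONOBJECT is not a valid query parameter type, one common approach is to pass the JSON text as a STRING and convert it inside the query. The sketch below assumes the parse_json_object and parse_json_array functions covered in the section on JSONOBJECT and JSONARRAY functions. The query name jsonEx, the parameter names, and the workNet graph are hypothetical.

```gsql
CREATE QUERY jsonEx(STRING objStr, STRING arrStr) FOR GRAPH workNet {
  JSONOBJECT jsonO;
  JSONARRAY jsonA;

  # Convert the string representations into JSON values.
  # The resulting values are immutable; they can be read and printed only.
  jsonO = parse_json_object(objStr);
  jsonA = parse_json_array(arrStr);

  PRINT jsonO, jsonA;
}
```

The string arguments must be well-formed JSON text, for example an object such as {"abc": 123} for objStr and an array such as [1, 2, 3] for arrStr.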

TUPLE

A tuple is a user-defined data structure consisting of a fixed sequence of baseType variables. Tuple types can be created and named using a TYPEDEF statement. Tuples must be defined first, before any other statements in a query.

EBNF for tuples
typedef := TYPEDEF TUPLE "<" tupleFields ">" tupleType

tupleFields := (baseType fieldName) | (fieldName baseType)
           ["," (baseType fieldName) | (fieldName baseType)]*

A tuple can also be defined in a graph schema and then can be used as a vertex or edge attribute type. A tuple type which has been defined in the graph schema does not need to be re-defined in a query.

The graph schema investmentNet contains two complex attributes:

  • user-defined tuple SECRET_INFO, which is used for the secretInfo attribute in the person vertex.

  • portfolio MAP<STRING, DOUBLE> attribute, also in the person vertex.

investmentNet schema
TYPEDEF TUPLE <age UINT (4), mothersName STRING(20) > SECRET_INFO
CREATE VERTEX person(PRIMARY_ID personId STRING, portfolio MAP<STRING, DOUBLE>, secretInfo SECRET_INFO)
CREATE VERTEX stockOrder(PRIMARY_ID orderId STRING, ticker STRING, orderSize UINT, price FLOAT)
CREATE UNDIRECTED EDGE makeOrder(FROM person, TO stockOrder, orderTime DATETIME)
CREATE GRAPH investmentNet (*)

The query below reads both the SECRET_INFO tuple and the portfolio MAP. The query does not need to redefine the SECRET_INFO tuple type, because it is already defined in the schema. To read and save the map, we define a MapAccum with the same key:value type as the original portfolio map. (The "Accumulators" chapter has more information about accumulators.) In addition, the query creates a new tuple type, ORDER_RECORD.

tupleEx query
CREATE QUERY tupleEx(VERTEX<person> p) FOR GRAPH investmentNet{
  #TYPEDEF TUPLE <UINT age, STRING mothersName> SECRET_INFO;       # already defined in schema
  TYPEDEF TUPLE <STRING ticker, FLOAT price, DATETIME orderTime> ORDER_RECORD; # new for query

  SetAccum<SECRET_INFO> @@info;
  ListAccum<ORDER_RECORD> @@orderRecords;
  MapAccum<STRING, DOUBLE> @@portf;       # corresponds to MAP<STRING, DOUBLE> attribute

  INIT = {p};

  # Get person p's secretInfo and portfolio
  X = SELECT v FROM INIT:v
      ACCUM @@portf += v.portfolio, @@info += v.secretInfo;

  # Search person p's orders to record ticker, price, and order time.
  # Note that the tuple gathers info from both edges and vertices.
  orders = SELECT t
      FROM INIT:s -(makeOrder:e)->stockOrder:t
      ACCUM @@orderRecords += ORDER_RECORD(t.ticker, t.price, e.orderTime);

  PRINT @@portf, @@info;
  PRINT @@orderRecords;
}
tupleEx.json
GSQL > RUN QUERY tupleEx("person1")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {
      "@@info": [{
        "mothersName": "JAMES",
        "age": 25
      }],
      "@@portf": {
        "AAPL": 3142.24,
        "MS": 5000,
        "G": 6112.23
      }
    },
    {"@@orderRecords": [
      {
        "ticker": "AAPL",
        "orderTime": "2017-03-03 18:42:28",
        "price": 34.42
      },
      {
        "ticker": "B",
        "orderTime": "2017-03-03 18:42:30",
        "price": 202.32001
      },
      {
        "ticker": "A",
        "orderTime": "2017-03-03 18:42:29",
        "price": 50.55
      }
    ]}
  ]
}

STRING COMPRESS

STRING COMPRESS is an integer type encoded by the system to represent string values. Under the right conditions (described in the next paragraph), STRING COMPRESS uses less memory than STRING. The STRING COMPRESS type is designed to act like STRING: data are loaded and printed just like STRING data, and most functions and operators which take STRING input can also take STRING COMPRESS input. The difference is in how the data are stored internally. A STRING COMPRESS value can be obtained from a STRING_SET COMPRESS or STRING_LIST COMPRESS attribute or by converting a STRING value.

Using STRING COMPRESS instead of STRING is a trade-off: smaller storage vs. slower access times. The storage space will only be smaller if (1) the original strings are long, and (2) there are only a small number of different strings. Performance will always be slower; the slowdown is greater if the STRING COMPRESS attributes are accessed more often. We recommend performing comparison tests for both performance and memory usage before settling on STRING COMPRESS.

The STRING COMPRESS type is beneficial for sets of string values when the same values are used multiple times. In practice, STRING COMPRESS is most useful for container accumulators like ListAccum<STRING COMPRESS> or SetAccum<STRING COMPRESS>.

An accumulator (introduced in Section "Accumulators") containing STRING COMPRESS stores the dictionary when it is assigned an attribute value or the value of another accumulator containing STRING COMPRESS. An accumulator containing STRING COMPRESS can store multiple dictionaries. A STRING value can be converted to a STRING COMPRESS value only if the value is in the dictionaries. If the STRING value is not in the dictionaries, the original string value is saved. A STRING COMPRESS value can be automatically converted to a STRING value.

When a STRING COMPRESS value is output (e.g., by a PRINT statement), it is shown as a STRING.

STRING COMPRESS is not a base type.

STRING COMPRESS example
CREATE QUERY stringCompressEx(VERTEX<person> m1) FOR GRAPH workNet {
  ListAccum<STRING COMPRESS> @@strCompressList, @@strCompressList2;
  SetAccum<STRING COMPRESS> @@strCompressSet, @@strCompressSet2;
  ListAccum<STRING> @@strList, @@strList2;
  SetAccum<STRING> @@strSet, @@strSet2;

  S = {m1};

  S = SELECT s 
      FROM S:s
      ACCUM @@strSet += s.interestSet,    
            @@strList += s.interestList,   
            @@strCompressSet += s.interestSet,   # use the dictionary from person.interestSet
            @@strCompressList += s.interestList; # use the dictionary from person.interestList

  @@strCompressList2 += @@strCompressList;  # @@strCompressList2 gets the dictionary from @@strCompressList, which is from person.interestList
  @@strCompressList2 += "xyz";   # "xyz" is not in the dictionary, so store the actual string value

  @@strCompressSet2 += @@strCompressSet; 
  @@strCompressSet2 += @@strSet; 

  @@strList2 += @@strCompressList;  # string compress integer values are decoded to strings
  @@strSet2 += @@strCompressSet;  

  PRINT @@strSet, @@strList, @@strCompressSet, @@strCompressList;
  PRINT @@strSet2, @@strList2, @@strCompressSet2, @@strCompressList2;
}
stringCompressEx.json Results
GSQL > RUN QUERY stringCompressEx("person12")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {
      "@@strCompressList": [
        "music",
        "engineering",
        "teaching",
        "teaching",
        "teaching"
      ],
      "@@strSet": [ "teaching", "engineering", "music" ],
      "@@strCompressSet": [ "music", "engineering", "teaching" ],
      "@@strList": [
        "music",
        "engineering",
        "teaching",
        "teaching",
        "teaching"
      ]
    },
    {
      "@@strSet2": [ "music", "engineering", "teaching" ],
      "@@strCompressList2": [
        "music",
        "engineering",
        "teaching",
        "teaching",
        "teaching",
        "xyz"
      ],
      "@@strList2": [
        "music",
        "engineering",
        "teaching",
        "teaching",
        "teaching"
      ],
      "@@strCompressSet2": [ "teaching", "engineering", "music" ]
    }
  ]
}

FILE Object

A FILE object is a sequential data storage object, associated with a text file on the local machine.

When referring to a FILE object, we always capitalize the word FILE, to distinguish it from ordinary files.

When a FILE object is declared and associated with a particular text file, any existing content in that text file is erased. During the execution of the query, content written to the FILE is appended to the FILE. When the query where the FILE was declared finishes running, the FILE contents are saved to the text file.

A FILE object can be passed as a parameter to another query. When a query receives a FILE object as a parameter, it can append data to that FILE, as can every other query which receives this FILE object as a parameter.
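
The life cycle described above can be sketched with two small queries. This is an illustration under stated assumptions, not a reference implementation: the query names, the file path, and the workNet graph are hypothetical, and the sketch relies on the println method that GSQL provides for FILE objects.

```gsql
CREATE QUERY fileParamSub(FILE f, STRING label) FOR GRAPH workNet {
  # f is a link to the caller's FILE, so this line is appended to the same file.
  f.println(label, "written by fileParamSub");
}

CREATE QUERY fileParamMain(STRING label) FOR GRAPH workNet {
  # Declaring the FILE erases any existing content at this (hypothetical) path.
  FILE f ("/home/tigergraph/fileParam.txt");

  f.println(label, "header");
  fileParamSub(f, label);       # pass the FILE object to another query
  f.println(label, "footer");
  # The contents are saved to the text file when fileParamMain finishes.
}
```

The subquery is created first so that fileParamMain can refer to it. Only fileParamMain declares the FILE, so only its start erases the file and only its completion saves the contents.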

Query Parameter Types

Input parameters to a query can be of any base type except EDGE or JSONOBJECT. A parameter can also be a SET or BAG whose element type is a base type (again excluding EDGE and JSONOBJECT), or a FILE object. Within the query, SET and BAG are converted to SetAccum and BagAccum, respectively (see Section "Accumulators" for more details).

A query parameter is immutable. It cannot be assigned a new value within the query.

The FILE object is a special case. It is passed by reference, meaning that the receiving query gets a link to the original FILE object. The receiving query can write to the FILE.

EBNF
parameterType := baseType
               | [ SET | BAG ] "<" baseType ">"
               | FILE
Examples of collection type parameters
(SET<VERTEX<person>> p1, BAG<INT> ids, FILE f1)
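
To show how collection parameters are typically consumed, here is a minimal sketch. The query name paramEx, the parameter and variable names, and the workNet graph are assumptions made for illustration. The commented-out line marks an assignment that the immutability rule above forbids.

```gsql
CREATE QUERY paramEx(SET<VERTEX<person>> seeds, BAG<INT> ids) FOR GRAPH workNet {
  SumAccum<INT> @@idTotal;

  Start = seeds;          # use the SET<VERTEX<person>> parameter as a vertex set

  FOREACH n IN ids DO     # read the BAG parameter one element at a time
    @@idTotal += n;
  END;

  # ids += 7;             # not allowed: a query parameter is immutable

  PRINT Start, @@idTotal;
}
```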
