Formal Grammar for Query Language

This is the definition for the GSQL Query Language syntax. It is defined as a set of rules expressed in EBNF notation.

Notation Used to Define Syntax

This defines the EBNF notation used to describe the syntax. Rules contains terminal and non-terminal symbols. A terminal symbol is a base-level symbol which expresses literal output. All symbols in single or double quotes (e.g., '+', "=", ")", "10") are terminal symbols. A non-terminal symbol is defined as some combination of terminal and non-terminal symbols. The left-hand side of a rule is always a non-terminal; this rule defines the non-terminal. The example rule below defines assignmentStmt (that is, an Assignment Statement) to be a name followed by an equal sign followed by an expression, operator, and expression with a terminating semi-colon. AssignmentStmt, name, and expr are all non-terminals. Additionally, all KEYWORDS are in all-capitals and are terminal symbols. The ":=" is part of EBNF and states the left hand side can be expanded to the right hand side.

EBNF Syntax example: A rule
assignmentStmt := name "=" expr op expr ";"

A vertical bar | in EBNF indicates choice. Choose either the symbol on the left or on the right. A sequence of vertical bars means choose any one of the symbols in the sequence.

EBNF Syntax: vertical bar
op := "+" | "-" | "*" | "/"

Square brackets [ ] indicate an optional part or group of symbols. Parentheses ( ) group symbols together. The rule below defines a constant to be one, two, or three digits preceded by an optional plus or minus sign.

EBNF Syntax: Square brackets and parentheses
constant := ["+" | "-"] (digit | (digit digit) | (digit digit digit))

Star * and plus + are symbols in EBNF for closure. Star means zero or more occurrences, and plus means one or more occurrences. The following defines intConstant to be an optional plus or minus followed by one or more digits. It also defines floatConstant to be an optional plus or minus followed by zero or more digits followed by a decimal followed by one or more digits. The star and plus also can be applied to groups of symbols as in the definition of list. The non-terminal list is defined as a parenthesized list of comma-separated expressions (expr). The list has at least one expression which can be followed by zero or more comma-expression pairs.

EBNF Syntax: square brackets and parentheses
intConstant := ["+" | "-"] digit+
floatConstant := ["+" | "-"] digit* "." digit+
list := "(" expr ["," expr]* ")"

GSQL Query Language EBNF

#########################################################
## EBNF for GSQL Query Language

createQuery := CREATE [OR REPLACE] [DISTRIBUTED] QUERY queryName
               "(" [parameterList] ")"
               [FOR GRAPH graphName]
               [RETURNS "("  baseType | accumType ")"]
               [API "(" stringLiteral ")"]
               [SYNTAX syntaxName]
               "{" queryBody "}"

interpretQuery := INTERPRET QUERY "(" ")"
               [FOR GRAPH graphName]
               [SYNTAX syntaxName]
               "{" queryBody "}"

parameterValueList := parameterValue ["," parameterValue]*
parameterValue := parameterConstant
                | "[" parameterValue ["," parameterValue]* "]"  // BAG or SET
                | "(" stringLiteral "," stringLiteral ")"        // a generic VERTEX value
parameterConstant := numeric | stringLiteral | TRUE | FALSE
parameterList := parameterType paramName ["=" constant]
                 ["," parameterType paramName  ["=" constant]]*

syntaxName := name

queryBody := [typedefs] [declExceptStmts] queryBodyStmts
typedefs := (typedef ";")+
declStmts := (declStmt ";")+
declStmt := baseDeclStmt | accumDeclStmt | fileDeclStmt
declExceptStmts := (declExceptStmt ";")+
queryBodyStmts := (queryBodyStmt ";")+
queryBodyStmt := assignStmt           // Assignment
               | vSetVarDeclStmt      // Declaration
               | declStmts            // Declaration
               | lAccumAssignStmt     // Assignment
               | gAccumAssignStmt     // Assignment
               | gAccumAccumStmt      // Assignment
               | funcCallStmt         // Function Call
               | selectStmt           // Select
               | queryBodyCaseStmt    // Control Flow
               | queryBodyIfStmt      // Control Flow
               | queryBodyWhileStmt   // Control Flow
               | queryBodyForEachStmt // Control Flow
               | BREAK                // Control Flow
               | CONTINUE             // Control Flow
               | updateStmt           // Data Modification
               | insertStmt           // Data Modification
               | queryBodyDeleteStmt  // Data Modification
               | printStmt            // Output
               | printlnStmt          // Output
               | logStmt              // Output
               | returnStmt           // Output
               | raiseStmt            // Exception
               | tryStmt              // Exception

installQuery := INSTALL QUERY [installOptions] ( "*" | ALL |queryName ["," queryName]* )
runQuery := (RUN | INTERPRET) QUERY [runOptions] queryName "(" parameterValueList ")"

showQuery := SHOW QUERY queryName
dropQuery := DROP QUERY ( "*" | ALL | queryName ["," queryName]* )

#########################################################
## Types and names

lowercase          := [a-z]
uppercase          := [A-Z]
letter             := lowercase | uppercase
digit              := [0-9]
integer            := ["-"]digit+
real               := ["-"]("."digit+) | ["-"](digit+"."digit*)

numeric            := integer | real
stringLiteral      := '"' [~["] | '\\' ('"' | '\\')]* '"'

name := (letter | "_") [letter | digit | "_"]*   // can be a single "_" or start with "_"
graphName := name
queryName := name
paramName := name
vertexType := name
edgeType := name
accumName := name
vertexSetName := name
attrName := name
varName := name
tupleType := name
fieldName := name
funcName := name

type := baseType | tupleType | accumType | STRING COMPRESS

baseType := INT
          | UINT
          | FLOAT
          | DOUBLE
          | STRING
          | BOOL
          | VERTEX ["<" vertexType ">"]
          | EDGE
          | JSONOBJECT
          | JSONARRAY
          | DATETIME

filePath := paramName | stringLiteral

typedef := TYPEDEF TUPLE "<" tupleFields ">" tupleType

tupleFields := (baseType fieldName) | (fieldName baseType)
           ["," (baseType fieldName) | (fieldName baseType)]*

parameterType := baseType
               | [ SET | BAG ] "<" baseType ">"
               | FILE

#########################################################
## Accumulators

accumDeclStmt := accumType localAccumName ["=" constant]
                        ["," localAccumName ["=" constant]]*
               | accumType globalAccumName ["=" constant]
                                 ["," globalAccumName ["=" constant]]*
localAccumName := "@"accumName;
globalAccumName := "@@"accumName;


accumType := "SumAccum" "<" ( INT | FLOAT | DOUBLE | STRING | STRING COMPRESS) ">"
           | "MaxAccum" "<" ( INT | FLOAT | DOUBLE ) ">"
           | "MinAccum" "<" ( INT | FLOAT | DOUBLE ) ">"
           | "AvgAccum"
           | "OrAccum"
           | "AndAccum"
           | "BitwiseOrAccum"
           | "BitwiseAndAccum"
           | "ListAccum" "<" type ">"
           | "SetAccum"  "<" elementType ">"
           | "BagAccum"  "<" elementType ">"
           | "MapAccum"  "<" elementType "," (baseType | accumType | tupleType) ">"
           | "HeapAccum" "<" tupleType ">" "(" simpleSize "," fieleName [ASC | DESC]
                                              ["," fieldName [ASC | DESC]]* ")"
           | "GroupByAccum" "<" elementType fieldName ["," elementType fieldName]* ,
		                        accumType fieldName ["," accumType fieldName]* ">"
           | "ArrayAccum" "<" accumName ">"

elementType := baseType | tupleType | STRING COMPRESS

gAccumAccumStmt := globalAccumName "+=" expr

###############################################################################
## Operators, Functions, and Expressions

constant := numeric | stringLiteral | TRUE | FALSE | GSQL_UINT_MAX
          | GSQL_INT_MAX | GSQL_INT_MIN | TO_DATETIME "(" stringLiteral ")"

mathOperator := "*" | "/" | "%" | "+" | "-" | "<<" | ">>" | "&" | "|"

condition := expr
           | expr comparisonOperator expr
           | expr [ NOT ] IN setBagExpr
           | expr IS [ NOT ] NULL
           | expr BETWEEN expr AND expr
           | "(" condition ")"
           | NOT condition
           | condition (AND | OR) condition
           | (TRUE | FALSE)
           | expr [NOT] LIKE expr [ESCAPE escape_char]

comparisonOperator := "<" | "<=" | ">" | ">=" | "==" | "!="

aggregator := COUNT | MAX | MIN | AVG | SUM

expr := name
    | globalAccumName
		| name "." name
		| name "." localAccumName ["\'"]
		| name "." name "." name "(" [argList] ")"
    | name "." name "(" [argList] ")" [ "." FILTER "(" condition ")" ]
		| name ["<" type ["," type]* ">"] "(" [argList] ")"
		| name "." localAccumName ("." name "(" [argList] ")")+ ["." name]
		| globalAccumName ("." name "(" [argList] ")")+ ["." name]
		| COALESCE "(" [argList] ")"
		| aggregator "(" [DISTINCT] setBagExpr ")"
		| ISEMPTY "(" setBagExpr ")"
		| expr mathOperator expr
		| "-" expr
		| "(" expr ")"
		| "(" argList "->" argList ")"	// key value pair for MapAccum
		| "[" argList "]"               // a list
		| constant
		| setBagExpr
		| name "(" argList ")"          // function call or a tuple object

setBagExpr := name
        | globalAccumName
    	  | name "." name
		    | name "." localAccumName
		    | name "." localAccumName ("." name "(" [argList] ")")+
		    | name "." name "(" [argList] ")" [ "." FILTER "(" condition ")" ]
		    | globalAccumName ("." name "(" [argList] ")")+
		    | setBagExpr (UNION | INTERSECT | MINUS) setBagExpr
		    | "(" argList ")"
		    | "(" setBagExpr ")"

#########################################################
## Declarations and Assignments ##

## Declarations ##
baseDeclStmt    := baseType name ["=" expr] ["," name ["=" expr]]*
fileDeclStmt := FILE fileVar "(" filePath ")"
fileVar := name

localVarDeclStmt := baseType varName "=" expr
assignDeclLocalTuple := (tupleTypeName | anonymousTupleType) localTupleVal "=" expr

vSetVarDeclStmt := vertexSetName ["(" vertexType ")"]
                   "=" (seedSet | simpleSet | selectBlock)

simpleSet := vertexSetName | "(" simpleSet ")"
           | simpleSet (UNION | INTERSECT | MINUS) simpleSet

seedSet := "{" [seed ["," seed ]*] "}"
seed := '_'
      | ANY
      | vertexSetName
      | globalAccumName
      | vertexType ".*"
      | paramName
      | "SelectVertex" selectVertParams

selectVertParams := "(" filePath "," columnId "," (columnId | name) ","
                 stringLiteral "," (TRUE | FALSE) ")" ["." FILTER "(" condition ")"]

columnId := "$"(integer | stringLiteral)

## Assignment Statements ##
assignStmt := name "=" expr
            | name "." attrName "=" expr

attrAccumStmt := name "." attrName "+=" expr

lAccumAssignStmt := vertexAlias "." localAccumName ("+="| "=") expr

gAccumAssignStmt :=  globalAccumName ("+=" | "=") expr

loadAccumStmt := globalAccumName "=" "{" LOADACCUM loadAccumParams
                                  ["," LOADACCUM loadAccumParams]* "}"

loadAccumParams := "(" filePath "," columnId ["," [columnId]* ","
                stringLiteral "," (TRUE | FALSE) ")" ["." FILTER "(" condition ")"]

## Function Call Statement ##
funcCallStmt := name ["<" type ["," type"]* ">"] "(" [argList] ")"
              | globalAccumName ("." funcName "(" [argList] ")")+

argList := expr ["," expr]*


#########################################################
## Select Statement

selectStmt  := gsqlSelectBlock
             | sqlSelectBlock

gsqlSelectBlock := gsqlSelectClause
               fromClause
               [sampleClause]
               [whereClause]
               [accumClause]
               [postAccumClause]*
               [havingClause]
               [orderClause]
               [limitClause]

sqlSelectBlock := sqlSelectClause
               fromClause
               [whereClause]
               [groupByClause]
               [havingClause]
               [orderClause]
               [limitClause]

gsqlSelectClause := vertexSetName "=" SELECT vertexAlias
sqlSelectClause := SELECT [DISTINCT] columnExpr ("," columnExpr)*
                   INTO tableName
columnExpr := expr [AS columnName]
            | aggregator "("[DISTINCT] expr ")" [AS columnName]
columnName := name
tableName := name

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

step   :=  stepSourceSet ["-" "(" stepEdgeSet ")" ("-"|"->") stepVertexSet]
stepV2 :=  stepVertexSet ["-" "(" stepEdgeSet ")" "-" stepVertexSet]

stepSourceSet := vertexSetName [":" vertexAlias]
stepEdgeSet := [stepEdgeTypes] [":" edgeAlias]
stepVertexSet := [stepVertexTypes] [":" vertexAlias]
alias := (vertexAlias | edgeAlias)
vertexAlias := name
edgeAlias := name

stepEdgeTypes := atomicEdgeType | "(" edgeSetType ["|" edgeSetType]* ")"
atomicEdgeType := "_" | ANY | edgeSetType
edgeSetType := edgeType | paramName | globalAccumName

stepVertexTypes := atomicVertexType | "(" vertexSetType ["|" vertexSetType]* ")"
atomicVertexType := "_" | ANY | vertexSetType
vertexSetType := vertexType | paramName | globalAccumName

#----------# Pattern Matching #----------#
pathPattern :=  stepVertexSet ["-" "(" pathEdgePattern ")" "-" stepVertexSet]*

pathEdgePattern := atomicEdgePattern
                 | "(" pathEdgePattern ")"
                 | pathEdgePattern "." pathEdgePattern
                 | disjPattern
                 | starPattern

atomicEdgePattern  := atomicEdgeType
        	        | atomicEdgeType ">"
        	        | "<" atomicEdgeType

disjPattern := atomicEdgePattern ("|" atomicEdgePattern)*

starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds]

starBounds := CONST_INT ".." CONST_INT
            | CONST_INT ".."
            | ".." CONST_INT
            | CONST_INT
#--------------------------------------#

sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition
              | SAMPLE expr TARGET WHEN condition
              | SAMPLE expr "%" TARGET PINNED WHEN condition

whereClause := WHERE condition

accumClause := [perClauseV2] ACCUM dmlSubStmtList

perClauseV2 := PER "(" alias ["," alias] ")"

postAccumClause := "POST-ACCUM" dmlSubStmtList

dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*

dmlSubStmt := assignStmt           // Assignment
            | funcCallStmt         // Function Call
            | gAccumAccumStmt      // Assignment
            | lAccumAccumStmt      // Assignment
            | attrAccumStmt        // Assignment
            | vAccumFuncCall       // Function Call
            | localVarDeclStmt     // Declaration
            | dmlSubCaseStmt       // Control Flow
            | dmlSubIfStmt         // Control Flow
            | dmlSubWhileStmt      // Control Flow
            | dmlSubForEachStmt    // Control Flow
            | BREAK                // Control Flow
            | CONTINUE             // Control Flow
            | insertStmt           // Data Modification
            | dmlSubDeleteStmt     // Data Modification
            | printlnStmt          // Output
            | logStmt              // Output


vAccumFuncCall := vertexAlias "." localAccumName ("." funcName "(" [argList] ")")+

groupByClause := GROUP BY groupExpr ("," groupExpr)*
groupExpr := expr

havingClause := HAVING condition

orderClause := ORDER BY expr [ASC | DESC] ["," expr [ASC | DESC]]*

limitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )

#########################################################
## Control Flow Statements ##

queryBodyIfStmt := IF condition THEN queryBodyStmts
                 [ELSE IF condition THEN queryBodyStmts ]*
                 [ELSE queryBodyStmts ] END
dmlSubIfStmt :=    IF condition THEN dmlSubStmtList
                 [ELSE IF condition THEN dmlSubStmtList ]*
                 [ELSE dmlSubStmtList ] END

queryBodyCaseStmt := CASE  (WHEN condition THEN queryBodyStmts)+ [ELSE queryBodyStmts] END
               | CASE expr (WHEN constant  THEN queryBodyStmts)+ [ELSE queryBodyStmts] END
dmlSubCaseStmt := CASE     (WHEN condition THEN dmlSubStmtList)+ [ELSE dmlSubStmtList] END
               | CASE expr (WHEN constant  THEN dmlSubStmtList)+ [ELSE dmlSubStmtList] END

queryBodyWhileStmt := WHILE condition [LIMIT simpleSize] DO queryBodyStmts END
dmlSubWhileStmt :=    WHILE condition [LIMIT simpleSize] DO dmlSubStmtList END
simpleSize := integer | varName | paramName

queryBodyForEachStmt := FOREACH forEachControl DO queryBodyStmts END
dmlSubForEachStmt :=    FOREACH forEachControl DO dmlSubStmtList END

forEachControl := ( iterationVar | "(" keyVar ("," valueVar)+ ")") (IN | ":") setBagExpr
                | iterationVar IN RANGE "[" expr "," expr"]" ["." STEP "(" expr ")"]
iterationVar := name
keyVar := name
valueVar := name

#########################################################
## Other Data Modifications Statements ##

queryBodyDeleteStmt := DELETE alias FROM pattern [whereClause]
dmlSubDeleteStmt := DELETE "(" alias ")"

updateStmt := UPDATE alias FROM pattern SET dmlSubStmtList [whereClause]

insertStmt := insertVertexStmt | insertEdgeStmt
insertVertexStmt := INSERT INTO (vertexType | name)
                 ["(" PRIMARY_ID ["," attrName]* ")"]
                 VALUES "(" ( "_" | expr ) ["," ("_" | expr)]*] ")"

insertEdgeStmt   := INSERT INTO (edgeType | EDGE name)
                 ["(" FROM "," TO ["," attrName]* ")"]
                 VALUES "(" ( "_" | expr ) [vertexType]
                 ["," ( "_" | expr ) [vertexType] ["," ("_" | expr)]*] ")"

#########################################################
## Output Statements ##

printStmt := PRINT printExpr ("," printExpr)* [WHERE condition] [TO_CSV (filePath | fileVar)]
printExpr := (expr | vExprSet) [ AS jsonKey]
           | tableName
vExprSet  := expr "[" vSetProj ("," vSetProj)* "]"
vSetProj  := expr [ AS jsonKey]
jsonKey := name

printlnStmt := fileVar ".println" "(" expr ("," expr)* ")"

logStmt := LOG "(" condition "," argList ")"

returnStmt := RETURN expr

#########################################################
## Exception Statements ##

declExceptStmt := EXCEPTION exceptVarName "(" errorInt ")"
exceptVarName  := name
errorInt      := integer


raiseStmt       := RAISE exceptVarName [errorMsg]
errorMsg        := "(" expr ")"


tryStmt         := TRY queryBodyStmts EXCEPTION caseExceptBlock+ [elseExceptBlock] END ";"
caseExceptBlock := WHEN exceptVarName THEN queryBodyStmts
elseExceptBlock := ELSE queryBodyStmts