Fast Random Projection
Fast Random Projection (FastRP) is a scalable and performant nodeembedding algorithm. It generates node embeddings (vectors) of low dimensionality through random projections from the graph’s adjacency matrix (a highdimensional matrix) to a lowdimensional matrix, significantly reducing the computing power required to process the data. The algorithm is theoretically backed by the the JohnssonLindenstrauss lemma, which states that a set of points in a highdimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.
This algorithm is especially useful in the following situations:

Your data is in such high dimension that it is too expensive to perform calculations.

The dimensionality is very high and the number of samples are too sparse to calculate covariances.

You don’t have access to the entire dataset, such as when working with realtime data.
Specifications
CREATE QUERY tg_fastRP_query(
INT num_edges, INT num_nodes, INT k, INT sampling_constant,
INT reduced_dimension, DOUBLE normalization_strength, STRING input_weights,
STRING index_attr)
This algorithm requires an index attribute on the schema. Before running the algorithm, create an attribute on all vertex types to hold the index. You can use our provided schema change job script in the algorithm library.
After the attribute has been created, run the tg_fastRP_preprocessing.gsql query to index all vertices on the graph.
Parameters
Parameter  Description  Data type 


The total number of edges in the graph. 


The total number of vertices in the graph. 


The sampling constant. A high 


The length of the produced vectors. A greater dimension offers a greater precision, but is more costly to operate over. 


The number of times the algorithm iteratively constructs intermediate embeddings. A 


The normalization strength determines how node degrees influence the embedding. Using a negative value will downplay the importance of high degree neighbors, while a positive value will instead increase their importance. 


A list of floats that determines the weight of the intermediate embeddings constructed in each iteration. 


The name of the index attribute created during preprocessing. 

Parameter tuning
The optimal values for the following parameters depend on your dataset. In order to obtain the best quality embeddings for your graph, it is a good idea to tune these parameters.
reduced_dimension
The reduced dimension (reduced_dimension
) is the length of the produced vectors. A greater dimension offers a greater precision, but is more costly to operate over.
input_weights
The algorithm interactively constructs intermediate embeddings by averaging either neighboring intermediate embeddings from the previous iteration, or the generated random vectors during the first iteration. The final embeddings is a weighted sum of the intermediate embeddings from each iteration, and the input_weights
parameter determine how much each set of intermediate embeddings weigh.
normalization_strength
The normalization strength determines how node degrees influence the embedding. Using a negative value will downplay the importance of high degree neighbors, while a positive value will instead increase their importance.
sampling_constant
FastRP uses very sparse random projection to reduce the dimensionality of the data from an n*m matrix to an n*d matrix where d ⇐ m by multiplying the original matrix with a m*d matrix. The m*d matrix is made up of independently and identically distributed data sampled from:
Where s is the sampling constant (sampling_constant
). The higher the constant, the higher the number of zeros in the resulting matrix, which speeds up the algorithm.