Approximate Closeness Centrality
In the Closeness Centrality algorithm, to obtain the closeness centrality score for a vertex, we measure the distance from the source vertex to every single vertex in the graph. In large graphs, running this calculation for every vertex can be highly timeconsuming.
The Approximate Closeness Centrality algorithm (based on Cohen et al. 2014) calculates the approximate closeness centrality score for each vertex by combining two estimation approaches  sampling and pivoting. This hybrid estimation approach offers nearlinear time processing and linear space overhead within a small relative error. It runs on graphs with unweighted edges (directed or undirected).
Notes
This query uses another subquery closeness_cent_approx_sub
,
which needs to be installed before closeness_approx
can be successfully run.
Specifications
tg_closeness_approx (
SET<STRING> v_type,
SET<STRING> e_type,
INT k = 100, # sample num
INT max_hops = 10, # max BFS explore steps
DOUBLE epsilon = 0.1, # error parameter
BOOL print_accum = true, # output to console
STRING file_path = "", # output file
INT debug = 0, # debug flag  0: No LOG;1: LOG without the samplenode bfs loop;2: ALL LOG.
INT sample_index = 0, # random sample group
INT maxsize = 1000, # max size of connected components using exact closeness algorithm
BOOL wf = True # Wasserman and Faust formula
)
Parameters
Name  Description 


Vertex types to calculate approximate closeness centrality for. 

Edge types to traverse. 

Size of the sample. 

Upper limit of how many jumps the algorithm will perform from each vertex. 

The maximum relative error, between 0 and 1. Setting a lower value produces a more accurate estimate but increases run time. 

Boolean value that indicates whether or not to output to console in JSON format. 

If provided, the algorithm will output to this file path in CSV format 

There are many conditional logging statements inside the query. If the input is 0, nothing will be logged. If the input is 1, everything else but the breadthfirstsearch process from the samplenode. If the input is 2, everything will be logged. 

The algorithm will partition the graph based on the sample size. This index indicates which partition to use when estimating closeness centrality. 

If the number of vertices in the graph is lower than 

Boolean value that indicates whether to use the Wasserman and Faustformula to calculate closeness centrality rather than the classic formula. 
Result
The result is a list of all vertices in the graph with their approximate closeness centrality score. It is available both in JSON and CSV format.
Example
Below is an example of running the algorithm on the social10 test graph and an excerpt of the response.
RUN QUERY tg_closeness_aprox(["Person"], ["Friend", "Coworker"], 6, 3 \
0.1, true, "", 0, 0, 100, false)
[
{
"Start": [
{
"attributes": {
"Start.@closeness": 0.58333
},
"v_id": "Fiona",
"v_type": "Person"
},
{
"attributes": {
"Start.@closeness": 0.44444
},
"v_id": "Justin",
"v_type": "Person"
},
{
"attributes": {
"Start.@closeness": 0.53333
},
"v_id": "Bob",
"v_type": "Person"
}
]