Influence Maximization

Supported Graph Characteristics

Weighted edges

Unweighted edges

Homogeneous vertex types

Influence maximization is the problem of finding a small subset of vertices in a social network that could maximize the spread of influence. There are two versions of the Influence Maximization algorithm. Both versions find k vertices that maximize the expected spread of influence in the network. The CELF version improves upon the efficiency of the greedy version and should be preferred in analyzing large networks.

Notes

The algorithm is implemented based on the following research papers:

Specifications

CELF

CREATE QUERY tg_influence_maximization_CELF(STRING v_type,STRING e_type, STRING weight_attribute, INT top_k, BOOL print_results = True, STRING file_path = "")

Greedy

CREATE QUERY tg_influence_maximization_greedy(STRING v_type,
  STRING e_type,STRING weight_attribute,INT top_k,
  BOOL print_results = True, STRING file_path = "")

Parameters

The CELF version and the greedy version of the algorithms share the same set of parameters.

Name Description

Name	Description
`STRING v_type`	The vertex type to measure influence of
`STRING e_type`	The edge type to traverse
`STRING weight_attribute`	The name of the weight attribute on the edge type
`INT top_k`	The number of vertices with the highest influence score to return
`BOOL print_results`	If true, print results to JSON output.
`STRING file_path`	If not empty, save results in CSV to this file.

STRING v_type

The vertex type to measure influence of

STRING e_type

The edge type to traverse

STRING weight_attribute

The name of the weight attribute on the edge type

INT top_k

The number of vertices with the highest influence score to return

BOOL print_results

If true, print results to JSON output.

STRING file_path

If not empty, save results in CSV to this file.

Output

Returns the ID of the vertices with the highest influence scores along with their scores.

Time complexity

The greedy version has a complexity of \(O(E * k)\), where \(E\) is the number of edges and \(k\) is the number of top scores to report.

The CELF version also has a worst-case complexity of \(O(E * k)\), but generally performs much better with a best-case complexity of \(O(E)\).

Examples

We use this example graph showing a social network.

CELF

[
  {
    "@@res_list": [
      {
        "Vertex_ID": "Dan",
        "score": 4
      },
      {
        "Vertex_ID": "Vicki",
        "score": 3
      },
      {
        "Vertex_ID": "Nancy",
        "score": 2
      },
      {
        "Vertex_ID": "Juan",
        "score": 1
      },
      {
        "Vertex_ID": "Kevin",
        "score": 1
      },

    ]
  }
]

Greedy

[
  {
    "@@res_list": [
      {
        "Vertex_ID": "Dan",
        "score": 4
      },
      {
        "Vertex_ID": "Nancy",
        "score": 3
      },
      {
        "Vertex_ID": "Vicki",
        "score": 2
      },
      {
        "Vertex_ID": "Juan",
        "score": 1
      },
      {
        "Vertex_ID": "Jenny",
        "score": 1
      },
    ]
  }
]