algorithms
#
Algorithmic functions that can be run on Raphtory graphs
Classes#
Matching
#
A Matching (i.e., a set of edges that do not share any nodes)
Infected
#
Functions#
dijkstra_single_source_shortest_paths(graph, source, targets, direction='both', weight='weight')
#
Finds the shortest paths from a single source to multiple targets in a graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
The graph to search in. |
required |
source
|
NodeInput
|
The source node. |
required |
targets
|
list[NodeInput]
|
A list of target nodes. |
required |
direction
|
Direction
|
The direction of the edges to be considered for the shortest path. Defaults to "both". |
'both'
|
weight
|
str
|
The name of the weight property for the edges. Defaults to "weight". |
'weight'
|
Returns:
Type | Description |
---|---|
NodeStateWeightedSP
|
Mapping from nodes to a tuple containing the total cost and the nodes representing the shortest path. |
global_reciprocity(graph)
#
Reciprocity - measure of the symmetry of relationships in a graph, the global reciprocity of the entire graph. This calculates the number of reciprocal connections (edges that go in both directions) in a graph and normalizes it by the total number of directed edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
a directed Raphtory graph |
required |
Returns:
Type | Description |
---|---|
float
|
reciprocity of the graph between 0 and 1. |
betweenness_centrality(graph, k=None, normalized=True)
#
Computes the betweenness centrality for nodes in a given graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A reference to the graph. |
required |
k
|
int
|
Specifies the number of nodes to consider for the centrality computation. All nodes are considered by default. |
None
|
normalized
|
bool
|
Indicates whether to normalize the centrality values. Defaults to True. |
True
|
Returns:
Type | Description |
---|---|
NodeStateF64
|
Mapping from nodes to their betweenness centrality. |
all_local_reciprocity(graph)
#
Local reciprocity - measure of the symmetry of relationships associated with a node
This measures the proportion of a node's outgoing edges which are reciprocated with an incoming edge.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
a directed Raphtory graph |
required |
Returns:
Type | Description |
---|---|
NodeStateF64
|
Mapping of nodes to their reciprocity value. |
triplet_count(graph)
#
Computes the number of connected triplets within a graph
A connected triplet (also known as a wedge, 2-hop path) is a pair of edges with one node in common. For example, the triangle made up of edges A-B, B-C, C-A is formed of three connected triplets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
a Raphtory graph, treated as undirected |
required |
Returns:
Type | Description |
---|---|
int
|
the number of triplets in the graph |
local_triangle_count(graph, v)
#
Implementations of various graph algorithms that can be run on a graph.
To run an algorithm simply import the module and call the function with the graph as the argument
Local triangle count - calculates the number of triangles (a cycle of length 3) a node participates in.
This function returns the number of pairs of neighbours of a given node which are themselves connected.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph, this can be directed or undirected but will be treated as undirected |
required |
v
|
NodeInput
|
node id or name |
required |
Returns:
Type | Description |
---|---|
int
|
number of triangles associated with node v |
average_degree(graph)
#
The average (undirected) degree of all nodes in the graph.
Note that this treats the graph as simple and undirected and is equal to twice the number of undirected edges divided by the number of nodes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
a Raphtory graph |
required |
Returns:
Type | Description |
---|---|
float
|
the average degree of the nodes in the graph |
directed_graph_density(graph)
#
Graph density - measures how dense or sparse a graph is.
The ratio of the number of directed edges in the graph to the total number of possible directed edges (given by N * (N-1) where N is the number of nodes).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
a directed Raphtory graph |
required |
Returns:
Type | Description |
---|---|
float
|
Directed graph density of graph. |
degree_centrality(graph)
#
Computes the degree centrality of all nodes in the graph. The values are normalized by dividing each result with the maximum possible degree. Graphs with self-loops can have values of centrality greater than 1.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
The graph view on which the operation is to be performed. |
required |
Returns:
Type | Description |
---|---|
NodeStateF64
|
Mapping of nodes to their associated degree centrality. |
max_degree(graph)
#
min_degree(graph)
#
max_out_degree(graph)
#
max_in_degree(graph)
#
min_out_degree(graph)
#
min_in_degree(graph)
#
pagerank(graph, iter_count=20, max_diff=None, use_l2_norm=True, damping_factor=0.85)
#
Pagerank -- pagerank centrality value of the nodes in a graph
This function calculates the Pagerank value of each node in a graph. See https://en.wikipedia.org/wiki/PageRank for more information on PageRank centrality. A default damping factor of 0.85 is used. This is an iterative algorithm which terminates if the sum of the absolute difference in pagerank values between iterations is less than the max diff value given.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph |
required |
iter_count
|
int
|
Maximum number of iterations to run. Note that this will terminate early if convergence is reached. Defaults to 20. |
20
|
max_diff
|
Optional[float]
|
Optional parameter providing an alternative stopping condition. The algorithm will terminate if the sum of the absolute difference in pagerank values between iterations is less than the max diff value given. |
None
|
use_l2_norm
|
bool
|
Flag for choosing the norm to use for convergence checks, True for l2 norm, False for l1 norm. Defaults to True. |
True
|
damping_factor
|
float
|
The damping factor for the PageRank calculation. Defaults to 0.85. |
0.85
|
Returns:
Type | Description |
---|---|
NodeStateF64
|
Mapping of nodes to their pagerank value. |
single_source_shortest_path(graph, source, cutoff=None)
#
Calculates the single source shortest paths from a given source node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A reference to the graph. Must implement |
required |
source
|
NodeInput
|
The source node. |
required |
cutoff
|
int
|
An optional cutoff level. The algorithm will stop if this level is reached. |
None
|
Returns:
Type | Description |
---|---|
NodeStateNodes
|
Mapping from end node to shortest path from the source node. |
global_clustering_coefficient(graph)
#
Computes the global clustering coefficient of a graph. The global clustering coefficient is defined as the number of triangles in the graph divided by the number of triplets in the graph.
Note that this is also known as transitivity and is different to the average clustering coefficient.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
a Raphtory graph, treated as undirected |
required |
Returns:
Type | Description |
---|---|
float
|
the global clustering coefficient of the graph |
See also
temporally_reachable_nodes(graph, max_hops, start_time, seed_nodes, stop_nodes=None)
#
Temporally reachable nodes -- the nodes that are reachable by a time respecting path followed out from a set of seed nodes at a starting time.
This function starts at a set of seed nodes and follows all time respecting paths until either a) a maximum number of hops is reached, b) one of a set of stop nodes is reached, or c) no further time respecting edges exist. A time respecting path is a sequence of nodes v_1, v_2, ... , v_k such that there exists a sequence of edges (v_i, v_i+1, t_i) with t_i < t_i+1 for i = 1, ... , k - 1.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
directed Raphtory graph |
required |
max_hops
|
int
|
maximum number of hops to propagate out |
required |
start_time
|
int
|
time at which to start the path (such that t_1 > start_time for any path starting from these seed nodes) |
required |
seed_nodes
|
list[NodeInput]
|
list of node names or ids which should be the starting nodes |
required |
stop_nodes
|
Optional[list[NodeInput]]
|
nodes at which a path shouldn't go any further |
None
|
Returns:
Type | Description |
---|---|
NodeStateReachability
|
Mapping of nodes to their reachability history. |
temporal_bipartite_graph_projection(graph, delta, pivot_type)
#
Projects a temporal bipartite graph into an undirected temporal graph over the pivot node type. Let G
be a bipartite graph with node types A
and B
. Given delta > 0
, the projection graph G'
pivoting over type B
nodes,
will make a connection between nodes n1
and n2
(of type A
) at time (t1 + t2)/2
if they respectively have an edge at time t1
, t2
with the same node of type B
in G
, and |t2-t1| < delta
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A directed raphtory graph |
required |
delta
|
int
|
Time period |
required |
pivot_type
|
str
|
node type to pivot over. If a bipartite graph has types |
required |
Returns:
Type | Description |
---|---|
Graph
|
Projected (unipartite) temporal graph. |
local_clustering_coefficient(graph, v)
#
Local clustering coefficient - measures the degree to which nodes in a graph tend to cluster together.
The proportion of pairs of neighbours of a node who are themselves connected.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph, can be directed or undirected but will be treated as undirected. |
required |
v
|
NodeInput
|
node id or name |
required |
Returns:
Type | Description |
---|---|
float
|
the local clustering coefficient of node v in graph. |
local_clustering_coefficient_batch(graph, v)
#
Returns the Local clustering coefficient (batch, intersection) for each specified node in a graph. This measures the degree to which one or multiple nodes in a graph tend to cluster together.
Uses path-counting for its triangle-counting step.
weakly_connected_components(graph)
#
Weakly connected components -- partitions the graph into node sets which are mutually reachable by an undirected path
This function assigns a component id to each node such that nodes with the same component id are mutually reachable by an undirected path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph |
required |
Returns:
Type | Description |
---|---|
NodeStateUsize
|
Mapping of nodes to their component ids. |
strongly_connected_components(graph)
#
Strongly connected components
Partitions the graph into node sets which are mutually reachable by an directed path
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph |
required |
Returns:
Type | Description |
---|---|
NodeStateUsize
|
Mapping of nodes to their component ids |
in_components(graph)
#
In components -- Finding the "in-component" of a node in a directed graph involves identifying all nodes that can be reached following only incoming edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph |
required |
Returns:
Type | Description |
---|---|
NodeStateNodes
|
Mapping of nodes to the nodes in their 'in-component' |
in_component(node)
#
In component -- Finding the "in-component" of a node in a directed graph involves identifying all nodes that can be reached following only incoming edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node
|
Node
|
The node whose in-component we wish to calculate |
required |
Returns:
Type | Description |
---|---|
NodeStateUsize
|
Mapping of nodes in the in-component to the distance from the starting node. |
out_components(graph)
#
Out components -- Finding the "out-component" of a node in a directed graph involves identifying all nodes that can be reached following only outgoing edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Raphtory graph |
required |
Returns:
Type | Description |
---|---|
NodeStateNodes
|
Mapping of nodes to the nodes within their 'out-component' |
out_component(node)
#
Out component -- Finding the "out-component" of a node in a directed graph involves identifying all nodes that can be reached following only outgoing edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node
|
Node
|
The node whose out-component we wish to calculate |
required |
Returns:
Type | Description |
---|---|
NodeStateUsize
|
A NodeState mapping the nodes in the out-component to their distance from the starting node. |
fast_rp(graph, embedding_dim, normalization_strength, iter_weights, seed=None, threads=None)
#
Computes embedding vectors for each vertex of an undirected/bidirectional graph according to the Fast RP algorithm. Original Paper: https://doi.org/10.48550/arXiv.1908.11512 Arguments: graph (GraphView): The graph view on which embeddings are generated. embedding_dim (int): The size (dimension) of the generated embeddings. normalization_strength (float): The extent to which high-degree vertices should be discounted (range: 1-0) iter_weights (list[float]): The scalar weights to apply to the results of each iteration seed (int, optional): The seed for initialisation of random vectors threads (int, optional): The number of threads to be used for parallel execution.
Returns:
Type | Description |
---|---|
NodeStateListF64
|
Mapping from nodes to embedding vectors. |
global_temporal_three_node_motif(graph, delta, threads=None)
#
Computes the number of three edge, up-to-three node delta-temporal motifs in the graph, using the algorithm of Paranjape et al, Motifs in Temporal Networks (2017). We point the reader to this reference for more information on the algorithm and background, but provide a short summary below.
Motifs included:
Stars
There are three classes (in the order they are outputted) of star motif on three nodes based on the switching behaviour of the edges between the two leaf nodes.
- PRE: Stars of the form i<->j, i<->j, i<->k (ie two interactions with leaf j followed by one with leaf k)
- MID: Stars of the form i<->j, i<->k, i<->j (ie switching interactions from leaf j to leaf k, back to j again)
- POST: Stars of the form i<->j, i<->k, i<->k (ie one interaction with leaf j followed by two with leaf k)
Within each of these classes is 8 motifs depending on the direction of the first to the last edge -- incoming "I" or outgoing "O". These are enumerated in the order III, IIO, IOI, IOO, OII, OIO, OOI, OOO (like binary with "I"-0 and "O"-1).
Two node motifs:
Also included are two node motifs, of which there are 8 when counted from the perspective of each node. These are characterised by the direction of each edge, enumerated in the above order. Note that for the global graph counts, each motif is counted in both directions (a single III motif for one node is an OOO motif for the other node).
Triangles:
There are 8 triangle motifs:
- i --> j, k --> j, i --> k
- i --> j, k --> j, k --> i
- i --> j, j --> k, i --> k
- i --> j, j --> k, k --> i
- i --> j, k --> i, j --> k
- i --> j, k --> i, k --> j
- i --> j, i --> k, j --> k
- i --> j, i --> k, k --> j
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A directed raphtory graph |
required |
delta
|
int
|
Maximum time difference between the first and last edge of the motif. NB if time for edges was given as a UNIX epoch, this should be given in seconds, otherwise milliseconds should be used (if edge times were given as string) |
required |
threads
|
int
|
The number of threads to use when running the algorithm. |
None
|
Returns:
Type | Description |
---|---|
list[int]
|
A 40 dimensional array with the counts of each motif, given in the same order as described above. Note that the two-node motif counts are symmetrical so it may be more useful just to consider the first four elements. |
Notes
This is achieved by calling the local motif counting algorithm, summing the resulting arrays and dealing with overcounted motifs: the triangles (by dividing each motif count by three) and two-node motifs (dividing by two).
global_temporal_three_node_motif_multi(graph, deltas, threads=None)
#
Computes the global counts of three-edge up-to-three node temporal motifs for a range of timescales. See global_temporal_three_node_motif
for an interpretation of each row returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A directed raphtory graph |
required |
deltas
|
list[int]
|
A list of delta values to use. |
required |
threads
|
int
|
The number of threads to use. |
None
|
Returns:
Type | Description |
---|---|
list[list[int]]
|
A list of 40d arrays, each array is the motif count for a particular value of delta, returned in the order that the deltas were given as input. |
local_temporal_three_node_motifs(graph, delta, threads=None)
#
Computes the number of each type of motif that each node participates in. See global_temporal_three_node_motifs for a summary of the motifs involved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A directed raphtory graph |
required |
delta
|
int
|
Maximum time difference between the first and last edge of the motif. NB if time for edges was given as a UNIX epoch, this should be given in seconds, otherwise milliseconds should be used (if edge times were given as string) |
required |
Returns:
Type | Description |
---|---|
NodeStateMotifs
|
A mapping from nodes to lists of motif counts (40 counts in the same order as the global motif counts) with the number of each motif that node participates in. |
Notes
For this local count, a node is counted as participating in a motif in the following way. For star motifs, only the centre node counts
the motif. For two node motifs, both constituent nodes count the motif. For triangles, all three constituent nodes count the motif.
hits(graph, iter_count=20, threads=None)
#
HITS (Hubs and Authority) Algorithm:
AuthScore of a node (A) = Sum of HubScore of all nodes pointing at node (A) from previous iteration / Sum of HubScore of all nodes in the current iteration
HubScore of a node (A) = Sum of AuthScore of all nodes pointing away from node (A) from previous iteration / Sum of AuthScore of all nodes in the current iteration
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
Graph to run the algorithm on |
required |
iter_count
|
int
|
How many iterations to run the algorithm. Defaults to 20. |
20
|
threads
|
int
|
Number of threads to use |
None
|
Returns:
Type | Description |
---|---|
NodeStateHits
|
A mapping from nodes their hub and authority scores |
balance(graph, name='weight', direction='both')
#
Sums the weights of edges in the graph based on the specified direction.
This function computes the sum of edge weights based on the direction provided, and can be executed in parallel using a given number of threads.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
The graph view on which the operation is to be performed. |
required |
name
|
str
|
The name of the edge property used as the weight. Defaults to "weight". |
'weight'
|
direction
|
Direction
|
Specifies the direction of the edges to be considered for summation. Defaults to "both". * "out": Only consider outgoing edges. * "in": Only consider incoming edges. * "both": Consider both outgoing and incoming edges. This is the default. |
'both'
|
Returns:
Type | Description |
---|---|
NodeStateF64
|
Mapping of nodes to the computed sum of their associated edge weights. |
label_propagation(graph, seed=None)
#
Computes components using a label propagation algorithm
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A reference to the graph |
required |
seed
|
bytes
|
Array of 32 bytes of u8 which is set as the rng seed |
None
|
Returns:
Type | Description |
---|---|
list[set[Node]]
|
A list of sets each containing nodes that have been grouped |
k_core(graph, k, iter_count, threads=None)
#
Determines which nodes are in the k-core for a given value of k
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
A reference to the graph |
required |
k
|
int
|
Value of k such that the returned nodes have degree > k (recursively) |
required |
iter_count
|
int
|
The number of iterations to run |
required |
threads
|
int
|
number of threads to run on |
None
|
Returns:
Type | Description |
---|---|
list[Node]
|
A list of nodes in the k core |
temporal_SEIR(graph, seeds, infection_prob, initial_infection, recovery_rate=None, incubation_rate=None, rng_seed=None)
#
Simulate an SEIR dynamic on the network
The algorithm uses the event-based sampling strategy from https://doi.org/10.1371/journal.pone.0246961
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
the graph view |
required |
seeds
|
int | float | list[NodeInput]
|
the seeding strategy to use for the initial infection (if |
required |
infection_prob
|
float
|
the probability for a contact between infected and susceptible nodes to lead to a transmission |
required |
initial_infection
|
int | str | datetime
|
the time of the initial infection |
required |
recovery_rate
|
float | None
|
optional recovery rate (if None, simulates SEI dynamic where nodes never recover) the actual recovery time is sampled from an exponential distribution with this rate |
None
|
incubation_rate
|
float | None
|
optional incubation rate (if None, simulates SI or SIR dynamics where infected nodes are infectious at the next time step) the actual incubation time is sampled from an exponential distribution with this rate |
None
|
rng_seed
|
int | None
|
optional seed for the random number generator |
None
|
Returns:
Type | Description |
---|---|
NodeStateSEIR
|
Mapping from nodes to |
louvain(graph, resolution=1.0, weight_prop=None, tol=None)
#
Louvain algorithm for community detection
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
the graph view |
required |
resolution
|
float
|
the resolution parameter for modularity. Defaults to 1.0. |
1.0
|
weight_prop
|
str | None
|
the edge property to use for weights (has to be float) |
None
|
tol
|
None | float
|
the floating point tolerance for deciding if improvements are significant (default: 1e-8) |
None
|
Returns:
Type | Description |
---|---|
NodeStateUsize
|
Mapping of nodes to their community assignment |
fruchterman_reingold(graph, iterations=100, scale=1.0, node_start_size=1.0, cooloff_factor=0.95, dt=0.1)
#
Fruchterman Reingold layout algorithm
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
the graph view |
required |
iterations
|
int | None
|
the number of iterations to run. Defaults to 100. |
100
|
scale
|
float | None
|
the scale to apply. Defaults to 1.0. |
1.0
|
node_start_size
|
float | None
|
the start node size to assign random positions. Defaults to 1.0. |
1.0
|
cooloff_factor
|
float | None
|
the cool off factor for the algorithm. Defaults to 0.95. |
0.95
|
dt
|
float | None
|
the time increment between iterations. Defaults to 0.1. |
0.1
|
Returns:
Type | Description |
---|---|
NodeLayout
|
A mapping from nodes to their [x, y] positions |
cohesive_fruchterman_reingold(graph, iter_count=100, scale=1.0, node_start_size=1.0, cooloff_factor=0.95, dt=0.1)
#
Cohesive version of fruchterman_reingold
that adds virtual edges between isolated nodes
Arguments:
graph (GraphView): A reference to the graph
iter_count (int): The number of iterations to run. Defaults to 100.
scale (float): Global scaling factor to control the overall spread of the graph. Defaults to 1.0.
node_start_size (float): Initial size or movement range for nodes. Defaults to 1.0.
cooloff_factor (float): Factor to reduce node movement in later iterations, helping stabilize the layout. Defaults to 0.95.
dt (float): Time step or movement factor in each iteration. Defaults to 0.1.
Returns:
Type | Description |
---|---|
NodeLayout
|
A mapping from nodes to their [x, y] positions |
max_weight_matching(graph, weight_prop=None, max_cardinality=True, verify_optimum_flag=False)
#
Compute a maximum-weighted matching in the general undirected weighted
graph given by "edges". If max_cardinality
is true, only
maximum-cardinality matchings are considered as solutions.
The algorithm is based on "Efficient Algorithms for Finding Maximum Matching in Graphs" by Zvi Galil, ACM Computing Surveys, 1986.
Based on networkx implementation https://github.com/networkx/networkx/blob/3351206a3ce5b3a39bb2fc451e93ef545b96c95b/networkx/algorithms/matching.py
With reference to the standalone protoype implementation from: http://jorisvr.nl/article/maximum-matching
http://jorisvr.nl/files/graphmatching/20130407/mwmatching.py
The function takes time O(n**3)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph
|
GraphView
|
The graph to compute the maximum weight matching for |
required |
weight_prop
|
str
|
The property on the edge to use for the weight. If not provided, |
None
|
max_cardinality
|
bool
|
If set to True, consider only maximum-cardinality matchings. Defaults to True. If True, finds the maximum-cardinality matching with maximum weight among all maximum-cardinality matchings, otherwise, finds the maximum weight matching irrespective of cardinality. |
True
|
verify_optimum_flag
|
bool
|
Whether the optimum should be verified. Defaults to False. If true prior to returning, an additional routine to verify the optimal solution was found will be run after computing the maximum weight matching. If it's true and the found matching is not an optimal solution this function will panic. This option should normally be only set true during testing. |
False
|
Returns:
Type | Description |
---|---|
Matching
|
The matching |