graph_loader
#
Load and save Raphtory graphs from/to file(s)
Functions#
lotr_graph()
#
Load the Lord of the Rings dataset into a graph. The dataset is available at https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv and is a list of interactions between characters in the Lord of the Rings books and movies. The dataset is a CSV file with the following columns:
- src_id: The ID of the source character
- dst_id: The ID of the destination character
- time: The time of the interaction (in page)
Dataset statistics
- Number of nodes (characters) 139
- Number of edges (interactions between characters) 701
Returns:
Type | Description |
---|---|
Graph | A Graph containing the LOTR dataset |
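As a rough illustration of the three-column layout described above, the following sketch parses a few rows with Python's standard `csv` module and counts distinct nodes and edges. The sample rows are invented for illustration, the input is assumed to have no header row, and `summarize` is a hypothetical helper; this is not how `lotr_graph()` is implemented.

```python
import csv
import io

# Hypothetical sample in the src_id, dst_id, time layout (not real dataset rows).
SAMPLE = """Gandalf,Elrond,33
Frodo,Bilbo,114
Gandalf,Frodo,120
Frodo,Bilbo,150
"""

def summarize(text):
    """Return (node_count, edge_count, earliest, latest) for src_id, dst_id, time rows."""
    nodes, edges, times = set(), set(), []
    for src, dst, time in csv.reader(io.StringIO(text)):
        nodes.update((src, dst))
        edges.add((src, dst))  # repeated interactions collapse into one directed edge
        times.append(int(time))
    return len(nodes), len(edges), min(times), max(times)

result = summarize(SAMPLE)
print(result)
```

Counting unique `(src_id, dst_id)` pairs separately from rows reflects that the same pair of characters can interact at several different times.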
lotr_graph_with_props()
#
Same as lotr_graph(), but with the additional properties race and gender for some of the nodes.
Returns:
Type | Description |
---|---|
Graph | |
neo4j_movie_graph(uri, username, password, database=...)
#
stable_coin_graph(path=None, subset=None)
#
reddit_hyperlink_graph(timeout_seconds=600)
#
Load (a subset of) the Reddit hyperlinks dataset into a graph. The dataset is available at http://snap.stanford.edu/data/soc-redditHyperlinks-title.tsv. The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). We also provide subreddit embeddings. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017. NOTE: It may take a while to download the dataset.
Dataset statistics
- Number of nodes (subreddits) 35,776
- Number of edges (hyperlink between subreddits) 137,821
- Timespan Jan 2014 - April 2017
Source
- S. Kumar, W.L. Hamilton, J. Leskovec, D. Jurafsky. Community Interaction and Conflict on the Web. World Wide Web Conference, 2018.
Properties:
- SOURCE_SUBREDDIT: the subreddit where the link originates
- TARGET_SUBREDDIT: the subreddit where the link ends
- POST_ID: the post in the source subreddit that starts the link
- TIMESTAMP: time of the post
- POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive. The label is created using crowd-sourcing and a trained text-based classifier, and is better than simple sentiment analysis of the posts. Please see the reference paper for details.
- POST_PROPERTIES: a vector representing the text properties of the source post, given as a list of comma-separated numbers. Details can be found on the source website.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout_seconds | int | The number of seconds to wait for the dataset to download. Defaults to 600. | 600 |
Returns:
Type | Description |
---|---|
Graph | A Graph containing the Reddit hyperlinks dataset |
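To make the property list above more concrete, here is a minimal sketch of parsing one tab-separated row into typed fields. It assumes the columns appear in the order listed under Properties and that the timestamp is a plain string; the `Hyperlink` class, the `parse_row` helper, and the sample row are all hypothetical and not part of Raphtory.

```python
from dataclasses import dataclass

@dataclass
class Hyperlink:
    source: str
    target: str
    post_id: str
    timestamp: str
    is_negative: bool
    properties: list

def parse_row(line):
    """Parse one tab-separated row, assuming the property order listed above."""
    source, target, post_id, timestamp, label, props = line.rstrip("\n").split("\t")
    label = int(label)
    if label not in (-1, 1):
        raise ValueError(f"unexpected POST_LABEL: {label}")
    # POST_LABEL is -1 for explicitly negative posts, 1 for neutral or positive ones.
    # POST_PROPERTIES is a comma-separated list of numbers.
    return Hyperlink(source, target, post_id, timestamp, label == -1,
                     [float(x) for x in props.split(",")])

# Hypothetical row; all values are invented for illustration.
row = "subreddit_a\tsubreddit_b\tabc123\t2014-01-01 12:00:00\t-1\t345.0,298.0,0.75"
link = parse_row(row)
print(link.is_negative, len(link.properties))
```

Converting the label to a boolean keeps the -1/1 encoding from leaking into later filtering code.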
reddit_hyperlink_graph_local(file_path)
#
karate_club_graph()
#
karate_club_graph constructs a karate club graph.
This function uses Zachary's karate club dataset to create a graph object. Nodes represent members of the club, and edges represent relationships between them. Node properties indicate the club to which each member belongs.
Background
These are data collected from the members of a university karate club by Wayne Zachary. The ZACHE matrix represents the presence or absence of ties among the members of the club; the ZACHC matrix indicates the relative strength of the associations (number of situations in and outside the club in which interactions occurred). Zachary (1977) used these data and an information flow model of network conflict resolution to explain the split-up of this group following disputes among the members.
Reference
Zachary W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33, 452-473.
Returns:
Type | Description |
---|---|
Graph | |
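A short usage sketch for the loader follows. It assumes the module is importable as `raphtory.graph_loader` and that the returned Graph exposes `count_nodes()` and `count_edges()`; both are assumptions to check against your installed version, and `summarize_karate_club` is a hypothetical wrapper.

```python
def summarize_karate_club():
    """Load the karate club graph and return (node_count, edge_count)."""
    # Imported lazily so this sketch can be defined without Raphtory installed.
    from raphtory import graph_loader  # assumed import path

    g = graph_loader.karate_club_graph()
    # count_nodes() and count_edges() are assumed to exist on the returned Graph.
    return g.count_nodes(), g.count_edges()
```

Calling `summarize_karate_club()` in an environment with Raphtory installed should return the size of the loaded graph.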