
Exporting to Pandas dataframes#

You can ingest data from a set of dataframes, work on the resulting graph in Raphtory, then convert it back into dataframes. Raphtory provides the to_df() function on both the Nodes and Edges for this purpose.

Node Dataframe#

To explore the use of to_df() on the nodes, we first call the function with default parameters. This exports only the latest property updates and uses epoch timestamps; the output from this can be seen below.

To demonstrate the flags, we call to_df() again, this time enabling the property history and converting the timestamps to datetimes. The output for this can also be seen below.

from raphtory import Graph
import pandas as pd

server_edges_df = pd.read_csv("../data/network_traffic_edges.csv")
server_edges_df["timestamp"] = pd.to_datetime(server_edges_df["timestamp"])

server_nodes_df = pd.read_csv("../data/network_traffic_nodes.csv")
server_nodes_df["timestamp"] = pd.to_datetime(server_nodes_df["timestamp"])

traffic_graph = Graph()
traffic_graph.load_edges_from_pandas(
    df=server_edges_df,
    src="source",
    dst="destination",
    time="timestamp",
    properties=["data_size_MB"],
    layer_col="transaction_type",
    metadata=["is_encrypted"],
    shared_metadata={"datasource": "docs/data/network_traffic_edges.csv"},
)
traffic_graph.load_nodes_from_pandas(
    df=server_nodes_df,
    id="server_id",
    time="timestamp",
    properties=["OS_version", "primary_function", "uptime_days"],
    metadata=["server_name", "hardware_type"],
    shared_metadata={"datasource": "docs/data/network_traffic_edges.csv"},
)

df = traffic_graph.nodes.to_df()
print("--- to_df with default parameters --- ")
print(f"{df}\n")
print()
df = traffic_graph.nodes.to_df(include_property_history=True, convert_datetime=True)
print("--- to_df with property history and datetime conversion ---")
print(f"{df}\n")

Output

--- to_df with default parameters --- 
    name type                           datasource hardware_type  \
0  ServerA       docs/data/network_traffic_edges.csv  Blade Server   
1  ServerE       docs/data/network_traffic_edges.csv   Rack Server   
2  ServerB       docs/data/network_traffic_edges.csv   Rack Server   
3  ServerD       docs/data/network_traffic_edges.csv  Tower Server   
4  ServerC       docs/data/network_traffic_edges.csv  Blade Server   

server_name    primary_function  uptime_days           OS_version  \
0       Alpha            Database          120         Ubuntu 20.04   
1        Echo              Backup           30          Red Hat 8.1   
2        Beta          Web Server           45          Red Hat 8.1   
3       Delta  Application Server           60         Ubuntu 20.04   
4     Charlie        File Storage           90  Windows Server 2022   

                                    update_history  
0      [1693555200000, 1693555500000, 1693556400000]  
1      [1693556100000, 1693556400000, 1693556700000]  
2  [1693555200000, 1693555500000, 1693555800000, ...  
3      [1693555800000, 1693556100000, 1693557000000]  
4  [1693555500000, 1693555800000, 1693556400000, ...  


--- to_df with property history and datetime conversion ---
    name type hardware_type                           datasource  \
0  ServerA       Blade Server  docs/data/network_traffic_edges.csv   
1  ServerE        Rack Server  docs/data/network_traffic_edges.csv   
2  ServerB        Rack Server  docs/data/network_traffic_edges.csv   
3  ServerD       Tower Server  docs/data/network_traffic_edges.csv   
4  ServerC       Blade Server  docs/data/network_traffic_edges.csv   

server_name                                   primary_function  \
0       Alpha            [[2023-09-01 08:00:00+00:00, Database]]   
1        Echo              [[2023-09-01 08:20:00+00:00, Backup]]   
2        Beta          [[2023-09-01 08:05:00+00:00, Web Server]]   
3       Delta  [[2023-09-01 08:15:00+00:00, Application Server]]   
4     Charlie        [[2023-09-01 08:10:00+00:00, File Storage]]   

                                        OS_version  \
0        [[2023-09-01 08:00:00+00:00, Ubuntu 20.04]]   
1         [[2023-09-01 08:20:00+00:00, Red Hat 8.1]]   
2         [[2023-09-01 08:05:00+00:00, Red Hat 8.1]]   
3        [[2023-09-01 08:15:00+00:00, Ubuntu 20.04]]   
4  [[2023-09-01 08:10:00+00:00, Windows Server 20...   

                        uptime_days  \
0  [[2023-09-01 08:00:00+00:00, 120]]   
1   [[2023-09-01 08:20:00+00:00, 30]]   
2   [[2023-09-01 08:05:00+00:00, 45]]   
3   [[2023-09-01 08:15:00+00:00, 60]]   
4   [[2023-09-01 08:10:00+00:00, 90]]   

                                    update_history  
0  [2023-09-01 08:00:00+00:00, 2023-09-01 08:05:0...  
1  [2023-09-01 08:15:00+00:00, 2023-09-01 08:20:0...  
2  [2023-09-01 08:00:00+00:00, 2023-09-01 08:05:0...  
3  [2023-09-01 08:10:00+00:00, 2023-09-01 08:15:0...  
4  [2023-09-01 08:05:00+00:00, 2023-09-01 08:10:0...
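
If you keep the default epoch-millisecond export, the update_history lists can still be converted to datetimes afterwards with plain pandas rather than re-exporting. A minimal sketch on a stand-in frame (the column names follow the node output above, the two rows are illustrative values taken from it):

```python
import pandas as pd

# Stand-in for the default (epoch-millisecond) node export; the column
# names mirror the output above, the values are illustrative.
df = pd.DataFrame(
    {
        "name": ["ServerA", "ServerE"],
        "update_history": [
            [1693555200000, 1693555500000],
            [1693556100000, 1693556400000],
        ],
    }
)

# Convert each epoch-millisecond list into timezone-aware datetimes.
df["update_history"] = df["update_history"].apply(
    lambda ts: [pd.to_datetime(t, unit="ms", utc=True) for t in ts]
)

print(df["update_history"][0][0])  # first update of ServerA as a Timestamp
```

This is equivalent in effect to passing convert_datetime=True at export time, but lets you keep a single epoch-based export and convert only where needed.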

Edge Dataframe#

Exporting to an edge dataframe via to_df() works largely the same as for the nodes. By default, however, it exports the property history for each edge, split by edge layer. This is because to_df() has an alternative flag to explode the edges and view each update individually (in which case the include_property_history flag is ignored).

In the example below we first create a subgraph of the monkey interactions, selecting ANGELE and FELIPE as the monkeys we are interested in. This isn't a required step, but it helps to demonstrate the export of GraphViews.

Then we call to_df() on the subgraph's edges without setting any flags. In the output you can see the property history for each interaction type (layer) between ANGELE and FELIPE.

Finally, we restrict the view to a single layer via layer() and call to_df() again, showing only the interactions within that layer.

Info

We have further reduced the graph to only one layer (Grunting-Lipsmacking) to reduce the output size.

from raphtory import Graph
import pandas as pd

monkey_edges_df = pd.read_csv(
    "../data/OBS_data.txt", sep="\t", header=0, usecols=[0, 1, 2, 3, 4], parse_dates=[0]
)
monkey_edges_df["DateTime"] = pd.to_datetime(monkey_edges_df["DateTime"])
monkey_edges_df.dropna(axis=0, inplace=True)
monkey_edges_df["Weight"] = monkey_edges_df["Category"].apply(
    lambda c: 1 if (c == "Affiliative") else (-1 if (c == "Agonistic") else 0)
)

monkey_graph = Graph()
monkey_graph.load_edges_from_pandas(
    df=monkey_edges_df,
    src="Actor",
    dst="Recipient",
    time="DateTime",
    layer_col="Behavior",
    properties=["Weight"],
)

subgraph = monkey_graph.subgraph(["ANGELE", "FELIPE"])
df = subgraph.edges.to_df()
print("Interactions between Angele and Felipe:")
print(f"{df}\n")

grunting_graph = subgraph.layer("Grunting-Lipsmacking")
print(grunting_graph)
print(grunting_graph.edges)
df = grunting_graph.edges.to_df()
print("Exploding the grunting-Lipsmacking layer")
print(df)

Output

Interactions between Angele and Felipe:
    src     dst                 layer  \
0   ANGELE  FELIPE               Resting   
1   ANGELE  FELIPE            Presenting   
2   ANGELE  FELIPE  Grunting-Lipsmacking   
3   ANGELE  FELIPE              Grooming   
4   ANGELE  FELIPE            Copulating   
5   ANGELE  FELIPE            Submission   
6   FELIPE  ANGELE               Resting   
7   FELIPE  ANGELE            Presenting   
8   FELIPE  ANGELE              Touching   
9   FELIPE  ANGELE  Grunting-Lipsmacking   
10  FELIPE  ANGELE               Chasing   
11  FELIPE  ANGELE              Mounting   
12  FELIPE  ANGELE            Submission   
13  FELIPE  ANGELE             Embracing   
14  FELIPE  ANGELE           Supplanting   

                                            Weight  \
0   [[1560422580000, 1], [1560441780000, 1], [1560...   
1                                [[1560855660000, 1]]   
2   [[1560526320000, 1], [1560855660000, 1], [1561...   
3   [[1560419400000, 1], [1560419400000, 1], [1560...   
4                                [[1561720320000, 0]]   
5                               [[1562253540000, -1]]   
6   [[1560419460000, 1], [1560419520000, 1], [1560...   
7                                [[1562321580000, 1]]   
8   [[1560526260000, 1], [1562253540000, 1], [1562...   
9   [[1560526320000, 1], [1561972860000, 1], [1562...   
10         [[1562057520000, -1], [1562671200000, -1]]   
11                               [[1562253540000, 1]]   
12                              [[1562057520000, -1]]   
13                               [[1560526320000, 1]]   
14                              [[1561110180000, -1]]   

                                    update_history  
0   [1560422580000, 1560441780000, 1560441780000, ...  
1                                     [1560855660000]  
2       [1560526320000, 1560855660000, 1561042620000]  
3   [1560419400000, 1560419400000, 1560419460000, ...  
4                                     [1561720320000]  
5                                     [1562253540000]  
6   [1560419460000, 1560419520000, 1560419580000, ...  
7                                     [1562321580000]  
8       [1560526260000, 1562253540000, 1562321580000]  
9       [1560526320000, 1561972860000, 1562253540000]  
10                     [1562057520000, 1562671200000]  
11                                    [1562253540000]  
12                                    [1562057520000]  
13                                    [1560526320000]  
14                                    [1561110180000]  

Graph(number_of_nodes=2, number_of_edges=2, number_of_temporal_edges=6, earliest_time=1560526320000, latest_time=1562253540000)
Edges(Edge(source=ANGELE, target=FELIPE, earliest_time=1560526320000, latest_time=1561042620000, properties={Weight: 1}, layer(s)=[Grunting-Lipsmacking]), Edge(source=FELIPE, target=ANGELE, earliest_time=1560526320000, latest_time=1562253540000, properties={Weight: 1}, layer(s)=[Grunting-Lipsmacking]))
Exploding the grunting-Lipsmacking layer
    src     dst                 layer  \
0  ANGELE  FELIPE  Grunting-Lipsmacking   
1  FELIPE  ANGELE  Grunting-Lipsmacking   

                                            Weight  \
0  [[1560526320000, 1], [1560855660000, 1], [1561...   
1  [[1560526320000, 1], [1561972860000, 1], [1562...   

                                update_history  
0  [1560526320000, 1560855660000, 1561042620000]  
1  [1560526320000, 1561972860000, 1562253540000]
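
A per-update view similar to the explode flag mentioned above can also be produced after export with plain pandas, using DataFrame.explode on a history column. A sketch on a stand-in frame mirroring the shape of the single-layer output above (the third Weight value in each history is truncated in the printed output, so the values here are illustrative, not Raphtory API output):

```python
import pandas as pd

# Stand-in for the layered edge export above: one row per edge layer,
# with a [time, value] history held in the Weight column.
df = pd.DataFrame(
    {
        "src": ["ANGELE", "FELIPE"],
        "dst": ["FELIPE", "ANGELE"],
        "layer": ["Grunting-Lipsmacking"] * 2,
        "Weight": [
            [[1560526320000, 1], [1560855660000, 1], [1561042620000, 1]],
            [[1560526320000, 1], [1561972860000, 1], [1562253540000, 1]],
        ],
    }
)

# One row per individual update: explode the history column, then split
# the [time, value] pairs into their own columns.
exploded = df.explode("Weight", ignore_index=True)
exploded[["time", "Weight"]] = pd.DataFrame(
    exploded["Weight"].tolist(), index=exploded.index
)

print(exploded)
```

Each of the two edges expands into three rows, one per update, with the epoch time and weight of that update in separate columns.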