Saving and loading graphs

The fastest way to ingest a graph is to load one from Raphtory's on-disk format using the load_from_file() function on the graph.

Once a graph has been created, whether by direct updates or by ingesting a dataframe, you can save it with the save_to_file() or save_to_zip() functions. You then do not need to re-parse the source data every time you run a Raphtory script, which is particularly useful for large datasets.

You can also [pickle](https://docs.python.org/3/library/pickle.html) Raphtory graphs, which uses these functions under the hood.

In the example below we ingest the edge dataframe from the last section, save the resulting graph, and reload it into a second graph. Both graphs are then printed to show that they contain the same data.

from raphtory import Graph
from pathlib import Path
import pandas as pd
from tempfile import TemporaryDirectory

edges_df = pd.read_csv("../data/network_traffic_edges.csv")
edges_df["timestamp"] = pd.to_datetime(edges_df["timestamp"])

g = Graph()
g.load_edges_from_pandas(
    df=edges_df,
    time="timestamp",
    src="source",
    dst="destination",
    properties=["data_size_MB"],
    layer_col="transaction_type",
)

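# Save into a temporary directory, which is removed when save_loc is cleaned up
save_loc = TemporaryDirectory(dir="..")
graph_path = Path(save_loc.name) / "saved_graph"
g.save_to_file(graph_path)

# Reload the saved graph into a second Graph object and compare the summaries
loaded_graph = Graph.load_from_file(graph_path)
print(g)
print(loaded_graph)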

Output

Graph(number_of_nodes=5, number_of_edges=7, number_of_temporal_edges=7, earliest_time=1693555200000, latest_time=1693557000000)
Graph(number_of_nodes=5, number_of_edges=7, number_of_temporal_edges=7, earliest_time=1693555200000, latest_time=1693557000000)
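
To avoid re-parsing the source data on every run, as described at the top of this page, one option is a small "load if saved, otherwise build and save" helper. The sketch below is not part of Raphtory: load_or_build and its build, save and load parameters are hypothetical names chosen for illustration, and the helper is written generically so that it only assumes the saved graph appears at the given path.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    path: Path,
    build: Callable[[], T],
    save: Callable[[T, Path], None],
    load: Callable[[Path], T],
) -> T:
    """Return the object saved at `path`, building and saving it first if absent.

    Hypothetical helper (not a Raphtory function): the caller supplies how to
    build, save and load the object.
    """
    if path.exists():
        # Fast path: reuse the saved copy instead of re-parsing the source data.
        return load(path)
    obj = build()    # slow path, e.g. ingesting a dataframe into a new graph
    save(obj, path)  # persist so that the next run takes the fast path
    return obj
```

With Raphtory, a call might look like load_or_build(graph_path, build=build_graph, save=lambda g, p: g.save_to_file(p), load=Graph.load_from_file), where build_graph is a hypothetical function wrapping the dataframe ingestion shown above. Note that the helper does not detect changes to the source data, so the saved graph must be deleted by hand when the CSV is updated.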