Saving and loading graphs
The fastest way to ingest a graph is to load one from Raphtory's on-disk format using the Graph.load_from_file() function.
Once a graph has been created, either by direct updates or by ingesting a dataframe, you can save it with the save_to_file() or save_to_zip() functions. This means you do not need to parse the source data every time you run a Raphtory script, which is useful for large datasets.
Note: You can also [pickle](https://docs.python.org/3/library/pickle.html) Raphtory graphs, which uses these functions under the hood.
In the example below we ingest the edge dataframe from the last section, save this graph and reload it into a second graph. These are both printed to show they contain the same data.
from raphtory import Graph
from pathlib import Path
import pandas as pd
from tempfile import TemporaryDirectory
edges_df = pd.read_csv("../data/network_traffic_edges.csv")
edges_df["timestamp"] = pd.to_datetime(edges_df["timestamp"])
g = Graph()
g.load_edges_from_pandas(
    df=edges_df,
    time="timestamp",
    src="source",
    dst="destination",
    properties=["data_size_MB"],
    layer_col="transaction_type",
)
save_loc = TemporaryDirectory(dir="..")
graph_path = Path(save_loc.name) / "saved_graph"
g.save_to_file(graph_path)
loaded_graph = Graph.load_from_file(graph_path)
print(g)
print(loaded_graph)
Output
Graph(number_of_nodes=5, number_of_edges=7, number_of_temporal_edges=7, earliest_time=1693555200000, latest_time=1693557000000)
Graph(number_of_nodes=5, number_of_edges=7, number_of_temporal_edges=7, earliest_time=1693555200000, latest_time=1693557000000)
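Building on the example above, a script can check for a saved graph first and only parse the raw data when none exists. The following is a minimal sketch of that pattern, not part of Raphtory itself: the helper `load_or_build` and the `build`, `save` and `load` callables are hypothetical names introduced for illustration, and the demo uses a plain dict with pickle as a stand-in so that it runs without a dataset.

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    path: Path,
    build: Callable[[], T],
    save: Callable[[T, Path], None],
    load: Callable[[Path], T],
) -> T:
    """Return the saved object at `path` if it exists; otherwise build and save it."""
    if path.exists():
        return load(path)
    obj = build()
    save(obj, path)
    return obj


# Stand-in demo (assumption): a dict plays the role of the graph and
# pickle plays the role of the on-disk format.
build_calls = []


def build_edges() -> dict:
    build_calls.append(1)  # record every time the expensive build runs
    return {("a", "b"): 1.0}


def save_pickle(obj: dict, path: Path) -> None:
    path.write_bytes(pickle.dumps(obj))


def load_pickle(path: Path) -> dict:
    return pickle.loads(path.read_bytes())


with TemporaryDirectory() as tmp:
    cache = Path(tmp) / "saved_graph"
    first = load_or_build(cache, build_edges, save_pickle, load_pickle)   # builds and saves
    second = load_or_build(cache, build_edges, save_pickle, load_pickle)  # loads from disk

print(first == second, len(build_calls))  # → True 1
```

With Raphtory, `build` would presumably wrap the `Graph()` and `load_edges_from_pandas()` steps shown earlier, `save` could be `lambda g, p: g.save_to_file(p)`, and `load` could be `Graph.load_from_file`.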