Under the hood, any data that you can represent in a graph, you can also represent as a matrix of values. Accordingly,
networkx has a ton of great tools for translating between graph thinking and your typical Data Science libraries, such as numpy and pandas.
To demonstrate this, we'll load the canonical dataset representing a group of kids in a karate cohort, which records whether each pair interacted outside of class at all.
import networkx as nx

G = nx.karate_club_graph()

# compute the layout once so that later plots reuse the same node positions
layout = nx.spring_layout(G)
nx.draw(G, pos=layout)
As you can see, there are 34 students, and the 78 edges between them represent relationships that emerged between each pair, outside the context of the course.
As I said above, all graph data can be boiled down to tabular data. In this case, we can represent this network as an adjacency matrix, where we have n rows and n columns (where n is the number of nodes in our network). The matrix is read by looking at the intersection of row i and column j: if the value is 1, there's an edge between nodes i and j in the network; otherwise it's zero.
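To make the row and column reading concrete, here is a minimal sketch that fills in an adjacency matrix by hand. The three-node toy graph and the names `edges` and `adj` are illustrative assumptions, not part of the karate data.

```python
import numpy as np

# Hypothetical toy network (not the karate data): 0 - 1 - 2
edges = [(0, 1), (1, 2)]
n = 3  # number of nodes

# Start with no edges, then mark each connected pair with a 1
adj = np.zeros((n, n))
for u, v in edges:
    adj[u, v] = 1  # row u, column v
    adj[v, u] = 1  # undirected, so the mirrored cell is set as well

print(adj)
```

Nodes 0 and 2 are not connected, so the cells at (0, 2) and (2, 0) stay at zero.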
Getting to this point is a one-liner in networkx.
# weight=None ignores any edge weights, so every entry is either 0 or 1
mat = nx.to_numpy_array(G, weight=None)
print(mat)
[[0. 1. 1. ... 1. 0. 0.]
 [1. 0. 1. ... 0. 0. 0.]
 [1. 1. 0. ... 0. 1. 0.]
 ...
 [1. 0. 0. ... 0. 1. 1.]
 [0. 0. 1. ... 1. 0. 1.]
 [0. 0. 0. ... 1. 1. 0.]]
As promised, the shape of the matrix is n x n.
And if you inspect the number of 1s in the matrix, you might be surprised to see that it's double the number of edges you had expected.
But if you actually plot out the matrix, it should be clear that the data is symmetric about the main diagonal.
import numpy as np
import seaborn as sns

ax = sns.heatmap(mat)
# overlay a red line along the main diagonal to highlight the symmetry
ax.plot(np.linspace(32, 0), np.linspace(32, 0), 'r');
This is because an edge between nodes n and m has a value of 1 at the point (n, m) as well as (m, n). Therefore, we get the value we might have expected to see by dividing by 2.
mat.sum() / 2
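The symmetry argument can also be checked numerically rather than by eye. The sketch below rebuilds the matrix so it runs on its own; `is_symmetric` and `edge_count` are names introduced here for illustration, and `weight=None` is passed on the assumption that any edge weights should be ignored.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
mat = nx.to_numpy_array(G, weight=None)  # 0/1 entries only

# A symmetric matrix is equal to its own transpose
is_symmetric = np.array_equal(mat, mat.T)

# Every edge is counted twice, once per direction, so halve the total
edge_count = int(mat.sum() / 2)

print(mat.shape, is_symmetric, edge_count, G.number_of_edges())
```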
Similarly, we can work backwards from an adjacency matrix to a graph with another one-liner.
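A minimal sketch of that reverse step, assuming `nx.from_numpy_array` as the one-liner in question. The name `H` and the edge comparison are illustrative additions.

```python
import networkx as nx

G = nx.karate_club_graph()
mat = nx.to_numpy_array(G, weight=None)

# Rebuild a graph from the raw matrix; nodes are labeled 0..n-1 by position
H = nx.from_numpy_array(mat)

# Compare edges as unordered pairs, since (u, v) and (v, u) are the same edge
same_edges = {frozenset(e) for e in H.edges()} == {frozenset(e) for e in G.edges()}
print(H.number_of_nodes(), H.number_of_edges(), same_edges)
```

Note that the matrix carries no node labels, so this round trip only recovers the original names because the karate nodes are already numbered from 0.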
pandas does a lot of the same work that numpy does, but with greater context and less emphasis on raw numeric compute.
Building an adjacency matrix with networkx is no different. Here, we get the same underlying values that we did when we piped our data into a numpy format, but with the added context of our node labels as row and column indices.
# weight=None ignores any edge weights, so every cell is either 0 or 1
df = nx.to_pandas_adjacency(G, weight=None)
df.head()
5 rows × 34 columns
Same as before, we can construct new graph objects from an adjacency DataFrame, no problem.
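One way to do this, sketched under the assumption that `nx.from_pandas_adjacency` is the intended call. The name `G_from_df` is introduced here for illustration.

```python
import networkx as nx

G = nx.karate_club_graph()
df = nx.to_pandas_adjacency(G, weight=None)

# The row and column labels of the DataFrame become the node labels
G_from_df = nx.from_pandas_adjacency(df)

same_edges = {frozenset(e) for e in G_from_df.edges()} == {frozenset(e) for e in G.edges()}
print(G_from_df.number_of_nodes(), G_from_df.number_of_edges(), same_edges)
```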
One interesting wrinkle to this, however, comes when we decide to transform our network data into a tall, sparse DataFrame representation.
Here, we can build a DataFrame that represents all (from, to) edge pairs in our data, and omits the rest.
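A sketch of one way to build that tall representation, assuming `nx.to_pandas_edgelist` is the tool for the job. The column names `from` and `to` are chosen here to match the pairs described above, and the column selection drops any edge attributes the graph may carry.

```python
import networkx as nx

G = nx.karate_club_graph()

# One row per edge; keep only the two endpoint columns
edge_df = nx.to_pandas_edgelist(G, source="from", target="to")[["from", "to"]]

print(edge_df.shape)
```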
78 rows × 2 columns
And in reverse, suppose we have a DataFrame organized in a similar fashion:
import pandas as pd

# collect a (row, col) pair for every nonzero cell of the adjacency DataFrame
connected_pairs = []
for col in df.columns:
    for row, val in enumerate(df[col]):
        if val != 0:
            connected_pairs.append((row, col))

pair_df = pd.DataFrame(connected_pairs, columns=['from', 'to'])
pair_df.head()
We can specify which columns represent the source and target node labels, with which to draw the new edges.
F = nx.from_pandas_edgelist(pair_df, source='from', target='to')
nx.draw(F, pos=layout)
Giving us the same Graph representation that we started with.
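As a sanity check on that claim, here is a self-contained sketch of the same round trip. It swaps the explicit loop for `np.nonzero`, which is a substitution made here for brevity and not the approach used above. Each edge appears twice in the pair table, once per direction, and the undirected graph collapses the duplicates.

```python
import networkx as nx
import numpy as np
import pandas as pd

G = nx.karate_club_graph()
df = nx.to_pandas_adjacency(G, weight=None)

# Positions of every nonzero cell, mapped back to row and column labels
rows, cols = np.nonzero(df.to_numpy())
pair_df = pd.DataFrame({"from": df.index[rows], "to": df.columns[cols]})

F = nx.from_pandas_edgelist(pair_df, source="from", target="to")

same_edges = {frozenset(e) for e in F.edges()} == {frozenset(e) for e in G.edges()}
print(len(pair_df), F.number_of_edges(), same_edges)
```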