Networkx vs Numpy/Pandas

Under the hood, any data that you can represent as a graph can also be represented as a matrix of values. Accordingly, networkx has a ton of great tools for translating between graph thinking and your typical Data Science numpy/pandas fare.

Simple Data

To demonstrate this, we’ll load the canonical karate club dataset, which represents a group of students in a karate club and records whether each pair interacted outside of class.

import networkx as nx

G = nx.karate_club_graph()

# compute node positions once so every later plot uses the same layout
layout = nx.spring_layout(G)
nx.draw(G, pos=layout)


As you can see, there are 34 students, and the 78 edges between them represent relationships that emerged between each pair, outside the context of the course.

display(G.number_of_edges())
display(G.number_of_nodes())
78
34

Numpy

As I said above, all graph data can be boiled down to tabular data. In this case, we can represent this network as an adjacency matrix with n rows and n columns, where n is the number of nodes in our network. The matrix is read by looking at the intersection of row i and column j: if the value is 1, there’s an edge between nodes i and j; otherwise the value is 0.
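To make the row/column reading concrete, here is a minimal sketch that builds an adjacency matrix by hand with numpy. The four-node graph and its edge list are made up purely for illustration; they are not part of the karate data.

```python
import numpy as np

# hypothetical toy graph: 4 nodes labeled 0..3 and 3 undirected edges
edges = [(0, 1), (0, 2), (2, 3)]
n_nodes = 4

adj = np.zeros((n_nodes, n_nodes), dtype=int)
for i, j in edges:
    # an undirected edge is recorded at both (i, j) and (j, i)
    adj[i, j] = 1
    adj[j, i] = 1

print(adj)
```

Row 0 ends up with 1s in columns 1 and 2, which are exactly the neighbors of node 0.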

To Matrix

Getting to this point is a one-liner in networkx

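# to_numpy_matrix was removed in networkx 3.0; to_numpy_array is its replacement.
# weight=None records every edge as 1, ignoring any edge weights.
mat = nx.to_numpy_array(G, weight=None)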
print(mat)
[[0. 1. 1. ... 1. 0. 0.]
 [1. 0. 1. ... 0. 0. 0.]
 [1. 1. 0. ... 0. 1. 0.]
 ...
 [1. 0. 0. ... 0. 1. 1.]
 [0. 0. 1. ... 1. 0. 1.]
 [0. 0. 0. ... 1. 1. 0.]]

As promised, the shape of the matrix is n x n

mat.shape
(34, 34)

And if you inspect the number of 1s in the matrix, you might be surprised to see that it’s double what you had expected.

mat.sum()
156.0

But if you actually plot out the matrix, it should be clear that the data is symmetric about the main diagonal.

import numpy as np
import seaborn as sns

ax = sns.heatmap(mat)
# overlay the main diagonal in red to highlight the symmetry
n = mat.shape[0]
ax.plot([0, n], [0, n], 'r');


This is because an edge between nodes n and m puts a 1 at the point (n, m) as well as at (m, n). Every edge is counted twice, so we get the value we expected by dividing by 2.

mat.sum() / 2
78.0
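The double counting can be checked on a graph small enough to verify by hand. The sketch below uses a five-node cycle (my choice of example, not part of the karate data) and assumes a networkx version that provides to_numpy_array.

```python
import networkx as nx
import numpy as np

# illustrative graph: a 5-node cycle has exactly 5 edges
C = nx.cycle_graph(5)
A = nx.to_numpy_array(C)

# an undirected graph's adjacency matrix equals its own transpose
is_symmetric = bool((A == A.T).all())

# every edge appears twice in the full matrix...
total_ones = A.sum()
# ...but only once in the upper triangle
upper_ones = np.triu(A).sum()

# each row sum is the degree of the corresponding node
row_sums = A.sum(axis=1)
degrees = [C.degree(node) for node in C.nodes()]

print(is_symmetric, total_ones, upper_ones)
print(row_sums, degrees)
```

Counting only the upper triangle is an alternative to dividing the full sum by 2, and the row sums give each node's degree for free.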

From Matrix

Similarly, we can work backwards from an adjacency matrix to a graph with another one-liner.

# from_numpy_matrix was removed in networkx 3.0; from_numpy_array is its replacement
nx.draw(nx.from_numpy_array(mat), pos=layout)

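One way to confirm that nothing is lost in the round trip is to compare edge sets before and after. This is a sketch on a small path graph chosen for illustration; it assumes integer node labels 0..n-1, since from_numpy_array labels nodes by row position. The edge_set helper is a name introduced here, not a networkx function.

```python
import networkx as nx

# illustrative graph: a path 0-1-2-3
P = nx.path_graph(4)

A = nx.to_numpy_array(P)
P_back = nx.from_numpy_array(A)

def edge_set(graph):
    # store each undirected edge as a frozenset so (u, v) and (v, u) compare equal
    return {frozenset(edge) for edge in graph.edges()}

same_edges = edge_set(P) == edge_set(P_back)
print(same_edges)
```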

Pandas

In general, pandas does a lot of the same work that numpy does, but with greater context, and less emphasis on raw, numeric compute.

To Adjacency

Adjacency matrices in networkx are no different. Here, we get the same underlying values that we did when we piped our data into a numpy format, but with the added context of our node labels as row and column indices.

df = nx.to_pandas_adjacency(G)
df.head()
     0    1    2    3    4    5    6    7    8    9  ...   24   25   26   27   28   29   30   31   32   33
0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0
1  1.0  0.0  1.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
2  1.0  1.0  0.0  1.0  0.0  0.0  0.0  1.0  1.0  1.0  ...  0.0  0.0  0.0  1.0  1.0  0.0  0.0  0.0  1.0  0.0
3  1.0  1.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4  1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

5 rows × 34 columns
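The label context matters most when nodes are not just integers. As a sketch, with made-up names standing in for node labels, the adjacency DataFrame can be indexed by label directly:

```python
import networkx as nx

# hypothetical graph with string labels
friends = nx.Graph()
friends.add_edges_from([("ann", "bob"), ("bob", "cat")])

adj_df = nx.to_pandas_adjacency(friends)

# rows and columns carry the node labels, so lookups read naturally
ann_bob = adj_df.loc["ann", "bob"]
ann_cat = adj_df.loc["ann", "cat"]
print(adj_df)
```

With a bare numpy matrix you would have to keep track of which row position belongs to which node yourself.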

From Adjacency

Same as before, we can construct new graph objects from an adjacency DataFrame, no problem.

nx.draw(nx.from_pandas_adjacency(df), pos=layout)


To Edgelist

One interesting wrinkle to this, however, comes when we decide to transform our network data into a tall, sparse DataFrame representation.

Here, we can build a DataFrame that represents all (from, to) edge pairs in our data, and omits the rest.

nx.to_pandas_edgelist(G)
     source  target
0         0       1
1         0       2
2         0       3
3         0       4
4         0       5
...     ...     ...
73       30      32
74       30      33
75       31      32
76       31      33
77       32      33

78 rows × 2 columns
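The edge list also has room for edge attributes, which show up as extra columns next to source and target. The sketch below uses a made-up weight attribute on a toy graph to show this.

```python
import networkx as nx

# hypothetical graph whose edges carry a "weight" attribute
W = nx.Graph()
W.add_edge("a", "b", weight=2.0)
W.add_edge("b", "c", weight=0.5)

edge_df = nx.to_pandas_edgelist(W)
print(edge_df)
```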

From Edgelist

And in reverse, suppose we have a DataFrame organized in a similar fashion:

import pandas as pd

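connected_pairs = []

# walk every cell of the adjacency DataFrame and keep the (row, column)
# labels wherever an edge is recorded
for col in df.columns:
    for row, val in df[col].items():
        if val != 0:
            connected_pairs.append((row, col))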

pair_df = pd.DataFrame(connected_pairs, columns=['from', 'to'])
pair_df.head()
   from  to
0     1   0
1     2   0
2     3   0
3     4   0
4     5   0

We can specify which columns represent the source and target node labels, with which to draw the new edges.

F = nx.from_pandas_edgelist(pair_df, source='from', target='to')
nx.draw(F, pos=layout)


This gives us the same graph representation that we started with.
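It is worth noting that data shaped like pair_df lists every edge in both directions, yet the rebuilt graph is not doubled. A minimal sketch with made-up rows shows an undirected Graph collapsing (u, v) and (v, u) into a single edge:

```python
import networkx as nx
import pandas as pd

# hypothetical edge list in which each connection is listed in both directions
both_ways = pd.DataFrame(
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")],
    columns=["from", "to"],
)

rebuilt = nx.from_pandas_edgelist(both_ways, source="from", target="to")

# 4 rows in, 2 undirected edges out
print(len(both_ways), rebuilt.number_of_edges())
```

If the direction of each row should be preserved instead, passing create_using=nx.DiGraph to from_pandas_edgelist keeps the two directions as separate edges.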