# Networkx vs Numpy/Pandas

Under the hood any data that you can represent in a graph, you can also represent as a matrix of values. Therefore, `networkx`

has a ton of great tools for translating between graph thinking and your typical Data Science `numpy/pandas`

fare.

## Simple Data

To demonstrate this, we’ll load the canonical dataset representing a group of kids in a karate cohort and measuring if they interacted outside of class at all.

```
import networkx as nx
G = nx.karate_club_graph()
# ensure the same position
layout = nx.spring_layout(G)
nx.draw(G, pos=layout)
```

As you can see, there are 34 students, and the 78 edges between them represent relationships that emerged between each pair, outside the context of the course.

```
display(G.number_of_edges())
display(G.number_of_nodes())
```

```
78
34
```

## Numpy

As I said above, all graph data can be boiled down to tabular data. In this case, we can represent this network as an *adjacency matrix*, where we have `n`

rows and `n`

columns (where `n`

is the number of nodes in our network). And is read by looking at the intersection of row `m`

and column `n`

– if the value is `1`

, there’s an edge in the network, otherwise it’s zero.

### To Matrix

Getting to this point is a one-liner in `networkx`

```
mat = nx.to_numpy_matrix(G)
print(mat)
```

```
[[0. 1. 1. ... 1. 0. 0.]
[1. 0. 1. ... 0. 0. 0.]
[1. 1. 0. ... 0. 1. 0.]
...
[1. 0. 0. ... 0. 1. 1.]
[0. 0. 1. ... 1. 0. 1.]
[0. 0. 0. ... 1. 1. 0.]]
```

As promised, the shape of the matrix is `n x n`

`mat.shape`

```
(34, 34)
```

And if you inspect the number of `1`

s in the matrix, you might be surprised to see that it’s double what you had expected.

`mat.sum()`

```
156.0
```

But if you actually plot out the matrix, it should be clear that the data is symmetric down the middle.

```
import numpy as np
import seaborn as sns
ax = sns.heatmap(mat)
ax.plot(np.linspace(32, 0), np.linspace(32, 0), 'r');
```

This is because `n=m`

has a value of `1`

at the point `(n, m)`

AS WELL AS `(m, n)`

. Therefore, we get the value we might have expected to see by dividing by 2.

`mat.sum() / 2`

```
78.0
```

### From Matrix

Similarly, we can work backwards from an adjacency matrix to a graph with another one-liner.

`nx.draw(nx.from_numpy_matrix(mat), pos=layout)`

## Pandas

In general, `pandas`

does a lot of the same work that `numpy`

does, but with greater context, and less emphasis on raw, numeric compute.

### To Adjacency

Adjacency matrix and `networkx`

is no different. Here, we get the same underlying values that we did when we piped our data into a `numpy`

format, but with the added context of our node labels as row and column indicies.

```
df = nx.to_pandas_adjacency(G)
df.head()
```

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |

1 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |

2 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |

3 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

4 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

5 rows × 34 columns

### From Adjacency

Same as before, we can construct new graph objects from an adjacency DataFrame, no problem.

`nx.draw(nx.from_pandas_adjacency(df), pos=layout)`

### To Edgelist

One interesting wrinkle to this, however, comes when we decide to transform our network data into a tall, *sparse* DataFrame representation.

Here, we can build a DataFrame that represents all `(from, to)`

edge pairs in our data, and omits the rest.

`nx.to_pandas_edgelist(G)`

source | target | |
---|---|---|

0 | 0 | 1 |

1 | 0 | 2 |

2 | 0 | 3 |

3 | 0 | 4 |

4 | 0 | 5 |

... | ... | ... |

73 | 30 | 32 |

74 | 30 | 33 |

75 | 31 | 32 |

76 | 31 | 33 |

77 | 32 | 33 |

78 rows × 2 columns

### From Edgelist

And in reverse, if we have a DataFrame organized in a similar fashion

```
import pandas as pd
connected_pairs = []
for col in df.columns:
for row, val in enumerate(df[col]):
if val == 1:
connected_pairs.append((row, col))
pair_df = pd.DataFrame(connected_pairs, columns=['from', 'to'])
pair_df.head()
```

from | to | |
---|---|---|

0 | 1 | 0 |

1 | 2 | 0 |

2 | 3 | 0 |

3 | 4 | 0 |

4 | 5 | 0 |

We can specify which columns represent the `source`

and `target`

node labels, with which to draw the new edges.

```
F = nx.from_pandas_edgelist(pair_df, source='from', target='to')
nx.draw(F, pos=layout)
```

Giving us the same Graph representation that we started with.