Basic Terminology and Inspection

Node Inspection

For this, we’ll throw together a bare-bones network representing a straight-line, directed path from 1 to 4, peppering in weights along the edges.

%pylab inline
import networkx as nx

F = nx.DiGraph()
F.add_weighted_edges_from([(1, 2, .3), (2, 3, .4), (3, 4, .5)])
Populating the interactive namespace from numpy and matplotlib

Simple stuff, yeah?

nx.draw_networkx(F)

png

Here, we built F with simple integers, but we could have just as easily used str or any other hashable type.

F.nodes
NodeView((1, 2, 3, 4))

Now, if we want to drill in to any particular node, we can simply subscript the Graph object by it’s node label

F[2]
AtlasView({3: {'weight': 0.4}})

As you can see, this only gives us a partial picture– namely the outbound edge(s) and the relevant edge attributes.

If instead, we wanted to see everything– incoming and outgoing– we’d use the nx.all_neighbors() method

list(nx.all_neighbors(F, 2))
[1, 3]

However you’ll notice that we lost the convenience of getting the edge attribute, automagically

A Less-Simple Dataset

In order to dive into more macro measures, we’ll import the canonical Graph Dataset that analyzes the social network of a karate class.

G = nx.karate_club_graph()
type(G)
networkx.classes.graph.Graph

Inspecting the type of G, we can see that it’s a Graph (as opposed to a directional graph) and therefore the edge that connects two nodes doesn’t have any inherent direction associated to it.

pos = nx.spring_layout(G)

nx.draw_networkx(G, pos=pos)

png

Clustering

Quoting the book, “Some social theories consider triads essential units of social network analysis.” So what does that mean?

Well, I actually found this to be a little counter-intuitive.

For starters, let’s look at a random node in our network, 15. It’s kinda hanging out on its own, only sharing a connection with two other nodes

pos = nx.circular_layout(G)

fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos, ax=ax, node_color='gray')
nx.draw_networkx_nodes(G, pos=pos, nodelist=[15], node_color='gold');
nx.draw_networkx_nodes(G, pos=pos, nodelist=G[15], node_color='lightblue');

png

32 and 33

G[15]
AtlasView({32: {}, 33: {}})

Critically, however, these two nodes are also connected to one another

33 in G[32]
True
32 in G[33]
True

And so in this un-directed graph, you could potentially cycle through the triangle 15-32-33-15... forever.

On the other hand, consider node 9. It also only connects to two nodes

G[9]
AtlasView({2: {}, 33: {}})
pos = nx.circular_layout(G)

fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos, ax=ax, node_color='gray')
nx.draw_networkx_nodes(G, pos=pos, nodelist=[9], node_color='gold');
nx.draw_networkx_nodes(G, pos=pos, nodelist=G[9], node_color='lightblue');

png

But as you might be able to see from the picture, its neighbors aren’t connected.

33 in G[2]
False
2 in G[33]
False

So what does this mean?

Well, in the context of (social) networks, a node’s clustering coefficient is a measure of “what percent of all possible triangles exist between a node and all possible pairs of adjacent nodes?”

That’s a mouthful. Let’s try and unpack a little. We’ll do this by looking more closely at a more middle-of-the-road node. Here, I’ve gotten each node’s clustering coefficient and I spy that node 10 will probably give us what we’re after.

print(nx.clustering(G))
{0: 0.15, 1: 0.3333333333333333, 2: 0.24444444444444444, 3: 0.6666666666666666, 4: 0.6666666666666666, 5: 0.5, 6: 0.5, 7: 1.0, 8: 0.5, 9: 0, 10: 0.6666666666666666, 11: 0, 12: 1.0, 13: 0.6, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 1.0, 21: 1.0, 22: 1.0, 23: 0.4, 24: 0.3333333333333333, 25: 0.3333333333333333, 26: 1.0, 27: 0.16666666666666666, 28: 0.3333333333333333, 29: 0.6666666666666666, 30: 0.5, 31: 0.2, 32: 0.19696969696969696, 33: 0.11029411764705882}

It’s got three neighbors

G[10]
AtlasView({0: {}, 4: {}, 5: {}})
pos = nx.circular_layout(G)

fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos, ax=ax, node_color='gray')
nx.draw_networkx_nodes(G, pos=pos, nodelist=[10], node_color='gold');
nx.draw_networkx_nodes(G, pos=pos, nodelist=G[10], node_color='lightblue');

png

Which makes for a possibility of 3-choose-2 possible triangle pairs:

0, 4
0, 5
4, 5

Examining each pair for connectivity, we get 23 of the neighboring connections, and therefore 23 of the possible triangles realized.

0 in G[4]
True
0 in G[5]
True
4 in G[5]
False

Therefore, node 10 has a connectivity coefficient of .6666

Transitivity

Taken as a whole, the transitivity of a network, gives you a one-shot look at “number of triangles / number of all possible triangles”

nx.transitivity(G)
0.2556818181818182

Density

Another, similar measure of “how interconnected is my network” is its density.

To work this out we’ll start with the fact that there are 34 nodes in our Graph

len(G.nodes)
34

and therefore, are 561 possible pairs of nodes (assuming (m, n) == (n, m) in an undirected graph).

34*33/2
561.0

Looking at our data, we can see there are 78 lines connecting pairs of data, edges

num_edges = len(nx.edges(G))
num_edges
78

And therefore 483 non_edges between all of the other pairs that don’t share an edge

num_non_edges = len(list(nx.non_edges(G)))
num_non_edges
483

Which is to say that all possible pairs in our graph are accounted for, between edges and non_edges

561 == (num_edges + num_non_edges)
True

Therefore, we can define the density of our Graph as “the ratio of connected pairs to all possible pairs in the network”.

The Karate Graph has a density of approximately .14, which we can calculate manually

78/(78+483)
0.13903743315508021

or use the function provided by networkx

nx.density(G)
0.13903743315508021

Note: When working with networks where the direction does matter, this density measure will be half of what we calculated in the undirected case– because each pair could also have an edge that connects in the opposite direction

The Difference

Two measures (transitivity and density) follow the same form of “number observed shapes / number of possible shapes”, but truth be told, I don’t have a strong grasp on when one is a more appropriate measure than the other for evaluating a network as a whole.

Paths

I’m sure the mechanics of this will get worked out at length in another notebook, but for now, I wanted to get a few core definitions down.

Before we do, though, I want to point out that for this particular network, it’s possible to ping-pong from node to node such that no matter where you start, you can always reach the terminal node of interest. Or to start in on the terminology, we can say that there exists a walk– a sequence of edges– that connect any two points in our graph.

for node_1 in G:
    for node_2 in G:
        assert nx.has_path(G, node_1, node_2)
else:
    print('No errors')
No errors
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos)

png

Tightening that definition, a trail is a walk that never visits the same edge twice. Its length is guaranteed to be less than or equal to any other valid walk between two points, because if there was a revisited edge, we would just take whatever the final step off of that node would be, instead.

A cycle is a trail that begins and ends at the same place. Our earlier discussion of “completed triangles” is a great example of a cycle.

Tightening even further, a path is a trail that never visists the same node twice.

And finally, a geodesic is a two-dollar word meaning “shortest simple path.” It has no redundant steps, and gets from point A to point B in the minimum number of steps. Here, we can see that there’s a path that exists, connecting nodes 8 and 24.

Moreover, we can use the built-in function shortest_path to determine what it is.

shortest_path = nx.shortest_path(G, 8, 24)
shortest_path
[8, 2, 27, 24]
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos)
nx.draw_networkx_nodes(G, pos=pos, nodelist=shortest_path, node_color='gold');

png

Graphs as Circles

This part still throws me for a bit of a loop, but the book also concedes that it’s a bit tricky to grok, so I’ll try and be careful.

Graph Networks borrow a lot of their descriptors from geometric terms used to describe circles. But before we get into that I want to define node eccentricity as a measure of how far from the normal/center node a point is. Note that prefix– ecc, interchangeable with the Latin ex, meaning “out”, so taken literally, this means “measure of how far out from the center” a given node is.

In the graph context, this is expressed as the max distance from a node to all other nodes in the network, like so

print(nx.eccentricity(G))
{0: 3, 1: 3, 2: 3, 3: 3, 4: 4, 5: 4, 6: 4, 7: 4, 8: 3, 9: 4, 10: 4, 11: 4, 12: 4, 13: 3, 14: 5, 15: 5, 16: 5, 17: 4, 18: 5, 19: 3, 20: 5, 21: 4, 22: 5, 23: 5, 24: 4, 25: 4, 26: 5, 27: 4, 28: 4, 29: 5, 30: 4, 31: 3, 32: 4, 33: 4}

Which should track.

If you’d describe someone as “eccentric” you’d likely also use the phrase “out there” which has a more literal interpretation here than I might have expected!

Where it gets confusing is borrowing the circle terminology.

diameter is the maximum eccentricity for a graph

max_ecc = max(nx.eccentricity(G).items(), key=lambda x:x[1])[1]

nx.diameter(G) == max_ecc
True

radius is the minimum eccentricity of a Graph

Note: and has NO proportional relationship to the diameter whatsover

min_ecc = min(nx.eccentricity(G).items(), key=lambda x:x[1])[1]

nx.radius(G) == min_ecc
True

Using these terms, we’d say that the center of a graph, is a set of all nodes whose eccentricity is equal to the radius

center_nodes = nx.center(G)
center_nodes
[0, 1, 2, 3, 8, 13, 19, 31]
[nx.eccentricity(G, node) for node in center_nodes]
[3, 3, 3, 3, 3, 3, 3, 3]

Similarly, the periphery represents all nodes whose eccentricity is equal to the diameter

periphery_nodes = nx.periphery(G)
periphery_nodes
[14, 15, 16, 18, 20, 22, 23, 26, 29]
[nx.eccentricity(G, node) for node in periphery_nodes]
[5, 5, 5, 5, 5, 5, 5, 5, 5]