# Basic Terminology and Inspection

## Node Inspection

For this, we’ll throw together a bare-bones network representing a straight-line, *directed* path from `1`

to `4`

, peppering in weights along the edges.

```
%pylab inline
import networkx as nx
F = nx.DiGraph()
F.add_weighted_edges_from([(1, 2, .3), (2, 3, .4), (3, 4, .5)])
```

```
Populating the interactive namespace from numpy and matplotlib
```

Simple stuff, yeah?

`nx.draw_networkx(F)`

Here, we built `F`

with simple integers, but we could have just as easily used `str`

or any other hashable type.

`F.nodes`

```
NodeView((1, 2, 3, 4))
```

Now, if we want to drill in to any particular node, we can simply subscript the Graph object by it’s node label

`F[2]`

```
AtlasView({3: {'weight': 0.4}})
```

As you can see, this only gives us a partial picture– namely the *outbound* edge(s) and the relevant edge attributes.

If instead, we wanted to see everything– incoming and outgoing– we’d use the `nx.all_neighbors()`

method

`list(nx.all_neighbors(F, 2))`

```
[1, 3]
```

However you’ll notice that we lost the convenience of getting the edge attribute, automagically

### A Less-Simple Dataset

In order to dive into more macro measures, we’ll import the canonical Graph Dataset that analyzes the social network of a karate class.

```
G = nx.karate_club_graph()
type(G)
```

```
networkx.classes.graph.Graph
```

Inspecting the type of `G`

, we can see that it’s a `Graph`

(as opposed to a *directional* graph) and therefore the edge that connects two nodes doesn’t have any inherent direction associated to it.

```
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos=pos)
```

### Clustering

Quoting the book, “Some social theories consider triads essential units of social network analysis.” So what does that mean?

Well, I actually found this to be a little counter-intuitive.

For starters, let’s look at a random node in our network, `15`

. It’s kinda hanging out on its own, only sharing a connection with two other nodes

```
pos = nx.circular_layout(G)
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos, ax=ax, node_color='gray')
nx.draw_networkx_nodes(G, pos=pos, nodelist=[15], node_color='gold');
nx.draw_networkx_nodes(G, pos=pos, nodelist=G[15], node_color='lightblue');
```

32 and 33

`G[15]`

```
AtlasView({32: {}, 33: {}})
```

Critically, however, these two nodes are *also* connected to one another

`33 in G[32]`

```
True
```

`32 in G[33]`

```
True
```

And so in this un-directed graph, you could potentially cycle through the *triangle* `15-32-33-15...`

forever.

On the other hand, consider node 9. It *also* only connects to two nodes

`G[9]`

```
AtlasView({2: {}, 33: {}})
```

```
pos = nx.circular_layout(G)
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos, ax=ax, node_color='gray')
nx.draw_networkx_nodes(G, pos=pos, nodelist=[9], node_color='gold');
nx.draw_networkx_nodes(G, pos=pos, nodelist=G[9], node_color='lightblue');
```

But as you might be able to see from the picture, its neighbors *aren’t* connected.

`33 in G[2]`

```
False
```

`2 in G[33]`

```
False
```

So what does this mean?

Well, in the context of (social) networks, a node’s `clustering coefficient`

is a measure of “what percent of all possible triangles exist between a node and all possible pairs of adjacent nodes?”

That’s a mouthful. Let’s try and unpack a little. We’ll do this by looking more closely at a more middle-of-the-road node. Here, I’ve gotten each node’s clustering coefficient and I spy that node `10`

will probably give us what we’re after.

`print(nx.clustering(G))`

```
{0: 0.15, 1: 0.3333333333333333, 2: 0.24444444444444444, 3: 0.6666666666666666, 4: 0.6666666666666666, 5: 0.5, 6: 0.5, 7: 1.0, 8: 0.5, 9: 0, 10: 0.6666666666666666, 11: 0, 12: 1.0, 13: 0.6, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 1.0, 21: 1.0, 22: 1.0, 23: 0.4, 24: 0.3333333333333333, 25: 0.3333333333333333, 26: 1.0, 27: 0.16666666666666666, 28: 0.3333333333333333, 29: 0.6666666666666666, 30: 0.5, 31: 0.2, 32: 0.19696969696969696, 33: 0.11029411764705882}
```

It’s got three neighbors

`G[10]`

```
AtlasView({0: {}, 4: {}, 5: {}})
```

```
pos = nx.circular_layout(G)
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos, ax=ax, node_color='gray')
nx.draw_networkx_nodes(G, pos=pos, nodelist=[10], node_color='gold');
nx.draw_networkx_nodes(G, pos=pos, nodelist=G[10], node_color='lightblue');
```

Which makes for a possibility of `3-choose-2`

possible triangle pairs:

```
0, 4
0, 5
4, 5
```

Examining each pair for connectivity, we get ^{2}⁄_{3} of the neighboring connections, and therefore ^{2}⁄_{3} of the possible triangles realized.

`0 in G[4]`

```
True
```

`0 in G[5]`

```
True
```

`4 in G[5]`

```
False
```

Therefore, node `10`

has a connectivity coefficient of .6666

### Transitivity

Taken as a whole, the `transitivity`

of a network, gives you a one-shot look at “number of triangles / number of all possible triangles”

`nx.transitivity(G)`

```
0.2556818181818182
```

### Density

Another, similar measure of “how interconnected is my network” is its `density`

.

To work this out we’ll start with the fact that there are 34 nodes in our Graph

`len(G.nodes)`

```
34
```

and therefore, are 561 possible pairs of nodes (assuming `(m, n) == (n, m)`

in an undirected graph).

`34*33/2`

```
561.0
```

Looking at our data, we can see there are 78 lines connecting pairs of data, `edges`

```
num_edges = len(nx.edges(G))
num_edges
```

```
78
```

And therefore 483 `non_edges`

between all of the other pairs that *don’t* share an edge

```
num_non_edges = len(list(nx.non_edges(G)))
num_non_edges
```

```
483
```

Which is to say that all possible pairs in our graph are accounted for, between `edges`

and `non_edges`

`561 == (num_edges + num_non_edges)`

```
True
```

Therefore, we can define the `density`

of our Graph as “the ratio of connected pairs to all possible pairs in the network”.

The Karate Graph has a density of approximately .14, which we can calculate manually

`78/(78+483)`

```
0.13903743315508021
```

or use the function provided by `networkx`

`nx.density(G)`

```
0.13903743315508021
```

**Note:** When working with networks where the direction *does* matter, this `density`

measure will be *half* of what we calculated in the undirected case– because each pair could *also* have an edge that connects in the opposite direction

### The Difference

Two measures (transitivity and density) follow the same form of “number observed shapes / number of possible shapes”, but truth be told, I don’t have a strong grasp on when one is a more appropriate measure than the other for evaluating a network as a whole.

## Paths

I’m sure the mechanics of this will get worked out at length in another notebook, but for now, I wanted to get a few core definitions down.

Before we do, though, I want to point out that **for this particular network**, it’s possible to ping-pong from node to node such that no matter where you start, you can always reach the terminal node of interest. Or to start in on the terminology, we can say that there exists a `walk`

– a sequence of edges– that connect any two points in our graph.

```
for node_1 in G:
for node_2 in G:
assert nx.has_path(G, node_1, node_2)
else:
print('No errors')
```

```
No errors
```

```
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos)
```

Tightening that definition, a `trail`

is a `walk`

that never visits the same edge twice. Its length is guaranteed to be less than or equal to any other valid walk between two points, because if there was a revisited edge, we would just take whatever the final step off of that node would be, instead.

A `cycle`

is a `trail`

that begins and ends at the same place. Our earlier discussion of “completed triangles” is a great example of a `cycle`

.

Tightening even further, a `path`

is a `trail`

that never visists the same node twice.

And finally, a `geodesic`

is a two-dollar word meaning “shortest simple path.” It has no redundant steps, and gets from point A to point B in the minimum number of steps. Here, we can see that there’s a path that exists, connecting nodes `8`

and `24`

.

Moreover, we can use the built-in function `shortest_path`

to determine what it is.

```
shortest_path = nx.shortest_path(G, 8, 24)
shortest_path
```

```
[8, 2, 27, 24]
```

```
fig, ax = plt.subplots(figsize=(12, 12))
nx.draw_networkx(G, pos=pos)
nx.draw_networkx_nodes(G, pos=pos, nodelist=shortest_path, node_color='gold');
```

## Graphs as Circles

*This part still throws me for a bit of a loop, but the book also concedes that it’s a bit tricky to grok, so I’ll try and be careful.*

Graph Networks borrow a lot of their descriptors from geometric terms used to describe circles. But before we get into that I want to define `node eccentricity`

as a measure of how far from the normal/center node a point is. Note that prefix– `ecc`

, interchangeable with the Latin `ex`

, meaning “out”, so taken literally, this means “measure of how far out from the center” a given node is.

In the graph context, this is expressed as the max distance from a node to all other nodes in the network, like so

`print(nx.eccentricity(G))`

```
{0: 3, 1: 3, 2: 3, 3: 3, 4: 4, 5: 4, 6: 4, 7: 4, 8: 3, 9: 4, 10: 4, 11: 4, 12: 4, 13: 3, 14: 5, 15: 5, 16: 5, 17: 4, 18: 5, 19: 3, 20: 5, 21: 4, 22: 5, 23: 5, 24: 4, 25: 4, 26: 5, 27: 4, 28: 4, 29: 5, 30: 4, 31: 3, 32: 4, 33: 4}
```

Which should track.

If you’d describe someone as “eccentric” you’d likely also use the phrase “out there” which has a more literal interpretation here than I might have expected!

Where it gets confusing is borrowing the circle terminology.

`diameter`

is the maximum `eccentricity`

for a graph

```
max_ecc = max(nx.eccentricity(G).items(), key=lambda x:x[1])[1]
nx.diameter(G) == max_ecc
```

```
True
```

`radius`

is the *minimum* `eccentricity`

of a Graph

**Note**: and has NO proportional relationship to the `diameter`

whatsover

```
min_ecc = min(nx.eccentricity(G).items(), key=lambda x:x[1])[1]
nx.radius(G) == min_ecc
```

```
True
```

Using these terms, we’d say that the `center`

of a graph, is a set of all nodes whose `eccentricity`

is equal to the `radius`

```
center_nodes = nx.center(G)
center_nodes
```

```
[0, 1, 2, 3, 8, 13, 19, 31]
```

`[nx.eccentricity(G, node) for node in center_nodes]`

```
[3, 3, 3, 3, 3, 3, 3, 3]
```

Similarly, the `periphery`

represents all nodes whose `eccentricity`

is equal to the `diameter`

```
periphery_nodes = nx.periphery(G)
periphery_nodes
```

```
[14, 15, 16, 18, 20, 22, 23, 26, 29]
```

`[nx.eccentricity(G, node) for node in periphery_nodes]`

```
[5, 5, 5, 5, 5, 5, 5, 5, 5]
```