Session 6

This continues from last week. The NetworkX modules need to be imported:

from networkx import *
from operator import *

1 Analysing relations

Download the file friends.csv and save it on your I:-drive. This is a comma-separated variable (csv) file. You can look at the file if you open it in Wordpad. If you double click a csv-file, it will open in Excel.

The file contains a binary relation which can be represented as a graph using:

F = read_edgelist("friends.csv",delimiter=",",create_using=DiGraph())

(Notes: F is specified as a directed graph because whether someone calls someone else a "friend" may not be symmetric.
If you create a csv file yourself, make sure that it does not contain any blank lines at the end of the file.)

1.1 Exercises

1) Something to think about: What is the domain/co-domain for the friends relation?

2) Create a gif file for F. Try different layouts. Which layout seems most suitable? Is it easy to see who is considered "friend" by most people and who has the most friends?

1.2 Calculating Google's PageRank

PageRank is an algorithm that calculates the "relative importance" of nodes in a directed graph. Basically, nodes that have more incoming edges are more important. Edges coming from more important nodes count more than edges coming from less important nodes.

The code below calculates the PageRank for each node in F and then prints a list of nodes in decreasing order of importance.

pg = pagerank(F)
for item in sorted(pg.iteritems(), key=itemgetter(1), reverse=True):
     print item

Who is considered "friend" by most people according to PageRank? Who has the least friends?

1.3 Exercise

3) Calculate the reverse graph using F.reverse(), i.e., if (John,Paul) is in the original, then (Paul,John) is in the reverse graph.

4) Calculate the PageRank of the reverse graph. Create gif files for both F and the reverse graph. Answer the following questions based on what you observe in the pictures and the PageRank data:

Why is Claire high in F and low in the reverse graph?

Why are Helen and Sue highly ranked in F and in the reverse graph?

What does this mean for websites: how can a website achieve a high PageRank in Google?

2 Small-world networks

A small-world network has a small average shortest path lenght (eg below 6) and an average clustering coefficient that is larger than the clustering coefficient of a random graph with the same number of nodes and edges. The average clustering coefficient determines whether the neighbours of nodes are also connected and thus the graph forms clusters.

Using

average_shortest_path_length(G)
average_clustering(G)             ### for the clustering coefficient
gnm_random_graph(n, m)            ### for a random graph with n nodes, m edges
G.to_undirected()                 ### returns an undirected graph from DiGraph
G.number_of_nodes()
G.number_of_edges()

2.1 Exercise

5) Determine whether the friends graph is a small-world network.

3 Exercise with pen and paper

Suppose you are going to a party with your girlfriend/boyfriend. There are three other couples at the party. Several people shake hands. Obviously nobody shakes hands with themselves or their girlfriend/boyfriend. Nobody shakes hands with the same person more than once. At the end of the party, you ask everybody including your girlfriend/boyfriend how many hands they have shaken. Each person gives a different answer.

Can you figure out how many hands you shook and how many hands your girlfriend/boyfriend shook?

Hints:

Consider this a graph problem where the nodes are people and the edges are handshakes.

What is the maximum number of handshakes possible for a single person in this graph? What is the minimum number?

Because each person gives a different answer, you can figure out what the answers must have been.

Insert edges into the graph, starting with the person with the most handshakes and then the one with the second most handshakes.