The TfL Tube map is becoming increasingly dense, as more and more lines and stations are added to it. (By the time I achieve anything with this graph analysis, there may be more tweaking to do because of Crossrail!)
I have looked online for a graph that is ready for basic analysis and accounts for:
- Interchanges within a station.
- Interchanges between stations, i.e. out-of-station interchanges (OSIs) and other short walking routes.
- The latest TfL network, especially all the current London Overground routes.
There are interactive tools (e.g. Oliver O’Brien’s Tube Creature and Google Maps) that cover those three factors, but I have not found a workable dataset that can be used for my own graph analysis.
These are two core challenges I want to tackle:
- Calculate the shortest path between two stations.
- Perform graph analysis / link analysis on the network.
I have created my own dataset to include all the Overground routes and added my own interchange edges. There are about 120 out-of-station interchanges (OSIs) and walking routes in my dataset (fairly even split between the two).
For my dataset, I have also designed an ID system that includes information about all the stations: e.g. Piccadilly Circus has ID 71034, where ’10’ shows that it is in Zone 1. This numbering system has been useful for producing pivot tables and bringing in an element of data verification.
I used the NetworkX library (imported as nx) to set up the graph.
Here are the neighbours for Queen’s Park station: “Queen’s Park [Bakerloo]”, “Queen’s Park [WDCL]”, “Queen’s Park [WALK]”.
In other words, Queen’s Park station has its Bakerloo and Watford DC line platforms as neighbours, in addition to the ‘WALK’ platform, which arises from how my dataset is constructed to account for short walking routes.
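As a rough illustration of that structure (not the actual dataset), here is a minimal sketch in which a station node is linked to one node per platform group, plus a 'WALK' node. The travel times, the choice of neighbouring stations and the names `toy_graph` and `INTERCHANGE_MINUTES` are all assumptions made for the example.

```python
import networkx as nx

# Assumed uniform cost (in minutes) for stepping between a station and its platforms.
INTERCHANGE_MINUTES = 1.0

# Hypothetical mini-dataset: (node_a, node_b, minutes). Ride times are made up.
edges = [
    ("Queen's Park", "Queen's Park [Bakerloo]", INTERCHANGE_MINUTES),
    ("Queen's Park", "Queen's Park [WDCL]", INTERCHANGE_MINUTES),
    ("Queen's Park", "Queen's Park [WALK]", INTERCHANGE_MINUTES),
    ("Queen's Park [Bakerloo]", "Kilburn Park [Bakerloo]", 2.0),
    ("Queen's Park [WDCL]", "Kensal Green [WDCL]", 2.0),
]

toy_graph = nx.Graph()
# add_weighted_edges_from stores the third tuple item under the 'weight' key.
toy_graph.add_weighted_edges_from(edges)

# The station's neighbours are its platform nodes and its WALK node.
neighbours = sorted(toy_graph.neighbors("Queen's Park"))
print(*neighbours, sep="\n")
```

Keeping the station and its platforms as separate nodes means an interchange costs time in the graph, rather than being free.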
Now let’s create a function that finds the fastest route between two stations.
# graph_times is the weighted NetworkX graph of the network (edge weights in minutes).
def fastest_route(start, end):
    """
    Print the fastest path between the 'start' and 'end' points.

    Each station and interchange is printed, along with the journey time.
    Tip: use "" when calling the function, as escape characters may be needed with ''.
    """
    journey_path = nx.shortest_path(graph_times, start, end, weight='weight')
    journey_time = nx.shortest_path_length(graph_times, start, end, weight='weight')
    print('\nJOURNEY:', *journey_path, sep='\n\t')
    print('\nJOURNEY TIME:', journey_time, 'minutes')
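A possible variation, sketched here as an assumption rather than as the code used in the project: NetworkX's `single_source_dijkstra` returns the journey time and the path from a single search, and a missing station or an unreachable destination can be caught instead of raising. The function name `fastest_route_once` and the demo graph are hypothetical.

```python
import networkx as nx


def fastest_route_once(graph, start, end):
    """Return (journey_time, journey_path), or None if no route can be found."""
    try:
        # With a target given, single_source_dijkstra returns (distance, path).
        journey_time, journey_path = nx.single_source_dijkstra(
            graph, start, end, weight="weight"
        )
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # Raised when the destination is unreachable or the start node is unknown.
        return None
    return journey_time, journey_path


# Tiny made-up graph to exercise the function.
demo = nx.Graph()
demo.add_weighted_edges_from([("A", "B", 2.0), ("B", "C", 3.0), ("A", "C", 10.0)])

result = fastest_route_once(demo, "A", "C")
missing = fastest_route_once(demo, "A", "Z")
print(result)
print(missing)
```

Returning the values rather than printing them makes it easier to reuse the result, for example to compare several journeys in a table.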
Let’s try some examples with the function and see how it is working.
fastest_route("Queen's Park", "Brondesbury Park")
Queen’s Park [WALK]
Brondesbury Park [WALK]
JOURNEY TIME: 12.0 minutes
Queen’s Park to Brondesbury Park is not an OSI, even though the two stations are only 0.5 miles apart on the same road (Salisbury Road) and sit on different Overground lines. Someone unfamiliar with the area, looking at TfL’s map, might assume that changing at Willesden Junction would be faster.
Note: The edge between each station and its ‘WALK’ node is weighted at 1 minute, because the graph design applies the same 1-minute assumption to every interchange.
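To see how the 12.0 minutes could break down, here is a small sketch of the walking route on its own. The 10-minute walking edge is an assumption inferred from the total (12 minutes less 1 minute at each end), not a value taken from the dataset, and `walk_graph` is a hypothetical name.

```python
import networkx as nx

walk_graph = nx.Graph()
walk_graph.add_weighted_edges_from([
    ("Queen's Park", "Queen's Park [WALK]", 1.0),              # uniform interchange
    ("Queen's Park [WALK]", "Brondesbury Park [WALK]", 10.0),  # assumed walking time
    ("Brondesbury Park [WALK]", "Brondesbury Park", 1.0),      # uniform interchange
])

path = nx.shortest_path(
    walk_graph, "Queen's Park", "Brondesbury Park", weight="weight"
)

# Pair up consecutive nodes to list the time spent on each leg.
legs = [(a, b, walk_graph[a][b]["weight"]) for a, b in zip(path, path[1:])]
for a, b, minutes in legs:
    print(f"{a} -> {b}: {minutes} min")

total = sum(minutes for _, _, minutes in legs)
print("Total:", total, "minutes")
```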
West Hampstead to West Ruislip
Let’s look at an interesting case: a journey from the West Hampstead area to West Ruislip station.
There are three different routes that take a similar amount of time (approx. 50 minutes, +/- a few minutes):
- Take the Jubilee line to Bond Street, then the Central line direct to West Ruislip.
- Change onto the Metropolitan line at Finchley Road, leave at Ickenham and walk from there to West Ruislip. That walking route is an OSI on the network.
- Take the Overground to Shepherd’s Bush, then the Central line direct to West Ruislip.
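One way to surface several competing routes, rather than only the single fastest one, is NetworkX's `shortest_simple_paths`, which yields paths from shortest to longest. The sketch below is an assumption about how this could be done, not what the project does. The graph is heavily simplified (one node per station, West Hampstead treated as a single node) and every leg time is a placeholder chosen only so that the totals land near 50 minutes.

```python
from itertools import islice

import networkx as nx


def k_fastest_routes(graph, start, end, k=3):
    """Return up to k (minutes, path) pairs, fastest first."""
    routes = []
    paths = nx.shortest_simple_paths(graph, start, end, weight="weight")
    for path in islice(paths, k):
        minutes = sum(graph[a][b]["weight"] for a, b in zip(path, path[1:]))
        routes.append((minutes, path))
    return routes


# Placeholder leg times in minutes (illustrative only, not real timetable data).
toy = nx.Graph()
toy.add_weighted_edges_from([
    ("West Hampstead", "Bond Street", 12.0),
    ("Bond Street", "West Ruislip", 40.0),
    ("West Hampstead", "Finchley Road", 3.0),
    ("Finchley Road", "Ickenham", 35.0),
    ("Ickenham", "West Ruislip", 10.0),
    ("West Hampstead", "Shepherd's Bush", 16.0),
    ("Shepherd's Bush", "West Ruislip", 34.5),
])

routes = k_fastest_routes(toy, "West Hampstead", "West Ruislip")
for minutes, path in routes:
    print(minutes, " > ".join(path))
```

On a full graph with separate platform nodes, many of the top results are likely to be trivial variants of the same journey, so some filtering would probably be needed.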
West Hampstead’s Underground, Overground and Thameslink stations are all separate, so the result depends on which one is used as the starting point. These are the fastest routes based on my current graph:
- West Hampstead Overground → West Ruislip: 50.5 mins, with interchange at Shepherd’s Bush.
- West Hampstead Underground → West Ruislip: 48 mins, with interchange at Finchley Road and walk from Ickenham to destination.
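Because the starting point is really an area with several stations, one option is to search from all of them at once. NetworkX's `multi_source_dijkstra` accepts a set of source nodes and returns the best journey from whichever source is closest to the target. This is a sketch under assumptions: the node names are hypothetical and the leg times are placeholders picked so that the two totals match the 50.5 and 48 minutes quoted above.

```python
import networkx as nx

area = nx.Graph()
area.add_weighted_edges_from([
    # Placeholder legs summing to 50.5 minutes.
    ("West Hampstead Overground", "Shepherd's Bush", 16.0),
    ("Shepherd's Bush", "West Ruislip", 34.5),
    # Placeholder legs summing to 48 minutes.
    ("West Hampstead Underground", "Finchley Road", 3.0),
    ("Finchley Road", "Ickenham", 35.0),
    ("Ickenham", "West Ruislip", 10.0),
])

sources = {"West Hampstead Overground", "West Hampstead Underground"}

# With a target given, the function returns (distance, path) for the best source.
minutes, route = nx.multi_source_dijkstra(
    area, sources, target="West Ruislip", weight="weight"
)
print(minutes, "minutes from", route[0])
```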
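If actual interchange times were added to the dataset, the shortest paths might well change. In reality, the Overground and Underground stations at West Hampstead are less than 2.5 minutes apart, so walking from the Overground station to the Underground station and taking the 48-minute route could beat the 50.5-minute route.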
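To get a feel for how sensitive the result is to interchange times, the sketch below re-weights every interchange edge and recomputes the fastest route. It assumes each edge carries a `kind` attribute marking it as an interchange or a ride. That attribute, the function name `with_interchange_minutes` and the toy network are all hypothetical.

```python
import networkx as nx


def with_interchange_minutes(graph, minutes):
    """Return a copy of graph with every interchange edge set to `minutes`."""
    adjusted = graph.copy()  # edge attribute dicts are copied, so graph is untouched
    for _, _, data in adjusted.edges(data=True):
        if data.get("kind") == "interchange":
            data["weight"] = minutes
    return adjusted


# Made-up network: route A has three interchanges, route B has one but a longer ride.
toy = nx.Graph()
toy.add_edge("S", "A1", weight=1.0, kind="interchange")
toy.add_edge("A1", "A2", weight=20.0, kind="ride")
toy.add_edge("A2", "A3", weight=1.0, kind="interchange")
toy.add_edge("A3", "A4", weight=20.0, kind="ride")
toy.add_edge("A4", "T", weight=1.0, kind="interchange")
toy.add_edge("S", "B1", weight=1.0, kind="interchange")
toy.add_edge("B1", "T", weight=44.0, kind="ride")

for minutes in (1.0, 4.0):
    g = with_interchange_minutes(toy, minutes)
    path = nx.shortest_path(g, "S", "T", weight="weight")
    time = nx.shortest_path_length(g, "S", "T", weight="weight")
    print(f"interchange = {minutes} min: {time} min via {path[1]}")
```

In this toy case the route with more interchanges wins at 1 minute per interchange and loses at 4 minutes, which is the kind of flip that real interchange data might produce.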
Thoughts so far
I was pleased with the results of my fastest route function and moved on to some graph analysis: I will show the findings in the next blog post.
The key thing to change in the data pipeline is to add the actual interchange times between stations’ platforms, in order to give more accurate journey times.
Note: I hope to publish the Jupyter notebook of this project later on and am trying not to dwell too much on the actual code in these blog posts.