In this section, we are going to perform an analysis that will be using all the things that we have learned previously in Shortest Path Analysis. It will be using networkX skill, pandas skill and some newer form of visualization which we have not done yet.
So we are doing today the analysis of the shortest path of flight keeping in mind that no direct flight go from the source to direction and also keeping in view that No New York City airport has the capacity of taking a direct flight to the destination.
In this task, we first acquire the longitude and latitude of the countries and then we will create an extensive data set by calculating the distance from each of the countries.
So let's start the project:
import random
import csv
import geopy.distance
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Airports Data")
import pickle
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import folium
SourceCode Credit: Hassan Rehman, Shortest Path Analysis, (2019), GitHub repository, https://github.com/HassanRehman11/Shortest-Path-Analysis/blob/master/Map.ipynb
So, first of all, we have imported random to do some random jobs when we need to randomize something we will be using this module. We are going to create a CSV file that will store all the location or source and all the location of the destination with the distance from the source to destination.
Then there is a geolocator which will extract the information out of the name of the place it will provide the longitude and latitude of the place that is very important for this task.
Then there is a pickle to pickle the longitude and latitude, so we don't have to extract every time. Then there is networkX which we will be using to make the relation based on distance so that we can make the shortest path out of it. If anyone wants to plot networkX graph than it can be done using Matplotlib library and in last we have a folium. Folium is a new thing for you. It is a mapping technique in which we can plot markers and make lines on the world map.
Ok so after making all the significant imports we now the move to our first part building a data set that we can use to analyze:
loc = []
capital = ['Washington','Las Vegas','California','Beijing','Tehran',
'Kabul','Baku','Dhaka','Cairo','Baghdad',
'Tel Aviv','Astana','Kuwait City','Tripoli','Muscat','Riyadh','Abu Dhabi','Amman',
'Moscow','London','New York City','Istanbul','Bangkok','Geneva','Colombo']
t = len(capital)
#for city in range(t):
#location = geolocator.geocode(capital[city])
#loc.append((location.latitude, location.longitude))
From = 'New York City'
To = 'London'
forbidden_from = ['New York City']
forbidden_to = ['London']
with open('lon_lat.pkl', 'rb') as f:
city = pickle.load(f)
capital=city[0]
loc = city[1]
with open('large_data.csv', mode='w') as capital_distance:
capital_writer = csv.writer(capital_distance, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
capital_writer.writerow(["Source", "Target", "Distance","s_lon","s_lat","t_lon","t_lat"])
for i in range(t):
for j in range(t):
if (i!=j):
source = loc[i]
target = loc[j]
if (not((capital[i] in forbidden_from and capital[j] in forbidden_to) or (capital[i] in forbidden_to and capital[j] in forbidden_from))):
distance = geopy.distance.vincenty(source,target).miles
distance = '%.2f' %distance
capital_writer.writerow([capital[i],capital[j],distance,source[0],source[1],target[0],target[1]])
SourceCode Credit: Hassan Rehman, Shortest Path Analysis, (2019), GitHub repository, https://github.com/HassanRehman11/Shortest-Path-Analysis/blob/master/Map.ipynb
Ok so in the first line we initialize an empty list in which we are going to append longitude and latitude. We are not taking all of the countries we are taking some of the countries. In this case, we have taken 26 countries.
After this initialization, we start a loop that will iterate over all of the countries and extract the longitude and latitude of the countries.
We already said theta there should be no direct flight from New York to the destination so we forbidden New York City airport and after that, we run a loop to get the distance from each of the countries. This step doesn't include the distance from the New York City airport and the destination.
Distance is calculated using the geopy module and the unit is miles. We can also convert into kilometers by specifying km instead of miles. The CSV file looks like the following:
527 |
New York City |
Mumbai |
7808.03 |
40.730862 |
-73.987156 |
18.938771 |
72.835335 |
528 |
New York City |
Beijing |
6842.98 |
40.730862 |
-73.987156 |
39.906217 |
116.391276 |
529 |
New York City |
Tehran |
6135.80 |
40.730862 |
-73.987156 |
35.700618 |
51.401378 |
530 |
New York City |
Kabul |
6748.57 |
40.730862 |
-73.987156 |
34.526013 |
69.177648 |
531 |
New York City |
Baku |
5830.62 |
40.730862 |
-73.987156 |
40.375443 |
49.832675 |
532 |
New York City |
Dhaka |
7881.95 |
40.730862 |
-73.987156 |
23.759357 |
90.378814 |
533 |
New York City |
Cairo |
5616.46 |
40.730862 |
-73.987156 |
30.048819 |
31.243666 |
534 |
New York City |
Baghdad |
6004.45 |
40.730862 |
-73.987156 |
33.302431 |
44.378799 |
535 |
New York City |
Tel Aviv |
5674.81 |
40.730862 |
-73.987156 |
32.080481 |
34.780527 |
536 |
New York City |
Astana |
5771.76 |
40.730862 |
-73.987156 |
51.128258 |
71.430550 |
537 |
New York City |
Kuwait City |
6347.83 |
40.730862 |
-73.987156 |
29.379709 |
47.973563 |
538 |
New York City |
Tripoli |
4660.58 |
40.730862 |
-73.987156 |
32.896672 |
13.177792 |
539 |
New York City |
Muscat |
7066.25 |
40.730862 |
-73.987156 |
23.512550 |
58.551402 |
540 |
New York City |
Riyadh |
6546.47 |
40.730862 |
-73.987156 |
24.631969 |
46.715065 |
541 |
New York City |
Abu Dhabi |
6856.08 |
40.730862 |
-73.987156 |
23.997644 |
53.643910 |
542 |
New York City |
Amman |
5729.39 |
40.730862 |
-73.987156 |
31.951569 |
35.923963 |
543 |
New York City |
Moscow |
4678.28 |
40.730862 |
-73.987156 |
55.750718 |
37.617661 |
544 |
New York City |
Istanbul |
5024.81 |
40.730862 |
-73.987156 |
41.009633 |
28.965165 |
545 |
New York City |
Bangkok |
8667.21 |
40.730862 |
-73.987156 |
13.753893 |
100.816080 |
546 |
New York City |
Geneva |
3871.64 |
40.730862 |
-73.987156 |
46.201756 |
6.146601 |
547 |
New York City |
Colombo |
8757.31 |
40.730862 |
-73.987156 |
6.921812 |
79.865561 |
It contains the source and destination name along with their coordinates.
with open('lon_lat.pkl', 'rb') as f:
city = pickle.load(f)
capital=city[0]
loc = city[1]
G = nx.Graph()
df = pd.read_csv('large_data.csv')
From = 'New York City'
To= 'London'
city = df.Source.unique()
G.add_nodes_from(city)
data = df[['Source', 'Target', 'Distance']]
edge = list(map(tuple,data.values))
for source,target,distance in edge:
G.add_edge(source, target,edge_color='g', weight=distance)
path = nx.dijkstra_path(G, From,To)
print(path)
SourceCode Credit: Hassan Rehman, Shortest Path Analysis, (2019), GitHub repository, https://github.com/HassanRehman11/Shortest-Path-Analysis/blob/master/Map.ipynb
We open the file that we have pickled to get the city and its longitude and latitude for creating the markers later. After that, we have initialized the networkX graph and make the edges and nodes of all cities and their respective distance from each other. After that, we call the method of shortest path and print the shortest path. The output is as follows:
So according to it the shortest path is from New York City to Geneva and then to London.
Now we have the shortest path lets plot on folium.
a = list(nx.all_shortest_paths(G,From,To))
short={}
for e in a:
total_d=0.0
for i in range(len(e)-1):
j = df[(df.Source == e[i])&(df.Target == e[i+1])].index
dist = float((df.iloc[j,]['Distance']))
total_d = total_d+dist
short[tuple(e)]= total_d
SourceCode Credit: Hassan Rehman, Shortest Path Analysis, (2019), GitHub repository, https://github.com/HassanRehman11/Shortest-Path-Analysis/blob/master/Map.ipynb
However, before plotting the map, we will get all the shortest path from the networkX all_shortet_path module to see other alternatives too. After getting all the shortest path, we iterate over the list and calculate the total distance from the source to the 2^{nd} city and then the destination and store it in a dictionary.
To plot the shortest path on folium and afterward, we will output all the shortest paths.
So, the code for this is as follows:
import folium
lon = []
lat = []
for i in range(len(loc)):
lon.append(loc[i][0])
lat.append(loc[i][1])
data = pd.DataFrame({
'lat':lat,
'lon':lon,
'name':capital
})
m = folium.Map(location=[20, 0], tiles="Mapbox Bright", zoom_start=2)
for i in range(0,len(data)):
folium.Marker([data.iloc[i]['lon'], data.iloc[i]['lat']], popup=data.iloc[i]['name']).add_to(m)
color = ['red','green','blue','yellow','black','brown']
for i in range(len(path)-1):
j = df[(df.Source == path[i])&(df.Target == path[i+1])].index
lon1 = float((df.iloc[j,]['s_lon']))
lat1 = float((df.iloc[j,]['s_lat']))
lon2 = float((df.iloc[j,]['t_lon']))
lat2 = float((df.iloc[j,]['t_lat']))
dist = float((df.iloc[j,]['Distance']))
folium.PolyLine(locations=[[lon1,lat1], [lon2,lat2]],popup=str(dist)+"miles",color=random.choice(color)).add_to(m)
m.save('map.html')
m
SourceCode Credit: Hassan Rehman, Shortest Path Analysis, (2019), GitHub repository, https://github.com/HassanRehman11/Shortest-Path-Analysis/blob/master/Map.ipynb
After running this script following map will be saved as an HTML:
Figure 1 |
It shows all the cities specified by the marker and the polyline from Islamabad to Kabul and then London.
Now let’s see how we can get the all shortest paths:
print("All possible routes are: ")
sorted_by_value = sorted(short.items(), key=lambda kv: kv[1])
for e in sorted_by_value:
print(e)
SourceCode Credit: Hassan Rehman, Shortest Path Analysis, (2019), GitHub repository, https://github.com/HassanRehman11/Shortest-Path-Analysis/blob/master/Map.ipynb
To print all the shortest paths, we first sort the dictionary with respect the distance in descending order. The output is as follows:
Mentioned above is the analysis of the shortest path. This code contains many areas to improve it which is intentionally designed in such a way so that you improve it by yourself. The task is to make it more dynamic less with less loop and how we can improve the efficiency and timing of this code.