In our ever more connected world, more and more data is about the flow of people, of goods, and of information. With the Tableau spatial file connector now supporting linear geographies - in addition to points and polygons - such infrastructure networks can be plotted easily in Tableau. As we have seen in the launch announcement for Tableau Public 10.4, examples of linear Shapefiles, KML files or GeoJSON files abound.
But what if your network data is not yet in such a file format? It is actually not that difficult to create these file formats yourself, whether that is in a GIS tool or using a programming language such as R or Python. Here we’d like to show how we recently created a KML file containing all the flight routes of the world.
We came across the OpenFlights Database that has information on all the airports, airlines, and flight routes of the world. It is easy enough to download this information as text files (.dat files), and open it in Tableau.
In fact, given that Tableau maps can plot airports based on their IATA codes, we had created a first flight route map in less than a minute!
Traditional approaches to great-circle paths
However, as you can see in the screenshot above, it comes with a small fly in the ointment: the airports are connected by straight lines. Of course, on world maps flight routes are often depicted as arcs – the so-called great circle connections that represent the shortest distance between two points on a globe.
Many ingenious workarounds to create such curved lines have been described by the talented Tableau community (e.g. here, here, here, or here). But all of these rely on creating coordinates for enough intermediate points between the origin and destination so that the connected line appears curved to the human eye.
This approach has two small drawbacks: a) when hovering over the lines Tableau highlights the intermediate dots, not the whole route, and b) because we are adding more rows to the data and more marks to the view, the performance of the published viz might decrease.
Enter the new and improved spatial file connector
With linear-geography spatial files, we can now avoid the first problem entirely: we get one mark for each route, and it highlights as a whole when hovered over or tapped on. The spatial file connector also mitigates the second problem, as it is designed to optimize the rendering of the lines so that the smoothness of the curve is appropriate for the zoom level.
Creating your own spatial file
Often a quick internet search will yield a spatial file for your needs. In our case we weren’t happy with what we found, so we decided to build our own spatial file. There are different ways to go about it, but we used Python for this exercise.
The script is fairly straightforward and it shouldn’t require a lot of adjustments when one wants to apply it to other data sets. Here is how you can replicate what we did:
Step 1: Get the data ready.
For use in Python, the table should have the following columns: origin latitude, origin longitude, destination latitude, destination longitude, route name. The exact names are not important, but the columns need to be in this particular order if you don’t want to make any adjustments to the script. Each line below that should have the details for one route.
In our case, we first had to join the table with the flight routes with information on the airports, in order to get the longitude and latitude from the latter. (We actually added it twice – so as to get the coordinates for both the origin and the destination airports.)
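If you prefer to do this join in Python, the sketch below shows one way to do it with pandas. It is only an illustration: the two small tables stand in for the OpenFlights airports and routes files, the coordinates are approximate, and every column name is an assumption made for this example rather than the name used in the OpenFlights files or in our script.

```python
import pandas as pd

# Small stand-ins for the airports and routes tables (approximate coordinates).
airports = pd.DataFrame({
    "iata": ["JFK", "LHR", "SIN"],
    "lat": [40.6398, 51.4706, 1.3502],
    "lon": [-73.7789, -0.4619, 103.9940],
})
routes = pd.DataFrame({
    "origin": ["JFK", "LHR"],
    "destination": ["LHR", "SIN"],
})

# First copy of the airports table: supplies the origin coordinates.
origin = airports.rename(
    columns={"iata": "origin", "lat": "origin_lat", "lon": "origin_lon"}
)
# Second copy of the same table: supplies the destination coordinates.
dest = airports.rename(
    columns={"iata": "destination", "lat": "dest_lat", "lon": "dest_lon"}
)

# Join the airports table twice, once per end of the route.
flights = routes.merge(origin, on="origin").merge(dest, on="destination")
flights["route_name"] = flights["origin"] + "-" + flights["destination"]

# Reorder the columns: origin latitude, origin longitude,
# destination latitude, destination longitude, route name.
flights = flights[["origin_lat", "origin_lon", "dest_lat", "dest_lon", "route_name"]]
print(flights)
```

Calling `flights.to_csv("flightsoftheworld.csv", index=False)` afterwards would save the table under the file name the script expects.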
In the script, our data file is called flightsoftheworld.csv. Change as needed.
Step 2: Get Python ready.
Make sure you have the following packages installed: pandas, geographiclib, and lxml.
Step 3: Download the script and run it in Python.
Get the script from here, and run it in Python. It should spit out a KML file. (See below for a more detailed description of what KML files are and what exactly the script does.)
Step 4: Play in Tableau
Open the KML file in Tableau; add “geography” and “name” to the view. Done. With a bit more formatting (we coloured the lines according to the distance of the routes) it should look similar to the viz below. Feel free to download the workbook and play with the data.
A closer look at KML files
KML files describe geographic features for maps using the XML language. Under the hood, the information is presented in a tree-like fashion. The example adapted from Wikipedia shown below would draw a point on a map indicating where New York City is. To add another city you would add another block of the lines that make up a “Placemark”, with the relevant coordinates, name and description.
<?xml version="1.0" encoding="UTF-8"?>
<kml>
  <Document>
    <Placemark>
      <name>New York City</name>
      <description>New York City</description>
      <Point>
        <coordinates>-74.006393,40.714172,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
If you wanted to add a line instead, you would replace the <Point> tags with <LineString> tags and put the starting and ending coordinates between the <coordinates> tags, separated by a space, like so:
<coordinates>60.8027,56.7430,0 59.2750,56.3622,0 </coordinates>
Note that since we are only operating in 2-dimensional space, we set the third coordinate to zero. A curved line can be pieced together by adding more waypoints – i.e. intermediate points:
<coordinates>60.8027,56.7430,0 59.2750,56.3622,0 57.7782,55.9628,0 </coordinates>
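The same tag structure can also be generated from code. The sketch below is a minimal illustration, not the script itself: it uses Python's built-in xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the helper name `linestring_placemark` is made up for this example.

```python
import xml.etree.ElementTree as ET


def linestring_placemark(name, waypoints):
    """Build a <Placemark> with a <LineString> from (longitude, latitude) pairs."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    line = ET.SubElement(placemark, "LineString")
    # KML lists each point as "longitude,latitude,altitude", separated by spaces.
    ET.SubElement(line, "coordinates").text = " ".join(
        f"{lon:.4f},{lat:.4f},0" for lon, lat in waypoints
    )
    return placemark


# Assemble the same kml > Document > Placemark tree as in the example above.
kml = ET.Element("kml")
document = ET.SubElement(kml, "Document")
document.append(
    linestring_placemark(
        "Example route",
        [(60.8027, 56.7430), (59.2750, 56.3622), (57.7782, 55.9628)],
    )
)

kml_text = ET.tostring(kml, encoding="unicode")
print(kml_text)
```

Adding another route means appending another placemark to `document`, just as you would add another “Placemark” block by hand.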
A closer look at the script
Roughly speaking the Python script does the following:
- Use the lxml package to set up an XML tree structure.
- Read in the dataset with the flight routes as described in step 1 above.
- Loop through the data to create waypoints for each of the routes. This makes use of the geographiclib package. The number of intermediate points is set depending on how long the route is. For shorter routes we don’t need that many points, as the lines will be fairly straight. We get the coordinates for each intermediate point and add them to a string called output.
- For each route we use this output string in a block of XML code. It is written between the coordinate tags. We also get the route name for the respective row in the data file and write it between the name tags. As a bonus, we also add the route distance, which geographiclib gives us as well, to the description of each placemark.
- Finally, we write the XML tree to a KML file.
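To make the waypoint step more concrete, here is a rough sketch of the idea. It is not the script itself: instead of geographiclib, which works on the WGS84 ellipsoid, it uses a simpler spherical formula from Python's standard library, so distances will differ slightly. The function name `great_circle_waypoints` and the spacing of roughly 200 km per segment are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def great_circle_waypoints(lat1, lon1, lat2, lon2, km_per_segment=200.0):
    """Return (distance_km, [(lon, lat), ...]) along the great circle."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Haversine formula for the central angle between the two points.
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    delta = 2 * math.asin(min(1.0, math.sqrt(h)))
    distance_km = EARTH_RADIUS_KM * delta
    # Longer routes get more segments; short routes stay nearly straight.
    segments = max(1, math.ceil(distance_km / km_per_segment))
    sin_delta = math.sin(delta)
    points = []
    for i in range(segments + 1):
        f = i / segments
        if sin_delta < 1e-12:
            # Identical (or antipodal) endpoints: fall back to linear weights.
            a, b = 1.0 - f, f
        else:
            a = math.sin((1 - f) * delta) / sin_delta
            b = math.sin(f * delta) / sin_delta
        # Weighted sum of the two endpoints as 3D unit vectors.
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        lat = math.atan2(z, math.hypot(x, y))
        lon = math.atan2(y, x)
        points.append((math.degrees(lon), math.degrees(lat)))
    return distance_km, points


# Example: approximate coordinates for New York (JFK) and London (LHR).
distance_km, waypoints = great_circle_waypoints(40.6398, -73.7789, 51.4706, -0.4619)
# The string that goes between the <coordinates> tags of the placemark.
output = " ".join(f"{lon:.4f},{lat:.4f},0" for lon, lat in waypoints)
print(round(distance_km), "km,", len(waypoints), "waypoints")
```

This sketch does not treat routes that cross the 180° meridian specially; it only shows how the number of waypoints can scale with route length.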
A shortcut: writing directly into a TDE
If you know that you want to use this data in Tableau (and you have the Personal or Professional Edition of Tableau Desktop), you can take a shortcut and write your routes directly into a Tableau Data Extract (TDE) file using the TableauSDK (SDK = Software Development Kit). The SDK supports several languages (C, C++, Python, etc.). We used the Python version to turn the routes created using geographiclib into a TDE.
A benefit of writing straight into a TDE is that the files are much smaller because coordinates for each route are compressed into our spatial data format as you drop them into the TDE.
We can follow the same set of steps we did for creating the KML file with a few changes, namely:
- In step 2, make sure you also have the following packages available: csv (part of the Python standard library, so it needs no separate installation), and the TableauSDK.
- In step 3: Download the TDE writing version of the script and run it in Python. It should spit out a TDE file. Then continue with that in Tableau.
A closer look at how the TDE script works:
- Initialize a Tableau extract and create the table definition. This defines where your TDE will be saved and what fields will be written in the TDE.
- Create a new table (called ‘Extract’) and create an empty row in the table so that you can start writing data.
- Open the CSV file and loop through it to calculate the great-circle routes.
- For every route, write the results into a new row in the TDE – we are writing route name, the route coordinates as a linestring in Well Known Text (WKT) format, and the route distance.
- Close the extract and start mapping in Tableau!
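The WKT linestring mentioned above is just a text format: longitude and latitude separated by a space, with points separated by commas. The sketch below shows how the waypoints of one route could be turned into such a string. The helper name `to_wkt_linestring` and the sample route are made up for this example, and the TableauSDK calls that write the row into the extract are deliberately left out.

```python
def to_wkt_linestring(waypoints):
    """Format (longitude, latitude) pairs as a WKT LINESTRING."""
    if len(waypoints) < 2:
        raise ValueError("a linestring needs at least two points")
    # WKT lists each point as "x y", i.e. longitude first, then latitude.
    body = ", ".join(f"{lon:.4f} {lat:.4f}" for lon, lat in waypoints)
    return f"LINESTRING ({body})"


# One hypothetical route: name, waypoints, and a placeholder distance in km.
route_name = "Example route"
waypoints = [(60.8027, 56.7430), (59.2750, 56.3622), (57.7782, 55.9628)]
distance_km = 208.0  # placeholder value for illustration only

# The three values that would be written into one row of the extract.
row = (route_name, to_wkt_linestring(waypoints), distance_km)
print(row)
```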