Friday 1 July 2016

Transportation

Introduction


The Transportation notebook focuses on the transport system of Europe and analyses data made available by EuroStat, which publishes datasets on a wide range of topics. The notebook uses Apache Flink for the data analysis and Apache Zeppelin for the visualizations.
[Figure: Transportation Notebook]
The analysis is divided into three broad sections: transportation overview, road transport and rail transport. The overview section gives general information on the availability and usage of transport services in the region. The analysis then focuses on the two most used means of transport, road and rail, and considers the parameters in the datasets that might indicate their condition. In both of these sections, the main aim is to analyse traffic congestion and the measures that countries have taken to handle the pressure of traffic.
As in the previous notebooks, the first few paragraphs load the datasets and define general functions used throughout the notebook. At the beginning of each section, a paragraph defines the case classes needed to load the datasets for that section. Since a large number of datasets with different formats are used, two case classes, 'CommonType1' and 'CommonType2', are defined at the start for converting any dataset to a common type for visualization. Additional functions turn data in these common types into the table display format. The notebook uses the HTML display and a Helium Application to show custom visualizations and maps respectively.
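To make the common-type idea concrete, here is a minimal sketch. The field names, the RoadLength record and the sample values are assumptions made for illustration; the notebook's actual definitions of 'CommonType1' and 'CommonType2' may differ. It uses plain Scala collections instead of a Flink DataSet, and relies on Zeppelin rendering any output that starts with %table as a table with tab-separated columns and one row per line.

```scala
// Hypothetical field layouts; the notebook's real CommonType1/CommonType2 may differ.
case class CommonType1(country: String, year: Int, value: Double)
case class CommonType2(country: String, category: String, year: Int, value: Double)

// Hypothetical dataset-specific record (road length in km).
case class RoadLength(geo: String, time: Int, km: Double)

// Convert a dataset-specific record to the common type.
def toCommon(r: RoadLength): CommonType1 = CommonType1(r.geo, r.time, r.km)

// Build the string Zeppelin renders as a table:
// "%table", a tab-separated header, then one tab-separated row per line.
def toTable(header: Seq[String], rows: Seq[CommonType1]): String = {
  val body = rows.map(r => s"${r.country}\t${r.year}\t${r.value}")
  "%table " + (header.mkString("\t") +: body).mkString("\n")
}

// Placeholder values, not EuroStat figures.
val sample = Seq(RoadLength("DE", 2012, 1000.0), RoadLength("FR", 2012, 2000.0))
val table = toTable(Seq("country", "year", "value"), sample.map(toCommon))
println(table)
```

With a shared type like this, one display function can serve every dataset, which is presumably why the notebook converts everything to a common type first.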


Getting the data from EuroStat


EuroStat is the European organization that provides data on various indicators in Europe, and all the data used here was fetched from it. The bulk download section of its website includes a PDF file with complete information about the available datasets. From that file one can find the relevant datasets and download them using the bulk download facility. For this notebook, all the datasets related to transportation were downloaded, except for a few that were published individually at the country level. After downloading, inconsistencies were removed using the stream editor (sed). Since the number of countries is large, only five representative nations were chosen.
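The post does not say which inconsistencies were removed. As an illustration only, the sketch below assumes they were the usual ones in EuroStat's bulk TSV files: the first column packs several dimension codes separated by commas, a cell containing ":" marks a missing value, and a number may be followed by a flag letter. The function names and the sample row are hypothetical, and the cleanup is written in Scala rather than sed so that it matches the rest of the notebook.

```scala
// Parse one numeric cell: ":" (with or without a flag) means missing,
// and a trailing flag such as "b" or "p" is dropped.
def parseCell(cell: String): Option[Double] = {
  val token = cell.trim.takeWhile(c => !c.isWhitespace)
  if (token.isEmpty || token.startsWith(":")) None
  else scala.util.Try(token.toDouble).toOption
}

// Split a data line into its dimension codes and its cleaned values.
def parseLine(line: String): (Seq[String], Seq[Option[Double]]) = {
  val cols = line.split("\t", -1)
  (cols.head.split(",").map(_.trim).toSeq, cols.tail.map(parseCell).toSeq)
}

// Made-up row in the assumed layout, not real EuroStat data.
val (dims, values) = parseLine("NR,CAR,DE\t100 \t95 b\t: ")
println(dims.mkString(", "))
println(values.mkString(", "))
```

Keeping missing cells as None, rather than dropping them or writing zero, lets later steps decide how to treat gaps in a series.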


Analysis using Apache Flink

 

1. Transportation Overview


The first section of the notebook examines the transportation facilities available in the region. It starts with the lengths of the various ground transport networks, such as canals, rivers, roads and rail tracks, and moves on to passenger usage of road and rail transport, measured as a percentage of total traffic.
[Figure: Transportation Overview]
In all the countries, cars account for by far the greatest usage, while the other means of transport, such as buses and trains, have low usage. Spain, which is low on train usage, has significantly more bus usage. The next dataset, the difficulty in accessing public transport, has multiple dimensions, each of which branches into further dimensions, so a radial Reingold-Tilford tree is a good fit for visualizing it. The first paragraph for this indicator generates the JSON string of the data required by the visualization code, and the subsequent paragraph displays the visualization. Since the tree is reasonably big, only two of the nations, Germany and France, are considered. Next we consider the number of trips grouped by means of transport. The United Kingdom has the greatest number of air trips, both within and outside the country, while Germany and France, which have the greatest length of roads, also have the greatest number of road trips. Lastly, we consider the consumer price indices for road and rail transport and find that both have fallen drastically over recent years, which indicates that transport services have become easier to access in terms of the amount paid for a particular service.
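The conversion from table rows to the tree's JSON can be sketched as follows. This is not the notebook's own function: PathRow, quote and toJson are hypothetical names, the labels are placeholders, and the sketch assumes each row can be reduced to a path of dimension values from the root to a leaf. The output uses the nested "name"/"children" shape that D3's tree layout examples read.

```scala
// One table row reduced to its path of dimension values, root to leaf.
// Hypothetical shape; the notebook's rows may carry other columns.
case class PathRow(path: List[String])

// Escape the characters that would break a JSON string literal.
def quote(s: String): String =
  "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

// Recursively group rows by the next path element and emit
// {"name": ..., "children": [...]} nodes; leaves have no "children" key.
def toJson(name: String, rows: Seq[PathRow]): String = {
  val children = rows.filter(_.path.nonEmpty)
    .groupBy(_.path.head).toSeq.sortBy(_._1)
    .map { case (child, sub) => toJson(child, sub.map(r => PathRow(r.path.tail))) }
  val inner = children.mkString(",")
  if (children.isEmpty) s"""{"name":${quote(name)}}"""
  else s"""{"name":${quote(name)},"children":[$inner]}"""
}

// Placeholder labels standing in for the dataset's dimension values.
val rows = Seq(
  PathRow(List("Germany", "a")),
  PathRow(List("Germany", "b")),
  PathRow(List("France", "a")))
val json = toJson("root", rows)
println(json)
```

Sorting the groups by name keeps the output deterministic, so the same table always produces the same tree.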


2. Road Transport


Road transport has historically been the most used means of transport and is taken up in the next section of the notebook. We begin with the first dataset for the section: the number of vehicles on the road, i.e. the vehicle stock.
[Figure: Road Transport]
For the period for which vehicle numbers were available (2009 - 2012), the graph appears exponential, though it is a highly zoomed-in view covering only those years. We then consider the vehicle stock individually for the most prevalent means of commuting by road, buses and cars, and compare it against usage to get a picture of how usage relates to stock. The number of buses is greatest for the United Kingdom, while bus usage is greatest for Spain. For cars, usage is greatest for the UK but the number of cars is greatest for Germany. Next, we compare the length of roads with the number of cars and buses to get an idea of the congestion on roads. Since the number of cars is an independent factor, the road infrastructure should scale up accordingly. Only France shows this trend; the rest of the nations have an increasing vehicle stock but no increase in road infrastructure. The motorization rate, which is the number of passenger vehicles per thousand inhabitants, is greatest for Italy and Germany. The consumer price index for the purchase of vehicles has been decreasing constantly, which implies that the countries have made it easier for people to own personal vehicles. Lastly, we turn our attention to road accidents and find that their number has been decreasing constantly, with the highest figures recorded for drivers and in rural areas. This section ends with a map showing some of the cities of Europe, with their congestion expressed as a percentage, along with the road network of Europe.
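The two ratios used in this section are simple to compute. The sketch below is illustrative only: the function names are hypothetical, the input figures are placeholders rather than EuroStat values, and vehicles per kilometre of road is an assumed way of expressing the comparison of road length with vehicle numbers described above.

```scala
// Motorization rate: passenger cars per thousand inhabitants.
def motorizationRate(passengerCars: Long, inhabitants: Long): Double =
  passengerCars.toDouble / inhabitants * 1000.0

// Vehicles per kilometre of road, a rough proxy for pressure on the network.
def vehiclesPerKm(vehicles: Long, roadKm: Double): Double = vehicles / roadKm

// Placeholder figures, not EuroStat data.
val rate = motorizationRate(500000L, 1000000L)
val density = vehiclesPerKm(1000L, 50.0)
println(s"motorization rate: $rate, vehicles per km: $density")
```

If the vehicle stock grows while road length stays flat, the second ratio rises year over year, which is the pattern described for every country except France.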



3. Rail Transport

 

Railways are the second most used means of land transport, and the last section of the notebook focuses on the rail transportation facilities in the region. The first dataset, visualized as pie charts, shows the number of trains in the region: Germany has by far the greatest number of goods and passenger trains, followed by the UK and France. Comparing track length with the number of rail cars, we find that track lengths have not kept pace with the increase in rail cars; in fact, for some of the nations the track length has decreased in recent years.
[Figure: Rail Transport]
The number of passenger rail vehicles has largely stayed constant, and for Germany and Italy it has decreased. Train usage has gone up relative to the number of passenger rail vehicles; however, this may not be a problem, as net usage of train services is low overall, so slight increases in usage can be accommodated without any significant increase in train services or the number of rail cars. France and Germany, which have the greatest length of track, also have the longest distances of train travel, while the number of passengers is greatest for Germany and the UK, indicating that commuters there may prefer these services over other means of transportation. The number of passenger rail vehicles has increased in line with the number of passengers only for France and, to some extent, Italy; for the others there has been no increase. France and Germany also have the greatest number of people employed in the railways, as they have the largest networks. Lastly, we take a look at railway accidents and find that they have been decreasing over the years, with the greatest number of accidents caused by rolling stock in motion. A map showing all the railway lines of Europe ends the last section.


Visualizations

 

The visualizations for this notebook were generated with the NVD3 and D3 libraries through the HTML display. The data for the Reingold-Tilford tree was generated using a custom function written right above the paragraph showing the tree.
[Figure: Road Network and Congestion Map]
This function converts the data present as rows in a table to the JSON format needed by the visualization code. Other functions, defined at the top in a generic paragraph, convert the datasets to the common type format and produce the corresponding table display. The congestion map contains markers that were generated using JavaScript run through a Helium Application. The last map, showing the rail network, was produced using the OpenLayers 3 library and the HTML display. See sample demo of the notebook here.
