Incident analysis of Paris RATP metro lines

This library can be used to generate the following figures, which illustrate the probability of incidents on each Paris metro/RER line during the last 30 days.

Which line is most likely to make you angry?

_images/ranking.png

When (at which hour and on which day) should you avoid the metro?

_images/hour-weekday.png

API documentation

class ratpmetro.RATPMetroTweetsAnalyzer(api=None)

Class for analyzing Paris RATP metro line incidents, using their official Twitter accounts

To download tweets, you need to obtain Twitter developer API keys (consumer_key, consumer_secret, access_key and access_secret). Be aware that it may not be possible to download all 14 lines in a row, because the Twitter API enforces usage limits.

Parameters:
  • api (dict) – Dictionary containing Twitter developer API keys: consumer_key, consumer_secret, access_key, access_secret
incident_prob(year=None, loc=None)

Return the mean probability of incidents

Parameters:
  • year (int) – If year is given, only tweets from that year are used; otherwise all downloaded tweets are used
  • loc (list of str) – Time period from loc[0] to loc[1]
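
As a rough illustration of how the year and loc filters might narrow the data before averaging, here is a minimal sketch using only the standard library. It is not the library's implementation: the record layout (one (datetime, bool) pair per time slot), the helper name mean_incident_prob, the ISO format of the loc strings, and the choice to apply both filters when both are given are all assumptions made for this example.

```python
from datetime import datetime


def mean_incident_prob(records, year=None, loc=None):
    """Fraction of time slots flagged as incidents (hypothetical helper).

    records: list of (datetime, bool) pairs, one per time slot (assumed layout).
    year:    keep only slots from this year, if given.
    loc:     [start, end] as ISO strings (assumed format), bounds inclusive.
    """
    if year is not None:
        records = [(t, flag) for t, flag in records if t.year == year]
    if loc is not None:
        start, end = (datetime.fromisoformat(s) for s in loc)
        records = [(t, flag) for t, flag in records if start <= t <= end]
    if not records:
        return None  # nothing left to average
    return sum(flag for _, flag in records) / len(records)


# Made-up sample data: five hourly slots, two of them with an incident.
slots = [
    (datetime(2016, 12, 31, 8), True),
    (datetime(2017, 1, 2, 8), True),
    (datetime(2017, 1, 2, 9), False),
    (datetime(2017, 1, 3, 8), False),
    (datetime(2017, 1, 3, 9), False),
]

prob_2017 = mean_incident_prob(slots, year=2017)
prob_jan2 = mean_incident_prob(slots, loc=["2017-01-02", "2017-01-02T23:59:59"])
print(prob_2017, prob_jan2)
```
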
load(line, number_of_tweets=3200, folder_tweets='tweets', force_download=False)

Download the tweets from the official RATP Twitter account.

Some code is adapted from https://github.com/gitlaura/get_tweets

Parameters:
  • line (int or str) – RATP metro line number (1 to 14), or "A", "B" for RER lines
  • number_of_tweets (int) – Number of tweets to download; at most 3200, a limit imposed by the Twitter API
  • folder_tweets (str) – Folder to store the downloaded tweets as a .csv file
  • force_download (bool) – If False, an already downloaded file is loaded directly without re-downloading it. Use force_download=True to force a new download
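
The caching behaviour described for force_download can be sketched as follows. This is only an illustration of the pattern, not the library's code: load_tweets, the fetch callable (a stand-in for the real Twitter download) and the line_<n>.csv file name are hypothetical.

```python
import csv
import os
import tempfile


def load_tweets(line, fetch, folder_tweets="tweets", force_download=False):
    """Return the rows for `line`, downloading only when needed.

    `fetch` is a hypothetical callable standing in for the Twitter download;
    it takes the line and returns a list of rows.
    """
    os.makedirs(folder_tweets, exist_ok=True)
    path = os.path.join(folder_tweets, f"line_{line}.csv")  # assumed file name
    if force_download or not os.path.exists(path):
        rows = fetch(line)
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
    # Always read back from disk so cached and fresh results look the same.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


calls = []  # records every simulated download


def fake_fetch(line):
    calls.append(line)
    return [["2017-01-02 08:15:00", "Trafic interrompu"]]


folder = tempfile.mkdtemp()
first = load_tweets(1, fake_fetch, folder_tweets=folder)   # downloads
second = load_tweets(1, fake_fetch, folder_tweets=folder)  # reads the cached file
third = load_tweets(1, fake_fetch, folder_tweets=folder, force_download=True)
print(len(calls))
```

Reusing the cached file by default keeps repeated runs from consuming the API quota mentioned above.
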
plot_incident_cause(year=None, loc=None)

Plot frequencies of the main cause of incidents

Parameters:
  • year (int) – If year is given, only tweets from that year are used; otherwise all downloaded tweets are used
  • loc (list of str) – Time period from loc[0] to loc[1]
plot_incident_prob(by='hour', year=None, loc=None, **kwargs)

Plot (marginal) probability of operational incidents

Parameters:
  • by (str) – Can be “year”, “month”, “day”, “weekday”, “hour”, or any two of them connected by a “-”, like “hour-weekday”
  • year (int) – If year is given, only tweets from that year are used; otherwise all downloaded tweets are used
  • loc (list of str) – Time period from loc[0] to loc[1]
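
To make the by argument more concrete, the sketch below shows one way a string such as "hour-weekday" could be split into grouping keys and turned into per-group incident probabilities. It uses only the standard library and is not taken from the library itself: incident_prob_by, the EXTRACTORS table and the (datetime, bool) record layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# One extractor per supported component of `by`.
EXTRACTORS = {
    "year": lambda t: t.year,
    "month": lambda t: t.month,
    "day": lambda t: t.day,
    "weekday": lambda t: t.weekday(),  # Monday is 0
    "hour": lambda t: t.hour,
}


def incident_prob_by(records, by="hour"):
    """Incident probability per group (hypothetical helper).

    records: list of (datetime, bool) pairs, one per time slot (assumed layout).
    by:      one key of EXTRACTORS, or two joined by "-".
    """
    keys = by.split("-")
    if not 1 <= len(keys) <= 2 or any(k not in EXTRACTORS for k in keys):
        raise ValueError(f"unsupported value for by: {by!r}")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, flag in records:
        group = tuple(EXTRACTORS[k](t) for k in keys)
        totals[group] += 1
        hits[group] += bool(flag)
    return {group: hits[group] / totals[group] for group in totals}


# Made-up sample: two Mondays and one Tuesday, all at 8 o'clock.
slots = [
    (datetime(2017, 1, 2, 8), True),
    (datetime(2017, 1, 9, 8), False),
    (datetime(2017, 1, 3, 8), False),
]

by_hour_weekday = incident_prob_by(slots, by="hour-weekday")
print(by_hour_weekday)
```

A two-key result like this maps naturally onto a heat map such as the hour-weekday figure, with one key per axis.
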
process()

Process the downloaded raw data frame (using Paris time zone, identifying incidents, resampling…)
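
The three steps named above (Paris time zone, incident identification, resampling) might look roughly like this. The sketch is an assumption-laden illustration rather than the library's actual processing: the keyword list, the hourly slot size, the (UTC datetime, text) input layout and the sample tweets are all invented for the example, and it relies on the standard zoneinfo module (Python 3.9+) with time zone data available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")
# Assumed keywords; the library's real incident detection may differ.
INCIDENT_KEYWORDS = ("interrompu", "perturbé", "ralenti")


def process_tweets(tweets):
    """Map each hourly slot (Paris time) to True if any tweet in it reports an incident.

    tweets: list of (timezone-aware UTC datetime, text) pairs (assumed layout).
    """
    slots = {}
    for created_utc, text in tweets:
        local = created_utc.astimezone(PARIS)  # handles summer/winter time
        slot = local.replace(minute=0, second=0, microsecond=0)  # hourly resampling
        is_incident = any(k in text.lower() for k in INCIDENT_KEYWORDS)
        slots[slot] = slots.get(slot, False) or is_incident
    return slots


# Made-up sample tweets, timestamps in UTC.
tweets = [
    (datetime(2017, 1, 2, 7, 15, tzinfo=timezone.utc), "Ligne 1 : trafic interrompu"),
    (datetime(2017, 1, 2, 7, 50, tzinfo=timezone.utc), "Ligne 1 : trafic régulier"),
    (datetime(2017, 7, 3, 7, 5, tzinfo=timezone.utc), "Ligne 1 : trafic régulier"),
]

hourly = process_tweets(tweets)
for slot, flag in hourly.items():
    print(slot.isoformat(), flag)
```

Converting to Paris time before grouping matters because the same UTC hour falls in a different local hour in winter and in summer.
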
