Jit Corn

US Accidents Visualization

Jupyter Notebook
Dropbox Link
Github Repo

A geospatial data visualization project on traffic accidents across the US.

Features of the App

1. Users can choose among different display presets for the US Accidents geospatial dataset, using different layers in 2D, 3D or 4D (see Hexbins).

2. The choropleth layer (i.e. a colour-shaded map of the states) gives users a quick overview of the cumulative number of accidents in each state from 2016 to 2020. Accident totals appear to track population, with the more populous states showing the highest counts.

3. The individual layer accurately plots where each accident occurred, so the points line up neatly along roads. Smaller roads are not displayed because the base map is lower resolution.

4. Hexbins give users a 4D representation of the dataset: accidents are aggregated into hexagonal cells defined by their coordinates, and each cell's height corresponds to the cumulative number of accidents inside it.

5. Kepler.gl's time playback lets users watch how the individual accidents and hexbin layers change over time.
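Kepler.gl builds its hexbins internally, so the snippet below is only a minimal sketch of the underlying idea, not Kepler.gl's actual implementation. It assumes points are plain (longitude, latitude) pairs binned on a flat plane with a hypothetical cell size in degrees; `hex_cell`, `hex_center` and `hexbin_counts` are illustrative names. Each point is mapped to the pointy-top hexagon that contains it, and the count per hexagon plays the role of the bar height.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

SQRT3 = math.sqrt(3.0)


def _axial_round(q: float, r: float) -> Tuple[int, int]:
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    s = -q - r  # third cube coordinate; q + r + s == 0
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so q + r + s stays 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)


def hex_cell(x: float, y: float, size: float) -> Tuple[int, int]:
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the centre-to-corner radius of a hexagon, in the same units as x and y.
    """
    q = (SQRT3 / 3.0 * x - y / 3.0) / size
    r = (2.0 / 3.0 * y) / size
    return _axial_round(q, r)


def hex_center(q: int, r: int, size: float) -> Tuple[float, float]:
    """Inverse of hex_cell: the centre point of hexagon (q, r)."""
    return size * SQRT3 * (q + r / 2.0), size * 1.5 * r


def hexbin_counts(points: Iterable[Tuple[float, float]], size: float) -> Counter:
    """Count points per hexagon; each count would drive that hexagon's height."""
    return Counter(hex_cell(lon, lat, size) for lon, lat in points)


if __name__ == "__main__":
    # Hypothetical sample: two nearby points and one far away, 0.5-degree cells.
    sample = [(-118.24, 34.05), (-118.25, 34.06), (-73.99, 40.73)]
    for cell, count in hexbin_counts(sample, size=0.5).items():
        print(cell, hex_center(*cell, 0.5), count)
```

Binning on raw degrees stretches cells away from the equator; projecting the points to a metric coordinate system before binning would keep the cells closer to equal area.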

Technical Learnings

  • This was my first attempt at using pandas/GeoPandas, and I learnt a lot about data manipulation and cleaning with GeoPandas in Jupyter Notebook.
  • Used Google Cloud Platform's AI Notebook for faster loading times.
  • Gained a more in-depth understanding of geospatial data and Geographic Information Systems (GIS), including quirks such as Coordinate Reference Systems (CRS).
  • Learnt more about IPython's magic functions and the importance of saving intermediate results to files, so that work is backed up across the multiple steps of the ETL process.
  • Dug into the libraries that GeoPandas and geoplot depend on when fixing bugs (e.g. shapely is a dependency of GeoPandas). For a more detailed write-up, refer to the Jupyter Notebook linked above.
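As a small illustration of why CRS matters, the sketch below converts WGS 84 longitude/latitude (EPSG:4326) into spherical Web Mercator metres (EPSG:3857), the projection used by most web base maps. It assumes the raw points are WGS 84 longitude/latitude; in GeoPandas the same conversion is normally done with `GeoDataFrame.to_crs(epsg=3857)` rather than by hand, and `to_web_mercator` here is a hypothetical helper.

```python
import math
from typing import Tuple

# WGS 84 semi-major axis in metres; EPSG:3857 treats the Earth as a sphere of this radius.
EARTH_RADIUS_M = 6378137.0
# Latitude at which Web Mercator's y reaches pi * R (about 85.0511 degrees); the map is cut off beyond it.
MAX_LAT_DEG = math.degrees(math.atan(math.sinh(math.pi)))


def to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Project WGS 84 lon/lat in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857)."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if abs(lat) > MAX_LAT_DEG:
        raise ValueError(f"latitude outside Web Mercator bounds: {lat}")
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


if __name__ == "__main__":
    # Hypothetical point near downtown Los Angeles.
    print(to_web_mercator(-118.24, 34.05))
```

Latitudes beyond roughly ±85.05° are rejected because Mercator sends the poles to infinity; that cutoff is what makes the projected world square.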

Reflections

  • While the framework of this project was similar to tutorials online, finding a legitimate geospatial dataset (i.e. one including longitude and latitude data) was immensely difficult. Good geospatial data is hard to come by.
  • Magic functions saved a tremendous amount of time once intermediate results could be saved in the kernel for later use. Because I was inexperienced at first, I kept re-running the code, which wasted a lot of time.
  • Most of the work was done with Kepler.gl's Python implementation, which is less mature than its JavaScript implementation. Even so, it gave me a good grasp of performing the necessary ETL operations on the raw US Accidents dataset before loading it into Kepler.gl.
  • Kepler.gl is highly memory-intensive, so it is best to feed it only the necessary columns.
  • Troubles with deployment: I also tried to deploy a Flask implementation to Heroku, but it kept exceeding the free dyno's memory limit. Since the map can simply be viewed as an embedded HTML file, the interim measure is to host that file on Dropbox for the public to view. In the future, I will explore using the Dropbox OAuth method in the JavaScript implementation.
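Following the memory advice above, here is a minimal standard-library sketch of pruning a raw CSV down to only the columns a map needs and saving the result as an intermediate file that later notebook steps can reload instead of recomputing. The helper name `prune_columns` and the column names in the example are hypothetical; with pandas, the equivalent is `pd.read_csv(path, usecols=[...])` followed by `DataFrame.to_csv`.

```python
import csv
from typing import Iterable


def prune_columns(src_path: str, dst_path: str, keep: Iterable[str]) -> int:
    """Copy only the `keep` columns of a CSV into a new intermediate CSV.

    Returns the number of data rows written. Raises KeyError before creating
    the output file if a requested column is missing, so a typo fails loudly.
    """
    keep = list(keep)
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        missing = [c for c in keep if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in {src_path}: {missing}")
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" drops every column that is not in `keep`.
            writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
            writer.writeheader()
            rows = 0
            # Stream row by row so the raw file is never held in memory all at once.
            for row in reader:
                writer.writerow(row)
                rows += 1
    return rows
```

For example, `prune_columns("US_Accidents.csv", "accidents_slim.csv", ["Start_Lat", "Start_Lng", "Start_Time"])` (hypothetical file and column names) writes a slimmer file that can be loaded for mapping or reloaded in a later session.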

Technologies Used

  • Data Manipulation & ETL: Python, GeoPandas
  • 2D Visualization: geoplot
  • 3D and 4D Visualization: Kepler.gl
  • Misc: Google Cloud Platform: AI Notebook, Jupyter Notebook