A Shot in the Dark: Analyzing Mass Shootings in the United States, 2014-2019

By: Miranda Ramnarayan

Geovis Project Assignment @RyersonGeo, SA8905, Fall 2019

The data gathered for this project was downloaded from the Gun Violence Archive (https://www.gunviolencearchive.org/), which is a non-for Profit Corporation. The other dataset is the political affiliation per state, gathered by scrapping this information from (https://www.usa.gov/election-results). Since both of these datasets contain a “State Name” column, an inner join will be conducted to allow the two datasets to “talk” to each other.

The first step is importing your excel files, and setting up that inner join.

There are four main components this dashboard is made of: States with Mass Shootings, States with Highest Death Count, Total Individuals Injured from Mass Shootings and a scattergram displaying the amount of individuals injured and killed. All of these components were created in Tableau Worksheets and then combined on a Dashboard upon completion. The following are steps on how to re-create each Worksheet. 

1. States with Mass Shootings

In order to create a map in Tableau, very basic geographic information is needed. In this case, drag and drop the “State” attribute under the “Dimensions” column into the empty frame. This will be the result:

In order to change the symbology from dots to polygons, select “Map” under the Marks section.

To assign the states with their correct political affiliation, simply drag and drop the associated year you want into the “Colour” box under Marks.

This map is displaying the states that have had mass shootings within them, from 2014 to 2019. In order to automatic this, simply drag and drop the “Incident Date” attribute under Pages. The custom date page has been selected as “Month / Year” since the data set is so large.

This map is now complete and when you press the play button displayed in the right side of this window, the map will change as it only displays states that have mass shootings within them for that month and year.

2. States with Highest Death Count

This is an automated chart that shows the Democratic and Republican state that has the highest amount of individuals killed from mass shootings, as the map with mass shootings above it runs through its time series. Dragging and dropping “State” into the Text box under Marks will display all the states within the data set. Dragging and dropping the desired year into Colour under Marks will assign each state with its political party.

 In order for this worksheet to display the state with the highest kill count, the following calculations have to be made once you drag and drop the “# Killed” from Measures into Marks.

To link this count to each state, filter “State” to only display the one that has the maximum count for those killed.

This will automatically place “State” under Filters.

Drag and drop “Incident Date” into Pages and set the filter to Month / Year, matching the format from section 1.

Format your title and font size. The result will look like:

3. Total Individuals Injured from Mass Shootings

In terms of behind the scenes editing, this graph is the easiest to replicate.

Making sure that “State Name” is above “2016” in this frame is very important, since this is telling Tableau to display each state individually in the bar graph, per year.

4. Scattergram

This graph displays the amount of individuals killed and injured per month / year. This graph is linked to section 1 and section 2, since the “Incident Date” under Pages is set to the same format. Dragging and dropping “SUM (#Killed)” into Rows and SUM (#Injured) into Columns will set the structure for the graph.

In order for the dot to display the sum of individuals killed and injured, drag and drop “# Killed” into Filter and the following prompt will appear. Select “Sum” and repeat this process for “# Injured”.

Drag and drop “Incident Date” and format the date to match Section 1 and 2. This will be your output.

Dashboard Assembly

This is where Tableau allows you to be as customizable as you want. Launching a new Dashboard frame will allow you to drag and drop your worksheets into the frame. Borders, images and text boxes can be added at this point. From here, you can re-arrange/resize and adjust your inserted workbooks to make sure formatting is to your desire.  

Right clicking on the map on the dashboard and selecting “Highlight” will enable an interactive feature on the dashboard. In this case, users will be able to select a state of interest, and it will highlight that state across all workbooks on your dashboard. This will also highlight the selected state on the map, “muting” other states and only displaying that state when it fits the requirements based on the calculations set up prior.

Since all the Pages were all set to “Month/Year”, once you press “play” on the States with Mass Shootings map, the rest of the dashboard will adjust to display the filtered information.

It should be noted that Tableau does not allow the user to change the projection of any maps produced, resulting in a lack of projection customization. The final dashboard looks like this:

Transportation Flow Mapping Using R

Transportation Flows Mapping Using R

The geographic visualization of data using programming languages, and specifically R, has seen a substantial upsurge in adoption and popularity among members of the GIS and data analytics community in recent years. While the learning curve in acquainting oneself with scripting techniques might be steeper than using more traditional and out of box GIS applications, it undoubtedly provides some other benefits such as building customizable processes and handling complex spatial analysis operations. The latter point being imperative for projects containing extensive amounts of data as is often the case with transportation and commuting flows which ordinarily contain considerable amount of records comprising of trips’ origins and destinations, mode of transport and travel times information. An added interesting perk is that R offers very creative and visually appealing finalized graphical solutions which were one of the motivators behind the choice of technique for this project. The primary motivator was, however, the program’s capacity in transportation data modelling and mapping as the aim of the project was mapping commuting flows.

Story of R

R is an open source software environment and language for statistical computing and graphics. It is highly extensible which makes it particularly useful to researchers from varied academic and professional fields (they increasingly range from social science, biology and engineering to finance and energy sectors and multifold other fields in between). It is also one of the most rapidly growing software programs in the world, most likely due to the expansion of data science. In the context of Geographic Information Systems (GIS), it can be described as a powerful command-line system comprised of a range of tailored packages, each of them offering different and additional components for handling and analyzing spatial data. The ones utilized in the project were ggplot2, and maptools, and to lesser extent plyr. The former two are some of the most common ones in the R geospatial community while the others encountered in research and worth exploring further were: leaflet and mapview for interactive maps; shiny for web applications; and ggmap, sp and sf for general GIS capabilities. Being an open source software, R community is very helpful in organizing and locating necessary information. One neat option is the readily available cheat sheets for many of the packages (i.e. ggplot cheat sheet) which make finding information genuinely fast.

There are some stunning examples of data visualization in R. One that made a significant media splash a few years ago was done by Paul Butler, a mathematics student at University of Toronto at the time, who plotted social media friendship connections (it created admiration as well as disbelief from many, according to an author, that this was done with less than 150 lines of code in an “old dusty” statistical software such as R). It also inspired further data visualization explorations using R. One of my favorite recent such works came in the form of a compelling book London – The Information Capital by geographer James Cheshire and its co-author designer Oliver Uberti. The majority of the examples in the book were predominantly written not only in R but specifically in its ggplot package, in combination with graphic design applications, and should serve as innovative illustrations on data visualization approaches as well as capabilities on what software could potentially provide. Both of the aforementioned projects inspired mine.

Transportation Mapping and Modelling

I would like to give some background on the type of analysis that was conducted. One of the common types of analysis in transportation geography, transportation planning and transportation engineering is geographic analysis of transport systems for origin-destination data that shows how many people travel (or could potentially travel) between places. This also represents the basic unit of analysis in most transport models which is the trip (single purpose journeys from an origin “A” to origin “B”, and not to be mistaken with Timothy Leary definition). Trips are often grouped by transport mode or number of people travelling, and are represented as desire lines connecting zone centroids (desire lines are straight and closest possible lines between origin – destination points, and can be converted to routes). They do not necessarily need to represent just movement of the people and can show commodity flows and retail trade as well. TransCAD software is often used as the industry standard for this type of modelling. It is, however, quite costly and implemented solely by transportation planning firms and agencies. On the other hand, R is starting to see dedicated transportation planning packages and continuously utilizing relevant GIS ones in transportation field. And most importantly: it’s free.

Data

The dataset implemented for the project was American Community Survey 2009-2013 – 5 Year American Community Survey Commuting Flows located via Inter-University Consortium for Political and Social Research. It is a survey for the entire United States focusing on people’s (over the working age of 16) journeys to work. Data in the original survey was tabulated based on a few categories: means of transportation to work, private vehicle occupancy, time leaving home to go to work, travel and aggregated travel time to work, etc. For the purposes of the project all workers in commuting flows were selected (grouped together for all transportation modes). The trips were based on inner and inter-county commutes.

There are two main components needed when mapping transportation flows in general: coordinates of place of origin, and coordinates of place of destination. Common practice in transportation planning field is to have population weighted centroids for origins and destinations, regardless of the geographic unit of analysis, which in this case was U.S. counties. Therefore population weighted centroid shapefile for U.S. counties was needed so that it can be merged with the original survey data. It was located at the U.S. Census Bureau website and based on 2010 U.S. Census population numbers and distributions per county areas. The study area for the project was the United States and it excluded Canada and Mexico (even though both countries were included for workplace-based geographies), because specific regions of both countries were not mentioned which would make calculations of population weighted centroids not very realistic. Additionally, these records were not numerous to significantly change the model.

Process

In the first step, data was loaded and reformatted in R (R can be downloaded from https://www.r-project.org/ and although analysis can be conducted in R directly it is much preferred and easier to use Rstudio which provides a user-friendly-graphical interface). Rstudio interface and snippet of code is displayed in Figure 1 below (Rstudio can be downloaded from https://www.rstudio.com/ ).

Figure 1: Rstudio interface and snippet of code in the project

Following the two datasets, original commuting survey and population weighted centroids, were joined based on county name and code, and then the unified file was subset to exclude Canada and Mexico, followed by renaming some columns fields for easier readings of origin and destination coordinates. In the next step, ggplot2 was used to position scales for continuous data for x and y axes, succeeded by plotting line segments with alpha command. Number of trips to be plotted were experimented with to show either all trips, or to filter them based on more than 5, 10, 15, 20, 25 and 50 trips. Showing all trips resulted in too dense of a plot as all of the United States was used as a study area. If the study area was of a large scale in nature, showing all trips would be acceptable. The optimal results seemed to be when trips were filtered to show over 10 inner and inter county journeys-to-work trips which resulted in the plot displayed in Figure 2.

 

Figure 2 – U.S. origin-destination plot in Rstudio_US_11x17_10_Nebojsa_Stulic

 

The final map was then graphically improved in Adobe Creative Suite resulting in image in Figure 3.

Figure 3: Final mapping project after graphical improvements

Map

The final design showing thousands of commuting trips resembled a NASA image of United States from space at night. It indicated some predictable commuting patterns such as increased journey-to-work lines concentration in large urban centres and in areas with large population densities, such as the North East part of the country. However, some patterns were not so obvious and required some further digging into data accuracy (which passed the test) and then the way in which the original survey was designed. For instance, there are lines from Honolulu, Anchorage and Puerto Rico to the mainland even though the survey was designed to represent daily commuting flows by car, truck, or van; public transport, and other means of commuting. The survey was designed to ask questions for all workers based on primary and secondary jobs by way of commuting for respective reference week when it was conducted and answered. These uncommon results were attributable to people who worked during the reference week at a location that was different from their home (or usual place of work), such as people away from home on business. Therefore place-of-work data showed some interesting geographic patterns of workers who made daily work trips to different parts of the country (e.g., workers who lived in New York and worked in California).

The final mapping product was printed and framed on 24” x 36” canvas as shown in Figure 4. Size was chosen based on aspect ratio of 2 to 3 which seemed best suited to represent the geography of the United States horizontal width and vertical length. Some other options would be to print on acrylic or aluminum which is less cost effective and more time consuming (most of the shops require around 10 days to complete it). However, the printed map on canvas was my preferred choice for this project based on the aesthetic I was aiming for which was to have the appearance of accentuated high commuting areas and dimmed low commuting areas. Another pleasant surprise was that when printing was finalized it manifested more as a painting than data visualization transportation project.

Figure 4 – Printed map on canvas_Nebojsa_Stulic

Figure 4: Printed map on canvas