Dealing with Time in Geospatial Data
It is easy to forget how important time is when it comes to understanding the world. Change over time feeds most prediction engines, and it is vital for understanding whether something is increasing or decreasing. A variety of tools can handle geospatial time-series data, from Python libraries and R packages to plugins and built-in functionality in ArcGIS and QGIS.
My geospatial time-series work began with exploring the Decennial Census (which is collected every 10 years) for Vermont from 1791 to 2010, gathered from the Vermont Historical Society. Before connecting it to any geographic visualization, I used Python to understand how Vermont’s population changed. When more variables are added, like counties, the conclusion becomes more nuanced: it becomes clear that one county grew far more than any other. The power of Geographic Information Systems (GIS) lies in splitting information into geographic groupings to reveal trends like this. However, time-series data poses unique challenges in GIS. GIS data is defined by points, lines and polygons; when one year is layered on top of another, visualizing the change gets more complicated than it is with flat tabular data. There are two basic approaches:
Each Layer is a New Time Period: Separating each time interval into its own file may be easier to set up, but walking through a time sequence then requires manually hiding and showing layers. Depending on the number of time intervals, keeping track of all those files can also scale poorly. It can also make analysis in languages like R and Python harder, since everything has to be merged back together first.
One Layer with Multiple Time Periods: Including all time periods in a single file makes statistical analysis easier and works with animation plugins in most GIS programs, but the file can be tricky to assemble, and depending on the number of features (polygons, lines, etc.) its size can grow dauntingly large.
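If the data already exists as one table per period (the first approach), it can be folded into the second approach by tagging each table with its period and stacking them. Here is a minimal pandas sketch, assuming each period's attribute table has already been loaded into a DataFrame; the county names, values and the `Year` column name are placeholders, not real census data:

```python
import pandas as pd

# Hypothetical per-year tables standing in for "one layer per time period".
# County names and population values are placeholders, not real figures.
layers = {
    1990: pd.DataFrame({"County": ["A", "B"], "Population": [100, 200]}),
    2000: pd.DataFrame({"County": ["A", "B"], "Population": [110, 260]}),
}

# Tag each table with its period, then stack them into one long table
# so that a single layer carries every time period.
combined = pd.concat(
    [df.assign(Year=year) for year, df in layers.items()],
    ignore_index=True,
)
print(combined)
```

The same `pd.concat` call should also work on GeoDataFrames, in which case every polygon is repeated once per period; that repetition is where the file-size growth of the single-layer approach comes from.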
In this article we are going to explore how to create the second option starting with formatting the data in an optimal way for exploration.
Excel and Google Sheets are common tools for data manipulation, visualization and analysis. These spreadsheets are often shared as reports to show trends, changes and comparisons. Because of this, the data is arranged to be read visually rather than manipulated and analyzed. Time is often arranged into columns (January, February, March, etc.) so someone can scan horizontally and see changes. As a static report this makes sense, but it also has its limitations. Below is an example of observations across three days with several variables:
Understanding how the 10–20 group performed is as simple as deleting the unnecessary columns or pivoting the table and looking at the trend across the columns. However, in a world where data is growing exponentially, this quickly becomes impractical. With three groups (10–20, 20–30, 30–40) recorded every hour for a 30-day month, the number of columns would swell to 2,160 (3 × 30 × 24). At the minute level, there would be a startling 129,600 columns!
For example, I have been compiling some great labor data from the State of Vermont across multiple years and time periods (monthly, quarterly, yearly). The goal is to make datasets that can be analyzed across time and subset as needed. The source files are split by year into separate Excel workbooks at the county and state level. Each workbook has a sheet for each county, with columns for every month of that year, every quarter and the annual total.
10 workbooks by year (2005–2015)
34 Labor Categories
32 columns with monthly, quarterly and annual periods
That is over 609,208 unique observations. While the original format makes perfect sense for viewing month-to-month figures for a single county and year, it is unwieldy for analysis across multiple counties and years. This is the inherent difficulty of scaling, and it is exactly where repeating actions in something like Python becomes incredibly powerful.
Tidy Data
While experimenting with pandas DataFrames in Python, I ran across the concept of tidy data. We spend a huge amount of time cleaning and manipulating data, yet it is rarely clear, let alone formalized, what the ideal structure should be. Tidy data is an attempt to define the most flexible format for a variety of uses (see the “Tidy data” paper at vita.had.co.nz). In a tidy data set, each variable is a column and each observation is a row. For time-enabled data, this means each row represents an observation at a discrete point in time. Below is the same data rearranged, or melted, into a tidy format:
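The melt step can be sketched in pandas. This is a minimal example, not the exact code behind the Vermont labor files: it assumes each county sheet has a `Category` column plus one column per month, and the `Jan`/`Feb`/`Mar` headers, sheet names and numbers are placeholders. `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name, which is the input shape the hypothetical `tidy_workbook` function below expects:

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar"]  # hypothetical month column headers


def tidy_workbook(sheets, year):
    """Melt every county sheet of one workbook into a single tidy table.

    `sheets` maps sheet (county) names to wide DataFrames, the shape that
    pd.read_excel(path, sheet_name=None) returns.
    """
    frames = []
    for county, wide in sheets.items():
        long = wide.melt(
            id_vars=["Category"],  # column that identifies each row
            value_vars=MONTHS,     # month columns folded into rows
            var_name="Month",
            value_name="Employment",
        )
        # Record which sheet and workbook each row came from.
        frames.append(long.assign(County=county, Year=year))
    return pd.concat(frames, ignore_index=True)


# Placeholder values in the wide, report-style layout:
# one row per category, one column per month.
sheets = {
    "County A": pd.DataFrame({"Category": ["Retail", "Manufacturing"],
                              "Jan": [10, 20], "Feb": [11, 21], "Mar": [12, 22]}),
    "County B": pd.DataFrame({"Category": ["Retail", "Manufacturing"],
                              "Jan": [30, 40], "Feb": [31, 41], "Mar": [32, 42]}),
}
tidy = tidy_workbook(sheets, 2005)
print(tidy.head())
```

Calling `tidy_workbook` once per workbook and concatenating the results would give one long table covering every county and year. Quarterly and annual columns are left out here; they could be melted the same way with an extra column recording the period type.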
While this initially looks more complicated and larger, in reality it is much more flexible to the user’s needs. It is indeed longer (for the same 3 months suggested earlier it would be 1,092 rows instead of 4), but it is much easier to explore and, more importantly, to slice as needed.
So What about Time?
Tidy data is a perfect fit for time-series analysis. Because each row is an observation, the data can be aggregated by year, month, day, hour or second. Grouping it by month can show average seasonal employment over the years. See if you can spot where Vermont has its winter sports season… Trends over time can also reveal which industries are growing in Vermont.
GIS and Tidy Data
Working with data frames and time-series data is one thing; using them in a GIS application is quite different. In ArcGIS Pro and QGIS, the most common GIS programs, symbology for polygons (like the shape of a county), lines (like a road) or points generally expects only one feature for a given location. With time series, those features stack on top of one another and obscure each other. But before digging into this, I should go through some of the caveats I have found so far:
Shapefiles, the most common GIS file format, do not support a datetime field type. This means every time the shapefile is loaded, it will need to be re-enabled as a time series layer, much like a CSV file, where everything is stored as strings.
The date format YYYY/MM/DD is the one that works for both ArcGIS and QGIS. Formatting your dates this way up front will save you time.
QGIS requires a plugin called Time Manager to explore time series data while ArcGIS has some built-in functionality.
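The date caveats above, together with the month and year aggregation described earlier, can be sketched in pandas. This is a minimal example on made-up values, and the column names `County`, `Date` and `Employment` are assumptions: dates are parsed from YYYY/MM/DD strings for analysis, then written back out in that same layout before the table goes to a GIS.

```python
import pandas as pd

# Placeholder tidy records; Date strings use the YYYY/MM/DD layout.
tidy = pd.DataFrame({
    "County": ["County A"] * 4,
    "Date": ["2005/01/01", "2005/07/01", "2006/01/01", "2006/07/01"],
    "Employment": [100, 140, 110, 150],
})

# Parse the strings into real datetimes so pandas can group by time parts.
tidy["Date"] = pd.to_datetime(tidy["Date"], format="%Y/%m/%d")

# Average by calendar month across all years (seasonality)...
by_month = tidy.groupby(tidy["Date"].dt.month)["Employment"].mean()
# ...and by year (long-term trend).
by_year = tidy.groupby(tidy["Date"].dt.year)["Employment"].mean()

# Before handing the table to a GIS, write the dates back out as
# YYYY/MM/DD text, since a shapefile will not keep a full datetime.
export = tidy.assign(Date=tidy["Date"].dt.strftime("%Y/%m/%d"))
print(by_month, by_year, export, sep="\n\n")
```

Keeping the parsed datetime column for analysis and producing the string copy only at export time sidesteps the shapefile limitation without losing the ability to aggregate.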
QGIS and Time Manager
QGIS is a free, open-source GIS program. While it has many powerful abilities out of the box, the true power of QGIS is its user-created plugins. For working with time-series data, Time Manager is the perfect place to start. First, add the plugin to QGIS through the Plugin Manager. Once it is installed, a bar with controls will appear at the bottom of the screen (the bar can be toggled through the Plugins menu). Be sure that the datetime fields are refactored to a date format. Fields can be converted to a different type using the Refactor Fields tool under QGIS geoalgorithms > Vector table tools. Timestamps have to be in one of the following formats:
Integer timestamp in seconds after or before the epoch (1970-01-01)
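For the integer option, dates can be converted to epoch seconds before the layer is loaded. Here is a minimal pandas sketch, assuming the dates arrive as YYYY/MM/DD strings; `to_epoch_seconds` is a hypothetical helper, not part of Time Manager. Dates before 1970, such as early census years, come out negative:

```python
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")


def to_epoch_seconds(dates):
    """Convert YYYY/MM/DD strings to integer seconds relative to 1970-01-01.

    Dates before the epoch become negative integers.
    """
    parsed = pd.to_datetime(dates, format="%Y/%m/%d")
    # Dividing a timedelta Series by a one-second Timedelta with // gives ints.
    return (parsed - EPOCH) // pd.Timedelta(seconds=1)


census_years = pd.Series(["1791/01/01", "1970/01/01", "2010/01/01"])
print(to_epoch_seconds(census_years).tolist())
```

pandas' default nanosecond timestamps cover roughly the years 1677 to 2262, so a 1791 census date is still within range.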
The first step toward making an animation is clicking Settings in the Time Manager plugin to add the time-enabled layer (this can be vector or raster). Select the layer to add and specify the columns containing the start and (optionally) end time. If there is only one date, leave the end time unset, as below. The offset option lets you fine-tune when features appear: if you specify an offset of -1, the features will appear one second later than they would by default.
Once the symbology for the variable you are interested in is applied to the layer, all you have to do is hit play! Adjusting the time frame size will display the data better; in my example, the data was collected every 10 years. An export of the animation can be created through the Export Video function, which exports an image series based on the current settings (this button is only enabled if there are layers registered in the Time Manager “Settings”). You can export just the frames, make a video file or make an animated GIF. I will add the caveat that I have yet to get the export to work correctly, so your mileage may vary.
ArcGIS Pro and Time Animation
Making a nice animation in ArcGIS Pro takes multiple steps to enable the required tools. The first is to tell ArcGIS that there are time fields in your shapefiles. Go to the layer and click Properties, which pops up the menu below. Simply set which fields are time fields, as in QGIS, and the Time tools will be enabled. Now that time is enabled, it should show as a tab in the menus at the top. The Time tab has a variety of options depending on your use case. Because my data is in 10-year increments, I set Step Interval to 10 years and Time Snapping to decades to get it to display correctly while cycling through the intervals.
At this point I decided I wanted to make it a bit more interesting, so I converted it to a 3D scene and extruded the population count, while also modifying the symbology colors to better emphasize the changes. Making 3D extrusions is a whole other article, but it is worth noting that I set the unit to Yards to make the differences more obvious and added a multiplier. Remember that at this height I am looking across 400+ miles, so the population differences need to be exaggerated to stand out. The next step is to go to the Share tab and click the Add icon under Animation. This will add an Animation tab at the top.
An animation is a collection of shots, so you will need to start by creating the first keyframe. I recommend manually cycling through time and making sure each step renders completely (with 3D extrusions this can take a little time) before hitting the + button to add a frame. The little circles between the frames contain the transitions; set them depending on how you want it to look, since that can make a huge difference. Now to export the finished product! Select Movie from the Export section of the Animation tab and choose the output format. The time to create the clip depends on the content, the number of frames and the number of features. There are many settings that can be modified in ArcGIS, including adding labels (which never quite worked the way I wanted), changing transitions, frames and much more. Experimentation is the key. My attempts to upload the results to Medium repeatedly failed, but here is what it looks like below.
Vermont in 1820 on the left, and Vermont in 2000 on the right
Summary
There is a lot of power both in exploring data in a time-driven way and in animating it to tell a story. Graphs often won’t have the same impact as watching population grow in 3D. There are also important things to consider. For example, I struggled to set the graduated symbology to base the percentage breaks on the given year instead of the whole dataset (this could hypothetically be solved if each layer were a year, so that is always an option). This matters if the goal is to show the percentage distribution for each time interval. It is also not a fast process in either program, so be sure to account for that when you start, and use the rendering time to step away and drink some coffee. Enjoy your mapping, and share your own explorations in the comments!