Protecting Privacy in Archived Instrumented Vehicle Data

Investigators:

Dr. Randall Guensler, Professor, Georgia Tech
Vetri Elango, Research Engineer I, Georgia Tech

Project Overview:

The Commute Atlanta research project is one of the most comprehensive instrumented vehicle projects deployed in the US. The dataset from this project includes second-by-second speed and position data for more than 1.8 million trips over a three-year period. The dataset also includes household demographics, vehicle data, and trip purpose data. Given the richness of the dataset and the need to protect participant anonymity, researchers are developing processed instrumented vehicle data sets that retain the detailed trip information but eliminate the details about the trip ends to prevent identification of the home location.

The precision, resolution, and level of detail in the processed instrumented vehicle data sets with respect to departure time, trip distance, and trip duration are very high. For each repeat trip, the travel distances are very accurate. Given repetitive travel data and detailed trip summary information, it is therefore possible to triangulate and resolve the home location, because there are only a limited number of reasonable paths between any origin or destination in the network and the participants’ home or work locations. During the first phase of this research, the team is working to demonstrate that it is possible to identify the physical locations of Commute Atlanta households using only these travel summary data (no second-by-second data) by applying advanced GIS techniques. The team will then evaluate new methods to post-process the data and effectively “anonymize” the home and work locations. The results from this project will help identify the maximum level of detail that could be provided in instrumented vehicle data for public release.

The research team has developed the first iteration of a new methodology to identify household locations using a combination of trip summary data and land use data. The initial routines process the travel data using the underlying GIS network and assign the actual origin and destination coordinates collected by the onboard equipment to four geographic levels: census tract, TAZ, census block group, and census block. The geographic grouping will be used later in the study to examine whether assignment to the higher-level census tract or TAZ is enough to protect the coordinates of the home location. The underlying GIS road network is then overlaid to identify all intersections that fall within a specified geographic zone. A variety of GIS tools are then used in an iterative process to identify potential routes and, ultimately, the most likely intersections that represent the origin-destination pair for repetitive trips made by the household. The final step is to identify the actual household location by analyzing the land use characteristics around the selected intersection, along with other heuristic techniques. (GTI/UTC Project 09-02, Value Pricing Data Analysis of HOV Lane Conversion)
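
The intersection-ranking step described above can be illustrated with a minimal sketch. This is not the research team's GIS implementation; it only assumes that each repeat trip has a known far end and an accurately recorded distance, and it ranks candidate intersections in a zone by how well their shortest-path network distances match those recorded distances. The toy road network, the node names (A, B, C, W, S), and the function names are all hypothetical.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def rank_candidates(graph, candidates, trips):
    """Rank candidate intersections by mean absolute distance error.

    trips is a list of (known_end_node, recorded_distance) pairs taken
    from the trip summary data. A lower score means a better match.
    """
    scores = []
    for candidate in candidates:
        dist = shortest_distances(graph, candidate)
        errors = [abs(dist.get(end, float("inf")) - recorded)
                  for end, recorded in trips]
        scores.append((sum(errors) / len(errors), candidate))
    return sorted(scores)

# Hypothetical toy network: link lengths in miles, undirected.
graph = {
    "A": {"B": 1.0, "W": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "S": 2.0},
    "W": {"A": 5.0},  # workplace
    "S": {"C": 2.0},  # store
}
# Hypothetical repeat trips from the unknown home intersection.
trips = [("W", 6.0), ("S", 3.0)]

ranked = rank_candidates(graph, ["A", "B", "C"], trips)
print(ranked[0][1])  # best-matching candidate intersection
```

In this toy case only intersection B is consistent with both recorded distances, which shows why a handful of accurate repeat-trip distances can be enough to narrow the home location to a single intersection.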

The results from this study will help the research team identify the maximum geographic resolution for origin and destination locations that should be included in processed instrumented vehicle data sets (e.g., tying a trip to a single latitude and longitude for a census tract, TAZ, or census block group). In addition, the team hopes to assess the number of repetitive trips and the level of trip summary detail (departure time, distance, and duration) that are reasonable to release from instrumented vehicle studies while still ensuring that third-party analysts cannot post-process the data and infer the physical home location of a participant. Protecting participant privacy is the cornerstone of instrumented vehicle sampling and will help maximize volunteer participation in future studies.
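
As a rough illustration of tying a trip end to a single latitude and longitude for a zone, the sketch below replaces each recorded trip-end coordinate with the centroid of its assigned zone. The zone ID, centroid values, and type and function names are hypothetical and chosen only for illustration; the study itself is intended to determine which geographic level, if any, is coarse enough.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TripEnd:
    lat: float
    lon: float
    zone_id: str  # e.g. a census tract or TAZ identifier

# Hypothetical lookup: one representative point per zone.
CENTROIDS = {"T1": (33.75, -84.39)}

def coarsen_trip_end(end, centroids):
    """Return a copy of the trip end with the zone centroid as its coordinates."""
    lat, lon = centroids[end.zone_id]
    return replace(end, lat=lat, lon=lon)

# Two different addresses in the same zone become indistinguishable.
home_a = coarsen_trip_end(TripEnd(33.7712, -84.3921, "T1"), CENTROIDS)
home_b = coarsen_trip_end(TripEnd(33.7388, -84.4015, "T1"), CENTROIDS)
print(home_a == home_b)
```

Coarsening the coordinates alone may not be sufficient: as described earlier, accurate trip distances and departure times for repeat trips can still narrow down the home location, so the summary fields would likely need separate treatment.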

Sponsored by Georgia Department of Transportation