You're a Principal Data Scientist working on the Open Location Platform (OLP) Customer Solutions Team at HERE, based in Chicago. What do you do?
I lead a team working on multiple data science projects, including customer data analytics, location algorithms, and data science tutorials for new OLP users.
Tell me about yourself.
I am a physicist by training. Before HERE, I spent a lot of time doing statistical analysis of data collected by large physics experiments. The data accumulated over years and amounted to many terabytes. It was processed using thousands of CPUs running for weeks, at a time when Hadoop and Spark did not exist and the term “Data Science” had not yet been coined (failed jobs had to be resubmitted manually, multiple times; that was the not-so-fun part). During those years I developed a passion for squeezing information out of large and messy data samples using statistics and machine learning.
I joined HERE in 2012 as part of the Traffic organization. In 2014, we joined the Auto Cloud Services organization, where we worked on clustering of road objects, historical speed profiles [2,3] and personalized recommendations [4,5].
Most recently, our focus has been on helping our partners solve their analytics problems on the Open Location Platform (OLP). We are working on analytics tutorials for new OLP customers and on location algorithms (such as distributed clustering and most probable path), and we provide data science support for various projects, including OLP weather and vehicle sensor analytics.
What’s a specific problem or class of problems you’re trying to solve with data science?
One problem that we have been working on, in both Auto Cloud Services and now in OLP, is clustering of observations from connected vehicles. As a vehicle drives along the road network, its sensors identify various road objects and events, such as traffic signs, markings on the pavement, poles on the side of the road, potholes, speed bumps, slippery areas, accidents and many others.
All these observations can be used to update our map and road conditions in real time. This source of data is complementary to the data collected by the HERE True cars. The HERE True car data is of very high quality, but it’s generated by only a few hundred vehicles, which cannot capture the road conditions and changes all over the world in real time.
The crowdsourced data from OEMs is collected from millions of connected vehicles, which constantly scan the roads and provide us with real-time updates. However, crowdsourced data is never of the same quality as HERE True data.
To deal with this lower quality, a safer option than updating the map from single observations is to cluster together many observations of the same object or event, coming from multiple vehicles. The more observations in a cluster, the better the position estimate and the higher the confidence that the object or event really exists.
Clustering methods can be applied not only to vehicle observations but to almost any set of location points associated with a given geographic area. For example, one can cluster the GPS positions associated with a point of interest to position that point more accurately on the map.
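To make “more observations, better position estimate” concrete, here is a minimal sketch, not the team’s actual method. It assumes observations have already been projected into a local metric frame (metres), uses simple single-linkage grouping with a hypothetical 5 m radius as a stand-in for whatever clustering algorithm is used in production, and assumes independent positional noise with a hypothetical standard deviation of 3 m per observation. Under that assumption, the uncertainty of a cluster centroid shrinks as σ/√n.

```python
import math
from statistics import mean


def cluster_observations(points, radius_m=5.0):
    """Single-linkage grouping: points chained within radius_m share a cluster.

    points: list of (x, y) tuples in a local metric frame (metres).
    radius_m: hypothetical linking distance, chosen for illustration only.
    Returns a list of clusters, each a list of indices into points.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over observation indices

    def find(i):
        # Root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise scan: fine for a sketch, far too slow for millions of
    # observations, where a spatial index or distributed approach is needed.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius_m:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def summarize(points, members, sigma_m=3.0):
    """Return (centroid, standard error of centroid, cluster size).

    Assumes each observation has independent noise with standard deviation
    sigma_m (a hypothetical value), so the centroid error is sigma_m / sqrt(n).
    """
    xs = [points[i][0] for i in members]
    ys = [points[i][1] for i in members]
    n = len(members)
    return (mean(xs), mean(ys)), sigma_m / math.sqrt(n), n


obs = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0), (0.5, -1.0), (40.0, 40.0)]
for members in cluster_observations(obs):
    centroid, stderr, n = summarize(obs, members)
    print(n, centroid, round(stderr, 2))
```

In this toy input the four nearby observations form one cluster whose centroid uncertainty is half the per-observation σ, while the distant point stays a singleton with the full σ, so it would warrant much less confidence. The same grouping could be applied to GPS fixes around a point of interest.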
What have been the biggest challenges?
One of the biggest challenges is collecting large-scale “ground truth” or “labeled” data, which we need both to validate and optimize our algorithms and for any kind of supervised machine learning analysis. It is a laborious and costly process that we, as a company, are continuously trying to improve.
What have been the biggest successes?
One of our successes was a proof of concept (PoC) in which we showed that vehicle sensor data can be used to identify and precisely localize traffic signs, and by extension other road objects and events, using unsupervised learning techniques such as clustering.
We have successfully applied similar techniques to localize road surface anomalies (such as potholes and speed bumps) from sensor data, which demonstrates the feasibility of a potential “Road Roughness” product based on OEM sensor data.
We have also built the first high-resolution speed profiles using both HERE probe data and OEM sensor data [2,3]. We expect speed profiles to play an important role in the future of autonomous navigation.
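The interview does not define a speed profile, so the sketch below works under an explicit assumption: that a speed profile summarizes typical driving speed as a function of distance along a road segment. It bins probe samples by their offset from the start of the segment and takes the median speed per bin. The function name, the 10 m bin width, and the input format are hypothetical, and this is not the method from the cited papers.

```python
import math
from statistics import median


def speed_profile(samples, segment_length_m, bin_m=10.0):
    """Median observed speed per distance bin along one road segment.

    samples: iterable of (offset_m, speed_kmh) pairs, where offset_m is the
        distance of a probe point from the start of the segment (assumed
        already map-matched to this segment).
    segment_length_m: length of the segment in metres.
    bin_m: hypothetical bin width; smaller bins mean higher resolution but
        fewer samples per bin.
    Returns one entry per bin: the median speed, or None for an empty bin.
    """
    n_bins = max(1, math.ceil(segment_length_m / bin_m))
    bins = [[] for _ in range(n_bins)]
    for offset_m, speed_kmh in samples:
        if 0.0 <= offset_m <= segment_length_m:
            # A sample exactly at the segment end falls into the last bin.
            idx = min(int(offset_m // bin_m), n_bins - 1)
            bins[idx].append(speed_kmh)
    return [median(b) if b else None for b in bins]


samples = [(2, 48), (5, 52), (8, 50), (14, 40), (17, 36), (25, 30)]
print(speed_profile(samples, 30.0))
```

The median is used here rather than the mean because it is less sensitive to atypical samples such as vehicles stopped at the roadside; a real profile would also need to handle time of day, direction of travel, and bins with too few samples.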
Do you see any other applications for what you’ve been doing with data science inside or outside of HERE?
Take speed profiles, for example. I can see a time in the future when all vehicles on the road communicate with their neighbors and drive as synchronized platoons. Most likely, there will be an optimal speed profile for each platoon, on each road and for the corresponding road conditions. We will be the providers of these optimal speed profiles.
Another example is clustering. Observations of road changes from connected vehicles have to be clustered continuously to confirm these changes and localize them properly. This will enable continuous, real-time updates of the map and road conditions, improving the driving experience.
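Continuous clustering can be sketched as an online “leader” scheme: each incoming observation joins the nearest existing cluster if it is close enough, otherwise it starts a new one, and a cluster counts as a confirmed change once it has collected enough observations. This is an illustrative stand-in rather than HERE’s algorithm; the class name, the 5 m radius, and the three-observation confirmation threshold are all hypothetical, and coordinates are assumed to be in a local metric frame.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    x: float        # running mean of x (metres, local frame)
    y: float        # running mean of y (metres, local frame)
    count: int = 1  # number of observations assigned so far


class OnlineClusterer:
    """Incrementally clusters a stream of observations (hypothetical sketch).

    radius_m: max distance from a centroid for an observation to join it.
    min_obs: observations needed before a cluster is treated as confirmed.
    """

    def __init__(self, radius_m=5.0, min_obs=3):
        self.radius_m = radius_m
        self.min_obs = min_obs
        self.clusters = []

    def add(self, x, y):
        # Linear scan for the nearest centroid; a spatial index would be
        # needed at the scale of millions of vehicles.
        best, best_d = None, float("inf")
        for c in self.clusters:
            d = math.hypot(c.x - x, c.y - y)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.radius_m:
            # Incremental mean: new_mean = old_mean + (value - old_mean) / n
            best.count += 1
            best.x += (x - best.x) / best.count
            best.y += (y - best.y) / best.count
            return best
        new = Cluster(x, y)
        self.clusters.append(new)
        return new

    def confirmed(self):
        # Clusters with enough independent sightings to trust the change.
        return [c for c in self.clusters if c.count >= self.min_obs]


oc = OnlineClusterer(radius_m=5.0, min_obs=3)
for obs in [(0.0, 0.0), (1.0, 1.0), (100.0, 0.0), (-1.0, 2.0)]:
    oc.add(*obs)
print(oc.confirmed())
```

A single pass like this is cheap enough to run as observations arrive, but the result depends on arrival order and centroids can drift, so a real system would likely re-cluster periodically or merge nearby clusters.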
What resources would you recommend for someone just starting out with data science?
One of the introductory books I liked was An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. Hastie and Tibshirani also co-wrote a more advanced text, The Elements of Statistical Learning, and offer an online class based on these books. In addition, I would take Andrew Ng’s Machine Learning class on Coursera. Another good starting point is https://www.kaggle.com/.
How do you stay on top of developments in data science?
There are so many developments that I am afraid it’s hard to stay on top of them all. I try to browse the latest papers on https://arxiv.org/list/cs.AI/recent and https://arxiv.org/list/stat.ML/recent, but I have to admit that my to-read list is growing beyond control. Within our team, we find interesting datasets, news and FAQs on Kaggle and KDnuggets.
Learn more about Gavril and connect with him here.
[1] A. Anastassov, D. Jang, G. Giurgiu, “Analysis of OEM sensor data for determination of speed sign placement on road maps”, 22nd ITS World Congress, Bordeaux, 2015
[2] D. Jang, G. Giurgiu, “Human driving behavior”, 22nd ITS World Congress, Bordeaux, 2015
[3] A. Anastassov, D. Jang, G. Giurgiu, “Driving speed profiles”, IEEE Intelligent Vehicles, 2017
[4] Y. Zhao, D. Ayala, D. Jang, G. Giurgiu, “Smart recommendations for drivers”, 22nd ITS World Congress, Bordeaux, 2015
[5] J. Thompson, Y. Zhao, D. Jang, G. Giurgiu, “Feature selection in a personalized refueling recommendation system”, Transportation Research Board, 2016