Openly available dataset - Synthetic population of the city of Tallinn
Updated: Feb 28, 2022
In this post we share our work on modeling a statistically sound synthetic equivalent of the Tallinn population. The dataset contains roughly 400,000 individuals (reference year 2015) described by variables such as income, household size and workplace address.

The dataset can be found at https://github.com/Angelo3452/Tallinn-Synthetic-Population.
The work has just been published as a research paper in the ISPRS International Journal of Geo-Information: https://www.mdpi.com/2220-9964/11/2/148
We are using the dataset to build an agent-based model. Other research directions we would be happy to collaborate on are listed in the related poster at https://www.finestcentre.eu/forumposters
Abstract:
Agent-based modeling has the potential to deal with the ever-growing complexity of transport systems, including future disruptive mobility technologies and services such as automated driving, Mobility as a Service, and micromobility. Although various software tools dedicated to the simulation of disaggregate travel demand have emerged, the amount of input data they need, in particular the characteristics of a synthetic population, is large and not commonly available due to legitimate privacy concerns. In this paper, a methodology to spatially assign a synthetic population by exploiting only publicly available aggregate data is proposed, providing a systematic approach for an efficient treatment of the data needed for activity-based demand generation. The assignment of workplaces exploits aggregate statistics for economic activities and land use classifications to properly frame origin and destination dynamics. The methodology is validated in a case study for the city of Tallinn, Estonia, and the results show that, even with very limited data, the assignment produces reliable results up to a 500 × 500 m resolution, with an error at district level generally around 5%. Both the tools needed for spatial assignment and the resulting dataset are available as open source, so that they may be exploited by fellow researchers.
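To make the abstract's terms a little more concrete, here is a minimal Python sketch of three generic building blocks: snapping a location to a 500 × 500 m grid cell, picking a workplace zone with probability proportional to aggregate job counts for a worker's economic sector, and computing a per-district percentage error against reference totals. This is not the authors' published tooling and ignores the land use classifications the paper also uses; the function names (`grid_cell`, `assign_workplace`, `district_errors`), the zone and sector labels, and all counts are hypothetical, and coordinates are assumed to be in a projected system measured in metres.

```python
from __future__ import annotations

import random
from collections import Counter

CELL_SIZE_M = 500  # grid resolution quoted in the abstract


def grid_cell(x_m: float, y_m: float, size: float = CELL_SIZE_M) -> tuple[int, int]:
    """Index of the square grid cell containing a point (coordinates in metres)."""
    return int(x_m // size), int(y_m // size)


def assign_workplace(sector: str,
                     jobs_by_zone: dict[str, dict[str, int]],
                     rng: random.Random) -> str:
    """Hypothetical helper: pick a workplace zone with probability proportional
    to the zone's aggregate job count in the worker's economic sector."""
    zones = list(jobs_by_zone)
    weights = [jobs_by_zone[z].get(sector, 0) for z in zones]
    if sum(weights) == 0:
        raise ValueError(f"no jobs recorded for sector {sector!r}")
    # Weighted random sampling; zones with zero jobs in this sector are never chosen.
    return rng.choices(zones, weights=weights, k=1)[0]


def district_errors(synthetic: list[str], reference: dict[str, int]) -> dict[str, float]:
    """Absolute relative error (%) of synthetic counts per district vs reference totals."""
    counts = Counter(synthetic)  # missing districts count as 0
    return {d: 100.0 * abs(counts[d] - ref) / ref for d, ref in reference.items()}


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    synthetic = ["Kesklinn"] * 105 + ["Lasnamäe"] * 190
    reference = {"Kesklinn": 100, "Lasnamäe": 200}
    print(district_errors(synthetic, reference))  # {'Kesklinn': 5.0, 'Lasnamäe': 5.0}
    print(grid_cell(1250.0, 499.9))               # (2, 0)
```

Proportional sampling is only one simple way to turn aggregate employment statistics into individual workplace locations; the paper's actual assignment procedure and its exact error definition should be taken from the published article and the repository.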