The Urbanite project will provide technical support to decision-makers in the domain of urban mobility. This support takes the form of predictive models constructed by specialized AI algorithms. To ensure that the models provide an appropriate output, a suitable data basis is essential.
Most of this data is provided by the participating pilot partners. These are the cities of Amsterdam (Netherlands), Bilbao (Spain), Helsinki (Finland) and Messina (Italy). As the data comes from such different partners - which naturally have their own systems for collecting mobility data - it must be assumed that the provided data is very heterogeneous and therefore has to be pre-processed. After all, comparable information is needed for the AI processes that follow.
The challenge is to identify the relevant data, find ways to aggregate or curate them and make them available for further use. The first step is to study the data sources. This is where the first problems and challenges for later data management become visible.
In the context of the Urbanite project, urban mobility means spatial mobility in a city environment: the movement of people or commodities in a specific geographic area. People commute to work every day, socialize or go shopping.
A set of key indicators can be used to measure urban mobility. These indicators also provide a first hint of what is essential for improving urban mobility and which data sources should be studied for further processing.
- Local public transport and its density. It is particularly important for cities that not every citizen uses their car. A proper density is essential for people who want to get from point A to point B quickly.
- At the same time, local public transport should also be financially attractive.
- The distribution of roads with appropriate signage can also play a significant role in a city's infrastructure, as a continuous flow of traffic should be guaranteed.
- The share of road space dedicated to cycling and walking lanes should be proportionate to the share allocated to automobiles.
- Sharing services - including bicycle sharing - are an excellent alternative to owning a car, since a private vehicle is rarely needed 24/7; they reduce the number of vehicles on the roads.
- The density of all traffic users (broken down by type) is an additional indicator of infrastructure use.
- Figures on air pollution and traffic fatalities can indicate where interventions are needed.
These indicators help define what kind of data should be studied in the context of Urbanite. The actual data are often assigned to the category "mobility data". Mobility data is information related to transport that helps plan and design (urban) mobility.
If this data is used, statements can be made about the general traffic situation. Placed in an overall context, it may even make it possible to simulate traffic flows and, in theory, influence them.
Construction zone redirections, traffic signs, or lights directly manipulate a traffic flow. Sensors and traffic cameras can be used to measure traffic density at any given moment.
The timing of data collection plays a major role, as a representation of a city's traffic situation can change from one minute to the next. This time-series data forms an additional dimension.
Timetables and real-time information on local public transport (bus, train, taxi, ferry) are just as useful for evaluation as mobility services provided by sharing services.
In addition to the active traffic representation, passive sources can also be used, such as free and taken parking spaces within a region or the time-dependent air pollution values.
Next to the date and time, the geographical location usually plays a major role - most commonly represented by geographic coordinates, i.e. latitude and longitude (e.g., as provided by GPS).
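Such coordinates can be put to use directly. As a minimal sketch, the haversine formula computes the great-circle distance between two points given as latitude/longitude in decimal degrees (the coordinates below are approximate city-centre values, chosen only for illustration):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (decimal degrees)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Approximate distance between Bilbao and Helsinki city centres
distance = haversine_km(43.2630, -2.9350, 60.1699, 24.9384)
```

The same function works at city scale, e.g. for matching a sensor reading to the nearest bus stop.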
After urban mobility and mobility data have been highlighted, the question is now: What characteristics make a good data source? And which are the barriers to the use of data?
In general, the availability of data is essential from a technical perspective. Accessibility should be ensured by a download function or an API.
However, practice shows that additional barriers may exist. Servers can be faulty, APIs can undergo breaking changes, or authentication may be required. Uptime should be as high as possible. All API changes should be communicated in a transparent manner. Ideally, legacy APIs are provided alongside the new ones. Authentication should be secure and reliable.
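Flaky servers can be handled on the client side by retrying failed requests. The sketch below is a generic retry helper, not part of any Urbanite API: it takes any zero-argument fetch callable (e.g. a wrapper around an HTTP request), so the retry logic stays independent of the transport and of authentication details:

```python
import time

def fetch_with_retry(fetch, retries=3, delay_s=1.0):
    """Call `fetch()` up to `retries` times, waiting between attempts.

    `fetch` is any zero-argument callable that returns data or raises an
    exception. The last exception is re-raised if all attempts fail.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:  # in real code, catch specific errors only
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay_s)
    raise last_error
```

In practice, the delay is often increased exponentially between attempts so that a struggling server is not hammered with requests.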
Further attention should be paid to different API types and file formats. An API can provide data in the form of a REST, SOAP or DB API, for example. A file in tabular format can appear as CSV, TSV or XLSX.
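Delimiter-separated files such as CSV and TSV can often share a single code path. A minimal sketch using Python's standard csv module (the station name and value are made-up example data):

```python
import csv
import io

def read_table(text, delimiter=","):
    """Parse delimiter-separated text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

csv_data = "station,no2\nBilbao-Centro,41\n"
tsv_data = "station\tno2\nBilbao-Centro\t41\n"

rows_csv = read_table(csv_data, delimiter=",")
rows_tsv = read_table(tsv_data, delimiter="\t")
# Both parse to the same row dictionaries
```

Binary formats such as XLSX, and SOAP or database APIs, need their own adapters, which is one reason a common internal format pays off early.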
Besides infrastructural problems, errors can also occur within the data. Systems fail and create gaps in the data set. Data quality flaws can have many facets. For example, each system may store dates in its own format. The sampling interval may not be uniform either: one sensor records a value every second, while another system only aggregates data in 10-minute intervals.
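Both flaws can be tackled by normalising timestamps first and then aggregating to a common interval. A minimal sketch, assuming a known list of candidate date formats (the format strings and readings below are invented for illustration):

```python
from collections import defaultdict
from datetime import datetime

# Each source system may use its own date format (assumed examples).
KNOWN_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%d.%m.%Y %H:%M:%S", "%m/%d/%Y %H:%M"]

def parse_timestamp(text):
    """Try each known format until one matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")

def aggregate_10min(readings):
    """Average (timestamp_string, value) readings into 10-minute buckets."""
    buckets = defaultdict(list)
    for ts_text, value in readings:
        ts = parse_timestamp(ts_text)
        # Round the timestamp down to the start of its 10-minute bucket
        bucket = ts.replace(minute=ts.minute - ts.minute % 10, second=0)
        buckets[bucket].append(value)
    return {b: sum(v) / len(v) for b, v in buckets.items()}
```

Trying formats in a fixed order only works if the formats are unambiguous with respect to each other; ambiguous day/month orders still need per-source configuration.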
When working in a larger setting, such as an EU project, there is an additional language barrier. Metadata and data themselves may only appear in the local language. Documentation may be missing or not available in English.
It is not only the technical challenges that need to be looked at when studying data sources, but also legal issues. Every data source comes with a license for further use. This license regulates the further processing and provision of the data.
Using data sources means taking a closer look at their different facets and putting them into a greater context. Addressing the problems outlined above requires a good data management platform - the topic of one of our next blog posts.