Data handling and transformation in Urbanite
In previous blog posts, the overall architecture of the Urbanite platform [^1] and the harvesting process [^2] were described. The harvesting process consists of multiple services working together to gather, transform and store the data from the diverse sources harvested in this project. To give the reader a better understanding of this procedure, this post describes the harvesting process for the data of two pilot cities in more detail: Amsterdam and Messina. It also covers the import of the data, but focuses on the transformation of different data into the FIWARE models used in Urbanite.
The Amsterdam demonstrator uses data from different sources and platforms: public sources such as the Amsterdam open data portal, and private initiatives such as Telraam. Since each source has a different API, each source needs its own importer and its own transformation script. This also means that the generation of the metadata can be included in the importer itself, as it is specialised for that specific source. However, the general structure of Importer -> Transformer -> Exporter still works for all sources, as it was developed specifically for such scenarios.
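The Importer -> Transformer -> Exporter structure can be illustrated with a minimal in-process sketch. In Urbanite these are separate services, so the function names, the record fields and the in-memory "storage" below are all assumptions made for illustration only:

```javascript
// Hypothetical sketch: chains the three stages as plain functions.
// In the real platform each stage is its own service.
function runPipeline(importer, transformer, exporter) {
    var raw = importer();               // fetch source records plus metadata
    var transformed = transformer(raw); // map records to a target data model
    return exporter(transformed);       // hand the result over to storage
}

// Stand-in importer for one hypothetical source
function importExample() {
    return {
        metadata: { title: "Example traffic source" },
        data: [{ sensor: "A1", count: 12 }]
    };
}

// Stand-in transformer specific to that source
function transformExample(input) {
    return {
        metadata: input.metadata,
        data: input.data.map(function (record) {
            return { type: "TrafficFlowObserved", intensity: record.count };
        })
    };
}

// Stand-in exporter that stores entities in an array
var stored = [];
function exportToMemory(payload) {
    payload.data.forEach(function (entity) { stored.push(entity); });
    return stored.length;
}

var storedCount = runPipeline(importExample, transformExample, exportToMemory);
```

Only the importer and the transformer are source-specific in this sketch; the surrounding structure stays the same for every source.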
The Messina demonstrator, on the other hand, collects all data from public and private sources into an internal platform. The data can be static, like Points of Interest or camera locations, or time-series data, like traffic or noise measurements. This platform can contain sensitive information and is thus not open to the public. However, it provides a unified API from which data can be harvested, so a single importer is sufficient to handle all data sources. But since different data sources might map to different FIWARE data models [^3], and might also use different fields for the same field in a specific model, a transformation script is still needed for each source.
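To illustrate why one script per source is still needed, here is a small sketch in which two sources deliver the same quantity under different field names. The source names and field names are hypothetical and not taken from the Messina platform:

```javascript
// Hypothetical per-source mappings: target model field -> source field
var fieldMappings = {
    "traffic-cameras": { intensity: "vehicleCount", dateObserved: "datetime" },
    "road-sensors": { intensity: "count", dateObserved: "timestamp" }
};

// Builds a TrafficFlowObserved-like entity using the mapping of the given source
function mapRecord(sourceName, record) {
    var mapping = fieldMappings[sourceName];
    if (!mapping) {
        throw new Error("No mapping defined for source: " + sourceName);
    }
    var entity = { type: "TrafficFlowObserved" };
    Object.keys(mapping).forEach(function (targetField) {
        entity[targetField] = record[mapping[targetField]];
    });
    return entity;
}

var fromCamera = mapRecord("traffic-cameras", {
    vehicleCount: 42,
    datetime: "2022-03-01T10:00:00Z"
});
var fromSensor = mapRecord("road-sensors", {
    count: 7,
    timestamp: "2022-03-01T10:05:00Z"
});
```

Both results carry `intensity` and `dateObserved`, although the source records used different field names.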
A simple transformation script for traffic measurements looks like this:
```javascript
function transforming(input) {
    // Only the first record of the incoming batch is transformed
    var result = input.data[0];
    // Per-source details (here the location), keyed by sourceId
    var details = input.details;
    var currentDate = new Date(result["datetime"]);

    var output = {
        "@context": [
            "https://smartdatamodels.org/context.jsonld",
            "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"
        ]
    };

    // Map the source fields to the TrafficFlowObserved data model
    output.location = details[result["sourceId"]]["location"];
    output.dateObserved = result["datetime"];
    output.intensity = result["vehicleCount"];
    output.alternateName = "vehicleCounts";
    // The ID combines the source ID and the observation time in milliseconds
    output.id = "urn:ngsi-ld:trafficFlowObserved:messina:" + result["sourceId"] + ":" + currentDate.getTime();
    output.type = "TrafficFlowObserved";
    output.source = "https://urbanite-node1.comune.messina.it";

    // Pass the metadata through unchanged alongside the transformed record
    return {
        "metadata": input.metadata,
        "data": [output]
    };
}
```
The `input` JSON object, which carries a single record coming from the importer service, is transformed into the TrafficFlowObserved data model. This record is then sent, together with the corresponding metadata object, to the exporter service, which stores it in the data storage.
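As a rough illustration of what such an `input` object might look like, the following sketch builds a sample record and derives the entity ID with the same scheme as the script. The shape is inferred from the fields the script reads; all values (the source ID `cam-01`, the coordinates, the metadata title) are invented, and the helper `buildEntityId` is hypothetical:

```javascript
// Hypothetical input: shape inferred from the script, values invented
var sampleInput = {
    metadata: { title: "Traffic counts" },
    details: {
        "cam-01": { location: { type: "Point", coordinates: [15.55, 38.19] } }
    },
    data: [
        { sourceId: "cam-01", datetime: "2022-03-01T10:00:00Z", vehicleCount: 42 }
    ]
};

// Same ID scheme as in the script: prefix + source ID + time in milliseconds
function buildEntityId(record) {
    var millis = new Date(record.datetime).getTime();
    return "urn:ngsi-ld:trafficFlowObserved:messina:" + record.sourceId + ":" + millis;
}

var record = sampleInput.data[0];
var entityId = buildEntityId(record);
// The location is looked up in the details by the record's source ID
var location = sampleInput.details[record.sourceId].location;
```

Because the ID contains both the source ID and the observation time, two measurements from the same source at different times result in distinct entities.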
Since most APIs that are harvested in the Urbanite project deliver JSON responses, this example is representative of most transformation scripts. However, other sources might deliver XML or CSV responses, which would be handled by different transformation services with transformation instructions in a different format. These will be described in future posts.
[^1]: https://urbanite-project.eu/content/urbanite-integrated-architecture
[^2]: https://urbanite-project.eu/content/harvesting-data-urbanite
[^3]: https://www.fiware.org/developers/smart-data-models/