The Smart Data for Mobility (SD4M) project, funded by the German Federal Ministry for Economic Affairs and Energy (BMWi) under the technology programme “Smart Data” (grant number 01MD15007C), ran from 1 February 2015 to 31 January 2018. The consortium was led by idalab GmbH and included the German Research Center for Artificial Intelligence (DFKI), the service provider unbelievable Machine, and the certification body TÜV. The project’s core objective was to create a multimodal mobility data ecosystem that would underpin new value‑creation chains across the mobility sector. The platform was to enable data aggregation, structuring, enrichment, and commercial access, and to offer premium predictive and optimisation services to partner companies.
Technically, the team built a high‑quality benchmark dataset for event linking, annotated by domain experts. This dataset served as the basis for a novel event‑linking approach built on a two‑dimensional distance metric that combines temporal proximity with semantic overlap. The approach lets the system associate events across heterogeneous news and sensor streams, a capability that, according to the project, had not previously existed for mobility‑specific unstructured text. In parallel, the project advanced deep‑learning classifiers, including long short‑term memory (LSTM) networks, to filter mobility‑relevant news items, classify them coarsely, and extract key entities such as locations, transport modes, events, and timetable information. The resulting architecture processes data in three layers (filtering, classification, and event linking) within a real‑time pipeline built on Apache Flink, designed to handle high‑throughput streams with low latency.
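The report does not say how the two distance dimensions are computed or combined, so the following Python sketch is only an illustration under stated assumptions: the temporal dimension is an exponentially saturating function of the time gap, the semantic dimension is one minus the Jaccard overlap of extracted entity sets, and the two are combined with a weighted Euclidean norm. Simple keyword rules stand in for the trained deep‑learning and LSTM models, the code runs in plain Python rather than Apache Flink, and all names (`Event`, `event_distance`, `link`, `pipeline`, `scale_hours`, `w_time`, `threshold`) and the example data are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical keyword sets standing in for the trained deep-learning/LSTM models.
RELEVANT_TERMS = {"train", "s-bahn", "tram", "bus", "delay", "closure", "accident"}
CLASS_TERMS = {"rail": {"train", "s-bahn", "tram"}, "road": {"bus", "closure", "accident"}}


@dataclass(frozen=True)
class Event:
    event_id: str
    time: datetime
    entities: frozenset  # entities extracted upstream (locations, transport modes, ...)
    text: str = ""


def temporal_distance(a: Event, b: Event, scale_hours: float = 6.0) -> float:
    """Map the time gap into [0, 1); scale_hours is a hypothetical tuning knob."""
    hours = abs((a.time - b.time).total_seconds()) / 3600.0
    return 1.0 - math.exp(-hours / scale_hours)


def semantic_distance(a: Event, b: Event) -> float:
    """1 minus the Jaccard overlap of the entity sets (0 = identical, 1 = disjoint)."""
    union = a.entities | b.entities
    if not union:
        return 1.0
    return 1.0 - len(a.entities & b.entities) / len(union)


def event_distance(a: Event, b: Event, w_time: float = 0.5) -> float:
    """Weighted Euclidean norm of the (temporal, semantic) distance pair."""
    dt, ds = temporal_distance(a, b), semantic_distance(a, b)
    return math.sqrt(w_time * dt ** 2 + (1.0 - w_time) * ds ** 2)


def link(candidate: Event, known: List[Event], threshold: float = 0.5) -> Optional[str]:
    """Return the id of the closest known event within threshold, else None (new event)."""
    if not known:
        return None
    best = min(known, key=lambda e: event_distance(candidate, e))
    return best.event_id if event_distance(candidate, best) <= threshold else None


def pipeline(items: Iterable[Event], known: List[Event]) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Three layers: filter, coarse classification, then event linking."""
    for item in items:
        tokens = set(item.text.lower().split())
        if not tokens & RELEVANT_TERMS:  # layer 1: drop items without mobility terms
            continue
        # layer 2: first matching coarse class, "other" if none matches
        label = next((c for c, terms in CLASS_TERMS.items() if tokens & terms), "other")
        yield item.event_id, label, link(item, known)  # layer 3: event linking


if __name__ == "__main__":
    known = [Event("E1", datetime(2017, 5, 3, 8, 0), frozenset({"Berlin Hbf", "S-Bahn", "disruption"}))]
    news = [
        Event("N7", datetime(2017, 5, 3, 9, 30),
              frozenset({"Berlin Hbf", "S-Bahn", "signal fault"}), "S-Bahn delay at Berlin Hbf"),
        Event("N8", datetime(2017, 5, 4, 18, 0),
              frozenset({"Munich", "A9", "accident"}), "accident on the A9 near Munich"),
        Event("N9", datetime(2017, 5, 3, 9, 0), frozenset({"Berlin"}), "new museum opens"),
    ]
    print(list(pipeline(news, known)))
    # → [('N7', 'rail', 'E1'), ('N8', 'road', None)]
```

Thresholding the combined distance, rather than each dimension separately, lets a strong semantic match compensate for a moderate time gap; whether SD4M made the same trade‑off is not stated in the source.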
The platform’s analytics and forecasting services were integrated into a reference architecture that connects multiple data sources, including structured datasets from partner operators and unstructured news feeds. Security and legal compliance were addressed through the consortium partner unbelievable Machine, whose datacenter operations are certified by TÜV to ISO/IEC 27001:2013. The system also supports combining sensor data with information extracted from unstructured sources, enabling richer multimodal analytics.
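The report does not describe how sensor data and text‑derived information are combined. One simple possibility, sketched below under the assumption that both sources carry a location key and a timestamp, is a windowed join: each sensor reading is enriched with the labels of text events reported at the same location within a time window. `SensorReading`, `TextEvent`, `enrich`, the one‑hour default window, and the example data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SensorReading:  # hypothetical structured record, e.g. from a partner operator
    location: str
    time: datetime
    delay_min: float


@dataclass(frozen=True)
class TextEvent:  # hypothetical output of the text-analysis layers
    location: str
    time: datetime
    label: str


def enrich(readings: Iterable[SensorReading], events: Iterable[TextEvent],
           window: timedelta = timedelta(hours=1)) -> List[Tuple[SensorReading, List[str]]]:
    """Attach to each reading the sorted labels of text events at the same
    location whose timestamps lie within +/- window of the reading."""
    by_location: Dict[str, List[TextEvent]] = {}
    for ev in events:
        by_location.setdefault(ev.location, []).append(ev)
    enriched = []
    for r in readings:
        labels = sorted({ev.label for ev in by_location.get(r.location, [])
                         if abs(ev.time - r.time) <= window})
        enriched.append((r, labels))
    return enriched


if __name__ == "__main__":
    readings = [SensorReading("Berlin Hbf", datetime(2017, 5, 3, 9, 0), 12.0),
                SensorReading("Hamburg Hbf", datetime(2017, 5, 3, 9, 0), 0.0)]
    events = [TextEvent("Berlin Hbf", datetime(2017, 5, 3, 8, 40), "signal fault"),
              TextEvent("Berlin Hbf", datetime(2017, 5, 3, 14, 0), "strike")]
    for reading, labels in enrich(readings, events):
        print(reading.location, reading.delay_min, labels)
    # Berlin Hbf 12.0 ['signal fault']
    # Hamburg Hbf 0.0 []
```

A windowed join keeps the two sources loosely coupled; in a streaming engine such as Apache Flink, the same idea would typically be expressed as a keyed, time‑windowed join rather than over in‑memory lists.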
Milestones were met on schedule. After six months, the consortium had defined the data value chains and platform specifications in workshops, clarifying each partner’s role. At 24 months, a fully functional prototype of the SD4M platform was delivered, with defined output formats, unstructured data analysis, provided interfaces, and integration of structured and unstructured data. By 30 months, the use‑case demonstrators were complete: a large‑scale demonstrator for use cases 1 and 3 and a mobile application for use case 2, which together validated the predictive models and data preparation pipelines. The final milestone, at 36 months, was a public presentation and demonstration of the prototype and use cases, showcasing the platform’s capabilities to stakeholders and potential third‑party data‑analytics providers.
Although idalab, as a small or medium‑sized enterprise (SME), lacked the capacity to publish all findings in peer‑reviewed journals, the research outputs were incorporated into DFKI publications, making the developed methods available to the scientific community. In sum, the SD4M project delivered a robust, secure, real‑time mobility data platform, advanced event‑linking and classification techniques, and a blueprint for future data value chains in the mobility sector.
