The JValue‑PML project, led by Professor Dr. Dirk Riehle, ran from 1 November 2022 to 31 October 2023 under grant number 19F1133A. Its aim was to create a domain‑specific language (DSL) called Jayvee that simplifies the definition and execution of data pipelines for open mobility data. By providing a textual syntax and a dedicated compiler and runtime, the project sought to make the extraction, transformation and loading (ETL) of open data faster, less error‑prone and more collaborative than the current practice of using general‑purpose languages such as Python with Pandas or Java with Apache libraries.
Jayvee was designed to be lightweight yet expressive. The language includes constructs for declaring data sources, transformation steps, and target formats, which are connected to form a pipeline. The tooling that parses and executes Jayvee models (referred to here as the compiler and the runtime) is implemented in TypeScript and runs on Node.js in the public repository; it validates a model and then executes its pipelines directly rather than emitting byte code for a separate runtime written in another language. The entire toolchain is released under an open-source license on GitHub (https://github.com/jvalue/jayvee), allowing external developers to inspect, modify and extend the system.
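The idea of declaring a pipeline as a chain of a data source, transformation steps and a target can be illustrated with a small Python sketch. This is not Jayvee syntax and not how the Jayvee toolchain is implemented; the names Pipeline, csv_source, drop_rows_missing, sqlite_sink and the sample stop data are hypothetical and chosen only for illustration, using the Python standard library.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict[str, str]


@dataclass
class Pipeline:
    """A linear chain: one source, any number of transforms, one sink."""
    source: Callable[[], Iterable[Row]]
    transforms: list[Callable[[Iterable[Row]], Iterable[Row]]]
    sink: Callable[[Iterable[Row]], int]

    def run(self) -> int:
        rows = self.source()
        for step in self.transforms:
            rows = step(rows)
        return self.sink(rows)  # number of rows written


def csv_source(text: str) -> Callable[[], Iterable[Row]]:
    """Hypothetical source: parse CSV text into dictionaries keyed by header."""
    def read() -> Iterable[Row]:
        return csv.DictReader(io.StringIO(text))
    return read


def drop_rows_missing(column: str) -> Callable[[Iterable[Row]], Iterable[Row]]:
    """Hypothetical transform: discard rows whose value in `column` is blank."""
    def step(rows: Iterable[Row]) -> Iterable[Row]:
        return (row for row in rows if row[column].strip())
    return step


def sqlite_sink(conn: sqlite3.Connection, table: str,
                columns: list[str]) -> Callable[[Iterable[Row]], int]:
    """Hypothetical sink: write the selected columns into a SQLite table."""
    def write(rows: Iterable[Row]) -> int:
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        data = [tuple(row[c] for c in columns) for row in rows]
        conn.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", data)
        conn.commit()
        return len(data)
    return write


# Made-up sample data standing in for an open mobility dataset.
SAMPLE_CSV = "stop_id,stop_name\n1,Hauptbahnhof\n2,\n3,Rathaus\n"


def build_demo(conn: sqlite3.Connection) -> Pipeline:
    """Declare the pipeline: CSV source -> drop blank names -> SQLite table."""
    return Pipeline(
        source=csv_source(SAMPLE_CSV),
        transforms=[drop_rows_missing("stop_name")],
        sink=sqlite_sink(conn, "stops", ["stop_id", "stop_name"]),
    )


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(build_demo(connection).run())
```

Even in this reduced form, the plumbing (closures, type annotations, SQL strings) takes up most of the code, while the pipeline declaration in build_demo is only a few lines. A dedicated DSL aims to let authors write only that declarative part.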
During the first phase of the project, the team implemented the core language features, the compiler, and the runtime. The second phase focused on demonstrating Jayvee’s usefulness in real data-engineering tasks. The language was integrated into the university’s teaching curriculum, where it was used in two consecutive semesters to teach students how to build pipelines for mobility datasets. Students reported that they completed their tasks with Jayvee in about the same time the tasks would have taken with Python and Pandas, while the DSL’s higher-level abstractions reduced boilerplate code and lowered the likelihood of bugs. A qualitative evaluation conducted by the project team found that the quality of the students’ pipelines matched that of pipelines built with traditional approaches, and that the learning curve was acceptable.
All four project objectives were fully met. First, the ETL process for open data was optimized, allowing faster extraction and cleaning of mobility datasets. Second, the open-source software stack was fully released and documented. Third, demonstrations in the classroom showed that collaborative work on open data is feasible with Jayvee. Fourth, the qualitative evaluation showed that students could produce high-quality pipelines at speeds comparable to Python/Pandas. The project also produced a series of blog posts (https://oss.cs.fau.de/tag/made-projects/) that showcase practical examples and encourage community participation.
Collaboration took place within the project team only. No external technical partners were involved, and the project’s outreach relied on social-media channels to share results and attract contributors. The funding covered the salary of a dedicated developer who worked full time on the language, compiler, and runtime. The results suggest that a lightweight DSL can match general-purpose languages in development speed and pipeline quality for data-engineering tasks while reducing boilerplate and errors; this conclusion rests on the qualitative classroom evaluation rather than on runtime benchmarks. The open-source release invites further research and development, and the project’s methodology can be replicated for other domains that rely on open data pipelines.
