The Harbsafe 2 project was launched to improve the testability and quality of the entire IEC terminology corpus. Its core objective was to deliver a modern, web-based database application that lets standardisation experts search, analyse and harmonise IEC terms in a user-friendly way. The prototype, now available at http://e-glossary.dke.de, has been deployed in pilot settings and offers a low-threshold interface for both specialists and the wider technical community. Three automated procedures are integrated into the system. HintAn supplies concrete harmonisation hints, flagging definitions that are inconsistent with other entries or with the IEC directives and terminology standards. Bedeutungsspektren (“meaning spectra”) gives an overview of definition variants by sorting and grouping all entries that share a name, which shows how widely their definitions diverge. SemAn decomposes each definition into semantic components and presents them in a hierarchical layout, making definitions easier to read and quicker to understand. Together, these methods identify inconsistencies, give a clear picture of definitional variants, and break each definition down into plausible semantic components. In addition, the project developed quantitative metrics for the degree of harmonisation, so that harmonisation activities can be targeted and their progress monitored.
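The grouping that Bedeutungsspektren performs, and one way of turning it into a harmonisation metric, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `TermEntry` fields, the `normalise` rule (ignore case, whitespace and a trailing full stop) and the `harmonisation_degree` formula are all hypothetical, because the report does not say how its metrics are computed. The example entries are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """One terminology entry (hypothetical fields): name, definition, source part."""
    name: str
    definition: str
    source: str


def normalise(definition: str) -> str:
    """Assumed rule: ignore case, extra whitespace and a trailing full stop,
    so purely typographic differences do not count as separate variants."""
    return " ".join(definition.lower().split()).rstrip(".")


def meaning_spectra(entries: list[TermEntry]) -> dict[str, dict[str, list[str]]]:
    """Group entries by name, then by normalised definition.

    Returns {name: {definition_variant: [sources, ...]}}; names are sorted
    alphabetically, variants by how many sources use them (most used first).
    """
    spectra = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        spectra[entry.name][normalise(entry.definition)].append(entry.source)
    return {
        name: dict(sorted(variants.items(), key=lambda kv: (-len(kv[1]), kv[0])))
        for name, variants in sorted(spectra.items())
    }


def harmonisation_degree(variants: dict[str, list[str]]) -> float:
    """Hypothetical metric: 1.0 if all entries of a name share one definition,
    0.0 if every entry has its own definition, linear in between."""
    n_entries = sum(len(sources) for sources in variants.values())
    if n_entries <= 1:
        return 1.0
    return 1.0 - (len(variants) - 1) / (n_entries - 1)


if __name__ == "__main__":
    # Invented example data (not taken from IEC terminology parts).
    entries = [
        TermEntry("node", "end point of a branch in a network", "part A"),
        TermEntry("node", "End point of a branch in a network.", "part B"),
        TermEntry("node", "device attached to a communication network", "part C"),
    ]
    spectra = meaning_spectra(entries)
    print(spectra["node"])
    print(harmonisation_degree(spectra["node"]))  # → 0.5
```

Normalising before grouping keeps trivially different spellings from inflating the variant count; a production system would probably need richer linguistic normalisation, such as the stemming already done by the NLP pipeline.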
The technical backbone of Harbsafe 2 is a natural-language-processing (NLP) pipeline that parses definitions and annotates word stems, parts of speech and grammatical dependency structures. To detect homonymous definitions (entries that share a name but are defined differently), the project used a state-of-the-art machine-learning model, Sentence-BERT, as described by Reimers & Gurevych. The model is used for semantic textual similarity (STS), i.e. comparing texts algorithmically by meaning, which allows the system to pre-structure definitions that are semantically similar. The project processed approximately 64 000 term entries extracted from 3 000 current, digitally available IEC terminology parts. These entries comprise about 43 000 distinct names, of which roughly 40 % are defined differently across standards. More than 2 250 names have five or more distinct definitions, and 370 names have more than 15. The homonyms with the greatest definitional diversity include generic concepts such as “system”, “device” and “node”, as well as specific terms such as “rated current” and “rated voltage”. The high incidence of multiple definitions signals a strong need for harmonisation: many differences are purely syntactic, while others reflect domain-specific usage or genuine conceptual contradictions. The metrics developed by the team quantify these inconsistencies and show where harmonisation efforts should be focused.
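The STS-based pre-structuring can be illustrated with a small Python sketch: embed each definition, compare embeddings by cosine similarity, and group definitions whose similarity reaches a threshold. To stay self-contained, the sketch swaps Sentence-BERT for a plain bag-of-words vector, so it measures word overlap rather than meaning. The `threshold` of 0.6 and the greedy leader-clustering step are likewise assumptions, not the project's documented settings. With the sentence-transformers library, `embed` and `cosine` could be replaced by `SentenceTransformer.encode` and `util.cos_sim`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a Sentence-BERT embedding (assumption): a bag-of-words
    count vector, used only so the sketch runs without a trained model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors.
    Looking up a missing token in a Counter yields 0 without inserting it."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prestructure(definitions: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Greedy leader clustering: each definition joins the first group whose
    first member (the leader) it matches with similarity >= threshold;
    otherwise it starts a new group. Returns groups of indices."""
    vectors = [embed(d) for d in definitions]
    groups: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    # Invented example definitions for one name (not taken from IEC data).
    defs = [
        "value of current assigned by the manufacturer for a specified operating condition",
        "value of current assigned by the manufacturer for specified operating conditions",
        "maximum current a cable can carry continuously",
    ]
    print(prestructure(defs))  # → [[0, 1], [2]]
```

Leader clustering is order-dependent and compares each definition only with a group's first member; it is used here for brevity, and agglomerative clustering would be a more robust choice for tens of thousands of definitions.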
The project ran from 1 October 2020 to 30 September 2022 and was funded by the German Federal Ministry for Economic Affairs and Energy under grant number 03TN0018A-C. Harbsafe 2 was carried out by a consortium that included the Technical University of Braunschweig, the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE (DKE) and the IEC. TU Braunschweig led the development of the prototype and the NLP pipeline; DKE supplied the terminology data and facilitated integration into existing IEC workflows; and the IEC provided normative guidance and helped validate the harmonisation hints. Throughout the project, the consortium organised workshops and presentations to disseminate findings and gather feedback from the standardisation community. The final report, dated 31 March 2023, documents the technical achievements, the quantitative assessment of harmonisation potential, and the collaborative framework that enabled the delivery of a practical tool for IEC terminology management.
