Expected Outcome:
Projects are expected to contribute to one or more of the following outcomes:
- Enhanced applicability of large AI systems to new domains through the integration of innovative data modalities, such as sensor measurements (e.g. in robotics, IoT) or remote sensing (e.g. earth observation), as input.
- Improvement of the capabilities of current multimodal large AI systems and expansion of the number of data modalities jointly handled by one AI system, leading to broader application potential and improved AI performance.
Scope:
Large artificial intelligence (AI) models refer to a new generation of general-purpose AI models (i.e., generative AI) capable of adapting to diverse domains and tasks without significant modification. Notable examples, such as OpenAI's GPT-4V and Meta's Llama 2 and DINOv2, have demonstrated a wide and growing variety of capabilities.
The swift progression of large AI models in recent years holds immense potential to revolutionize various industries, due to their ability to adapt to diverse tasks and domains. Achieving this potential requires access to vast data repositories, significant computing resources, and skilled engineers. A promising avenue of research is the development of multimodal large AI models that can seamlessly integrate multiple modalities, including text, structured data, computer code, visual or audio media, robotics or IoT sensors, and remote sensing data.
This topic centres around the development of innovative multimodal large AI models, covering both the training of foundation models and their subsequent fine-tuning. These models should show superior capabilities across a wide array of downstream tasks. The emphasis is both on integrating new input data modalities into large AI models and on developing multimodal large AI models with significantly higher capabilities, the ability to handle a greater number of modalities, or both.
Moreover, projects should contribute to reinforcing Europe's research excellence in the field of large AI models by driving substantial scientific progress and innovation in key large AI areas. This includes the development of novel methods for pretraining multimodal foundation models. Additionally, novel approaches to effective and efficient fine-tuning of such models should be pursued.
Research activities should explore innovative methodologies for enhancing the representation, alignment, and interaction among the different data modalities, thereby substantially improving the overall performance and trustworthiness of these models. Advances in efficient computation for the pre-training, execution and fine-tuning of foundation models to reduce their computational and environmental impact, and increasing the safety of models are also topics of interest.
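Alignment among modalities of the kind mentioned above is commonly pursued with a contrastive objective over paired embeddings, in the style popularised by CLIP-like training. The sketch below is purely illustrative of that idea and not part of the call requirements; the function name, the symmetric InfoNCE formulation, and the temperature value are all assumptions.

```python
import numpy as np

def contrastive_alignment_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    emb_a, emb_b: (batch, dim) arrays from two modality encoders that are
    assumed to project into the same embedding space; row i of each is a
    matched cross-modal pair.
    """
    # L2-normalise so the dot product is cosine similarity
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))       # matched pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Symmetric over both retrieval directions (a -> b and b -> a)
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimising such a loss pulls matched cross-modal pairs together and pushes mismatched pairs apart, which is one concrete reading of "enhancing the representation, alignment, and interaction among the different data modalities".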
Proposals should outline how the models will incorporate trustworthiness, considering factors such as explainability, security, and privacy in line with provisions in the upcoming Artificial Intelligence Act. Additionally, the models should incorporate characteristics that align with European values, and provide improved multilingual capabilities, where relevant.
Proposals should address at least one of the following focus areas:
- the integration of innovative modalities of data for large AI models during training and inference. Examples of innovative modalities include event streams, structured data and sensor measurements. The incorporation of such new modalities could potentially bring unforeseen enhancements to model performance and enable their application in new domains like weather forecasting, robotics, and manufacturing.
- enhanced multimodal models that exceed the current state of the art, with either significantly improved capabilities or the ability to handle a larger number of modalities. This focus area also encompasses models capable of multimodal output generation. Current large-scale multimodal models most commonly engage with only vision and language.
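One common way to open a transformer-based model to an innovative continuous modality, such as sensor measurements, is to cut the signal into fixed-length patches and project each patch linearly into the model's embedding space, analogous to image patching in vision transformers. The sketch below is a hedged illustration of that approach only; the function name, patch and embedding sizes are assumptions, and the random projection stands in for a learned one.

```python
import numpy as np

def tokenize_sensor_stream(signal, patch_len=16, embed_dim=32, seed=0):
    """Turn a 1-D sensor stream into a sequence of token embeddings.

    Splits the signal into non-overlapping patches of `patch_len` samples
    and applies a linear projection into `embed_dim` dimensions; in a real
    model the projection matrix would be a learned parameter.
    """
    n_patches = len(signal) // patch_len
    # Drop any trailing samples that do not fill a whole patch
    patches = signal[: n_patches * patch_len].reshape(n_patches, patch_len)
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(patch_len, embed_dim)) / np.sqrt(patch_len)
    return patches @ projection  # (n_patches, embed_dim) token sequence
```

The resulting token sequence can then be interleaved with tokens from other modalities, which is what makes new domains such as robotics or weather forecasting reachable by the same backbone.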
Each proposal is expected to address all of the following:
- Data Collection, Processing and Cross-modal Alignment: The proposal should convincingly describe the characteristics and availability of the large, trustworthy data sources and the trustworthy data processing to be utilised within the project, detailing the processing steps that ensure reliability, accountability and transparency, as well as the alignment of data across the different modalities. A modest portion (up to 10%) of the budget may be allocated to data collection activities; proposals may involve relevant data owners in this task, if necessary. Importantly, the proposal should delineate how potential privacy and IPR issues associated with the data will be managed and mitigated.
- Multimodal Foundation Model Pretraining: The pretrained multimodal foundation model is expected to demonstrate high capabilities across a wide range of tasks. The pretraining tasks used should be agnostic of downstream tasks. These activities also cover the development of the codebase and the implementation of small-scale experiments. A minor portion (up to 10%) of the budget may be allocated to the acquisition of computing resources for codebase development and small-scale experiments; however, the primary computing resources for pretraining should be sought from external high-performance computing facilities such as EuroHPC or national centres. The proposal should convincingly describe the strategy for accessing these computing resources.
- Fine-Tuning of Multimodal Foundation Models: The proposal should clearly detail the activities pursued to fine-tune the model for diverse downstream tasks, demonstrating illustrative potential use-cases. The output of these tasks may be of a single modality or span multiple modalities. Research activities should investigate innovative methodologies designed to bolster the interplay between different data modalities, thereby enhancing the overall performance of these models.
- Testing and Evaluation: The proposal should detail the development of workflows, benchmarks, testing procedures, and pertinent tools for evaluating both foundation and fine-tuned models. Attention should be paid to the performance, transparency, bias, robustness, accuracy, and security of the models, through appropriate testing procedures (e.g., red teaming for safety and security), in compliance with the future AI Act.
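At its simplest, the benchmark-driven evaluation workflow described in the last point runs a model over labelled examples grouped by task and reports a per-task score; a full suite would add bias, robustness, and red-teaming probes on top. The sketch below is a minimal illustration of such a harness, with all names and the accuracy metric chosen as assumptions.

```python
def evaluate(model, benchmark):
    """Score a model on a benchmark of (input, expected) pairs per task.

    `model` is any callable; `benchmark` maps a task name to a list of
    (input, expected) pairs. Returns per-task accuracy in [0, 1].
    """
    scores = {}
    for task, examples in benchmark.items():
        correct = sum(model(x) == y for x, y in examples)
        scores[task] = correct / len(examples)
    return scores
```

Reporting scores per task, rather than one aggregate number, makes regressions on individual capabilities visible as the model or its fine-tuning changes.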
Proposals should assemble a multidisciplinary research team, as appropriate, to cover all of the above issues.
Proposals should adhere to Horizon Europe's guidelines regarding Open Science practices as well as the FAIR data principles. Open access should be provided to research outputs - including training datasets, software tools, model architecture and hyperparameters, as well as model weights - unless a legitimate interest or constraint applies. Additionally, proposals are encouraged to deliver results under open-source licenses.
All proposals are expected to embed mechanisms to assess and demonstrate progress, with qualitative and quantitative KPIs, benchmarking and progress monitoring (including participation in international evaluation contests), and illustrative application use-cases demonstrating concrete potential added value. Communicable results should be shared with the European R&D community through the AI-on-demand platform, the Common European data spaces and, if necessary, other relevant digital resource platforms, in order to enhance the European AI, Data and Robotics ecosystem through the sharing of results and best practice.
Proposals are also expected to dedicate tasks and resources to collaborate with and provide input to the open innovation challenge under HORIZON-CL4-2023-HUMAN-01-04. Research teams involved in the proposals are expected to participate in the respective Innovation Challenges.
This topic implements the co-programmed European Partnership on AI, data and robotics.
Specific Topic Conditions:
Activities are expected to start at TRL 2-3 and achieve TRL 4-5 by the end of the project – see General Annex B.