Multimodal Adaptation of LLMs for Smart Mobility in Africa


  • Date: 29 April 2025
  • Timeframe: 11:00 - 12:00 CEST (Geneva)
  • Duration: 60 minutes

    Recent advances in large Vision-Language Models (VLMs) and Large Language Models (LLMs) have demonstrated the effectiveness of pre-training on large-scale, general-domain corpora to enable zero-shot and few-shot transfer across a broad range of downstream tasks [1, 2, 3]. While such models are not inherently domain-specific, they can be adapted to specialized domains through fine-tuning or Retrieval-Augmented Generation (RAG), which augments input prompts with relevant retrieved context to improve in-context learning. Moreover, current LLMs typically answer transportation-related queries without leveraging critical multimodal signals (such as GPS data, graph structures, time-series patterns, and weather information) that could provide richer contextual grounding and improve the relevance and accuracy of responses in real-world mobility scenarios.

    To support intelligent transportation in the African context, we propose the development of a domain-adapted multimodal LLM trained on mobility-specific datasets derived from African cities. The resulting model will encode knowledge specific to mobility in the African context, enabling robust performance on downstream tasks such as congestion prediction, route optimization, and data-driven traffic planning. As a representative case study, we consider three African cities (Kigali, Nairobi, and Lagos) characterized by distinct mobility dynamics and levels of infrastructure development.

    The system architecture consists of three primary components:

    1. Data Acquisition and Curation: collection of high-quality, multimodal datasets (e.g., road landmark images, GPS traces, public transport logs, and infrastructure maps) from the target cities.
    2. Model Adaptation: development of a retrieval-augmented generation pipeline and domain-specific fine-tuning of a pretrained multimodal LLM using the curated data (a minimal sketch of the RAG step follows this list).
    3. Deployment Interface: implementation of a web-based application that interfaces with the adapted model.
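    To make the RAG step above concrete, the sketch below builds an augmented prompt from retrieved context. It is a minimal illustration only: the hashed bag-of-words embedding stands in for a real trained encoder, and the corpus records, function names, and example query are invented placeholders rather than project data or an existing interface.

        import numpy as np

        # Placeholder corpus: in the proposed system these would be curated,
        # city-specific mobility records (the strings here are illustrative only).
        CORPUS = [
            "Kigali: example record on peak-hour congestion along a major corridor.",
            "Nairobi: example record linking rainfall to slower public-transport flow.",
            "Lagos: example record on inbound bridge traffic during morning hours.",
        ]

        def embed(text: str, dim: int = 256) -> np.ndarray:
            """Toy hashed bag-of-words embedding; a real system would use a trained encoder."""
            v = np.zeros(dim)
            for tok in text.lower().split():
                v[hash(tok) % dim] += 1.0
            norm = np.linalg.norm(v)
            return v / norm if norm else v

        def retrieve(query: str, k: int = 2) -> list[str]:
            """Return the k corpus records most similar to the query (cosine similarity)."""
            q = embed(query)
            scores = [float(q @ embed(doc)) for doc in CORPUS]
            top = np.argsort(scores)[::-1][:k]
            return [CORPUS[i] for i in top]

        def build_prompt(query: str) -> str:
            """The RAG step: augment the user query with retrieved mobility context."""
            context = "\n".join(f"- {doc}" for doc in retrieve(query))
            return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using the context above."

        # The augmented prompt would then be passed to the adapted multimodal LLM.
        print(build_prompt("How does rainfall affect traffic in Nairobi?"))

    In the full system, the retrieved context would be drawn from the curated city datasets, and the augmented prompt would reach the adapted multimodal LLM together with non-textual signals such as GPS traces and weather data.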

    Learning Objectives: 

    1. Understand: Describe the key principles behind large Vision-Language Models (VLMs), Large Language Models (LLMs), and the benefits of pre-training on general-domain corpora.
    2. Apply: Demonstrate how Retrieval-Augmented Generation (RAG) enhances LLM performance in mobility-related query resolution.
    3. Analyze: Differentiate between general-purpose LLMs and domain-adapted multimodal models in the context of African urban mobility challenges.
    4. Evaluate: Assess the effectiveness of multimodal signals (e.g., GPS, graph structures, weather) in improving the contextual accuracy of transportation models.
    5. Create: Design a conceptual architecture for a domain-adapted LLM system incorporating data acquisition, model fine-tuning, and user interface components for African smart mobility (a minimal skeleton of such an architecture follows this list).
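    For the final objective, a minimal skeleton of the three-component architecture might look as follows. Every class, method, and identifier here is a hypothetical placeholder introduced for illustration, not an existing interface from the project.

        from dataclasses import dataclass, field

        @dataclass
        class DataAcquisition:
            """Component 1: collect and curate multimodal datasets for a target city."""
            sources: list[str] = field(default_factory=lambda: [
                "road_landmark_images", "gps_traces",
                "public_transport_logs", "infrastructure_maps",
            ])

            def collect(self, city: str) -> dict[str, str]:
                # Placeholder: a real pipeline would fetch, clean, and validate each source.
                return {src: f"{city}/{src}" for src in self.sources}

        @dataclass
        class ModelAdaptation:
            """Component 2: RAG indexing plus domain-specific fine-tuning."""
            base_model: str = "pretrained-multimodal-llm"  # placeholder identifier

            def adapt(self, curated: dict[str, str]) -> str:
                # Placeholder: build the retrieval index and fine-tune on the curated data.
                return f"{self.base_model}-adapted-{len(curated)}-sources"

        @dataclass
        class DeploymentInterface:
            """Component 3: web-based application serving the adapted model."""
            def answer(self, model_id: str, query: str) -> str:
                # Placeholder: route the query (plus retrieved context) to the model.
                return f"[{model_id}] response to: {query!r}"

        # Wiring the three components together for one city:
        curated = DataAcquisition().collect("Kigali")
        model_id = ModelAdaptation().adapt(curated)
        print(DeploymentInterface().answer(model_id, "Which routes are congested at 08:00?"))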

    Recommended Mastery Level / Prerequisites:
    None 
