Workshop
In person
Leaders · Gold · Discovery

Testing and verification of frontier AI systems

  • Date: 7 July 2026
    Timeframe: 14:00 - 17:15
    Duration: 3 hours 15 minutes
    Rigorous testing and verification of frontier AI systems is no longer optional. It is essential for security and global trust.
     
    This workshop brings experts together to explore cutting-edge research on how advanced AI models can be tested, verified, and validated before they are deployed at scale. The session gives a clear overview of today's most relevant methodologies, from stress-testing complex model behaviour to evaluating system robustness under real-world conditions. It also highlights the critical gaps that remain in current testing practices, tools, and international standards. By identifying where today's methods fall short, the workshop lays the groundwork for stronger, more interoperable frameworks. Participants will also discuss how countries, institutions, and research communities can collaborate to accelerate progress. The goal is simple and urgent: to build AI systems that are secure, reliable, and robust enough for widespread global deployment.
     
    The main objectives of the workshop are to:
    1. discuss research on AI system testing and verification methods;
    2. identify gaps in current methodologies for AI system testing and verification;
    3. examine the security challenges of autonomous AI agents; and
    4. discuss opportunities for international collaboration on frontier AI systems through a multistakeholder platform that takes into account the requirements of low- and middle-income countries and the Global South.
    Schedule

    The focus of AI testing has moved from theoretical alignment to active verification. As "frontier" models (those trained with more than 10^25 FLOPs of compute) and autonomous agents become the norm, the emphasis is shifting from testing what a model knows to testing what a system can do.

    The main objective of this session is to discuss research on AI system testing and verification, including the different methodologies used to test and verify AI systems.

    Agentic AI shifts the conversation from "what AI says" to "what AI can do". It treats autonomous AI agents as active decision-makers rather than passive tools, which requires embedding oversight, safeguards, and accountability directly into how agents act, interact, and adapt over time. Whereas testing a traditional AI system is about making sure a model functions responsibly and does not produce biased or hallucinated output, testing and verifying agentic AI is about making sure an autonomous agent operates securely, reliably, and in a trustworthy manner, and does not behave deceptively. For example, an agent acting as a digital employee must not accidentally spend a company's entire quarterly budget or sign a legally binding contract it should not have signed.

    The session will explore the following questions:

    • How do autonomous AI agents change the risk profile compared to human-supervised AI systems?
    • What are the most credible short-term and medium-term security risks (e.g., runaway task execution, deceptive behaviour, misinformation)?
    • How should responsibility and liability be allocated when agents take independent actions?
    • What safeguards are essential at the design, deployment, and operational stages?
    • What are the current gaps in standards for autonomous AI agents? How can existing AI laws and standards be extended to cover agentic systems?

    This session will also explore the gaps in testing AI systems, the evolution of frontier AI systems and autonomous AI agents, and their implications for cybersecurity. It will consider opportunities for international collaboration to ensure that AI systems are designed, developed, and deployed in ways that take all countries into account, especially emerging economies and low- and middle-income countries. The session will highlight the need for international collaboration on standards for AI testing and assurance.

