The focus of AI testing has moved from theoretical alignment to active verification. As "Frontier" models (those exceeding 10^25 FLOPs) and "Autonomous Agents" become the standard, there is a shift from testing what a model knows to testing what a system can do.
This session will discuss current research on testing and verifying AI systems and the methodologies used to do so.
Agentic AI shifts the conversation from "what AI says" to "what AI can do." It treats autonomous AI agents as active decision-makers rather than passive AI systems, and it requires embedding oversight, safeguards, and accountability directly into how agents act, interact, and adapt over time. While traditional AI system testing is about making sure a model functions responsibly and is not biased or hallucinating, agentic AI testing and verification is about making sure an autonomous agent operates securely, reliably, and in a trusted manner, and is not deceptive. For example, an agent acting as a digital employee should not accidentally spend your company's entire quarterly budget or sign a legally binding contract it should not have.
The session will explore the following questions:
- How do autonomous AI agents change the risk profile compared to human-supervised AI systems?
- What are the most credible short-term and medium-term security risks (e.g., runaway task execution, deceptive behavior, misinformation)?
- How should responsibility and liability be allocated when agents take independent actions?
- What safeguards are essential at the design, deployment, and operational stages?
- What are the current gaps in standards for autonomous AI agents? How can existing AI laws and standards be extended to cover agentic systems?
The objective of this session is to explore the gaps in testing AI systems, the evolution of frontier AI systems and autonomous AI agents, their implications for cybersecurity, and opportunities for international collaboration to ensure the effective design, development, and deployment of AI systems that integrate the considerations of all countries, especially emerging economies and low- and middle-income countries. The session will also highlight the need for international collaboration on standards for AI testing and assurance.
Irakli Beridze, Head of Centre for Artificial Intelligence and Robotics, United Nations Interregional Crime and Justice Research Institute (UNICRI)
Rachel Adams, CEO, Global Centre for AI Governance
Hiromu Kitamura, Principal Expert for Technical Management, Japan AI Safety Institute (J-AISI)
Balaraman Ravindran, Head of the Wadhwani School of Data Science and AI, IIT Madras
Kato Steven Mubiru, CEO and Co-Founder, Crane AI Labs

