Poisoning attacks on AI


Battista Biggio (University of Cagliari) will present his research on Poisoning Attacks on AI as part of the Trustworthy AI series.

WHAT IS THE TRUSTWORTHY AI SERIES?

Artificial Intelligence (AI) systems have steadily grown in complexity, gaining predictive power often at the expense of interpretability, robustness and trustworthiness. Deep neural networks are a prime example of this development. While reaching “superhuman” performance in various complex tasks, these models are susceptible to errors when confronted with tiny (adversarial) variations of the input – variations which are either not noticeable or can be handled reliably by humans. This expert talk series will discuss these challenges of current AI technology and will present new research aiming at overcoming these limitations and developing AI systems which can be certified to be trustworthy and robust.

The expert talk series will cover the following topics:

  • Measuring Neural Network Robustness
  • Auditing AI Systems
  • Adversarial Attacks and Defences
  • Explainability & Trustworthiness
  • Poisoning Attacks on AI
  • Certified Robustness
  • Model and Data Uncertainty
  • AI Safety and Fairness

The Trustworthy AI series is moderated by Wojciech Samek, Head of AI Department at Fraunhofer HHI, one of the top 20 AI labs in the world.

Battista Biggio: Poisoning attacks on AI

Shownotes

00:00 Opening remarks by ITU

00:58 Introduction by Wojciech Samek

01:37 Introduction by Battista Biggio – Poisoning Attacks on AI

02:50 Artificial Intelligence Today 

  • Comparing AI to electricity: like electricity, AI is set to transform industrial society once more.

04:11 Is AI really smart?

  • Can we trust this technology?
  • Are we happy with current results?
  • We cannot fully trust AI yet.

04:59 Adversarial Examples – (Gradient-based Evasion Attacks)

  • Input image -> adversarial perturbation (noise) -> untrustworthy output.
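
A minimal sketch of the gradient-based evasion idea, using a hand-rolled logistic-regression classifier (the model, weights and step size are illustrative, not from the talk): the attacker takes one step in the direction of the sign of the loss gradient with respect to the input, which is enough to flip the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One-step gradient-sign (FGSM-style) perturbation of input x
    against a logistic-regression classifier with weights w, bias b."""
    p = sigmoid(w @ x + b)             # predicted probability of class 1
    grad_x = (p - y) * w               # gradient of the log-loss w.r.t. the input
    return x + eps * np.sign(grad_x)   # step that increases the loss

# Toy example: a point correctly classified as class 1
w = np.array([2.0, -1.0]); b = 0.0
x = np.array([1.0, 0.5]); y = 1
x_adv = fgsm_perturb(x, y, w, b, eps=0.8)
print(sigmoid(w @ x + b) > 0.5)      # original prediction: class 1 (True)
print(sigmoid(w @ x_adv + b) > 0.5)  # adversarial prediction flips (False)
```

The same principle scales to deep networks, where the input gradient is obtained by backpropagation instead of this closed form.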

06:31 Not only in the digital domain

  • This applies not only in the digital domain, but also in the physical world.
  • Example: a self-driving car failing to recognise a stop sign.

07:51 Other applicable domains

  • Audio: adversarial noise added to digital audio can make speech-recognition systems fail to recognise sentences.
  • Malware Examples: PDF, Android, Windows. 

11:04 Timeline of Learning Security 

  • A useful overview of the history of the field and where it is heading.

12:21 Attacks against Machine Learning

  • Evasion attacks, i.e., adversarial examples crafted at test time.
  • Sponge attacks, which increase a model's/system's latency and energy consumption.
  • Model extraction, model inversion, and membership inference, which reveal private information about the model or its users.

14:56 Poisoning attacks 

  • Denial-of-service poisoning attacks. Example: an artist used 99 phones to trick Google Maps into reporting a traffic jam.
  • How does it work? Training data -> preprocessing -> classifier -> output (filtered information).
  • Goal: to maximize classification error by injecting poisoning samples into the training set (TR).
  • Strategy: find an optimal attack point in TR that maximizes classification error.
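
The injection strategy can be illustrated with a toy sketch: a nearest-centroid learner (a simple stand-in for the classifier in the talk) trained on data contaminated with mislabelled, outlying points. All data and numbers below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes for training and test
X0 = rng.normal(-2.0, size=(50, 2)); X1 = rng.normal(+2.0, size=(50, 2))
X_tr = np.vstack([X0, X1]); y_tr = np.array([0] * 50 + [1] * 50)
X_te = np.vstack([rng.normal(-2.0, size=(50, 2)), rng.normal(+2.0, size=(50, 2))])
y_te = np.array([0] * 50 + [1] * 50)

def fit_predict(X, y, X_test):
    """Nearest-centroid classifier: predict the class with the closer centroid."""
    c0, c1 = X[y == 0].mean(0), X[y == 1].mean(0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)

err_clean = np.mean(fit_predict(X_tr, y_tr, X_te) != y_te)

# Inject poisoning points: outlying samples mislabelled as class 0,
# placed far away to drag the class-0 centroid across the boundary
X_p = np.full((30, 2), 8.0)
X_pois = np.vstack([X_tr, X_p]); y_pois = np.append(y_tr, [0] * 30)
err_pois = np.mean(fit_predict(X_pois, y_pois, X_te) != y_te)
print(err_clean, err_pois)  # poisoning raises the test error
```

The real attack in the talk optimizes the poisoning point via gradients rather than placing it by hand, but the effect on the learned decision boundary is the same in spirit.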

21:00 Poisoning is a Bilevel Optimization problem 

  • Attacker’s objective: to maximize the generalization error on untainted data w.r.t. the poisoning point xc.
  • The outer problem searches for the xc that maximizes the error of the classifier learned, in the inner problem, on the poisoned training set.
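
In symbols, the standard formulation from the poisoning literature reads as follows (notation assumed: D_tr is the training set, D_val the untainted validation set, and (x_c, y_c) the poisoning point):

```latex
\max_{x_c} \; L\big(\mathcal{D}_{\mathrm{val}},\, w^{\star}(x_c)\big)
\quad \text{s.t.} \quad
w^{\star}(x_c) \in \arg\min_{w} \; L\big(\mathcal{D}_{\mathrm{tr}} \cup \{(x_c, y_c)\},\, w\big)
```

The inner minimization is the ordinary learning problem on the poisoned data; the outer maximization is the attacker's search over the poisoning point.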

22:43 Bilevel Optimization 

  • To find the optimal xc, compute the gradient of the attacker's objective w.r.t. xc and follow it (gradient ascent).

23:58 Gradient-based poisoning attacks 

  • The gradient is not easy to compute: the poisoning point affects the classification function through training.
  • To solve it: replace the inner learning problem with its equilibrium (KKT) conditions.
  • This enables computing gradients in closed form. 

25:19 Experiment on MNIST digits. 

26:01 Is bilevel optimization really needed?

26:48 Towards poisoning deep neural networks 

  • Solving the poisoning problem without exploiting KKT conditions (back-gradient optimization).
  • This solves bilevel problems more efficiently.
  • Scaling such attacks to deep networks is harder and has not been fully demonstrated yet.

28:33 Poisoning attacks on algorithm fairness

29:13 Why do adversarial attacks transfer? 

  • Transferability is the ability of an attack developed against a surrogate model to also succeed against a different target model.
  • It depends on the vulnerability of the target model and the alignment of gradients.

30:26 Countering Poisoning attacks 

  • Security measures against poisoning. Rationale: poisoning attacks typically inject outlying training samples.
  • Two strategies: data sanitization, which removes poisoning samples from the training data; and robust learning, which uses learning algorithms that are robust to training-set contamination.

32:20 Robust regression with TRIM statistics 

  • TRIM learns the model by retaining only the training points with the smallest residuals.
  • Starting from an initial fit, it alternates refitting the model and discarding high-residual (likely poisoned) points over several iterations.
  • Experiments with TRIM (Loan dataset) 
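
A minimal sketch of the TRIM idea, assuming ordinary least squares as the base learner (the data, sizes and iteration count are illustrative, not the paper's experimental setup):

```python
import numpy as np

def trim_regression(X, y, n_keep, n_iter=20, seed=0):
    """TRIM-style robust regression sketch: alternately fit least squares
    and keep only the n_keep points with the smallest residuals."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(y), size=n_keep, replace=False)  # random initial subset
    for _ in range(n_iter):
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)  # refit on kept points
        residuals = (X @ w - y) ** 2
        keep = np.argsort(residuals)[:n_keep]  # retain the smallest residuals
    return w

# Clean line y = 3x plus a few gross poisoning points
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 40)
X = x[:, None]; y = 3.0 * x + rng.normal(scale=0.01, size=40)
X_p = np.vstack([X, [[0.9], [0.95], [1.0]]])   # poisoned inputs
y_p = np.append(y, [-10.0, -10.0, -10.0])      # gross outlier targets

w_plain, *_ = np.linalg.lstsq(X_p, y_p, rcond=None)
w_trim = trim_regression(X_p, y_p, n_keep=40)
print(w_plain[0], w_trim[0])  # plain fit is dragged off; TRIM recovers a slope near 3
```

The poisoned points incur large residuals under any fit close to the clean data, so the trimming step discards them and the refit converges back to the clean solution.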

34:10 Strength-detectability dilemma for poisoning attacks 

  • Examples

34:35 Backdoor attacks 

  • Clean training data: ideal  
  • Backdoor: poisoning integrity attacks place mislabeled training points in a region of the feature space far from the rest of the training data. The learning algorithm labels such regions as desired, allowing for subsequent intrusions.

37:18 Backdoor poisoning: three main categories

  • BadNets: the training data contains a visible trigger.
  • Hidden Trigger: the trigger is not visible in the poisoned training data.
  • Poison Frogs, Convex Polytope, Bullseye Polytope: target a predefined class/sample.

39:50 Defending against backdoor poisoning attacks 

Process

  • Blind backdoor removal
  • Offline inspection
  • Online inspection
  • Post backdoor removal

40:50 Ongoing work: backdoor smoothing

  • Why do backdoor attacks work (or fail to)?
  • Randomized smoothing to measure the instability/variability of the classification output around backdoored samples 
  • Backdoor attacks are more successful if they are able to induce smoother classification around the backdoor samples. 
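
One way to sketch the measurement is a toy probe in the spirit of randomized smoothing (the classifier, noise level and sample counts are illustrative, not the authors' implementation): perturb a sample with Gaussian noise many times and record how often the predicted label stays the same.

```python
import numpy as np

def smoothed_stability(predict, x, sigma=0.5, n=200, seed=0):
    """Randomized-smoothing-style probe: fraction of Gaussian-perturbed
    copies of x that keep the same predicted label as x itself.
    Values near 1.0 indicate a smooth (stable) region around x."""
    rng = np.random.default_rng(seed)
    base = predict(x[None, :])[0]
    noisy = x + sigma * rng.normal(size=(n, x.shape[0]))
    return np.mean(predict(noisy) == base)

# Toy classifier: linear decision boundary x0 + x1 = 0
predict = lambda X: (X[:, 0] + X[:, 1] > 0).astype(int)

x_deep = np.array([3.0, 3.0])    # deep inside the class region
x_edge = np.array([0.05, 0.05])  # right next to the boundary
print(smoothed_stability(predict, x_deep))  # close to 1.0
print(smoothed_stability(predict, x_edge))  # closer to 0.5
```

Applied around backdoored samples, a high stability score corresponds to the "smoother classification" that the talk associates with more successful backdoor attacks.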

42:23 Why is AI vulnerable?

  • Bernhard Schölkopf: the underlying assumption is that past data is representative of future data, i.e., that the data distribution is stationary (IID).
  • The success of modern AI rests on tasks for which we can collect enough representative training data.
  • We cannot build AI models for every task an agent could ever encounter, and there is a whole world out there where the IID assumption is violated.

44:41 What can we do, then? 

  • We lack testing/debugging/monitoring tools to better understand how these algorithms work.

45:30 Conclusion

46:04 Q&A Session 

47:30 What can you say about scalability? Research on the scalability of these attacks and defences is still in progress.

48:25 What do you mean when you say backdoor attacks are becoming popular? They are among the most threatening attacks in the field and are therefore under constant study; researchers deliberately compromise the learning process in order to develop defence algorithms.

50:50 What is the best way to use classifiers? Combining classifiers can make your system more robust, but it depends on how you combine them.

52:33 Would you trust a model, or are there methods to tell whether a model has been poisoned? I can trust a model only if it is certified. I would use an uncertified model only if it is not critical for a company or anyone else.

54:16 How likely is it that an attacker can inject samples during training? A classifier can be poisoned, but this is a very challenging process.

55:43 Would stacking be somewhat better than a generic ensemble method? You need guarantees on the classifiers you are using in order to build an optimal combiner; this yields a more robust system.

56:59 How can you obtain a certification? There are companies involved in the security assessment of machine learning algorithms, including against poisoning attacks. But if you want something robust, companies must focus on the quality of the algorithm.

1:02:24 Closing Q&A Session

1:02:34 Closing from ITU
