AI for Good blog

Bridging the AI gender gap: Why we need better data for an equal world


A real world built and designed using data for men ignores the needs of half its population. This holds true even when artificial intelligence is harnessed to solve challenges facing all of humanity.

The default human at the centre of most data is ‘Reference Man’, said Caroline Criado-Perez, campaigner and author of the book ‘Invisible Women: Exposing Data Bias in a World Designed for Men’. This Caucasian man, who is 25-30 years old and weighs 70 kg, has been the ‘human of reference’ in research studies across sectors for decades.

The gender data gap is this “phenomenon whereby the vast majority of information that we have collected globally and continue to collect – everything from economic data to urban planning data to medical data – have been collected on men”, Criado-Perez said during her Breakthrough Days keynote as part of the AI for Good Global Summit 2020.

When data is not collected and separated by gender, there is no way to learn what works and what doesn’t for different groups.

Fixing this gap by collecting gender-disaggregated data is essential if AI is to fulfil its promise of improving outcomes for everyone.
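To make the point about disaggregation concrete, here is a minimal sketch of how a single headline figure can hide a gap between groups. The records, group labels and the `accuracy_by_group` helper are all hypothetical and made up for illustration; they do not come from any study mentioned in this article.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall_accuracy, {group: accuracy}).

    Each record is a (group, actual, predicted) tuple.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += 1
        correct[group] += int(actual == predicted)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


# Hypothetical, male-dominated evaluation set: 8 records for men, 2 for women.
records = (
    [("men", 1, 1)] * 4
    + [("men", 0, 0)] * 4
    + [("women", 1, 0)]   # the one positive case for women is missed
    + [("women", 0, 0)]
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

In this made-up set the overall accuracy looks high, yet the model is right for women only half the time. That difference is visible only because each record carries a group label.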

Missing data leads to missed opportunities

Relying on data from male bodies and lifestyles to define and solve problems results not just in discomfort – it can also be unsafe.

According to Criado-Perez, several female front-line health workers have spoken about feeling more exposed to COVID-19 due to their badly fitting ‘unisex’ personal protective equipment (PPE). Studies have also found that a woman wearing a seat belt in a car crash is 47 per cent more likely to be seriously injured and 17 per cent more likely to die than a man in the same crash, because the dummies used in tests were based on the 50th-percentile man, Criado-Perez said.

Similarly, any algorithm trained on male-dominated datasets is unlikely to predict accurate risks and results for everyone.

Criado-Perez brought up a gender-neutral algorithm that was designed to predict heart attacks, but her research found flaws in the data.

“The paper provided hardly any disaggregated data and the studies on which the AI was trialled were heavily male-dominated,” she said, pointing out the lack of mentions of diabetes or smoking, both of which are higher risk factors for women. When it comes to COVID-19, a lack of gender data will prevent us from understanding potential differences in how men and women respond to the virus.

Threat of increased bias

Another danger is that AI does not just reflect gender bias and data gaps; it amplifies them.

One study found that image-recognition software trained on a deliberately biased set of photographs ended up making stronger sexist associations.

“The dataset had pictures of cooking, which were over 33 per cent more likely to involve women than men. But the algorithms trained on this dataset connected pictures of kitchens with women 68 per cent of the time. That’s a pretty big jump,” Criado-Perez said.
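The size of that jump can be worked out under one reading of the quoted figures. This sketch assumes that “33 per cent more likely to involve women than men” means women appear 1.33 times as often as men in the cooking pictures, and that the 68 per cent figure is the share of predictions linked to women. Both are interpretations rather than details confirmed by the talk, and the function name is hypothetical.

```python
def share_from_relative_likelihood(extra):
    """Convert 'X more likely than the other group' into a share of the total.

    If one group appears (1 + extra) times for every 1 appearance of the
    other, its share of all appearances is (1 + extra) / (2 + extra).
    """
    return (1 + extra) / (2 + extra)


# Assumption: women appear 1.33 times as often as men in the training images.
training_share = share_from_relative_likelihood(0.33)

# Assumption: the quoted 68 per cent is the share of predictions tied to women.
predicted_share = 0.68

amplification = predicted_share - training_share
print(f"training share of women:  {training_share:.1%}")
print(f"predicted share of women: {predicted_share:.1%}")
print(f"amplification:            {amplification:.1%} (percentage points)")
```

On this reading, women make up roughly 57 per cent of the cooking pictures in the training data, so the trained model widens the imbalance by about 11 percentage points.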

Google’s AI tool has since dropped gendered labels from image recognition to reduce bias, using ‘person’ instead of ‘man’ or ‘woman’ to tag images.

How do we bridge the gender data gap?

When biased data is used in artificial intelligence, the danger is that it will increase prevailing inequalities in the world.

This is concerning as AI applications are increasingly deployed in healthcare, judicial and policing practices, and human resources.

When it comes to gender, the data gap applies not just to women, but also to transgender and non-binary people.

According to Criado-Perez, one way to bridge the gap is to collect more data disaggregated by gender and sex.

But the first step is to acknowledge that the bias problem exists, she said. “This means […] asking the right questions like ‘What are we missing?’ or ‘Are we even equipped to know what we are missing?’”

Bias need not be inherently malicious and can stem from gaps in knowledge. This is why diversity matters, she asserted.

The more diverse a team or organization, the better positioned it is to provide varied perspectives and spot any omissions.

In 2014, when Apple launched its health tracking app, it allowed users to monitor their copper intake but did not provide the option to track menstrual cycles, Criado-Perez noted.

A diverse team can help answer the question of whether efforts are being directed towards solving the right problem in the first place.

Making data open and accountable

To understand whether AI and algorithms work for everyone, Criado-Perez calls for data to be more accessible.

“We need to make it a right to know what data companies, manufacturers and governments are using to make decisions that affect all of our lives,” she said.

Transparent AI means an end to indecipherable ‘black box’ algorithms with potential hidden biases and unaccountable data-driven decisions, Criado-Perez added.

Designing new solutions for gender equity is a focus track in interactive workshops taking place during the Breakthrough Days.


Teams have come together to present and refine solutions for gender bias with feedback from AI and gender experts. One group hopes to develop global recommendations for judiciaries to address gender-related issues in AI systems. Other projects include building a tool to detect bias in language data used for training artificial intelligence, and intentionally constructing datasets through user research and data science to detect gender bias in algorithms.


Image credit: ThisIsEngineering via Pexels.
