
IBM Facial Recognition Dataset Aims to Remove Gender and Skin Bias

“We expect face recognition to work accurately for each of us”

By CBR Staff Writer

IBM is releasing a new dataset called Diversity in Faces in the hope that it will help developers tackle gender and skin type biases in facial recognition software.

The dataset of one million images has been compiled using publicly available images taken from the YFCC-100M Creative Commons dataset.

These images were then annotated using ten facial coding schemes, along with human-labelled gender and age annotations.

The development of facial recognition software has been rocky: from its use in player creation in NBA and FIFA videogames that resulted in Cronenbergesque facial models, to the gender and skin type bias experienced in modern facial-analysis programs.

Last year MIT researchers found that commercial facial-analysis software had an error rate of up to 34.7 percent for dark-skinned women. That’s a worryingly high number – and one made worse by comparison: for light-skinned men the error rate was just 0.8 percent.
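A back-of-the-envelope calculation using the two figures above puts the gap in perspective (the arithmetic below is illustrative, not part of the MIT study itself):

```python
# Error rates reported for the facial-analysis software, in percent
dark_skinned_women_err = 34.7
light_skinned_men_err = 0.8

# How many times more often the software failed for dark-skinned women
ratio = dark_skinned_women_err / light_skinned_men_err
print(f"Error rate disparity: {ratio:.1f}x")  # prints "Error rate disparity: 43.4x"
```

In other words, the software misclassified dark-skinned women more than 43 times as often as light-skinned men.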

Of course software on its own can’t be biased, nor does AI have an agenda. These problems stem from our current approach to training AI and facial recognition software. As John R. Smith, Manager of AI Tech at IBM, puts it in an emailed comment: “The AI systems learn what they’re taught, and if they are not taught with robust and diverse data sets, accuracy and fairness could be at risk.”

IBM is hoping to provide that robust and diverse dataset with today’s release.


IBM Facial Recognition Dataset Project

Previously, training for facial recognition software has focused on gender, age and skin tone. Yet these attributes alone are not enough to represent the full scale of human diversity.

To capture humanity’s heterogeneity and teach it to a machine, Diversity in Faces uses ten facial coding schemes – including facial symmetry, facial contrast, pose, and craniofacial (bone structure) areas and ratios – in conjunction with the traditional age, gender and skin tone annotations.
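In practice, each image in such a dataset ends up paired with values from every coding scheme. The sketch below is purely illustrative – the field names, types and example values are assumptions for explanation, not the dataset’s actual schema:

```python
from dataclasses import dataclass


@dataclass
class FaceAnnotation:
    """Hypothetical annotation record for one image (illustrative only)."""
    image_url: str             # source image, e.g. from YFCC-100M
    craniofacial_ratios: dict  # e.g. {"face_width_to_height": 0.81}
    facial_symmetry: float     # 0.0 (asymmetric) to 1.0 (symmetric)
    facial_contrast: float     # contrast between facial features and skin
    pose: tuple                # (yaw, pitch, roll) head angles in degrees
    skin_tone: str             # a skin-tone category label
    age: int                   # human-labelled age estimate
    gender: str                # human-labelled gender


# One annotated image would then look something like this:
record = FaceAnnotation(
    image_url="https://example.org/photo.jpg",
    craniofacial_ratios={"face_width_to_height": 0.81},
    facial_symmetry=0.92,
    facial_contrast=0.35,
    pose=(5.0, -2.0, 0.5),
    skin_tone="type IV",
    age=34,
    gender="female",
)
print(record.facial_symmetry)  # prints "0.92"
```

The point of recording many orthogonal measures per face, rather than gender and skin tone alone, is that researchers can then check whether a dataset covers the full range of each dimension.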

In a paper outlining their work, IBM’s researchers on the Diversity in Faces project wrote: “Every face is different. Every face reflects something unique about us. Aspects of our heritage – including race, ethnicity, culture, geography – and our individual identity – age, gender and visible forms of self-expression – are reflected in our faces.”

“We expect face recognition to work accurately for each of us. Performance should not vary for different individuals or different populations.”

As of today, the dataset is available to the global research community upon request.

See Also: Introducing Microsoft’s AI-Powered Video Categorisation Tool
