IBM is releasing a new dataset called Diversity in Faces in the hope that it will help developers tackle gender and skin type biases in facial recognition software.
The dataset of one million images has been compiled using publicly available images taken from the YFCC-100M Creative Commons dataset.
These images were then annotated using ten facial coding schemes, along with human-labelled gender and age notes.
The development of facial recognition software has been rocky: from its use in player creation in NBA and FIFA videogames that resulted in Cronenbergesque facial models, to the gender and skin type bias experienced in modern facial-analysis programs.
Last year, MIT researchers found that facial-analysis software had an error rate of 34.7 percent for dark-skinned women. That’s a worryingly high number – and one aggravated by the fact that light-skinned men had an error rate of just 0.8 percent.
Of course software on its own can’t be biased, nor does AI have an agenda. These problems stem from our current approach to training AI and facial recognition software. As John R. Smith, Manager of AI Tech at IBM, puts it in an emailed comment: “The AI systems learn what they’re taught, and if they are not taught with robust and diverse data sets, accuracy and fairness could be at risk.”
IBM is hoping to provide that robust and diverse dataset with today’s release.
IBM Facial Recognition Dataset Project
Previously, training for facial recognition software has focused on gender, age and skin tone. Yet these attributes alone are not enough to represent the full scale of diversity that humanity contains.
To capture humanity’s heterogeneity and teach it to a machine, Diversity in Faces uses ten facial coding schemes – covering measures such as facial symmetry, facial contrast, pose, and craniofacial (bone structure) areas and ratios – in conjunction with the traditional age, gender and skin-tone annotations.
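To give a sense of what a craniofacial-ratio coding scheme might compute, here is a minimal sketch. The landmark names and ratio definitions below are hypothetical illustrations, not IBM’s actual scheme:

```python
# Illustrative sketch only: these landmarks and ratios are hypothetical
# examples of craniofacial measures, not the actual Diversity in Faces schemes.
from math import dist

# Hypothetical 2D facial landmarks (x, y) in pixel coordinates.
landmarks = {
    "left_eye":  (120.0, 140.0),
    "right_eye": (200.0, 140.0),
    "nose_tip":  (160.0, 200.0),
    "chin":      (160.0, 280.0),
    "forehead":  (160.0, 80.0),
}

def craniofacial_ratios(lm):
    """Compute simple example ratios from distances between landmarks."""
    face_height = dist(lm["forehead"], lm["chin"])
    eye_span = dist(lm["left_eye"], lm["right_eye"])
    # Distance from the midpoint between the eyes to the nose tip.
    eye_midpoint = (
        (lm["left_eye"][0] + lm["right_eye"][0]) / 2,
        (lm["left_eye"][1] + lm["right_eye"][1]) / 2,
    )
    midface = dist(eye_midpoint, lm["nose_tip"])
    return {
        "eye_span_to_face_height": eye_span / face_height,
        "midface_to_face_height": midface / face_height,
    }

ratios = craniofacial_ratios(landmarks)
print(ratios)  # e.g. {'eye_span_to_face_height': 0.4, 'midface_to_face_height': 0.3}
```

Because such ratios are normalised by overall face size, they describe face shape independently of image resolution – one reason measures like these can complement coarse labels such as age or skin tone.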
In a paper outlining their work, IBM’s researchers on the Diversity in Faces project wrote: “Every face is different. Every face reflects something unique about us. Aspects of our heritage – including race, ethnicity, culture, geography – and our individual identity – age, gender and visible forms of self-expression – are reflected in our faces.”
“We expect face recognition to work accurately for each of us. Performance should not vary for different individuals or different populations.”
As of today, the dataset is open to the global research community upon request.