More than 10 million faces were removed from a Microsoft database earlier this week. The faces were used for training and testing facial recognition algorithms. Microsoft called the dataset MS Celeb, and it contained pictures of nearly 100,000 people, amounting to roughly 10 million images across different shots. These images were gathered from publicly available sources.
Microsoft said that the large volume of structured image data was very helpful for training facial recognition systems. Individual photos in the dataset were easy to find, which made it well suited to training AI to recognize a person from multiple pictures. Now, Microsoft has scrapped the entire dataset.
A Financial Times investigation showed that the people whose pictures were used were not aware that their images were being used for training, nor had they given consent. Experts say that, because the data falls under the General Data Protection Regulation (GDPR), this could expose Microsoft to legal issues. The GDPR is one of the strictest laws governing privacy and security requirements for obtaining, storing, and transferring personal data.
However, Microsoft hasn’t officially announced the removal of the database. In a statement to the FT, Microsoft said: “The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.” Interestingly, datasets gathered by Stanford University and Duke University were also taken down after the FT investigation.