To the best of our knowledge this is the largest publicly available dataset of face images with gender and age labels for training. We provide pretrained models for both age and gender prediction.
Description
Since publicly available face image datasets are often of small to
medium size, rarely exceeding tens of thousands of images, and often
lack age information, we decided to collect a large dataset of celebrities.
For this purpose, we took the list of the 100,000 most popular actors as
listed on the IMDb website and (automatically) crawled the date of
birth, name, gender and all images related to each person from their
profiles.
Additionally we crawled all profile images from pages of people from
Wikipedia with the same meta information.
We removed the images without timestamp (the date when the photo was
taken).
Assuming that the images with single faces are likely to show the actor
and that the timestamp and date of birth are correct, we were able to
assign to each such image the biological (real) age. Of course, we
cannot vouch for the accuracy of the assigned age information. Besides
wrong timestamps, many images are stills from movies - movies that can
have extended production times. We obtained 460,723 face images
from 20,284 celebrities from IMDb and 62,328 from Wikipedia, thus
523,051 in total.
As some of the images (especially from IMDb) contain several people we
only use the photos where the second strongest face detection is below a
threshold. For the network to be equally discriminative for all ages,
we equalize the age distribution for training. For more details please
see the paper.
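The README does not spell out how the age distribution is equalized, so the following is only a minimal Python sketch of one simple approach: capping the number of images kept per age. The function name `equalize_by_age`, the `max_per_age` parameter and the fixed random seed are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def equalize_by_age(samples, max_per_age, seed=0):
    """Keep at most `max_per_age` samples for every age value.

    `samples` is a list of (item, age) pairs. This is an illustrative
    capping strategy, not necessarily the procedure used by the authors.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_age = defaultdict(list)
    for item, age in samples:
        by_age[age].append(item)

    balanced = []
    for age in sorted(by_age):
        items = by_age[age]
        if len(items) > max_per_age:
            # Over-represented age: draw a random subset without replacement.
            items = rng.sample(items, max_per_age)
        balanced.extend((item, age) for item in items)
    return balanced
```

Capping only flattens the over-represented ages; rare ages remain rare, so a training pipeline may additionally need re-weighting or oversampling.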
Usage
For both the IMDb and Wikipedia images we provide a separate .mat file, loadable with Matlab, that contains all the meta information. The format is as follows:
dob: date of birth (Matlab serial date number)
photo_taken: year when the photo was taken
full_path: path to file
gender: 0 for female and 1 for male, NaN if unknown
name: name of the celebrity
face_location: location of the face. To crop the face in Matlab run
img(face_location(2):face_location(4),face_location(1):face_location(3),:)
face_score: detector score (the higher the better). Inf implies that no face was found in the image and the face_location then just returns the entire image
second_face_score: detector score of the face with the second highest score. This is useful to ignore images with more than one face. second_face_score is NaN if no second face was detected.
celeb_names (IMDb only): list of all celebrity names
celeb_id (IMDb only): index of celebrity name
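For readers working outside Matlab, the fields above can be used roughly as in the Python sketch below. It assumes that face_location is ordered as [x1, y1, x2, y2] with Matlab's 1-based, inclusive indexing (as the crop expression suggests) and that coordinates can be rounded to integers. The helper names and both threshold parameters are hypothetical; the README does not state which threshold values were used.

```python
import math


def is_usable_single_face(face_score, second_face_score,
                          min_face_score, max_second_face_score):
    """Return True if a record likely shows exactly one detected face.

    Both thresholds are caller-supplied assumptions; the README does not
    give concrete values.
    """
    # A non-finite face_score means no face was found in the image.
    if math.isinf(face_score) or math.isnan(face_score):
        return False
    if face_score < min_face_score:
        return False
    # NaN means that no second face was detected at all.
    if math.isnan(second_face_score):
        return True
    return second_face_score < max_second_face_score


def matlab_box_to_slices(face_location):
    """Convert a 1-based inclusive [x1, y1, x2, y2] box to 0-based slices."""
    x1, y1, x2, y2 = (int(round(v)) for v in face_location)
    rows = slice(y1 - 1, y2)  # Matlab y1:y2 -> Python [y1-1, y2)
    cols = slice(x1 - 1, x2)  # Matlab x1:x2 -> Python [x1-1, x2)
    return rows, cols


def crop(image, face_location):
    """Crop a row-major image stored as a list of rows."""
    rows, cols = matlab_box_to_slices(face_location)
    return [row[cols] for row in image[rows]]
```

With a NumPy array the same slices can be applied directly as `img[rows, cols]`.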
The age of a person can be calculated based on the date of birth and the time when the photo was taken (note that we assume that the photo was taken in the middle of the year):
[age,~]=datevec(datenum(wiki.photo_taken,7,1)-wiki.dob);
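The same calculation can be sketched in Python. A Matlab serial date number is 366 days larger than Python's proleptic Gregorian ordinal (Matlab's datenum(1,1,1) is 367, while date(1,1,1).toordinal() is 1). The function names below are illustrative, and the age is computed as completed years on July 1, which may differ from the datevec-based result for birthdays falling very close to July 1.

```python
from datetime import date

# Matlab's datenum(1, 1, 1) is 367, Python's date(1, 1, 1).toordinal() is 1.
MATLAB_TO_PYTHON_ORDINAL_OFFSET = 366


def matlab_datenum_to_date(datenum):
    """Convert a Matlab serial date number to a Python date (time of day dropped)."""
    return date.fromordinal(int(datenum) - MATLAB_TO_PYTHON_ORDINAL_OFFSET)


def age_at_photo(dob_datenum, photo_taken_year):
    """Completed years between the date of birth and July 1 of the photo year."""
    birth = matlab_datenum_to_date(dob_datenum)
    reference = date(int(photo_taken_year), 7, 1)  # mid-year assumption from the README
    before_birthday = (reference.month, reference.day) < (birth.month, birth.day)
    return reference.year - birth.year - int(before_birthday)
```

Records with a corrupt date of birth (a serial number below 367) make `date.fromordinal` raise a `ValueError`, so callers may want to catch that and skip the record.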