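The dataset has been collected in 11 distinct sessions (8 indoor and 3 outdoor) characterized by different backgrounds and lighting. For each session and for each object, a 15-second video (at 20 fps) has been recorded with a Kinect 2.0 sensor, delivering 300 RGB-D frames.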
Fig.1 Example images of the 50 objects in CORe50. Each column denotes one of the 10 categories.
The presence of temporally coherent sessions (i.e.,
videos where the objects gently move in front of the camera) is another
key feature, since temporal smoothness can be used to simplify object
detection, improve classification accuracy and address semi-supervised (or unsupervised) scenarios.
Fig. 1 shows example images of the 50
objects in CORe50, where each column denotes one of the 10 categories and
each row a different object. The full dataset consists of 164,866
128×128 RGB-D images: 11 sessions × 50 objects × (around 300) frames per
session. Three of the eleven sessions (#3, #7 and #10) have been
selected for testing and the remaining 8 sessions are used for training. We
tried to balance the difficulty of the training and test sessions as much
as possible with respect to: indoor/outdoor setting, holding hand (left or
right) and complexity of the background. For more information about the
dataset, take a look at the section "CORe50" in the paper.
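
The figures above can be checked with a short sketch. It is only an illustration of the arithmetic and of the session-based split, not part of any official CORe50 tooling: the constant and function names are hypothetical, and it assumes the sessions are numbered 1 to 11, which the "#3, #7 and #10" labels suggest but do not state outright.

```python
# Figures taken from the dataset description above.
NUM_SESSIONS = 11
NUM_OBJECTS = 50
VIDEO_SECONDS = 15
FPS = 20
TEST_SESSIONS = {3, 7, 10}
REPORTED_TOTAL = 164_866  # images actually in the dataset


def nominal_frames_per_video(seconds=VIDEO_SECONDS, fps=FPS):
    """Frames in one video if no frame is missing."""
    return seconds * fps


def split_sessions(num_sessions=NUM_SESSIONS, test_sessions=TEST_SESSIONS):
    """Split session ids into train and test lists.

    Assumption: sessions are numbered 1..num_sessions.
    """
    all_sessions = range(1, num_sessions + 1)
    train = [s for s in all_sessions if s not in test_sessions]
    test = [s for s in all_sessions if s in test_sessions]
    return train, test


frames = nominal_frames_per_video()
nominal_total = NUM_SESSIONS * NUM_OBJECTS * frames
# Gap between the nominal count and the reported count, which is why
# the description says "around 300" frames per session.
shortfall = nominal_total - REPORTED_TOTAL
train_ids, test_ids = split_sessions()

print(f"nominal frames per video: {frames}")
print(f"nominal total images:     {nominal_total}")
print(f"reported total images:    {REPORTED_TOTAL} (short by {shortfall})")
print(f"training sessions: {train_ids}")
print(f"test sessions:     {test_ids}")
```

Splitting by whole sessions rather than by individual frames keeps consecutive, nearly identical frames of the same video from landing on both sides of the train/test boundary.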
