
VidTIMIT Audio-Video Dataset

3088.22M


Earth and Nature,Music,Image Data,Linguistics,Video Data Classification



README.md

This dataset was copied from http://conradsanderson.id.au/vidtimit/. The numbered files in the video folders are JPEG images whose .jpg extensions are missing. I am not the author or creator of this dataset. The following are quotations from that website.

Example sentences of the spoken words in the folders:

- sa1: She had your dark suit in greasy wash water all year
- sa2: Don't ask me to carry an oily rag like that
- si1398: Do they make class-biased decisions?
- si2028: He took his mask from his forehead and threw it, unexpectedly, across the deck
- si768: Make lid for sugar bowl the same as jar lids, omitting design disk
- sx138: The clumsy customer spilled some expensive perfume
- sx228: The viewpoint overlooked the ocean
- sx318: Please dig my potatoes up before frost
- sx408: I'd ride the subway, but I haven't enough change
- sx48: Grandmother outgrew her upbringing in petticoats

(These are just examples; the full listing can be found here: https://catalog.ldc.upenn.edu/docs/LDC93S1/PROMPTS.TXT)

Overview

The VidTIMIT dataset is comprised of video and corresponding audio recordings of 43 people, reciting short sentences. It can be useful for research on topics such as automatic lip reading, multi-view face recognition, multi-modal speech recognition and person identification.

The dataset was recorded in 3 sessions, with a mean delay of 7 days between Session 1 and 2, and 6 days between Session 2 and 3. The sentences were chosen from the test section of the TIMIT corpus. There are 10 sentences per person. The first six sentences (sorted alpha-numerically by filename) are assigned to Session 1. The next two sentences are assigned to Session 2, with the remaining two assigned to Session 3. The first two sentences for all persons are the same, with the remaining eight generally different for each person.

In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down, and finally returning to the center.

The recording was done in an office environment using a broadcast quality digital video camera. The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. A 90% quality setting was used during the creation of the JPEG images. The corresponding audio is stored as a mono, 16 bit, 32 kHz WAV file.

PLEASE READ BEFORE DOWNLOADING

LICENSE

The VidTIMIT dataset is Copyright © 2001 Conrad Sanderson. Distribution and research usage of this dataset is permitted under the following conditions:

- This notice is left intact and not modified in any way.
- The dataset is provided as is. There is no warranty as to the fitness for any particular purpose. The author of the dataset is not responsible for any direct or indirect losses resulting from the use of the dataset.
- Any publication (eg. conference paper, journal article, technical report, book chapter, etc) resulting from the usage of VidTIMIT must cite the following paper: C. Sanderson and B.C. Lovell. Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference. Lecture Notes in Computer Science (LNCS), Vol. 5558, pp. 199-208, 2009.

NOTES

The VidTIMIT dataset is comprised of 44 files, in total taking up about 3 Gb. Each zip is on average 71 Mb. Please download only one file at a time, so that the server is not overloaded.
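The session split described in the overview (first six sentences to Session 1, next two to Session 2, last two to Session 3) can be sketched in a few lines of Python. This is only an illustration, not code from the dataset's author: the function name `assign_sessions` is hypothetical, and it assumes that "sorted alpha-numerically" means plain string ordering, which matches the order of the example sentence IDs listed in this README (for instance, `sx408` before `sx48`).

```python
def assign_sessions(sentence_ids):
    """Map each of a person's 10 sentence IDs to a session number (1, 2 or 3).

    Assumption: "alpha-numerical" order is plain lexicographic string order.
    """
    ordered = sorted(sentence_ids)
    if len(ordered) != 10:
        raise ValueError(f"expected 10 sentences per person, got {len(ordered)}")

    sessions = {}
    for index, sentence_id in enumerate(ordered):
        if index < 6:        # first six sentences
            sessions[sentence_id] = 1
        elif index < 8:      # next two sentences
            sessions[sentence_id] = 2
        else:                # remaining two sentences
            sessions[sentence_id] = 3
    return sessions


# Sentence IDs taken from the example listing in this README.
example_ids = ["sx48", "sa1", "si768", "sx228", "sa2",
               "si1398", "sx408", "si2028", "sx138", "sx318"]
example_sessions = assign_sessions(example_ids)

for sentence_id in sorted(example_sessions):
    print(sentence_id, "-> Session", example_sessions[sentence_id])
```

If the actual folder names sort differently on disk (for example, under a natural numeric ordering), the split would change, so it is worth checking the result against the original website's description before relying on it.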


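Data usage statement:

I. Data source and display:

1. This data comes from internet data collection or from service providers; this platform offers users the display and browsing of datasets.
2. This platform only displays basic dataset information, including but not limited to image, text, video, audio and other file types.
3. Basic dataset information comes from the original data address or from the data provider. If the dataset description differs, the original data address or the provider's original address takes precedence.

II. Ownership:

1. The copyright of every dataset on this site belongs to the original data publisher or data provider.

III. Reposting:

1. If you need to repost data from this site, please retain the original data address and the related copyright notice.

IV. Infringement and handling:

1. If any data displayed on this site involves infringement, please contact the site promptly and we will arrange for the data to be taken down.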
