Street View Text (SVT) dataset: image data from Google Street View


README.md

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. In dealing with outdoor street level imagery, we note two characteristics. (1) Image text often comes from business signage and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses. More details about the data set can be found in our paper, Word Spotting in the Wild [1]. For our up-to-date benchmarks on this data, see our paper, End-to-end Scene Text Recognition [2].

This dataset only has word-level annotations (no character bounding boxes) and should be used for (1) cropped lexicon-driven word recognition and (2) full-image lexicon-driven word detection and recognition.

If you need character-level training data, look into the Chars74K, ICDAR 2003, and ICDAR 2005 datasets.

Metadata and Ground Truth Data

Task: locate all the words in an image that appear in its lexicon. While there is other text in the image, only the lexicon words are to be detected. This contrasts with the more general OCR problem.

Lexicon: HOLIDAY, INN, EXPRESS, HOTEL, NEW, YORK, CITY, FIFTH, AVENUE, MICHAEL, FINA, CINEMA, CAFE, 45TH, STARBUCKS, BINDER, DAVID, DDS, MANHATTAN, DENTIST, BARNES, NOBLE, BOOKSELLERS, AVE, ART, BROWN, INTERNATIONAL, PEN, SHOP, MORTON, THE, STEAKHOUSE, DISHES, BUILD, BEAR, WORKSHOP, HARVARD, CLUB, CORNELL, PACE, UNIVERSITY, LENSCRAFTERS, SETTE, FOSSIL, STORE, 5TH, JEWEL, INDIA, RESTAURANT, KELLARI, TAVERNA, YACHT
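To illustrate how a lexicon restricts the output, here is a minimal sketch that maps noisy recognizer strings to lexicon words and discards everything else. It is not the method from the papers: the function name `spot_lexicon_words`, the use of `difflib` string similarity, and the 0.6 cutoff are all assumptions chosen for illustration.

```python
import difflib


def spot_lexicon_words(candidates, lexicon, cutoff=0.6):
    """Map raw recognizer outputs to lexicon words; drop non-lexicon text.

    The similarity measure (difflib) and cutoff are illustrative assumptions.
    """
    lexicon_upper = [word.upper() for word in lexicon]
    spotted = []
    for text in candidates:
        # Keep only the single closest lexicon word, if it is similar enough.
        matches = difflib.get_close_matches(
            text.upper(), lexicon_upper, n=1, cutoff=cutoff
        )
        if matches:
            spotted.append(matches[0])
    return spotted


if __name__ == "__main__":
    lexicon = ["HOLIDAY", "INN", "EXPRESS", "STARBUCKS", "CAFE"]
    # Hypothetical recognizer outputs, including text that is not in the lexicon.
    candidates = ["Holiday", "lnn", "PARKING", "STARBUCK5"]
    print(spot_lexicon_words(candidates, lexicon))
```

Text such as "PARKING" is visible in the scene but is ignored because no lexicon word is close to it, which is the difference from general OCR described above.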

We used Amazon's Mechanical Turk to harvest and label the images from Google Street View. To build the data set, we created several Human Intelligence Tasks (HITs) to be completed on Mechanical Turk.

Harvest images

Workers were assigned a unique city and asked to acquire 20 images containing text from Google Street View. They were instructed to (1) perform a Search Nearby:* on their city, (2) examine the businesses in the search results, and (3) look at the associated street view for images containing text from the business name. If words were found, they composed the scene to minimize skew, saved a screenshot, and recorded the business name and address.

Image annotation

Workers were presented with an image and a list of candidate words to label with bounding boxes. This contrasts with the ICDAR Robust Reading data set in that we only label words associated with businesses. We used Alex Sorokin's Annotation Toolkit to support bounding box image annotation. For each image, we obtained a list of local business names using a Search Nearby:* query in Google Maps at the image's address. We stored the top 20 business results for each image, typically resulting in 50 unique words. To summarize, the SVT data set consists of images collected from Google Street View, where each image is annotated with bounding boxes around words from businesses near where the image was taken.
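The step from business names to a per-image lexicon can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_lexicon`, the alphanumeric tokenization rule, and the sample business names are hypothetical, and the authors' actual tokenization is not described here.

```python
import re


def build_lexicon(business_names, max_businesses=20):
    """Collect unique upper-cased words from the top business results.

    Tokenizing on runs of letters and digits is an assumption made
    for this sketch.
    """
    words = []
    seen = set()
    for name in business_names[:max_businesses]:
        for token in re.findall(r"[A-Za-z0-9]+", name):
            token = token.upper()
            if token not in seen:
                seen.add(token)
                words.append(token)
    return words


if __name__ == "__main__":
    # Hypothetical search results for one image address.
    names = [
        "Holiday Inn Express",
        "Barnes & Noble Booksellers",
        "Holiday Inn New York City",
    ]
    print(build_lexicon(names))
```

Repeated words across businesses are kept once, which is why 20 business names typically reduce to a few dozen unique lexicon words.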

The annotations are in XML using tags similar to those from the ICDAR 2003 Robust Reading Competition.
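A minimal sketch of reading such annotations with Python's standard library is shown below. The element and attribute names (`tagset`, `image`, `imageName`, `lex`, `taggedRectangle`, `tag`, `x`, `y`, `width`, `height`) and the sample content are assumptions modeled on the ICDAR 2003 style; check them against the XML files shipped with the dataset before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in an ICDAR 2003-like layout (names are assumptions).
SAMPLE_XML = """
<tagset>
  <image>
    <imageName>img/example.jpg</imageName>
    <lex>HOLIDAY,INN,EXPRESS</lex>
    <taggedRectangles>
      <taggedRectangle x="10" y="20" width="120" height="40">
        <tag>HOLIDAY</tag>
      </taggedRectangle>
      <taggedRectangle x="140" y="22" width="60" height="38">
        <tag>INN</tag>
      </taggedRectangle>
    </taggedRectangles>
  </image>
</tagset>
"""


def parse_annotations(xml_text):
    """Return one dict per image with its name, lexicon, and word boxes."""
    root = ET.fromstring(xml_text.strip())
    images = []
    for image in root.iter("image"):
        words = []
        for rect in image.iter("taggedRectangle"):
            # Box is stored as (x, y, width, height) in pixels.
            box = tuple(int(rect.get(key)) for key in ("x", "y", "width", "height"))
            words.append((rect.findtext("tag"), box))
        lexicon_text = image.findtext("lex") or ""
        images.append({
            "name": image.findtext("imageName"),
            "lexicon": [w for w in lexicon_text.split(",") if w],
            "words": words,
        })
    return images


if __name__ == "__main__":
    for record in parse_annotations(SAMPLE_XML):
        print(record["name"], record["lexicon"], record["words"])
```

Each word box can then be used either to crop the image for cropped word recognition or as ground truth for full-image detection.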

References

[1] Kai Wang and Serge Belongie, "Word Spotting in the Wild", ECCV 2010, Heraklion, Crete, Greece.

[2] Kai Wang, Boris Babenko and Serge Belongie, "End-to-end Scene Text Recognition", ICCV 2011, Barcelona, Spain.

Contact Author

Kai Wang
EBU3B, Room 4148
Department of Comp. Sci. and Engr.
University of California, San Diego
9500 Gilman Drive, Mail Code 0404
La Jolla, CA 92093-0404 
Email: k...@cs.ucsd.edu


