Public dataset
Data structure: 57.78M
* The above analysis was generated automatically by the system; the actual data takes precedence.
README.md
Relational Strategies in Customer Service (RSiCS) Dataset
Human-computer data from three live customer service Intelligent Virtual Agents (IVAs) in the travel and telecommunications domains were collected, and annotators marked all text that was deemed unnecessary for determining the user's intention. After the selections of multiple annotators were merged to create highlighted texts, a second round of annotation was performed to determine the classes of language present in the highlighted sections, such as Greetings, Backstory, Justification, Gratitude, Rants, or Emotions. The resulting corpus is a valuable resource for improving the quality and relational abilities of IVAs.
Data
Data was collected from four sources: the conversation logs of three commercial customer service IVAs and the airline forums on TripAdvisor.com during August 2016.
Dataset numbering used in files:
1. TripAdvisor.com airline forum
2. Train travel IVA
3. Airline travel IVA
4. Telecommunications support IVA
File Contents and Formatting
x_y_align.csv
Alignment of annotator x to all other annotators in their group for dataset y. Columns:
- Annotator A ID: x.
- Annotator B ID: The annotator against whom the alignment score with x is calculated.
- Group ID: The group of 4 annotators that the compared users belong to.
- Dataset ID: Dataset y, from which the request originated.
- Request ID: Unique ID of a request, allowing joins between different files.
- Text: The original request text.
- Annotator A Text: The request text with the selections of annotator A enclosed in [ and ].
- Annotator B Text: The request text with the selections of annotator B enclosed in [ and ].
- Length: The character length (n) of the original request text in column 6 (Text).
- Error: The number of character positions (e) at which the binary determinations of A and B disagree.
- Alignment Score: The alignment, calculated as align = (n - e) / n.
- Agreement: Whether or not A and B agree that any selection is necessary.

all_data_by_threshold.csv
All requests, with selections merged by threshold. Each request is repeated 4 times, once for each merging threshold. Columns:
- Dataset ID: The dataset from which the request originated.
- Group ID: The group of 4 annotators from which the selections originated.
- Request ID: Unique ID of a request, allowing joins between different files.
- MultiIntent: 1 if at least one annotator flagged the text as containing more than one user intention, 0 otherwise.
- Threshold: The threshold (i) by which selections are merged.
- MergedSelections: If at least i annotators marked a character as unnecessary, it is contained within the selected portion, denoted by [ and ].
- Unselected: All text in MergedSelections not enclosed by [ and ].
- Selected: All text in MergedSelections enclosed by [ and ].
- Removed: The fraction of the original request removed by the merged selections: length(Selected) / n.

tagged_selections_by_sentence.csv
Second annotation pass, tagging the relational language present in the selections made in the first annotation pass. Contains only the requests in all_data_by_threshold.csv that are not marked as MultiIntent. Columns:
- Dataset ID: The dataset from which the request originated.
- Group ID: The group of 4 annotators from which the selections originated.
- Request ID: Unique ID of a request, allowing joins between different files.
- Threshold: The threshold (i) by which selections are merged.
- MergedSelections: If at least i annotators marked a character as unnecessary, it is contained within the selected portion, denoted by [ and ].
- Unselected: All text in MergedSelections not enclosed by [ and ].
- Selected: All text in MergedSelections enclosed by [ and ].
- Greeting: Whether a greeting of some kind (e.g. "Hi", "How are you") is present in Selected.
- Backstory: Whether self-exposure language is present in Selected: the user tells the audience about themselves, their situation, or what led them to contact the agent or ask their question.
- Justification: Whether justification language is present in Selected: the user gives facts to build credibility that their request or statement is true. This can also be why they need a resolution, or a consequence if something is not resolved.
- Rant: Whether ranting (excessive complaining or negative narrative) is present in Selected.
- Gratitude: Whether some expression of gratitude to the audience for past or future help is present in Selected.
- Other: Whether some or all of the highlighted section in Selected contains no relational language. This could be additional facts that the user gave but that annotators determined were unnecessary for determining their intention, or a general question such as "Can you help?"
- Express Emotion: Whether any emotional language not covered by Rant is present in Selected.

all_multi_intent.csv
All requests flagged as containing multiple intentions by at least one annotator; useful for developing multiple-intent detection strategies. Columns:
- Dataset ID: The dataset from which the request originated.
- Group ID: The group of 4 annotators from which the selections originated.
- Request ID: Unique ID of a request, allowing joins between different files.
- Text: The original request text.
- Annotator x: 1 if annotator x believed more than one intent was present in the text, 0 otherwise.
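The Length, Error, Alignment Score and Agreement columns of x_y_align.csv can be illustrated with a short sketch. It assumes that "[" and "]" occur only as selection markers, and it reads Agreement as "both annotators selected something, or neither did"; both are assumptions, and the function names are hypothetical rather than part of the dataset's tooling.

```python
# Sketch: recompute Length (n), Error (e), Alignment Score and Agreement for one
# request from two bracket-marked versions of it (hypothetical helper names).
# Assumption: "[" and "]" appear only as selection markers, never in the request text.


def selection_mask(marked: str) -> list[bool]:
    """One flag per character of the unmarked text: True if it lies inside [ ]."""
    mask = []
    inside = False
    for ch in marked:
        if ch == "[":
            inside = True
        elif ch == "]":
            inside = False
        else:
            mask.append(inside)
    return mask


def align_pair(text: str, a_text: str, b_text: str) -> tuple[int, int, float, bool]:
    """Return (n, e, align, agreement) for annotators A and B on one request."""
    mask_a = selection_mask(a_text)
    mask_b = selection_mask(b_text)
    n = len(text)
    if n == 0 or len(mask_a) != n or len(mask_b) != n:
        raise ValueError("marked texts must match a non-empty original request")
    # e: character positions where the binary determinations of A and B differ
    e = sum(a != b for a, b in zip(mask_a, mask_b))
    align = (n - e) / n
    # Assumed reading of Agreement: both annotators selected something, or neither did
    agreement = any(mask_a) == any(mask_b)
    return n, e, align, agreement


if __name__ == "__main__":
    request = "Hello, cancel my order"
    print(align_pair(request, "[Hello, ]cancel my order", "[Hello,] cancel my order"))
```

In this example the two annotators differ only on the space after "Hello,", so n = 22, e = 1 and align = 21/22, while Agreement is true because both made a selection.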
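The threshold merge behind all_data_by_threshold.csv can be sketched as a per-character vote: a character is selected when at least i annotators in the group marked it. This is a minimal sketch under stated assumptions: brackets only mark selections, the selected spans are concatenated without separators when forming Selected, and thresholds run from 1 to the group size of 4. The function names are hypothetical.

```python
# Sketch: rebuild MergedSelections, Unselected, Selected and Removed for one request
# from a group's bracket-marked texts (hypothetical helper names).
# Assumptions: "[" / "]" only mark selections; selected spans are concatenated
# without separators; thresholds run from 1 to the group size (4).


def selection_mask(marked: str) -> list[bool]:
    """One flag per character of the unmarked text: True if it lies inside [ ]."""
    mask = []
    inside = False
    for ch in marked:
        if ch == "[":
            inside = True
        elif ch == "]":
            inside = False
        else:
            mask.append(inside)
    return mask


def merge_by_threshold(text: str, marked_texts: list[str], threshold: int) -> tuple[str, str, str, float]:
    """Return (MergedSelections, Unselected, Selected, Removed) for threshold i."""
    if not text or not marked_texts:
        raise ValueError("need a non-empty request and at least one annotator")
    masks = [selection_mask(m) for m in marked_texts]
    if any(len(mask) != len(text) for mask in masks):
        raise ValueError("marked texts must match the original request")
    # Keep a character if at least `threshold` annotators marked it as unnecessary
    keep = [sum(column) >= threshold for column in zip(*masks)]
    merged = []
    previous = False
    for ch, selected in zip(text, keep):
        if selected and not previous:
            merged.append("[")  # a selected run starts
        elif previous and not selected:
            merged.append("]")  # a selected run ends
        merged.append(ch)
        previous = selected
    if previous:
        merged.append("]")
    selected_text = "".join(ch for ch, s in zip(text, keep) if s)
    unselected_text = "".join(ch for ch, s in zip(text, keep) if not s)
    removed = len(selected_text) / len(text)  # length(Selected) / n
    return "".join(merged), unselected_text, selected_text, removed


if __name__ == "__main__":
    request = "Hi, reset my password"
    group = ["[Hi, ]reset my password", "[Hi,] reset my password",
             "[Hi, ]reset my password", "Hi, reset my password"]
    for i in range(1, 5):  # one output row per merging threshold
        print(i, merge_by_threshold(request, group, i))
```

In the usage example three annotators marked "Hi," and two of them also marked the following space, so thresholds 1 and 2 select "Hi, ", threshold 3 selects "Hi,", and threshold 4 selects nothing.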
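Data usage statement:
1. The data comes from Internet data collection or is provided by service providers; this platform lets users view and browse datasets.
2. This platform only displays basic information about datasets, covering file types including but not limited to images, text, video, and audio.
3. Basic dataset information comes from the original data address or from the data provider. If the dataset description differs, the original data address or the service provider's original address takes precedence.
4. The copyright of all datasets on this site belongs to the original data publisher or data provider.
5. If you republish data from this site, please retain the original data address and the relevant copyright notice.
6. If any data on this site is displayed in a way that infringes rights, please contact the site promptly and we will take the data offline.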