
DeepTriage

585.1M


Computer Science Classification


README.md

Context

From the DeepTriage abstract:

For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. A bug title (summary) and a detailed description is present in most of the bug tracking systems. Automatic bug triaging algorithm can be formulated as a classification problem, which takes the bug title and description as the input, mapping it to one of the available developers (class labels). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack trace making the input data highly noisy. In the past decade, there has been a considerable amount of research in representing a bug report using tf-idf based bag-of-words feature (BOW) model. However, BOW model do not consider the syntactical and sequential word information available in the descriptive sentences. In this research, we propose a novel bug report representation algorithm using an attention based deep bidirectional recurrent neural network (DBRNN-A) model that learns a syntactic and semantic feature from long word sequences in an unsupervised manner. Instead of BOW features, the DBRNN-A based robust bug representation is then used for training the classification model. Further, using an attention mechanism enables the model to learn the context representation over a long word sequence, as in a bug report. To provide a large amount of data to learn the feature learning model, the unfixed bug reports (constitute about 70% bugs in an open source bug tracking system) are leveraged upon as an important contribution of this research, which were completely ignored in the previous studies. Another major contribution is to make this research reproducible by making the source code available and creating a public benchmark dataset of bug reports from three open source bug tracking system: Google Chromium, Mozilla Core, and Mozilla Firefox. For our experiments, we use 383,104 bug reports from Google Chromium, 314,388 bug reports from Mozilla Core, and 162,307 bug reports from Mozilla Firefox. Experimentally we compare our approach with BOW model and softmax classifier, support vector machine, naive Bayes, and cosine distance and observe that DBRNN-A provides a higher rank-10 average accuracy.

Content

This dataset contains the bug data for Google Chromium with four different training sets and one test set.

- **classifier_data_0.csv** is a version of the training data with no minimum number of occurrences for any class (most unbalanced).
- **classifier_data_5.csv** contains a version of the training data where every class occurs at least 5 times.
- **classifier_data_10.csv** contains a version of the training data where every class occurs at least 10 times.
- **classifier_data_20.csv** contains a version of the training data where every class occurs at least 20 times (most balanced).
- **deep_data.csv** contains the test data.

Note: in this data, the classes are the owners.

Acknowledgements

DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging. Senthil Mani, Anush Sankaran, Rahul Aralikatte, IBM Research, India.

The dataset, code and paper can be found at this webpage: http://bugtriage.mybluemix.net/
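
The training-set variants described under Content differ only in the minimum number of times each class (owner) must occur, and the abstract reports results as rank-10 accuracy. The following is a minimal sketch of both ideas on toy data, not the authors' code. The `owner` and `text` field names, the helper names `filter_min_class_count` and `rank_k_accuracy`, and the sample rows are all assumptions made for illustration; the actual CSV column names and the paper's exact evaluation protocol may differ.

```python
from collections import Counter


def filter_min_class_count(rows, min_count, label_key="owner"):
    """Keep only rows whose label occurs at least `min_count` times.

    `label_key` is a hypothetical field name; check the real CSV header.
    """
    counts = Counter(row[label_key] for row in rows)
    return [row for row in rows if counts[row[label_key]] >= min_count]


def rank_k_accuracy(ranked_predictions, true_labels, k=10):
    """Fraction of examples whose true label is among the top-k predictions.

    `ranked_predictions[i]` is a list of labels for example i,
    ordered from most to least likely.
    """
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy rows; field names and values are invented for this example.
rows = [
    {"owner": "alice", "text": "crash on startup"},
    {"owner": "alice", "text": "tab freezes when opening settings"},
    {"owner": "bob", "text": "font rendering glitch"},
]

# With a threshold of 2, only the two reports owned by "alice" remain.
filtered = filter_min_class_count(rows, min_count=2)

# Toy ranked predictions for two bug reports.
predictions = [["bob", "alice"], ["alice", "bob"]]
truth = ["alice", "bob"]
top1 = rank_k_accuracy(predictions, truth, k=1)
top2 = rank_k_accuracy(predictions, truth, k=2)
```

A higher threshold removes rarely seen owners, which is why the README calls the threshold-20 file the most balanced variant and the threshold-0 file the most unbalanced.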


