DeepTriage

Dataset size: 585.1M
Computer Science Classification

    README.md

Context

From the DeepTriage abstract: For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. A bug title (summary) and a detailed description is present in most of the bug tracking systems. Automatic bug triaging algorithm can be formulated as a classification problem, which takes the bug title and description as the input, mapping it to one of the available developers (class labels). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack trace making the input data highly noisy. In the past decade, there has been a considerable amount of research in representing a bug report using tf-idf based bag-of-words feature (BOW) model. However, BOW model do not consider the syntactical and sequential word information available in the descriptive sentences. In this research, we propose a novel bug report representation algorithm using an attention based deep bidirectional recurrent neural network (DBRNN-A) model that learns a syntactic and semantic feature from long word sequences in an unsupervised manner. Instead of BOW features, the DBRNN-A based robust bug representation is then used for training the classification model. Further, using an attention mechanism enables the model to learn the context representation over a long word sequence, as in a bug report. To provide a large amount of data to learn the feature learning model, the unfixed bug reports (constitute about 70% bugs in an open source bug tracking system) are leveraged upon as an important contribution of this research, which were completely ignored in the previous studies. Another major contribution is to make this research reproducible by making the source code available and creating a public benchmark dataset of bug reports from three open source bug tracking system: Google Chromium, Mozilla Core, and Mozilla Firefox. For our experiments, we use 383,104 bug reports from Google Chromium, 314,388 bug reports from Mozilla Core, and 162,307 bug reports from Mozilla Firefox. Experimentally we compare our approach with BOW model and softmax classifier, support vector machine, naive Bayes, and cosine distance and observe that DBRNN-A provides a higher rank-10 average accuracy.

Content

This dataset contains the bug data for Google Chromium with four different training sets and one test set.

- **classifier_data_0.csv** is a version of training data with no minimum number of occurrences for any class (most unbalanced).
- **classifier_data_5.csv** contains a version of the training data where every class occurs at least 5 times.
- **classifier_data_10.csv** contains a version of the training data where every class occurs at least 10 times.
- **classifier_data_20.csv** contains a version of the training data where every class occurs at least 20 times (most balanced).
- **deep_data.csv** contains the test data.

*In this data, the classes are the owners.

Acknowledgements

DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging. Senthil Mani, Anush Sankaran, Rahul Aralikatte, IBM Research, India. The dataset, code and paper can be found at this webpage: http://bugtriage.mybluemix.net/
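The training-set variants listed under Content differ only in the minimum number of reports per owner, and the abstract reports results as rank-10 accuracy. The sketch below illustrates both ideas on toy data. It is not the authors' code: the `owner` and `title` keys, the helper names, and the sample rows are hypothetical, and the real CSV column names should be checked before adapting it.

```python
from collections import Counter


def filter_min_occurrences(rows, min_count, label_key="owner"):
    """Keep only rows whose label occurs at least `min_count` times.

    `label_key` is an assumed column name, not taken from the CSV files.
    """
    counts = Counter(row[label_key] for row in rows)
    return [row for row in rows if counts[row[label_key]] >= min_count]


def rank_k_accuracy(ranked_predictions, true_labels, k=10):
    """Fraction of reports whose true owner is among the top-k ranked owners."""
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy bug reports (made up for illustration only).
rows = [
    {"owner": "alice", "title": "Crash on startup"},
    {"owner": "alice", "title": "Tab freezes"},
    {"owner": "bob", "title": "Typo in settings"},
]

# Owners with fewer than 2 reports are dropped, so only alice's rows remain.
kept = filter_min_occurrences(rows, min_count=2)

# Each inner list is a model's owners ordered from most to least likely.
acc = rank_k_accuracy(
    [["alice", "bob"], ["bob", "alice"]],
    ["alice", "alice"],
    k=1,
)
```

A threshold of 0 keeps every row, which corresponds to the most unbalanced variant. Raising the threshold removes rarely seen owners. The paper's figure is an average of rank-10 accuracy, whereas `rank_k_accuracy` here scores a single set of predictions.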