RCV1 Subset Dataset: A Corpus Used in Author Identification Experiments

Size: 7.8M
Tags: Computer, Classification

    README.md

    Dataset creator and donor: Zhi Liu, e-mail: liuzhi8673 '@' gmail.com, institution: National Engineering Research Center for E-Learning, Wuhan, Hubei, China



    Data Set Information:

    The dataset is a subset of RCV1. This corpus has already been used in author identification experiments. The top 50 authors (by total size of their articles) were selected from texts labeled with at least one subtopic of the class CCAT (corporate/industrial). Restricting the corpus this way is an attempt to minimize the influence of topic when distinguishing among the texts. The training corpus consists of 2,500 texts (50 per author), and the test corpus contains another 2,500 texts (50 per author) that do not overlap with the training texts.
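To make the split concrete, here is a minimal Python sketch that reads one split into parallel lists of texts and author labels, then checks the "50 authors, 50 texts each" structure described above. It assumes, hypothetically, that each split is stored as one directory per author containing plain-text files. This page does not specify the on-disk layout, so the directory names used in the usage note are assumptions, and `load_split` and `check_split` are illustrative helper names.

```python
from collections import Counter
from pathlib import Path


def load_split(split_dir):
    """Read every <author>/*.txt file under split_dir.

    Assumption (hypothetical layout): one sub-directory per author, named
    after that author, holding that author's texts as .txt files.
    Returns parallel lists (texts, labels), sorted for reproducibility.
    """
    texts, labels = [], []
    author_dirs = sorted(p for p in Path(split_dir).iterdir() if p.is_dir())
    for author_dir in author_dirs:
        for doc in sorted(author_dir.glob("*.txt")):
            texts.append(doc.read_text(encoding="utf-8", errors="replace"))
            labels.append(author_dir.name)
    return texts, labels


def check_split(labels, n_authors=50, per_author=50):
    """Return True if labels cover n_authors authors with per_author texts each."""
    counts = Counter(labels)
    return len(counts) == n_authors and all(c == per_author for c in counts.values())
```

Under that assumed layout, `train_texts, train_labels = load_split("C50/C50train")` followed by `check_split(train_labels)` should return `True` for each split. Checking that `set(train_texts) & set(test_texts)` is empty is a quick way to confirm that no text appears in both splits.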


    Attribute Information:

    The attributes of the dataset are character n-grams (n = 1 to 5).
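In practice, "character n-grams (n = 1 to 5)" means that every overlapping substring of length 1 through 5 is a candidate feature. The minimal sketch below simply counts them for one text. It is an illustration, not the feature set used in the papers listed below (the first of them is specifically about selecting among such n-grams), and `char_ngrams` is a hypothetical helper name.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=5):
    """Count every overlapping character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # A text of length L has L - n + 1 n-grams of length n (none if L < n).
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts
```

For example, `char_ngrams("abab", 1, 2)` yields the counts a: 2, b: 2, ab: 2, ba: 1. For a whole corpus, scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(1, 5))` builds similar features as a sparse matrix. Note that it lowercases text by default (`lowercase=True`), which may discard stylistic signal that is relevant for authorship.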


    Relevant Papers:

    J. Houvardas and E. Stamatatos, "N-gram Feature Selection for Authorship Identification," in Proc. of the 12th Int. Conf. on Artificial Intelligence: Methodology, Systems, Applications, vol. 4183, pp. 77-86, Varna, Bulgaria, September 12-15, 2006.
    E. Stamatatos, "Author Identification Using Imbalanced and Limited Training Texts," in Proc. of the 4th International Workshop on Text-based Information Retrieval, Regensburg, Germany, September 3-7, 2007.



    Citation Request:

    Please credit the donor, Zhi Liu, of the National Engineering Research Center for E-Learning Technology, China.
