Molecular Biology (Promoter Gene Sequences) Data Set, for evaluating a hybrid learning algorithm (KBANN)

Life Classification

README.md

Data Set Information:

This dataset has been developed to help evaluate a "hybrid" learning algorithm ("KBANN") that uses examples to inductively refine preexisting knowledge. Using a "leave-one-out" methodology, the following errors were produced by various ML algorithms. (See Towell, Shavlik, & Noordewier, 1990, for details.)

System -- Errors -- Comments
----------------------------------------------------------------
KBANN -- 4/106 -- a hybrid ML system
BP -- 8/106 -- standard backpropagation with one hidden layer
O'Neill -- 12/106 -- ad hoc technique from the biological literature
Near-Neigh -- 13/106 -- a nearest-neighbor algorithm (k=3)
ID3 -- 19/106 -- Quinlan's decision-tree builder
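To make the "leave-one-out" methodology concrete, here is a minimal Python sketch that holds out each example in turn, classifies it with a 3-nearest-neighbor vote, and counts the mistakes. It assumes Hamming distance between sequences, which the source does not specify, and the `TOY` sequences are hypothetical illustrations rather than records from the dataset. The `REPORTED` counts are copied from the table above; `error_rate` simply divides them by the 106 examples.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority label among the k training sequences closest to query."""
    nearest = sorted(train, key=lambda ex: hamming(ex[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_errors(examples, k=3):
    """Hold out each example in turn, predict it from the rest, count errors."""
    errors = 0
    for i, (label, seq) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        if knn_predict(rest, seq, k) != label:
            errors += 1
    return errors


# Hypothetical toy data (NOT taken from the dataset), 6 nucleotides each.
TOY = [
    ("+", "tataat"), ("+", "tataaa"), ("+", "tatatt"),
    ("-", "gcgcgc"), ("-", "gcgcgg"), ("-", "gcccgc"),
]

# Error counts copied from the table above (errors out of 106 examples).
REPORTED = {"KBANN": 4, "BP": 8, "O'Neill": 12, "Near-Neigh": 13, "ID3": 19}


def error_rate(errors, total=106):
    """Fraction of held-out examples that were misclassified."""
    return errors / total


if __name__ == "__main__":
    print("toy leave-one-out errors:", leave_one_out_errors(TOY))
    for system, errors in REPORTED.items():
        print(f"{system}: {error_rate(errors):.1%}")
```

With only 106 examples, leave-one-out uses every example for testing exactly once while still training on the other 105, which is why it suits a dataset of this size.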

Type of domain: non-numeric, nominal (one of A, G, T, C)

Note: DNA nucleotides can be grouped into a hierarchy, as shown below:

             X (any)
           /   \
(purine) R       Y (pyrimidine)
        / \     / \
       A   G   T   C

Here is that hierarchy in a text-friendly format:

X (any)
. R (purine)
. . A
. . G
. Y (pyrimidine)
. . T
. . C
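The hierarchy above can be written down as a small parent map. The following Python sketch is one possible encoding; the function names (`ancestors`, `matches`, `lowest_common_class`) are illustrative and do not come from the dataset or from KBANN.

```python
# Parent of each symbol in the hierarchy; X (any) is the root.
PARENT = {
    "A": "R", "G": "R",   # purines
    "T": "Y", "C": "Y",   # pyrimidines
    "R": "X", "Y": "X",
    "X": None,
}


def ancestors(symbol):
    """Return the chain from symbol up to the root X, inclusive."""
    symbol = symbol.upper()
    if symbol not in PARENT:
        raise ValueError(f"unknown symbol: {symbol!r}")
    chain = []
    while symbol is not None:
        chain.append(symbol)
        symbol = PARENT[symbol]
    return chain


def matches(pattern_symbol, nucleotide):
    """True if nucleotide falls under pattern_symbol in the hierarchy."""
    return pattern_symbol.upper() in ancestors(nucleotide)


def lowest_common_class(a, b):
    """Most specific symbol that covers both a and b."""
    covering_b = set(ancestors(b))
    for symbol in ancestors(a):
        if symbol in covering_b:
            return symbol
    raise AssertionError("unreachable: X covers every symbol")
```

For example, `matches("R", "g")` is true because G is a purine, while `lowest_common_class("a", "t")` climbs all the way to `X` because A and T share no class below the root.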

Attribute Information:

1. One of {+/-}, indicating the class ("+" = promoter).
2. The instance name (non-promoters named by position in the 1500-long nucleotide sequence provided by T. Record).
3-59. The remaining 57 fields are the sequence, starting at position -50 (p-50) and ending at position +7 (p7). Each of these fields is filled by one of {a, g, t, c}.
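The attribute layout above can be read with a short parser. This Python sketch assumes a comma-separated line and accepts the sequence either as 57 separate fields or as a single 57-character string, since the on-disk layout is not stated here. It also assumes there is no position 0: the range -50 to +7 yields exactly 57 positions only if zero is skipped. The `Instance` class and `parse_line` function are hypothetical names.

```python
from dataclasses import dataclass

SEQ_LEN = 57
# Positions -50..-1 followed by +1..+7 (no position 0, inferred from the
# field count): 50 + 7 = 57 labels, one per sequence field.
POSITIONS = list(range(-50, 0)) + list(range(1, 8))


@dataclass
class Instance:
    is_promoter: bool   # field 1: "+" means promoter
    name: str           # field 2: instance name
    sequence: str       # fields 3-59: 57 nucleotides, lowercase

    def at(self, position):
        """Nucleotide at a biological position such as -35 or +1."""
        return self.sequence[POSITIONS.index(position)]


def parse_line(line):
    """Parse one record: class, name, then the 57-nucleotide sequence."""
    parts = [p.strip() for p in line.strip().split(",")]
    if len(parts) < 3:
        raise ValueError("expected class, name, and sequence fields")
    label, name, *rest = parts
    if label not in ("+", "-"):
        raise ValueError(f"class must be '+' or '-', got {label!r}")
    sequence = "".join(rest).lower()
    if len(sequence) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nucleotides, got {len(sequence)}")
    if not set(sequence) <= set("agtc"):
        raise ValueError("sequence may contain only a, g, t, c")
    return Instance(label == "+", name, sequence)


if __name__ == "__main__":
    # Hypothetical record, not taken from the dataset.
    demo = parse_line("+,EXAMPLE," + "a" * 50 + "gtcagtc")
    print(demo.name, demo.is_promoter, demo.at(-50), demo.at(1), demo.at(7))
```

Keeping `POSITIONS` as an explicit list means a lookup such as `inst.at(-10)` uses the same numbering as the README, so no index arithmetic is repeated at each call site.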

Relevant Papers:

Harley, C. and Reynolds, R. 1987. "Analysis of E. Coli Promoter Sequences." Nucleic Acids Research, 15:2343-2361.

Towell, G., Shavlik, J. and Noordewier, M. 1990. "Refinement of Approximate Domain Theories by Knowledge-based Artificial Neural Networks." In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90).

Papers That Cite This Data Set:

Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering, Nanyang Technological University. 2005.

Wei-Chun Kao and Kai-Min Chung and Lucas Assun and Chih-Jen Lin. Decomposition Methods for Linear Support Vector Machines. Neural Computation, 16. 2004.

Aik Choon Tan and David Gilbert. An Empirical Comparison of Supervised Machine Learning Techniques in Bioinformatics. APBC. 2003.

Giorgio Valentini. Ensemble methods based on bias-variance analysis. Theses Series DISI-TH-2003. Dipartimento di Informatica e Scienze dell'Informazione. 2003.

Zoubin Ghahramani and Hyun-Chul Kim. Bayesian Classifier Combination. Gatsby Computational Neuroscience Unit, University College London. 2003.

Jinyan Li and Limsoon Wong. Using Rules to Analyse Bio-medical Data: A Comparison between C4.5 and PCL. WAIM. 2003.

Michael G. Madden. Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm. CoRR, cs.LG/0211003. 2002.

Mukund Deshpande and George Karypis. Evaluation of Techniques for Classifying Biological Sequences. PAKDD. 2002.

Takashi Matsuda and Hiroshi Motoda and Tetsuya Yoshida and Takashi Washio. Mining Patterns from Structured Data by Beam-Wise Graph-based Induction. Discovery Science. 2002.

Marina Meila and Michael I. Jordan. Learning with Mixtures of Trees. Journal of Machine Learning Research, 1. 2000.

Jie Cheng and Russell Greiner. Comparing Bayesian Network Classifiers. UAI. 1999.

Ismail Taha and Joydeep Ghosh. Symbolic Interpretation of Artificial Neural Networks. IEEE Trans. Knowl. Data Eng, 11. 1999.

Cesar Guerra-Salcedo and L. Darrell Whitley. Genetic Approach to Feature Selection for Ensemble Creation. GECCO. 1999.

Mark A. Hall and Lloyd A. Smith. Feature Selection for Machine Learning: Comparing a Correlation-based Filter Approach to the Wrapper. FLAIRS Conference. 1999.

Mark A. Hall. Correlation-based Feature Selection for Machine Learning. Doctor of Philosophy thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand. 1999.

Creators:

1. promoter instances: C. Harley (CHARLEY '@' McMaster.CA) and R. Reynolds

2. non-promoter instances and domain theory: M. Noordewier
-- (non-promoters derived from work of lab of Prof. Tom Record, University of Wisconsin Biochemistry Department)

Donor:

M. Noordewier and J. Shavlik, {noordewi,shavlik}@cs.wisc.edu
