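The first five variables are all blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each row in the dataset is the record of a single male individual.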
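Important note: The 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing the presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by the BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing the presence or absence of a liver disorder. Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine Learning: Applications in Expert Systems and Information Retrieval) and by others (e.g. Turney, 1995, Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm), who used the 6th field (drinks), after dichotomising it, as the dependent variable for classification. Because of the widespread misinterpretation in the past, researchers should take care to state their method clearly.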
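The recommended setup, with the dichotomised 6th field as the target and the 7th field kept out of the features, could be sketched as follows. This is a minimal illustration rather than the method of any cited paper: the short column names, the made-up sample rows, and the cut-off of 2.0 drinks are all assumptions, and the threshold in particular should be chosen and reported explicitly.

```python
import csv
import io

# Assumed short column names; the description above only gives field order.
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]

# Made-up rows in the seven-field layout, for illustration only (not real records).
SAMPLE = """\
80,60,20,20,30,0.0,1
81,61,21,21,31,0.5,2
82,62,22,22,32,4.0,1
83,63,23,23,33,6.0,2
"""

def load_rows(text):
    """Parse headerless comma-separated text into a list of dicts of floats."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, map(float, row))) for row in reader if row]

def dichotomise_drinks(rows, threshold):
    """Build features from the five blood tests and a 0/1 label from drinks.

    The selector field is deliberately left out of both features and label.
    """
    features = [[r[name] for name in COLUMNS[:5]] for r in rows]
    labels = [1 if r["drinks"] > threshold else 0 for r in rows]
    return features, labels

rows = load_rows(SAMPLE)
X, y = dichotomise_drinks(rows, threshold=2.0)  # threshold is a hypothetical choice
print(y)  # prints [0, 0, 1, 1]
```

Keeping the threshold as an explicit argument makes it easy to state the chosen cut-off when reporting results.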
Attribute Information:
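1. Mean corpuscular volume
2. Alkaline phosphatase
3. Alanine aminotransferase
4. Aspartate aminotransferase
5. Gamma-glutamyl transpeptidase
6. Drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. Selector: field created by the BUPA researchers to split the data into train/test sets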
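As a rough sketch of how the 7th field might be used for its intended purpose, the snippet below groups records by selector value and drops the selector from each record. The field names, the made-up sample rows, and the selector values 1 and 2 are assumptions for illustration; the description above does not say which value marks the training set.

```python
import csv
import io
from collections import defaultdict

# Assumed field names, in the order of the attribute list above.
FIELDS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]

# Made-up rows for illustration only (not real records).
SAMPLE = """\
80,60,20,20,30,0.0,1
81,61,21,21,31,0.5,2
82,62,22,22,32,4.0,1
83,63,23,23,33,6.0,2
"""

def split_by_selector(text):
    """Group headerless CSV records by their selector value.

    The selector is removed from each record so it cannot leak into
    the features or be mistaken for a class label.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        record = dict(zip(FIELDS, row))
        selector = int(record.pop("selector"))
        groups[selector].append({k: float(v) for k, v in record.items()})
    return dict(groups)

groups = split_by_selector(SAMPLE)
# Assumption: treating value 1 as "train" and 2 as "test" is arbitrary here.
train, test = groups[1], groups[2]
print(len(train), len(test))  # prints 2 2
```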
Relevant Papers:
[1] McDermott & Forsyth 2016, Diagnosing a disorder in a classification benchmark, Pattern Recognition Letters, Volume 73.
Papers That Cite This Data Set:
Zhi-Hua Zhou and Yuan Jiang. NeC4.5: Neural Ensemble based C4.5. IEEE Trans. Knowl. Data Eng, 16. 2004.
Yuan Jiang and Zhi-Hua Zhou. Editing Training Data for kNN Classifiers with Neural Network Ensemble. ISNN (1). 2004.
Glenn Fung and M. Murat Dundar and Jinbo Bi and Bharat Rao. A fast iterative algorithm for fisher discriminant using heterogeneous kernels. ICML. 2004.
Jochen Garcke and Michael Griebel. Classification with sparse grids using simplicial basis functions. Intell. Data Anal, 6. 2002.
Michail Vlachos and Carlotta Domeniconi and Dimitrios Gunopulos and George Kollios and Nick Koudas. Non-linear dimensionality reduction techniques for classification and visualization. KDD. 2002.
Xavier Llorà and David E. Goldberg and Ivan Traus and Ester Bernadó i Mansilla. Accuracy, Parsimony, and Generality in Evolutionary Learning Systems via Multiobjective Selection. IWLCS. 2002.
Jochen Garcke and Michael Griebel. Data mining with sparse grids using simplicial basis functions. KDD. 2001.
Jochen Garcke and Michael Griebel and Michael Thess. Data Mining with Sparse Grids. Computing, 67. 2001.
Petri Kontkanen and Jussi Lahtinen and Petri Myllymäki and Henry Tirri. Unsupervised Bayesian visualization of high-dimensional data. KDD. 2000.
Carlotta Domeniconi and Jing Peng and Dimitrios Gunopulos. An Adaptive Metric Machine for Pattern Classification. NIPS. 2000.
Iñaki Inza and Pedro Larrañaga and Basilio Sierra and Ramon Etxeberria and Jose Antonio Lozano and José Manuel Peña. Representing the behaviour of supervised classification learning algorithms by Bayesian networks. Pattern Recognition Letters, 20. 1999.
Guido Lindner and Rudi Studer. AST: Support for Algorithm Selection with a CBR Approach. PKDD. 1999.
Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.
Peter D. Turney. Cost-Sensitive Classification: Empirical evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. CoRR, csAI/9503102. 1995.
Gabor Melli. A Lazy Model-based Approach to On-Line Classification. University of British Columbia. 1989.
Aynur Akkuş and H. Altay Guvenir. Weighting Features in k Nearest Neighbor Classification on Feature Projections. Department of Computer Engineering and Information Science Bilkent University.
Greg Ridgeway. The State of Boosting. Department of Statistics University of Washington.
Creators:
BUPA Medical Research Ltd.
Donor:
Richard S. Forsyth
8 Grosvenor Avenue
Mapperley Park
Nottingham NG3 5DX
0602-621676