README.md
Francesca Grisoni, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milano Chemometrics & QSAR Research Group, francesca.grisoni '@' unimib.it
Viviana Consonni, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milano Chemometrics & QSAR Research Group, viviana.consonni '@' unimib.it
Marco Vighi, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences
Sara Villa, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences
Roberto Todeschini, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milano Chemometrics & QSAR Research Group, roberto.todeschini '@' unimib.it
Data Set Information:
This dataset contains manually curated experimental bioconcentration factors (BCF, continuous values) for 1058 molecules. Each row describes one molecule, identified by a CAS number, a name (if available), and a SMILES string. The octanol-water partition coefficient (KOW, experimental or predicted) is also reported. The dataset additionally provides Extended Connectivity Fingerprints (binary vectors of 1024 bits), to be used as independent variables for predicting the BCF. Additional information can be found in the referenced papers.
If you have any questions, please do not hesitate to contact us!
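Because the fingerprints are binary vectors, one simple way to use them as predictors is a similarity-based k-nearest-neighbour model. The sketch below is not the modelling approach of the referenced papers; it assumes the fingerprints are already loaded as lists of 0/1 integers and uses the Tanimoto coefficient, a common similarity measure for binary fingerprints. The function names, the default k=3, and the toy values in the usage block are illustrative assumptions.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints of equal length."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical.
    return 1.0 if either == 0 else both / either


def knn_predict_logbcf(query: Sequence[int],
                       train_fps: List[Sequence[int]],
                       train_logbcf: List[float],
                       k: int = 3) -> float:
    """Predict logBCF as the mean logBCF of the k most similar training molecules."""
    if not train_fps or len(train_fps) != len(train_logbcf):
        raise ValueError("need a non-empty training set with one logBCF per fingerprint")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank training molecules by similarity to the query, most similar first.
    scored = sorted(
        ((tanimoto(query, fp), y) for fp, y in zip(train_fps, train_logbcf)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    top = scored[:k]
    return sum(y for _, y in top) / len(top)


if __name__ == "__main__":
    # Toy 8-bit fingerprints and made-up logBCF values, for illustration only.
    train_fps = [[1, 1, 0, 0, 1, 0, 0, 0],
                 [1, 1, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 1, 0, 1, 1, 0]]
    train_logbcf = [2.0, 1.5, 3.0]
    print(knn_predict_logbcf([1, 1, 0, 0, 1, 0, 0, 0], train_fps, train_logbcf, k=2))  # 1.75
```

With the real data, each fingerprint would have 1024 bits; the choice of k and any similarity threshold would need to be tuned, for example by cross-validation.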
Attribute Information:
The provided zip file contains two files.
(I) The file 'QSAR BCF KOW' contains the following attributes:
1. CAS number (molecule identifier)
2. Molecule Name (if not available, marked as 'n.a.')
3. SMILES string to identify the 2D molecular structure
4. LogKOW: octanol-water partition coefficient (experimental or predicted, as indicated by the column 'KOW Type')
5. KOW Type: indicates whether the logKOW value is experimental or predicted
6. Experimental logBCF (quantitative response): experimental fish bioconcentration factor (logarithm form)
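The six attributes above map naturally onto a typed record. The sketch below assumes the file is exported as delimited text (comma-separated by default) with one header row and the columns in the documented order; the exact file format and header names are not stated here, so the delimiter, the header handling, and the names `BCFRecord` and `parse_bcf_kow` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BCFRecord:
    cas: str               # CAS number (molecule identifier)
    name: Optional[str]    # None when the file marks the name as 'n.a.'
    smiles: str            # 2D structure as a SMILES string
    log_kow: float         # logKOW, experimental or predicted
    kow_type: str          # 'KOW Type' value, kept as written in the file
    log_bcf: float         # experimental logBCF (the response)


def parse_bcf_kow(text: str, delimiter: str = ",") -> List[BCFRecord]:
    """Parse the six-column table, assuming one header row and the documented column order."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to be present)
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        cas, name, smiles, log_kow, kow_type, log_bcf = (cell.strip() for cell in row[:6])
        records.append(
            BCFRecord(
                cas=cas,
                name=None if name == "n.a." else name,
                smiles=smiles,
                log_kow=float(log_kow),
                kow_type=kow_type,
                log_bcf=float(log_bcf),
            )
        )
    return records
```

The `csv` module handles quoted fields, so chemical names containing commas are parsed correctly. If the file turns out to be tab-separated, pass `delimiter="\t"`.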
(II) The file 'ECFP_1024_m0-2_b2_c.txt' contains the following molecular descriptors (to be used to predict the BCF):
- Extended Connectivity Fingerprints (ECFPs): binary descriptors useful to predict the experimental logBCF (computed with Dragon7, default settings --> details specified in the file)
Each row corresponds to one molecule, as identified by the SMILES field. The molecules are in the same order as in the previous file.
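Because the fingerprints are matched to molecules only by row position, it is worth checking the alignment when loading both files. The sketch below assumes each row of 'ECFP_1024_m0-2_b2_c.txt' is tab-separated, with the SMILES first and then 1024 fields holding 0 or 1. The actual layout is described in the file itself and may differ (for example, it may include header lines or store the bits as a single string), so the parser would need to be adapted. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

N_BITS = 1024  # fingerprint length stated in the dataset description


def parse_ecfp_line(line: str, delimiter: str = "\t") -> Tuple[str, List[int]]:
    """Split one row into (SMILES, bits); assumes SMILES first, then one 0/1 value per field."""
    fields = line.rstrip("\n").split(delimiter)
    smiles, bits = fields[0].strip(), [int(v) for v in fields[1:]]
    if len(bits) != N_BITS:
        raise ValueError(f"expected {N_BITS} bits, got {len(bits)}")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("fingerprint values must be 0 or 1")
    return smiles, bits


def align_by_order(smiles_from_kow_file: Sequence[str],
                   ecfp_rows: Sequence[Tuple[str, List[int]]]) -> List[List[int]]:
    """Pair fingerprints with molecules by row position, checking that the SMILES agree."""
    if len(smiles_from_kow_file) != len(ecfp_rows):
        raise ValueError("the two files contain a different number of molecules")
    matrix = []
    for i, (expected, (smiles, bits)) in enumerate(zip(smiles_from_kow_file, ecfp_rows)):
        if expected != smiles:
            raise ValueError(f"row {i}: SMILES mismatch {expected!r} != {smiles!r}")
        matrix.append(bits)
    return matrix
```

The resulting list of bit vectors has one row per molecule, in the same order as the logBCF values, and can be used directly as the matrix of independent variables.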
Relevant Papers:
1. Grisoni, F., Consonni, V., Villa, S., Vighi, M. and Todeschini, R., 2015. QSAR models for bioconcentration: Is the increase in the complexity justified by more accurate predictions?. Chemosphere, 127, pp.171-179. --> Procedure for data curation.
2. Grisoni, F., Consonni, V., Vighi, M., Villa, S. and Todeschini, R., 2016. Expert QSAR system for predicting the bioconcentration factor under the REACH regulation. Environmental research, 148, pp.507-512. --> Benchmark on the performance for this dataset
3. Grisoni, F., Consonni, V., Vighi, M., Villa, S. and Todeschini, R., 2016. Investigating the mechanisms of bioconcentration through QSAR classification trees. Environment international, 88, pp.198-205. --> Relationship between KOW and BCF
Citation Request:
If you publish results based on this dataset or parts of it, please cite the following paper:
@article{grisoni2015,
title={QSAR models for bioconcentration: Is the increase in the complexity justified by more accurate predictions?},
author={Grisoni, Francesca and Consonni, Viviana and Villa, Sara and Vighi, Marco and Todeschini, Roberto},
journal={Chemosphere},
volume={127},
pages={171--179},
year={2015},
publisher={Elsevier}
}
If you use the ECFP values, additionally please cite the following software:
Dragon (Software for Molecular Descriptor Calculation) Version 6.0, 2012
And paper:
@article{rogers2010,
title={Extended-connectivity fingerprints},
author={Rogers, David and Hahn, Mathew},
journal={Journal of chemical information and modeling},
volume={50},
number={5},
pages={742--754},
year={2010},
publisher={ACS Publications}
}
--> Thanks and happy predicting!