# Context
Zeeshan-ul-hassan Usmani’s Genome Phenotype SNPs Raw Data
Genomics is a branch of molecular biology concerned with the structure, function, variation, evolution, and mapping of genomes. Several companies offer genetic testing of human genomes, ranging from sequencing the complete ~3 billion base pairs to genotyping a selected set of phenotype SNPs. I used 23andMe (Illumina HumanOmniExpress-24 chip) to obtain my DNA's phenotype SNPs. I am sharing the entire raw dataset here with the international research community for the following reasons:
1. I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge merely because of privacy concerns. Hence, I am offering my entire raw DNA data to the world for research without worrying about privacy. I call it a copyleft dataset.
2. Most available test datasets for research come from the Western world, and we don't see much from developing countries. I decided to share my data to help bridge this gap, and I hope others will follow.
3. I would be the happiest man on earth if a life could be saved, knowledge gained, an idea explored, or a fact found using my DNA data. Please use it as you see fit.
# Content
- Name: Zeeshan-ul-hassan Usmani
- Age: 38 years
- Country of birth: Pakistan
- Country of ancestors: India (Uttar Pradesh, UP)
- File: GenomeZeeshanUsmani.csv
- Size: 15 MB
- Source: 23andMe Personalized Genome Report
Research in this domain is still progressing, and professionals broadly agree that genomics is still in its infancy. This dataset gives you the chance to explore this novel domain and become one of the few early adopters of genomics.
The dataset is the raw genotype data exported from www.23andme.com. Each SNP's genotype is represented by the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletion), I (base insertion), and '_' or '-' if the SNP at that location is not available. It covers chromosomes 1-22, X, Y, and mitochondrial DNA.
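A minimal sketch of reading the file, assuming (not verified here) that GenomeZeeshanUsmani.csv has a header row with the columns rsid, chromosome, position, and genotype. It classifies each call using the symbols listed above and counts SNPs per chromosome. The function names `classify` and `summarize` and the sample rows are hypothetical, for illustration only.

```python
import csv
import io
from collections import Counter

# Genotype symbols as described in the dataset README.
BASES = set("ACGT")
INDELS = set("DI")
MISSING = set("_-")

# Hypothetical rows for illustration only (not taken from the real file).
SAMPLE = """rsid,chromosome,position,genotype
rsEXAMPLE1,1,1000,AG
rsEXAMPLE2,1,2000,TT
rsEXAMPLE3,X,3000,C
rsEXAMPLE4,MT,4000,--
rsEXAMPLE5,2,5000,DI
"""


def classify(genotype: str) -> str:
    """Classify one call such as 'AG', 'TT', 'C', 'DI' or '--'."""
    g = genotype.strip().upper()
    if not g or set(g) <= MISSING:
        return "no_call"
    if set(g) & INDELS:
        return "indel"
    if not set(g) <= BASES:
        return "unknown"
    if len(g) == 1:
        # A single allele, e.g. on Y or MT, or on X in a male sample.
        return "hemizygous"
    return "homozygous" if len(set(g)) == 1 else "heterozygous"


def summarize(csv_text: str) -> tuple[Counter, Counter]:
    """Count SNPs per chromosome and per genotype class.

    Assumes a header with rsid, chromosome, position and genotype columns;
    a leading '#' on the header names is stripped if present.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip().lstrip("#").strip().lower() for h in next(reader)]
    col = {name: i for i, name in enumerate(header)}
    per_chrom: Counter = Counter()
    classes: Counter = Counter()
    for row in reader:
        if not row:
            continue  # skip blank lines
        per_chrom[row[col["chromosome"]].strip()] += 1
        classes[classify(row[col["genotype"]])] += 1
    return per_chrom, classes


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on the real file, pass its text, e.g. `summarize(open("GenomeZeeshanUsmani.csv", newline="").read())`, after checking that the column names match the assumption above.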
A complete list of the available SNPs and their index in the dataset can be found at:
https://api.23andme.com/res/txt/snps.b4e00fe1db50.data
For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes
For a more detailed description of the dataset contents, see https://api.23andme.com/docs/reference/#genotypes
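The idea of a "dataset index" can be illustrated with a short sketch. It assumes that the 23andMe genomes endpoint returned one packed string with two characters per SNP, so the SNP at index i occupies characters 2i and 2i+1, and that the SNP list is a tab-separated table with `index` and `snp` columns. Both layouts are assumptions to check against the linked documentation; `parse_index`, `genotype_at`, and the sample data are hypothetical.

```python
# Hypothetical index table and packed genome string, for illustration only.
INDEX_TEXT = (
    "index\tsnp\tchromosome\tchromosome_position\n"
    "0\trsEXAMPLE1\t1\t1000\n"
    "1\trsEXAMPLE2\t1\t2000\n"
    "2\trsEXAMPLE3\tX\t3000\n"
)
GENOME = "AGTT__"  # assumed: two characters per SNP, in index order


def parse_index(text: str) -> dict[str, int]:
    """Map SNP id -> index from a tab-separated table.

    Assumed layout: optional '#' comment lines, then a header containing
    'index' and 'snp' columns, then one row per SNP.
    """
    mapping: dict[str, int] = {}
    header = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
            continue
        row = dict(zip(header, fields))
        mapping[row["snp"]] = int(row["index"])
    return mapping


def genotype_at(genome: str, index: int) -> str:
    """Return the two-character call for SNP `index` (2 chars per SNP)."""
    start = 2 * index
    if index < 0 or start + 2 > len(genome):
        raise IndexError(f"SNP index {index} is outside the genome string")
    return genome[start:start + 2]


if __name__ == "__main__":
    idx = parse_index(INDEX_TEXT)
    print(genotype_at(GENOME, idx["rsEXAMPLE2"]))  # prints TT
```

Looking up by index rather than scanning keeps each lookup constant-time once the index table is loaded.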
# Acknowledgements
Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Genome Phenotype SNPS Raw Data File by 23andMe, Kaggle Dataset Repository, Jan 25, 2017.”
# Useful Links
You may use the following human genome database sites for help:
- GenBank - https://www.ncbi.nlm.nih.gov/genbank/
- The Human Genome Project - https://www.genome.gov/hgp/
- Genomes OnLine Database (GOLD) - https://gold.jgi.doe.gov
- Complete Genomics - http://www.completegenomics.com/public-data/
# Inspiration
Some ideas worth exploring:
- Is the individual in question more susceptible to cancer?
- Does he tend to gain weight?
- Where is his place of origin?
- Which genes determine certain biological features (cancer susceptibility, fat generation rate, hair color, etc.)?
- How do these phenotype SNPs compare with similar datasets from the Western world?
- What would be the likely cause of death for this person?
- What are the most likely diseases or illnesses this person will face in his lifetime?
- What is unique about this dataset?
- What else can you extract from this dataset regarding personal traits, intelligence level, ancestry, and body makeup?
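For the comparison question, one simple starting point is genotype concordance: the fraction of SNPs called in both samples whose genotypes agree, ignoring allele order. This is a sketch, not a method from the dataset author. It assumes both samples are already loaded as rsid-to-genotype dictionaries and does not correct for strand differences between genotyping platforms. The names `normalize` and `concordance` and the sample genotypes are hypothetical.

```python
from typing import Dict, Optional, Tuple

MISSING = {"-", "_"}


def normalize(call: str) -> Optional[str]:
    """Make a call order-independent ('GA' -> 'AG'); None for no-calls."""
    c = call.strip().upper()
    if not c or set(c) <= MISSING:
        return None
    return "".join(sorted(c))


def concordance(a: Dict[str, str], b: Dict[str, str]) -> Tuple[int, float]:
    """Compare two rsid -> genotype maps on SNPs called in both.

    Returns (number of SNPs compared, fraction with identical genotypes).
    Strand flips between platforms are NOT handled.
    """
    compared = matches = 0
    for rsid in a.keys() & b.keys():
        ga, gb = normalize(a[rsid]), normalize(b[rsid])
        if ga is None or gb is None:
            continue  # skip SNPs that are missing in either sample
        compared += 1
        if ga == gb:
            matches += 1
    return compared, (matches / compared if compared else 0.0)


if __name__ == "__main__":
    # Hypothetical genotypes for illustration only.
    sample_a = {"rsEXAMPLE1": "AG", "rsEXAMPLE2": "TT", "rsEXAMPLE3": "--", "rsEXAMPLE4": "CC"}
    sample_b = {"rsEXAMPLE1": "GA", "rsEXAMPLE2": "CT", "rsEXAMPLE3": "AA", "rsEXAMPLE5": "GG"}
    print(concordance(sample_a, sample_b))  # prints (2, 0.5)
```

Restricting the comparison to SNPs called in both samples keeps no-calls from being counted as disagreements.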
# Sample Reports
Please check out the following reports to see what can be done with this data:
- Ancestry: https://www.23andme.com/published-report/eeb4f9bbd6b5474f/?share_id=f6c5562848e84586
- Weight Report: https://you.23andme.com/published/reports/65c9af9f8223456d/?share_id=0126f129e4f3458b