README.md
Data Set Information:
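The data are Monte Carlo (MC) generated (see below) to simulate the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. A Cherenkov gamma telescope observes high-energy gamma rays by taking advantage of the radiation emitted by charged particles that are produced inside the electromagnetic showers initiated by the gammas and that develop in the atmosphere. This Cherenkov radiation (at visible to UV wavelengths) leaks through the atmosphere and is recorded in the detector, allowing reconstruction of the shower parameters. The available information consists of pulses left by the incoming Cherenkov photons on the photomultiplier tubes, which are arranged in a plane, the camera. Depending on the energy of the primary gamma, a total of a few hundred to some 10000 Cherenkov photons are collected in patterns (called the shower image), which makes it possible to discriminate statistically between images caused by primary gammas (signal) and images of hadronic showers initiated by cosmic rays in the upper atmosphere (background).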
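Typically, the image of a shower after some pre-processing is an elongated cluster. Its long axis is oriented towards the camera center if the shower axis is parallel to the telescope's optical axis, i.e. if the telescope axis is directed towards a point source. A principal component analysis is performed in the camera plane, which yields a correlation axis and defines an ellipse. If the depositions were distributed as a bivariate Gaussian, this would be an equidensity ellipse. The characteristic parameters of this ellipse (often called Hillas parameters) are among the image parameters that can be used for discrimination. The energy depositions are typically asymmetric along the major axis, and this asymmetry can also be used for discrimination. There are, in addition, further discriminating characteristics, such as the extent of the cluster in the image plane or the total sum of depositions.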
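The principal component analysis described above can be illustrated with a minimal sketch. It is not the code used to produce this dataset: the function name `hillas_ellipse`, the input layout (pixel coordinates plus pixel contents used as weights), and the choice to report length and width as the square roots of the eigenvalues of the weighted covariance matrix are all assumptions made for illustration.

```python
import math

def hillas_ellipse(xs, ys, weights):
    """Weighted PCA of a shower image in the camera plane (illustrative sketch).

    xs, ys  : pixel coordinates relative to the camera center
    weights : pixel contents (e.g. photon counts), used as weights
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("image has no content")

    # Weighted centroid of the image.
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total

    # Weighted second central moments (2x2 covariance matrix entries).
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs)) / total
    syy = sum(w * (y - my) ** 2 for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys)) / total

    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below zero

    return {
        # Assumption: axis sizes reported as RMS spread along each principal axis.
        "length": math.sqrt(lam_major),
        "width": math.sqrt(lam_minor),
        # Orientation of the major axis in the camera plane, in radians.
        "angle": 0.5 * math.atan2(2.0 * sxy, sxx - syy),
        # Distance from the camera center (origin) to the ellipse center.
        "dist": math.hypot(mx, my),
    }
```

For example, `hillas_ellipse([1, 2, 3, 4, 5], [4] * 5, [1] * 5)` describes five equally weighted pixels on a horizontal line: the width is zero, the major axis lies along x, and the ellipse center sits 5 units from the camera center.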
The data set was generated by a Monte Carlo program, CORSIKA, described in:
D. Heck et al., CORSIKA, A Monte Carlo code to simulate extensive air showers,
Forschungszentrum Karlsruhe FZKA 6019 (1998).
The program was run with parameters that allow events with energies down to below 50 GeV to be observed.
Attribute Information:
1. fLength: continuous # major axis of ellipse [mm]
2. fWidth: continuous # minor axis of ellipse [mm]
3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]
4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]
5. fConc1: continuous # ratio of highest pixel over fSize [ratio]
6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]
7. fM3Long: continuous # 3rd root of third moment along major axis [mm]
8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]
9. fAlpha: continuous # angle of major axis with vector to origin [deg]
10. fDist: continuous # distance from origin to center of ellipse [mm]
11. class: g,h # gamma (signal), hadron (background)
g = gamma (signal): 12332
h = hadron (background): 6688
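As a rough illustration of how the eleven attributes map onto a record, the sketch below parses comma-separated rows into feature vectors and class labels. It assumes a headerless file with the columns in the order listed above; that layout, the helper names `parse_events` and `class_fractions`, and the two sample rows (synthetic, not taken from the dataset) are assumptions for illustration only.

```python
import csv
import io
from collections import Counter

# Column order follows the attribute list above. That the file has no header
# row and is comma-separated is an assumption of this sketch.
COLUMNS = [
    "fLength", "fWidth", "fSize", "fConc", "fConc1",
    "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class",
]

def parse_events(text):
    """Parse headerless CSV text into (features, labels)."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        label = row[-1].strip()
        if label not in ("g", "h"):
            raise ValueError(f"unknown class label: {label!r}")
        features.append([float(v) for v in row[:-1]])
        labels.append(label)
    return features, labels

def class_fractions(labels):
    """Return the fraction of events per class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Synthetic rows for illustration only (not real dataset values).
SAMPLE = """\
30.0,15.0,2.5,0.40,0.20,10.0,20.0,-5.0,8.0,100.0,g
60.0,25.0,3.0,0.20,0.10,-40.0,-30.0,12.0,45.0,200.0,h
"""

features, labels = parse_events(SAMPLE)
```

With the class counts given above, `class_fractions(["g"] * 12332 + ["h"] * 6688)` puts the gamma class at roughly 65% of the 19020 events.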
For technical reasons, the number of h events is underestimated. In the real data, the h class represents the majority of the events.
Simple classification accuracy is not meaningful for this data, since classifying a background event as signal is worse than classifying a signal event as background. To compare different classifiers, an ROC curve has to be used. The relevant points on this curve are those where the probability of accepting a background event as signal is below one of the following thresholds: 0.01, 0.02, 0.05, 0.1, 0.2, depending on the required quality of the sample of accepted events for different experiments.
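One way to read a classifier off the ROC curve at these operating points is to report the highest signal acceptance (true positive rate) that keeps the background acceptance (false positive rate) at or below each threshold. The sketch below does this in plain Python. The function name `tpr_at_fpr`, the convention that a higher score means "more gamma-like", and the toy scores are assumptions for illustration, not part of the dataset.

```python
def tpr_at_fpr(scores, labels, max_fpr, positive="g"):
    """Highest TPR reachable while the FPR stays at or below max_fpr.

    scores : classifier outputs, higher meaning more signal-like (assumption)
    labels : true class labels, "g" for signal and "h" for background
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Walk down the ranking, lowering the acceptance cut one score at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        cut = ranked[i][0]
        # Events with the same score are accepted together (tie handling).
        while i < len(ranked) and ranked[i][0] == cut:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows from here, so no later cut qualifies
        best = tp / n_pos
    return best

# Background-acceptance thresholds named in the text above.
THRESHOLDS = [0.01, 0.02, 0.05, 0.1, 0.2]

# Toy scores and labels, synthetic and for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = ["g", "g", "h", "g", "h", "h"]
operating_points = {t: tpr_at_fpr(toy_scores, toy_labels, t) for t in THRESHOLDS}
```

Comparing two classifiers then comes down to comparing their `operating_points` at the threshold a given experiment requires, rather than comparing overall accuracy.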
Relevant Papers:
Bock, R.K., Chilingarian, A., Gaug, M., Hakl, F., Hengstebeck, T., Jirina, M., Klaschka, J., Kotrc, E., Savicky, P., Towers, S., Vaicilius, A., Wittek W. (2004).
Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope.
Nucl.Instr.Meth. A, 516, pp. 511-528.
P. Savicky, E. Kotrc.
Experimental Study of Leaf Confidences for Random Forest.
Proceedings of COMPSTAT 2004, In: Computational Statistics. (Ed.: Antoch J.) - Heidelberg, Physica Verlag 2004, pp. 1767-1774.
J. Dvorak, P. Savicky.
Softening Splits in Decision Trees Using Simulated Annealing.
Proceedings of ICANNGA 2007, Warsaw, (Ed.: Beliczynski et al.), Part I, LNCS 4431, pp. 721-729.
Citation Request:
Please refer to the Machine Learning Repository's citation policy
Original Owner:
R. K. Bock
Major Atmospheric Gamma Imaging Cherenkov Telescope project (MAGIC)
http://wwwmagic.mppmu.mpg.de
rkb '@' mail.cern.ch
Donor:
P. Savicky
Institute of Computer Science, AS of CR
Czech Republic
savicky '@' cs.cas.cz