README.md
Data Set Information:
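The source datasets had to be combined programmatically. Many variables are included so that algorithms that select attributes or learn attribute weights can be tested. Clearly unrelated attributes were left out, however: an attribute was selected if it had any plausible connection to crime (N=125), and the crime variables (the potential dependent variables) were added. The variables describe the community, such as the percentage of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percentage of officers assigned to drug units. The crime attributes that can be predicted (N=18) are the 8 crimes that the FBI considers "Index Crimes" (murder, rape, robbery, etc.), a per capita version of each (actually per 100,000 population), and per capita violent crime and per capita nonviolent crime.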
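One limitation is that the LEMAS survey covered police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both the census and crime datasets were omitted. Many communities are missing LEMAS data.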
The per capita crime variables were computed using the population values in the 1995 FBI data, which differ from the 1990 census values.
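As a rough illustration of the "per capita (actually per 100,000 population)" convention, the sketch below scales a raw count by a population figure. The function name `rate_per_100k` is hypothetical, and returning `None` for unusable inputs is an assumption made for this example, not something the dataset documentation specifies.

```python
from typing import Optional


def rate_per_100k(count: Optional[float],
                  population: Optional[float]) -> Optional[float]:
    """Return crimes per 100,000 residents, or None if an input is unusable.

    `population` should be the 1995 FBI population value, not the 1990
    census value, to match how the dataset's per capita columns are described.
    """
    if count is None or population is None or population <= 0:
        return None
    return 100_000 * count / population


# 12 reported crimes in a community of 48,000 residents
print(rate_per_100k(12, 48_000))
```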
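The per capita violent crime variable was computed from population and the sum of the crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states over how rapes were counted. This led to missing rape values, which in turn led to missing per capita violent crime values. Many of these omitted communities are in the Midwestern US (Minnesota, Illinois, and Michigan have many of them).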
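The per capita nonviolent crime variable was computed from the sum of the crime variables considered nonviolent crimes in the United States: burglary, larceny, auto theft, and arson. (There are many other types of crime; only the FBI "Index Crimes" are included.)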
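The two aggregate variables described above can be sketched as a sum over component counts divided by population, where a single missing component makes the whole aggregate missing. This is only an illustration: the component keys (`murders`, `rapes`, and so on) and the `fbi_population` key are assumed names, since the attribute list on this page is cut off before the crime columns.

```python
from typing import Dict, Optional, Sequence

# Assumed column names for the component counts (not confirmed by this page).
VIOLENT = ("murders", "rapes", "robberies", "assaults")
NONVIOLENT = ("burglaries", "larcenies", "autoTheft", "arsons")


def aggregate_rate(row: Dict[str, Optional[float]],
                   parts: Sequence[str],
                   population_key: str = "fbi_population") -> Optional[float]:
    """Sum the component counts and scale to a rate per 100,000 residents.

    Returns None when any component or the population is missing, which
    mirrors how a missing rape count yields a missing violent crime rate.
    """
    counts = [row.get(name) for name in parts]
    population = row.get(population_key)
    if any(c is None for c in counts) or not population:
        return None
    return 100_000 * sum(counts) / population


community = {
    "murders": 2, "rapes": None, "robberies": 40, "assaults": 150,
    "burglaries": 300, "larcenies": 900, "autoTheft": 90, "arsons": 10,
    "fbi_population": 50_000,
}
print(aggregate_rate(community, VIOLENT))     # missing, because rapes is missing
print(aggregate_rate(community, NONVIOLENT))  # unaffected by the missing rape count
```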
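Further preprocessing of the dataset is required. Choose the desired dependent variable from the 18 candidates. Predicting a total crime rate (e.g. violent crime) with its subtotals (e.g. murders) as independent variables would be neither interesting nor appropriate. There are also some identifying variables (community name, county code, community code) that are not predictive and would get in the way of some algorithms. Weka's unsupervised attribute Remove filter can be used to drop unwanted attributes.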
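Outside Weka, the same preprocessing could be sketched in plain Python: keep one goal column as the target and drop the identifying columns together with the other 17 goal columns, so that no subtotal leaks into the predictors. The goal column names and the `?` missing-value marker below are assumptions (the attribute list on this page is truncated before the crime columns), and `select_features` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Columns the attribute list marks as "not predictive".
ID_COLUMNS = {"communityname", "countyCode", "communityCode", "fold"}

# Assumed names of the 18 potential goal columns.
GOAL_COLUMNS = {
    "murders", "murdPerPop", "rapes", "rapesPerPop",
    "robberies", "robbbPerPop", "assaults", "assaultPerPop",
    "burglaries", "burglPerPop", "larcenies", "larcPerPop",
    "autoTheft", "autoTheftPerPop", "arsons", "arsonsPerPop",
    "ViolentCrimesPerPop", "nonViolPerPop",
}

MISSING = {None, "", "?"}  # assumed missing-value markers


def select_features(rows: List[Dict[str, str]],
                    target: str) -> Tuple[List[Dict[str, str]], List[float]]:
    """Split rows into predictor dicts and a numeric target list.

    Identifier columns and every goal column (including the target itself)
    are removed from the predictors. Rows with a missing target are skipped.
    """
    if target not in GOAL_COLUMNS:
        raise ValueError(f"{target!r} is not one of the goal columns")
    drop = ID_COLUMNS | GOAL_COLUMNS
    features, targets = [], []
    for row in rows:
        if row.get(target) in MISSING:
            continue
        features.append({k: v for k, v in row.items() if k not in drop})
        targets.append(float(row[target]))
    return features, targets
```

Rows are expected as dictionaries keyed by column name, for example as produced by `csv.DictReader` when the file has a header row.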
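The FBI notes that using this data to evaluate communities is overly simplistic, because many relevant factors are not included. For example, other things being equal, a community with a large number of visitors will have a higher per capita crime rate (measured against residents) than a community with fewer visitors.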
Attribute Information:
(125 predictive, 4 non-predictive, 18 potential goal)
-- communityname: Community name - not predictive - for information only (string)
-- state: US state (by 2 letter postal abbreviation)(nominal)
-- countyCode: numeric code for county - not predictive, and many missing values (numeric)
-- communityCode: numeric code for community - not predictive and many missing values (numeric)
-- fold: fold number for non-random 10 fold cross validation, potentially useful for debugging, paired tests - not predictive (numeric - integer)
-- population: population for community: (numeric - expected to be integer)
-- householdsize: mean people per household (numeric - decimal)
-- racepctblack: percentage of population that is African American (numeric - decimal)
-- racePctWhite: percentage of population that is Caucasian (numeric - decimal)
-- racePctAsian: percentage of population that is of Asian heritage (numeric - decimal)
-- racePctHisp: percentage of population that is of Hispanic heritage (numeric - decimal)
-- agePct12t21: percentage of population that is 12-21 in age (numeric - decimal)
-- agePct12t29: percentage of population that is 12-29 in age (numeric - decimal)
-- agePct16t24: percentage of population that is 16-24 in age (numeric - decimal)
-- agePct65up: percentage of population that is 65 and over in age (numeric - decimal)
-- numbUrban: number of people living in areas classified as urban (numeric - expected to be integer)
-- pctUrban: percentage of people living in areas classified as urban (numeric - decimal)
-- medIncome: median household income (numeric - may be integer)
-- pctWWage: percentage of households with wage or salary income in 1989 (numeric - decimal)
-- pctWFarmSelf: percentage of households with farm or self employment income in 1989 (numeric - decimal)
-- pctWInvInc: percentage of households with investment / rent income in 1989 (numeric - decimal)
-- pctWSocSec: percentage of households with social security income in 1989 (numeric - decimal)
-- pctWPubAsst: percentage of households with public assistance income in 1989 (numeric - decimal)
-- pctWRetire: percentage of households with retirement income in 1989 (numeric - decimal)
-- medFamInc: median family income (differs from household income for non-family households) (numeric - may be integer)
-- perCapInc: per capita income (numeric - decimal)
-- whitePerCap: per capita income for Caucasians (numeric - decimal)
-- blackPerCap: per capita income for African Americans (numeric - decimal)
-- indianPerCap: per capita income for Native Americans (numeric - decimal)
-- AsianPerCap: per capita income for people with Asian heritage (numeric - decimal)
-- OtherPerCap: per capita income for people with 'other' heritage (numeric - decimal)
-- HispPerCap: per capita income for people with Hispanic heritage (numeric - decimal)
-- NumUnderPov: number of people under the poverty level (numeric - expected to be integer)
-- PctPopUnderPov: percentage of people under the poverty level (numeric - decimal)
-- PctLess9thGrade: percentage of people 25 and over with less than a 9th grade education (numeric - decimal)
-- PctNotHSGrad: percentage of people 25 and over that are not high school graduates (numeric - decimal)
-- PctBSorMore: percentage of people 25 and over with a bachelors degree or higher education (numeric - decimal)
-- PctUnemployed: percentage of people 16 and over, in the labor force, and unemployed (numeric - decimal)
-- PctEmploy: percentage of people 16 and over who are employed (numeric - decimal)
-- PctEmplManu: percentage of people 16 and over who are employed in manufacturing (numeric - decimal)
-- PctEmplProfServ: percentage of people 16 and over who are employed in professional services (numeric - decimal)
-- PctOccupManu: percentage of people 16 and over who are employed in manufacturing (numeric - decimal) #### No longer sure of difference from PctEmplManu - may include unemployed manufacturing workers ####
-- PctOccupMgmtProf: percentage of people 16 and over who are employed in management or professional occupations (numeric - decimal)
-- MalePctDivorce: percentage of males who are divorced (numeric - decimal)
-- MalePctNevMarr: percentage of males who have never married (numeric - decimal)
-- FemalePctDiv: percentage of females who are divorced (numeric - decimal)
-- TotalPctD
-- Creator: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
-- culled from 1990 US Census, 1995 US FBI Uniform Crime Report,
1990 US Law Enforcement Management and Administrative Statistics Survey,
available from ICPSR at U of Michigan.
-- Donor: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
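The `fold` attribute in the list above is described as a fold number for non-random 10-fold cross validation. A minimal sketch of how such a column might be used follows: rows are grouped by their fold value and each group is held out in turn. The helper name `folds_from_column` is hypothetical, and the code assumes every row carries an integer-like `fold` value.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def folds_from_column(rows: List[Row],
                      fold_key: str = "fold"
                      ) -> Iterator[Tuple[int, List[Row], List[Row]]]:
    """Yield (held_out_fold, train_rows, test_rows) for each fold value.

    Fold membership comes from the data itself, so repeated runs and
    paired tests across algorithms see exactly the same splits.
    """
    by_fold: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_fold[int(row[fold_key])].append(row)
    for held_out in sorted(by_fold):
        test = by_fold[held_out]
        train = [r for f, members in by_fold.items()
                 if f != held_out for r in members]
        yield held_out, train, test
```

Because the assignment is fixed rather than random, this is better suited to debugging and paired comparisons than to estimating variance across random splits.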