Wednesday, February 26, 2020
Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques
Slides for Textbook
Chapter 5
Department of Computer Science and Engineering
Zhongmei Zhou
Email: zzm@zju.edu.cn

Chapter 5: Concept Description: Characterization and Comparison
  What is concept description?
  Data generalization and summarization-based characterization
  Analytical characterization: Analysis of attribute relevance
  Mining class comparisons: Discriminating between different classes
  Mining descriptive statistical measures in large databases
  Discussion
  Summary

Descriptive vs. predictive data mining
  From a data analysis point of view, data mining can be classified into two categories: descriptive data mining and predictive data mining.
  Descriptive data mining describes the data set in a concise and summarative manner and presents interesting general properties of the data.
  Predictive data mining analyzes the data in order to construct one or a set of models, and attempts to predict the behavior of new data sets.

What is Concept Description?
  The simplest kind of descriptive data mining is concept description.
  Concept description: generates descriptions for characterization and comparison of the data. It is sometimes called class description.
  Characterization: provides a concise and succinct summarization of the given collection of data.
  Comparison (also known as discrimination): provides descriptions comparing two or more collections of data.

Concept Description vs. OLAP
  Concept description:
    can handle complex data types of the attributes and their aggregations
    a more automated process
  OLAP:
    restricted to a small number of dimension and measure types
    user-controlled process

Data Generalization and Summarization-based Characterization
  Data generalization: a process which abstracts a large set of task-relevant data in a database from a low conceptual level to higher ones.
  Approaches:
    Data cube approach (OLAP approach)
    Attribute-oriented induction approach
  [Figure: conceptual levels 1 to 5]

Characterization: Data Cube Approach
  Data are stored in a data cube.
  Identify expensive computations, e.g., count(), sum(), average(), max().
  Perform the computations and store the results in data cubes.
  Generalization and specialization can be performed on a data cube by roll-up and drill-down.
  An efficient implementation of data generalization.

Data Cube Approach (Cont…)
  Limitations:
    Handles only dimensions of simple nonnumeric data types and measures that are simple aggregated numeric values.
    Lacks intelligent analysis: it cannot tell which dimensions should be used or what level the generalization should reach.

Attribute-Oriented Induction
  Proposed in 1989 (KDD '89 workshop).
  Not confined to categorical data nor to particular measures.
  How is it done?
    Collect the task-relevant data (the initial relation) using a relational database query.
    Perform generalization by attribute removal or attribute generalization.
    Apply aggregation by merging identical generalized tuples and accumulating their respective counts.
    Present the results interactively to users.

Example
  DMQL: Describe general characteristics of graduate students in the Big-University database

    use Big_University_DB
    mine characteristics as "Science_Students"
    in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
    from student
    where status in "graduate"

  Corresponding SQL statement:

    Select name, gender, major, birth_place, birth_date, residence, phone#, gpa
    from student
    where status in {"Msc", "MBA", "PhD"}

Basic Principles of Attribute-Oriented Induction
  Data focusing: select the task-relevant data, including dimensions; the result is the initial relation.
  Attribute removal: remove attribute A if there is a large set of distinct values for A but (1) there is no generalization operator on A, or (2) A's higher-level concepts are expressed in terms of other attributes.
  Attribute generalization: if there is a large set of distinct values for A, and there exists a set of generalization operators on A, then select an operator and generalize A.
  Attribute-threshold control: typically 2-8, specified by the user or set by default.
  Generalized relation threshold control: controls the size of the final relation/rule.
  See the example on p. 186.

Table 5.1

  Name            Gender    Major          Birth-Place             Birth_date  Residence                  Phone#    GPA
  Jim Woodman     M         CS             Vancouver, BC, Canada   8-12-76     3511 Main St., Richmond    687-4598  3.67
  Scott Lachance  M         CS             Montreal, Que, Canada   28-7-75     345 1st Ave., Richmond     253-9106  3.70
  Laura Lee       F         Physics        Seattle, WA, USA        25-8-70     125 Austin Ave., Burnaby   420-5232  3.83
  …               …         …              …                       …           …                          …         …
  --------------------------------------------------------------------------------------------------------------------
  Removed         Retained  Sci, Eng, Bus  Country                 Age range   City                       Removed   Excl, VG, ..

The generalization proceeds as follows:
  1. name: Since there are a large number of distinct values for name and there is no generalization operation defin
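
The induction steps listed under the basic principles (attribute removal, attribute generalization, then merging identical generalized tuples while accumulating counts) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the textbook's algorithm: the function names, the GPA cut-offs, and the threshold of 2 are hypothetical, birth_date is left out to keep the example short, and each generalizer climbs only one level of its concept hierarchy. The three rows are the ones shown in Table 5.1.

```python
from collections import Counter


def last_part(text):
    """Generalize 'a, b, c' to its last comma-separated component."""
    return text.split(",")[-1].strip()


def gpa_grade(gpa):
    """Map a numeric GPA to a grade label (cut-offs are hypothetical)."""
    if gpa >= 3.75:
        return "excellent"
    if gpa >= 3.50:
        return "very good"
    return "good"


def attribute_oriented_induction(rows, generalizers, threshold):
    """Return (kept_attributes, {generalized_tuple: count})."""
    generalized = [dict(row) for row in rows]
    kept = []
    for attr in rows[0]:
        distinct = {row[attr] for row in rows}
        if len(distinct) <= threshold:
            kept.append(attr)              # few distinct values: retain as is
        elif attr in generalizers:
            for row in generalized:        # attribute generalization
                row[attr] = generalizers[attr](row[attr])
            kept.append(attr)
        # otherwise: attribute removal (many values, no generalization operator)
    # Merge identical generalized tuples and accumulate their counts.
    counts = Counter(tuple(row[a] for a in kept) for row in generalized)
    return kept, dict(counts)


# Initial relation: the three rows visible in Table 5.1 (birth_date omitted).
rows = [
    {"name": "Jim Woodman", "gender": "M", "major": "CS",
     "birth_place": "Vancouver, BC, Canada",
     "residence": "3511 Main St., Richmond", "phone": "687-4598", "gpa": 3.67},
    {"name": "Scott Lachance", "gender": "M", "major": "CS",
     "birth_place": "Montreal, Que, Canada",
     "residence": "345 1st Ave., Richmond", "phone": "253-9106", "gpa": 3.70},
    {"name": "Laura Lee", "gender": "F", "major": "Physics",
     "birth_place": "Seattle, WA, USA",
     "residence": "125 Austin Ave., Burnaby", "phone": "420-5232", "gpa": 3.83},
]

# Hypothetical generalization operators: place -> country, address -> city.
generalizers = {"birth_place": last_part, "residence": last_part, "gpa": gpa_grade}

kept, counts = attribute_oriented_induction(rows, generalizers, threshold=2)
print(kept)
for generalized_tuple, count in counts.items():
    print(generalized_tuple, "count =", count)
```

With this tiny relation, name and phone are removed because they have many distinct values and no generalization operator, while gender and major are retained unchanged because they stay within the threshold. On a larger relation, major would also need a concept hierarchy (for example one mapping majors to Sci, Eng, Bus), and a fuller implementation would keep climbing hierarchies until every attribute satisfies its threshold.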