Introduction to Data Mining
Instructor's Solution Manual

Pang-Ning Tan
Michael Steinbach
Vipin Kumar

Copyright © 2006 Pearson Addison-Wesley. All rights reserved.

Contents

1 Introduction
2 Data
3 Exploring Data
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
5 Classification: Alternative Techniques
6 Association Analysis: Basic Concepts and Algorithms
7 Association Analysis: Advanced Concepts
8 Cluster Analysis: Basic Concepts and Algorithms
9 Cluster Analysis: Additional Issues and Algorithms
10 Anomaly Detection

1 Introduction

1. Discuss whether or not each of the following activities is a data mining task.

(a) Dividing the customers of a company according to their gender.
    No. This is a simple database query.

(b) Dividing the customers of a company according to their profitability.
    No. This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.

(c) Computing the total sales of a company.
    No. Again, this is simple accounting.

(d) Sorting a student database based on student identification numbers.
    No. Again, this is a simple database query.

(e) Predicting the outcomes of tossing a (fair) pair of dice.
    No. Since the dice are fair, this is a probability calculation. If the dice were not fair, and we needed to estimate the probabilities of each outcome from the data, then this would be more like the problems considered by data mining. However, in this specific case, solutions to this problem were developed by mathematicians a long time ago, and thus we wouldn't consider it to be data mining.

(f) Predicting the future stock price of a company using historical records.
    Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modelling. We could use regression for this modelling, although researchers in many fields have developed a wide variety of techniques for predicting time series.

(g) Monitoring the heart rate of a patient for abnormalities.
    Yes. We would build a model of the normal behavior of heart rate and raise an alarm when an unusual heart behavior occurred. This would involve the area of data mining known as anomaly detection. This could also be considered as a classification problem if we had examples of both normal and abnormal heart behavior.

(h) Monitoring seismic waves for earthquake activities.
    Yes. In this case, we would build a model of different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these different types of seismic activity was observed. This is an example of the area of data mining known as classification.

(i) Extracting the frequencies of a sound wave.
    No. This is signal processing.

2. Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company by giving specific examples of how techniques such as clustering, classification, association rule mining, and anomaly detection can be applied.

The following are examples of possible answers.

• Clustering can group results with a similar theme and present them to the user in a more concise form, e.g., by reporting the 10 most frequent words in the cluster.
• Classification can assign results to pre-defined categories such as “Sports,” “Politics,” etc.
• Sequential association analysis can detect that certain queries follow certain other queries with a high probability, allowing for more efficient caching.
• Anomaly detection techniques can discover unusual patterns of user traffic, e.g., that one subject has suddenly become much more popular. Advertising strategies could be adjusted to take advantage of such developments.

3. For each of the following data sets, explain whether or not data privacy is an important issue.

(a) Census data collected from 1900–1950. No
(b) IP addresses and visit times of Web users who visit your Website. Yes
(c) Images from Earth-orbiting satellites. No
(d) Names and addresses of people from the telephone book. No
(e) Names and email addresses collected from the Web. No

2 Data

1. In the initial example of Chapter 2, the statistician says, “Yes, fields 2 and 3 are basically the same.” Can you tell from the three lines of sample data that are shown why she says that?

Field 2 / Field 3 ≈ 7 for the values displayed. While it can be dangerous to draw conclusions from such a small sample, the two fields seem to contain essentially the same information.

2. Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.

Example: Age in years. Answer: Discrete, quantitative, ratio

(a) Time in terms of AM or PM. Binary, qualitative, ordinal
(b) Brightness as measured by a light meter. Continuous, quantitative, ratio
(c) Brightness as measured by people's judgments. Discrete, qualitative, ordinal
(d) Angles as measured in degrees between 0° and 360°. Continuous, quantitative, ratio
(e) Bronze, Silver, and Gold medals as awarded at the Olympics. Discrete, qualitative, ordinal
(f) Height above sea level. Continuous, quantitative, interval/ratio (depends on whether sea level is regarded as an arbitrary origin)
(g) Number of patients in a hospital. Discrete, quantitative, ratio
(h) ISBN numbers for books. (Look up the format on the Web.) Discrete, qualitative, nominal (ISBN numbers do have order information, though)
(i) Ability to pass light in terms of the following values: opaque, translucent, transparent. Discrete, qualitative, ordinal
(j) Military rank. Discrete, qualitative, ordinal
(k) Distance from the center of c