An introduction to machine learning and probabilistic graphical models
Kevin Murphy, MIT AI Lab
Presented at Intel's workshop on "Machine learning for the life sciences", Berkeley, CA, 3 November 2003

Slide 2: Overview
- Supervised learning
- Unsupervised learning
- Graphical models
- Learning relational models
Thanks to Nir Friedman, Stuart Russell, Leslie Kaelbling and various web sources for letting me use many of their slides.

Slide 3: Supervised learning
Learn to approximate a function F(x1, x2, x3) -> t from a training set of (x, t) pairs, where the output t is yes (Y) or no (N):

  Color  Shape   Size   Output
  Blue   Torus   Big    Y
  Blue   Square  Small  Y
  Blue   Star    Small  Y
  Red    Arrow   Small  N

Slide 4: Supervised learning
Training data -> Learner -> Hypothesis; the hypothesis is then used to predict T on the testing data.

  Training data          Testing data
  X1  X2  X3  T          X1  X2  X3  T
  B   T   B   Y          B   A   S   ?
  B   S   S   Y          Y   C   S   ?
  B   S   S   Y
  R   A   S   N

  Prediction: T = Y, N

Slide 5: Key issue: generalization
We can't just memorize the training set (overfitting).

Slide 6: Hypothesis spaces
- Decision trees
- Neural networks
- K-nearest neighbors
- Naïve Bayes classifier
- Support vector machines (SVMs)
- Boosted decision stumps
- ...

Slide 7: Perceptron (neural net with no hidden layers)
Linearly separable data.

Slide 8: Which separating hyperplane?

Slide 9: The linear separator with the largest margin is the best one to pick.

Slide 10: What if the data is not linearly separable?

Slide 11: Kernel trick
A kernel implicitly maps the data from 2D (x1, x2) to 3D (z1, z2, z3), making the problem linearly separable; for the quadratic kernel k(x, y) = (x . y)^2, the implicit feature map is z = (x1^2, sqrt(2) x1 x2, x2^2).

Slide 12: Support Vector Machines (SVMs)
Two key ideas:
- Large margins
- Kernel trick

Slide 13: Boosting
Simple classifiers (weak learners) can have their performance boosted by taking weighted combinations. Boosting maximizes the margin.

Slide 14: Supervised learning success stories
- Face detection
- Steering an autonomous car across the US
- Detecting credit card fraud
- Medical diagnosis
- ...

Slide 15: Unsupervised learning
What if there are no output labels?

Slide 16: K-means clustering
1. Guess the number of clusters, K.
2. Guess initial cluster centers, μ1, μ2, ...
3. Assign data points xi to the nearest cluster center.
4. Re-compute the cluster centers based on the assignments.
Reiterate steps 3-4 (see the sketch below).
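A minimal K-means sketch in Python/NumPy, following the four steps above. The toy two-blob data, the choice K = 2, and the random-data-point initialization are illustrative assumptions, not details from the talk.

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: guess initial cluster centers by picking K random data points.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each data point x_i to its nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 4: re-compute each cluster center as the mean of its assigned points.
        new_centers = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                                else centers[k] for k in range(K)])
        # Reiterate until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign

# Step 1: guess K = 2 for toy data drawn from two well-separated blobs.
X = np.vstack([np.random.randn(50, 2) + [0, 0],
               np.random.randn(50, 2) + [5, 5]])
centers, assign = kmeans(X, K=2)
print(centers)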
Slide 17: AutoClass (Cheeseman et al., 1986)
- EM algorithm for mixtures of Gaussians
- "Soft" version of K-means
- Uses a Bayesian criterion to select K
- Discovered new types of stars from spectral data
- Discovered new classes of proteins and introns from DNA/protein sequence databases

Slide 18: Hierarchical clustering

Slide 19: Principal Component Analysis (PCA)
PCA seeks a projection that best represents the data in a least-squares sense. PCA reduces the dimensionality of feature space by restricting attention to those directions along which the scatter of the cloud is greatest.

Slide 20: Discovering nonlinear manifolds

Slide 21: Combining supervised and unsupervised learning

Slide 22: Discovering rules (data mining)

  Occup.   Income  Educ.  Sex  Married  Age
  Student  $10k    MA     M    S        22
  Student  $20k    PhD    F    S        24
  Doctor   $80k    MD     M    M        30
  Retired  $30k    HS     F    M        60

Find the most frequent patterns (association rules), e.g.:
- num in household = 1 ^ num children = 0 => language = English
- language = English ^ income < $40k ^ married = false ^ num children = 0 => education in {college, grad school}

Slide 23: Unsupervised learning: summary
- Clustering
- Hierarchical clustering
- Linear dimensionality reduction (PCA)
- Non-linear dimensionality reduction
- Learning rules

Slide 24: Discovering networks?
From data visualization to causal discovery.

Slide 25: Networks in biology
Most processes in the cell are controlled by networks of interacting molecules:
- Metabolic networks
- Signal transduction networks
- Regulatory networks
Networks can be modeled at multiple levels of detail/realism (in order of decreasing detail):
- Molecular level
- Concentration level
- Qualitative level

Slide 26: Molecular level: Lysis-Lysogeny circuit in Lambda phage
Arkin et al. (1998), Genetics 149(4):1633-48.
5 genes, 67 parameters based on 50 years of research; the stochastic simulation required a supercomputer.

Slide 27: Concentration level: metabolic pathways
Usually modeled with differential equations.
[Figure: a network of genes g1-g5 connected by weighted edges such as w12, w23, w55.]

Slide 28: Qualitative level: Boolean networks

Slide 29: Probabilistic graphical models
- Support graph-based modeling at various levels of detail
- Models can be learned from noisy, partial data
- Can model "inherently" stochastic phenomena, e.g., molecular-level fluctuations ...
- But can also model deterministic, causal processes.

"The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful. Therefore the true logic for this world is the calculus of probabilities." -- James Clerk Maxwell

"Probability theory is nothing but common sense reduced to calculation." -- Pierre Simon Laplace

Slide 30: Graphical models: outline
- What are graphical models?
- Inference
- Structure learning

Slide 31: Simple probabilistic model: linear regression
Y = α + βX + noise, where the line α + βX is the deterministic (functional) relationship between X and Y.

Slide 32: Simple probabilistic model: linear regression
Y = α + βX + noise. "Learning" = estimating the parameters α, β, σ from (x, y) pairs. They can be estimated by least squares:
  β = Σi (xi - x̄)(yi - ȳ) / Σi (xi - x̄)²,   α = ȳ - β x̄,
where x̄ is the empirical mean of the xi and σ² = (1/N) Σi (yi - α - β xi)² is the residual variance.

Slide 33: Piecewise linear regression
Latent "switch" variable: a hidden process at work.

Slide 34: Probabilistic graphical model for piecewise linear regression
[Graph: input X -> hidden Q, X -> output Y, Q -> Y]
- The hidden variable Q chooses which set of parameters to use for predicting Y.
- The value of Q depends on the value of the input X.
- This is an example of "mixtures of experts".
Learning is harder because Q is hidden, so we don't know which data points to assign to each line; this can be solved with EM (cf. K-means).

Slide 35: Classes of graphical models
Probabilistic models include graphical models, which split into directed models (Bayes nets, DBNs) and undirected models (MRFs).

Slide 36: Bayesian networks
Qualitative part: a directed acyclic graph (DAG); nodes are random variables, edges are direct influences. Example: Earthquake and Burglary are parents of Alarm, Alarm is the parent of Call, and Earthquake is the parent of Radio.
Quantitative part: a set of conditional probability distributions, one per family, e.g. P(A | E, B) for the family of Alarm:

  E   B    P(A=1)  P(A=0)
  e   b    0.9     0.1
  e   ¬b   0.2     0.8
  ¬e  b    0.9     0.1
  ¬e  ¬b   0.01    0.99

A Bayesian network is a compact representation of a probability distribution via conditional independence.
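A minimal sketch, in Python, of how the Alarm network's joint distribution factorizes over the DAG and how a query can be answered by brute-force enumeration. Only the P(A | E, B) table comes from the slide; the priors P(B) and P(E), the tables P(R | E) and P(C | A), and the query P(B=1 | C=1) are made-up illustrative numbers (assumptions), chosen only to show how the factorization is used.

from itertools import product

P_B = {1: 0.01, 0: 0.99}            # assumed prior on Burglary
P_E = {1: 0.02, 0: 0.98}            # assumed prior on Earthquake
P_A = {(1, 1): 0.9, (1, 0): 0.2,    # P(A=1 | E, B), keyed by (E, B) -- from the slide's table
       (0, 1): 0.9, (0, 0): 0.01}
P_R = {1: 0.8, 0: 0.1}              # assumed P(Radio=1 | E)
P_C = {1: 0.7, 0: 0.05}             # assumed P(Call=1 | A)

def joint(b, e, a, r, c):
    """P(B,E,A,R,C) = P(B) P(E) P(A|E,B) P(R|E) P(C|A): the DAG factorization."""
    pa = P_A[(e, b)] if a else 1 - P_A[(e, b)]
    pr = P_R[e] if r else 1 - P_R[e]
    pc = P_C[a] if c else 1 - P_C[a]
    return P_B[b] * P_E[e] * pa * pr * pc

# Inference by brute-force enumeration: P(Burglary=1 | Call=1).
num = sum(joint(1, e, a, r, 1) for e, a, r in product([0, 1], repeat=3))
den = sum(joint(b, e, a, r, 1) for b, e, a, r in product([0, 1], repeat=4))
print("P(B=1 | C=1) =", num / den)

Enumeration like this is exponential in the number of variables; the inference algorithms referred to in the outline on slide 30 exploit the graph structure to do better.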