Semi-supervised Learning
2017.11

Traditional machine learning paradigms
• Supervised learning: learning from labeled data, e.g., classification, regression.
• Unsupervised learning: learning from data without labels, e.g., clustering.

Why semi-supervised learning
• In the big data era, obtaining data is getting easier and easier. But is data alone enough?
• Labeling data is difficult: it is expensive and time-consuming, and sometimes requires experts, for example in medical image analysis and web page recommendation.
• Semi-supervised learning applies when there are few labeled data and a large amount of unlabeled data.

Outline
• What is semi-supervised learning?
• Usefulness of unlabeled data
• Semi-supervised methods
  • Generative method
  • Semi-supervised SVM
  • Graph-based semi-supervised learning
  • Disagreement-based methods
• Semi-supervised clustering

Concept of semi-supervised learning
Given a labeled training set Dl = {(x1, y1), (x2, y2), …, (xl, yl)}, where yi (i = 1…l) is the label of xi, and an unlabeled data set Du = {xl+1, xl+2, …, xl+u}, with l ≪ u.
Semi-supervised learning is halfway between supervised and unsupervised learning. Semi-supervised learning (SSL) learns a function f(x) using both Dl and Du, with the aim of improving (or at least not reducing) classification performance compared with using Dl alone.

Usefulness of unlabeled data
Can we use unlabeled data to improve the learning ability?
[Figure: after seeing unlabeled data]

Semi-supervised learning methods
• Disagreement-based methods
• Semi-supervised SVM
• Generative method
• Graph-based semi-supervised learning
ML conference "Ten-Year Best Paper Award": 2008, 2009, 2013.

Disagreement-based methods
• Self-training
• Co-training
• Tri-training

Self-training
Self-training is probably the earliest SSL idea; it is also called self-learning or self-labeling.

Self-training algorithm
Repeat
  Train a classifier h on the labeled data L
  Classify the data in U with h
  Find a subset U' of U with the most confident pseudo-labels
  L ← L + U'
  U ← U − U'
Until h no longer changes
[Figure: pseudo-label illustration]
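The self-training loop can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the slides' implementation: the classifier h is a hypothetical nearest-centroid rule on 1-D features with two classes, confidence is the gap between the distances to the two centroids, and k (the number of pseudo-labels accepted per round) is an invented parameter.

```python
# Minimal self-training sketch (assumptions: 1-D features, two classes,
# nearest-centroid classifier; all names here are illustrative).

def centroids(X, y):
    """Train step: the classifier h is just the per-class mean."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        sums[yi] = sums.get(yi, 0.0) + xi
        counts[yi] = counts.get(yi, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def classify(h, x):
    """Return (label, confidence) for one sample.

    Confidence is the gap between the distance to the second-nearest
    centroid and the distance to the nearest one.
    """
    dists = sorted((abs(x - m), c) for c, m in h.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def self_train(X_l, y_l, X_u, k=2):
    L_x, L_y, U = list(X_l), list(y_l), list(X_u)
    h = centroids(L_x, L_y)                      # train h on L
    while U:
        # classify U with h and rank by confidence (most confident first)
        scored = sorted(((classify(h, x), x) for x in U),
                        key=lambda t: t[0][1], reverse=True)
        for (label, _), x in scored[:k]:         # U' = k most confident
            L_x.append(x)                        # L <- L + U'
            L_y.append(label)
            U.remove(x)                          # U <- U - U'
        new_h = centroids(L_x, L_y)              # retrain
        if new_h == h:                           # stop once h no longer changes
            break
        h = new_h
    return h, L_x, L_y


h, L_x, L_y = self_train([0.0, 10.0], ["neg", "pos"], [1.0, 2.0, 9.0, 8.0, 4.0])
print(h, L_y)
```

Accepting only the k most confident samples per round is one way to limit the risk that wrong pseudo-labels contaminate L; a confidence threshold would be an equally plausible choice.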
Self-training
Applications:
• Natural language processing (NLP)
• Network security
• Face recognition
Advantages: straightforward and intuitive; it can produce outstanding results.
Shortcomings: pseudo-labeled data may pollute the original labeled data.

Co-training
Co-training is the typical algorithm of this family.

An example of multi-view data
• Videos. View 1: pictures. View 2: text.

More examples of multi-view data
• Web pages. View 1: text. View 2: hyperlinks.
• Driverless vehicles. Data from different sensors.

Co-training
Co-training trains two classifiers separately on two views, then uses the predictions of each classifier on unlabeled examples to augment the training set of the other.
Sufficient and redundant views: each attribute set is sufficient to describe the samples of the class, and the attribute sets are mutually independent.
The two classifiers help each other and improve together.

[Figure: the labeled data Dl are split by view into D1,l (view 1: picture) and D2,l (view 2: text), which train the classifiers f1 and f2; each classifier produces pseudo-labeled data from Du. Should the pseudo-labeled data be used by the classifier that produced them or by the other one?]
Repeat
  Train two classifiers separately on the two views
  Augment the training set of the other
End

Applications of co-training
• Natural language processing (NLP)
• Multimedia information processing, e.g., image retrieval, multimedia retrieval
• Microprocessor design

Co-training
Advantages:
• The learning process is very simple.
• It has theoretical support.
• When the data satisfy the "sufficient and redundant views" condition, the performance of the classifier can be improved significantly.
Disadvantages and challenges:
• In most cases, the "sufficient and redundant views" condition cannot be satisfied.
• Single-view data.

Tri-training
To relax the constraints of co-training, [ZhouL05a] proposed the tri-training algorithm.
• Input: L, the labeled data set; U, the unlabeled data set.
• Bootstrap-sample from L to get three subsets L1, L2, L3.
• Repeat
    Train the three classifiers h1, h2, h3 on L1, L2, L3
    For every unlabeled x, if hj(x) == hk(x), add x with that pseudo-label to Li
  Until none of hi (i = 1, 2, 3) changes
[Figure: L is sampled into L1, L2, L3, which train h1, h2, h3; pseudo-labeled data are fed back into the training sets]
Applications:
• Multimedia information processing, e.g., tri-tracking (an object tracking framework)
• Network information processing, e.g., spam detection, P2P network traffic classification

Other disagreement-based algorithms
• TriTrain: a semi-supervised algorithm that iteratively refines each of the three component classifiers and finally combines their predictions via majority voting.
• CoForest: a semi-supervised algorithm that exploits the power of ensemble learning and the large amount of unlabeled data available to produce a hypothesis with better performance.
• COREG: a co-training style semi-supervised regression algorithm that employs two k-NN regressors using different distance metrics to select the most confidently labeled unlabeled examples for each other.

Semi-supervised SVM
• Semi-supervised SVM (S3VM) tries to find a hyperplane that separates the classes and passes through the lowest-density region of the data.
[Figure: SVM hyperplane and S3VM hyperplane on labeled (+/−) and unlabeled data]

Semi-supervised SVM: TSVM algorithm
Step 1: train an SVM on the labeled data
Step 2: predict the unlabeled data using the SVM
Step 3: train an SVM on the labeled data and the pseudo-labeled data
Step 4: find the pair of data points that are most wrongly predicted and swap their labels
Step 5: go to step 2
[Figure: SVM hyperplane and S3VM hyperplane with unlabeled data]
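The TSVM steps can be sketched as follows. This is a simplified, hedged illustration in plain Python rather than the algorithm as presented in the slides: the linear SVM is trained by a hypothetical full-batch subgradient routine, all function names and hyperparameters (lam, lr, epochs, max_swaps) are invented for the example, and the swap test (both slacks positive and summing to more than 2) is an assumed reading of "most wrongly predicted". After a swap the sketch returns to the retraining step, because re-predicting would discard the swapped labels; that is also an interpretation.

```python
# Simplified TSVM-style label-swapping sketch (labels are +1 / -1).
# All names and hyperparameters are illustrative assumptions.

def decision(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on lam/2*||w||^2 + mean hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:      # margin violated
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def find_swap_pair(slacks, labels):
    """Pick the worst-fitting pseudo-positive and pseudo-negative, or None."""
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == -1]
    if not pos or not neg:
        return None
    i = max(pos, key=lambda k: slacks[k])
    j = max(neg, key=lambda k: slacks[k])
    if slacks[i] > 0 and slacks[j] > 0 and slacks[i] + slacks[j] > 2:
        return i, j
    return None


def tsvm(X_l, y_l, X_u, max_swaps=100):
    w, b = train_linear_svm(X_l, y_l)                       # Step 1
    pseudo = [predict(w, b, x) for x in X_u]                # Step 2
    for _ in range(max_swaps):
        w, b = train_linear_svm(X_l + X_u, y_l + pseudo)    # Step 3
        slacks = [max(0.0, 1 - yi * decision(w, b, xi))
                  for xi, yi in zip(X_u, pseudo)]
        pair = find_swap_pair(slacks, pseudo)               # Step 4
        if pair is None:
            break
        i, j = pair
        pseudo[i], pseudo[j] = pseudo[j], pseudo[i]         # swap, then repeat
    return w, b, pseudo


X_l, y_l = [[-2.0, 0.0], [2.0, 0.0]], [-1, 1]
X_u = [[-1.5, 0.5], [-2.5, -0.5], [1.5, 0.3], [2.5, -0.2]]
print(tsvm(X_l, y_l, X_u)[2])
```

On well-separated data such as this toy set, no swap is triggered and the pseudo-labels from Step 2 are kept; swaps matter when the initial SVM cuts through a dense region of unlabeled points. Practical TSVM implementations also weight the unlabeled loss term and increase that weight gradually, which this sketch leaves out.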