Deep Learning Tutorial
李宏毅 Hung-yi Lee

Deep learning attracts lots of attention.
• Google Trends [chart of search interest in "deep learning", 2007–2015]
• Deep learning obtains many exciting results.

Among the talks this afternoon, this talk will focus on the technical part.

Outline
Part I: Introduction of Deep Learning
Part II: Why Deep?
Part III: Tips for Training Deep Neural Network
Part IV: Neural Network with Memory

Part I: Introduction of Deep Learning
(What people already knew in the 1980s)

Example Application
• Handwriting digit recognition: an image of a digit goes into the machine, and the machine outputs the label, e.g. "2".

Handwriting Digit Recognition
• Input: a 16×16 image, i.e. 256 values x_1, x_2, ⋯, x_256 (ink → 1, no ink → 0).
• Output: y_1, y_2, ⋯, y_10, where each dimension represents the confidence of a digit ("is 1", "is 2", ⋯, "is 0").
• Example: outputs 0.1, 0.7, ⋯, 0.2 → y_2 is the largest, so the image is "2".
• The machine is therefore a function f: R^256 → R^10. In deep learning, the function f is represented by a neural network.

Element of Neural Network (a neuron, f: R^K → R)
• z = a_1 w_1 + a_2 w_2 + ⋯ + a_K w_K + b
• a = σ(z)
• The w's are the weights, b is the bias, and σ is the activation function.

Neural Network
• Input layer x_1, ⋯, x_N → hidden layers (Layer 1, Layer 2, ⋯, Layer L) → output layer y_1, ⋯, y_M.
• Each node is a neuron; "deep" means many hidden layers.

Example of Neural Network
• Sigmoid activation: σ(z) = 1 / (1 + e^(−z)).
• With input (1, −1): the first layer computes z = 1·1 + (−1)·(−2) + 1 = 4 and z = 1·(−1) + (−1)·1 + 0 = −2, giving activations 0.98 and 0.12; the second layer gives 0.86 and 0.11; the output layer gives 0.62 and 0.83.
• With input (0, 0): the activations are 0.73 and 0.5, then 0.72 and 0.12, and the outputs are 0.51 and 0.85.
• So this network is a function f: R^2 → R^2 with f(1, −1) = (0.62, 0.83) and f(0, 0) = (0.51, 0.85). Different parameters define different functions.

Matrix Operation
• Each layer is a matrix operation. For the first layer above:
  σ( [1 −2; −1 1] [1; −1] + [1; 0] ) = σ( [4; −2] ) = [0.98; 0.12]
• The whole network is y = f(x) = σ( W^L ⋯ σ( W^2 σ( W^1 x + b^1 ) + b^2 ) ⋯ + b^L ).
• Using parallel computing techniques to speed up matrix operations.

Softmax
• Softmax layer as the output layer.
• Ordinary layer: y_i = σ(z_i). In general, the output of the network can be any value, which may not be easy to interpret.
• Softmax layer: y_i = e^(z_i) / Σ_j e^(z_j).
• Example with three outputs: z = (3, 1, −3) → e^z ≈ (20, 2.7, 0.05) → y ≈ (0.88, 0.12, ≈0).
• The outputs can be interpreted as probabilities: 1 ≥ y_i ≥ 0 and Σ_i y_i = 1.
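To make the matrix formulation and the softmax layer concrete, here is a minimal NumPy sketch of the forward pass y = softmax(W^L σ(⋯ σ(W^1 x + b^1) ⋯) + b^L). The layer sizes (256 inputs, two hidden layers of 500, 10 outputs) and the random parameters are illustrative assumptions, not the tutorial's actual model.

```python
import numpy as np

def sigmoid(z):
    # Activation function sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax output layer: y_i = e^(z_i) / sum_j e^(z_j)
    e = np.exp(z - np.max(z))           # subtract the max for numerical stability
    return e / e.sum()

def forward(x, weights, biases):
    # y = f(x): sigmoid hidden layers, softmax output layer
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(W @ a + b)          # a^l = sigma(W^l a^(l-1) + b^l)
    z = weights[-1] @ a + biases[-1]    # the last layer feeds the softmax
    return softmax(z)

# Illustrative network: 256 input pixels -> 500 -> 500 -> 10 digit classes
rng = np.random.default_rng(0)
sizes = [256, 500, 500, 10]
weights = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = (rng.random(256) > 0.5).astype(float)   # a fake binary "image": ink -> 1, no ink -> 0
y = forward(x, weights, biases)
print(y.shape, y.sum())                     # (10,) and the outputs sum to 1
```

Subtracting the maximum before exponentiating is a standard trick to avoid overflow; it does not change the softmax output.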
How to Set the Network Parameters?
• θ = {W^1, b^1, W^2, b^2, ⋯, W^L, b^L}
• Set the network parameters θ such that: given an image of "1" as input, y_1 has the maximum value; given an image of "2", y_2 has the maximum value; and so on (with a softmax output layer).
• How to let the neural network achieve this?

Training Data
• Preparing training data: images and their labels, e.g. "5", "0", "4", "1", "3", "1", "2", "9".
• Using the training data to find the network parameters.

Cost
• Given a set of network parameters θ, each example has a cost value L(θ).
• The cost can be the Euclidean distance or the cross entropy between the network output and the target.
• Example: for an image of "1", the target is (1, 0, ⋯, 0); the cost compares it with the network output (0.2, 0.3, ⋯, 0.5).

Total Cost
• For all R training examples x^1, x^2, ⋯, x^R with targets ŷ^1, ŷ^2, ⋯, ŷ^R:
  C(θ) = Σ_{r=1}^{R} L^r(θ)
• The total cost measures how bad the network parameters θ are on this task.
• Find the network parameters θ* that minimize this value.

Gradient Descent
• Assume there are only two parameters w_1 and w_2 in the network, i.e. θ = {w_1, w_2}. The error surface shows C at each θ (the colors represent the value of C).
• Randomly pick a starting point θ^0.
• Compute the negative gradient at θ^0: −∇C(θ^0), where ∇C(θ^0) = (∂C(θ^0)/∂w_1, ∂C(θ^0)/∂w_2).
• Move by the gradient times the learning rate η: θ^1 = θ^0 − η∇C(θ^0).
• Repeat: θ^2 = θ^1 − η∇C(θ^1), ⋯. Eventually, we reach a minimum.

Local Minima
• Gradient descent never guarantees the global minimum.
• Different initial points θ^0 reach different minima, so different results. ("Who is Afraid of Non-Convex Loss Functions?")
• On the cost surface in parameter space, gradient descent can be very slow at a plateau (∇C(θ) ≈ 0), stuck at a saddle point (∇C(θ) = 0), or stuck at a local minimum (∇C(θ) = 0).

Momentum
• In the physical world, a ball rolling down the cost surface carries momentum. How about putting this phenomenon into gradient descent?
• Momentum: movement = negative of gradient + momentum, so the parameters can keep moving even where the gradient = 0.
• Still no guarantee of reaching the global minimum, but it gives some hope.

Mini-batch
• Randomly initialize θ^0 and randomly partition the training data into mini-batches.
• Pick the 1st batch (e.g. x^1, x^31, ⋯): C = L^1 + L^31 + ⋯, update θ^1 ← θ^0 − η∇C(θ^0).
• Pick the 2nd batch (e.g. x^2, x^16, ⋯): C = L^2 + L^16 + ⋯, update θ^2 ← θ^1 − η∇C(θ^1).
• ⋯ C is different each time we update the parameters!
• Compared with original gradient descent, the mini-batch trajectory looks unstable on the error surface (the colors represent the total C on all training data), but it is faster and better!
• Until all mini-batches have been picked: one epoch. Then repeat the above process. (A short code sketch below puts these update rules together.)

Backpropagation
• A network can have millions of parameters.
• Backpropagation is the way to compute the gradients efficiently (not today).
  Ref: ~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
• Many toolkits can compute the gradients automatically.
  Ref: ~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html

Part II: Why Deep?

Deeper is Better?

Layer × Size   Word Error Rate (%)   Layer × Size   Word Error Rate (%)
1 × 2k         24.2
2 × 2k         20.4
3 × 2k         18.4
4 × 2k         17.8
5 × 2k         17.2                  1 × 3772       22.5
7 × 2k         17.1                  1 × 4634       22.6
                                     1 × 16k        22.1

Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech, 2011.

Not surprised: more parameters, better performance.
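To tie the Part I training recipe together in code (mini-batch updates θ^(t+1) ← θ^t − η∇C(θ^t) plus a momentum term, movement = negative of gradient + momentum), here is a minimal sketch on a toy least-squares cost whose gradient is available in closed form; a real DNN would obtain ∇C via backpropagation instead. The data, learning rate η, momentum coefficient, and batch size are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training data: R examples whose targets follow a "true" linear rule plus noise.
R = 256
X = rng.standard_normal((R, 2))
y_hat = X @ np.array([3.0, -2.0]) + 0.1 * rng.standard_normal(R)

def grad_C(theta, X_b, y_b):
    # Gradient of the mini-batch cost C(theta) = sum_r (x^r . theta - yhat^r)^2
    return 2.0 * X_b.T @ (X_b @ theta - y_b)

eta, mu = 0.01, 0.9              # learning rate and momentum coefficient (assumed)
batch_size, n_epochs = 16, 20

theta = rng.standard_normal(2)   # randomly initialize theta^0
movement = np.zeros(2)

for epoch in range(n_epochs):
    order = rng.permutation(R)                  # shuffle, then pick the batches in turn
    for start in range(0, R, batch_size):
        idx = order[start:start + batch_size]
        g = grad_C(theta, X[idx], y_hat[idx])
        movement = -eta * g + mu * movement     # movement = negative of gradient + momentum
        theta = theta + movement                # update the parameters
    # all mini-batches picked once: one epoch

print(theta)   # should end up close to the true parameters [3, -2]
```

With the momentum coefficient set to 0, the loop reduces to plain mini-batch gradient descent; the momentum term simply carries the previous movement over, which is what lets the parameters keep moving across plateaus and past points where the gradient is zero.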