Digital Speech Processing - Lecture 9
Short-Time Fourier Analysis Methods - Introduction

General Discrete-Time Model of Speech Production
Voiced Speech:
• $A_V \, P(z) \, G(z) \, V(z) \, R(z)$
Unvoiced Speech:
• $A_N \, N(z) \, V(z) \, R(z)$

Short-Time Fourier Analysis
• represent signal by sum of sinusoids or complex exponentials as it leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis methods), and insight into the signal itself
• such Fourier representations provide
  – convenient means to determine response to a sum of sinusoids for linear systems
  – clear evidence of signal properties that are obscured in the original signal

Why STFT for Speech Signals
• steady state sounds, like vowels, are produced by periodic excitation of a linear system => speech spectrum is the product of the excitation spectrum and the vocal tract frequency response
• speech is a time-varying signal => need more sophisticated analysis to reflect time varying properties
  – changes occur at syllabic rates (~10 times/sec)
  – over fixed time intervals of 10-30 msec, properties of most speech signals are relatively constant (when is this not the case?)

Overview of Lecture
• define time-varying Fourier transform (STFT) analysis method
• define synthesis method from time-varying FT (filter-bank summation, overlap addition)
• show how time-varying FT can be viewed in terms of a bank of filters model
• computation methods based on using FFT
• application to vocoders, spectrum displays, formant estimation, pitch period estimation

Frequency Domain Processing
• Coding:
  – transform, subband, homomorphic, channel vocoders
• Restoration/Enhancement/Modification:
  – noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech)

Frequency and the DTFT
• sinusoids
$$x(n) = \cos(\omega_0 n) = \left(e^{j\omega_0 n} + e^{-j\omega_0 n}\right)/2$$
where $\omega_0$ is the frequency (in radians) of the sinusoid
• the Discrete-Time Fourier Transform (DTFT)
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} = \mathrm{DTFT}\{x(n)\}$$
$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega = \mathrm{DTFT}^{-1}\{X(e^{j\omega})\}$$
where $\omega$ is the frequency variable of $X(e^{j\omega})$

DTFT and DFT of Speech
The DTFT and the DFT for the infinite duration signal could be calculated (the DTFT) and approximated (the DFT) by the following:
$$X(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\omega m} \quad \text{(DTFT)}$$
$$X(k) = \sum_{m=0}^{N-1} x(m)\, w(m)\, e^{-j(2\pi/N)km}, \quad k = 0, 1, \ldots, N-1 \quad \text{(DFT)}$$
$$X(k) = X(e^{j\omega})\big|_{\omega = 2\pi k/N}$$
using a value of $N = 25000$ we get the following plot

25000-Point DFT of Speech
[Figure: two panels, Magnitude and Log Magnitude (dB)]

Short-Time Fourier Transform (STFT)

Short-Time Fourier Transform
• speech is not a stationary signal, i.e., it has properties that change with time
• thus a single representation based on all the samples of a speech utterance, for the most part, has no meaning
• instead, we define a time-dependent Fourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time

Definition of STFT
$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat{n} - m)\, e^{-j\hat{\omega} m}$$
• both $\hat{n}$ and $\hat{\omega}$ are variables
• $w(\hat{n} - m)$ is a real window which determines the portion of $x(\hat{n})$ that is used in the computation of $X_{\hat{n}}(e^{j\hat{\omega}})$

Short Time Fourier Transform
• STFT is a function of two variables, the time index, $\hat{n}$, which is discrete, and the frequency variable, $\hat{\omega}$, which is continuous
$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat{n} - m)\, e^{-j\hat{\omega} m} = \mathrm{DTFT}\big(x(m)\, w(\hat{n} - m)\big) \;\Rightarrow\; \hat{n} \text{ fixed, } \hat{\omega} \text{ variable}$$
• alternative form of STFT (based on change of variables) is
$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w(m)\, x(\hat{n} - m)\, e^{-j\hat{\omega}(\hat{n} - m)} = e^{-j\hat{\omega}\hat{n}} \sum_{m=-\infty}^{\infty} x(\hat{n} - m)\, w(m)\, e^{j\hat{\omega} m}$$

Short-Time Fourier Transform
• if we define
$$\tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(\hat{n} - m)\, w(m)\, e^{j\hat{\omega} m}$$
• then $X_{\hat{n}}(e^{j\hat{\omega}})$ can be expressed as (using $m' = -m$)
$$X_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\, \tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\, \mathrm{DTFT}\big[x(\hat{n} + m)\, w(-m)\big]$$

STFT - Different Time Origins
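• the STFT can be viewed as having two different time origins
1. time origin tied to signal $x(n)$
$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat{n} - m)\, e^{-j\hat{\omega} m} = \mathrm{DTFT}\big[x(m)\, w(\hat{n} - m)\big], \quad \hat{n} \text{ fixed, } \hat{\omega} \text{ variable}$$
2. time origin tied to window signal $w(-m)$
$$X_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}} \sum_{m=-\infty}^{\infty} x(\hat{n} + m)\, w(-m)\, e^{-j\hat{\omega} m} = e^{-j\hat{\omega}\hat{n}}\, \tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\, \mathrm{DTFT}\big[w(-m)\, x(\hat{n} + m)\big], \quad \hat{n} \text{ fixed, } \hat{\omega} \text{ variable}$$

Time Origin for STFT
[Figure: time origin tied to window, showing the sequence $w[-m]\, x[\hat{n} + m]$; the sample $x[0]$ falls at $m = -\hat{n}$]

Interpretations of STFT
• there are 2 distinct interpretations of $X_{\hat{n}}(e^{j\hat{\omega}})$
1. assume $\hat{n}$ is fixed, then $X_{\hat{n}}(e^{j\hat{\omega}})$ is simply the normal Fourier transform of the sequence $w(\hat{n} - m)\, x(m)$, $-\infty < m < \infty$ => for fixed $\hat{n}$, $X_{\hat{n}}(e^{j\hat{\omega}})$ has the same properties as a normal Fourier transform
2. consider $X_{\hat{n}}(e^{j\hat{\omega}})$ as a function of the time index $\hat{n}$ with $\hat{\omega}$ fixed. Then $X_{\hat{n}}(e^{j\hat{\omega}})$ is in the form of a convolution of the signal $x(\hat{n})\, e^{-j\hat{\omega}\hat{n}}$ with the window $w(\hat{n})$. This leads to an interpretation in the form of linear filtering of the frequency modulated signal $x(\hat{n})\, e^{-j\hat{\omega}\hat{n}}$ by $w(\hat{n})$.
– we will now consider each of these interpretations of the STFT in a lot more detail

Fourier Transform Interpretation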
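The STFT definition, its two time origins, and the linear-filtering view can be checked numerically. The following is a minimal sketch, not part of the original slides. It assumes a finite window that is nonzero only for 0 <= m <= L-1 and a signal that is zero outside its stored samples. The function names, the synthetic two-tone test signal (a stand-in for speech), the 8 kHz sampling rate, and the Hamming window are all illustrative assumptions.

```python
import cmath
import math

def hamming(L):
    """Hamming window on 0 <= m <= L-1 (assumed zero elsewhere)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (L - 1)) for m in range(L)]

def stft_signal_origin(x, w, n_hat, omega):
    """X_n(e^{jw}) = sum_m x(m) w(n_hat - m) e^{-j w m}; time origin tied to x."""
    total = 0j
    for m in range(len(x)):
        k = n_hat - m                      # index into the window
        if 0 <= k < len(w):                # window is zero outside 0..L-1
            total += x[m] * w[k] * cmath.exp(-1j * omega * m)
    return total

def stft_window_origin(x, w, n_hat, omega):
    """X~_n(e^{jw}) = sum_m x(n_hat - m) w(m) e^{+j w m}; time origin tied to w."""
    total = 0j
    for m in range(len(w)):
        k = n_hat - m                      # index into the signal
        if 0 <= k < len(x):                # signal is zero outside stored samples
            total += x[k] * w[m] * cmath.exp(1j * omega * m)
    return total

def stft_by_filtering(x, w, omega):
    """Filtering view: convolve x(n) e^{-j w n} with w(n); returns X_n for each n."""
    mod = [x[n] * cmath.exp(-1j * omega * n) for n in range(len(x))]
    out = []
    for n in range(len(x)):
        acc = 0j
        for k in range(len(w)):
            if n - k >= 0:
                acc += w[k] * mod[n - k]
        out.append(acc)
    return out

# Hypothetical test signal: two sinusoids at 500 Hz and 1500 Hz, fs = 8 kHz.
fs = 8000
x = [math.cos(2 * math.pi * 500 * n / fs) + 0.5 * math.cos(2 * math.pi * 1500 * n / fs)
     for n in range(400)]
w = hamming(80)                            # 10 msec window at 8 kHz
n_hat = 200                                # analysis time index
omega = 2 * math.pi * 500 / fs             # analysis frequency in radians

X = stft_signal_origin(x, w, n_hat, omega)
X_tilde = stft_window_origin(x, w, n_hat, omega)
X_filt = stft_by_filtering(x, w, omega)[n_hat]

# The two time origins differ only by the phase factor e^{-j w n_hat}.
print(abs(X - cmath.exp(-1j * omega * n_hat) * X_tilde))
# The filtering view gives the same value at n = n_hat.
print(abs(X - X_filt))
```

Both printed differences should be at the level of floating-point rounding. The direct loops are written for clarity rather than speed; a practical implementation would evaluate the transform on a frequency grid with an FFT.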
Document title: Digital Speech Processing Lawrence Lecture 9_fall_
Link: https://www.777doc.com/doc-3254825 .html