您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 管理学资料 > 基于神经网络的自动口语识别(IJITCS-V10-N8-2)
I.J.InformationTechnologyandComputerScience,2018,8,11-17PublishedOnlineAugust2018inMECS()DOI:10.5815/ijitcs.2018.08.02Copyright©2018MECSI.J.InformationTechnologyandComputerScience,2018,8,11-17AutomaticSpokenLanguageRecognitionwithNeuralNetworksValentinGazeauDepartmentofComputerScienceatSamHoustonStateUniversity,Huntsville,TX,USAE-mail:vcg006@shsu.eduCihanVarolDepartmentofComputerScienceatSamHoustonStateUniversity,Huntsville,TX,USAE-mail:cxv007@shsu.eduReceived:22June2018;Accepted:15July2018;Published:08August2018Abstract—TranslationhasbecomeveryimportantinourgenerationaspeoplewithcompletelydifferentculturesandlanguagesarenetworkedtogetherthroughtheInternet.NowadaysonecaneasilycommunicatewithanyoneintheworldwiththeservicesofGoogleTranslateand/orothertranslationapplications.Humanscanalreadyrecognizelanguagesthattheyhavepriorybeenexposedto.Eventhoughtheymightnotbeabletotranslate,theycanhaveagoodideaofwhatthespokenlanguageis.ThispaperdemonstrateshowdifferentNeuralNetworkmodelscanbetrainedtorecognizedifferentlanguagessuchasFrench,English,Spanish,andGerman.ForthetrainingdatasetvoicesampleswerechoosedfromShtooka,VoxForge,andYoutube.Fortestingpurposes,notonlydatafromthesewebsites,butalsopersonallyrecordedvoiceswereused.Attheend,thisresearchprovidestheaccuracyandconfidencelevelofmultipleNeuralNetworkarchitectures,SupportVectorMachineandHiddenMarkovModel,withtheHiddenMarkovModelyieldingthebestresultsreachingalmost70percentaccuracyforalllanguages.IndexTerms—HiddenMarkovModel,LanguageIdentification,LanguageTranslation,NeuralNetworks,SupportVectorMachine.I.INTRODUCTIONThereareroughly6,500spokenlanguagesavailableintheworldtoday.Whilesomeofthemarepopular,i.e.Chinesespokenbyover1Billionpeople,thereareotherssuchasLiki,Njerepwhichareonlyusedbylessthan1,000people.Definitely,understandingthespokenlanguageandcorrespondingaccordinglywhenneededarevitaltocreatecommunicationbetweendifferentbackgroundandlanguagespeakingpeople.Spokenlanguagerecognitionreferstotheautomaticprocessthatdeterminestheidentityofthelanguagespokeninaspeechsample.Thistechnologycanbeusedforawiderangeofmultilingualspeechprocessingapplications,suchasspokenlanguagetranslationand/ormultilingualspeechrecognition[1].Inpractice,spokenlanguagerecognitionisfarmorechallengingthantext-basedlanguagerecognitionbecausethereisnoguaranteethatamachineisabletotranscribespeechtotextwithouterrors.Weknowthathumansrecognizelanguagesthroughaperceptualorpsychoacousticprocessthatisinherenttotheauditorysystem.Therefore,thetypeofspeechperceptionthathumanlistenersuseisalwaysthesourceofinspirationforautomaticspokenlanguagerecognition[2].ThispaperoutlineshowNeuralNetworkscanbetrainedviaSupportVectorMachineandHiddenMarkovModeltoautomaticallyidentifythelanguagedirectlyfromspeechusinglibrariessuchasTensorFlow[3].Thiscanbedoneinmanydifferentwaysastheresultsdependonanumberoffactors:forexamplethediversityandsizeofthetrainingdata,and/ortheNeuralNetworkmodel.Italsodemonstrateshowthetrainingdatawasgathered,theproblemsfacedandhowtestingwasconducted.Thepaperisorganizedasfollows:SectionIIintroducespreviousattemptsatautomaticspokenlanguageidentificationusingphonemeclassifiers,statisticalrecurrentNeuralNetworks,anddeeplearningapproaches.SectionIIIcoverswhatmodelswerechosentoimplementtheNeuralNetworkaswellashowthetrainingdatawasgatheredandhowthetrainingwasconducted.SectionIVdetailshowthetestingwasperformedandhowtheperformanceofthemultiplemodelswereevaluatedaswellastheresults.TheconclusionisdrawnoutinSectionValongwiththefutureimprovementsthatcanbemade.YourgoalistosimulatetheusualappearanceofpapersinaJournaloftheAcademyPublisher.Wearerequestingthatyoufollowtheseguidelinesascloselyaspossible.II.RELATEDWORKSSeveralattemptshavealreadybeenmadetorecognizespokenlanguageswithdifferentdatasets.WhileNeuralNetworkswasusedforvarietyofidentification/classificationproblems[4,5],thissectioncoverssomeofthemostpromisingrecentworksinautomaticspokenlanguageidentificationusingNeuralNetworks:Srivastavaetal.[6]usealanguage12AutomaticSpokenLanguageRecognitionwithNeuralNetworksCopyright©2018MECSI.J.InformationTechnologyandComputerScience,2018,8,11-17independentphonemeclassifierthatextractsthesequenceofphonemesfromanaudiofile.ThephonemesequenceisthenclassifiedusingstatisticalandrecurrentNeuralNetworkmodels(RNNs).Withphonemeclassificationofthreelanguages(Turkish,UzbekandMandarin)authorsachievedanaverageaccuracyof58%.Montavon’s[7]attemptusesadeepneuralnetworkfeaturingthreeconvolutionallayersaswellasasmallerattemptwithasingleconvolutionallayertime-delaymodel.TryingtoclassifyEnglish,German,andFrench,authorachieves91.3%accuracyforknownspeakers(usedintraining)and80%forunknownspeakers.Leiet.alproposedtwonovelfrontendsforrobustlanguageidentification(LID)usingaconvolutionalneuralnetwork(CNN)trainedforautomaticspeechrecognition(ASR)[8].IntheCNN/i-vectorfrontend,theCNNisusedforgettingtheposteriorprobabilitiesfori-vectortrainingandextraction.Theauthorsevaluatedtheirapproachonheavilylowqualityspeechdata,butstilltheywe
本文标题:基于神经网络的自动口语识别(IJITCS-V10-N8-2)
链接地址:https://www.777doc.com/doc-7721720 .html