Automatic Spoken Language Recognition with Neural Networks (IJITCS-V10-N8-2)

I.J. Information Technology and Computer Science, 2018, 8, 11-17
Published Online August 2018 in MECS
DOI: 10.5815/ijitcs.2018.08.02
Copyright © 2018 MECS

Automatic Spoken Language Recognition with Neural Networks

Valentin Gazeau
Department of Computer Science at Sam Houston State University, Huntsville, TX, USA
E-mail: vcg006@shsu.edu

Cihan Varol
Department of Computer Science at Sam Houston State University, Huntsville, TX, USA
E-mail: cxv007@shsu.edu

Received: 22 June 2018; Accepted: 15 July 2018; Published: 08 August 2018

Abstract—Translation has become very important in our generation as people with completely different cultures and languages are networked together through the Internet. Nowadays one can easily communicate with anyone in the world with the services of Google Translate and/or other translation applications. Humans can already recognize languages that they have previously been exposed to. Even though they might not be able to translate, they can have a good idea of what the spoken language is. This paper demonstrates how different Neural Network models can be trained to recognize different languages such as French, English, Spanish, and German. For the training dataset, voice samples were chosen from Shtooka, VoxForge, and YouTube. For testing purposes, not only data from these websites but also personally recorded voices were used. At the end, this research provides the accuracy and confidence level of multiple Neural Network architectures, a Support Vector Machine, and a Hidden Markov Model, with the Hidden Markov Model yielding the best results, reaching almost 70 percent accuracy for all languages.

Index Terms—Hidden Markov Model, Language Identification, Language Translation, Neural Networks, Support Vector Machine.

I. INTRODUCTION

There are roughly 6,500 spoken languages in the world today. While some of them are widely spoken, e.g., Chinese, spoken by over 1 billion people, there are others, such as Liki and Njerep, which are used by fewer than 1,000 people. Understanding the spoken language and responding accordingly when needed are vital for communication between people of different backgrounds and languages. Spoken language recognition refers to the automatic process that determines the identity of the language spoken in a speech sample. This technology can be used for a wide range of multilingual speech processing applications, such as spoken language translation and/or multilingual speech recognition [1].

In practice, spoken language recognition is far more challenging than text-based language recognition because there is no guarantee that a machine is able to transcribe speech to text without errors. We know that humans recognize languages through a perceptual or psychoacoustic process that is inherent to the auditory system. Therefore, the type of speech perception that human listeners use is always the source of inspiration for automatic spoken language recognition [2].

This paper outlines how Neural Networks, a Support Vector Machine, and a Hidden Markov Model can be trained to automatically identify the language directly from speech, using libraries such as TensorFlow [3]. This can be done in many different ways, as the results depend on a number of factors: for example, the diversity and size of the training data and/or the Neural Network model. The paper also describes how the training data was gathered, the problems faced, and how testing was conducted.

The paper is organized as follows: Section II introduces previous attempts at automatic spoken language identification using phoneme classifiers, statistical and recurrent Neural Networks, and deep learning approaches. Section III covers which models were chosen to implement the Neural Network, as well as how the training data was gathered and how the training was conducted. Section IV details how the testing was performed and how the performance of the multiple models was evaluated, and presents the results. The conclusion is drawn in Section V, along with the future improvements that can be made.

II. RELATED WORKS

Several attempts have already been made to recognize spoken languages with different datasets. While Neural Networks have been used for a variety of identification/classification problems [4, 5], this section covers some of the most promising recent works in automatic spoken language identification using Neural Networks.

Srivastava et al. [6] use a language-independent phoneme classifier that extracts the sequence of phonemes from an audio file. The phoneme sequence is then classified using statistical and recurrent Neural Network (RNN) models. With phoneme classification of three languages (Turkish, Uzbek and Mandarin), the authors achieved an average accuracy of 58%.

Montavon's [7] attempt uses a deep neural network featuring three convolutional layers, as well as a smaller single-convolutional-layer time-delay model. Classifying English, German, and French, the author achieves 91.3% accuracy for known speakers (used in training) and 80% for unknown speakers.

Lei et al. proposed two novel front ends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR) [8]. In the CNN/i-vector front end, the CNN is used to obtain the posterior probabilities for i-vector training and extraction. The authors evaluated their approach on very low-quality speech data, but still they we
