1506.02640v4-You-Only-Look-Once-YOLO-Unified--Real

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, University of Washington, pjreddie@cs.washington.edu
Santosh Divvala, Allen Institute for Artificial Intelligence, santoshd@allenai.org
Ross Girshick, Facebook AI Research, rbg@fb.com
Ali Farhadi, University of Washington, ali@cs.washington.edu

arXiv:1506.02640v4 [cs.CV] 12 Nov 2015

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.

1. Introduction

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars in any weather without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general purpose, responsive robotic systems.

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].

[Figure 1 panels: 1. Resize image. 2. Run convolutional network. 3. Non-max suppression. Example detections: Dog: 0.30, Person: 0.64, Horse: 0.28.]

Figure 1: The YOLO Detection System. Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model's confidence.

More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding box, eliminate duplicate detections, and rescore the box based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.

First, YOLO is extremely fast. Since we frame detection as a regression problem we don't need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam please see our (anonymous) YouTube channel: […]

[…] [14], mistakes background patches in an image for objects because it can't see the larger context. YOLO makes less than half the number of background errors compared to Fast R-CNN.

Third, YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods like DPM and R-CNN by a wide margin. Since YOLO is highly generalizable it is less likely to break down when applied to new domains or unexpected input.

All of our training and testing code is open source and available online at [removed for review]. A variety of pre-trained models are also available to download.

2. Unified Detection

We unify the separate components of object detection into a single neural network. Our network uses features from the entire image to predict each bounding box. It also predicts all bounding boxes for an image simultaneously. This means our network reasons globally about the full image and all the objects in the image. The YOLO design enables end-to-end training and real-time speeds while maintaining hig
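
The post-processing stage named in Figure 1 (thresholding detections by confidence, followed by non-max suppression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the detection tuple layout, the function names `iou` and `postprocess`, and the default thresholds of 0.25 and 0.5 are assumptions chosen only for illustration, and the network that would produce the raw detections is omitted.

```python
from typing import List, Tuple

# Hypothetical detection layout (an assumption, not the paper's format):
# (x_min, y_min, x_max, y_max, confidence, label)
Detection = Tuple[float, float, float, float, float, str]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    detections: List[Detection],
    conf_threshold: float = 0.25,  # illustrative value, not from the paper
    iou_threshold: float = 0.5,    # illustrative value, not from the paper
) -> List[Detection]:
    """Drop low-confidence boxes, then apply greedy per-class non-max suppression."""
    # Step 1: keep only detections at or above the confidence threshold,
    # ordered from most to least confident.
    candidates = sorted(
        (d for d in detections if d[4] >= conf_threshold),
        key=lambda d: d[4],
        reverse=True,
    )
    # Step 2: keep a box only if it does not heavily overlap an
    # already-kept box of the same class.
    kept: List[Detection] = []
    for det in candidates:
        if all(det[5] != k[5] or iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    raw = [
        (10, 10, 110, 110, 0.64, "person"),
        (12, 12, 112, 112, 0.50, "person"),  # near-duplicate of the first box
        (200, 200, 300, 300, 0.30, "dog"),
        (0, 0, 50, 50, 0.10, "horse"),       # below the confidence threshold
    ]
    for det in postprocess(raw):
        print(det)
```

In this sketch the near-duplicate person box is suppressed because it overlaps a higher-confidence box of the same class, and the low-confidence horse box is removed by the threshold alone. Suppressing per class is one common design choice; a class-agnostic variant would drop the label comparison.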
