You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, University of Washington, pjreddie@cs.washington.edu
Santosh Divvala, Allen Institute for Artificial Intelligence, santoshd@allenai.org
Ross Girshick, Facebook AI Research, rbg@fb.com
Ali Farhadi, University of Washington, ali@cs.washington.edu

arXiv:1506.02640v4 [cs.CV] 12 Nov 2015

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.

1. Introduction

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars in any weather without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general purpose, responsive robotic systems.

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].

[Figure 1 panels: 1. Resize image. 2. Run convolutional network. 3. Non-max suppression. Labels shown: Dog: 0.30, Person: 0.64, Horse: 0.28.]

Figure 1: The YOLO Detection System. Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model’s confidence.

More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding box, eliminate duplicate detections, and rescore the box based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.

First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam please see our (anonymous) YouTube channel:

[...] [14], mistakes background patches in an image for objects because it can’t see the larger context. YOLO makes less than half the number of background errors compared to Fast R-CNN.

Third, YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods like DPM and R-CNN by a wide margin. Since YOLO is highly generalizable it is less likely to break down when applied to new domains or unexpected input.

All of our training and testing code is open source and available online at [removed for review]. A variety of pre-trained models are also available to download.

2. Unified Detection

We unify the separate components of object detection into a single neural network. Our network uses features from the entire image to predict each bounding box. It also predicts all bounding boxes for an image simultaneously. This means our network reasons globally about the full image and all the objects in the image. The YOLO design enables end-to-end training and real-time speeds while maintaining hig
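
The last stage of the pipeline in Figure 1 (thresholding detections by confidence, followed by non-max suppression) can be illustrated with a minimal sketch. This is not the authors' code: the `Detection` type, the function names, the box coordinates, and both threshold values (0.25 and 0.5) are assumptions chosen for illustration, and greedy per-class suppression is one common formulation of non-max suppression rather than a procedure stated in the text above. The network itself is not modeled; the sketch starts from its raw outputs.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), hypothetical layout


class Detection(NamedTuple):
    """Hypothetical container for one raw network output."""
    label: str
    score: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets: List[Detection],
                conf_threshold: float = 0.25,  # assumed value, not from the paper
                iou_threshold: float = 0.5     # assumed value, not from the paper
                ) -> List[Detection]:
    """Drop low-confidence detections, then greedily suppress duplicates.

    A detection is kept unless an already-kept detection of the same
    class overlaps it by more than iou_threshold.
    """
    candidates = sorted(
        (d for d in dets if d.score >= conf_threshold),
        key=lambda d: d.score,
        reverse=True,
    )
    kept: List[Detection] = []
    for d in candidates:
        duplicate = any(
            k.label == d.label and iou(k.box, d.box) > iou_threshold
            for k in kept
        )
        if not duplicate:
            kept.append(d)
    return kept


# Scores for the three kept classes echo Figure 1; every box is made up.
raw = [
    Detection("Person", 0.64, (100, 50, 200, 300)),
    Detection("Person", 0.40, (105, 55, 205, 305)),  # near-duplicate of the first
    Detection("Dog", 0.30, (50, 250, 150, 350)),
    Detection("Horse", 0.28, (180, 100, 400, 320)),
    Detection("Dog", 0.05, (300, 300, 320, 320)),    # below the confidence threshold
]
final = postprocess(raw)
print([(d.label, d.score) for d in final])
```

With these made-up inputs the weaker Person box and the 0.05 Dog box are discarded, leaving one detection per object. Suppressing only within a class is a design choice in this sketch, made so that overlapping objects of different classes (a person on a horse, say) both survive.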