I.J. Computer Network and Information Security, 2012, 10, 37-46
Published Online September 2012 in MECS
DOI: 10.5815/ijcnis.2012.10.04
Copyright © 2012 MECS

Constraint Based Periodicity Mining in Time Series Databases

Dr. Ramachandra. V. Pujeri, G. M. Karthik
Vice-Principal, KGiSL Institute of Technology, Saravanampatti, Coimbatore-641035, Tamil Nadu, INDIA
Assistant Professor, CSE Dept., SACS MAVMM Engineering College, Madurai-625301, Tamil Nadu, INDIA
sriramu.vp@gmail.com, gmkarthik16@gmail.com

Abstract—Searching for periodicity in time-series databases has a number of applications and is an interesting data mining problem. Real-world datasets are mostly noisy and rarely exhibit perfect periodicity, so the problem is not trivial. Periodicity detection is common practice in time-series mining algorithms, which typically try to discover periodic signals with no time limit. We propose an algorithm that uses an FP-tree to find symbol, partial, and full periodicity in time series. The algorithm has complexity O(kN), where N is the length of the input sequence and k is the length of the periodic pattern. We show that the algorithm is fixed-parameter tractable with respect to a fixed symbol-set size and a fixed input-sequence length. Experimental results on both synthetic and real data from different domains show that the algorithm is time-efficient and noise-resilient. A comparison with some current algorithms demonstrates the applicability and effectiveness of the proposed algorithm.

Index Terms—Data Mining, CBPM, FP-tree, Periodicity mining, Time series data, Noise resilient

I. INTRODUCTION

A time series is a collection of data gathered at uniform time intervals to reflect some behavior of an entity. A time series is usually discretized before it is analyzed [8], [9], [13], [18], [19]. Examples of time series include frequently sold products in a retail market, patterns that recur at regular intervals in DNA sequences, stock growth, power consumption, computer network fault analysis, transactions in a superstore, and gene expression data [7], [12], [22], [23]. In these examples, the periodicity of occurrences plays an important role in discovering interesting frequent patterns in a wide variety of
application areas. Identifying repeating (periodic) patterns can reveal important observations about the behavior and future trends of whatever the time series represents [35], and can therefore lead to more effective decision making. The goal of time series analysis here is to determine whether, and how frequently, a periodic pattern (full or partial) repeats within the data.

A time series can exhibit three types of periodic patterns [26]: 1) symbol periodicity, 2) sequence periodicity or partial periodic patterns, and 3) segment or full-cycle periodicity. For example, consider a time series containing the hourly number of transactions in a retail store. Different ranges of transaction counts are mapped to symbols, a process referred to as discretization: a: {0} transactions, b: {1-300} transactions, c: {301-600} transactions, d: {601-1200} transactions, e: {>1200} transactions. Based on this mapping, the time series T' = 0, 212, 535, 0, 398, 178, 0, 78, 0, 0, 102, 423 is discretized into T = abcacbabaabc.

A time series has symbol periodicity when at least one symbol repeats periodically. For example, in T = abcacbabaabc, symbol 'a' is periodic with period p = 3 starting at position zero. A sequence (partial) periodic pattern consists of more than one symbol. In the same T, the sequence 'ab' is periodic with p = 3 starting at position zero: it occurs at positions 0, 6, and 9 but not at position 3. When a pattern (segment) repeats over the whole time series, the series has segment or full-cycle periodicity. For example, T = abdcabdcabdc has segment periodicity with p = 4 starting at position zero.

Real-world time series are rarely perfectly periodic. The degree of perfection is measured by confidence, and imperfection is mostly caused by noise in the data. Many existing algorithms [8], [9], [13], [17] detect only periods that span the entire time series. Other algorithms detect all three types of periodicity in the presence of noise and within subsections of the time series, but handle each pattern type separately [26]. We show that our Constraint Based Periodicity Mining (CBPM) technique is more efficient and flexible than these algorithms. We also demonstrate through empirical evaluation that CBPM is more scalable and faster than existing methods.

We propose a new, efficient pattern-enumeration approach based on ideas from frequent pattern mining. First, we construct a TRIE-like data structure called a consensus tree, which explores the space of all motifs and enables a highly parallelized search along the tree. The growth of the tree is restrained by additional mining constraints. The consensus tree is fixed and anchored by the symbol-set size and the length of the input sequence. Constructing the consensus tree detects symbol, sequence, and segment patterns within subsections of the series, without yet checking their periodicity. The additional constraints (namely, the user-specified level and rule constraints) prune redundant patterns. Second, the algorithm looks for all periods starting from every position stored in a given node of the consensus tree. A node is kept in the consensus tree only if its confidence is greater than or equal to the user-specified periodicity threshold. We make the f
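The two-step procedure described above can be sketched in Python. Step one builds a TRIE-like tree whose nodes record where each pattern occurs. Step two tests candidate periods from every stored position against a confidence threshold. This is a minimal, unoptimized sketch under stated assumptions, not the authors' CBPM implementation:

- The tree is a plain depth-limited suffix trie standing in for the consensus tree.
- `max_level` plays the role of the user-specified level constraint.
- The rule constraint and the parallel search are omitted.
- Confidence is assumed to be the fraction of expected positions start, start + p, start + 2p, ... at which the pattern actually occurs.

The helper names (`to_symbol`, `build_tree`, `walk`, `mine`, `min_expected`) are hypothetical. `to_symbol` applies the retail discretization from the introduction and assumes that 'e' covers counts above 1200.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


def to_symbol(count: int) -> str:
    """Discretize one hourly transaction count with the introduction's mapping."""
    if count < 0:
        raise ValueError("transaction counts must be non-negative")
    if count == 0:
        return "a"
    if count <= 300:
        return "b"
    if count <= 600:
        return "c"
    if count <= 1200:
        return "d"
    return "e"  # assumption: 'e' covers counts above 1200


def discretize(counts: List[int]) -> str:
    return "".join(to_symbol(c) for c in counts)


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    positions: List[int] = field(default_factory=list)  # start index of every occurrence


def build_tree(series: str, max_level: int) -> Node:
    """Depth-limited suffix trie (hypothetical stand-in for the consensus tree).

    A node at depth k stores every position where its k-symbol pattern starts;
    max_level caps pattern length, mimicking the level constraint.
    """
    root = Node()
    for i in range(len(series)):
        node = root
        for ch in series[i:i + max_level]:
            node = node.children.setdefault(ch, Node())
            node.positions.append(i)
    return root


def walk(node: Node, prefix: str = "") -> Iterator[Tuple[str, List[int]]]:
    """Yield (pattern, positions) for every node below `node`, depth-first."""
    for ch in sorted(node.children):
        child = node.children[ch]
        yield prefix + ch, child.positions
        yield from walk(child, prefix + ch)


def mine(series: str, max_level: int, max_period: int, threshold: float,
         min_expected: int = 2) -> List[Tuple[str, int, int, float]]:
    """Return (pattern, period, start, confidence) rows with confidence >= threshold.

    Assumed confidence: the fraction of expected positions start, start+period, ...
    (those where the whole pattern still fits) at which the pattern occurs.
    Chains with fewer than min_expected expected positions are skipped as trivial.
    """
    n = len(series)
    rows = []
    for pattern, positions in walk(build_tree(series, max_level)):
        occurs = set(positions)
        for start in positions:  # step 2: test periods from every stored position
            for period in range(1, max_period + 1):
                expected = range(start, n - len(pattern) + 1, period)
                if len(expected) < min_expected:
                    continue
                conf = sum(p in occurs for p in expected) / len(expected)
                if conf >= threshold:
                    rows.append((pattern, period, start, conf))
    return rows


if __name__ == "__main__":
    t = discretize([0, 212, 535, 0, 398, 178, 0, 78, 0, 0, 102, 423])
    print(t)
    for row in mine(t, max_level=2, max_period=4, threshold=0.75):
        print(row)
```

With `threshold=0.75`, the rows for the discretized example include ('a', 3, 0, 1.0) and ('ab', 3, 0, 0.75). Calling `mine("abdcabdcabdc", 4, 4, 1.0)` reports 'abdc' with period 4 from position 0. Looking up positions in a set keeps each confidence test proportional to the number of expected positions. The O(kN) bound stated in the abstract belongs to the authors' algorithm and is not claimed for this brute-force sketch.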