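Clustering Sequences of Air Pollution Surveillance Data

Some fundamental data items of a factory, such as type of product, height of chimney, and production capacity, are related to the air pollution it emits. However, continuous surveillance data on air pollution have a more direct impact on setting up a control strategy. Clustering techniques group objects into clusters such that inter-cluster similarity is minimal and intra-cluster similarity is maximal. The object of mining is the time-gapped sequence of air pollution events caused by a factory. We propose three types of similarity between two sequences: (1) kinds of air pollution events, (2) period of event occurrence, and (3) longest common subsequence. A feature vector is derived from each cluster of factories, and each cluster has its own control strategy according to its features. When the data of a factory are newly added to the database, the factory is compared with the clusters to determine which cluster it is most similar to. The control strategy set up for that cluster can then be applied to the factory.

Project no. NSC91-2622-E-415-002-CC3

1

1.1
[GGR99, GRS99]; web access log, unfolded protein sequences, system traces, stream data; (frequent patterns), (itemsets) [HKKM97].

1.2
(sequential pattern) … [ABKS99] [GRS99]. A sequence of events (E1, …, En) with time gaps is written E1, d1, E2, d2, …, dn-1, En, where di is the time gap between Ei and Ei+1.

2
CURE [GRS98], ROCK [GRS99]; (hierarchical method); CURE, (outliers); ROCK [GRS99], (agglomerative hierarchical clustering algorithm); {X, A, B}, {Y, A, B}; X, Y; A, B; X, Y. [KW02]; CLUSEQ [YW03]; POPC-GA [MWZ01]: POPC-GA, GSP [SA96], (sequence sets), co-occurrence of frequent pattern, (sequence pairs) [MWZ01], Jaccard coefficient, POPC-GA, (bottom-up merge), k.

3

3.1
Events: SO2, O3.

Factory   Date   Time    Event
F1        3/5    9:00    A
F1        3/5    19:00   B
F1        3/5    20:00   D
F2        3/5    8:00    E
F2        3/5    20:00   F
F3        3/5    8:00    C
F3        3/5    21:00   D

3.2

Factory                    Time-gapped sequence
F1        20   2.0   10    A, 10, B, 1, D
F2        25   1.7   22    E, 18, F
F3        30   1.5   25    C, 19, D

[Flowchart: sequence database; decision "k?" with branches Yes and No]

3.2.1
For each event Ei, p(Ei) is compared with two thresholds, x% and y%. In the worked example, an event whose p(Ei) reaches the upper threshold is counted as general (g), and an event whose p(Ei) is at or below the lower threshold is counted as rare (r).

The similarity in kinds of events is based on the Jaccard coefficient. Let Si and Sj be the event sets of factories Fi and Fj, and let
(1) g be the number of general events in Si ∩ Sj,
(2) r be the number of rare events in Si ∩ Sj,
(3) a = |Si ∩ Sj|,
(4) b = |Si ∪ Sj|.
Then

\[
\mathrm{type\_sim}(F_i, F_j) = \frac{\dfrac{a - g}{b - g} + r}{1 + r}
\]

Example: four factories F1, F2, F3, F4, with thresholds 0.8 and 0.2. Since p(A) = 0.9, A is general; since p(E) = 0.15, E is rare.

F1: A, 5, C, 12, B, 22, D    S1 = {A, B, C, D}
F2: D, 12, A, 15, B          S2 = {A, B, D}
F3: D, 13, A, 6, E           S3 = {A, C, E}
F4: B, 5, E, 7, D            S4 = {B, C, E}

Pair 1, 2: g = 1, r = 0, a = 3, b = 4; type_sim(S1, S2) = (3 - 1)/(4 - 1) = 2/3 = 0.67
Pair 2, 3: g = 1, r = 0, a = 1, b = 5; type_sim(S2, S3) = (1 - 1)/(5 - 1) = 0
Pair 3, 4: g = 0, r = 1, a = 2, b = 4; type_sim(S3, S4) = ((2 - 0)/(4 - 0) + 1)/(1 + 1) = 0.75

3.2.2
The period p of a factory is a ratio, p = … / …. For the periods pi and pj of Fi and Fj,

period_sim(Fi, Fj) = 1 - norm(|pi - pj|)

where norm(|pi - pj|) normalizes |pi - pj| to a value between 0 and 1.

3.2.3
(1) Fi, Fj: … 0. (2) Fi, Fj: … 0.
Let (a1, a2, …, al) be the longest common subsequence of Fi and Fj, with length l. With time gaps restored, it reads (a1, di1, a2, di2, …, al) in Fi and (a1, dj1, a2, dj2, …, al) in Fj, where di1 is the time gap between a1 and a2 in Fi.

The definition of seq_sim is damaged in the source. The surviving symbols (l, a sum over k of the differences between dik and djk, the condition "≠ 0", and the fallback value 2*l) suggest the form

\[
\mathrm{seq\_sim}(F_i, F_j) =
\begin{cases}
l \times \dfrac{1}{\sum_{k} \lvert d_{ik} - d_{jk} \rvert}, & \text{if } \sum_{k} \lvert d_{ik} - d_{jk} \rvert \neq 0 \\
2 \cdot l, & \text{otherwise}
\end{cases}
\]

The overall similarity of Fi and Fj is the mean of the three similarities:

SIM(Fi, Fj) = [type_sim(Fi, Fj) + period_sim(Fi, Fj) + seq_sim(Fi, Fj)] / 3

3.3
h, h. Each cluster carries a (feature vector) v = (num, sum, avg, select). In the example values of Section 4, num equals the number of factory pairs in the cluster (C(n, 2) for n factories) and avg = sum / num; select takes the value 1 or 0. Initially every v has num = 0, sum = 0, avg = 0, select = 1.
(pair) v, v: n = 1, 2, 1; sum, avg, sum; select 1, select 0. (iteration) sum, num + 1, C2, sum. k, select 1.
Example: {A, B, C, D, E, F, G}. {A, B}: select 1, C1. {F, G}: select 1, C2. {C1, C}: C1, C; C1 {A, B}: select 0. Result: {A, B, C, F, G}, {D, E}.

4

4.1

F1     20   2.0    10   SO2
…
F9     28   1.2    12   NO2
…
F11    30   1.5    25   SO2
…
11     26   1.57   14   SO2

4.1.1

[Figure: clusters over factories A, B, C, D, E, F, G, with feature vectors (1, 0.8, 0.8, 0), (1, 0.85, 0.85, 0), (3, 2.1, 0.7, 0), (10, 6.3, 0.63, 1), and (1, 0.7, 0.7, 1)]

4.2 (TGSP)
[YL02]: (NO2, (1, 150), O3). TGSP; TGSP (density); TGSP:

SP1: NO2, (1, 3, 1.5, 0.7), O3
SP2: NO2, (30, 38, 33, 4), O3
SP3: NO2, (120, 150, 130, 10), O3

NO2, O3; NO2, NO2; (SP1, SP2, SP3), (C1, C2, C3). C2: NO2, 30, 38, (33, 4), O3.

5
h.

References

[KW02] "…," 2002, pp. 651-658.
[YL02] Show-Jane Yen and Yue-Shi Lee, "Mining Time-Gap Sequential Patterns from Transaction Database," …, 2002, pp. 30-46.
[GGR99] V. Ganti, J. Gehrke, and R. Ramakrishnan, "CACTUS-Clustering Categorical Data Using Summaries," ACM SIGMOD, 1999, pp. 73-83.
[GRS98] S. Guha, R. Rastogi, and K. Shim, "CURE: An Efficient Clustering Algorithm for Large Databases," ACM SIGMOD, 1998, pp. 73-84.
[GRS99] S. Guha, R. Rastogi, and K. Shim, "ROCK: A Robust Clustering Algorithm for Categorical Attributes," IEEE ICDE, 1999, pp. 512-521.
[HKKM97] Eui-Hong Han, George Karypis, Vipin Kumar, and B. Mobasher, "Clustering Based on Association Rule Hypergraphs," ACM SIGMOD, 1997.
[MWZ01] Tadeusz Morzy, Marek Wojciechowski, and Maciej Zakrzewicz, "Scalable Hierarchical Clustering Method for Sequences of Categorical Values," PAKDD, 2001, pp. 282-293.
[SA96] Ramakrishnan Srikant and Rakesh Agrawal, "Mining Sequential Patterns: Generalizations and Performance Improvements," International Conference on Extending Database Technology (EDBT), 1996, pp. 3-17.
[YW03] Jiong Yang and Wei Wang, "CLUSEQ: Efficient and Effective Sequence Clustering," IEEE ICDE, 2003.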
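The type similarity of Section 3.2.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names classify_events and type_sim are hypothetical, the threshold comparisons are assumed to be inclusive, g and r are counted inside the intersection of the two event sets (which is what the three worked pairs imply), and returning 0 when b = g is an arbitrary choice for a case the text does not cover.

```python
def classify_events(prob, general_min=0.8, rare_max=0.2):
    """Split events into general and rare sets by their p(E) value.

    Assumption: both comparisons are inclusive; the source only gives
    the thresholds 0.8 and 0.2 and the examples p(A)=0.9, p(E)=0.15.
    """
    general = {e for e, p in prob.items() if p >= general_min}
    rare = {e for e, p in prob.items() if p <= rare_max}
    return general, rare


def type_sim(s_i, s_j, general, rare):
    """Adjusted Jaccard coefficient ((a - g)/(b - g) + r) / (1 + r)."""
    common = s_i & s_j
    a = len(common)            # |Si ∩ Sj|
    b = len(s_i | s_j)         # |Si ∪ Sj|
    g = len(common & general)  # general events shared by both factories
    r = len(common & rare)     # rare events shared by both factories
    if b == g:
        # Assumption: every event is shared and general; not covered by the text.
        return 0.0
    return ((a - g) / (b - g) + r) / (1 + r)


# Event sets and probabilities taken from the worked example.
prob = {"A": 0.9, "E": 0.15}
general, rare = classify_events(prob)
S1, S2 = {"A", "B", "C", "D"}, {"A", "B", "D"}
S3, S4 = {"A", "C", "E"}, {"B", "C", "E"}

print(round(type_sim(S1, S2, general, rare), 2))  # 0.67
print(type_sim(S2, S3, general, rare))            # 0.0
print(type_sim(S3, S4, general, rare))            # 0.75
```

Subtracting g from both numerator and denominator removes events that nearly every factory emits, so they do not inflate the similarity, while each shared rare event pulls the score toward 1.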
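For Section 3.2.3, the longest common subsequence and the time gaps between its consecutive events can be computed with standard dynamic programming. The sketch below makes several assumptions that the text does not confirm: a time-gapped sequence is stored as a flat list [e1, d1, e2, …, en]; when two common events are not adjacent in a factory's sequence, the gap between them is taken as the sum of the intervening gaps; and when several longest common subsequences exist, one is picked arbitrarily. The function names are hypothetical.

```python
def split_sequence(seq):
    """Split [e1, d1, e2, ..., en] into events and cumulative occurrence times."""
    events = seq[0::2]
    times = [0]
    for gap in seq[1::2]:
        times.append(times[-1] + gap)
    return events, times


def lcs_index_pairs(x, y):
    """Return index pairs (i, j) of one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # dp[i][j] = LCS length of the suffixes x[i:] and y[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def common_subsequence_gaps(seq_i, seq_j):
    """Return the common events and the gaps between them in each sequence."""
    ev_i, t_i = split_sequence(seq_i)
    ev_j, t_j = split_sequence(seq_j)
    pairs = lcs_index_pairs(ev_i, ev_j)
    common = [ev_i[i] for i, _ in pairs]
    gaps_i = [t_i[q[0]] - t_i[p[0]] for p, q in zip(pairs, pairs[1:])]
    gaps_j = [t_j[q[1]] - t_j[p[1]] for p, q in zip(pairs, pairs[1:])]
    return common, gaps_i, gaps_j


# F1 and F2 from the worked example of Section 3.2.1.
F1 = ["A", 5, "C", 12, "B", 22, "D"]
F2 = ["D", 12, "A", 15, "B"]
common, gaps_1, gaps_2 = common_subsequence_gaps(F1, F2)
print(common, gaps_1, gaps_2)  # ['A', 'B'] [17] [15]
```

The resulting gap lists correspond to the values di1, di2, … and dj1, dj2, … that the seq_sim definition compares.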
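The feature vector (num, sum, avg, select) of Section 3.3 can be illustrated with a short sketch. It assumes, based on the example vectors (1, 0.8, 0.8, 0), (3, 2.1, 0.7, 0), and (10, 6.3, 0.63, 1), that num is the number of factory pairs in the cluster, sum is the total of their pairwise similarities, and avg = sum / num. The pairwise similarity values below are hypothetical, chosen only so that the first two example vectors are reproduced; the function name feature_vector is also hypothetical.

```python
from itertools import combinations


def feature_vector(members, sim, select=1):
    """Return (num, sum, avg, select) for a cluster of factories.

    members: iterable of factory names
    sim:     dict mapping frozenset({x, y}) to the pairwise similarity
    """
    pairs = list(combinations(sorted(members), 2))
    num = len(pairs)                                # C(n, 2) pairs
    total = sum(sim[frozenset(p)] for p in pairs)   # sum of pairwise similarities
    avg = total / num if num else 0.0
    return (num, total, avg, select)


# Hypothetical pairwise similarities (not taken from the source).
sim = {
    frozenset({"A", "B"}): 0.8,
    frozenset({"A", "C"}): 0.6,
    frozenset({"B", "C"}): 0.7,
}

print(feature_vector({"A"}, sim))  # (0, 0, 0.0, 1): the initial vector
num, total, avg, select = feature_vector({"A", "B", "C"}, sim, select=0)
print(num, round(total, 2), round(avg, 2), select)  # 3 2.1 0.7 0
```

Storing num and sum alongside avg means that merging two clusters only requires adding the cross-cluster similarities to the two stored sums, rather than recomputing every pair.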
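The period similarity of Section 3.2.2 and the combined measure SIM can be sketched as below. The text states only that norm maps |pi - pj| into the range 0 to 1; dividing by the largest period difference among all factories is an assumption made here, as are the period values and the function names make_period_sim and overall_sim.

```python
def make_period_sim(periods):
    """Build period_sim(x, y) = 1 - norm(|p_x - p_y|).

    Assumption: norm divides by the largest period difference in the data set.
    """
    max_diff = max(periods.values()) - min(periods.values())

    def period_sim(x, y):
        if max_diff == 0:
            return 1.0  # all periods identical
        return 1.0 - abs(periods[x] - periods[y]) / max_diff

    return period_sim


def overall_sim(type_s, period_s, seq_s):
    """SIM = (type_sim + period_sim + seq_sim) / 3."""
    return (type_s + period_s + seq_s) / 3


# Hypothetical periods for three factories (not taken from the source).
periods = {"F1": 2.0, "F2": 5.0, "F3": 8.0}
period_sim = make_period_sim(periods)

print(period_sim("F1", "F2"))         # 0.5
print(period_sim("F1", "F3"))         # 0.0
print(overall_sim(0.75, 0.5, 0.25))   # 0.5
```

Because SIM is an unweighted mean, the three component similarities contribute equally only if each is kept on a comparable scale.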
Title: Clustering Sequences of Air Pollution Surveillance Data
Link: https://www.777doc.com/doc-921070 .html