Method-Level Code Clone Detection on Transformed Abstract Syntax Trees Using Sequence Matching Algorithms

Kevin Greenan
kmgreen@soe.ucsc.edu
Department of Computer Science
University of California - Santa Cruz
March 16, 2005

Abstract

Current research shows that a large fraction of source code in many large-scale applications contains code clones [4]. The existence of code clones can introduce many instabilities within a software application, such as unnecessary duplicates. These instabilities can over-complicate routine maintenance tasks, since a change in one method may lead to changes across many methods. In addition, unnecessary duplicates can potentially induce the spread of bugs and prevent changes from being propagated [6]. Therefore, in order to prevent and treat these instabilities, we must figure out where potential clones occur.

A properly annotated abstract syntax tree supplies a great deal of information about the structure of an application's source code. These trees can be easily transformed into a sequence of substrings. Borrowing a few ideas from biological sequence alignment, similarities between transformed subtrees can be identified. Once identified, an application architect or maintainer can accept (or reject) the similar blocks of code as a (non-)clone. Given the use of subtree transformation, I will present three code clone algorithms in this paper. The research presented serves as a starting point for code clone detection using transformed subtrees and sequence similarity. Thus, my results will justify intuition and open a few doors to future work in the area of code clone detection.

1 Introduction

What exactly is a code clone? Using a variation of the definition presented in [8], semi-formally, two methods are said to be clones if they are identical or near-identical. The word identical seems pretty vague. In fact, most of the literature on code clone analysis does not give a concrete definition to the word identical with respect to code clone analysis. Basically, if the cardinality of the intersection of two program entities exceeds a prescribed threshold, the two entities are clone candidates. These candidates are usually rejected or accepted through some sort of contextual analysis.

Various studies suggest that many programmers in the software development industry resort to copy and paste techniques, which is generally used as a form of reuse [6]. This form of reuse usually results in code clones. Unfortunately, practicing copy and paste techniques can create very complicated maintenance tasks [9]. In addition, among many other factors, code clones can arise as a result of design decisions, poor cohesion between modules and poor communication between developers.

Code clones can be detected on an exact match basis. Unfortunately, using exact match criteria for code clone detection may not be sufficient in all cases. For instance, a programmer may copy a particular piece of code, paste it to another location in the system and proceed to change the pasted code such that it remains syntactically similar to the original code, but not exactly similar. Thus, some form of near-exact clone detection must be employed.

This paper presents one exact match and two near-exact match algorithms for code clone detection. These algorithms rely on an abstract syntax tree (AST), which stores attributes such as nonterminal production information, type information, parent file, parent class and line number. All other terminal symbols are intentionally not stored in the AST. Thus, the algorithms are designed to match the structure of the code, with respect to types, rather than match on attributes such as variable names, method names or literals. The effectiveness of the algorithms will be determined by a high-level analysis of the algorithms when run against selected modules in Eclipse and Hibernate. A more detailed analysis will be conducted on code generated manually for the purpose of analyzing the algorithms.

In the next two sections I will cover source code and AST transformation. Section 4 will present the code clone detection algorithms and define the matching criteria used for the algorithms. Then, in sections 5 and 6 I determine the effectiveness of the algorithms and present the results produced when the algorithms are run against modules from Eclipse, Hibernate and manually generated code. Finally, sections 7-8 cover threats to validity, future work and conclusions.

2 Code Transformation

In order to effectively parse the code within the context of my analysis, transformation of the actual source code text is required. These transformations include removal of comments, substitution of literals containing the terminal symbol "//", addition of filename identifiers and finally concatenation of all source files into one large source file. These transformations will be explained in the following paragraphs.

The substitution of string literals can be completed in one swoop, using a stream editor, such as sed. I wrote a small C program to remove the comments, without actually deleting the lines themselves. We do not want to delete the lines, since line information should remain consistent throughout transformation and analysis, since we would like to report LOC per method and relative line numbers when reporting method matches. Thus, as an example, Figure 1 illustrates how a source file will be transformed.

As a final step in the source code transformation phase, filenames are added to the first line of each source file. Then, all of the files are concatenated into one large file for the parsing step. By concatenating all of the files into one large file, all of the method subtrees can be extracted in one search.

[Figure 1: the source file test.java before and after transformation. Fragments recoverable from the listing: "--Begin test.java"; "//Programmer: John Doe"; "/* My test class */"; "/* This class implements the function func1 */"; "class TestClass {"; "//Function: func1"; "//Preconditio" (the listing breaks off here).]
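
The comment-removal and concatenation steps of Section 2 can be illustrated with a minimal sketch. This is not the C program mentioned in the text, which is not shown in the paper; it is a Python approximation written for illustration. The function names `strip_comments_keep_lines` and `concatenate` are hypothetical. The "--Begin <filename>" marker follows the fragment visible in Figure 1, but placing it on its own line ahead of the file contents is an assumption. Unlike the pipeline described above, which substitutes literals containing "//" beforehand with sed, this sketch tracks string and character literals itself so that a "//" inside a literal is left alone.

```python
def strip_comments_keep_lines(source: str) -> str:
    """Remove // and /* */ comments but keep every newline,
    so line numbers stay the same after the transformation."""
    out = []
    i, n = 0, len(source)
    state = "code"  # one of: code, line_comment, block_comment, string, char
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "/":
                state = "line_comment"
                i += 2
                continue
            if ch == "/" and nxt == "*":
                state = "block_comment"
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "line_comment":
            # Drop everything up to the end of the line, keep the newline.
            if ch == "\n":
                state = "code"
                out.append(ch)
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                out.append(" ")  # a space keeps neighbouring tokens apart
                i += 2
                continue
            if ch == "\n":
                out.append(ch)  # keep line breaks inside the comment
        else:  # inside a string or character literal: copy verbatim
            out.append(ch)
            if ch == "\\" and nxt:
                out.append(nxt)  # copy the escaped character as-is
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
        i += 1
    return "".join(out)


def concatenate(files):
    """files: list of (filename, text) pairs.
    Returns one string with a marker line before each stripped file.
    Assumption: the marker occupies its own line."""
    parts = []
    for name, text in files:
        body = strip_comments_keep_lines(text)
        if not body.endswith("\n"):
            body += "\n"
        parts.append(f"--Begin {name}\n{body}")
    return "".join(parts)
```

Because comments are blanked rather than deleted, a declaration that sat on line 5 of the original file is still on line 5 after stripping, which is the property the text relies on when reporting LOC per method and relative line numbers.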