您好,欢迎访问三七文档
Time-SeriesClusteringandAssociationAnalysisofFinancialDataToddWittmanCS8980ProjectDecember15,2002Abstract:EachstocksoldontheNewYorkStockExchangeisclassifiedbyindustry.Thispaperdescribesapplyingdataminingtechniquestoreachtwobasicgoals.First,wewishtodeterminetheindustryclassificationgiventhehistoricalpricerecordofastock.Toachievethis,weexperimentedwithhierarchicalagglomerativeclusteringandafeed-forwardneuralnetwork.Issueswithoutliers,pre-andpost-processing,anddistancemeasuresarediscussed.Second,wewishtodeterminetherelationshipsamongthevariousindustries.WeappliedassociationanalysisontheDowJonesindustrialindicestogeneraterulesdescribingthestockmovementacrossindustries.Weconcludethepaperbydiscussingpossibleimprovementsandareasoffutureresearch.1.IntroductionDataminingoffinancialdatahasproventobeveryeffectiveandveryprofitable[1].Thegoalofthisprojectistoapplydataminingtechniquestoaninteresting,albeitnotverylucrative,areaoffinance.EverystocksoldontheNewYorkStockExchangeisclassifiedintoanindustrialcategory,orjustcalledan“industry.”Acompanyisidentifiedwithanindustrybasedonitsprimaryactivities,whichusuallymeanstheareafromwhichitderivesthelargestshareofitsrevenue.ThesecategoriesincludeChemicals,Biotechnology&Drugs,Retail,Healthcare,Utilities,andsoon.Inaddition,eachindustryisfurtherdividedintosub-categories.Forexample,theMedia&Advertisingindustryconsistsoffoursub-categories:Advertising,Movies&Music,Publishing&Printing,andTV&Radio.However,thesesub-categoriesarenotstandardized.Inthispaper,wewillworkwiththestandardindustrialcategories,asgivenby[2].Twonaturalproblemsarisefromindustrialcategories:classificationandassociation.Wewishtodevelopmethodsthatcananswerthefollowingtwoquestions:1.Canwedetermineastock’sindustrialcategorygivenahistoricalrecordofthestock’sprices?2.Howarethemovementsinstockpricesacrossthevariousindustriesassociated?InSection2,wedescribeclusteringandneuralnetworkapproachesforansweringthefirstquestion.InSection3,wedescribeassociationruleminingtoanswerthesecondquestion.WeconcludeinSection4bydiscussingthemeritsofeachapproachandareasforfutureresearch.2.Time-SeriesClusteringOurgoalistoproperlyidentifytheindustrialcategorytowhichastockbelongsgivenonlythehistoricalpricerecord.Gavrilov,Anguelov,Indyk,andMotwaniaccomplishedjustthisbyusinghierarchicalagglomerativeclustering[3].Forcomparison,wereferto[3]throughoutthispaper,pointingoutwhereourapproachdiffersfromthetechniquesofGavrilovet.al.Itshouldbenotedthatthisproblemisnotjustanexerciseinclustering.Thereareimportantapplicationstofinanceandinterestingconclusionsthatcanbedrawnaboutspecificstocks.Stockclusteringhasapplicationstoquantizingtheeffectoftrendswithinandbetweenindustries,identifyingmisclassifiedstocks,andportfolioevaluation.Anyattemptatclusteringthestocksisbasedonthefollowingcrucialassumption:Atime-seriesclusteringwillbevalidifandonlyifthepricefluctuationsofstockswithinagrouparecorrelated,butpricefluctuationsofstocksindifferentgroupsareuncorrelatedornotasstronglycorrelated.Thisstatementcanbeinterpretedtwoways.Theforwardversionofthisstatementsaysthatifweobtainagoodclusteringwithrespecttoourdistancemeasure,thenstockstendtomoveasagroup.Thisimplicationinandofitselfwouldbeusefultothefinancialworld.Moreover,clusteringstatisticscanquantizetowhatextentstocksmoveasagroup.Externalclusteringstatisticssuchasentropy,purity,andcohesiontellushowcloselystockswithinanindustryresembleeachother.Internalstatisticssuchasseparationandthesilhouettecoefficientcantellustowhatdegreetheindustries’behaviorsareseparatefromeachother.Thereverseversionoftheassumptionsaysthatifstockstendtomoveasagroup,thenweshouldbeabletoobtainasensibleclustering.Withoutthisassumption,anyclusteringwouldbemeaninglesseveniftheclusteringwasperfectwithrespecttothechosendistancemeasure.Also,weneedtoseedifferentiationbetweenthegroupsorelseourclusteringwillfail.Thisalsopointsoutapossibleapplicationoffinancialclustering.Sinceourclusteringisbasedsolelyonthehistoricalpricerecord,theclusteringwilldeterminewhichgroupthestockmostbehaveslike,whichisnotnecessarilythegrouptheNYSEcategorizeditas.Thestockmarketisbasedonperception,notreality.Perceptionhasbecomeevenmoreimportantinrecentyearswiththeadventofon-linetrading,whichreleasedafloodofanxiousandoftenill-informeddaytraders.Forexample,AOLTimeWarneriscategorizedbytheNYSEintheMedia&Advertisinggroup,becausethatiswhereitearnsthelargestportionofitsrevenue.However,ifmostinvestorsperceiveAOLTimeWarnerasbeinganInternetstock,thenitspricefluctuationswilltrackwiththoseoftheInternetgroup.BycomparingtheresultsofourclusteringtothegroupingsgivenbytheNYSE,wecanactuallylearnmorefromthemisclassificationsthanfromthosethatmatchtheNYSEstandard.Also,byexaminingclassstatisticssuchaspurityandentropy,wecansaytowhatdegreeastockfollowseachgroup.Forexample,wemightdeterminethatAOLTimeWarner’sbehavioris20%aMediastockand80%anInternetstock.Clusteringcouldalsobeveryhelpfulinanalyzingthetimeseriesofaportfolioincludingseveralstocks.Thebehaviorofaninvestmentportfolioisnotnecessarilydeterminedbythestockthatmakes
本文标题:Time-Series Clustering and Association Analysis of
链接地址:https://www.777doc.com/doc-5482416 .html