Huaqiao University Master's Thesis
Research and Implementation of a Click-Stream Data Warehouse Construction Scheme
Author: Zheng Chuanqin
Degree: Master's
Major: Computer Application Technology
Advisor: Chen Weibin
Date: 2007-10-28

ABSTRACT
E-commerce sites generate large amounts of click-stream data every day, and these data contain much useful information: for example, where customers come from, what actions they take, and what they are interested in. Analyzing these data can not only guide the construction of the site and strengthen its stickiness, but also reflect the enterprise's marketing and financial status. In short, in-depth analysis of these data can help site owners improve customer relationships and service quality. Based on a shopping website, this thesis builds an experimental click-stream data warehouse. Taking that website as an example, we define analysis subjects and build data marts of various granularities based on multidimensional modeling. After analyzing several kinds of click-stream data, the thesis presents a complete ETL system comprising a scheduling scheme, data preprocessing, ETL tools, and a post-loading mechanism. Finally, to demonstrate the value of click-stream data, the thesis performs online analytical processing (OLAP) on it. The click-stream data warehouse not only supports analysis of website information but also supports in-depth mining of user profiles and sales analysis.
Keywords: click-stream, data warehouse, web log, ETL

1.1 CRM; Web
1.2 1) Page; Session; Referrer; 2) Referring URL
1.3 SQL Server Analysis Services
1.4 1) Perl; 2) session ID, cookie, IP; 3) ETL, Informatica
1.5 ETL

2.1 CNNIC; IP; Referrer; Link [4]
2.2 IBM WebSphere, BroadVision, ILog; IF-THEN; WebWatch, Letizia, ETFI; user profile; NetPerceptions, GroupLens, Firefly, Passport, WebWatcher; KNN (k-nearest neighbor); Aggarwal et al.; O'Connor and Herlocker; Sarwar et al.
2.3 Web usage; URL; IP [4]
2.4 Ralph Kimball; Inmon [14]
2.5 Blog; Google AdWords; adCenter; Web 2.0 [16]

3.1 Index, login, regedit, Search, Category list, productInfo, newGoods, hotGoods, discount; productInfo: 1) collection; 2) cart, eBank, alipay.
Figure 3.1: Index, login, regedit, Search, Category list, productInfo, newGoods, hotGoods, discount, collection, cart, eBank, alipay
3.1.2 Website tables
Table 3-1 Product: Id, Name, Grade, detail, Price, Recommend, Sold, viewSum, Categoryid, Discount, pics, pic
Table 3-2 Category: Categoryid, parentID, category, first
Table 3-3 User: userId, Username, userEmail, userMobile, userQQ, userLogin, lastLogin, Realname, vip, ip, msn, Address, sheng, shi, days
Table 3-4 Order: actionId, Username, ip, actionDate, productId, productnum, receipt, address, postcode, deliverymethod, paymethod, sex, paid, email, usertel, userId
Table 3-5 Sheng (province): Id, name
Table 3-6 Shi (city): Id, name, ShengID

4.1 KPI; Web [14]
4.2 ETL [15]; Table 4-1
4.3 ETL; Figure 4.1: OLAP, SQL, HTML
4.4.1 Table 4-2 (d_user_Date): ID, DataType, Season, calendarType, TimeStamp, Year, quarterInYear, month (1 to 12), CalendarMonth, Day (1 to 366), weekName, dayNumberInWeek (1 to 7), dayNumberInMonth (1 to 31), workDay
Table 4-3 (d_user_time): ID, DataType, Period, TimeStamp, GMT_HOUR (0 to 23), GMT_MINUTE (0 to 59), GMT_SECOND (0 to 59)
4.4.2 d_site_geography (web URLs). Table 4-4: id, Parent_url, url (three columns), date, url_id
4.4.3 d_user (web users). Table 4-5: id, sessionID, userID, ip, Agent, Real_name, User_type, email, User_grade, phone, sex, mobile, User_zhuce (registration), adress, Last_days
4.4.4 d_promotion. Table 4-6: id, productID, Type (00, 01, 02), name
4.4.5 d_geography. Table 4-7: ID, dataType (0, 1), Province, ProvinceID, City, CityID
4.4.6 d_product. Table 4-8: Id, productID, Grade, Name, Price, detail, Discount, Recommend
4.5.1 Figure 4.2: HTTP; id columns
4.5.2 Figure 4.3: URL; id columns
4.5.3 Web; d_channel; Figure 4.4: id columns

5.1 ETL; web
5.2.1 Web logs: 1) exYYMMDD.log; 2) Perl copy, ETL, YYMMDD.flg; 3) Perl; 4) Informatica. Figure 5.1.
5.2.2 Figure 5.2:
1) Command tasks: cmd_clear (perl clear.pl), cmd_analysis (perl analysis.pl), cmd_user (perl user.pl), cmd_page (perl page.pl), cmd_session (perl session.pl); exYYMMDD.log
2) Command and decision tasks
3) Command
4) dec_dimension
5.2.3 ETL:
1) Perl; Informatica
2) Informatica PowerCenter 7: 1) Informatica PowerCenter; 2) Mapping; 3) workflows
3) Informatica
4) Informatica
5) Informatica; ETL
6) ETL

5.3 Web logs; ISP; cookie
5.3.1 Web servers: IIS, Apache. IIS log formats: W3C Extended Log File Format, W3C Centralized Logging, NCSA Common Log File Format, IIS Log File Format, ODBC Logging, Centralized Binary Logging, HTTP.sys Error Log Files.
Common Log Format (CLF) and Extended Common Log Format (ECLF). Table 5-1 (web log fields; columns CLF*, ECLF*): IP, ID, date in CLF form {dd/mm/yyyy:hh:mm:ss zone}, HTTP, URL, IP, ID, strftime, URL, Cookie
1. IP address: HTTP; Internet IP; web; PC; ISP
2. identd
3. HTTP; SSL; ID
4. web; HTTP
5. Request line, for example "GET /images/under-c.gif HTTP/1.0": "GET" is the HTTP method, "/images/under-c.gif" is the requested URL, and "HTTP/1.0" is the protocol version. Methods: GET, POST; CGI.
6. Status code: 200 (OK), 404 (Not Found)
8. HTTP; URL
9. User agent (web browser; ActiveX), for example "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)" (Microsoft, Windows 98) and "Scooter/2.0 G.R.A.B. V1.1.0" (AltaVista)
10. URL
11. web; HTTP
12. IP (IPv4, IPv6)
13. Port: TCP/IP; HTTP on 80, HTTP over SSL on 443
14. ID; web
15. URL query: URL; HTTP, FTP; Internet IP. For example, :8080/jwc/index.html?teacherID=150151 splits into :8080/jwc/index.html and teacherID=150151.

5.3.2 Cookies: Set-Cookie (response header), Cookie (request header). Cookie attributes: Name=, Value=, Expires=, Domain=, Path=, Secure (SSL/HTTPS only). Example: username=zcq;+ASPSESSIONIDSSTDSQDB=MMGPHNHBDLAEHCLOIKCDGPON
5.3.3 Web: Table 5-2: id, Parent_url, url (three columns), Parent_name, name, date
5.4 1) ETL; 3) HTML
5.4.1
1) W3C extended log fields: date, time, c-ip, cs-method, cs-uri-stem, sc-status, sc-bytes, cs-bytes, time-taken, cs-host, cs(User-Agent), cs(Cookie), cs(Referer)
2) Regular expressions in the Perl script clear.pl:
Date: [0-9]{4}\-[0-9]{2}\-[0-9]{2}
Time: [0-9]{2}\:[0-9]{2}\:[0-9]{2}
c-ip (IP address): \d{1,3}\.\d{1,3}\.\d{1,3
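Section 5.4.1 lists the W3C extended log fields the thesis keeps and the regular expressions the Perl script clear.pl uses to check the date, time and c-ip columns. The Python sketch below shows the same kind of line-level cleaning; it is not a port of clear.pl. It assumes space-separated IIS entries, honors a #Fields: directive when one is present, completes the truncated c-ip pattern with a full dotted-quad IPv4 expression (an assumption, not the author's pattern), and drops image, stylesheet and script requests through a hypothetical IGNORED_SUFFIXES rule.

```python
import re

# Default field order, taken from the field list in section 5.4.1. A real IIS
# log declares its own order in a "#Fields:" directive, which takes precedence.
DEFAULT_FIELDS = [
    "date", "time", "c-ip", "cs-method", "cs-uri-stem", "sc-status",
    "sc-bytes", "cs-bytes", "time-taken", "cs-host",
    "cs(User-Agent)", "cs(Cookie)", "cs(Referer)",
]

# Date and time patterns as given for clear.pl. The IPv4 pattern is an assumed
# completion of the truncated c-ip expression, not the author's original.
DATE_RE = re.compile(r"[0-9]{4}\-[0-9]{2}\-[0-9]{2}")
TIME_RE = re.compile(r"[0-9]{2}\:[0-9]{2}\:[0-9]{2}")
IPV4_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Hypothetical cleaning rule: static resources are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def clean_w3c_log(lines):
    """Yield a dict per well-formed page request in a W3C extended log."""
    fields = list(DEFAULT_FIELDS)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives (#Software, #Date, #Fields, ...) are not requests.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()  # IIS writes spaces inside fields as '+'
        if len(values) != len(fields):
            continue  # truncated or malformed entry
        record = dict(zip(fields, values))
        if not (DATE_RE.fullmatch(record.get("date", ""))
                and TIME_RE.fullmatch(record.get("time", ""))
                and IPV4_RE.fullmatch(record.get("c-ip", ""))):
            continue
        if record.get("cs-uri-stem", "").lower().endswith(IGNORED_SUFFIXES):
            continue
        yield record
```

Because clean_w3c_log is a generator, a whole day's exYYMMDD.log can be streamed through it without loading the file into memory.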
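Section 5.3.2 shows a cookie value as it appears in the log, username=zcq;+ASPSESSIONIDSSTDSQDB=MMGPHNHBDLAEHCLOIKCDGPON, and section 1.4 names session ID, cookie and IP as ways of telling visitors apart. A minimal sketch, assuming IIS's convention of writing spaces as '+': parse_logged_cookie splits the value into name/value pairs, and the hypothetical session_key prefers the ASP session cookie, falling back to client IP plus user agent when no cookie was sent. That fallback is an assumed heuristic, not necessarily the rule used in the thesis.

```python
def parse_logged_cookie(value):
    """Split a cs(Cookie) log value into a {name: value} dict."""
    cookies = {}
    if value in ("", "-"):  # "-" means the request carried no cookie
        return cookies
    for part in value.split(";"):
        # IIS writes the space after ';' as '+', so restore spaces and trim.
        part = part.replace("+", " ").strip()
        name, sep, val = part.partition("=")
        if sep:
            cookies[name] = val
    return cookies


def session_key(cookie_value, client_ip, user_agent):
    """Prefer the ASP session cookie; otherwise fall back to IP + agent."""
    for name, val in parse_logged_cookie(cookie_value).items():
        # Classic ASP names its session cookie ASPSESSIONID plus a random suffix.
        if name.upper().startswith("ASPSESSIONID"):
            return val
    # Assumed heuristic for cookie-less requests: same IP and same browser.
    return f"{client_ip}|{user_agent}"
```

The username entry returned by parse_logged_cookie can identify a registered user across sessions, while session_key only groups requests within one visit.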
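Table 5-1 and the field notes of section 5.3.1 describe the CLF and ECLF layouts. The sketch below parses one line of either layout with a single regular expression, then splits the request line and the URL query the way items 5 and 15 describe. Apache writes the CLF timestamp with a three-letter English month abbreviation, for example [28/Oct/2007:13:55:36 +0800], and the strptime format here expects that form. The regular expression and the name parse_clf are illustrative assumptions, not part of the thesis.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# CLF: host ident authuser [date] "request" status bytes
# ECLF additionally ends with "referrer" "user-agent".
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_clf(line):
    """Parse one CLF or ECLF line; return None if it does not match."""
    m = CLF_RE.fullmatch(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Request line, e.g. "GET /images/under-c.gif HTTP/1.0".
    parts = rec["request"].split()
    rec["method"] = parts[0] if parts else ""
    target = parts[1] if len(parts) > 1 else ""
    rec["protocol"] = parts[2] if len(parts) > 2 else ""
    # Separate the path from the query string, e.g. teacherID=150151.
    url = urlsplit(target)
    rec["path"] = url.path
    rec["query"] = parse_qs(url.query)
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec
```

For a plain CLF line the referrer and agent entries come back as None, which lets one parser handle both formats.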
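The surviving fragments of section 5.2.1 mention a Perl copy step for the daily log exYYMMDD.log and a flag file YYMMDD.flg ahead of the ETL run. A Python sketch of that kind of hand-off, assuming the flag is written only after the copy completes and that the ETL side polls a staging directory for flags; the directory layout and the function names stage_daily_log and logs_ready_for_etl are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path


def stage_daily_log(day: date, log_dir, staging_dir):
    """Copy exYYMMDD.log into staging, then drop YYMMDD.flg to signal ETL."""
    stamp = day.strftime("%y%m%d")  # IIS daily logs use a two-digit year
    src = Path(log_dir) / f"ex{stamp}.log"
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    part = staging / f"ex{stamp}.log.part"
    final = staging / f"ex{stamp}.log"
    shutil.copyfile(src, part)
    # Rename only after the copy finishes, so a reader never sees half a file.
    part.replace(final)
    flag = staging / f"{stamp}.flg"
    flag.touch()  # written last: its presence means the log is complete
    return final, flag


def logs_ready_for_etl(staging_dir):
    """Return staged logs whose flag file exists, in date order."""
    staging = Path(staging_dir)
    ready = []
    for flag in sorted(staging.glob("*.flg")):
        log = staging / f"ex{flag.stem}.log"
        if log.exists():
            ready.append(log)
    return ready
```

Writing the flag last is the point of the design: the loader never has to guess whether a log file is still being copied.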
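Table 4-2 lists the columns of the d_user_Date dimension with their value ranges (month 1 to 12, Day 1 to 366, dayNumberInWeek 1 to 7, dayNumberInMonth 1 to 31). One common way to populate such a table is to generate one row per calendar day; the sketch below does that for the columns whose meaning is clear. Assumptions: ID is a YYYYMMDD smart key, the week starts on Monday, and workDay means Monday to Friday with no holiday calendar. Season, calendarType and DataType are left out because their intended values cannot be recovered from the text.

```python
from datetime import date, timedelta

WEEK_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday")


def d_user_date_rows(start: date, end: date):
    """Yield one d_user_Date row per calendar day from start to end inclusive."""
    day = start
    while day <= end:
        yield {
            "ID": int(day.strftime("%Y%m%d")),  # assumed smart key YYYYMMDD
            "TimeStamp": day.isoformat(),
            "Year": day.year,
            "quarterInYear": (day.month - 1) // 3 + 1,
            "month": day.month,  # 1..12
            "Day": day.timetuple().tm_yday,  # day of year, 1..366
            "weekName": WEEK_NAMES[day.weekday()],
            "dayNumberInWeek": day.isoweekday(),  # assumed Monday = 1, Sunday = 7
            "dayNumberInMonth": day.day,  # 1..31
            "workDay": day.weekday() < 5,  # assumed Mon-Fri, holidays ignored
        }
        day += timedelta(days=1)
```

The rows can be bulk-inserted once for the whole analysis period; the d_user_time dimension in Table 4-3 could be generated the same way, one row per second of the day.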