2020/3/9
lgao@mail.xidian.edu.cn

Data Warehousing and OLAP Technology
Gao Lin
School of Computer Science and Technology
Xidian University

Contents of last course
- Why preprocess the data?
- Data cleaning
- Data integration
- Data transformation
- Data reduction
- Data discretization
- Summary

Contents of today
- What is a data warehouse?
- A multi-dimensional data model
- From data warehousing to data mining

What is Data Warehouse?
- Defined in many different ways, but not rigorously.
  - A decision support database that is maintained separately from the organization's operational database
  - Support information processing by providing a solid platform of consolidated, historical data for analysis
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process."
- Data warehousing: the process of constructing and using data warehouses

Data Warehouse: Subject-Oriented
- Organized around major subjects, such as customer, product, sales
- Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
- Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Data Warehouse: Integrated
- Constructed by integrating multiple heterogeneous data sources
  - relational databases, flat files, on-line transaction records
- Data cleaning and data integration techniques are applied
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    - E.g., Hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted

Data Warehouse: Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems
  - Operational database: current value data
  - Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
- Every key structure in the data warehouse
  - Contains an element of time, explicitly or implicitly
  - But the key of operational data may or may not contain "time element"

Data Warehouse: Nonvolatile
- A physically separate store of data transformed from the operational environment
- Operational update of data does not occur in the data warehouse environment
  - Does not require transaction processing, recovery, and concurrency control mechanisms
  - Requires only two operations in data accessing: initial loading of data and access of data

Data Warehouse vs. Heterogeneous DBMS
- Traditional heterogeneous DB integration:
  - Build wrappers/mediators on top of heterogeneous databases
  - Query driven approach
    - When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set
    - Complex information filtering, compete for resources
- Data warehouse: update-driven, high performance
  - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

Data Warehouse vs. Operational DBMS
- OLTP (on-line transaction processing)
  - Major task of traditional relational DBMS
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
- OLAP (on-line analytical processing)
  - Major task of data warehouse system
  - Data analysis and decision making
- Distinct features (OLTP vs. OLAP):
  - User and system orientation: customer vs. market
  - Data contents: current, detailed vs. historical, consolidated
  - Database design: ER (entity-relationship) + application vs. star + subject
  - View: current, local vs. evolutionary, integrated
  - Access patterns: update vs. read-only but complex queries

Why Separate Data Warehouse?
- High performance for both systems
  - DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
  - Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
- Different functions and different data:
  - missing data: Decision support requires historical data which operational DBs do not typically maintain
  - data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources
  - data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled

Data Warehousing and OLAP Technology for Data Mining
- What is a data warehouse?
- A multi-dimensional data model
- From data warehousing to data mining

From Tables and Spreadsheets to Data Cubes
- A data warehouse is based on a multidimensional data model which views data in the form of a data cube
- A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
  - Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
  - Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
- In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

2-D view
- A 2-D view of sales data for AllElectronics according to the dimensions time and item, where the sales are from branches located in the city of Vancouver. The measure displayed is dollars sold (in thousands).

3-D view

3-D data cube representation

4-D data cube representation

Cube: A Lattice of Cuboids

Schemas for Multidimensional Databases: Star Schema

Schemas for Multidimensional Databases: Snowflake Schema

Schemas for Multidimensional Databases: Fact Constellation

A Concept Hierarchy: Dimension (location)

A concept hierar
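The cuboid terminology from the data cube slide can be made concrete with a small sketch. It assumes a tiny, made-up fact table with three dimensions (time, item, location) and a single measure (dollars_sold). The rows, the values, and the helper names `cuboid` and `lattice` are all hypothetical and not taken from the lecture. Grouping by all dimensions gives the base cuboid, grouping by none gives the apex cuboid, and, ignoring concept hierarchies, the 2^n possible groupings together form the lattice.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (time, item, location, dollars_sold).
# The values are invented purely for illustration.
DIMENSIONS = ("time", "item", "location")
FACTS = [
    ("Q1", "TV", "Vancouver", 10),
    ("Q1", "phone", "Vancouver", 5),
    ("Q2", "TV", "Vancouver", 20),
    ("Q2", "TV", "Toronto", 7),
]


def cuboid(group_by):
    """Sum dollars_sold grouped by the chosen subset of dimensions."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in FACTS:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


def lattice():
    """Map every subset of the dimensions to its cuboid (2**n in total)."""
    return {
        dims: cuboid(dims)
        for k in range(len(DIMENSIONS) + 1)
        for dims in combinations(DIMENSIONS, k)
    }


cube = lattice()
apex = cube[()]          # 0-D cuboid: one grand total
base = cube[DIMENSIONS]  # n-D cuboid: no summarization

print(len(cube))         # number of cuboids in the lattice
print(apex)
print(cube[("time",)])   # 1-D cuboid over the time dimension
```

With three dimensions the lattice holds eight cuboids. A real warehouse would usually materialize only some of them, and dimensions with concept hierarchies would add further levels that this sketch leaves out.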
Document title: Lec-3 Data warehousing and OLAP technology 09