您好,欢迎访问三七文档
XuXinPresalesConsultantOracle(China)Co.,Ltd.数据仓库的概念Whatis...数据仓库(DataWarehouse)/数据集市(DataMart)决策支持系统(DecisionSupportSystem)联机分析处理(OLAP)/ROLAP/MOLAP元数据(MetaData)分析指标(Measure)/维(Dimension)星型模型(StarSchema)/雪花模型(SnowSchema)数据钻入/数据钻出(DrillDown/DrillUp)表旋转(TableRotation)数据挖掘(DataMining)数据仓库几大功能Query/ReportDrillup/DrillDownCompareExceptionForcast,WhatifDataMining数据仓库实施方法建立数据仓库需要考虑的因素扩展性灵活性集成性可靠性数据仓库专家的建议需要业务人员的积极参与通过原型设计验证需求确定数据仓库的范围,不要试图Warehouse所有数据为不同需求选择合适工具控制风险利用外部Consultant的经验重点放在不同系统的集成建立数据仓库举例UseaBuildingEstateOLTPdatabaseasanexampletoillustratetheconceptsandhowtobuildasuccessfulDataWarehousewhichusedtocheckandforecasttherentalrateandsellamountinHongKong.步骤1:确定数据仓库的问题范围列出4月份香港地区每日房屋销售情况找出销售额大于4百万的居民住宅项目比较Whampoo和Kornhill地区上月销售情况找出售屋数量最多的前3个地区截止到当月的累计销售数量用图表反映最佳销售模式时间序列分析确定数据仓库的问题范围确定业务需求和用户需求:•用户查询执行的频度•系统保留数据的年限•用户主要希望从哪些角度,哪些层次分析数据•数据源是哪些系统步骤2:选择合适的软硬件平台可靠的供应商数据建模和管理工具易用性开放集中管理性能并行处理选择数据库平台的依据:前3位的考虑因素:易用性92.4%集中管理65.2%可靠的供应商65.2%数据仓库的考虑因素(Source:DataWarehouseInstitute-February96)MOLAP还是ROLAP?ClientinterfaceMetadataOLAPengineMDDBDataWarehouseOperationalDataDataloadingOLAPserverSQLpassthroughOLAPengineOLAPserverDataWarehouseMetadataClientinterfaceMOLAPROLAPMiddle-tierserverCLIENTDatabaseserverSQLROLAP和MOLAP的功能区别Legacy/Operational/ExternalSystemsTransactionSystemsDecisionSupportSystemsKnowledgeWorkersStrategicTacticalWhat-IfAnalysisCustomizedApplicationsDrill&PivotAnalysisForecastingModelingTime-SeriesQuery&ReportingMOLAPROLAPMDBRDBMSDataCachelinkage步骤3:根据需要创建新的实体#Code_noNo_of_transactionConstructor_IDDeveloper_IDBuildingdatePurchasedatePurchasepriceAddressAreaApartment#Code_no#Transaction_noName/CompanyHKIDContactPhone#ContactAddressPurchaseDatePurchasePriceOwner#Code_no#Flat#Transaction_noNameHKIDOccupy_type(P,R)ContactPhone#ContactAddressDatePriceOccupantContractor_IDCompanyNameAddressContactPhone#Constructor#Code_no#FlatNo_of_transTypeFloorArea(Building)Area(Actual)FlatDetailsDeveloper_IDCompanyNameAddressContactPhone#DeveloperDayMonthQuarterYearTimeTerritoryDistrictRegionBuilding/EstateGeographicLocationTypeSizeAreaHousingTypes步骤4:确定维表删除不必要的表#Code_noNo_of_transactionConstructor_IDDeveloper_IDBuildingdatePurchasedatePurchasepriceAddressAreaApartment#Code_no#Transaction_noName/CompanyHKIDContactPhone#ContactAddressPurchaseDatePurchasePriceOwner#Code_no#Flat#Transaction_noNameHKIDOccupy_type(P,R)ContactPhone#ContactAddressDatePriceOccupantContractor_IDCompanyNameAddressContactPhone#Constructor#Code_no#FlatNo_of_transTypeFloorArea(Building)Area(Actual)FlatDetailsDeveloper_IDCompanyNameAddressContactPhone#DeveloperDayMonthQuarterYearTimeTerritoryDistrictRegionBuilding/EstateGeographicLocationTypeSizeAreaHousingTypes步骤5:建立层次结构Date1-Jan-9413-Jun-9512-Jan-9612-Apr-9615-Apr-9620-Oct-9620-Oct-9612-Dec-961-Jan-9731-Mar-9715-Apr-97?.TimeYearQuarterMonthDayTimeHierarchy步骤6:确定属性ClassSSMMMLLLSMLSMAreasq.m10-39.940-49.950-69.970-99.9100-129.9130-159.9160-199.9Over20010-129.9130-299.9Over30010-49.950-129.9SizeABCDETypeDomesticIndustrialShops&CommercialOthers/MiscAreasq.m2563035125400810400300100210300?.TypeSizeAreaClass:AttributesofHousingTypeHousingTypeOccupantHousingTypedimensionlookuptableAttributes步骤7:建立FactTable,确定合适的粒度TimeLocationTypeAreaOccupantNamePurchasePriceRent?.SalesFactTable步骤8:建立数据仓库模型BuildingEstateOLTPEnvironmentTimeLocationTypeAreaOccupantNamePurchasePriceRent?.SalesFactTableDayMonthQuarterYearTimeTerritoryDistrictRegionBuilding/EstateGeographicLocationTypeSizeAreaHousingTypes#Code_noNo_of_transactionConstructor_IDDeveloper_IDBuildingdatePurchasedatePurchasepriceAddressAreaApartment#Code_no#Transaction_noName/CompanyHKIDContactPhone#ContactAddressPurchaseDatePurchasePriceOwner#Code_no#Flat#Transaction_noNameHKIDOccupy_type(P,R)ContactPhone#ContactAddressDatePriceOccupantContractor_IDCompanyNameAddressContactPhone#Constructor#Code_no#FlatNo_of_transHousingTypeFloorArea(Building)Area(Actual)FlatDetailsDeveloper_IDCompanyNameAddressContactPhone#DeveloperTransformBuildingEstateDataWarehouseOLAPEnvironment步骤9:数据仓库模型优化TerritoryDistrictTimeLocationTypeAreaOccupantNamePurchasePriceRent?.SalesFactTableTypeSizeAreaDistrictDiatrictRegionRegionRegionBuilding/EstateYearQuarterQuarterYearQuarterMonthQuarterMonthDayMonthDayHousingTypesTimeLocationTypeAreaOccupantNamePurchasePriceRent?.SalesFactTableDayMonthQuarterYearTimeTerritoryDistrictRegionBuilding/EstateGeographicLocationTypeSizeAreaHousingTypesStarschemaSnowflakeschema数据仓库设计优化的原则避免数据实时汇总(建立汇总表)减少表连接操作(不要超过3-5个)用IDcode作关键字减少I/O竞争利用分区技术提高性能和可管理性估算数据仓库容量的算法X=sizeofonerowinthefacttableTherefore,SalesFactTable=98TimeLocationTypeAreaOccupantNamePurchasePriceRent?.SalesFactTableDayMonthQuarterYearTimeTerritoryDistrictRegionBuilding/EstateGeographicLocationTypeSizeAreaHousingTypesBuildingPropertyDWDesignDimensionEstimated#Time96monthHousingTypeGeographicLoc.20types1000locs.Y=sparsity(densityfactor)-0.75Estimatedsizeofdatabase=98*96*20*1000*0.75=141.12Mb步骤10:从业务系统中抽取数据到数据仓库数据抽取的要求:可访问各种数据源可满足时间要求可满足数据转换要求可检测源系统中数据的变化步骤1
本文标题:数据仓库概念
链接地址:https://www.777doc.com/doc-27469 .html