OLAP Over Uncertain and Imprecise Data
T.S. Jayram (IBM Almaden)
with Doug Burdick (Wisconsin), Prasad Deshpande (IBM), Raghu Ramakrishnan (Wisconsin), Shivakumar Vaithyanathan (IBM)

Dimensions in OLAP
[Figure: two dimension hierarchies. Location: All; East, West; MA, NY, TX, CA. Automobile: All; Truck, Sedan; F150, Sierra, Civic, Camry.]

Measures, Facts, and Queries
Example query: Auto = Truck, Loc = East, SUM(Repair) = ?
[Figure: facts p1 to p8 placed in the cells of the Automobile x Location grid. Example fact: Auto = F150, Loc = NY, Repair = $200.]

Extend the OLAP model to handle data ambiguity
Imprecision
Uncertainty

Imprecision
[Figure: the same grid with imprecise facts p9, p10, and p11 added. Example imprecise fact: Auto = F150, Loc = East, Repair = $200.]

Representing Imprecision using Dimension Hierarchies
Dimension hierarchies lead to a natural space of “partially specified” objects.
Sources of imprecision: incomplete data, multiple sources of data.

Motivating Example
[Figure: facts p1 to p5 over Automobile (F150, Sierra, Truck) and Location (MA, NY, East).]
We propose desiderata that enable appropriate definition of query semantics for imprecise data.
Query: COUNT

Desideratum I: Consistency
Consistency specifies the relationship between answers to related queries on a fixed data set.
[Figure: facts p1 to p5 over Automobile (F150, Sierra, Truck) and Location (MA, NY, East).]

Desideratum II: Faithfulness
Faithfulness specifies the relationship between answers to a fixed query on related data sets.
[Figure: Data Set 1, Data Set 2, and Data Set 3, each placing facts p1 to p5 over Automobile (F150, Sierra) and Location (MA, NY).]
Formal definitions of both Consistency and Faithfulness depend on the underlying aggregation operator.
Can we define query semantics that satisfy these desiderata?

Query Semantics
Possible Worlds [Kripke63,…]
[Figure: four possible worlds, w1 to w4, each assigning facts p1 to p5 to cells over Automobile (F150, Sierra) and Location (MA, NY).]

Allocation
Allocation gives facts weighted assignments to possible completions, leading to an extended version of the data.
Size increase is linear in the number of (completions of) imprecise facts.
Queries operate over this extended version.
Key contributions:
  Appropriate characterization of the large space of allocation policies.
  Designing efficient allocation policies that take into account the correlations in the data.

Possible Worlds Query Semantics
Given all possible worlds together with their probabilities, queries are easily answered (using expected values).
But the number of possible worlds is exponential!

Storing Allocations using Extended Data Model
[Figure: facts p1 to p5 over Automobile (F150, Sierra, Truck) and Location (MA, NY, East).]

ID  FactID  Auto    Loc  Repair  Weight
1   1       F150    NY   100     1.0
2   2       Sierra  NY   500     1.0
3   3       F150    MA   150     0.6
4   3       F150    NY   150     0.4
5   4       Sierra  MA   200     1.0
6   5       F150    MA   100     0.5
7   5       Sierra  MA   100     0.5

Classifying Allocation Policies
[Table: allocation policies classified by Measure Correlation (Ignored or Used) and Dimension Correlation (Ignored or Used). Policies shown: Uniform, Count, EM.]

Results on Query Semantics
Evaluating queries over the extended version of the data yields the expected value of the aggregation operator over all possible worlds.
  Intuitively, the correct value to compute.
Efficient query evaluation algorithms for SUM, COUNT.
  Consistency and faithfulness for SUM, COUNT are satisfied under appropriate conditions.
Dynamic programming algorithm for AVERAGE.
  Unfortunately, consistency does not hold for AVERAGE.

Alternative Semantics for AVERAGE
APPROXIMATE AVERAGE
  E[SUM] / E[COUNT] instead of E[SUM / COUNT]
  Simpler and more efficient.
  Satisfies consistency.
  Extends to aggregation operators for uncertain measures.

Uncertainty
Measure value is modeled as a probability distribution function (pdf) over some base domain.
  E.g., measure Brake is a pdf over values {Yes, No}.
  Sources of uncertainty: measures extracted from text using classifiers.
Adapt well-known concepts from statistics to derive appropriate aggregation operators.
Our framework and solutions for dealing with imprecision also extend to uncertain measures.

Summary
Consistency and faithfulness: desiderata for designing query semantics for imprecise data.
Allocation is the key to our framework.
Efficient algorithms for aggregation operators with appropriate guarantees of consistency and faithfulness.
Iterative algorithms for allocation policies.

Correlation-based Allocation
Involves defining an objective function to capture some underlying correlation structure.
  A more stringent requirement on the allocations.
  Solving the resulting optimization problem yields the allocations.
EM-based iterative allocation policy.
  Interesting highlight: allocations are re-scaled iteratively by computing appropriate aggregations.
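
Worked sketch: querying the extended data model
The slides state that evaluating SUM and COUNT over the extended data gives their expected values over all possible worlds, and that APPROXIMATE AVERAGE is E[SUM] / E[COUNT] rather than E[SUM / COUNT]. The Python sketch below illustrates this on the seven extended rows listed in the slides. The function names are hypothetical, and the possible-worlds enumeration assumes each fact picks its completion independently with probability equal to its allocation weight; that independence is an assumption of this sketch, not something the slides spell out.

```python
from collections import defaultdict
from itertools import product

# Extended data model rows from the slides:
# (id, fact_id, auto, loc, repair, weight)
EXTENDED = [
    (1, 1, "F150", "NY", 100, 1.0),
    (2, 2, "Sierra", "NY", 500, 1.0),
    (3, 3, "F150", "MA", 150, 0.6),
    (4, 3, "F150", "NY", 150, 0.4),
    (5, 4, "Sierra", "MA", 200, 1.0),
    (6, 5, "F150", "MA", 100, 0.5),
    (7, 5, "Sierra", "MA", 100, 0.5),
]


def expected_sum_count(rows, autos, locs):
    """Weighted SUM(Repair) and COUNT over rows inside the query region."""
    total = count = 0.0
    for _id, _fact, auto, loc, repair, weight in rows:
        if auto in autos and loc in locs:
            total += weight * repair
            count += weight
    return total, count


def approximate_average(rows, autos, locs):
    """E[SUM] / E[COUNT]; returns None when the expected count is zero."""
    total, count = expected_sum_count(rows, autos, locs)
    return total / count if count else None


def expectation_over_worlds(rows, autos, locs, agg):
    """Expected value of agg by brute-force enumeration of possible worlds.

    Assumption (not from the slides): facts choose completions independently,
    so a world's probability is the product of the chosen rows' weights.
    The number of worlds grows exponentially with the imprecise facts.
    """
    by_fact = defaultdict(list)
    for row in rows:
        by_fact[row[1]].append(row)
    expected = 0.0
    for world in product(*by_fact.values()):
        prob = 1.0
        for row in world:
            prob *= row[5]
        values = [r[4] for r in world if r[2] in autos and r[3] in locs]
        expected += prob * agg(values)
    return expected


def mean(values):
    # Only valid when the query region is non-empty in every world,
    # which holds for the F150 / {MA, NY} query used below.
    return sum(values) / len(values)


if __name__ == "__main__":
    autos, locs = {"F150"}, {"MA", "NY"}
    print(expected_sum_count(EXTENDED, autos, locs))
    print(approximate_average(EXTENDED, autos, locs))
    print(expectation_over_worlds(EXTENDED, autos, locs, mean))
```

For the query Auto = F150 over MA and NY, the weighted SUM and COUNT agree with the brute-force expectations, while the approximate average (about 120) differs slightly from the enumerated E[SUM / COUNT] (about 120.83). This table needs only four worlds, but enumeration does not scale, which is the motivation the slides give for allocation.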