您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 质量控制/管理 > 高级数据库技术问答题(附答案)
DatabaseSystemProblem1Thehashjoinalgorithmdescribedinthetextbookcomputesthenaturaljoinoftworelations.Writepseudocodeforaniteratorthatimplementsahashbasedalgorithmforcomputingan“anti-join”operation:Ranti-joinScontainsalltuplesinRthathavenomatchingtuplesinS.Althoughanti-joinisnotanoperationthatcanbespecifieddirectlyinSQL,itisimplementedinthequeryprocessortosupportsometypesofqueries.Yourpseudocodemustdefinethestandarditeratorfunctionsopen(),next(),andclose(),andmustfetchtheinputtuplesusingiteratorcalls(inotherwords,RandSmaynotbebaserelations,butmaybeoutputsofotheroperators).Showwhatstateinformationtheiteratormustmaintainbetweencalls.Forsimplicity,youcanassumethatSissmallandeasilyfitsinmemory.Herearemoredetailsonanti-join::(a)Forthediskthatwillholdtheindex,thetimetoreadagivenblockintomemorycanbeapproximatedby(25+.02*m)milliseconds.The25millisecondsrepresenttheseekandlatencycomponentsoftheread,the.02*mmillisecondsisthetransfertime.(Thatis,asmbecomeslarger,thelargertheblockwillbeandthemoretimeitwilltaketoreaditintomemory.)(b)Oncetheblockisinmemoryabinarysearchisusedtofindtherightpointer.Sothetimetoprocessablockinmainmemoryis(a+blog_2m)milliseconds,forsomeconstantsa,b.(log_2mmeanslogbase2ofm.)(c)Themainmemorytimeconstantaismuchsmallerthanthediskseekandlatencytimeof25milliseconds.(d)Theindexisfull,sothatthenumberofblocksthatmustbeexaminedpersearchislog_mN.Questions:(i)Whatvalueofmminimizesthetimetosearchforagivenrecord?AnapproximateanswerisOK.(HINT:Ifyoucomeupwithanequationwhichishardtosolvealgebraically,tryplugginginvaluestolocatetherootoftheequation.)Thevalueyouobtainshouldbeindependentofb!(ii)Whathappensastheseekandlatencyconstant(25ms)decreases?Forinstance,ifthisconstantiscutinhalf,howdoestheoptimummvaluechange?Answer:LetT1bethetimetakentoreadoneblockintomainmemoryandT2bethetimetoprocessthatblockinmemory.Ifnblocksareexaminedtosearchforagivenrecord,thenthetimetakenforthissearchis:n*(T1+T2)Substitutingtheappropriateexpressionsforallthevariables,wegetT(m)=[(25+0.02m)+(a+blog2m)]*logmNIgnoringaincomparisonwith30,wegetT(m)=[25+0.02m+blog2m]*logmNSubstitutinglog2mbyln(m)/ln(2)andlogmNbyln(N)/ln(m),wegetT(m)=[(25+0.02m)/ln(m)]*ln(N)+b*ln(N)/ln(2)Hence,tominimizeT(m),weneedtominimizef(m)=(25+0.02m)/ln(m).Anumberoftechniquescanbeappliedtodeterminethevalueofmthatminimizesthisexpression.Forexample,takingthederivativeandsettingittozero(f'(m)=0),wegetm*(ln(m)-1)=1250Thiscanbesolvedtoyieldanintegralvalueof272.Itiseasytoverifythatthisvaluedoesindeedminimizef(m)andhenceT(m).Seek+latencyconstantdecreases:LetussplitT(m)intothreecomponentsasfollows(here,tsistheseek+latencytimeconstantwhichwas30intheabovecase):TotalseekandlatencytimeTa=ts*ln(N)/ln(m)TotaltransfertimeTb=0.02*m*ln(N)/ln(m)TotalbinarysearchtimeTc=b*ln(N)/ln(2)Ofthese,Tcisindependentofm,TaisadecreasingfunctionofmwhereasTbisanincreasingfunctionofm.Iftsisverysmall,thenTaisnegligiblecomparedtoTb.Hence,wemustchooseasmallvalueofmtominimizethesearchtime.Ontheotherhand,ifTaisverylargecomparedtoTb,thentheformerdominatesandwechoosealargevalueofmtominimizesearchtimes.Fromtheselimitingcases,itiseasytoseethatiftsdecreases,theoptimumvalueofmdecreases.Inparticular,ifts=25/2=12.5,wegetanoptimumvalueof155(byproceedingalongthesamelinesasabove).Problem3Tobuildahashindexforamulti-attributesearchkey,wecanuseanapproachcalledpartitionedhashing.Thepartitionedhashfunctionisreallyalistofhashfunctions,oneforeachattributeinthesearchkey.SupposethatwewishtobuildapartitionedhashindexonR(A,B)with2nbucketsnumbered0to2n1.Inthiscase,thepartitionedhashfunctionconsistsoftwohashfunctionshAandhB.HashfunctionhAtakesavalueofAasinputandproducesaresultwithnAbits,andhBtakesavalueofBasinputandproducesaresultwith(nnA)bits.Thetworesultsareconcatenatedtogethertoproducetheresultofthepartitionedhashfunction,whichisthenusedtoindexthebuckets.TolocaterecordswithA=aandB=b,wesimplygodirectlytothebucketnumberedhA(a)hB(b)(inbinary).(a)WhichbucketsdowehavetoexamineinordertolocateallrecordswithA=a?(b)Supposewearegivenaquerymix.EachqueryinthismixwilleitheraskforrecordswithagivenvalueofA,oritwillaskforrecordswithagivenvalueofB(butneverboth).Withprobabilityp,thevalueofAwillbespecified.Giveaformulaintermsofn,nA,andpfortheexpectednumberofbucketsthatmustbeexaminedtoanswerarandomqueryfromthemix.(c)FindthevalueofnA,asafunctionofnandp,thatminimizestheexpectednumberofbucketsexaminedperquery?Answer:(a)(b)(c)Firstf(x)=thendeterminethevalueofNathatminimizesthef(x)(Tips:useyourcalculusknowledge)
本文标题:高级数据库技术问答题(附答案)
链接地址:https://www.777doc.com/doc-7391958 .html