您好,欢迎访问三七文档
当前位置:首页 > 金融/证券 > 股票报告 > IR-Solution-Manual
Preliminarydraft(c)2007CambridgeUPAnIntroductiontoInformationRetrievalChristopherD.ManningPrabhakarRaghavanHinrichSchützeCambridgeUniversityPressCambridge,EnglandPreliminarydraft(c)2007CambridgeUPDRAFT!DONOTDISTRIBUTEWITHOUTPRIORPERMISSION©2007CambridgeUniversityPressByChristopherD.Manning,PrabhakarRaghavan&HinrichSchützePrintedonDecember12,2007Website::informationretrieval@yahoogroups.comPreliminarydraft(c)2007CambridgeUPvSolutionstoexercisesThisdocumentcontainssolutionstomostexercisesinIntroduc-tiontoInformationRetrieval.Thesolutionswereprovidedbyastudentandmanyhavenotbeencheckedbytheauthorsyet.Pleasesendcommentsandcorrectionstoinformationretrieval(at)yahoogroups.com.Requestsforthisdocumentshouldbedirectedtotheaddressgiveninthesection“Problemsets”at(c)2007CambridgeUPPreliminarydraft(c)2007CambridgeUPDRAFT!©December12,2007CambridgeUniversityPress.Feedbackwelcome.viiBooleanretrieval?Exercise0.1[]Drawtheinvertedindexthatwouldbebuiltforthefollowingdocumentcollection.(SeeFigure1.3foranexample.)Doc1newhomesalestopforecastsDoc2homesalesriseinjulyDoc3increaseinhomesalesinjulyDoc4julynewhomesalesriseSOLUTION.InvertedIndex:forecast-1home-1-2-3-4in-2-3increase-3july-2-3new-1-4rise-2-4sale-1-2-3-4top-1Exercise0.2[]Considerthesedocuments:Doc1breakthroughdrugforschizophreniaDoc2newschizophreniadrugDoc3newapproachfortreatmentofschizophreniaDoc4newhopesforschizophreniapatientsa.Drawtheterm-documentincidencematrixforthisdocumentcollection.b.Drawtheinvertedindexrepresentationforthiscollection,asinFigure1.3(page7).SOLUTION.Term-Documentmatrix:d1d2d3d4Approach0010breakthrough1000drug1100for1011hopes0001new0111of0010patients0001schizophrenia1111treatment0010InvertedIndex:Approach-3breakthrough-1drug-1-2for-1-3-4hopes-4new-.2-3-4of-3patients-4schizophrenia-1-2-3-4treatment3Exercise0.3[]ForthedocumentcollectionshowninExercise1.2,whatarethereturnedresultsforthesequeries:a.schizophreniaANDdrugPreliminarydraft(c)2007CambridgeUPviiiBooleanretrievalb.forANDNOT(drugORapproach)SOLUTION.(i)doc1,doc2(ii)doc4?Exercise0.4[]Forthequeriesbelow,canwestillrunthroughtheintersectionintimeO(x+y),wherexandyarethelengthsofthepostingslistsforBrutusandCaesar?Ifnot,whatcanweachieve?a.BrutusANDNOTCaesarb.BrutusORNOTCaesarSOLUTION.a.TimeisO(x+y).Insteadofcollectingdocumentsthatoc-curinbothpostingslists,collectthosethatoccurinthefirstoneandnotinthesecond.b.TimeisO(N)(whereNisthetotalnumberofdocumentsinthecollection)assumingweneedtoreturnacompletelistofalldocumentssatis-fyingthequery.ThisisbecausethelengthoftheresultslistisonlyboundedbyN,notbythelengthofthepostingslists.Exercise0.5[]ExtendthepostingsmergealgorithmtoarbitraryBooleanqueryformulas.Whatisitstimecomplexity?Forinstance,consider:c.(BrutusORCaesar)ANDNOT(AnthonyORCleopatra)Canwealwaysmergeinlineartime?Linearinwhat?Canwedobetterthanthis?SOLUTION.WecanalwaysintersectinO(qN)whereqisthenumberofquerytermsandNthenumberofdocuments,sotheintersectiontimeislinearinthenumberofdocumentsandqueryterms.SincethetightestboundforthesizeoftheresultslistisN,thenumberofdocuments,youcannotdobetterthanO(N).Exercise0.6[]WecanusedistributivelawsforANDandORtorewritequeries.a.Showhowtorewritetheabovequeryintodisjunctivenormalformusingthedis-tributivelaws.b.Wouldtheresultingquerybemoreorlessefficientlyevaluatedthantheoriginalformofthisquery?c.Isthisresulttrueingeneralordoesitdependonthewordsandthecontentsofthedocumentcollection?Preliminarydraft(c)2007CambridgeUPixSOLUTION.Queryindisjunctivenormalform:(brutusandnotanthonyandnotcleopatra)or(caesarandnotanthonyandnotcleopatra).Inthiscase,disjunctivenormalformismoreefficientthanconjunctivenormalform.Intheformercase,wecomputeintersectionsfirstandthen,inthelaststep,oneunionoftwopostingsliststhat(hopefully)aresmall.Inthelattercase,westartwithaunionofpostingslistsandhavetodealwithpotentiallylargeintermediateresults.Theabovereasoningisprobablynottrueforsomewords,e.g.,(rare-word-1orrare-word-2)andnot(hongorkong),assuminghongandkongareveryfrequentandoccurinthesamedocuments.Theaboveisnottrueifthereareonlynegatedquerywordsinthedisjunctivenormalform.Exercise0.7[]Recommendaqueryprocessingorderford.(tangerineORtrees)AND(marmaladeORskies)AND(kaleidoscopeOReyes)giventhefollowingpostingslistsizes:TermPostingssizeeyes213312kaleidoscope87009marmalade107913skies271658tangerine46653trees316812SOLUTION.Usingtheconservativeestimateofthelengthofunionedpostingslists,therecommendedorderis:(kaleidoscopeOReyes)(300,321)AND(tangerineORtrees)(363,465)AND(marmaladeORskies)(379,571)However,dependingontheactualdistributionofpostings,(tangerineORtrees)maywellbelongerthan(marmaladeORskies)becausethetwocom-ponentsoftheformeraremoreasymmetric.Forexample,theunionof11and9990isexpectedtobelongerthantheunionof5000and5000eventhoughtheconservativeestimatepredictsotherwise.S.Singh’ssoluti
本文标题:IR-Solution-Manual
链接地址:https://www.777doc.com/doc-5476213 .html