Advanced Computational Linguistics

1. Collect the most frequent words in 5 genres of the Brown Corpus: news, adventure, hobbies, science_fiction, romance

To collect the most frequent words from the given genres, we can follow these steps:

    >>> import nltk
    >>> from nltk.corpus import brown
    >>> brown.categories()
    ['adventure', 'belles_lettres', 'editorial', 'fiction', 'government',
     'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion',
     'reviews', 'romance', 'science_fiction']
    >>> news_text = brown.words(categories=['news', 'adventure', 'hobbies', 'science_fiction', 'romance'])
    >>> from nltk.probability import FreqDist
    >>> fdist = FreqDist([w.lower() for w in news_text])
    >>> voca = fdist.keys()
    >>> voca[:50]
    ['the', ',', '.', 'and', 'of', 'to', 'a', 'in', 'he', "''", '``', 'was',
     'for', 'that', 'it', 'his', 'on', 'with', 'i', 'is', 'at', 'had', '?',
     'as', 'be', 'you', ';', 'her', 'but', 'she', 'this', 'from', 'by', '--',
     'have', 'they', 'said', 'not', 'are', 'him', 'or', 'an', 'one', 'all',
     'were', 'would', 'there', '!', 'out', 'will']
    >>> voca1 = fdist.items()
    >>> voca1[:50]
    [('the', 18635), (',', 17215), ('.', 16062), ('and', 8269), ('of', 8131),
     ('to', 7125), ('a', 7039), ('in', 5549), ('he', 3380), ("''", 3237),
     ('``', 3237), ('was', 3100), ('for', 2725), ('that', 2631), ('it', 2595),
     ('his', 2237), ('on', 2162), ('with', 2157), ('i', 2034), ('is', 2014),
     ('at', 1817), ('had', 1797), ('?', 1776), ('as', 1725), ('be', 1610),
     ('you', 1600), (';', 1394), ('her', 1368), ('but', 1296), ('she', 1270),
     ('this', 1248), ('from', 1174), ('by', 1157), ('--', 1151), ('have', 1099),
     ('they', 1093), ('said', 1081), ('not', 1051), ('are', 1019), ('him', 955),
     ('or', 950), ('an', 911), ('one', 903), ('all', 894), ('were', 882),
     ('would', 850), ('there', 807), ('!', 802), ('out', 781), ('will', 775)]

This shows that the word "the" occurs more often than any other token in these genres.

2. Exclude or filter out all words that have a frequency lower than 15 occurrences. (Hint: use a conditional frequency distribution.)

Building on the frequency distribution from the first task, we can keep only the words whose frequency is >= 15.

    >>> filteredText = filter(lambda word: fdist[word] >= 15, fdist.keys())
    >>> voca = fdist.keys()
    >>> filteredText[:50]  # first 50 words
    ['the', ',', '.', 'and', 'of', 'to', 'a', 'in', 'he', "''", '``', 'was',
     'for', 'that', 'it', 'his', 'on', 'with', 'i', 'is', 'at', 'had', '?',
     'as', 'be', 'you', ';', 'her', 'but', 'she', 'this', 'from', 'by', '--',
     'have', 'they', 'said', 'not', 'are', 'him', 'or', 'an', 'one', 'all',
     'were', 'would', 'there', '!', 'out', 'will']
    >>> filteredText[-50:]  # last 50 words
    ['musical', 'naked', 'names', 'oct.', 'offers', 'orders', 'organizations',
     'parade', 'permit', 'pittsburgh', 'prison', 'professor', 'properly',
     'regarded', 'release', 'republicans', 'responsible', 'retirement', 'sake',
     'secrets', 'senior', 'sharply', 'shipping', 'sir', 'sister', 'sit',
     'sought', 'stairs', 'starts', 'style', 'surely', 'symphony', 'tappet',
     "they'd", 'tied', 'tommy', 'tournament', 'understanding', 'urged', 'vice',
     'views', 'village', 'vital', 'waddell', 'wagner', 'walter', 'waste',
     "we'd", 'wearing', 'winning']

3. Then exclude or filter out all stopwords from the lists you have created. (Hint: use a conditional frequency distribution.)

To filter the stopwords, we have to define a tiny function that uses the NLTK stopwords corpus for the 'english' language.

    >>> from nltk.corpus import stopwords
    >>> stopwords.words('english')
    ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your',
     'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she',
     'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their',
     'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that',
     'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being',
     'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an',
     'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of',
     'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
     'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from',
     'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again',
     'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how',
     'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some',
     'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too',
     'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now']
    >>> def content_fraction(text):
    ...     stopwords = nltk.corpus.stopwords.words('english')
    ...     content = [w for w in text if w.lower() not in stopwords]
    ...     return len(content) / len(text)
    ...
    >>> content_fraction(nltk.corpus.reuters.words())
    0.65997695393285261
    >>> filterdText = filterStopword(freqDist)
    >>> filterdText[:50]
    [',', '.', "''", '``', '?', ';', '--', 'said', 'would', 'one', '!', 'could',
     '(', ')', ':', 'time', 'like', 'back', 'two', 'first', 'man', 'made',
     'Mrs.', 'new', 'get', 'way', 'last', 'long', 'much', 'even', 'years',
     'good', 'little', 'also', 'Mr.', 'see', 'right', 'make', 'got', 'home',
     'many', 'never', 'work', 'know', 'day', 'around', 'year', 'may', 'came',
     'still']
    >>> freqDist[:50]
    [',', 'the', '.', 'of', 'and', 'to', 'a', 'in', "''", '``', 'was', 'for',
     'that', 'he', 'on', 'with', 'his', 'I', 'it', 'is', 'The', 'had', '?',
     'at', 'as', 'be', ';', 'you', 'her', 'He', '--', 'from', 'by', 'said',
     'have', 'not', 'are', 'this', 'him', 'or', 'were', 'an', 'but', 'would',
     'she', 'they', 'one', '!', 'all', 'out']

In filterdText, words such as 'the', 'it', and 'is' no longer appear, whereas the same number of entries from freqDist, which still contains stopwords, includes them.

    >>> len(freqDist)
    2341
    >>> len(filterdText)
    2153

We can fur
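
The session above relies on fdist.keys() being ordered by frequency and sliceable, which matches older NLTK releases running on Python 2. As a minimal sketch of the same two filtering steps on Python 3, the following uses collections.Counter, which current NLTK's FreqDist subclasses, so most_common() should behave the same way on a real FreqDist. The toy token list, the tiny stopword set, the lowered threshold, and the function name frequent_content_words are all illustrative assumptions and are not part of the original exercise.

```python
from collections import Counter

# Hypothetical toy data standing in for brown.words(categories=[...]).
tokens = "The cat sat on the mat . The dog saw the cat , and the cat ran .".split()

# Hypothetical tiny stopword set; the exercise uses stopwords.words('english').
STOPWORDS = {"the", "on", "and"}

# The exercise uses a threshold of 15; lowered here to suit the toy data.
MIN_COUNT = 2


def frequent_content_words(words, min_count, stopwords):
    """Return (word, count) pairs, most frequent first,
    dropping rare words and stopwords."""
    counts = Counter(w.lower() for w in words)
    return [
        (word, count)
        for word, count in counts.most_common()
        if count >= min_count and word not in stopwords
    ]


result = frequent_content_words(tokens, MIN_COUNT, STOPWORDS)
print(result)
```

As in the output shown earlier, punctuation tokens survive both filters because they are neither rare nor in the stopword list; dropping them would need an extra check such as word.isalpha().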
Document title: Python Basic Exercises
Link: https://www.777doc.com/doc-4210401 .html