Natural Language Processing Review
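I.

1. The basic views and methods of the two schools in NLP: rationalist (symbolic) and statistical

Symbolic approach: encode all the required information into the computer (rationalism).
- Linguistic knowledge (static knowledge, context-dependent knowledge)
- World knowledge (uniqueness of reference, type of noun, situational associativity between nouns)

Statistical approach: infer language properties from language samples (empiricism).
- Collect a large collection of texts relevant to your domain.
- For each noun, compute its probability of taking a certain determiner: $P(\text{determiner} \mid \text{noun}) = \frac{\mathrm{freq}(\text{noun}, \text{determiner})}{\mathrm{freq}(\text{noun})}$
- Given a new noun, select the determiner with the highest likelihood as estimated on the training corpus.

2. Given three encodings of the characters "自然", determine which one is Big5, GB2312 and UTF-8, and explain why.

Big5:
- The first byte ranges from 0xA1 to 0xF9.
- The second byte ranges from 0x40 to 0x7E or from 0xA1 to 0xFE.
- ASCII characters are still represented with a single byte.
- The MSB of the first byte of a Big5 character is always 1.
- Big5 is an 8-bit encoding with a 15-bit code space.
GB2312:
- It contains only one code point for each character.
- If the MSB (bit 8) of a byte is set to 1, the byte belongs to a two-byte Chinese character; otherwise, the byte is interpreted as ASCII.
- Every Chinese character is represented by a two-byte code in which the MSBs of both the first and the second byte are set.
UTF-8:
- Each of these characters takes three bytes: a lead byte of the form 1110xxxx (0xE0 to 0xEF) followed by two continuation bytes of the form 10xxxxxx (0x80 to 0xBF).

Text:    自然语言处理
GB2312:  D7D4 C8BB D3EF D1D4 B4A6 C0ED
Big5:    A6DB B54D BB79 A8A5 B342 B27A   (Big5 has no simplified characters, so this row encodes 自然語言處理)
UTF-8:   E887AA E784B6 E8AFAD E8A880 E5A484 E79086

Only the Big5 row has second bytes below 0xA1 (4D, 79, 42, 7A); in the GB2312 row both bytes of every pair lie in 0xA1 to 0xFE; the UTF-8 row comes in three-byte groups that start with 0xE.

3. Given five Chinese words, determine which word-formation type each belongs to (note the English terms).
- Modified noun compound (大人, 小人, 热心, 水手, 黑板, 去年)
- Modified verb compound (寄生, 飞驰, 杂居, 火葬, 面授, 单恋)
- Coordinative compound (报告, 声音, 奇怪, 帮助, 学习, 购买)
- Antonymous compound (买卖, 左右, 高矮, 大小, 开关, 长短)
- Verb-object compound (放心, 鼓掌, 动员, 司机, 主席, 干事)
- Verb-complement compound (进来, 进去, 介入, 改良, 打破, 推翻)
- Subject-predicate compound (地震, 心疼, 民主, 自决, 胆小, 年轻)
- Noun-measure complement compound (人口, 羊群, 书本, 花朵, 枪支)
- Modifier-noun (情人节, 小说家, 加油站, 大学生, 金黄色)
- Verb-object trisyllabic compound (开玩笑, 吹牛皮, 吃豆腐)
- Subject-verb-object (胆结石, 鬼画符, 鬼打墙)
- Descriptive + noun (棒棒糖, 乒乓球, 呼啦圈)

4. Give example words for the three types of structural (segmentation) ambiguity: overlapping, combinatorial and mixed.
- Overlapping ambiguity (交集型歧义): 网球场, 美国会
- Combinatorial ambiguity (组合型歧义): 才能, 学生会
- Mixed type (混合型歧义): 太平洋, 太平, 平淡

5. Write down three types of features of unknown words.
- Abbreviations (国考 for 国家公务员考试)
- Proper names / named entities:
  - names of people (小月月)
  - names of places (延坪岛)
  - names of organizations (上海合作组织)
- Derived words (审计人, 审计员, 审计局, 审计处)
- Compounds (光敏感, 流体力学)
- Numeric-type compounds (五月三日, 八点十分, 第一)

II. Information theory

(1) What is entropy?
- Defined by the second law of thermodynamics
- A measure of the energy not available for work in a thermodynamic process
- An isolated system always tends towards a state of maximum entropy

1. For limited substitutability, limited modifiability and limited compositionality, give two quantitative features each.
- Synonym substitution and its ratio
- A feature characterizing the distributional significance of how two words co-occur at different positions
- The number of co-occurrence peaks
Examples: 开门, 斗志昂扬

2. How does WordNet distinguish the different senses of a word? How do HowNet and TongYiCilin do it? Explain how WordNet and HowNet differ in describing word meaning.
WordNet:
- Nouns, verbs, adjectives and adverbs are organized separately, since they follow different grammatical rules.
- Every synset contains a group of synonymous words or collocations.
- Different senses of a word are in different synsets.
- The meaning of each synset is further clarified with a short defining gloss.
- Synsets are connected to other synsets via a number of semantic relations.
HowNet:
- Concept definitions in HowNet are based on sememes.
- Sememes are written in a structured markup language.
- HowNet builds its knowledge base as a graph of inter-concept relations.
- The representation is based on concepts denoted by words and expressions in both Chinese and English.
TongYiCilin:
- Its hierarchical structure reflects the semantic relationships between words.
- Each minor semantic cluster consists of a set of words.
- Words in the same minor semantic cluster share the concept of that class.

3. WordNet groups words into synsets. How does it link synsets to each other, for example for nouns and adjectives?
- Synsets are connected to other synsets via a number of semantic relations (e.g. hypernymy/hyponymy and meronymy/holonymy between noun synsets; antonymy and similarity between adjective synsets).

4. Explain the meanings of Homonyms, Antonyms (反义), Hypernymy (上位), Hyponymy (下位) and Holonymy (整体), and give examples. (Of course, the exam did not give the Chinese terms; the explanations should presumably be in English and the examples in Chinese.)
- Homonyms (同音): one of a group of words that share the same spelling and the same pronunciation but have different meanings
- Antonyms (反义): different words having contradictory or contrary meanings
- Synonyms (同义): different words having similar or identical meanings
- Hypernymy (上位): the semantic relation of being superordinate, i.e. belonging to a higher rank or class
- Hyponymy (下位): the semantic relation of being subordinate, i.e. belonging to a lower rank or class
- Holonymy (整体): the relationship between a term denoting the whole and a term denoting a part of that whole
- Meronym (部分): a word that names a part of a larger whole
- Metonymy (转指): a figure of speech in which a concept is referred to by the name of something closely associated with that concept
- Proposition: the meaning of a statement

5. What are the two basic assumptions about word sense ambiguity? (Thomas, or someone like that, I think.)
- One sense per collocation; one sense per discourse.

6. Explain how smoothing and linear interpolation differ in handling zero probabilities in language modeling.

7. Explain the two assumptions commonly used in word sense disambiguation: one sense per collocation and one sense per discourse.
- One sense per collocation: nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order and syntactic relationship.
- One sense per discourse: the sense of a target word is highly consistent within any given document.
  - true for topic-dependent words
  - not true for verbs

8. Answer the following questions about the hidden Markov model (HMM):
(1) Write down the three basic elements and the three basic problems of the Markov model.
Markov assumption: $P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)$
Three basic elements: the model parameters $\lambda = (A, B, \pi)$, i.e. the state transition probabilities $A$, the emission (output) probabilities $B$ and the initial state probabilities $\pi$.
Three basic problems:
1. Evaluation problem
2. Decoding problem
3. Learning problem
(2) Describe the basic idea of the Viterbi algorithm, and state which basic HMM problem it addresses. (Decoding problem)
General idea: if the best path ending in $q_k = s_j$ goes through …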
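The Viterbi idea in question 8 (2) is dynamic programming for the decoding problem. For every state it keeps only the best-scoring path that ends there at time t. It then follows back pointers from the best final state. Below is a minimal Python sketch. It assumes the model λ = (A, B, π) is given as nested dicts named `trans_p`, `emit_p` and `start_p` (illustrative names), and it works in log space to avoid underflow. The two-state "Healthy/Fever" parameters are toy numbers and do not come from the notes.

```python
import math

def _log(p):
    # log(0) is undefined; map it to -inf so impossible steps never win
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its log-probability) for obs."""
    # delta[t][s]: best log-probability of any path that ends in state s at time t
    delta = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: delta[t - 1][r] + _log(trans_p[r][s]))
            delta[t][s] = delta[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # pick the best final state, then follow the back pointers to the start
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, delta[-1][last]

# Toy model (illustrative numbers only)
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, logp = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, math.exp(logp))  # ['Healthy', 'Healthy', 'Fever'] and about 0.01512
```

At each step the algorithm keeps only one survivor per state rather than enumerating every path. This makes the cost O(T·N²) instead of O(N^T).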
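The statistical approach in Part I, question 1 reduces to counting. It estimates P(determiner | noun) = freq(noun, determiner) / freq(noun) and picks the argmax. The sketch below assumes the (noun, determiner) pairs have already been extracted from a tagged corpus. The sample pairs and the `"the"` fallback for unseen nouns are hypothetical.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Count freq(noun, determiner) from (noun, determiner) pairs."""
    counts = defaultdict(Counter)
    for noun, det in pairs:
        counts[noun][det] += 1
    return counts

def p_det_given_noun(counts, det, noun):
    """Maximum-likelihood estimate: freq(noun, determiner) / freq(noun)."""
    dets = counts.get(noun, Counter())
    total = sum(dets.values())  # freq(noun)
    return dets[det] / total if total else 0.0

def best_determiner(counts, noun, default="the"):
    """Pick the determiner with the highest estimated P(det | noun)."""
    dets = counts.get(noun)
    if not dets:
        return default  # unseen noun: hypothetical fallback
    return dets.most_common(1)[0][0]

corpus_pairs = [("sun", "the"), ("sun", "the"), ("sun", "a"),
                ("apple", "an"), ("apple", "the"), ("apple", "an")]
counts = train(corpus_pairs)
print(p_det_given_noun(counts, "the", "sun"))  # 2/3
print(best_determiner(counts, "apple"))        # an
```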
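The byte-range rules in Part I, question 2 can be checked mechanically. The sketch below is a heuristic, not a general charset detector. It assumes the input is pure double-byte Chinese text with no ASCII mixed in. Any valid UTF-8 is accepted first. After that, a string whose bytes fit both the GB2312 and Big5 ranges is labeled GB2312, so such strings remain genuinely ambiguous.

```python
def guess_encoding(hex_str):
    """Heuristically label a hex byte string as UTF-8, GB2312 or Big5."""
    data = bytes.fromhex(hex_str)
    try:
        data.decode("utf-8")  # strict: rejects stray lead/continuation bytes
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    if len(data) % 2:
        return "unknown"  # these rules only cover double-byte text
    pairs = [(data[i], data[i + 1]) for i in range(0, len(data), 2)]
    # GB2312 (EUC-CN): lead 0xA1-0xF7, trail 0xA1-0xFE, so both MSBs are set
    if all(0xA1 <= a <= 0xF7 and 0xA1 <= b <= 0xFE for a, b in pairs):
        return "GB2312"
    # Big5: lead 0xA1-0xF9, trail 0x40-0x7E or 0xA1-0xFE (trail MSB may be 0)
    if all(0xA1 <= a <= 0xF9 and (0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE)
           for a, b in pairs):
        return "Big5"
    return "unknown"

for h in ("D7D4C8BBD3EFD1D4B4A6C0ED", "A6DBB54DBB79A8A5B342B27A",
          "E887AAE784B6E8AFADE8A880E5A484E79086"):
    print(h, guess_encoding(h))
```

The UTF-8 check runs first because a strict decode rejects both double-byte rows almost immediately. For example, 0xD7 expects a continuation byte but is followed by 0xD4, and 0xA6 cannot start a sequence at all.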
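One common way to expose the overlapping ambiguity in Part I, question 4 (e.g. 网球场 as 网球|场 or 网|球场) is to segment with forward and backward maximum matching and compare the two results. A disagreement flags an overlapping ambiguity. Combinatorial cases such as 才能 usually get the same answer from both passes, so this check does not catch them. The toy dictionary below is hypothetical and deliberately omits 网球场.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation, longest dictionary word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # a single character always matches
                words.append(piece)
                i += size
                break
    return words

def backward_max_match(text, vocab, max_len=4):
    """Greedy right-to-left segmentation, longest dictionary word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words

vocab = {"网球", "球场", "网", "球", "场"}  # hypothetical toy dictionary
fmm = forward_max_match("网球场", vocab)
bmm = backward_max_match("网球场", vocab)
print(fmm, bmm, "ambiguous" if fmm != bmm else "agree")
```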
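The answer to Part II (1) gives the thermodynamic sense of entropy. In information theory, which is the topic of Part II, entropy is the average uncertainty of a random variable in bits: H(X) = -Σ p(x) log₂ p(x). A minimal sketch follows. `entropy_of_text` is an illustrative helper that estimates p(x) from character counts.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), skipping p = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_of_text(text):
    """Entropy of the character distribution estimated from text."""
    counts = Counter(text)
    n = len(text)
    return entropy(c / n for c in counts.values())

print(entropy([0.5, 0.5]))      # 1.0: a fair coin
print(entropy([0.25] * 4))      # 2.0: four equally likely outcomes
print(entropy_of_text("aabb"))  # 1.0
```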
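The last two quantitative features in Part II, question 1 concern how a word pair such as 开门 co-occurs at different relative positions and how many peaks that distribution has. One possible way to compute them is sketched below. It counts the second word at each offset around the first within a window, takes the variance of that positional profile as its spread, and counts offsets that stand out. The window size and the "mean plus one standard deviation" peak threshold are assumptions and are not taken from the notes.

```python
import statistics
from collections import Counter

def position_profile(tokens, w1, w2, window=5):
    """Counts of w2 at each offset -window..-1, 1..window around every w1."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        for offset in range(-window, window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens) and tokens[j] == w2:
                counts[offset] += 1
    return [counts[o] for o in range(-window, window + 1) if o != 0]

def spread_and_peaks(profile):
    """Spread = variance of the profile; peaks = offsets above mean + 1 std."""
    mean = statistics.fmean(profile)
    spread = statistics.pvariance(profile)
    threshold = mean + spread ** 0.5  # assumed peak threshold
    return spread, sum(1 for c in profile if c > threshold)

tokens = ["请", "开", "门", "吧", "开", "门"]
profile = position_profile(tokens, "开", "门", window=2)
print(profile, spread_and_peaks(profile))  # [1, 0, 2, 0] (0.6875, 1)
```

A rigid collocation concentrates its co-occurrences at one offset (here +1). That gives a high spread and a single peak, whereas a free combination spreads more evenly.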
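Part II, questions 2 and 3 describe how WordNet handles senses. Each sense of a word is a separate synset holding synonymous lemmas and a gloss, and synsets link to one another through relations such as hypernymy. The tiny in-memory structure below only illustrates that organization. The `Synset` class, synset names and glosses are hypothetical. They are not WordNet's actual data and not the NLTK API.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Synset:
    name: str
    lemmas: list[str]
    gloss: str
    hypernyms: list[Synset] = field(default_factory=list)  # "is-a" links

def senses(word, synsets):
    """Every synset containing the word counts as one sense of it."""
    return [s.name for s in synsets if word in s.lemmas]

def hypernym_chain(synset):
    """Follow the first hypernym link upward until a root is reached."""
    chain = []
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        chain.append(synset.name)
    return chain

# Hypothetical toy data
organization = Synset("organization", ["organization", "organisation"], "a group with a shared purpose")
institution = Synset("financial_institution", ["financial institution"], "an organization that handles money", [organization])
slope = Synset("slope", ["slope", "incline"], "a sloping land surface")
bank_money = Synset("bank.money", ["bank", "depository"], "an institution that accepts deposits", [institution])
bank_river = Synset("bank.river", ["bank"], "sloping land beside a river", [slope])
all_synsets = [organization, institution, slope, bank_money, bank_river]

print(senses("bank", all_synsets))  # ['bank.money', 'bank.river']
print(hypernym_chain(bank_money))   # ['financial_institution', 'organization']
```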
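The notes leave Part II, question 6 unanswered. As a sketch of the contrast for a bigram model, consider two techniques. Add-one (Laplace) smoothing removes zeros by adding a pseudo-count to every bigram. Linear interpolation instead mixes the bigram estimate with a lower-order unigram estimate, so an unseen bigram inherits probability from P(w). The weight `lam=0.7` is an arbitrary assumption; in practice it is tuned on held-out data.

```python
from collections import Counter

def train_bigrams(tokens):
    unigrams = Counter(tokens)                  # for P(w)
    histories = Counter(tokens[:-1])            # w_prev counted as a bigram history
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, histories, bigrams

def p_add_one(w_prev, w, unigrams, histories, bigrams):
    """Laplace smoothing: add 1 to every bigram count and V to the denominator."""
    vocab_size = len(unigrams)
    return (bigrams[(w_prev, w)] + 1) / (histories[w_prev] + vocab_size)

def p_interpolated(w_prev, w, unigrams, histories, bigrams, lam=0.7):
    """Linear interpolation: lam * P_ML(w | w_prev) + (1 - lam) * P_ML(w)."""
    n = sum(unigrams.values())
    p_bi = bigrams[(w_prev, w)] / histories[w_prev] if histories[w_prev] else 0.0
    return lam * p_bi + (1 - lam) * unigrams[w] / n

model = train_bigrams("the cat sat on the mat".split())
print(p_add_one("the", "sat", *model))       # 1/7: unseen bigram gets a pseudo-count
print(p_interpolated("the", "sat", *model))  # about 0.05: mass comes from P(sat) = 1/6
```

Both conditional distributions still sum to 1 over the vocabulary. The difference lies in where the mass for unseen events comes from. Add-one takes it uniformly from all seen bigrams, while interpolation takes it from the lower-order model.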
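The "one sense per discourse" assumption in Part II, question 7 is often used as a post-processing constraint. All occurrences of a target word in one document are relabeled with that document's majority sense. The sketch assumes per-occurrence sense predictions already exist, for example from a collocation-based classifier. The sense labels are hypothetical. Since the notes say the assumption does not hold for verbs, one would apply it only to topic-dependent words.

```python
from collections import Counter

def one_sense_per_discourse(predictions):
    """predictions: [(position, sense), ...] for one target word in ONE document.
    Relabel every occurrence with the document's majority sense."""
    if not predictions:
        return []
    majority, _ = Counter(sense for _, sense in predictions).most_common(1)[0]
    return [(pos, majority) for pos, _ in predictions]

doc = [(3, "bank/river"), (17, "bank/money"), (40, "bank/river")]
print(one_sense_per_discourse(doc))
# [(3, 'bank/river'), (17, 'bank/river'), (40, 'bank/river')]
```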