Chinese Text Zero-Watermark Based on Sentence's Entropy

Meng Yingjie
School of Information Science & Engineering, Lanzhou University, Lanzhou, China

Guo Tao
School of Information Science & Engineering, Lanzhou University, Lanzhou, China
guot08@lzu.cn

Guo Zhihua
School of Information Science & Engineering, Lanzhou University, Lanzhou, China

Gao Liming
School of Information Science & Engineering, Lanzhou University, Lanzhou, China

Abstract: To advance copyright protection technology for Chinese digital text, this paper proposes a Chinese text zero-watermark scheme based on sentence entropy. The scheme calculates the entropy of each sentence from word frequencies and selects crucial sentences according to their entropy. The watermark is then constructed from the order of the crucial sentences. To demonstrate its effectiveness, the scheme is validated by simulation, which shows good robustness and resistance to attacks.

Keywords: Chinese text; Copyright Protection; Zero-watermark; Entropy

I. INTRODUCTION

Large amounts of textual data, such as articles, documents and letters, are published on the Internet because of its rapid development and wide application, which poses a great challenge to copyright protection. Digital watermarking is now an important technology for digital copyright protection, but most research on Chinese digital text watermarking concerns embedded watermarks, and only a few papers address zero-watermarks. Embedding a watermark changes the information of the host, so it has a narrower field of application and less value than a zero-watermark. A zero-watermark, unlike an embedded watermark, is constructed from the features of the carrier without changing its information [1].

The entropies of different languages vary greatly. For example, the average information entropy of English is just 1.75 bit/character, while that of Chinese is 9.6 bit/character [7]. Because of this, constructing a watermark from the entropy of the language has an enormous advantage. To overcome the disadvantages of the watermark schemes mentioned in [1, 2], we propose a new scheme that takes the sentence as the basic element of the text and constructs the watermark from selected important sentences, whose importance is measured by information entropy. The results of the experiments
showed that our scheme had greater resistance to attacks and was more robust than other watermark schemes.

II. DEFINITION

We represent the text as T, which comprises m sentences, and give the following definitions and stipulations to describe the watermark scheme.

Definition 1. The word frequency of a word denotes the ratio of the number of times the word appears in the text to the total number of words in the text.

Definition 2. Normalization is a process that makes the elements of a set of real numbers satisfy the following properties:
(a) every element lies in the interval (0, 1);
(b) the sum of all elements is 1.
The normalization process is as follows. Suppose C is a set of real numbers, $\forall x_i \in C$ $(1 \le i \le n)$, and $x_i'$ is the value of $x_i$ after normalization. Then

$$x_i' = \frac{x_i}{S} \qquad (1)$$

where $S = \sum_{i=1}^{n} x_i$ ($S \ne 0$) and $n = |C|$.

Definition 3. A sentence's entropy is its amount of average information after punctuation and other useless information have been trimmed. Suppose a sentence of T is $s = (w_1 w_2 \ldots w_n)$, where $w_i$ $(1 \le i \le n)$ is the i-th word of $s$. If $f_i$ is the word frequency of $w_i$ in the text, then the entropy of the sentence is

$$-\sum_{i=1}^{n} f_i' \ln f_i'$$

where $f_i' = f_i / \sum_{i=1}^{n} f_i$ is the word frequency $f_i$ after normalization.

Definition 4. Let A, B and C be word sets with the following properties:
(a) $A \ne \emptyset$;
(b) if $x \in A$ and $\exists y \in B$, x and y are near-synonyms or synonyms of each other;
(c) if $x \in C$, then $x \in A$, $\exists y \in B$, and x and y are near-synonyms or synonyms of each other.
The overlap ratio of A to B is r = |B||A|.

Definition 5. p is the proportion of selected sentences used to construct the watermark. If k sentences are selected to construct the watermark and the text has m sentences in total, then p is the ratio of k to m.

Definition 6. q is the pertinence of two watermarks, which measures the degree of similarity between them.

Definition 7. t is the degree to which two synonyms are similar to each other.

III. MODEL OF CHINESE TEXT ZERO-WATERMARK BASED ON SENTENCE'S ENTROPY

A. Principle of Zero-Watermark Construction

The semantics of Chinese are complex, variable in expression and difficult to process, but a text has its own thematic significance, which is the value of the text and the key factor the watermark should protect. The basic unit of a text is the sentence, which consists of words, and high-frequency words determine the theme and significance of the text. In general, a sentence that has more high-frequency words carries more information, and its entropy is relatively larger than that of other sentences. Therefore, we can calculate the entropy of each sentence from word frequencies and then choose the most important sentences to construct the watermark. The watermark is thus constructed in three parts:

(1) Pre-process the text. This is done first to reduce the workload and complexity of the whole scheme. The primary work of this part is to divide the text into sentences and words, so that we obtain the word set H1 and, from H1, the word frequency set H2.
(2) Construct the watermark. The primary work of this part is to compute the entropy of the sentences with the word frequencies H2 and to choose some of the most important sentences, whose information is used to construct the watermark.
(3) Register the watermark. Considering the watermark's authority, we introduce a third-party authoritative organization to register the watermark, as mentioned in [6].

According to the above parts, the model for constructing the watermark can be designed as shown in Figure 3.1.

Figure 3.1 Model of text zero-watermark's construction

B. Principle of Zero-Watermark Test

The primary task of the watermark test is to judge the watermark when a copyright dispute arises. Suppose the text mentioned in the dispute is T′.
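
As an illustration of Definitions 1 to 3 and of construction parts (1) and (2) in Section III.A, the following Python sketch computes word frequencies, normalizes them within each sentence, and ranks sentences by entropy. It is a minimal sketch under stated assumptions, not the authors' implementation: the text is assumed to be already segmented into sentences and words (the paper does not name a segmenter), the function names are hypothetical, and returning the indices of the top k = round(p * m) sentences in descending entropy order is only one plausible reading of "the order of crucial sentences".

```python
import math
from collections import Counter


def word_frequencies(sentences):
    """Definition 1: occurrences of each word divided by total words in the text."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def normalize(values):
    """Definition 2, Eq. (1): x_i' = x_i / S, where S is the sum of all x_i."""
    s = sum(values)
    if s == 0:
        raise ValueError("S must be non-zero")
    return [x / s for x in values]


def sentence_entropy(sentence, freq):
    """Definition 3: -sum(f_i' * ln f_i') over the words of one sentence."""
    if not sentence:
        return 0.0  # assumption: an empty sentence carries no information
    f_norm = normalize([freq[word] for word in sentence])
    return -sum(f * math.log(f) for f in f_norm)


def select_key_sentences(sentences, p):
    """Rank sentences by entropy and keep the top k = round(p * m) indices.

    Assumption: the selected indices, in descending entropy order, stand in
    for the watermark information; the paper's exact encoding is not shown here.
    """
    freq = word_frequencies(sentences)
    entropies = [sentence_entropy(s, freq) for s in sentences]
    k = max(1, round(p * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: entropies[i], reverse=True)
    return ranked[:k], entropies


# Toy input: three pre-segmented "sentences" (placeholder tokens, not real data).
toy_text = [["a", "b"], ["a", "a"], ["c"]]
selected, entropies = select_key_sentences(toy_text, p=0.67)
print(selected, entropies)
```

Because the frequencies are renormalized inside each sentence, a one-word sentence always has entropy 0, so in this sketch short sentences are unlikely to be selected regardless of how frequent their words are.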