New Technologies for a Japan-China Shared Chinese-Character Document Database (日中共同汉字文献资料库的新技术)
21COE “[...]” 200512244 SUN Yigang 2005717

Table of Contents

YASUOKA Koichi: Text-Searchable Image and Its Applications
SUN Wei: [...]
Christian WITTERN: From Text to Information – Small Steps towards a Knowledgebase of Tang Civilization
NIU Zhendong: [...]
MORIOKA Tomohiko: Character Processing Based on Character Ontology
ZHAI Xikui: [...]
A. Charles MÜLLER: Using XML for Storage and Delivery of an Online Dictionary of Buddhism – the Digital Dictionary of Buddhism

Text-Searchable Image and Its Applications

Koichi Yasuoka
Documentation and Information Center for Chinese Studies, Institute for Research in Humanities, Kyoto University

1 Introduction

Since the 1996 proposal of the Council for Science, the university libraries in Japan have been advancing “The Digital Library Project”. Today the union catalogue database of the university libraries (NACSIS-CAT) is almost complete, and we can easily find any book or magazine in the libraries through the database on the Internet. But we are still far from the goal of “The Digital Library Project”, which is the digitization of all the books and magazines in the libraries. The university libraries have only put images of their rare books on display, without their digital texts, digital tables of contents, or digital indices. The digital libraries in Japan are at present not “libraries” but something like “museums”, since they do not give us a way to “read” the books digitally.

In this paper the author presents the concept of text-searchable images and its applications. The author shows two formats, Portable Document Format and Scalable Vector Graphics, for realizing text-searchable images, and also shows a JavaScript-based program, “ttext-kanbun”, that produces text-searchable images in these formats. The author offers this paper as a contribution toward the true progress of the digital “libraries”.

2 Text-Searchable Images

In this section we examine two formats, Portable Document Format (PDF) and Scalable Vector Graphics (SVG), for realizing text-searchable images.

2.1 PDF for Text-Searchable Images

The author has long studied text-searchable images using PDF [2], and Adobe adopted some results of that study into PDF-1.4 [3] as “transparent text”. We now have two ways to realize text-searchable images using PDF. One is to put a transparent text over an image; the other is to put an image over a text written in white characters. The former works only with viewers that support PDF-1.4 and later, the latter with viewers that support PDF-1.2 and later. In this paper we use the latter for backward compatibility.

PDF can represent both images and texts, but its format has some limitations. PDF supports only two compression methods for color images, JPEG and ZIP. PDF supports several character sets for CJK texts: Adobe-Japan1-6 [7] (including 14663 characters), Adobe-GB1-4 [1] (including 27629 characters), Adobe-CNS1-4 [5] (including 17625 characters), and Adobe-Korea1-2 [6] (including 4620 characters), for Japanese, mainland Chinese, Taiwanese, and Korean environments, respectively. We need the “Japanese Language Pack” to read and search PDFs written in the Adobe-Japan1-6 character set, and likewise for the mainland Chinese, Taiwanese, and Korean character sets. This means that these character sets are incompatible with one another, and that PDFs for text-searchable images cannot, in practice, cross national borders.

In this paper we use JPEG for color images and the Adobe-Japan1-6 character set for texts to produce text-searchable images with PDF.

2.2 SVG for Text-Searchable Images

Tomohiko Morioka has studied text-searchable images using SVG [4]. He realized a text-searchable image by putting an image over a text. In this paper, however, we put a transparent text over an image to realize a text-searchable image using SVG.

SVG can include both images and texts, but the most current viewer, “Adobe SVG Viewer 3.0”, has some limitations. SVG supports any image format for color images, but the viewer supports only JPEG, PNG, and GIF. SVG supports any text encoding but prefers UTF-8.

In this paper we use JPEG for color images and UTF-8 for texts to produce text-searchable images with SVG.

3 Experiment and Result

The author wrote a JavaScript-based program, “ttext-kanbun”, to produce text-searchable images using PDF or SVG. “ttext-kanbun” runs on Internet Explorer 6 under Microsoft Windows XP.

We, members of the COE21 project at the Institute for Research in Humanities, Kyoto University, tried to make text-searchable images of [...] (ex-[...] collection) with “ttext-kanbun” (Figure 1). We prepared 319 JPEG images for [...], where each image has 2100×1950 pixels and the total size of all images is 196807821 bytes, together with its text written in UTF-8, consisting of 104725 characters (3138 different).

Figure 1: Snapshot of “ttext-kanbun”

First, we produced text-searchable images using PDF (Figure 2). The total size of the 319 PDF files was 202662390 bytes, a 2.97% increase over the original JPEG images. We could not write 390 of the 104725 characters using PDF, since they were not included in Adobe-Japan1-6. The 390 characters consisted of 51 different characters, shown in Table 1. Then we combined the 319 PDF files into a multi-page PDF. The file size of the combined PDF was 202440575 bytes, a 2.86% increase over the original JPEG images.

Second, we produced text-searchable images using SVG (Figure 3). The tot

Figure 2: Searching “[...]” on “Adobe Reader 6.0”
Table 1: Characters not in Adobe-Japan1-6
Figure 3: Searching “[...]” on “Adobe SVG Viewer 3.0”
Table 2: Invisible characters on “Adobe SVG Viewer 3.0”
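
The SVG approach of Section 2.2 (a transparent text placed over a JPEG image, with the file encoded in UTF-8) can be illustrated with a minimal Python sketch. This is not the code of “ttext-kanbun”, which the paper describes only as a JavaScript-based program. The function name make_searchable_svg, the file name page001.jpg, the sample string, and the coordinates are hypothetical. The sketch assumes that fill-opacity="0" is an acceptable way to make the text invisible; how “Adobe SVG Viewer 3.0” treats such text has not been checked here. Vertical text layout is not handled.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Serialize SVG as the default namespace and xlink with its usual prefix.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def make_searchable_svg(image_href, width, height, text_runs):
    """Return UTF-8 bytes of one SVG page: image below, invisible text on top.

    text_runs is a list of (x, y, font_size, text) tuples in pixel units.
    (Hypothetical helper for illustration only.)
    """
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    # The page image is drawn first, so it sits underneath the text.
    ET.SubElement(svg, f"{{{SVG_NS}}}image", {
        f"{{{XLINK_NS}}}href": image_href,
        "x": "0", "y": "0",
        "width": str(width), "height": str(height),
    })
    # Each run of text is drawn after the image, fully transparent,
    # so it is present in the file but not painted visibly.
    for x, y, size, text in text_runs:
        run = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": str(x), "y": str(y),
            "font-size": str(size),
            "fill-opacity": "0",
        })
        run.text = text
    return ET.tostring(svg, encoding="utf-8", xml_declaration=True)
```

For a 2100×1950 page image, a call such as make_searchable_svg("page001.jpg", 2100, 1950, [(1900, 200, 48, "子曰學而時習之")]) returns the bytes of a single SVG page that can be written to a file. Drawing the text after the image corresponds to the “transparent text over an image” ordering used in this paper, as opposed to the image-over-text ordering attributed to Morioka.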