Design and Realization of Search Engine Network Spider

Abstract

The Web holds a wealth of resources, but finding the relevant information efficiently is difficult. Building a search engine is the best way to address this problem. This thesis first describes in detail the system architecture of an Internet-based search engine, and then explains how to design and implement the engine's gatherer component: the web crawler (network spider).

The multithreaded crawler is a background program that runs automatically. Starting from specified Web pages, it parses and searches them in breadth-first order, fetches and saves every URL it finds, and then uses each of those URLs as a new entry point to keep crawling the Internet. The crawler relies mainly on sockets, regular expressions, the HTTP protocol, and Windows network programming. It is written in C++ and was debugged under VC6.0.

The chapter on the crawler's design and implementation explains the core techniques in detail and illustrates them with the implementation code of the multithreaded crawler, so that they are easy to follow. The resulting crawler runs in the background, reads its initial URLs from a configuration file, crawls downward in breadth-first order, saves the target URLs, and can carry out the web search tasks of an ordinary user.

Keywords: search engine; web crawler (network spider); URL searcher; multithreading

Contents

Abstract (Chinese)
Abstract (English)
Chapter 1  Introduction
    1.1  Project Background
    1.2  History and Classification of Search Engines
        1.2.1  History of Search Engines
        1.2.2  Classification of Search Engines
    1.3  Development Trends of Search Engines
    1.4  Components of a Search Engine
    1.5  Main Research Content of the Project
Chapter 2  Analysis of Key Web Crawler Techniques
    2.1  How the Web Crawler (Spider) Works
        2.1.1  The Concept of a Spider
        2.1.2  Analysis of the Content a Web Crawler Fetches
    2.2  The HTTP Protocol
        2.2.1  HTTP Requests
        2.2.2  HTTP Responses
        2.2.3  HTTP Message Headers
    2.3  Sockets
        2.3.1  What Is a Socket
        2.3.2  Analysis of the Socket Functions
    2.4  Regular Expressions
        2.4.1  Analysis of Regular Expression Applications
        2.4.2  Analysis of Regular Expression Metacharacters
    2.5  Chapter Summary
Chapter 3  Analysis and Outline Design of the Web Crawler System Model
    3.1  Analysis of Web Crawler Models
        3.1.1  Analysis of the Single-Threaded Crawler Model
        3.1.2  Analysis of the Multithreaded Crawler Model
        3.1.3  Analysis of the Crawler Cluster Model
    3.2  Analysis and Design of the Web Crawler's Search Strategy
    3.3  Analysis of the Main Performance Metrics of a Web Crawler
    3.4  Outline Design of the Web Crawler in This Thesis
Chapter 4  Detailed Design and Implementation of the Web Crawler
    4.1  Overall Design of the Web Crawler
    4.2  Design and Implementation of the Socket Module
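
The breadth-first crawl described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: it uses modern C++ (C++11 or later) rather than VC6.0, it is single-threaded, and it replaces the socket/HTTP fetch and the regular-expression link extraction with a hypothetical in-memory map, LinkGraph, from each URL to the links on that page. The names LinkGraph, crawl_breadth_first, and max_pages are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the network layer: maps a URL to the links
// found on that page. A real crawler would fetch and parse the page instead.
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first crawl starting at `seed`, visiting at most `max_pages` URLs.
// Returns the URLs in the order they were visited.
std::vector<std::string> crawl_breadth_first(const LinkGraph& web,
                                             const std::string& seed,
                                             std::size_t max_pages) {
    assert(!seed.empty());  // precondition: the seed URL must be non-empty

    std::vector<std::string> visited_order;
    std::set<std::string> seen;        // URLs already queued or visited
    std::queue<std::string> frontier;  // FIFO queue gives breadth-first order

    seen.insert(seed);
    frontier.push(seed);

    while (!frontier.empty() && visited_order.size() < max_pages) {
        std::string url = frontier.front();
        frontier.pop();
        visited_order.push_back(url);  // "fetch and save" the URL

        auto it = web.find(url);
        if (it == web.end()) continue;  // no known outgoing links

        // Each newly discovered URL becomes a new entry point for the crawl.
        for (const std::string& link : it->second) {
            if (seen.insert(link).second) frontier.push(link);
        }
    }
    return visited_order;
}
```

Calling crawl_breadth_first(web, seed, max_pages) with a seed URL returns that URL first, then every page it links to, then the pages those link to, and so on. The seen set keeps a URL from being queued twice, and max_pages bounds the crawl so that it terminates on a large or cyclic link graph.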
Document title: VC++ Search Engine Web Crawler Design and Implementation
Link: https://www.777doc.com/doc-4335346 .html