EMC Greenplum Feature Overview
EMC Data Computing Division
Nov 2010
© Copyright 2010 EMC Corporation. All rights reserved.

Agenda
• Greenplum architecture
• Product features
• Product strengths

Greenplum Database
• Massively Parallel Processing (MPP) DBMS
• Based on PostgreSQL 8.2
  - Same client functionality
  - Adds technology to support parallel processing
  - Adds features for data warehousing and BI:
    external tables / parallel loading, resource management, query optimizer enhancements
• Runs on any open x86 PC server and supports multiple Linux platforms

Physical connection diagram
[Figure. Labels: X4200 (Master Host); X4500 (Segment Host 1, Standby Master); X4500 (Segment Host 2); network interfaces e1000g0 to e1000g5 and iLOM; switches SMC8748M (two) and Catalyst 2960; subnets 172.16.0, 172.16.1, 172.16.2, 172.16.3 and 192.168.; local LAN/WAN as locally defined.]

Greenplum architecture
MPP (Massively Parallel Processing) Shared-Nothing Architecture
• Master Servers: generate and dispatch query plans; gather the execution results
• Network Interconnect
• Segment Servers: execute query plans; manage data storage
• Interfaces: SQL, MapReduce
• External Sources: parallel load or export

Greenplum basic system architecture
[Figure: a Client connects over the LAN to the Master Host; the Master Host connects to four Segment Hosts through the Interconnect, a Gigabit Ethernet switch.]

Client interfaces and programs
• psql
• pgAdmin III
• ODBC
• JDBC
• Perl DBI
• Python
• libpq

Master Host
• Entry point to the system
• Runs the database listener process (postgres)
• Handles all user connections
• Builds query plans
• Coordinates query processing
• Hosts the management tools
• Holds the system catalog tables and metadata (data dictionary)
• Stores no user data

Segments
• Each segment stores a portion of the user data
• A system can have many segments
• Users cannot access segments directly
• All access to segments goes through the Master
• A database listener process (postgres) on each segment listens for connections from the Master

Interconnect
• The connection layer between Greenplum database instances
• Handles inter-process coordination and management
• Built on Gigabit Ethernet
• Configured as a private network internal to the system
• Supports two protocols: TCP or UDP

Private Computing Cloud
Cloud computing characteristic: Greenplum characteristic
• Very large scale: massively parallel, shared-nothing architecture supporting more than 1,000 nodes
• Virtualization: supports mainstream virtualization technologies
• High reliability: multi-level fault tolerance, highly reliable mirroring
• Generality: supports a wide range of tools, development frameworks and interfaces
• High scalability: adding nodes scales performance and storage capacity linearly
• On-demand service: multi-dimensional workload management, resources allocated on demand
• Low total cost: x86 PC servers, lower TCO

Parallel dataflow engine: the core MPP technology
[Figure. Components: ODBC, JDBC, etc.; Query Planner and optimizer (SQL); MapReduce code (Python, Perl, etc.); parallel dataflow engine; transaction manager and log files; database storage; external storage.]
• Implemented on a native MapReduce model; the slide claims it is tens of times faster than traditional approaches
• All SQL logic can be executed in parallel
• Data is loaded and exported in parallel
• Data is backed up and restored in parallel

Even data distribution: the key to parallel processing
Order #   Order Date    Customer ID
43        Oct 20 2005   12
64        Oct 20 2005   11
45        Oct 20 2005   42
46        Oct 20 2005   64
77        Oct 20 2005   32
48        Oct 20 2005   12
50        Oct 20 2005   34
56        Oct 20 2005   21
63        Oct 20 2005   15
44        Oct 20 2005   10
53        Oct 20 2005   82
55        Oct 20 2005   55
• Data is spread evenly across every disk
• Every disk contributes its full performance, which removes the I/O bottleneck at its root
• Supports both data distribution and partitioning

Table distribution strategies
Hash distribution
• CREATE TABLE … DISTRIBUTED BY (column [,…])
• Rows with the same key value are assigned to the same segment
Round-robin distribution
• CREATE TABLE … DISTRIBUTED RANDOMLY
• Rows with the same value are not necessarily placed on the same segment
[Figure: the master takes input rows labelled A, B and C. Under hash distribution, all A rows land on one of segments S1, S2, S3, all B rows on another and all C rows on the third. Under round-robin distribution, each row goes to the next segment in turn, so A, B and C rows are mixed across S1, S2 and S3.]

Distributed storage
Tables:
• sale (cn integer, vn integer, pn integer, dt date, qty integer, prc float)
• customer (cn integer, cname text)
• vendor (vn integer, vname text, loc text)
• product (pn integer, pname text)
Placement:
• master: global catalog
• segment 1: sale part 1, customer part 1, product part 1, vendor part 1
• segment 2: sale part 2, customer part 2, product part 2, vendor part 2
• segment 3: sale part 3, customer part 3, product part 3, vendor part 3

Balanced data distribution
[Figure: data sources 1, 2 and 3 feed segments 1 through 8.]
• Data is loaded in parallel and hash-distributed as it is loaded

Query execution
[Figure: the Client sends a query to the Master, which sends the query plan to the Segments.]

Parallel query plan
SELECT customer, amount FROM sales JOIN customer USING (cust_id) WHERE date = 04302008;
[Figure: the plan is split into three slices, and each of SEGMENT 1 and SEGMENT 2 runs the same slices. The operators shown per segment are Table Scan, Redistribute Motion, Table Scan, Hash, Hash Join and Gather Motion, grouped into SLICE 1, SLICE 2 and SLICE 3.]

Data distribution and partitioning
[Figure: segments 1A to 1D, 2A to 2D and 3A to 3D, each holding monthly partitions from Jan 2005 to Dec 2005.]
• The data of every partition is distributed across all nodes automatically
• Table partitioning narrows the range of data that must be searched, which improves query performance

Full Table Scan vs Partition Pruning
SELECT COUNT(*) FROM orders WHERE order_date >= 'Oct 20 2005' AND order_date < 'Oct 27 2005'
[Figure: two grids of segments 1A to 3D. With Hash Distribution alone, every segment scans its whole share of the table. With Hash Distribution + Table Partitioning, every segment scans only the matching partition.]

[Figure: a client connects to the master host, which runs the master instance; a synch process keeps the standby master on the standby master host up to date. Over gigabit ethernet (private LAN), three segment hosts each hold a primary segment and a mirror segment.]
Gre
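The two table distribution strategies described above (hash distribution with DISTRIBUTED BY, round-robin with DISTRIBUTED RANDOMLY) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `zlib.crc32` stands in for whatever hash function Greenplum really uses, and the function names and the ten sample rows keyed A, B and C are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import cycle


def distribute_by_hash(rows, key, num_segments):
    """Place each row on the segment chosen by hashing its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        # Assumption: crc32 is a stand-in, not Greenplum's actual hash function.
        seg = zlib.crc32(str(row[key]).encode("utf-8")) % num_segments
        segments[seg].append(row)
    return dict(segments)


def distribute_round_robin(rows, num_segments):
    """Place each row on the next segment in turn, ignoring its contents."""
    segments = defaultdict(list)
    for seg, row in zip(cycle(range(num_segments)), rows):
        segments[seg].append(row)
    return dict(segments)


# Hypothetical input rows keyed A, B and C.
rows = [{"id": i, "key": k} for i, k in enumerate("ABCCABACAC")]
hashed = distribute_by_hash(rows, "key", 3)
rr = distribute_round_robin(rows, 3)

for name, layout in (("hash", hashed), ("round-robin", rr)):
    for seg in sorted(layout):
        print(name, "segment", seg, [r["key"] for r in layout[seg]])
```

With hashing, every row that shares a key value ends up on one segment, which is what lets joins on that key run locally. Round-robin gives the most even row counts but scatters equal keys across segments.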
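The master and segment roles described above (the master dispatches work, the segments execute it on their own share of the data, the master gathers the results) can be sketched as a scatter and gather over in-memory lists. This is a toy model, not Greenplum's executor: the threads, the function names and the sample rows are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_count(segment_rows, predicate):
    """Work done on one segment: count local rows that match the predicate."""
    return sum(1 for row in segment_rows if predicate(row))


def master_count(segments, predicate):
    """Master side: dispatch the count to every segment, then gather the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(
            pool.map(lambda rows: segment_count(rows, predicate), segments)
        )
    return sum(partials), partials


# Hypothetical data: three segments, each holding part of one table.
segments = [
    [{"cust_id": 12}, {"cust_id": 34}],
    [{"cust_id": 11}, {"cust_id": 12}],
    [{"cust_id": 42}],
]
total, partials = master_count(segments, lambda r: r["cust_id"] == 12)
print(total, partials)
```

The master never touches user rows itself; it only combines the per-segment partial counts, which mirrors the slide's point that the master stores no user data.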
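Partition pruning, contrasted above with a full table scan, can be shown with a dictionary of monthly partitions: a date-range query only needs to read the partitions whose month overlaps the range. The sketch below assumes monthly partitions for 2005 as in the slide; the half-open date range, the helper names and the sample orders are assumptions of this example.

```python
from datetime import date


def month_partitions(year):
    """Create one empty partition per month of the given year."""
    return {(year, m): [] for m in range(1, 13)}


def insert(partitions, row):
    """Route a row to the partition for its order month."""
    d = row["order_date"]
    partitions[(d.year, d.month)].append(row)


def partitions_to_scan(partitions, start, end):
    """Return the partitions that can hold dates in the range [start, end)."""
    wanted = []
    for (y, m) in partitions:
        first = date(y, m, 1)
        next_first = date(y + (m == 12), m % 12 + 1, 1)
        if first < end and start < next_first:
            wanted.append((y, m))
    return wanted


def count_orders(partitions, start, end):
    """Count matching rows, reading only the partitions that survive pruning."""
    keys = partitions_to_scan(partitions, start, end)
    matches = sum(
        1 for k in keys for r in partitions[k] if start <= r["order_date"] < end
    )
    return matches, len(keys)


parts = month_partitions(2005)
for month in range(1, 13):
    insert(parts, {"order_date": date(2005, month, 21)})
insert(parts, {"order_date": date(2005, 10, 25)})

matches, scanned = count_orders(parts, date(2005, 10, 20), date(2005, 10, 27))
print(matches, "matching orders found by scanning", scanned, "of", len(parts), "partitions")
```

Without partitioning, the same count would have to examine all thirteen rows; with pruning, only the October partition is read. In the slide's setup each partition is additionally hash-distributed, so every segment prunes its own share in the same way.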
Document title: EMC Greenplum Features and Strengths