LinkedIn's Data Processing Architecture
Official conference website and materials download: Global Architect Summit (全球架构师峰会)

Data Infrastructure at LinkedIn
Lei Gao

Outline
§ LinkedIn Products
§ Data Ecosystem
§ LinkedIn Data Infrastructure Solutions
§ Next Play

LinkedIn By The Numbers
§ 150M+ users*
§ ~4.2B People Searches in 2011**
§ 2M companies with LinkedIn Company Pages**
§ 16 languages
§ 75% of Fortune 100 Companies use LinkedIn to hire***
*As of February 9, 2012
**As of December 31, 2011
***As of September 30, 2011

Broad Range of Products & Services

User Profiles: Large dataset, Medium writes, Very high reads, Freshness 1s
Communications: Large dataset, High writes, High reads, Freshness 1s
People You May Know: Large dataset, Compute intensive, High reads, Freshness ~hrs
LinkedIn Today: Moving dataset, High writes, High reads, Freshness ~mins

Three Paradigms: Simplifying the Data Continuum
Online: activity that should be reflected immediately
• Member Profiles
• Company Profiles
• Connections
• Communications
Nearline: activity that should be reflected soon
• LinkedIn Today
• Profile Standardization
• News
• Recommendations
• Search
• Communications
Offline: activity that can be reflected later
• People You May Know
• Connection Strength
• News
• Recommendations
• Next best idea

LinkedIn Product Architecture (diagram)

LinkedIn Data Infrastructure Solutions

Databus: Timeline-Consistent Change Data Capture

Databus at LinkedIn
(Diagram: a Relay captures changes from the source DB into an event window and serves on-line changes to clients. A Bootstrap service serves a compressed delta since T or a consistent snapshot at U. In each client, the Databus Client Lib dispatches events to Consumer 1 … Consumer n.)
§ Transport independent of data source: Oracle, MySQL, …
§ Transactional semantics
§ In-order, at-least-once delivery
§ Tens of relays
§ Hundreds of sources
§ Low latency: milliseconds

Voldemort: Highly-Available Distributed KV Store
• Pluggable components
• Tunable consistency/availability
• Key/value model, server-side "views"
• 10 clusters, 100+ nodes
• Largest cluster: 10K+ qps
• Avg latency: 3ms
• Hundreds of stores
• Largest store: 2.8TB+

Voldemort: Architecture (diagram)

Kafka: High-Volume Low-Latency Messaging System

Kafka: Architecture
(Diagram: the Web Tier pushes events to Topic 1 … Topic N in the Broker Tier, which relies on sequential writes and sendfile. Consumers pull events through Iterator 1 … Iterator n of the Kafka Client Lib. Zookeeper handles offset management (Topic → Offset) and topic/partition ownership. Throughput figures shown: 100 MB/sec and 200 MB/sec.)
§ Billions of events, TBs per day
§ 50K+ per sec at peak
§ Inter- and intra-cluster replication
§ End-to-end latency: a few seconds
§ At-least-once delivery
§ Very high throughput
§ Low latency
§ Durability

Espresso: Indexed Timeline-Consistent Distributed Data Store

Application View
Hierarchical data model
Rich functionality on resources
✓ Conditional updates
✓ Partial updates
✓ Atomic counters
Rich functionality within resource groups
✓ Transactions
✓ Secondary index
✓ Text search

Partitioning
Espresso Partition Layout: Master, Slave
3 Storage Engine nodes, 2-way replication
Node 1: Master P.1, P.2, P.3, P.4; Slave P.5, P.6, P.9, P.10
Node 2: Master P.5, P.6, P.7, P.8; Slave P.1, P.2, P.11, P.12
Node 3: Master P.9, P.10, P.11, P.12; Slave P.3, P.4, P.7, P.8
(Diagram: the Cluster Manager records which node hosts each partition (Partition: P.1, Node: 1 … Partition: P.12, Node: 3) and each database node's partition states (Node: 1, M: P.1 Active … S: P.5 Active …).)

Espresso: System Components (diagram)

Generic Cluster Manager: Helix
• Generic Distributed State Model
• Centralized Config Management
• Automatic Load Balancing
• Fault tolerance
• Health monitoring
• Cluster expansion and rebalancing
• Espresso, Databus and Search
• Open Source Apr 2012

@LinkedIn
§ Launched first application Oct 2011
§ Open source 2012
§ Future
– Multi-Datacenter support
– Global secondary indexes
– Time-partitioned data

Acknowledgments
Siddharth Anand, Aditya Auradkar, Chavdar Botev, Vinoth Chandar, Shirshanka Das, Dave DeMaagd, Alex Feinberg, John Fung, Phanindra Ganti, Mihir Gandhi, Lei Gao, Bhaskar Ghosh, Kishore Gopalakrishna, Brendan Harris, Rajappa Iyer, Swaroop Jagadish, Joel Koshy, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Subbu Subramaniam, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Cuong Tran, Balaji Varadarajan, Jemiah Westerman, Zach White, Victor Ye, David Zhang, and Jason Zhang

Questions?

Hangzhou, October 25 to 27, 2012. Conference website:
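The Databus slides describe several guarantees:

- in-order, at-least-once delivery with transactional semantics;
- events served from a relay's in-memory event window;
- a bootstrap service for clients that fall behind that window.

The Python below is a minimal in-memory sketch of that consumption pattern, under stated assumptions. It is not the Databus client API. The classes `Relay`, `Bootstrap`, and `Consumer`, the `scn` sequence number, and the window size are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    scn: int  # hypothetical commit sequence number shared by all rows of a transaction
    key: str
    value: str


class Relay:
    """Hypothetical relay: buffers only the most recent `window` transactions."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.txns: deque = deque()
        self.evicted_scn = 0  # highest scn that has already left the window

    def capture(self, txn: list) -> None:
        self.txns.append(txn)  # a transaction enters the window as one unit
        if len(self.txns) > self.window:
            self.evicted_scn = self.txns.popleft()[0].scn

    def read_since(self, scn: int) -> list:
        if scn < self.evicted_scn:
            raise LookupError("checkpoint is older than the relay's event window")
        return [txn for txn in self.txns if txn[0].scn > scn]


class Bootstrap:
    """Hypothetical bootstrap store: folds every change into a consistent snapshot."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.scn = 0

    def capture(self, txn: list) -> None:
        for change in txn:
            self.data[change.key] = change.value
        self.scn = txn[0].scn

    def snapshot(self) -> tuple:
        return self.scn, dict(self.data)  # state as of self.scn


class Consumer:
    def __init__(self) -> None:
        self.state: dict = {}
        self.checkpoint = 0

    def apply(self, txn: list) -> None:
        for change in txn:
            self.state[change.key] = change.value  # upsert: replaying a txn is harmless
        self.checkpoint = txn[0].scn  # advance only after the whole txn is applied

    def poll(self, relay: Relay, bootstrap: Bootstrap) -> None:
        try:
            txns = relay.read_since(self.checkpoint)
        except LookupError:
            # Fell behind the window: load a snapshot, then resume from the relay.
            self.checkpoint, self.state = bootstrap.snapshot()
            txns = relay.read_since(self.checkpoint)
        for txn in txns:  # delivered in scn order
            self.apply(txn)


relay, bootstrap = Relay(window=2), Bootstrap()
for scn, (key, value) in enumerate([("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")], start=1):
    txn = [Change(scn, key, value)]
    relay.capture(txn)
    bootstrap.capture(txn)

late = Consumer()  # checkpoint 0 has already been evicted from the relay
late.poll(relay, bootstrap)
print(late.checkpoint, late.state)
```

Because delivery is at-least-once, the sketch makes `apply` an idempotent upsert. It also advances the checkpoint only after a whole transaction has been applied, so a redelivered transaction leaves the state unchanged.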
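The Voldemort slide lists tunable consistency/availability. Voldemort follows the Amazon Dynamo design:

- each key is stored on N replicas;
- a read waits for R replicas and a write waits for W;
- choosing R + W > N makes every read quorum overlap the latest write quorum.

The sketch below assumes that model. For brevity it versions values with a plain integer instead of the vector clocks Voldemort actually uses. `QuorumStore` and the `reachable` parameter, which simulates which replicas respond, are hypothetical.

```python
class QuorumStore:
    """Hypothetical Dynamo-style store with N replicas, R required reads, W required writes."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("R and W must each be between 1 and N")
        self.replicas = [dict() for _ in range(n)]  # each maps key -> (version, value)
        self.r, self.w = r, w

    def strongly_consistent(self) -> bool:
        # Every set of R replicas intersects every set of W replicas iff R + W > N.
        return self.r + self.w > len(self.replicas)

    def put(self, key: str, value: str, version: int, reachable: list) -> None:
        # `reachable` lists the replica indices that respond to this request.
        if len(reachable) < self.w:
            # Simplification: reject up front instead of leaving a partial write behind.
            raise RuntimeError(f"{len(reachable)} acks < W={self.w}")
        for i in reachable:
            self.replicas[i][key] = (version, value)

    def get(self, key: str, reachable: list):
        if len(reachable) < self.r:
            raise RuntimeError(f"{len(reachable)} responses < R={self.r}")
        answers = [self.replicas[i].get(key) for i in reachable[: self.r]]
        found = [a for a in answers if a is not None]
        return max(found)[1] if found else None  # highest version wins


# N=3, R=2, W=2: R + W > N, so a read always overlaps the latest successful write.
strict = QuorumStore(n=3, r=2, w=2)
strict.put("member:42", "v1", version=1, reachable=[0, 1, 2])
strict.put("member:42", "v2", version=2, reachable=[0, 1])  # replica 2 misses the update
print(strict.get("member:42", reachable=[1, 2]))  # v2 (replica 1 has it)

# N=3, R=1, W=1: faster and more available, but a read can miss the latest write.
fast = QuorumStore(n=3, r=1, w=1)
fast.put("member:42", "v1", version=1, reachable=[0, 1, 2])
fast.put("member:42", "v2", version=2, reachable=[0])
print(fast.get("member:42", reachable=[2]))  # v1 (stale)
```

Lowering R or W trades read-your-writes consistency for latency and availability, which is the kind of per-store tuning the slide refers to.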
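The Kafka architecture slide shows the following flow:

- producers push events to brokers that use sequential writes;
- consumers pull events through iterators;
- Zookeeper tracks each consumer's Topic → Offset position;
- the system promises at-least-once delivery.

The following Python is a minimal sketch of why committing the offset only after processing yields at-least-once semantics. It is not the Kafka client API. `PartitionLog`, `OffsetStore` (a stand-in for Zookeeper), `consume_batch`, and the group/topic names are hypothetical.

```python
class PartitionLog:
    """Hypothetical broker partition: an append-only log addressed by offset."""

    def __init__(self) -> None:
        self.messages: list = []

    def append(self, message: str) -> int:
        self.messages.append(message)  # sequential append only
        return len(self.messages) - 1  # offset of the new message

    def fetch(self, offset: int, max_messages: int) -> list:
        return self.messages[offset : offset + max_messages]


class OffsetStore:
    """Stand-in for Zookeeper offset management: (group, topic, partition) -> next offset."""

    def __init__(self) -> None:
        self.offsets: dict = {}

    def get(self, key: tuple) -> int:
        return self.offsets.get(key, 0)

    def commit(self, key: tuple, offset: int) -> None:
        self.offsets[key] = offset


def consume_batch(log, store, key, handle, max_messages=2) -> int:
    """Pull one batch starting at the committed offset; commit only after it is handled."""
    start = store.get(key)
    batch = log.fetch(start, max_messages)
    for message in batch:
        handle(message)  # if this raises, the offset below is never committed
    store.commit(key, start + len(batch))
    return len(batch)


seen: list = []
crashed: set = set()


def handle(message: str) -> None:
    seen.append(message)
    if message == "page-view:3" and message not in crashed:
        crashed.add(message)  # simulate one crash after processing, before the commit
        raise RuntimeError("consumer crashed")


log, store = PartitionLog(), OffsetStore()
for event in ["page-view:1", "page-view:2", "page-view:3"]:
    log.append(event)

key = ("example-group", "page-views", 0)  # hypothetical group, topic, partition
consume_batch(log, store, key, handle)  # handles offsets 0-1, commits 2
try:
    consume_batch(log, store, key, handle)  # handles offset 2, then crashes
except RuntimeError:
    pass
consume_batch(log, store, key, handle)  # restarts from offset 2, so page-view:3 is redelivered
print(seen)  # page-view:3 appears twice
```

Committing before processing would give at-most-once delivery instead: a crash would then skip the message rather than repeat it.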
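The partitioning slide shows 12 partitions (P.1 to P.12) on 3 storage engine nodes with 2-way replication, each partition having one master and one slave. The Helix slide lists automatic load balancing and fault tolerance. The sketch below illustrates those two ideas in plain Python: a balanced master/slave placement, and slave promotion when a node fails.

This is not Helix's API or its rebalancing algorithm. `layout`, `fail_node`, and the 0-based node numbering are hypothetical. The placement satisfies the same constraints as the slide but does not reproduce its exact assignment.

```python
from collections import Counter


def layout(num_partitions: int, num_nodes: int) -> dict:
    """Hypothetical placement: each partition gets a master and one slave on different nodes."""
    if num_nodes < 2:
        raise ValueError("2-way replication needs at least 2 nodes")
    per_node = max(1, num_partitions // num_nodes)
    plan = {}
    for i in range(num_partitions):
        master = min(i // per_node, num_nodes - 1)  # contiguous blocks of masters per node
        slave = (master + 1 + i % (num_nodes - 1)) % num_nodes  # never equal to master
        plan[f"P.{i + 1}"] = {"master": master, "slave": slave}
    return plan


def fail_node(plan: dict, dead: int) -> dict:
    """Promote the slave of every partition mastered on `dead`; drop replicas hosted there."""
    new_plan = {}
    for partition, roles in plan.items():
        master, slave = roles["master"], roles["slave"]
        if master == dead:
            master, slave = slave, None  # slave becomes master; no replica left for now
        elif slave == dead:
            slave = None
        new_plan[partition] = {"master": master, "slave": slave}
    return new_plan


plan = layout(num_partitions=12, num_nodes=3)  # nodes are numbered 0, 1, 2 here
print(Counter(r["master"] for r in plan.values()))  # 4 masters per node
print(Counter(r["slave"] for r in plan.values()))  # 4 slaves per node

after = fail_node(plan, dead=0)
print(after["P.1"])  # P.1's former slave now serves as master
```

A real cluster manager would also assign new slaves to the promoted partitions once capacity allows. The sketch leaves them unreplicated (`slave` is `None`) to keep the failover step visible.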