HBase
Ryan Rawson
Sr Developer @ SU, HBase committer
June 11th, NOSQL

Quick Backstory
• Needed large data store @ SU
• Started looking back in Jan ‘09
• Looked at the field of stores, tried:
  – Cassandra
  – Hypertable (fast)
  – HBase
• Ended up picking HBase

NOSQL Meetup

Now
• Personally rewritten large portions of HBase for 0.20
  – Code easy to work with, understand, modify
• Recently voted to committer status (thanks!)
• Now giving presentations (hi!)

Four Point Agenda
• What is HBase?
• Why HBase?
• HBase 0.20
• HBase At Stumbleupon

What is HBase?
• Clone of Bigtable: http://labs.google.com/papers/bigtable.html
• Created originally at Powerset in 2007
• Hadoop subproject
  – The usual ASF things apply (license, JIRA, etc)

What is HBase?
• Column-oriented semi-structured data store
• Distributed over many machines
  – Bigtable known to scale to 1000 nodes
• Tolerant of machine failure
• Layered over HDFS (& KFS)
• Strong consistency (important)

Table & Regions
• Rows stored in byte-lexicographic sorted order
• Table dynamically split into “regions”
• Each region contains values [startKey, endKey)
• Regions hosted on a regionserver

[Figure: Table & Regions]

Column Storage
• In HBase, don’t think of a spreadsheet:
  – All columns same ‘size’ and present (as NULL)

Column Storage
• Instead think of tags. Values any length, no predefined names or widths:
  – Column names carry info (just like tags)

Column Families
• Table consists of 1+ “column families”
• Column family is unit of performance tuning
• Stored in separate set of files
• Column names scoped like so:
  – “Family:qualifier”

Sorting
• Rows stored in byte-lexicographic order (row keys are raw bytes, not just strings)
• Furthermore, within a row, columns stored in sorted order
• Fast, cheap, and easy to scan adjacent rows & columns

Sorting (but there’s more!)
• Not just scanning, but can do partial-key lookups
• When combined with compound keys, has the same properties as leading-left edge indexes in standard RDBMS
  – (Except your index is distributed of course)
• Can use a second table to index a primary table.

Values
• Row id, column name, value all byte[]
• Can store ASCII, any binary, or use serialization (eg: thrift, protobuf)
• Atomic increments available
• Serialization good for structs that are always read in one unit (eg: Address book entry)

Values & Versions
• Each row id + column stored with timestamp
• HBase stores multiple versions
• Can be useful to recover data due to bugs!
• Use to detect write conflicts/collisions

API Example
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // startRow and endRow are byte[] row keys; the scan covers [startRow, endRow)
    Scan scan = new Scan(startRow, endRow).addFamily(Bytes.toBytes("default"));
    ResultScanner scanner = table.getScanner(scan);
    Result result;
    while ((result = scanner.next()) != null) {
        Entity e = new Entity();
        // dser is the application's own deserializer; column "default:0" holds the serialized value
        dser.deserialize(e, result.getValue(Bytes.toBytes("default"), Bytes.toBytes("0")));
    }
    scanner.close();

Why HBase?
• Community is highly active, diverse, helpful
• User list email activity for May: 78 threads
• IRC channel #hbase highly active
• Helpful people in multiple timezones, email answered all hours of the day/night/weekend.

Why HBase?
• Committer & contributor base broad:
  – PSet, Streamy, SU, Trend Micro, Openplaces, and more!
• No monopoly on experts: deep knowledge at these companies and more!
• (We’re really friendly… honest!)

Why HBase?
• Used in production at many companies
• 12 companies listed on http://wiki.apache.org/hadoop/Hbase/PoweredBy
• Openplaces, Streamy, SU serve websites out of HBase
• Lots of experience to draw upon!

Why HBase? (Features)
• Full web management/monitoring UI (master & regionservers)
• Push metrics to log files & Ganglia
• Rolling upgrades possible! (Including master!)
• Non-SQL shell: reinforces the non-SQL-ness of HBase

HBase Features
• Easy integration with Hadoop MR
  – Table input and output formats ship
• Cascading connectors for input and output
• Other ancillary open source activities around the edges (ORM, schema management, etc)

Why HBase?
• But… HBase is slow!
• That metabrew/last.fm blog post said so!
  – (Also other people too…)
• “It’s much more than a KV store, but latency is too great to serve data to the website.”
• Answer: 0.20

HBase 0.20
• Two major and exciting themes:
• #1: Performance
• #2: ZooKeeper integration, multiple masters

HBase 0.20 vs 0.19
                0.19                                               0.20
Master          Single master: if it fails, so does the cluster    Master election and membership via ZK
Compression     Not really                                         GZ, LZO
Memory usage    Small values cause big indexes and OOM             New file format limits index size (800kB for 10m entries)
Scan speed      300-600ms per 500 rows                             20-30ms per 500 rows

Zookeeper?
• A highly available configuration storage system
• Set up in a 2N+1 quorum
• Hadoop subproject

Master & Zookeeper
• Store membership info in ZK
• Detect dead servers (via ephemeral nodes)
• Master election and recove
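
Sketch: sorted keys, regions, partial-key lookups
• Illustrates the Table & Regions and Sorting slides: byte-lexicographic row order, half-open [startKey, endKey) regions, and partial-key lookups over compound keys
• This is not the HBase client API. An in-memory TreeMap stands in for a table, and every name here (SortedKeySketch, regionFor, prefixScan, the sample keys and region names) is hypothetical
• Assumes Java 9+ for Arrays.compareUnsigned and the ranged Arrays.equals

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedKeySketch {
    // Unsigned byte-by-byte comparison: keys are raw bytes, not strings.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> Arrays.compareUnsigned(a, b);

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Regions are keyed by start key; each one covers [startKey, nextStartKey).
    static String regionFor(TreeMap<byte[], String> regions, byte[] rowKey) {
        // floorEntry returns the entry with the greatest start key <= rowKey.
        return regions.floorEntry(rowKey).getValue();
    }

    // Partial-key lookup: every row key that starts with the prefix, in sorted order.
    static List<String> prefixScan(TreeMap<byte[], String> rows, byte[] prefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<byte[], String> e : rows.tailMap(prefix, true).entrySet()) {
            byte[] key = e.getKey();
            boolean matches = key.length >= prefix.length
                    && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
            if (!matches) {
                break; // keys are sorted, so the first non-match ends the prefix range
            }
            hits.add(new String(key, StandardCharsets.UTF_8));
        }
        return hits;
    }

    // Hypothetical three-region table; the first region starts at the empty key.
    static TreeMap<byte[], String> sampleRegions() {
        TreeMap<byte[], String> regions = new TreeMap<>(BYTE_ORDER);
        regions.put(new byte[0], "region-1");
        regions.put(bytes("h"), "region-2");
        regions.put(bytes("p"), "region-3");
        return regions;
    }

    // Hypothetical compound row keys of the form "<user>/<date>".
    static TreeMap<byte[], String> sampleRows() {
        TreeMap<byte[], String> rows = new TreeMap<>(BYTE_ORDER);
        for (String k : new String[] {"user43/2009-01-01", "user42/2009-06-11",
                "User42/x", "user41/2009-06-10", "user42/2009-05-30"}) {
            rows.put(bytes(k), "value");
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(regionFor(sampleRegions(), bytes("kiwi")));
        System.out.println(prefixScan(sampleRows(), bytes("user42/")));
    }
}
```

• Running main prints region-2, then [user42/2009-05-30, user42/2009-06-11]
• The prefix scan touches only adjacent keys, which is why a compound key behaves like a leading-left edge index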