HBase CRUD Put

Submitted by Lizhe on Sat, 04/29/2017 - 19:51

HBase provides the basic functionality for storing, retrieving, and deleting data through the HTable class in the org.apache.hadoop.hbase.client package.

All data operations are atomic at the row level. This becomes a point of contention when many clients need to modify the same row at once,

so batched updates should be used to reduce the number of separate operations against the same row.
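As a sketch of such batching (assuming an already-open HTable named table and a column family "cf"; row keys and values here are illustrative), HTable.batch submits a list of actions in one batched round trip:

List<Row> actions = new ArrayList<Row>();
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("v1"));
actions.add(put);
Object[] results = new Object[actions.size()];
try {
    table.batch(actions, results);  // results[i] holds the outcome of actions[i]
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}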

Each time an HTable instance is created, it scans the .META. table to check that the target table exists and is available, which makes instantiation relatively expensive.

It is therefore recommended to create only a single HTable instance, or at most one per thread (in a Java environment it is good practice anyway to reuse such objects as singletons where possible).
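With the 1.x client API, the heavyweight object to share is a single Connection, while Table handles obtained from it are cheap to create and close. A minimal sketch (the table name "test" is illustrative):

Connection connection = ConnectionFactory.createConnection(conf);  // share this one instance
Table table = connection.getTable(TableName.valueOf("test"));      // lightweight, per thread/use
try {
    // ... read and write through table ...
} finally {
    table.close();  // cheap; close the shared connection only on application shutdown
}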

The main methods for storing and accessing data are:

public Put add(byte [] family, byte [] qualifier, byte [] value)

public List<Cell> get(byte[] family, byte[] qualifier)

The KeyValue class implements the Cell interface. A Cell represents a single, unique data cell: the combination of row key, column family, column qualifier, and timestamp points to exactly one cell value.

Calling public NavigableMap<byte [], List<Cell>> getFamilyCellMap() returns the Put's family map, keyed by column family.

package hbase.sample;

import java.io.IOException;
import java.util.List;
import java.util.NavigableMap;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TestPut {

    public static void main(String[] args) throws IOException {

        // Build a Put for row key "rowkey"; nothing is sent to HBase here
        Put put = new Put(Bytes.toBytes("rowkey"));
        // Two cells in family "columnfamily1" and one in "columnfamily2"
        put.add(Bytes.toBytes("columnfamily1"), Bytes.toBytes("column1"), Bytes.toBytes("value1"));
        put.add(Bytes.toBytes("columnfamily1"), Bytes.toBytes("column2"), Bytes.toBytes("value2"));
        put.add(Bytes.toBytes("columnfamily2"), Bytes.toBytes("column2"), Bytes.toBytes("value3"));
        // Inspect the buffered cells grouped by column family
        NavigableMap<byte[], List<Cell>> map = put.getFamilyCellMap();
        for (byte[] key : map.keySet()) {
            List<Cell> cells = map.get(key);
            System.out.println(cells.size());
            for (Cell cell : cells) {
                System.out.println(new String(cell.getFamily()) + " "
                        + new String(cell.getQualifier()) + " " + new String(cell.getValue()));
            }
        }
    }

}

2
columnfamily1 column1 value1
columnfamily1 column2 value2
1
columnfamily2 column2 value3
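Note that this example only builds and inspects the Put in memory; nothing reaches HBase until the Put is handed to a table, e.g. via table.put(put).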


Each Put is effectively a separate RPC: the request is sent to the server, and the client must wait for the round trip to complete.

The HBase API therefore offers a client-side write buffer that collects Put operations and ships them to the servers in batched RPC calls (reminiscent of JDBC batching):

table.setAutoFlush(false);  // buffer puts on the client instead of sending each one immediately
// ... issue puts ...
table.flushCommits();       // push everything in the buffer to the servers

Note that this does not make the buffered commits atomic as a whole; HBase guarantees atomicity only at the row level.

The table object provides a checkAndPut method for atomic row-level check-then-put operations: the put is applied only if the check against the values in the parameter list succeeds.

Here is an example:

package hbase.sample;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TestCheckAndPut {
    
    static Configuration conf = HBaseConfiguration.create();
    static {
        // "lzvm" hosts ZooKeeper and HDFS in this setup
        conf.set("hbase.zookeeper.quorum", "lzvm");
        conf.set("hbase.rootdir", "hdfs://lzvm:9000/hbase");
    }

    public static void main(String[] args) throws IOException {
        HTable table = new HTable(conf, "test");
        Put put = new Put(Bytes.toBytes("row10"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("column10"), Bytes.toBytes("value10"));
        // Apply the put only if cf:column10 of row10 currently equals "valueNotvalue10"
        boolean b = table.checkAndPut("row10".getBytes(), "cf".getBytes(), "column10".getBytes(),
                "valueNotvalue10".getBytes(), put);
        System.out.println(b);
        table.close();
    }

}

The checkAndPut method first checks whether the current value of the cell identified by the leading parameters matches the supplied expected value, and only then commits the Put (the last parameter). Because the stored value does not match "valueNotvalue10" here, row10 is not inserted,

and the method returns false.

2017-04-30 22:05:23 INFO [org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper] Process identifier=hconnection-0x721e34d8 connecting to ZooKeeper ensemble=lzvm:2181
2017-04-30 22:05:23 INFO [org.apache.zookeeper.ZooKeeper] Initiating client connection, connectString=lzvm:2181 sessionTimeout=90000 watcher=hconnection-0x721e34d80x0, quorum=lzvm:2181, baseZNode=/hbase
2017-04-30 22:05:23 INFO [org.apache.zookeeper.ClientCnxn] Opening socket connection to server lzvm/192.168.1.10:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-30 22:05:23 INFO [org.apache.zookeeper.ClientCnxn] Socket connection established to lzvm/192.168.1.10:2181, initiating session
2017-04-30 22:05:23 INFO [org.apache.zookeeper.ClientCnxn] Session establishment complete on server lzvm/192.168.1.10:2181, sessionid = 0x15bbf1cba31000a, negotiated timeout = 90000
false
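A related sketch, reusing table and put from the example above: passing null as the expected value makes the check succeed only when the cell does not exist yet, giving an atomic "insert if absent":

// Succeeds only if cf:column10 of row10 holds no value yet
boolean inserted = table.checkAndPut("row10".getBytes(), "cf".getBytes(),
        "column10".getBytes(), null, put);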

However, with a sequence such as

table.setAutoFlush(false);
Put put1 = new Put(Bytes.toBytes("row1"));  // a Put always needs a row key
Put put2 = new Put(Bytes.toBytes("row2"));
table.put(put1);  // buffered on the client
table.put(put2);
table.flushCommits();  // both puts go out together

the series of operations above is not atomic as a whole.

flushCommits() pushes all buffered Puts to the servers at once, batching them into as few RPC calls as possible; the buffered Put objects can span many rows.

The client groups these updates and sends each batch to the region server responsible for it.
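In the 1.x client, HTable.setAutoFlush and flushCommits are deprecated in favor of BufferedMutator. A minimal sketch, assuming a shared connection as in the earlier snippet and an illustrative table name:

BufferedMutator mutator = connection.getBufferedMutator(TableName.valueOf("test"));
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("v1"));
mutator.mutate(put);  // collected in the client-side buffer
mutator.flush();      // explicit flush; close() flushes as well
mutator.close();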


In fact, you can also use

table.put(puts) 

where puts is a List<Put>.
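A short sketch of the list form (reusing the conf/table setup from the checkAndPut example; row keys and values are illustrative):

List<Put> puts = new ArrayList<Put>();
for (int i = 0; i < 3; i++) {
    Put p = new Put(Bytes.toBytes("row" + i));
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("column" + i), Bytes.toBytes("value" + i));
    puts.add(p);
}
table.put(puts);  // the whole list is sent to the servers in batched RPCs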