Today a reader added me as a contact to discuss a few questions. I think these questions are valuable, so I am starting a Q&A column on this public account to make technical exchange and sharing easier. The column is called "Letters from Readers". If a question is beyond my ability to solve, I will forward it to my network for help, with the questioner's WeChat QR code attached. You are also welcome to discuss solutions in the comments section.

From: Huang * Wei

Little Ape asks

I use Spark to write files to HBase, and the files arrive incrementally by date every day. If I only want to keep the last 90 days of file data in HBase, is there a good way to do it? Setting a TTL requires disabling the table, and the backend queries would then report errors. Is there a solution other than TTL?

Little Ape's analysis

The crux of the problem: no TTL was set on the column family when the table was created. Now that data has been loaded, the reader wants to set the TTL attribute so that old data expires, but does not want to disable the table. What to do?

Little Ape's answer

Here, Little Ape offers two solutions:

Solution 1:

In newer versions of HBase, the TTL attribute of a table can be changed online, without disabling the table. If in doubt, create a test table and try setting the TTL online. If that works, you can change the TTL of the column family manually in the HBase Shell during off-peak hours.

hbase(main):030:0> create 'test', 'f1'
0 row(s) in 1.2990 seconds
=> Hbase::Table - test
hbase(main):031:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0480 seconds
hbase(main):032:0> alter 'test', {NAME => 'f1', TTL => '86400'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9870 seconds

Unit: s (the column family TTL is given in seconds; 86400 is one day)
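For the 90-day retention asked about in the question, the value passed to alter has to be converted to seconds. The following is only a small sketch of that arithmetic: the class FamilyTtl and its methods are hypothetical helpers, not part of HBase, and 'test' / 'f1' are the placeholder names from the shell session above.

```java
import java.util.concurrent.TimeUnit;

final class FamilyTtl {
    // Column family TTLs in the HBase shell are expressed in seconds.
    static long ttlSeconds(long days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    // Builds the text of the shell command; it does not talk to HBase.
    static String alterCommand(String table, String family, long days) {
        return String.format("alter '%s', {NAME => '%s', TTL => '%d'}",
                table, family, ttlSeconds(days));
    }

    public static void main(String[] args) {
        // 90 days = 90 * 86400 = 7776000 seconds
        System.out.println(alterCommand("test", "f1", 90));
    }
}
```

Running it prints alter 'test', {NAME => 'f1', TTL => '7776000'}, which can then be pasted into the HBase Shell.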

Solution 2:

If solution 1 does not work, you can use the API to set a TTL on each cell as the data is inserted. The drawback is that the historical data already in the table has to be deleted manually.

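Put put = new Put(Bytes.toBytes("row1"));
// setTTL takes milliseconds, so one day is 86400 * 1000 ms
put.setTTL(86400L * 1000);

Unit: ms (unlike the column family TTL, which is in seconds, the TTL set on a Put is in milliseconds)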
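Two numbers are needed for this solution: the TTL to pass to setTTL on each new Put, and the cutoff timestamp below which historical cells should be deleted by hand. Note that the HBase client API documents Mutation.setTTL as taking milliseconds, not seconds. The sketch below only does the arithmetic; CellTtl and its methods are hypothetical names, and how the cutoff is applied (for example as the upper bound of a scan time range followed by deletes) is left open here.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

final class CellTtl {
    // Value to pass to put.setTTL(...) so a cell expires after the given number of days.
    static long ttlMillis(long days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    // Cells whose timestamp is older than this epoch-millisecond value
    // fall outside the retention window and are candidates for manual deletion.
    static long cutoffMillis(Instant now, long retentionDays) {
        return now.toEpochMilli() - ttlMillis(retentionDays);
    }

    public static void main(String[] args) {
        System.out.println("setTTL argument for 90 days: " + ttlMillis(90));
        System.out.println("delete cells older than: "
                + Instant.ofEpochMilli(cutoffMillis(Instant.now(), 90)));
    }
}
```

For 90 days, ttlMillis returns 7776000000, which is the value to use in put.setTTL(...) instead of 86400.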

Knowledge supplement

What if you want to reset the TTL to 'FOREVER'? A: The maximum HBase TTL is the largest INT value, 2147483647. Set the TTL to this integer and the table description shows it as FOREVER again.

hbase(main):033:0> alter 'test', {NAME => 'f1', TTL => '2147483647'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9170 seconds
hbase(main):034:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0140 seconds
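The value 2147483647 is simply Java's Integer.MAX_VALUE, which HBase uses as its "forever" marker (the constant HConstants.FOREVER). The sketch below illustrates the idea without depending on HBase; the class ForeverTtl and its display method are hypothetical and only mimic the way the describe output above swaps the number for a label.

```java
final class ForeverTtl {
    // Same value as HBase's HConstants.FOREVER; redefined here so the
    // example compiles without the HBase client on the classpath.
    static final int FOREVER = Integer.MAX_VALUE;

    // Hypothetical helper: shows the label for the sentinel value,
    // and the plain number of seconds otherwise.
    static String display(int ttlSeconds) {
        return ttlSeconds == FOREVER ? "FOREVER" : String.valueOf(ttlSeconds);
    }

    public static void main(String[] args) {
        System.out.println(display(2147483647));
        System.out.println(display(86400));
        // Read literally as seconds, the sentinel is about 68 years.
        System.out.println(FOREVER / (365 * 86400) + " years");
    }
}
```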

If you have a better answer, feel free to leave a comment below.

Please credit the source when reposting! You are welcome to follow my WeChat official account [HBase Work Notes].