Developer.aliyun.com/article/666… When creating a table, you can set the normal and partitioned columns. In most cases, you can think of ordinary columns as data in a data file, and partitioned columns as directories in a file system. So the storage footprint of a table is the space footprint of an ordinary column. Although partition columns do not store data directly, they are similar to directories in a file system. They facilitate data management and reduce computation by querying only the corresponding partition when specifying only specific partitions.

Partition column Settings

See the syntax for creating partitioned tableshere. Here are two examples for easy understanding:

Here you can see how partitioned tables are created. Currently, only STRING columns are supported. BIGINT is currently being tested and is not recommended for use.

Currently, the number of partitioned columns in a partitioned table cannot exceed 6 levels.

Partition creation

For the difference between partitions and partitioning keys, please refer to this note. In response to the previous instructions, the partitioning key setting simply sets up a specification that defines the directory rule for storing files under the table to be ds=’ XXX ‘. Then partition DS =’20150101′ corresponds to one directory and partition DS =’20150102’ corresponds to another directory.

The role of partitioning

There are two main parts to partitioning. One is to facilitate data management. With partitioning, a table’s data is divided into multiple partitions. For example, if the log table is partitioned by date (day), then each partition contains a single day’s data. If one day we want to archive historical data somewhere, or delete old data, we just need to deal with the corresponding partition. Lifecycle can be set for data expiration in days. MaxCompute will determine whether to reclaim each table based on its LastDataModifiedTime and Lifecycle Settings. If the table is a partitioned table, determine whether the partition should be reclaimed based on the LastDataModifiedTime of each partition. Therefore, if the expiration time is set to 100 days, data is synchronized to a partition every day, and data in the historical partition is not modified after being written (LastDataModifiedTime remains unchanged), the historical data generated before 100 days will be automatically deleted, reducing o&M costs.

But more importantly, if the calculation is done properly, the partitioned table can participate in the calculation by only reading data from the specified partition as input, thus reducing the amount of calculation, shortening the calculation time, and reducing the cost. Here is a typical example: ods_oplog table has 2 days (20161113, 20161114) logs (5 in total) that I simulated, ds as partition. If I only need 20161113 data for this calculation, we can use SQL:



SQL =’20161113′; SQL =’20161113′; SQL =’20161113′; SQL =’20161113′

Use restrictions

Partition is good, but it should not be abused. Currently there are two main restrictions on partitions. The first is that the maximum number of partitions for a single table is currently 60,000. Second, during the execution plan resolution of a single query, the number of queried partitions cannot be greater than 10,000; otherwise, an error will be reported. To solve this problem, you need to design the table structure so that you do not use fields such as user IDS for partition columns, otherwise you may get an error when you need to do a full table query.

There is also a problem with this error if the query is filtered only by secondary partitions, because no primary partition is specified and therefore all primary partitions are scanned.

SQL

The previous partitioning example already mentioned how SQL query criteria can be used to benefit from partitioning. It is important to note that the partition column is of type STRING, so in SQL you should write ds=’20161113′ instead of ds=20161113 to avoid automatic type conversion results. For a partitioned table, specify the partition to which data is to be written.

    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
        select customer_id, shop_name, total_price from sale_detail;
Copy the code

The directory of data does not store specific data. You only need to specify which directory to write the data to and select the data from the common column into the table.

But there are indeed some scenarios, need to query results, according to the value of a field, intelligent write to the corresponding partition, then need to use dynamic partition, the specific syntax can refer to

    create table total_revenues (revenue bigint) partitioned by (region string);
    insert overwrite table total_revenues partition(region)
        select total_price as revenue, region
            from sale_detail;
Copy the code

You can refer to the documentation for detailed descriptions of the above two functions.

JAVA

In JAVA, the corresponding partition is com. The aliyun. Odps. PartitionSpec. This class has two constructs, in addition to the no-argument construct, and one that is passed in as a string.

Public PartitionSpec(String spec) Constructs such an object from a String: spec-partition defines a String, for example: pt='1',ds='2'Copy the code

This is an example of a real use that requires passing in the PartitionSpec (creating a new partition using the SDK)

        Account account = new AliyunAccount(accessId, accessKey);
        Odps odps = new Odps(account);
        odps.setEndpoint(endpoint);
        odps.setDefaultProject(project);

        Tables ts = odps.tables();
        Table t = ts.get("p2");
        String partition = "area='CN',pdate='20160101'";
        PartitionSpec partitionSpec = new PartitionSpec(partition);
        t.createPartition(partitionSpec);
Copy the code

Sharing:

Recommended attention:

· 2021 Developer Skills Competition waiting for you! · Cloud native developer insight white paper launched worldwide · Low code essay call, come and see · Wind Explorer plan invites you to enter the community, wonderful rights and interests immediately

Copyright Notice: The content of this article is contributed by aliyun real-name registered users, and the copyright belongs to the original author. Aliyun developer community does not own the copyright, nor bear the corresponding legal responsibility. For specific rules, please refer to “Ali Cloud Developer Community User Service Agreement” and “Ali Cloud Developer Community Intellectual Property Protection Guidelines”. If you find any content suspected of plagiarism in the community, fill in the infringement complaint form to report, once verified, the community will immediately delete the suspected infringement content.