Can we create a table that is range partitioned where the partition keys are not in order?
For example:
PARTITION 00001 ENDING AT ('2018-07-02') INCLUSIVE
PARTITION 00002 ENDING AT ('2018-07-03') INCLUSIVE
PARTITION 00003 ENDING AT ('2018-07-08') INCLUSIVE
PARTITION 00004 ENDING AT ('2018-07-05') INCLUSIVE
PARTITION 00005 ENDING AT ('2018-07-20') INCLUSIVE
Is this valid DDL for creating a partitioned table in DB2?
No, partitions must be defined "in order". See here for more detail (search for partition-element). I've highlighted the relevant part.
The key values are subject to the following rules:
- The first value corresponds to the first column of the key, the second value to the second column, and so on. Using fewer values than there are columns in the key has the same effect as using the highest or lowest values for the omitted columns, depending on whether they are ascending or descending.
- The highest value of the key in any partition must be lower than the highest value of the key in the next partition for ascending cases.
- The values specified for the last partition are enforced. The value specified for the last partition is the highest value of the key that can be placed in the table. Any key values greater than the value specified for the last partition are out of range.
- If the concatenation of all the values exceeds 255 bytes, only the first 255 bytes are considered.
- If a key includes a ROWID column or a column with a distinct type that is based on a ROWID data type, 17 bytes of the constant that is specified for the corresponding ROWID column are considered.
- If a null value is specified for the partitioning key and the key is ascending, an error is returned unless MAXVALUE is specified. If the key is descending, an error is returned unless MINVALUE is specified.
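Applied to the example in the question, the limiting values simply need to be re-ordered so that each one is higher than the previous one. A DB2 for z/OS-style sketch (the table and column names are illustrative, not from the question):

```sql
-- Same five boundary dates as the question, re-ordered so they ascend
CREATE TABLE orders
    (order_date DATE NOT NULL)
    PARTITION BY RANGE (order_date)
       (PARTITION 1 ENDING AT ('2018-07-02') INCLUSIVE,
        PARTITION 2 ENDING AT ('2018-07-03') INCLUSIVE,
        PARTITION 3 ENDING AT ('2018-07-05') INCLUSIVE,
        PARTITION 4 ENDING AT ('2018-07-08') INCLUSIVE,
        PARTITION 5 ENDING AT ('2018-07-20') INCLUSIVE);
```

Note that '2018-07-20' is now also the highest key value the table will accept; anything later is out of range per the rules above.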
I would like to have a table with a deleted column containing the date an item was soft-deleted. Rows with a NULL value in the deleted column are the active ones. I was not able to figure out the syntax for creating a partition for NULL values in the deleted column. What is the syntax for creating such a partition?
create table my_table_pointing(street_id int, p_city_id int, name varchar(10), deleted date)
PARTITION BY RANGE (deleted);
CREATE TABLE my_table_pointing_2020 PARTITION OF my_table_pointing
FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE my_table_pointing_active PARTITION OF my_table_pointing
"for all rows where deleted is null"...
Thanks!
Provided you are on PG 11 or later, you can create a default partition, and rows with deleted IS NULL will be routed there.
create table my_table_pointing_active partition of my_table_pointing default;
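Putting it together with the DDL from the question (PostgreSQL 11+; the inserts are only there to illustrate the routing):

```sql
CREATE TABLE my_table_pointing(
    street_id int,
    p_city_id int,
    name      varchar(10),
    deleted   date
) PARTITION BY RANGE (deleted);

-- Soft-deleted rows from 2020 land here
CREATE TABLE my_table_pointing_2020 PARTITION OF my_table_pointing
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Active rows (deleted IS NULL) fall through to the default partition
CREATE TABLE my_table_pointing_active PARTITION OF my_table_pointing DEFAULT;

INSERT INTO my_table_pointing VALUES (1, 1, 'active', NULL);        -- routed to _active
INSERT INTO my_table_pointing VALUES (2, 1, 'gone', '2020-06-15');  -- routed to _2020
```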
I have a table partitioned by range in Postgres 10.6. Is there a way to tell one of its partitions to accept NULL for the column used as partition key?
The reason I need this is: my table size is 200GB and it's actually not yet partitioned. I want to partition it going forward, so I thought I would create an initial partition including all of the current rows, and then at the start of each month I would create another partition for that month's data.
The issue is, currently this table doesn't have the column I'll use for partitioning, so I want to add the column (initially null) and then tell that initial partition to hold all rows that have null in the partitioning key.
Another option would be to set an initial date value instead of NULL when adding the column, but that would be time- and space-consuming given the size of the table.
I would upgrade to v11 and initially define the partitioned table with just a default partition that contains all the NULL values.
Then you can add other partitions and gradually move the data by updating the NULL values.
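That approach can be sketched as follows (PostgreSQL 11+; big_table and the added_on key column are illustrative names, and the existing 200GB of data would first be loaded into the new parent):

```sql
-- New partitioned parent; existing rows carry NULL in the new key column
CREATE TABLE big_table (
    id       bigint,
    payload  text,
    added_on date
) PARTITION BY RANGE (added_on);

-- All rows with added_on IS NULL are routed here
CREATE TABLE big_table_default PARTITION OF big_table DEFAULT;

-- At the start of each month, add a partition for that month's data
CREATE TABLE big_table_2019_01 PARTITION OF big_table
    FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');

-- Gradual migration: setting the key moves a row out of the default
-- partition into the matching monthly partition (run in small batches)
UPDATE big_table SET added_on = DATE '2019-01-15'
WHERE id BETWEEN 1 AND 10000;
```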
Can the Cassandra SELECT DISTINCT operation be used to find all the unique values of a column if that column has an index on it?
My question is not the same as simply asking how to find distinct values of non-primary-key columns. I realize that Cassandra does not allow queries that would require a table scan, because they would be inefficient; here, the presence of an index eliminates the need for a table scan.
If I have a table thus:
CREATE TABLE thing (
id uuid,
version bigint,
name text,
... data columns ...
PRIMARY KEY ((id),version)
);
CREATE INDEX ON thing(name);
I can SELECT DISTINCT id FROM thing; to get all the thing IDs. That requires one response from each node in my cluster, with each response returning the keys for its node.
But can I SELECT DISTINCT name FROM thing; to get all the thing names? That should also require only one response from each node in my cluster, with each response constructed only by examining the portion of the index on its node. And if name is a good column on which to have an index, each response would be smaller than the response for the primary keys (there should be fewer names than partition keys).
At least to me the documentation suggests that I should be able to select distinct values of any column:
DISTINCT selection_list
selection_list is one of:
A list of partition keys (used with DISTINCT)
selector AS alias, selector AS alias, ...| *
where selector is a column name. The documentation places no restriction on what that column name can be.
As a matter of fact, you can only use DISTINCT with partition key columns (C* 2.2.4). Using it on anything else will yield an error:
cqlsh:stresscql> SELECT distinct name FROM thing ;
InvalidRequest: code=2200 [Invalid query] message="SELECT DISTINCT queries must only request partition key columns and/or static columns (not name)"
I don't have any in-depth understanding of the workings of secondary indexes, but I also have the feeling that allowing a DISTINCT query on an indexed column should be no worse, in terms of reads incurred, than querying the index for a particular value.
But since indexed values repeat across nodes, it would be worse in terms of memory and network overhead relative to the result size, as the coordinator would have to condense the nodes' responses down to only the unique values.
Though, for replication factors > 1 this is also the case for partition key values.
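If you do need the distinct names efficiently, the usual Cassandra pattern is to maintain a second, denormalized table in which name is the partition key, so that SELECT DISTINCT becomes legal. A sketch, with an assumed table thing_by_name that you would populate alongside thing:

```sql
-- Lookup table keyed by name; id/version make each row unique
CREATE TABLE thing_by_name (
    name    text,
    id      uuid,
    version bigint,
    PRIMARY KEY ((name), id, version)
);

-- DISTINCT on a partition key column is permitted:
SELECT DISTINCT name FROM thing_by_name;
```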
Redshift's documentation (http://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) defines the column skew_sortkey1 as: "Ratio of the size of the largest non-sort key column to the size of the first column of the sort key, if a sort key is defined. Use this value to evaluate the effectiveness of the sort key."
What does this imply? What does it mean if this value is large? or alternatively small?
Thanks!
A large skew_sortkey1 value means that the largest non-sort-key column is much larger than the first column of the sort key. In that case, the row offsets held in one disk block of the sort key column correspond to many disk blocks in the data columns.
For example, let's say skew_sortkey1 is 5 for a table. Then the row offsets in one disk block of the sort key correspond to 5 disk blocks for the other data columns. The zone map stores the min and max value for each sort key disk block, so when you query this table with a WHERE clause on the sort key, Redshift identifies the sort key block that contains the data (block min < WHERE-clause value < block max) and fetches the row offsets for that column. Since skew_sortkey1 is 5, it then has to fetch 5 blocks for the data columns before filtering the records down to the desired ones.
So, to conclude, a high skew_sortkey1 value is not desirable.
Sort keys define the order in which table rows are stored in Redshift's disk blocks (1 MB each). This means that column data belonging to a sort key region is stored together in a single disk block. Since Redshift applies compression per column, sort key columns have the potential advantage of storing similar data within the same disk block, which leads to higher compression and more efficient storage. The same cannot be said of the other, non-sort-key columns.
The column skew_sortkey1 in SVV_TABLE_INFO quantifies the effectiveness of the first sort key column of the table. The returned value allows a user to determine whether the selected sort key has improved the compression and efficiency of data storage.
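To check this for your own tables, you can query the system view directly. A minimal sketch using columns documented for SVV_TABLE_INFO:

```sql
-- Tables whose first sort key column may be ineffective, worst first
SELECT "table", sortkey1, skew_sortkey1
FROM svv_table_info
WHERE skew_sortkey1 IS NOT NULL
ORDER BY skew_sortkey1 DESC;
```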
I have a table with a column ID as primary key, and a column MyNumber that contains integers generated by the sequence myUniqueSequence. I would like to define myUniqueSequence in PostgreSQL so that it returns the next free, unique number for the column MyNumber.
This means that the next time a row is created programmatically, it starts with the number 1: if 1 is free, it is used for the column MyNumber; if not, it tries 2, and so on.
Use the serial data type for your column (instead of your own sequence):
http://www.postgresql.org/docs/9.0/static/datatype-numeric.html#DATATYPE-SERIAL
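A minimal sketch (table and column names are illustrative). Note one caveat against the question as asked: serial draws from a sequence, so it hands out ever-increasing values and does not go back and re-use freed numbers; if you truly need gap-filling, you would have to implement that yourself.

```sql
CREATE TABLE my_table (
    id       integer PRIMARY KEY,
    mynumber serial   -- backed by an implicit sequence, my_table_mynumber_seq
);

-- mynumber is filled in automatically: 1, then 2, then 3, ...
INSERT INTO my_table (id) VALUES (10);
INSERT INTO my_table (id) VALUES (20);
```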