Postgres 11.9: Partitioning by tstzrange

I need to range partition a table by a tstzrange column, and I'm not finding any examples.
CREATE TABLE schedule_slots
(
id int8 NOT NULL DEFAULT nextval('schedule_slots_id_seq'::regclass),
time_range tstzrange NOT NULL,
CONSTRAINT schedule_slots_pkey2 PRIMARY KEY (id, time_range)
) partition by range(time_range);
I was trying to create a partition as below:
CREATE TABLE schedule_slots_part1 PARTITION OF schedule_slots
FOR VALUES FROM tstzrange('2022-10-01 00:00:00-07','2022-10-01 23:59:59-07', '[)') to tstzrange('2023-10-01 00:00:00-07','2023-10-01 23:59:59-07', '[)')
with the intention that all rows between 2022-10-01 and 2023-10-01 would be written to this partition. But partition creation is failing with this error:
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "tstzrange"
Position: 83
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.execute(JDBCStatementImpl.java:327)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:130)
... 12 more
Please let me know how to partition the table based on a tstzrange column.

CREATE TABLE schedule_slots_part1 PARTITION OF schedule_slots
FOR VALUES FROM ('[2022-10-01 00:00:00-07,2022-10-01 23:59:59-07)') to ('[2023-10-01 00:00:00-07,2023-10-01 23:59:59-07)');
This works!
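As far as I can tell, the reason is that in Postgres 11 the bounds in FOR VALUES must be literals that the partition key's type can parse, so a range literal string is accepted where a tstzrange(...) call is not. Note also that range partitioning on a range-typed column orders whole range values (lower bound first) rather than testing containment. A quick sanity check (a hedged sketch, assuming the statements above have been run):
-- Insert a slot whose range sorts between the partition's bounds.
INSERT INTO schedule_slots (time_range)
VALUES (tstzrange('2022-11-15 09:00:00-07', '2022-11-15 10:00:00-07', '[)'));
-- Show which partition each row was routed to; the new row should land in schedule_slots_part1.
SELECT tableoid::regclass AS partition, id, time_range
FROM schedule_slots;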

Related

Unexpected type: BINARY

I am trying to read Parquet files via the Flink table API, and it throws an error when I select one of the timestamp columns.
My Parquet table is something like this.
I create a table with this SQL:
CREATE TABLE MyDummyTable (
`id` INT,
ts BIGINT,
ts_ltz AS TO_TIMESTAMP_LTZ(ts, 3),
ts2 TIMESTAMP,
ts3 TIMESTAMP,
ts4 TIMESTAMP,
ts5 TIMESTAMP
)
It throws an error when I select any of ts2, ts3, ts4, or ts5.
The error stack is this.
Caused by: java.lang.IllegalArgumentException: Unexpected type: BINARY
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:77)
at org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:369)
at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:264)
at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:254)
at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createPoolOfBatches(ParquetVectorizedInputFormat.java:244)
at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:137)
at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:73)
at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.checkSplitOrStartNext(FileSourceSplitReader.java:112)
at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:65)
at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
... 7 more
My current approach
I am currently using this approach to work around the problem, but it does not seem like a proper solution since I have to create two columns in the table.
CREATE TABLE MyDummyTable (
`id` INT,
ts2 STRING,
ts2_ts AS TO_TIMESTAMP(ts2)
)
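If the other timestamp columns hit the same BINARY issue, the same workaround could presumably be extended to all of them (a sketch only, not a verified fix): read each affected column as a physical STRING column and expose a computed TIMESTAMP column next to it.
CREATE TABLE MyDummyTable (
`id` INT,
ts BIGINT,
ts_ltz AS TO_TIMESTAMP_LTZ(ts, 3),
-- physical columns read as STRING to avoid the BINARY vector issue
ts2 STRING,
ts3 STRING,
ts4 STRING,
ts5 STRING,
-- computed columns providing the TIMESTAMP view
ts2_ts AS TO_TIMESTAMP(ts2),
ts3_ts AS TO_TIMESTAMP(ts3),
ts4_ts AS TO_TIMESTAMP(ts4),
ts5_ts AS TO_TIMESTAMP(ts5)
)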
Flink Version: 1.13.2
Scala Version: 2.11.12

PostgreSQL: ON CONFLICT DO UPDATE command cannot affect row a second time

I have two PostgreSQL tables:
CREATE TABLE source.staticprompts (
id INT,
projectid BIGINT,
scriptid INT,
promptnum INT,
prompttype VARCHAR(20),
inputs VARCHAR(2000),
attributes VARCHAR(2000),
text VARCHAR(2000),
corpuscode VARCHAR(2000),
comment VARCHAR(2000),
created TIMESTAMP,
modified TIMESTAMP
);
and
CREATE TABLE target.dim_collect_user_inp_configs (
collect_project_id BIGINT NOT NULL,
prompt_type VARCHAR(20),
prompt_input_desc VARCHAR(3000),
prompt_input_name VARCHAR(1000),
no_of_prompt_count BIGINT,
prompt_input_value VARCHAR(100),
prompt_input_value_id BIGSERIAL PRIMARY KEY,
script_id BIGINT,
corpuscode VARCHAR(20),
min_recordings VARCHAR(2000),
max_recordings VARCHAR(2000),
recordings_count VARCHAR(2000),
lease_duration VARCHAR(2000),
date_created TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
date_updated TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT must_be_unique UNIQUE (prompt_input_value, collect_project_id)
);
I need to copy data from source to target with these conditions:
Each value needs to be stored as one row in the dim_collect_user_inp_configs table. For example, Indoor-Loud as one row with its own unique identifier as prompt_input_value_id, Indoor-Normal as another row with its own unique identifier as prompt_input_value_id, and so on through Semi-Outdoor-Whisper.
There can be multiple input "name" entries in one inputs column. Each name and its values need to be stored separately.
prompt_input_value_id - generate a unique sequence number for each combination of prompt_input_value and collect_project_id.
Source table have this data:
20030,input,m66,,null,"[{""desc"": ""Select the setting that you will do the recordings under."", ""name"": ""ambient"", ""type"": ""dropdown"", ""values"": ["""", ""Indoors + High + Loud"", ""Indoors + High + Normal"", ""Indoors + Low + Normal"", ""Indoors + Low + LowVolume"", ""Outdoors + High + Normal"", ""Outdoors + Low + Loud"", ""Outdoors + Low + Normal"", ""Outdoors + Low + LowVolume""]}, {""desc"": ""Select the noise type that you will do the recordings under."", ""name"": ""Noise type"", ""type"": ""dropdown"", ""values"": ["""", ""Human Speech"", ""Ambient Speech"", ""Non-Speech""]}]",,2018-12-13 13:49:24.408933,1,5,5906,2021-08-26 12:43:54.061000
I tried to do this with the following query:
INSERT INTO target.dim_collect_user_inp_configs AS t (
collect_project_id,
prompt_type,
prompt_input_desc,
prompt_input_name,
prompt_input_value,
script_id,
corpuscode)
SELECT
s.projectid,
s.prompttype,
el.inputs->>'name' AS name,
el.inputs->>'desc' AS description,
jsonb_array_elements(el.inputs->'values') AS value,
s.scriptid,
s.corpuscode
FROM source.staticprompts AS s,
jsonb_array_elements(s.inputs::jsonb) el(inputs)
ON CONFLICT
(prompt_input_value, collect_project_id)
DO UPDATE SET
(prompt_input_desc, prompt_input_name, date_updated) =
(EXCLUDED.prompt_input_desc,
EXCLUDED.prompt_input_name,
NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
OR t.prompt_input_name != EXCLUDED.prompt_input_name
RETURNING *;
But I get an error:
ON CONFLICT DO UPDATE command cannot affect row a second time Hint: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Can you help me find where the mistake is?
Change the SELECT so that all rows with the same prompt_input_value and collect_project_id are grouped together; then each target row will be updated at most once. Use aggregate functions for all other columns.
Something like
SELECT s.projectid,
max(s.prompttype),
max(el.inputs->>'name') AS name,
max(el.inputs->>'desc') AS description,
v.value,
max(s.scriptid),
max(s.corpuscode)
FROM source.staticprompts AS s
CROSS JOIN LATERAL jsonb_array_elements(s.inputs::jsonb) AS el(inputs)
CROSS JOIN LATERAL jsonb_array_elements(el.inputs->'values') AS v(value)
GROUP BY s.projectid, v.value
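Plugged back into the original statement, the whole thing would look roughly like this (a sketch that reuses the column list and conflict handling from the question; untested):
INSERT INTO target.dim_collect_user_inp_configs AS t (
    collect_project_id,
    prompt_type,
    prompt_input_desc,
    prompt_input_name,
    prompt_input_value,
    script_id,
    corpuscode)
SELECT s.projectid,
       max(s.prompttype),
       max(el.inputs->>'name') AS name,
       max(el.inputs->>'desc') AS description,
       v.value,
       max(s.scriptid),
       max(s.corpuscode)
FROM source.staticprompts AS s
CROSS JOIN LATERAL jsonb_array_elements(s.inputs::jsonb) AS el(inputs)
CROSS JOIN LATERAL jsonb_array_elements(el.inputs->'values') AS v(value)
GROUP BY s.projectid, v.value
ON CONFLICT (prompt_input_value, collect_project_id)
DO UPDATE SET
    (prompt_input_desc, prompt_input_name, date_updated) =
    (EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
   OR t.prompt_input_name != EXCLUDED.prompt_input_name
RETURNING *;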

How to change date format of a column based on regex in PostgreSQL 11.0

I have a table in PostgreSQL 11.0 with the following column containing dates (column type: character varying).
id date_col
1 April2006
2 May2005
3 null
4
5 May16,2019
I would like to convert the column to 'date'.
As there are two different date formats, I am using a CASE statement to alter the column type based on the date pattern.
select *,
case
when date_col ~ '^[A-Za-z]+\d+,\d+' then alter table tbl alter date_col type date using to_date((NULLIF(date_col , 'null')), 'MonthDD,YYYY')
when date_col ~ '^[A-Za-z]+,\d+' then alter table tbl alter date_col type date using to_date((NULLIF(date_col, 'null')), 'MonthYYYY')
else null
end
from tbl
I am getting following error:
[Code: 0, SQL State: 42601] ERROR: syntax error at or near "table"
Position: 93 [Script position: 93 - 98]
The expected output is:
id date_col
1 2006-04-01
2 2005-05-01
3 null
4
5 2019-05-16
Any help is highly appreciated!!
You definitely can't alter a column one row at a time. Your better bet is to update the existing values so they are all the same format, then issue a single ALTER statement.
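A minimal sketch of a single-ALTER variant (handling both formats inside the USING expression rather than normalizing the values first), assuming the table is named tbl and date_col only contains the formats shown, the literal string 'null', or empty strings; untested:
-- Turn the literal string 'null' and empty strings into real NULLs first.
UPDATE tbl SET date_col = NULL WHERE date_col IN ('null', '');
-- Then convert the column in one ALTER, picking the to_date format per row.
ALTER TABLE tbl
  ALTER COLUMN date_col TYPE date
  USING CASE
          WHEN date_col ~ '^[A-Za-z]+\d+,\d+$' THEN to_date(date_col, 'MonthDD,YYYY')
          WHEN date_col ~ '^[A-Za-z]+\d+$'     THEN to_date(date_col, 'MonthYYYY')
          ELSE NULL
        END;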

insert multiple rows into table with column that has default value

I have a table in PostgreSQL and one of the columns has a default value.
The DDL of the table is:
CREATE TABLE public.my_table_name
(int_column_1 character varying(6) NOT NULL,
text_column_1 character varying(20) NOT NULL,
text_column_2 character varying(15) NOT NULL,
default_column numeric(10,7) NOT NULL DEFAULT 0.1,
time_stamp_column date NOT NULL);
I am trying to insert multiple rows in a single query. Some of those rows have a value for default_column, and some don't; for those I want Postgres to use the default value.
Here's what I tried:
INSERT INTO "my_table_name"(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES
(91,'text_row_11','text_row_21',8,current_timestamp),
(91,'text_row_12','text_row_22',,current_timestamp),
(91,'text_row_13','text_row_23',19,current_timestamp),
(91,'text_row_14','text_row_24',,current_timestamp),
(91,'text_row_15','text_row_25',27,current_timestamp);
This gives me an error. So when I try to insert:
INSERT INTO "my_table_name"(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES (91,'text_row_12','text_row_22',,current_timestamp), -- i want null to be appended here, so i left it empty.
--error from this query is: ERROR: syntax error at or near ","
and
INSERT INTO "my_table_name"(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES (91,'text_row_14','text_row_24',NULL,current_timestamp),
-- error from this query is: ERROR: new row for relation "glycemicindxdir" violates check constraint "food_item_check"
So, how do I fix this and insert a value when I have one, or have Postgres insert the default when I don't?
Use the default keyword:
INSERT INTO my_table_name
(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES
(91, 'text_row_11', 'text_row_21', 8 , current_timestamp),
(91, 'text_row_12', 'text_row_22', default, current_timestamp),
(91, 'text_row_13', 'text_row_23', 19 , current_timestamp),
(91, 'text_row_14', 'text_row_24', default, current_timestamp),
(91, 'text_row_15', 'text_row_25', 27 , current_timestamp);
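A side note on the NULL attempt above: DEFAULT resolves to the declared default (0.1 here), whereas an explicit NULL would be rejected in this table because default_column is declared NOT NULL. A quick check after the insert (a hedged sketch against the DDL above):
-- Rows 12 and 14 should show the declared default 0.1.
SELECT text_column_1, default_column
FROM my_table_name
WHERE text_column_1 IN ('text_row_12', 'text_row_14');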

Specified partition columns do not match the partition columns of the table. Please use () as the partition columns

Here I'm trying to persist the DataFrame into a partitioned Hive table and getting this exception. I have looked into it many times but am not able to find the fault.
org.apache.spark.sql.AnalysisException: Specified partition columns
(timestamp value) do not match the partition columns of the table.
Please use () as the partition columns.;
Here is the script the external table was created with:
CREATE EXTERNAL TABLE IF NOT EXISTS events2 (
action string
,device_os_ver string
,device_type string
,event_name string
,item_name string
,lat DOUBLE
,lon DOUBLE
,memberid BIGINT
,productupccd BIGINT
,tenantid BIGINT
) partitioned BY (timestamp_val DATE)
row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
location 'maprfs:///location/of/events2'
tblproperties ('serialization.null.format' = '');
Here is the result of describe formatted for table "events2":
hive> describe formatted events2;
OK
# col_name data_type comment
action string
device_os_ver string
device_type string
event_name string
item_name string
lat double
lon double
memberid bigint
productupccd bigint
tenantid bigint
# Partition Information
# col_name data_type comment
timestamp_val date
# Detailed Table Information
Database: default
CreateTime: Wed Jan 11 16:58:55 IST 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: maprfs:/location/of/events2
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
serialization.null.format
transient_lastDdlTime 1484134135
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.078 seconds, Fetched: 42 row(s)
Here is the line of code where the data is partitioned and stored into the table:
val tablepath = Map("path" -> "maprfs:///location/of/events2")
AppendDF.write.format("parquet").partitionBy("Timestamp_val").options(tablepath).mode(org.apache.spark.sql.SaveMode.Append).saveAsTable("events2")
While running the application, I'm getting the error below:
Specified partition columns (timestamp_val) do not match the partition
columns of the table.Please use () as the partition columns.
I might be committing an obvious error, any help is much appreciated with an upvote :)
Please print the schema of the DataFrame:
AppendDF.printSchema()
and make sure there is no type mismatch between it and the table's partition column.