Performing a multi-valued update in postgres with dates - postgresql

I am performing a multi-valued update in Postgres, but the DATE datatype is giving me trouble. I wrote the following statement to perform the update, but it raises an error:
update users as u set
    id = u2.id,
    appointment = u2.appointment
from (values
    (1, '2022-12-01'),
    (2, '2022-12-01')
) as u2(id, appointment)
where u2.id = u.id;
ERROR: column "appointment" is of type date but expression is of type text
LINE 3: appointment = u2.appointment
^
HINT: You will need to rewrite or cast the expression.
Postgres normally accepts dates in this format; how should I perform this update?

When creating u2, Postgres resolves the quoted values as text (string literals are untyped, and text is the fallback type), so the appointment column of u2 is text and cannot be assigned to a date column. Cast the values to date to solve the issue; casting the first row alone would also be enough, since the remaining rows inherit its type:
update users as u set
    id = u2.id,
    appointment = u2.appointment
from (values
    (1, CAST('2022-12-01' as date)),
    (2, CAST('2022-12-01' as date))
) as u2(id, appointment)
where u2.id = u.id;
db&lt;&gt;fiddle: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=9f30e708c8508a2df94458b70399eb3a
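If you are building this statement from Python, a sketch of one way to avoid the cast entirely (assuming psycopg2, which adapts datetime.date parameters to Postgres dates; the users table and row data are from the question):

```python
from datetime import date

# Rows to update: (id, appointment). Passing datetime.date objects as
# bind parameters lets the driver hand Postgres typed date values, so
# no explicit CAST is needed in the VALUES list.
rows = [(1, date(2022, 12, 1)), (2, date(2022, 12, 1))]

# One "(%s, %s)" placeholder group per row.
placeholders = ", ".join(["(%s, %s)"] * len(rows))
query = (
    "UPDATE users AS u SET appointment = u2.appointment "
    f"FROM (VALUES {placeholders}) AS u2(id, appointment) "
    "WHERE u2.id = u.id"
)
# Flatten the rows into a single parameter sequence.
params = [field for row in rows for field in row]
# cur.execute(query, params)  # with a live psycopg2 cursor
```

Only the placeholder string is interpolated into the SQL; all data still goes through bind parameters.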

Related

Impala - handle CAST exception for column

SELECT src_user, CAST(start_time as timestamp) as start_time_ts, start_time, dest_ip, src_ip, count(*) as `count`
FROM mytable
WHERE
start_time like '2022-06%'
AND src_ip = '2.3.4.5'
AND rule_name like '%XASDF%'
group by 1, 2, 3, 4, 5 order by 2 desc
I'm getting an error with pyspark:
[Cloudera][JDBC](10140) Error converting value to Timestamp.
Now, if I don't order by in the query I can get a good result set with timestamps properly converted, so there's a row somewhere that starts with 2022-06 that is not parsing correctly.
How can I setup my error handling so that it will show me the actual value that is causing the error, rather than telling me what the error is?
Here's the code:
df = spark.read.jdbc(url= 'jdbc:impala://asdf.com/dasafsda', table = select_sql, properties = properties)
df.printSchema()
df.show(truncate=False)
I don't know an Impala-specific function for this, but assuming your normal dates are in the format yyyy-mm-dd (length 10), you can length-check the value: substitute a default date for anything that fails the check, and keep the raw value in a separate column so you can see what it really was, like so:
SELECT src_user,
       CASE WHEN LENGTH(start_time) = 10
            THEN CAST(start_time as timestamp)
            ELSE CAST('1900-01-01' as timestamp)
       END as start_time_ts,
       start_time as start_time_raw,
       start_time, dest_ip, src_ip, count(*) as `count`
FROM mytable
WHERE
start_time like '2022-06%'
AND src_ip = '2.3.4.5'
AND rule_name like '%XASDF%'
group by 1, 2, 3, 4, 5, 6 order by 2 desc
After this query, just filter for 1900-01-01 to see the values in start_time_raw that could not be cast.

Update column with input from a VALUES expression without explicit type cast

I'm trying to update a nullable date column with NULL, and for some reason Postgres takes the NULL as text and gives the below error:
UPDATE tbl
SET
order_date = data.order_date
FROM
(VALUES (NULL, 100))
AS data(order_date,id)
WHERE data.id = tbl.id
And the error shows:
[42804] ERROR: column "order_date" is of type date but expression is
of type text
Hint: You will need to rewrite or cast the expression.
I can fix this by explicitly converting NULL to date as below:
NULL::date
But, Is there a way to achieve this without explicit type conversion?
You can avoid the explicit cast by copying data types from the target table:
UPDATE tbl
SET order_date = data.order_date
FROM (
   VALUES
      ((NULL::tbl).order_date, (NULL::tbl).id),
      (NULL, 100)
   ) data(order_date, id)
WHERE data.id = tbl.id;
The added dummy row with NULL values is filtered by WHERE data.id = tbl.id.
Related answer with detailed explanation:
Casting NULL type when updating multiple rows

How to query from the result of a changed column of a table in postgresql

So I have a string time column in a table, and I want to change it to a datetime type and then query data for selected dates.
Is there a direct way to do so? One way I could think of is
1) add a new column
2) insert values into it with converted date
3) Query using the new column
Here I am stuck at the 2nd step, the INSERT, so I need help with that:
ALTER TABLE "nds"."unacast_sample_august_2018"
ADD COLUMN new_date timestamp
-- Need correction in select statement that I don't understand
INSERT INTO "nds"."unacast_sample_august_2018" (new_date)
(SELECT new_date from_iso8601_date(substr(timestamp,1,10))
Could someone help me with a correction, and if possible a better way of doing it?
I also tried doing it in a single step, but it gives the error: column new_date does not exist
SELECT *
FROM (SELECT from_iso8601_date(substr(timestamp,1,10)) FROM "db_name"."table_name") AS new_date
WHERE new_date > from_iso8601('2018-08-26') limit 10;
AND
SELECT new_date = (SELECT from_iso8601_date(substr(timestamp,1,10)))
FROM "db_name"."table_name"
WHERE new_date > from_iso8601('2018-08-26') limit 10;
Could someone correct these queries?
You don't need those steps; just use a USING clause with a cast in your ALTER TABLE:
CREATE TABLE foobar (my_timestamp) AS
VALUES ('2018-09-20 00:00:00');
ALTER TABLE foobar
ALTER COLUMN my_timestamp TYPE timestamp USING CAST(my_timestamp AS TIMESTAMP);
If your string timestamps are in a correct format this should be enough.
Solved as follows:
select *
from
(
SELECT from_iso8601_date(substr(timestamp,1,10)) as day,*
FROM "db"."table"
)
WHERE day > date_parse('2018-08-26', '%Y-%m-%d')
limit 10

Using many connection.cursor()

I want to fetch data from 3 tables in a single database at once, and I used three conn.cursor() calls to do it. Is there a more sophisticated way?
conn = psycopg2.connect(database="plottest", user="postgres")
self.statusbar.showMessage("Database opened Sucessfully", 1000)
cur = conn.cursor()
cur1 = conn.cursor()
cur2 = conn.cursor()
cur.execute("SELECT id ,actual from \"%s\" " % date)
rows = cur.fetchall()
cur1.execute("SELECT qty from DAILY where date = \'%s\'" % date)
dailyqty = cur1.fetchone()
cur2.execute("SELECT qty from MONTHLY where month = \'%s\'" % month)
monthqty = cur2.fetchone()
Awoogah awoogah, SQL injection warning! Don't write code using string interpolation. What happens if someone calls your code with a "date" of '); DROP TABLE daily;-- ?
Use bind parameters. Always.
The only exception is dynamic identifiers, like the table named after the current date above. In that case you must "double quote" them and double any contained double-quotes; here that means substituting date.replace('"', '""') into the SQL.
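Those two rules, bind parameters for values and doubled double-quotes for identifiers, can be sketched with the standard-library sqlite3 module (same idea as psycopg2, though sqlite3 uses ? placeholders instead of %s; the table and column names are invented for the demo):

```python
import sqlite3

def quote_ident(name: str) -> str:
    # Double-quote an identifier, doubling any embedded
    # double-quotes, per the SQL standard.
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table named after a date, as in the question above.
table = "2024-01-15"
cur.execute(f"CREATE TABLE {quote_ident(table)} (id INTEGER, actual TEXT)")

# Values always go through bind parameters, never string interpolation:
cur.execute(f"INSERT INTO {quote_ident(table)} VALUES (?, ?)", (1, "ok"))
evil = "'); DROP TABLE daily;--"
cur.execute(
    f"SELECT actual FROM {quote_ident(table)} WHERE actual = ?", (evil,)
)
assert cur.fetchall() == []  # no match, and nothing was injected
```

The malicious string is compared as plain data, and the only interpolated piece is the identifier, which has been quoted and escaped.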
Now, back to our regular programming.
Since you fetch all the results from each cursor before running the next query, you can just reuse one cursor; you don't need a new one each time.
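The single-cursor version can be sketched with sqlite3 (in-memory database, with table contents invented for the demo; with psycopg2 only the connection call and the %s paramstyle differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE daily (date TEXT, qty INTEGER)")
cur.execute("CREATE TABLE monthly (month TEXT, qty INTEGER)")
cur.execute("INSERT INTO daily VALUES ('2024-01-15', 7)")
cur.execute("INSERT INTO monthly VALUES ('2024-01', 42)")

# One cursor, reused: each execute() starts a fresh result set,
# and fetchone()/fetchall() drains it before the next query runs.
cur.execute("SELECT qty FROM daily WHERE date = ?", ("2024-01-15",))
dailyqty = cur.fetchone()
cur.execute("SELECT qty FROM monthly WHERE month = ?", ("2024-01",))
monthqty = cur.fetchone()
```

You only need separate cursors when you interleave fetches from several still-open result sets.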
You can also combine the daily and monthly stats if you want, with a UNION ALL. I fixed your capitalisation and parameter binding in the process:
cur.execute("""SELECT 1, qty FROM daily WHERE date = %s
UNION ALL
SELECT 2, qty FROM monthly WHERE month = %s
ORDER BY 1""",
(date, month))
Note that string interpolation isn't used; instead a 2-tuple of parameters is passed to psycopg2 to bind directly. There's no need for quotes around the parameters; psycopg2 adds them if needed.
This avoids a client-server round trip by bundling the two queries. The extra column and ORDER BY are technically needed so you can safely assume the first row is the daily result and the second is the monthly one. In practice PostgreSQL won't re-order them with UNION ALL, though.
You can combine
SELECT a1 FROM t1 WHERE b1 = 'v1';
and
SELECT a2 FROM t2 WHERE b2 = 'v2';
to a single statement like this:
SELECT t1.a1, t2.a2 FROM t1, t2
WHERE t1.b1 = 'v1' AND t2.b2 = 'v2';
provided that both queries return exactly one row.

Postgres query including time range

I have a query that pulls part of the data that I need for tracking. What I need to add is either a column that includes the date or the ability to query the table for a date range. I would prefer the column if possible. I am using psql 8.3.3.
Here is the current query:
select count(mailing.mailing_id) as mailing_count, department.name as org
from mailing, department
where mailing.department_id = department.department_id
group by department.name;
This returns the following information:
mailing_count | org
---------------+-----------------------------------------
2 | org1 name
8 | org2 name
22 | org3 name
21 | org4 name
39 | org5 name
The table that I am querying has 3 columns that have date in a timestamp format which are target_launch_date, created_time and modified_time.
When I try to add the date range to the query I get an error:
Query:
select count(mailing.mailing_id) as mailing_count, department.name as org
from mailing, department
where mailing.department_id = department.department_id
group by department.name,
WHERE (target_launch_date)>= 2016-09-01 AND < 2016-09-05;
Error:
ERROR: syntax error at or near "WHERE" LINE 1:
...department.department_id group by department.name,WHERE(targ...
I've tried moving the location of the date range in the string and a variety of other changes, but cannot achieve what I am looking for.
Any insight would be greatly appreciated!
Here's a query that would do what you need:
SELECT
count(m.mailing_id) as mailing_count,
d.name as org
FROM mailing m
JOIN department d USING( department_id )
WHERE
m.target_launch_date BETWEEN '2016-09-01' AND '2016-09-05'
GROUP BY 2
Since your target_launch_date is of type timestamp, you can safely write <= '2016-09-05': the literal converts to 2016-09-05 00:00:00.00000, so you get all timestamps before the start of that day, plus exactly 2016-09-05 00:00:00.00000.
A couple of additional notes:
- Use aliases for table names to shorten the code, e.g. mailing m
- Use explicit JOIN syntax to connect data from related tables
- Apply your WHERE clause before GROUP BY to exclude rows that don't match it
- Use the BETWEEN operator to handle the date >= X AND date <= Y case
- You can use USING instead of ON in JOIN syntax when the joined column names are the same
- You can use column numbers in GROUP BY which point to the position of a column in your select
To gain more insight on the matter of how processing of a SELECT statement behaves in steps look at the documentation.
Edit
The BETWEEN approach includes 2016-09-05 00:00:00.00000 in the result set. If this timestamp should be discarded, change BETWEEN x AND y to either of these two:
(...) BETWEEN x AND y::timestamp - INTERVAL '1 microsecond'
(...) >= x AND (...) < y
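The boundary behaviour of the two forms is easy to check with plain Python datetimes (a sketch only, not Postgres itself, but timestamp comparisons order the same way):

```python
from datetime import datetime

start = datetime(2016, 9, 1)
end = datetime(2016, 9, 5)

midnight = datetime(2016, 9, 5, 0, 0, 0)  # 2016-09-05 00:00:00
later = datetime(2016, 9, 5, 0, 0, 1)     # one second into that day

# BETWEEN start AND end is inclusive on both ends:
between_keeps_midnight = start <= midnight <= end   # True
# The half-open form >= start AND < end excludes the end boundary:
half_open_keeps_midnight = start <= midnight < end  # False
# Both forms exclude anything later within the end day:
between_keeps_later = start <= later <= end         # False
```

So the choice between the two only matters for rows stamped exactly at midnight on the end date.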
You were close: you need to supply the column name in the second part of the condition too, and then you have a single WHERE:
select count(mailing.mailing_id) as mailing_count, department.name as org
from mailing
inner join department on mailing.department_id = department.department_id
where target_launch_date >= '2016-09-01 00:00:00'
AND target_launch_date < '2016-09-05 00:00:00'
group by department.name;
EDIT: This part is just for Kamil G., showing clearly that BETWEEN should NOT be used:
create table sample (id int, d timestamp);
insert into sample (id, d)
values
(1, '2016/09/01'),
(2, '2016/09/02'),
(3, '2016/09/03'),
(4, '2016/09/04'),
(5, '2016/09/05'),
(6, '2016/09/05 00:00:00'),
(7, '2016/09/05 00:00:01'),
(8, '2016/09/06');
select * from sample where d between '2016-09-01' and '2016-09-05';
Result:
1;"2016-09-01 00:00:00"
2;"2016-09-02 00:00:00"
3;"2016-09-03 00:00:00"
4;"2016-09-04 00:00:00"
5;"2016-09-05 00:00:00"
6;"2016-09-05 00:00:00"
BTW, if you won't believe it without seeing EXPLAIN, here it is:
Filter: ((d >= '2016-09-01 00:00:00'::timestamp without time zone) AND
(d <= '2016-09-05 00:00:00'::timestamp without time zone))