DB2 records have timestamps from multiple time zones - db2

Hi, I have a DB2 database on a mainframe that has data written to it by a web app and a phone IVR (voice telephony) app. Both were originally located in the US Central time zone. The web app is being rewritten and moved to US Eastern time. The inserted records all use sysdate for their timestamps, such as when the record was created or last updated. Since the DB queries are "select ...... order by create_time" or "select .... order by update_time", the different time zones cause ordering problems.
I want a way to display all the records in one TZ (probably Eastern). Something like:
select some_time_util(*, eastern_tz) from table where condition = eastern_tz order by some_time_util(create_date, eastern_tz)
union
select some_time_util(*, central_tz) from table where condition = central_tz order by some_time_util(create_date, central_tz)
How does DB2 support the concept of timestamps and time zones?

Yes, it is supported by DB2 - check out the "Time zone specific expressions" documentation:
http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_tzspecificexpression.dita
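As a rough sketch (my_table and create_time are placeholder names, and the column is assumed to be, or to have been converted to, TIMESTAMP WITH TIME ZONE), the AT TIME ZONE expression lets you render every row in a single zone before sorting:
-- '-05:00' is US Eastern standard time; use '-04:00' during daylight saving time.
SELECT create_time AT TIME ZONE '-05:00' AS create_time_eastern
FROM my_table
ORDER BY create_time_eastern;
If the column carries time zone information, ORDER BY already compares the underlying UTC instants, so the conversion is mainly a display concern; the linked page shows the exact forms (AT LOCAL, and AT TIME ZONE with an offset such as '-05:00') that DB2 for z/OS accepts.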

Related

Can Redshift stored procs be used to make a date range UNION ALL query?

Since Redshift does not natively support date partitioning (other than in Redshift Spectrum), all our tables are date-partitioned by name:
my_table_name_YYYY_MM_DD
So every time we do queries, they usually look like this:
select columns, i, want from
  (select * from tbl1_date UNION ALL
   select * from tbl2_date UNION ALL
   select * from tbl3_date UNION ALL
   select * from tbl4_date) t;
Where there's one UNION ALL per day.
Can stored procedures generate a date range, so our business analysts stop losing their hair when I send them a Python or bash script to generate the date range?
Yes, you could create a stored procedure that generates dynamic SQL using only the needed tables. See my answer here for a template to start from: Issue with passing column name as a parameter to "PREPARE" in Redshift
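As a rough sketch of that idea (the procedure name, parameter names, and table prefix below are made up; adapt them to your naming scheme), a procedure can loop over the requested dates, concatenate one SELECT per daily table, and hand the result back through a cursor:
CREATE OR REPLACE PROCEDURE select_date_range(start_date DATE, end_date DATE, rs INOUT REFCURSOR)
AS $$
DECLARE
    d          DATE := start_date;
    query_text VARCHAR(65535) := '';
BEGIN
    WHILE d <= end_date LOOP
        IF query_text <> '' THEN
            query_text := query_text || ' UNION ALL ';
        END IF;
        -- assumes one table per day named my_table_name_YYYY_MM_DD
        query_text := query_text || 'SELECT * FROM my_table_name_' || TO_CHAR(d, 'YYYY_MM_DD');
        d := d + 1;
    END LOOP;
    OPEN rs FOR EXECUTE query_text;
END;
$$ LANGUAGE plpgsql;
The analysts would then run something like CALL select_date_range('2020-01-01', '2020-01-07', 'mycursor'); followed by FETCH ALL FROM mycursor; inside a transaction, instead of editing a generated UNION ALL by hand.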
However, you should be aware that Redshift is able to achieve most of what you want automatically using a "Time Series Table" view. This is documented here:
Using Time Series Tables
Use Time-Series Tables
You define a view that is composed of a UNION ALL over a sequence of identical tables with a sort key defined on a commonly filtered date or timestamp column. When you query that view, Redshift is able to eliminate the scans on any UNION'ed tables that would not contain relevant data.
For example:
CREATE OR REPLACE VIEW store_sales_vw
AS SELECT * FROM store_sales_1998
UNION ALL SELECT * FROM store_sales_1999
UNION ALL SELECT * FROM store_sales_2000
UNION ALL SELECT * FROM store_sales_2001
UNION ALL SELECT * FROM store_sales_2002
UNION ALL SELECT * FROM store_sales_2003
;
SELECT cd.cd_education_status
,COUNT(*) sales_count
,AVG(ss_quantity) avg_quantity
FROM store_sales_vw vw
JOIN customer_demographics cd
ON vw.ss_cdemo_sk = cd.cd_demo_sk
WHERE ss_sold_ts BETWEEN '1999-09-01' AND '2000-08-31'
GROUP BY cd.cd_education_status
In this example Redshift will only use the store_sales_1999 and store_sales_2000 tables, skipping the other tables in the view. Note that the table skipping is not based on the name of the table; Redshift knows the MIN and MAX values of the sort key timestamp in each table.
If you pursue this approach, please be sure to keep the total number of tables in the UNION fairly low. I recommend (at most) daily tables for the last week [7], weekly tables for the last month [5], quarterly tables for the last year [4], and then yearly tables for older data.
You can use ALTER TABLE … APPEND to merge the daily tables into weekly tables, and so on.
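For example, a minimal sketch with made-up table names (both tables must have compatible column definitions, and ALTER TABLE APPEND cannot run inside a transaction block):
-- Moves the rows from the daily table into the weekly table; APPEND moves data
-- rather than copying it, so it is much faster than INSERT ... SELECT.
ALTER TABLE my_table_week_2020_06 APPEND FROM my_table_name_2020_02_03;
DROP TABLE my_table_name_2020_02_03;  -- the source table is left empty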

Looping through unique dates in PostgreSQL

In Python (pandas) I read from my database and then use a pivot table to aggregate the data for each day. The raw data I am working with is about 2 million rows per day, one row per person per 30 minutes. I am aggregating it to daily granularity instead so it is a lot smaller for visualization.
So in pandas, I would read each date into memory and aggregate it and then load it into a fresh table in postgres.
How can I do this directly in Postgres? Can I loop through each unique report_date in my table, group by, and then append the result to another table? I am assuming doing it in Postgres would be faster than reading the data over the network in Python, writing a temporary .csv file, and then writing it back over the network.
Here's an example: Suppose that you have a table
CREATE TABLE post (
    posted_at timestamptz not null,
    user_id   integer     not null,
    score     integer     not null
);
representing the scores various users have earned from posts they made on an SO-like forum. Then the following query
SELECT user_id, posted_at::date AS day, sum(score) AS score
FROM post
GROUP BY user_id, posted_at::date;
will aggregate the scores per user per day.
Note that this will consider that the day changes at 00:00 in the session's TimeZone setting (00:00 UTC if the database runs in UTC, which is how SO counts days). If you want a different boundary, say midnight Paris time, then you can do it like so:
SELECT user_id, (posted_at AT TIME ZONE 'Europe/Paris')::date AS day, sum(score) AS score
FROM post
GROUP BY user_id, (posted_at AT TIME ZONE 'Europe/Paris')::date;
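To actually land the aggregate in another table, as the question asks, either query can be wrapped in an INSERT ... SELECT (a rough sketch; daily_score is a made-up table name):
CREATE TABLE daily_score (
    user_id integer not null,
    day     date    not null,
    score   integer not null
);
INSERT INTO daily_score (user_id, day, score)
SELECT user_id, posted_at::date, sum(score)
FROM post
GROUP BY user_id, posted_at::date;
No explicit loop over the dates is needed; the GROUP BY does the per-day split in a single pass, entirely inside Postgres.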
To get good performance for the above queries, you might want to create an expression (computed) index on (user_id, posted_at::date), or similarly for the second case.
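A sketch of such an index follows. One caveat to be aware of: a bare posted_at::date cast on a timestamptz column depends on the session time zone setting and is not immutable, so Postgres will reject it in an index expression; pinning the zone with AT TIME ZONE makes it indexable.
-- Day boundaries pinned to UTC; substitute 'Europe/Paris' to match the second query.
CREATE INDEX post_user_day_idx
    ON post (user_id, ((posted_at AT TIME ZONE 'UTC')::date));
For the index to be usable, the query should group by the same expression that the index defines.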

SQL Timestamp offset (Postgres)

I want to copy records from one database to another using Pentaho. But I ran into a problem. Let's say there is a transaction_timestamp column in Table1 with data type timestamp with time zone. But once I select records from the source DB and insert them into the other database, the values of the column are offset by one hour or so. The weirdest thing: this doesn't even affect all records. I also tried something like this:
select
    transaction_timestamp::timestamp without time zone as transaction_timestamp,
    t1.*
from table1 t1
And it didn't work. Could the problem be that when I copy the records to the 2nd DB, it sets all values to the local time zone? But then why doesn't the select statement I mentioned work? And why is only part of the records affected?

Amazon Redshift: how to get the last date data was inserted into a table

I am trying to get the last date an insert was performed on a table (on Amazon Redshift); is there any way to do this using the metadata? The tables do not store any timestamp column, and even if they did, we would need to find this out for 3k tables, so scanning each one would be impractical; a metadata approach is our strategy. Any tips?
All insert execution steps for queries are logged in STL_INSERT. This query should give you the information you're looking for:
SELECT sti.schema, sti.table, sq.endtime, sq.querytxt
FROM
    (SELECT MAX(query) AS query, tbl, MAX(i.endtime) AS last_insert
     FROM stl_insert i
     GROUP BY tbl
     ORDER BY tbl) inserts
JOIN stl_query sq ON sq.query = inserts.query
JOIN svv_table_info sti ON sti.table_id = inserts.tbl
ORDER BY inserts.last_insert DESC;
Note: The STL tables only retain approximately two to five days of log history.

How to query just the last record of every day

I have a table power with a timestamp with time zone (timestamptz) field called sample_time and a column called amp_hours.
The table gains a record about every two minutes, and amp_hours is reset every night at midnight.
I would like to see sample_time and amp_hours for the last record of every day.
I'm very new to SQL so I may be overlooking an obvious answer.
I saw this post on how to select the last record of a group, but I'm not familiar enough with SQL to get it to work for datetimes.
I thought about using lead() or lag() to compare the date of a record with that of the next record, but I'm using PostgreSQL 8.3 and I think window functions were introduced in 8.4.
Try this:
SELECT DISTINCT ON (sample_time::date) sample_time, amp_hours
FROM power
ORDER BY sample_time::date DESC, sample_time DESC;
DISTINCT ON (sample_time::date) keeps only the first row encountered for each day, and the ORDER BY makes that first row the day's latest sample, so you get one row per day holding its last reading. DISTINCT ON is available in 8.3, so no window functions are needed.