DB2 select last day with a lot of data

I have a very big table in DB2 with around 500 million rows.
I need to select only the last day's rows, based on a timestamp column and other conditions.
I did something like this, but it takes forever (about 10 minutes) to get the results. Is there another way to query this faster? I am not familiar with DB2.
DTM is a TIMESTAMP column:
select a, b, c, d, e, DTM from table1
where e = 'I' and DTM > current timestamp - 1 days
Any help would be appreciated.

Besides an index, another option may be range partitioning on this table. If you could range partition by month, you would only have to scan that month's partition, for example. Even better if you could partition by day (and have the partitioning key in the index, so you had a partitioned index too).
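As a minimal sketch (the index name, column types, and partition boundaries below are assumptions, not from the original answer), the index and a month-based range partition in DB2 for LUW could look like this:

-- Index covering the filter columns, so e = 'I' AND DTM > ... can be
-- answered with an index scan instead of a full table scan
CREATE INDEX ix_table1_e_dtm ON table1 (e, DTM);

-- Month-based range partitioning (DB2 for LUW syntax); queries that
-- filter on DTM then only touch the relevant partitions
CREATE TABLE table1_part (
    a INT, b INT, c INT, d INT,
    e CHAR(1),
    DTM TIMESTAMP NOT NULL
)
PARTITION BY RANGE (DTM)
    (STARTING FROM ('2024-01-01') ENDING ('2025-01-01') EXCLUSIVE EVERY (1 MONTH));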

Related

Get latest rows in PostgreSQL table ordered by Date: Index or Sort table?

I had a hard time titling this question, but I hope it's appropriate.
I have a table of transactions, and each transaction has a Date column (of type Date).
I want to run a query that gets the latest 100 transactions by date (simple enough with an ORDER BY query).
My question is: in order to make this an extremely cheap operation, would it make sense to sort my entire table so that I just need to select the top 100 rows every time, or do I simply create an index on the date column? I'm not sure if the first option is even possible and/or good SQL database practice.
You would add an index on the column with the date and query:
SELECT * FROM tab
ORDER BY datecol DESC
LIMIT 100;
The problem with your other idea is that there is no well-defined order in a table. Every UPDATE changes this "order", and even if you don't modify anything, a sequential scan need not start at the beginning of the table.
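For completeness, the index could be created as below (the names are illustrative); since PostgreSQL can scan a B-tree index backwards, a plain ascending index serves the descending ORDER BY just as well:

-- B-tree index on the date column; PostgreSQL walks it backwards for
-- ORDER BY datecol DESC LIMIT 100, reading only about 100 index entries
CREATE INDEX tab_datecol_idx ON tab (datecol);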

How to get a list of dates in Pervasive SQL

Our time & attendance database is a Pervasive/Actian Zen database. What I'm trying to do is create a query that just lists the next 14 days from today. I'll then cross apply this list of dates with employee records so that in effect I have a list of people/dates for the next 14 days.
I've done it with a recursive CTE on SQL Server quite easily. I could also do it with a loop in SQL Server, but I can't figure it out with Pervasive SQL; loops can only exist within stored procedures and triggers.
Looking around, I thought that this code that I found and adapted might work, but it doesn't (and further research suggests that there isn't a recursive option within Pervasive at all).
WITH RECURSIVE cte_numbers(n, xDate)
AS (
SELECT
0, CURDATE() + 1
UNION ALL
SELECT
n+1,
dateAdd(day,n,xDate)
FROM
cte_numbers
WHERE n < 14
)
SELECT
xDate
FROM
cte_numbers;
I just wondered whether anyone could help me write an SQL query that gives me this list of dates, outside of a stored procedure.
When you create a table like this:
CREATE TABLE dates(d DATE PRIMARY KEY, x INTEGER);
And create a first record like this:
INSERT INTO dates VALUES ('2021-01-01',0);
Then you can use this statement, which doubles the number of records in the table dates every time it is executed (so you need to run it a couple of times).
When you run it 10 times, the table dates will have 21 October 2023 as its last date.
When you run it 12 times, the last date will be 19 March 2032.
INSERT INTO dates
SELECT
    DATEADD(DAY, m.m + 1, d),
    x + m.m + 1
FROM dates
CROSS JOIN (SELECT MAX(x) m FROM dates) m
ORDER BY d;
Of course, the column x can optionally be dropped with the next statement, but then you cannot add more records using the previous statement:
ALTER TABLE dates DROP COLUMN x;
Finally, to return the next 14 days from today:
SELECT d
FROM dates
WHERE d BETWEEN CURDATE() AND DATEADD(DAY, 13, CURDATE());
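To then pair these dates with employee records as described in the question, a plain cross join should do (the employees table and its columns here are hypothetical):

-- One row per employee per date for the next 14 days
SELECT e.employee_id, d.d
FROM employees e
CROSS JOIN dates d
WHERE d.d BETWEEN CURDATE() AND DATEADD(DAY, 13, CURDATE());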

Get max timestamps efficiently for large table for a set of ids

I have a large PostgreSQL db table (actually lots of partition tables divided up by yearly quarters) that for simplicity's sake is defined something like:
id bigint
ts timestamp
value float
For a particular set of ids, what is an efficient way of finding the last timestamp in the table for each specified id?
The table is indexed by (id, timestamp)
If I do something naive like
SELECT sensor_id, MAX(ts)
FROM sensor_values
WHERE ts >= (NOW() + INTERVAL '-100 days') :: TIMESTAMPTZ
GROUP BY 1;
Things are pretty slow.
Is there a way of perhaps narrowing down the time range first, e.g. by a binary search on one id?
(I can assume the timestamps are similar for a particular set of ids.)
I am accessing the db through psycopg so the solution can be in code or SQL if I am missing something easy to speed this up.
The EXPLAIN output for the query can be seen here: https://explain.depesz.com/s/PVqg
Any ideas appreciated.
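One well-known technique for this shape of query (not from the original thread; the id values below are placeholders and the column names follow the naive query above) is to probe the (id, timestamp) index once per id with a LATERAL subquery, so each id costs a single descending index lookup instead of contributing to one big scan-and-group:

-- For each requested id, ORDER BY ts DESC LIMIT 1 becomes one
-- backward probe of the (sensor_id, ts) index
SELECT ids.sensor_id, newest.ts AS max_ts
FROM unnest(ARRAY[101, 102, 103]::bigint[]) AS ids(sensor_id)
CROSS JOIN LATERAL (
    SELECT sv.ts
    FROM sensor_values sv
    WHERE sv.sensor_id = ids.sensor_id
    ORDER BY sv.ts DESC
    LIMIT 1
) AS newest;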

How to optimize a table for queries sorted by insertion order in Postgres

I have a table of time series data where, for almost all queries, I wish to select data ordered by collection time. I do have a timestamp column, but I do not want to use actual timestamps for this, because if two entries have the same timestamp it is crucial that I be able to sort them in the order they were collected, which is information I have at insert time.
My current schema just has a timestamp column. How would I alter my schema to make sure I can sort based on collection/insertion time, and make sure querying in collection/insertion order is efficient?
Add a column based on a sequence (i.e. serial), and create an index on (timestamp_column, serial_column). Then you can get insertion order (more or less) by doing:
ORDER BY timestamp_column, serial_column;
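A minimal sketch of that schema change (the table and column names are assumptions):

-- BIGSERIAL attaches a sequence whose value is assigned at insert time
ALTER TABLE measurements ADD COLUMN insert_order BIGSERIAL;

-- Composite index so the ORDER BY above is a simple index walk
CREATE INDEX measurements_ts_order_idx ON measurements (ts, insert_order);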
You could use a SERIAL column called insert_order. This way there will be no two rows with the same value. However, I am not sure that your requirement of being in absolute time order is possible to achieve.
For example, suppose there are two transactions, T1 and T2, and they happen at the same time, and you are running on a machine with multiple processors, so in fact both T1 and T2 did the insert at exactly the same instant. Is this a case that you are concerned about? There was not enough info in your question to know exactly.
Also, with a serial column you have the issue of gaps: for example, T1 could grab serial value 14 and T2 could grab value 15, then T1 rolls back and T2 does not, so you have to expect that the insert_order column might have gaps in it.

How do you sum over a related period?

I need to sum values that are within +2 months, or within a quarter period (from a related date table).
Is there a way to use DENSE_RANK to partition those periods (custom periods)?
select
FiscalMonth
,Value
from table
The SQL will have to do the following:
Join the value table and the period table
Include the period in the select list and sum the value, grouping by the period
i.e.
select b.period, sum(a.value)
from table a
inner join period b on a.FiscalMonth between b.StartMonth and b.EndMonth
group by b.period
Note: The join condition will have to be modified based on what data you actually have in the period table.
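For illustration, a hypothetical period table that this join would work against might look like the following (names, types, and sample values are all assumptions):

-- One row per custom period, keyed by a fiscal-month range
CREATE TABLE period (
    period     VARCHAR(10),  -- e.g. '2024-Q1'
    StartMonth INT,          -- e.g. 202401
    EndMonth   INT           -- e.g. 202403
);

INSERT INTO period VALUES ('2024-Q1', 202401, 202403);
INSERT INTO period VALUES ('2024-Q2', 202404, 202406);

With such a table in place, the query above returns one summed value per quarter.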
Hope this helps
Well, if you need values from a given interval by month, you could use something like:
SELECT *
FROM yourTable
WHERE MONTH(some_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH) -- could be any interval
This is an example (which shows the results of the previous month, relative to the current one); the point is that it is possible to massage the query with functions on intervals.
Of course, you could use the SUM function for the adding.
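Putting the two together, a sketch (the table and column names are assumptions carried over from the example above):

-- Total of value over rows from the previous calendar month
SELECT SUM(value) AS total_value
FROM yourTable
WHERE MONTH(some_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH);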