I've tried to search the question history for a similar issue but could not find anything that matched what I am trying to do. Hopefully someone can assist. I have a table with some archive data in it that has a timestamp going back to 1/1/1900. I'm not sure how that date came about; I think it may have been stamped by some older application. In any case, that data is good, valid data that my company needs to retain in those archive tables. However, in an effort to clean out data periodically going forward, we want to be able to purge data that has a date older than 60 days without touching the data with that old year-1900 timestamp. Is there a way I can use the BETWEEN syntax to accomplish this? I was thinking I could do something like this, but I may be oversimplifying it:
--example:
--Query to find the records older than 60 days & dump into temp table
select transid into #cdm_delete_table from cdmtrans where exporttime < getdate()-60
--Then with those records saved in my temp table, perform my delete from the main table.
--But this gets rid of ALL the records like you'd expect
delete from cdmtrans where transid in (select transid from #cdm_delete_table)
--However, I'm wanting to do a delete based on a range of dates.
--Older than 60 days but greater than 01/01/1900. Would this work??
delete FROM cdmtrans WHERE transid BETWEEN '01/01/1901' and getdate()-60
OR
delete from cdmtrans where transid in
(select transid from #cdm_delete_table WHERE transid BETWEEN '01/01/1901' and getdate()-60)
This is where I'm getting lost. Hoping someone can clarify whether I'm on the right track. Thanks in advance.
This query should give you what you need:
DELETE FROM cdmtrans
WHERE exporttime < getdate() -60
AND exporttime <> '1900-01-01'
You're definitely on the right track. I think you are looking for a query like this:
DELETE FROM cdmtrans
WHERE exporttime > '19010101'
AND exporttime < getdate() - 60;
(Note that you do not need the transid in your query to delete the correct records.)
If you want to check whether this gives you the right set of rows to delete, run it as a SELECT first, as follows:
SELECT *
FROM cdmtrans
WHERE exporttime > '19010101'
AND exporttime < getdate() - 60;
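If you specifically want the BETWEEN syntax you asked about, a rough equivalent is sketched below. (This assumes exporttime is a datetime column, as in your own query; note that BETWEEN is inclusive on both ends, and that the range test has to go on exporttime rather than transid.)
-- a hedged BETWEEN variant of the same delete; verify it with a SELECT first
DELETE FROM cdmtrans
WHERE exporttime BETWEEN '19010101' AND DATEADD(day, -60, GETDATE());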
The AND operator can be your friend. Use it to select records that don't equal '1/1/1900' and are older than 60 days.
-- select all records older than 60 days that are not '1/1/1900' into a temp table
SELECT transid
INTO #cdm_delete_table
FROM cdmtrans
WHERE exporttime < DATEADD(dd, -60, GETDATE())
AND exporttime != '1900-01-01'
-- remove from main table
DELETE
FROM cdmtrans
WHERE transid IN (
SELECT transid
FROM #cdm_delete_table
)
-- or skip the temp table and do it in one query
DELETE
FROM cdmtrans
WHERE exporttime < DATEADD(dd, -60, GETDATE())
AND exporttime != '1900-01-01'
I have a huge database with 15 tables.
I need to make a light version of it and leave only the first 1000 rows in each table, based on descending date. I tried to find out on Google how to do that but nothing really works.
It would be perfect if there were an automated way to go through each table and leave only 1000 rows.
But if I need to do that manually for each table, that will be fine as well.
Thank you,
This looks positively awful, but maybe it's a starting point from which you can build.
with cte as (
select mod_date, row_number() over (order by mod_date desc) as rn
from table1
),
min_date as (
select mod_date
from cte
where rn = 1000
)
delete from table1 t1
where t1.mod_date < (select mod_date from min_date)
So the solution is:
DELETE FROM "table" WHERE "date" < now() - interval '1 year';
That way it will delete all data from the table where the date is older than 1 year.
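If you also want the automated pass over all 15 tables while keeping exactly the 1000 newest rows (rather than a time window), here is a rough sketch. It assumes PostgreSQL, that the tables live in the public schema, and that each one has a unique id column plus the mod_date column used above; adjust the names per table.
-- loop over every table in the public schema and keep only the 1000 newest rows by mod_date
DO $$
DECLARE
    t text;
BEGIN
    FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        EXECUTE format(
            'DELETE FROM %I WHERE id NOT IN
                 (SELECT id FROM %I ORDER BY mod_date DESC LIMIT 1000)',
            t, t);
    END LOOP;
END $$;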
To make the example super simple, let's say that I have a table with three columns: ID, Name, and Date. I need to find the count of all IDs belonging to a specific name where the ID does not belong to this month.
Using that example, I would want this output:
In other words, I want to count how many IDs a name has that aren't from this month/year.
I'm more into PowerShell and still fairly new to SQL. I tried doing a CASE statement, but because it's not a foreach it seems to be returning "If the Name has ANY date in this month, return NULL", which is not what I want. I want it to count how many IDs per name do not appear in this month.
SELECT NAME,
CASE
WHEN ( Month(date) NOT LIKE Month(Getdate())
AND Year(date) NOT LIKE Year(Getdate()) ) THEN Count(id)
END AS TotalCount
FROM dbo.table
GROUP BY NAME,
date
I really hope this makes sense, but if it doesn't please let me know and I can try to clarify more. I tried researching cursors, but I'm having a hard time grasping them to get them into my statement. Any help would be greatly appreciated!
You only want to group by the non-aggregated columns that are in the result set (in this case, Name). You totally don't need a cursor for this; it's a fairly straightforward query.
select
Name,
Count(*) count
from
tbl
where
tbl.date > eomonth(getdate()) or
tbl.date <= eomonth(dateadd(mm, -1, getdate()))
group by
Name
I did a little bit of trickery on the exclusion of rows that are in the current month. Generally, you want to avoid running functions on the columns you're comparing to if you can, so that SQL Server can use an index to speed up its search. I assumed that the ID column is unique; if it's not, change count(*) to count(distinct ID).
An alternative WHERE clause if you're using older versions of SQL Server: if the table is small enough, you can just do it directly (similar to what you tried originally; it just goes in the query's WHERE clause and isn't embedded in a CASE). Note that it needs OR rather than AND, so that a row is excluded only when both its month and its year match the current date.
where
Month(date) <> Month(Getdate())
OR Year(date) <> Year(Getdate())
If you have a large table and keeping the predicate sargable is important, there's some fun stuff: you can build an EOMONTH equivalent with DATEADD and the date-part functions, but it's a pain.
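For what it's worth, a hedged sketch of that pre-EOMONTH approach (assuming SQL Server 2008 or earlier, where EOMONTH isn't available) compares against the first-of-month boundaries built from DATEADD/DATEDIFF, which keeps the predicate sargable:
-- exclude the current month: before the 1st of this month, or on/after the 1st of next month
select
    Name,
    Count(*) count
from
    tbl
where
    tbl.date < dateadd(mm, datediff(mm, 0, getdate()), 0) or
    tbl.date >= dateadd(mm, datediff(mm, 0, getdate()) + 1, 0)
group by
    Name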
SELECT Name, COUNT(ID) AS TotalCount
FROM dbo.[table]
WHERE DATEPART(MONTH, [Date]) != DATEPART(MONTH, GETDATE()) OR DATEPART(YEAR, [Date]) != DATEPART(YEAR, GETDATE())
GROUP BY Name;
In T-SQL:
SELECT
NAME,
COUNT(id)
FROM dbo.table
WHERE MONTH(Date_M) <> MONTH(GETDATE()) OR YEAR(Date_M) <> YEAR(GETDATE())
GROUP BY NAME
I have an SSRS report that displays the total number of days elapsed since a complaint was received. This SQL query returns the difference between today's date and the date of the last received complaint.
SELECT DATEDIFF(day, MAX(complaints.ComplaintReceived1Date),CURRENT_TIMESTAMP) as total
FROM complaints WITH (nolock)
If, for example, this value reaches 30 (days) and then a complaint is received, I would like my SSRS report to display 30 as the previous number of days with no complaint recorded. Is there a way to store previous results and recall this data? Maybe a temp table?
You are already storing it in the table referenced by your SQL query.
I would just retrieve it from there:
; with previouscomplaint as (
select
complaintreceived1date,
RN = ROW_NUMBER() over (order by complaintreceived1date desc)
from complaints)
select datediff(day,complaintreceived1date,current_timestamp) as previoustotal from previouscomplaint where RN=2
If you want the number of days between the two most recent complaints instead, use this as the statement after the CTE:
select datediff(day, (select complaintreceived1date from previouscomplaint where rn = 2),(select complaintreceived1date from previouscomplaint where rn = 1)) as previoustotal
This was not tested, but should work.
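An alternative sketch of the same idea using LAG (this assumes SQL Server 2012 or later and the same complaints table):
-- days between the most recent complaint and the one before it
SELECT TOP (1)
       DATEDIFF(day,
                LAG(ComplaintReceived1Date) OVER (ORDER BY ComplaintReceived1Date),
                ComplaintReceived1Date) AS previoustotal
FROM complaints
ORDER BY ComplaintReceived1Date DESC;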
I'm trying to write a query that shows distributors that haven't sold anything in 90 days, but the problem I'm running into is with NULL values. It seems PostgreSQL ignores NULL values, even when I query for them (or maybe I'm doing it the wrong way).
Let's say there are 1000 distributors, but with this query I only get 1 distributor, yet there should be more distributors that didn't sell anything, because if I write a SQL query to show distributors that sold any amount in the last 90 days, it shows about 500. So I wonder where the other 499 are? If I understand correctly, those other 499 didn't have any sales, so all their records are NULL and are not shown by the query.
Does anyone know how to make it show the NULL values of one table where the related row in the other table is not null? (For example, the partners table (res_partner) is not null, but the sale_order table (sales) or object is null. I also tried to filter with so.id IS NULL, but that way I get an empty result.)
Code of my query:
(
SELECT
min(f1.id) as id,
f1.partner as partner,
f1.sum1
FROM
(
SELECT
min(f2.id) as id,
f2.partner as partner,
sum(f2.null_sum) as sum1
FROM
(
SELECT
min(rp.id) as id,
rp.search_name as partner,
CASE
WHEN
sol.price_subtotal IS NULL
THEN
0
ELSE
sol.price_subtotal
END as null_sum
FROM
sale_order as so,
sale_order_line as sol,
res_partner as rp
WHERE
sol.order_id=so.id and
so.partner_id=rp.id
and
rp.distributor=TRUE
and
so.date_order <= now()::timestamp::date
and
so.date_order >= date_trunc('day', now() - '90 day'::interval)::timestamp::date
and
rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::timestamp::date
GROUP BY
partner,
null_sum
)as f2
GROUP BY
partner
) as f1
WHERE
sum1=0
GROUP BY
partner,
sum1
)as fld
EDIT: 2012-09-18 11 AM.
I think I understand why PostgreSQL behaves like this. It is because of the time interval. It checks whether there is any non-null value in that interval. So it only found one record, because that record had a sale order with zero (it was not converted from null to zero), and the part which checked for null values was simply skipped. If I removed the time interval, I would see all distributors that didn't sell anything at all. But with the time interval, for some reason it stops checking for null values and only looks at non-null values.
So does anyone know how to make it check for null values too in the given interval? (For the last 90 days, to be exact.)
Aggregates like sum() and min() do ignore NULL values. This is required by the SQL standard and every DBMS I know behaves like that.
If you want to treat a NULL value as e.g. a zero, then use something like this:
sum(coalesce(f2.null_sum, 0)) as sum1
But as far as I understand your question and your invalid query, you actually want an outer join between res_partner and the sales tables.
Something like this:
SELECT min(rp.id) as id,
rp.search_name as partner,
sum(coalesce(sol.price_subtotal,0)) as price_subtotal
FROM res_partner as rp
LEFT JOIN sale_order as so ON so.partner_id=rp.id
     -- the date filters on sale_order go in the ON clause so the outer join
     -- keeps partners that have no orders in the window
     and so.date_order <= CURRENT_DATE
     and so.date_order >= date_trunc('day', now() - '90 day'::interval)::timestamp::date
LEFT JOIN sale_order_line as sol ON sol.order_id=so.id
WHERE rp.distributor=TRUE
  and rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::timestamp::date
GROUP BY rp.search_name
I'm not 100% sure I understood your problem correctly, but it might give you a head start.
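To narrow that down to exactly the distributors with no sales in the window (the original goal), a hedged extension of the same outer-join idea, using only the table and column names from the question, could be:
-- distributors whose 90-day total is zero, i.e. no matching order lines in the window
SELECT min(rp.id) as id,
       rp.search_name as partner
FROM res_partner as rp
LEFT JOIN sale_order as so
       ON so.partner_id = rp.id
      AND so.date_order <= CURRENT_DATE
      AND so.date_order >= date_trunc('day', now() - '90 day'::interval)::date
LEFT JOIN sale_order_line as sol
       ON sol.order_id = so.id
WHERE rp.distributor = TRUE
  AND rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::date
GROUP BY rp.search_name
HAVING sum(coalesce(sol.price_subtotal, 0)) = 0;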
Try to name your subqueries and reference their columns as q1.col, q2.col, etc., so you can be sure which column from which query/subquery you're dealing with. Maybe it's something simple, e.g. it merges some rows containing only NULLs into one row? Also, at least for debugging purposes, it's smart to add count(*) to each query/subquery so you can see how many rows each level returns. It's hard to guess what exactly happened.
I have a rather large table of 50000 records, and I want to cut this down to 5000. How would I write an SQL query to delete the other 45000 records? The table structure includes a datetime column.
A rough idea of the query I want is the following
DELETE FROM mytable WHERE countexceeded(5000) ORDER BY filedate DESC;
I could write this in C# by somehow grabbing the row index number and working around that; however, is there a tidy way to do this in SQL?
The answer you have accepted is not valid syntax, as DELETE does not allow an ORDER BY clause. You can use:
;WITH T AS
(
SELECT TOP 45000 *
FROM mytable
ORDER BY filedate
)
DELETE FROM T
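If you want to sanity-check what that would remove before running the delete, a hedged variant (same table and column names as above) is to point the CTE at a SELECT instead:
;WITH T AS
(
    SELECT TOP 45000 *
    FROM mytable
    ORDER BY filedate
)
SELECT COUNT(*) AS rows_to_delete,
       MIN(filedate) AS oldest,
       MAX(filedate) AS newest
FROM T;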
DELETE TOP(45000) FROM mytable ORDER BY filedate ASC;
Change the order by to ascending to get the rows in reverse order and then delete the top 45000.
Hope this helps.
Edit:-
I apologize for the invalid syntax. Here is my second attempt.
DELETE a FROM myTable a INNER JOIN
(SELECT TOP(45000) id FROM myTable ORDER BY fileDate ASC) b ON a.id = b.id
If you do not have a unique column then please use Martin Smith's CTE answer.
if the table is correctly ordered (oldest rows first):
DELETE FROM mytable LIMIT 45000;
if not, and the table has a correctly ordered auto_increment index:
get the cutoff row (the 5000th newest)
SELECT id, filedate FROM mytable ORDER BY id DESC LIMIT 4999, 1;
save that id and then delete everything older
DELETE FROM mytable WHERE id < #id;
if the auto_increment order doesn't match the dates, you could use filedate instead of id, but if it's a date without a time component you could end up deleting unwanted rows that share the same date, so be careful with the filedate-based deletion
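If the physical order can't be trusted, a hedged single-statement alternative (this assumes MySQL, where DELETE supports ORDER BY and LIMIT) avoids the id lookup entirely:
-- delete the 45000 oldest rows by filedate, keeping the 5000 newest
DELETE FROM mytable
ORDER BY filedate ASC
LIMIT 45000;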