PostgreSQL: Delete all but most recent date

I have a table defined like so:
CREATE TABLE contracts (
    ContractID TEXT DEFAULT NULL,
    ContractName TEXT DEFAULT NULL,
    ContractEndDate TIMESTAMP WITHOUT TIME ZONE,
    ContractPOC TEXT DEFAULT NULL
);
In this table, a ContractID may have more than one record. For each ContractID, I want to delete all records except the one with the latest ContractEndDate. I know how to do this in MySQL using:
DELETE contracts
FROM contracts
INNER JOIN (
    SELECT
        ContractID,
        max(ContractEndDate) AS lastDate
    FROM contracts
    GROUP BY ContractID
) Duplicate ON Duplicate.ContractID = contracts.ContractID
WHERE contracts.ContractEndDate < Duplicate.lastDate;
But I need help to get this working in PostgreSQL.

You could use this:
DELETE FROM contracts c
USING (
    SELECT
        ContractID,
        max(ContractEndDate) AS lastDate
    FROM contracts
    GROUP BY ContractID
) d
WHERE d.ContractID = c.ContractID
  AND c.ContractEndDate < d.lastDate;
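An equivalent sketch, for comparison, uses a window function together with PostgreSQL's ctid (the physical row locator, handy here because the table has no primary key). rank() keeps all rows tied on the latest date, matching the semantics of the query above:
-- rank each contract's rows by end date, newest first,
-- then delete everything that is not in first place
DELETE FROM contracts
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               rank() OVER (PARTITION BY ContractID
                            ORDER BY ContractEndDate DESC) AS r
        FROM contracts
    ) ranked
    WHERE r > 1
);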

Related

PostgreSQL group by date dynamic columns

I have a table like this:
CREATE TABLE public.conferimenti
(
    id smallint NOT NULL,
    datetime timestamp without time zone NOT NULL,
    weight numeric(10,2) NOT NULL DEFAULT 0.0,
    type smallint NOT NULL
);
I want to get the SUM(weight) for every type, grouped by day, to build a time-series chart with the best performance:
SELECT c.datetime::date, SUM(weight) FROM conferimenti AS c
WHERE c.datetime >= '2019-01-01' AND c.datetime <= '2019-12-31' AND type = 1
GROUP BY c.datetime::date
This groups by day, but only for type = 1. I need it for every type (there are 10-15 different types).
Of course you only get type 1; it's part of the WHERE clause. Move type to both the select list and the GROUP BY:
select c.datetime::date
     , c.type
     , sum(weight) as total_weight
from conferimenti as c
where c.datetime >= '2019-01-01'
  and c.datetime <= '2019-12-31'
group by c.datetime::date, c.type;
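If the chart needs one column per type rather than one row per (day, type) pair, a sketch using PostgreSQL's aggregate FILTER clause can pivot the types out; the three type values shown are placeholders for the real 10-15:
-- one row per day, one column per type (types 1-3 shown as examples)
select c.datetime::date as day
     , sum(weight) filter (where c.type = 1) as weight_type1
     , sum(weight) filter (where c.type = 2) as weight_type2
     , sum(weight) filter (where c.type = 3) as weight_type3
from conferimenti as c
where c.datetime >= '2019-01-01'
  and c.datetime <= '2019-12-31'
group by c.datetime::date
order by day;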

Add Column in table with value partition by group

My table is something like:
CREATE TABLE table1
(
    _id text,
    name text,
    data_type int,
    data_value int,
    data_date timestamp -- insertion time
);
Now, due to a system bug, many duplicate entries were created, and I need to remove the duplicates and keep only unique entries, ignoring data_date because it is a system-generated date.
My query to do that is something like:
DELETE FROM table1 A
USING ( SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value
HAVING count(data_date) > 1) B
WHERE A._id = B._id
AND A.name = B.name
AND A.data_type = B.data_type
AND A.data_value = B.data_value
AND A.data_date != B.min_date;
While this query works, the table has millions of records, and I want a faster way. My idea is to add a new column whose value is a rank partitioned by [_id, name, data_type, data_value] (the columns in the GROUP BY). However, I could not find a way to create such a column.
I would appreciate it if anyone could suggest a way to create such a column.
Edit 1:
There is another thing to add, I don't want to use CTE or subquery for updating this new column because it will be same as my existing query.
The best way is simply to create a new table without the duplicated records:
CREATE TABLE ... AS
SELECT _id, name, data_type, data_value, MIN(data_date) AS min_date
FROM table1
GROUP BY _id, name, data_type, data_value;
Alternatively, you can create a rank and then filter, but a subquery is needed:
RANK() OVER (PARTITION BY your_variables ORDER BY data_date ASC) r
And then filter on r = 1.
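Put together, the rank-and-filter variant might look like the sketch below. It relies on ctid, PostgreSQL's physical row locator, since the table has no primary key; RANK() keeps every row tied on the earliest data_date, matching the original query's semantics:
-- delete every row that is not the earliest in its duplicate group
DELETE FROM table1
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               RANK() OVER (PARTITION BY _id, name, data_type, data_value
                            ORDER BY data_date ASC) AS r
        FROM table1
    ) ranked
    WHERE r > 1
);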

Difference between dates in different rows

Hi,
my problem is that I need the average time between a chargebegin and a chargeend row (timestampserver), grouped by stationname, connectornumber, and day.
The main problem is that I cannot use a max or min function, because the same stationname/connectornumber combination appears several times in the table.
So in fact I have to select the first chargebegin and find the next chargeend (the one with the same station/connectornumber combination and the min(id) > chargebegin.id) to get the difference.
I have tried a lot, but I have no idea how to do this.
The database is PostgreSQL 9.2.
Test data:
create table datatable (
    id int,
    connectornumber int,
    message varchar,
    metercount int,
    stationname varchar,
    stationuser varchar,
    timestampmessage varchar,
    timestampserver timestamp,
    authsource varchar
);
insert into datatable values (181,1,'chargebegin',4000,'100','FCSC','2012-10-10 16:39:10','2012-10-10 16:39:15.26');
insert into datatable values (182,1,'chargeend',4000,'100','FCSC','2012-10-10 16:39:17','2012-10-10 16:39:28.379');
insert into datatable values (184,1,'chargebegin',4000,'100','FCSC','2012-10-11 11:06:31','2012-10-11 11:06:44.981');
insert into datatable values (185,1,'chargeend',4000,'100','FCSC','2012-10-11 11:16:09','2012-10-11 11:16:10.669');
insert into datatable values (191,1,'chargebegin',4000,'100','MSISDN_100','2012-10-11 13:38:19','2012-10-11 13:38:26.583');
insert into datatable values (192,1,'chargeend',4000,'100','MSISDN_100','2012-10-11 13:38:53','2012-10-11 13:38:55.631');
insert into datatable values (219,1,'chargebegin',4000,'100','MSISDN_','2012-10-12 11:38:03','2012-10-12 11:38:29.029');
insert into datatable values (220,1,'chargeend',4000,'100','MSISDN_','2012-10-12 11:40:14','2012-10-12 11:40:18.635');
This might have some syntax errors, as I can't test it right now, but it should give you an idea of how to solve it.
with
chargebegin as (
    select
        stationname,
        connectornumber,
        timestampserver,
        row_number() over (partition by stationname, connectornumber order by timestampserver) as rn
    from datatable
    where message = 'chargebegin'
),
chargeend as (
    select
        stationname,
        connectornumber,
        timestampserver,
        row_number() over (partition by stationname, connectornumber order by timestampserver) as rn
    from datatable
    where message = 'chargeend'
)
select
    stationname,
    connectornumber,
    avg(b.timestampserver - a.timestampserver) as avg_diff
from chargebegin a
join chargeend b using (stationname, connectornumber, rn)
group by
    stationname,
    connectornumber;
This assumes that there is always an end event for every begin event and that these events cannot overlap (meaning that for a given stationname and connectornumber, there can be only one connection at any time). Therefore you can use row_number() to get matching begin/end events and then do whatever calculation is needed.
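Under the same assumptions, a shorter sketch using lead() (available in PostgreSQL 9.2) pairs each chargebegin with the event that immediately follows it for the same station/connector:
select
    stationname,
    connectornumber,
    avg(next_ts - timestampserver) as avg_diff
from (
    select
        stationname,
        connectornumber,
        message,
        timestampserver,
        -- timestamp of the next event for the same station/connector
        lead(timestampserver) over (partition by stationname, connectornumber
                                    order by id) as next_ts
    from datatable
    where message in ('chargebegin', 'chargeend')
) paired
where message = 'chargebegin'
group by stationname, connectornumber;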

SQL statement that detects calendar appointment collisions for the iPhone

I am trying to create an application for the iPhone where you can set appointments. Everything is saved in a MySQL database, and I currently get the data into my app through JSON. This is the workflow:
User1 defines when he is working. E.g. 8am - 4pm.
User2 wants to have an appointment with user1, e.g. 8am-9am.
The script should be able to check that:
the appointment is within the user's work hours; and
it does not clash with an existing appointment, which can happen in three possible ways:
the clashing appointment starts during the new appointment; and/or
the clashing appointment ends during the new appointment; or
the clashing appointment starts before and ends after the new appointment.
These are the important tables:
// new row should be added here when the conditions above are met
create table ios_appointment (
    appointmentid int not null auto_increment,
    start timestamp,
    end timestamp,
    user_id_fk int
)

// a working hour has an n:1 relationship to ios_worker
create table ios_workinghours (
    workinghoursid int not null auto_increment,
    start timestamp,
    end timestamp,
    worker_id_fk int
)

// employee, has a 1:n relationship to ios_workinghours
create table ios_worker (
    workerid int not null auto_increment,
    prename varchar(255),
    lastname varchar(255),
    ...
)
The inputs to the select are two timestamps, start and end, defined by the user. The script should check whether user 2 is working at that specific time and whether there are already appointments.
I currently have something like this, which uses the user_id to link the tables:
SELECT EXISTS (
    SELECT *
    FROM ios_appointment a JOIN ios_workinghours h USING (user_id)
    WHERE user_id = 1
      AND h.start <= '08:00:00' AND h.end >= '09:00:00'
      AND (
          a.start BETWEEN '08:00:00' AND '09:00:00'
          OR a.end BETWEEN '08:00:00' AND '09:00:00'
          OR (a.start < '08:00:00' AND a.end > '09:00:00')
      )
    LIMIT 1
)
Any help is appreciated. Thanks.
You either need to have your app read in the data and determine if the time is available OR you need to create a view that has the available "time slots" (e.g. every 30 minutes).
Here's how I would do it:
CREATE TABLE #timeslot
(
    timeslot_id INT PRIMARY KEY IDENTITY(1,1),
    timeslot_time DATETIME NOT NULL
)

DECLARE @starttime DATETIME, @endtime DATETIME
SELECT @starttime = '12/25/2012 08:00:00.000', @endtime = '12/25/2012 15:00:00.000'

-- fill the temp table with one slot every 30 minutes
WHILE @starttime < @endtime BEGIN
    INSERT INTO #timeslot (timeslot_time)
    VALUES (@starttime)
    SET @starttime = DATEADD(mi, 30, @starttime)
END
-- pair every worker with every timeslot (a cross join, written as FULL OUTER JOIN ON 1 = 1)
SELECT
    w.workerid,
    ts.timeslot_time
INTO
    ios_workertimeslot
FROM
    #timeslot ts
    FULL OUTER JOIN ios_worker w ON (1 = 1)
-- flag each slot as available (1) or taken (0) for each worker
SELECT
    wts.workerid,
    wts.timeslot_time,
    ap.appointmentid,
    CASE WHEN ap.appointmentid IS NOT NULL THEN 0 ELSE 1 END AS AvailableSlot
FROM
    ios_workertimeslot wts
    JOIN ios_workinghours wh
        ON (wts.workerid = wh.workerid)
        AND (wts.timeslot_time >= wh.start)
        AND (wts.timeslot_time < wh.end)
    LEFT JOIN ios_appointment ap
        ON (wts.workerid = ap.workerid)
        AND (wts.timeslot_time >= ap.start)
        AND (wts.timeslot_time < ap.end)
This will leave you with a data set that indicates the available and non-available timeslots.
Hope this helps!
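If all you need is a yes/no collision check rather than a slot grid, the three clash cases in the question collapse into the standard interval-overlap test: an existing appointment clashes exactly when it starts before the new one ends and ends after the new one starts. A sketch against the question's MySQL schema (it assumes user_id_fk and worker_id_fk both reference ios_worker.workerid; the literal timestamps are example input):
-- returns 1 when the requested slot is inside working hours
-- and overlaps no existing appointment, 0 otherwise
SELECT EXISTS (
    SELECT 1
    FROM ios_workinghours h
    WHERE h.worker_id_fk = 1
      AND h.start <= '2012-12-25 08:00:00'
      AND h.end   >= '2012-12-25 09:00:00'
) AND NOT EXISTS (
    SELECT 1
    FROM ios_appointment a
    WHERE a.user_id_fk = 1
      AND a.start < '2012-12-25 09:00:00'  -- starts before the new slot ends
      AND a.end   > '2012-12-25 08:00:00'  -- ends after the new slot starts
) AS slot_is_free;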

Speeding up TSQL

Hi all, I'm wondering if there's a more efficient way of executing this T-SQL script. It gets the latest activity per account (ordered by account name) and then joins it to the accounts table, so you get the very latest activity for each account. The problem is that there are currently about 22,000 latest activities, so it has to go through a lot of data. Is there a more efficient way of doing what I'm doing?
DECLARE @pastAppointments TABLE (objectid NVARCHAR(100), account NVARCHAR(500), startdate DATETIME, tasktype NVARCHAR(100), ownerid UNIQUEIDENTIFIER, owneridname NVARCHAR(100), RN NVARCHAR(100))
INSERT INTO @pastAppointments (objectid, account, startdate, tasktype, ownerid, owneridname, RN)
SELECT * FROM (
SELECT fap.regardingobjectid, fap.regardingobjectidname, fap.actualend, fap.activitytypecodename, fap.ownerid, fap.owneridname,
ROW_NUMBER() OVER (PARTITION BY fap.regardingobjectidname ORDER BY fap.actualend DESC) AS RN
FROM FilteredActivityPointer fap
WHERE fap.actualend < getdate()
AND fap.activitytypecode NOT LIKE 4201
) tmp WHERE RN = 1
ORDER BY regardingobjectidname
SELECT fa.name, fa.owneridname, fa.new_technicalaccountmanagername, fa.new_customerid, fa.new_riskstatusname, fa.new_numberofopencases,
fa.new_numberofurgentopencases, app.startdate, app.tasktype, app.ownerid, app.owneridname
FROM FilteredAccount fa LEFT JOIN @pastAppointments app ON fa.accountid = app.objectid AND fa.ownerid = app.ownerid
WHERE fa.statecodename = 'Active'
AND fa.ownerid LIKE @owner_search
ORDER BY fa.name
You can remove ORDER BY regardingobjectidname from the first INSERT query. The only (narrow) purpose such a sort would serve on an INSERT is if there were an identity column on the table being inserted into, and there isn't one here, so if the optimizer isn't smart enough, it will perform a pointless sort.
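Beyond dropping the sort, the table variable can be avoided entirely by folding the ranking into a single statement, as in the sketch below (the NOT LIKE 4201 is written as <> 4201, which is equivalent here). Whether it is actually faster depends on the indexes on FilteredActivityPointer, so treat it as something to test rather than a guaranteed win:
WITH pastAppointments AS (
    SELECT fap.regardingobjectid, fap.regardingobjectidname, fap.actualend,
           fap.activitytypecodename, fap.ownerid, fap.owneridname,
           ROW_NUMBER() OVER (PARTITION BY fap.regardingobjectidname
                              ORDER BY fap.actualend DESC) AS RN
    FROM FilteredActivityPointer fap
    WHERE fap.actualend < GETDATE()
      AND fap.activitytypecode <> 4201
)
SELECT fa.name, fa.owneridname, fa.new_technicalaccountmanagername, fa.new_customerid,
       fa.new_riskstatusname, fa.new_numberofopencases, fa.new_numberofurgentopencases,
       app.actualend AS startdate, app.activitytypecodename AS tasktype,
       app.ownerid, app.owneridname
FROM FilteredAccount fa
LEFT JOIN pastAppointments app
       ON fa.accountid = app.regardingobjectid
      AND fa.ownerid = app.ownerid
      AND app.RN = 1
WHERE fa.statecodename = 'Active'
  AND fa.ownerid LIKE @owner_search
ORDER BY fa.name;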