How to combine select and insert into sql - postgresql

I have an add request:
INSERT INTO LIKES_PRODUCTS AS L (L.USER_ID, L.PRODUCT_ID) VALUES('7', '1')
There is a request for the number of rows in the table:
SELECT COUNT(L.USER_ID) AS LIKES FROM LIKES_PRODUCTS AS L
Is it possible to combine them into a single query, so that the addition occurs first, and then only the counting of rows in the table?

You can do that with a data modifying CTE
with new_row as (
insert into likes_products (user_id, product_id)
values (7,1)
)
select count(user_id) as likes
from likes_products;
However, the final select does not see the effects of the previous CTE. If you always insert one row, you can simply count(user_id) + 1 in the select. Another option is to return the inserted rows and add them to the count:
with new_rows as (
insert into likes_products (user_id, product_id)
values (7,1),(8,2)
returning *
)
select count(user_id) + (select count(*) from new_rows) as likes
from likes_products;

Related

Why this SELECT of the recent added records doesn't work

I was trying to SELECT records that were just inserted, but it doesn't seen to work.
Example:
create table tt_teste (id bigserial, descricao varchar);
with inseridos as (
insert into tt_teste (descricao) values('Test 1') returning id
)
select *
from tt_teste
where id in (select id from inseridos);
I tried to rewrite in another way but the result is the same.
with inseridos as (
insert into tt_teste (descricao) values('Test 2') returning id
)
select *
from inseridos i
join tt_teste t on t.id = i.id;
The result is always empty. Even if I change the WHERE to "where 1=1 or id in (select id from inseridos)" the new records don't show up. They show up in the next run.
I am doing this because I want to SELECT and INSERT in another table with more data coming from a JOIN, but the database can't select the records just inserted. Seens some kind of concurrency issue.
You can do
with inseridos as (
insert into tt_teste (descricao) values('Test 1') returning id
)
select *
from inseridos;
or if you want some other field you just inserted, then
with inseridos as (
insert into tt_teste (descricao) values('Test 1') returning *
)
select *
from inseridos;
There shouldn't be much more to retrieve, unless you have some computed values, triggers filling some fields or some other automation which you didn't tell in your question.

Sum and average total columns in PostgreSQL

I'm using this query to find duplicate dates but not sure how to sum each duplicate dates, average it and remove duplicate dates.
DB Schema
date_time
datapoint_1
datapoint_2
SQL Query
SELECT date_time, COUNT(date_time)
FROM MYTABLE
GROUP BY date_time
HAVING COUNT(date_time) > 1
ORDER BY COUNT(date_time)
I would create a new table to replace the old one. That is easier and might even perform better:
CREATE TABLE mytable2 (LIKE mytable);
INSERT INTO mytable2 (date_time, datapoint_1, datapoint_2)
SELECT m.date_time, avg(m.datapoint_1), avg(m.datapoint_2)
FROM mytable AS m
GROUP BY m.date_time;
Then you can drop mytable and rename mytable2 to replace it.
To prevent new rows from creating duplicates, you could change the way you insert data:
-- to keep track of counts
ALTER TABLE mytable ADD numval integer DEFAULT 1;
-- to prevent duplicates
ALTER TABLE mytable ADD UNIQUE (date_time);
-- to insert new rows
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES ('2021-06-30', 42.0, -34.9)
ON CONFLICT (date_time)
DO UPDATE SET numval = mytable.numval + 1,
datapoint_1 = mytable.datapoint_1 + excluded.datapoint_1,
datapoint_2 = mytable.datapoint_2 + excluded.datapoint_2;
-- to select the averages
SELECT date_time,
datapoint_1 / numval AS datapoint_1,
datapoint_2 / numval AS datapoint_2
FROM mytable;
When you use GROUP BY you can also use aggregate functions to reduce multiple lines to a single one (COUNT, that you used is one of such functions). In your case the query would be:
SELECT date_time, avg(datapoint_1), avg(datapoint_2)
FROM MYTABLE
GROUP BY date_time
For every distinct date_time you will get a single row with the average of datapoint_1 and datapoint_2.

I want to delete duplicate rows from a MySQL table. Please click on the below link to see the table data

I tried to do it with this query, but it's not working...
DELETE FROM employee
WHERE ( SELECT * FROM
(SELECT row_number() OVER (partition by id) rn FROM employee) alias
) > 1;
Please click on this link to view the table
The above query is not working and giving this error message:
Error Code: 1242. Subquery returns more than 1 row
try like below by using subquery
delete from
(
select *.row_number() over (partition by id order by id) rn
from employee
) alias where rn > 1;
You are matching an integer (1) with set of rows returned from the subquery, which SQL will not allow
You can match an integer (1) with a single value returned from the subquery.
Use below query (using CTE) to remove duplicates.
;WITH TempEmp (id,duplicateRecCount)
AS
(
SELECT id,ROW_NUMBER() OVER(PARTITION by id ORDER BY id)
AS duplicateRecCount
FROM employee
)
DELETE FROM TempEmp
WHERE duplicateRecCount > 1

insert into temp table without creating it from union results

I have the below query that get results from more than one select.
Now I want these to be in a temp table.
Is there any way to insert these into a temp table without creating the table?
I know how to do that for select
Select * into #s --like that
However how to do that one more than one select?
SELECT Ori.[GeoBoundaryAssId], Ori.[FromGeoBoundaryId], Ori.Sort
From [GeoBoundaryAss] As Ori where Ori.[FromGeoBoundaryId] = (select distinct [FromGeoBoundaryId] from inserted )
Union
SELECT I.[GeoBoundaryAssId], I.[FromGeoBoundaryId], I.Sort
From [inserted] I ;
Add INTO after the first SELECT.
SELECT Ori.[GeoBoundaryAssId], Ori.[FromGeoBoundaryId], Ori.Sort
INTO #s
From [GeoBoundaryAss] As Ori where Ori.[FromGeoBoundaryId] = (select distinct [FromGeoBoundaryId] from inserted )
Union
SELECT I.[GeoBoundaryAssId], I.[FromGeoBoundaryId], I.Sort
From [inserted] I ;
Try this,
INSERT INTO #s ([GeoBoundaryAssId], [FromGeoBoundaryId], Sort)
(
SELECT Ori.[GeoBoundaryAssId], Ori.[FromGeoBoundaryId], Ori.Sort
FROM [GeoBoundaryAss] AS Ori WHERE Ori.[FromGeoBoundaryId] in (SELECT DISTINCT [FromGeoBoundaryId] FROM inserted )
UNION
SELECT I.[GeoBoundaryAssId], I.[FromGeoBoundaryId], I.Sort
FROM [inserted] I
)

TSQL: Remove duplicates based on max(date)

I am searching for a query to select the maximum date (a datetime column) and keep its id and row_id. The desire is to DELETE the rows in the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
There are various ways of doing this, but the basic idea is the same:
- Indentify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
NOT EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date)
Because you are using SQL Server 2000, you'er not able to use the Row Over technique of setting up a sequence and to identify the top row for each unique id.
So, your proposed technique is to use a datetime column to get the top 1 row to remove duplicates. That might work, but there is a possibility that you might still get duplicates having the same datetime value. But that's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a value of 2 - indicating that you will still end up with duplicates even after using the date column to remove duplicates. If you return 0, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the rows with more than one row with the same id. Capture the rows from this query into a temporary table and then run a query using the SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicate based on your unique key, and the number of distinct rows based on your unique key. You subtract the distinct rows from the count of occurrences. All that is pretty straightforward - so I'll leave you to it.
Try this
declare #t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from #t where rowid not in(
select t.rowid from #t t
inner join(
select MAX(dt)maxdate
from #t
group by id) X
on t.dt = X.maxdate )
select * from #t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer..
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
select * from #t
;WITH T AS(
select dense_rank() over(partition by id order by dt desc)NO,DT,ID,rowid from #t )
DELETE T WHERE NO>1