Create a GROUP BY query to show the latest row - mysqli

So my tables are:
user_msgs: http://sqlfiddle.com/#!9/7d6a9
token_msgs: http://sqlfiddle.com/#!9/3ac0f
There are only these 4 users, as listed. When a user sends a message to another user, the query checks whether a conversation between those 2 users has already been started by looking at the token_msgs table's from_id and to_id; if no token exists, it creates one and uses it in the user_msgs table. So the token is a unique field shared by these 2 tables.
Now, I want to list the users with whom user 1 has a conversation. So if from_id or to_id is 1, that conversation should be listed.
There are multiple rows in the user_msgs table for conversations between the same users.
I think I need to use GROUP_CONCAT, but I'm not sure. I am trying to build a query that does this and shows the latest message of each conversation at the top, hence ORDER BY time DESC:
SELECT * FROM (SELECT * FROM user_msgs ORDER BY time DESC) as temp_messages GROUP BY token
Please help in building the query.
Thanks.
CREATE TABLE `token_msgs` (
`id` int(11) NOT NULL,
`from_id` int(100) NOT NULL,
`to_id` int(100) NOT NULL,
`token` varchar(50) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table `token_msgs`
--
INSERT INTO `token_msgs` (`id`, `from_id`, `to_id`, `token`) VALUES
(1, 1, 2, '1omcda84om2'),
(2, 1, 3, '1omd0666om3'),
(3, 4, 1, '4om6713bom1'),
(4, 3, 4, '3om0e1abom4');
---
CREATE TABLE `user_msgs` (
`id` int(11) NOT NULL,
`token` varchar(50) NOT NULL,
`from_id` int(50) NOT NULL,
`to_id` int(50) NOT NULL,
`message` text NOT NULL,
`time` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table `user_msgs`
--
INSERT INTO `user_msgs` (`id`, `token`, `from_id`, `to_id`, `message`, `time`) VALUES
(1, '1omcda84om2', 1, 2, '1 => 2\r\nCan I have your picture so I can show Santa what I want for Christmas?', '2016-08-14 22:50:34'),
(2, '1omcda84om2', 2, 1, 'Makeup tip: You\'re not in the circus.\r\n2=>1', '2016-08-14 22:51:26'),
(3, '1omd0666om3', 1, 3, 'Behind every fat woman there is a beautiful woman. No seriously, your in the way. 1=>3', '2016-08-14 22:52:08'),
(4, '1omd0666om3', 3, 1, 'Me: Siri, why am I alone? Siri: *opens front facing camera*', '2016-08-14 22:53:24'),
(5, '1omcda84om2', 1, 2, 'I know milk does a body good, but damn girl, how much have you been drinking? 1 => 2', '2016-08-14 22:54:36'),
(6, '4om6713bom1', 4, 1, 'Hi, Im interested in your profile. Please send your contact number and I will call you.', '2016-08-15 00:18:11'),
(7, '3om0e1abom4', 3, 4, 'Girl you\'re like a car accident, cause I just can\'t look away. 3=>4', '2016-08-15 00:42:57'),
(8, '3om0e1abom4', 3, 4, 'Hola!! \r\n3=>4', '2016-08-15 00:43:34'),
(9, '1omd0666om3', 3, 1, 'Sometext from 3=>1', '2016-08-15 13:53:54'),
(10, '3om0e1abom4', 3, 4, 'More from 3->4', '2016-08-15 13:54:46');

Let's try this (on fiddle):
SELECT *
FROM (SELECT * FROM user_msgs
WHERE from_id = 1 OR to_id = 1
ORDER BY id DESC
) main
GROUP BY from_id + to_id
ORDER BY id DESC
One thing to mention about GROUP BY from_id + to_id: the sum is the same for both directions of a conversation between two people, so from 1 to 3 is the same as from 3 to 1. There's no need for the extra token table; it only makes things harder to maintain.
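A hedged side note (not from the original answer): with the WHERE clause pinning one side to user 1, the sums 1+2, 1+3 and 1+4 are all distinct, but without such a filter two different pairs can collide (1+4 = 2+3 = 5). Grouping on LEAST/GREATEST of the two ids keys each pair unambiguously; like the original, this still relies on MySQL's non-standard handling of non-aggregated columns and breaks under ONLY_FULL_GROUP_BY:
SELECT *
FROM (SELECT * FROM user_msgs
      WHERE from_id = 1 OR to_id = 1
      ORDER BY id DESC
     ) main
-- key the pair symmetrically; no two distinct pairs share the same (LEAST, GREATEST) key
GROUP BY LEAST(from_id, to_id), GREATEST(from_id, to_id)
ORDER BY id DESC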
UPDATE:
Because this kind of GROUP BY can behave unpredictably in MySQL (the non-aggregated columns it returns are not guaranteed to come from the latest row), I've created a new approach to this problem:
SELECT
a.*
FROM user_msgs a
LEFT JOIN user_msgs b
ON ((b.`from_id` = a.`from_id` AND b.`to_id` = a.`to_id`)
OR (b.`from_id` = a.`to_id` AND b.`to_id` = a.`from_id`))
AND a.`id` < b.`id`
WHERE (a.from_id = 1 OR a.to_id = 1)
AND b.`id` IS NULL
ORDER BY a.id DESC
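On MySQL 8.0+, a window-function sketch (my variant, not part of the original answer) avoids the GROUP BY tricks entirely by using the token, which already identifies a conversation, as the partition key:
SELECT id, token, from_id, to_id, message, time
FROM (
    SELECT m.*,
           -- number the messages of each conversation, newest first
           ROW_NUMBER() OVER (PARTITION BY token ORDER BY id DESC) AS rn
    FROM user_msgs m
    WHERE from_id = 1 OR to_id = 1
) t
WHERE rn = 1
ORDER BY id DESC;
This keeps exactly the newest row (highest id) per conversation involving user 1.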

Related

Eliminate duplicate values in a csv string

I have values in a table that are selected as duplicates if the name is the same; the corresponding ids are then combined into a CSV string column, as below:
Original table:
create table #original(id int, unique_id varchar(500), name varchar(200))
insert into #original
values
( 1, '12345', 'A'), ( 2, '12345', 'A'), ( 3, null, 'B'), ( 4, '45678', 'B'),
( 5, '900', 'C'), ( 6, '901', 'C'), ( 7, null, 'D'), ( 8, null, 'D'),
( 9, null, 'E'), (10, '1000', 'E'), (11, null, 'E'), (12, '1100', 'F'),
(13, '1101', 'F'), (14, '1102', 'F')
, (15, '9999', 'G'), (16, '9998', 'G'), (17, '', 'G')
, (18, '1111', 'H')
, (19, '1010', 'I'), (20, '1010', 'I'), (21, '', 'I')
The records with name A are the same, and likewise for B, but C's aren't because the unique ids are different for C.
A record is a duplicate if the name is the same AND the unique id either matches or is null. When the name is the same but the unique ids are different, they aren't the same people.
I'm selecting the data as below:
;with cte as
(select name
from #original
group by name
having count(*) > 1)
I need to get the data as below:
Id unique_id Name
1,2 12345 A
3,4 45678 B
7 null D
8,10,11 1000 E
19,20,21 1010 I
C and F should be avoided as their unique_ids are different even though the names are the same. H should be avoided because it's not a duplicate. G should be avoided because its unique ids are different. I should be selected because, if a unique id is present for duplicates by name, it has to be the same for all of them to be selected.
Thanks
I think you are looking for something like:
SELECT
STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) id,
(SELECT top 1 unique_id FROM original o3 WHERE o3.name = o1.name AND o3.unique_id IS NOT NULL) unique_id,
name
FROM original o1
WHERE NOT EXISTS
(SELECT 1 FROM original o2 WHERE o2.name = o1.name AND o2.unique_id <> o1.unique_id)
GROUP BY name
ORDER BY name
The NOT EXISTS condition eliminates names C and F (you could use an IN clause if you prefer, but I don't think it's any prettier in this case).
The GROUP BY name combined with the aggregate STRING_AGG gets the comma separated list of ids for the name.
This uses a subquery with top 1 to get a non-null unique_id. You could use max(unique_id) instead which certainly looks better, but you will get a warning. If you're comfortable ignoring the warning and don't think it will be confusing, I would use the max version.
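For reference, a sketch of that MAX variant (my transcription of the description above; the warning in question is SQL Server's "Null value is eliminated by an aggregate or other SET operation"):
SELECT
    STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) AS id,
    MAX(unique_id) AS unique_id,  -- MAX skips NULLs, which is what triggers the warning
    name
FROM original o1
WHERE NOT EXISTS
    (SELECT 1 FROM original o2 WHERE o2.name = o1.name AND o2.unique_id <> o1.unique_id)
GROUP BY name
ORDER BY name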
You can see both versions working in this Fiddle.
Edited to add: To address the new requirement in the comments, please see this Fiddle.
There will be multiple ways of doing this, but the condition...
(SELECT COUNT(DISTINCT unique_id) FROM original o2 WHERE o2.name = o1.name and COALESCE(unique_id, '') <> '') <= 1
... counts the number of distinct non-null, non-empty unique_ids per name and requires it to be 0 (to allow all-null cases like D) or 1 (to allow the other cases).
Note that this also adds an ORDER BY to the TOP 1 subquery version in order to prefer unique_ids with values over both nulls and empty strings.
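Putting those pieces together, a sketch of how the adjusted query might look (my combination based on the description above, not the linked Fiddle verbatim; the HAVING clause drops non-duplicates such as H):
SELECT
    STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) AS id,
    MAX(NULLIF(unique_id, '')) AS unique_id,  -- treat empty strings like NULL so a real value wins
    name
FROM original o1
WHERE (SELECT COUNT(DISTINCT o2.unique_id)
       FROM original o2
       WHERE o2.name = o1.name
         AND COALESCE(o2.unique_id, '') <> '') <= 1
GROUP BY name
HAVING COUNT(*) > 1  -- names appearing only once (like H) are not duplicates
ORDER BY name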

Concatenate mutliple contiguous rows to single row

I have a huge table with IoT data from a lot of IoT devices. Every device sends data once per minute, but only if its counter input got some signals; if not, no data is sent. So in my database the data looks like the sample in the answer below.
Today I'm loading all this data into my application and aggregating it by iterating and checking row by row, collapsing contiguous rows into single rows. Contiguous rows are all rows where the next row is one minute later. It works, but it doesn't feel smart or nice.
Does it make sense to do this aggregation on SQL Server, especially to increase performance?
How would you start?
This is a classic Islands and Gaps problem. I'm still mastering Islands and Gaps so I'd love any feedback on my solution from others in the know (please be gentle). There are at least a couple different ways to solve Islands and Gaps but this is the one that is easiest on my brain. Here's how I got it to work:
DDL to set up data:
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp;
CREATE TABLE #tmp
(IoT_Device INT,
Count INT,
TimeStamp DATETIME);
INSERT INTO #tmp
VALUES
(1, 5, '2021-10-27 14:03'),
(1, 4, '2021-10-27 14:04'),
(1, 7, '2021-10-27 14:05'),
(1, 8, '2021-10-27 14:06'),
(1, 5, '2021-10-27 14:07'),
(1, 4, '2021-10-27 14:08'),
(1, 7, '2021-10-27 14:12'),
(1, 8, '2021-10-27 14:13'),
(1, 5, '2021-10-27 14:14'),
(1, 4, '2021-10-27 14:15'),
(1, 5, '2021-10-27 14:21'),
(1, 4, '2021-10-27 14:22'),
(1, 7, '2021-10-27 14:23');
Islands and Gaps Solution:
;WITH CTE_TIMESTAMP_DATA AS (
SELECT
IoT_Device,
Count,
TimeStamp,
LAG(TimeStamp) OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS previous_timestamp,
LEAD(TimeStamp) OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS next_timestamp,
ROW_NUMBER() OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_location
FROM #tmp
)
,CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_number,
IoT_Device,
TimeStamp AS island_start_timestamp,
island_location AS island_start_location
FROM CTE_TIMESTAMP_DATA
WHERE DATEDIFF(MINUTE, previous_timestamp, TimeStamp) > 1
OR previous_timestamp IS NULL
)
,CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_number,
IoT_Device,
TimeStamp AS island_end_timestamp,
island_location AS island_end_location
FROM CTE_TIMESTAMP_DATA
WHERE DATEDIFF(MINUTE, TimeStamp, next_timestamp) > 1
OR next_timestamp IS NULL
)
SELECT
S.IoT_Device,
(SELECT SUM(Count)
FROM CTE_TIMESTAMP_DATA
WHERE IoT_Device = S.IoT_Device
AND TimeStamp BETWEEN S.island_start_timestamp AND E.island_end_timestamp) AS Count,
S.island_start_timestamp,
E.island_end_timestamp
FROM CTE_ISLAND_START AS S
INNER JOIN CTE_ISLAND_END AS E
ON E.IoT_Device = S.IoT_Device
AND E.island_number = S.island_number;
The CTE_TIMESTAMP_DATA query pulls the IoT_Device, Count, and TimeStamp along with the TimeStamp before and after each record using LAG and LEAD, and assigns a row number to each record ordered by TimeStamp.
The CTE_ISLAND_START query gets the start of each island.
The CTE_ISLAND_END query gets the end of each island.
The main SELECT at the bottom then uses this data to sum the Count within each island.
This will work with multiple IoT_Devices.
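For the sample data above, the query should return one row per island with the Count summed within it, roughly:
IoT_Device | Count | island_start_timestamp | island_end_timestamp
-----------+-------+------------------------+---------------------
         1 |    33 | 2021-10-27 14:03       | 2021-10-27 14:08
         1 |    24 | 2021-10-27 14:12       | 2021-10-27 14:15
         1 |    16 | 2021-10-27 14:21       | 2021-10-27 14:23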
You can read more about Islands and Gaps here or numerous other places online.

Grouping user id columns together with string_agg on PostgreSQL 13

This is my emails table
create table emails (
id bigint not null primary key generated by default as identity,
name text not null
);
And contacts table:
create table contacts (
id bigint not null primary key generated by default as identity,
email_id bigint not null,
user_id bigint not null,
full_name text not null,
ordering int not null
);
As you can see, I have a user_id field here. There can be multiple occurrences of the same user ID in my result, so I want to join them using a comma (,).
Insert some data to the tables:
insert into emails (name)
values
('dennis1'),
('dennis2');
insert into contacts (id, email_id, user_id, full_name, ordering)
values
(5, 1, 1, 'dennis1', 9),
(6, 2, 1, 'dennis1', 5),
(7, 2, 1, 'dennis1', 1),
(8, 1, 3, 'john', 2),
(9, 2, 4, 'dennis7', 1),
(10, 2, 4, 'dennis7', 1);
My query is:
select em.name,
c.user_ids
from emails em
join (
select email_id, string_agg(user_id::text, ',' order by ordering desc) as user_ids
from contacts
group by email_id
) c on c.email_id = em.id
order by em.name;
Actual Result
name user_ids
dennis1 1,3
dennis2 1,1,4,4
Expected Result
name user_ids
dennis1 1,3
dennis2 1,4
On my real-world data, I get the same user id around 50 times; it should appear only once. In the example above, you can see that users 1 and 4 each appear 2 times for the dennis2 row.
How can I make them unique?
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2e957b52eb46742f3ddea27ec36effb1
P.S.: I tried adding user_id to the GROUP BY, but then I get duplicate rows...
demo:db<>fiddle
SELECT
name,
string_agg(user_id::text, ',' order by ordering desc)
FROM (
SELECT DISTINCT ON (em.id, c.user_id)
*
FROM emails em
JOIN contacts c ON c.email_id = em.id
) s
GROUP BY name
Join the tables
DISTINCT ON the email id and the user_id, so for every email record there are no duplicate users
Aggregate
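One hedged refinement (my addition, not part of the answer above): without an ORDER BY in the subquery, DISTINCT ON keeps an arbitrary row per (email, user), so the ordering value that the outer string_agg sorts on is not deterministic. Adding an ORDER BY that starts with the DISTINCT ON expressions pins down which row survives:
SELECT
    name,
    string_agg(user_id::text, ',' ORDER BY ordering DESC) AS user_ids
FROM (
    SELECT DISTINCT ON (em.id, c.user_id)
           em.name, c.user_id, c.ordering
    FROM emails em
    JOIN contacts c ON c.email_id = em.id
    ORDER BY em.id, c.user_id, c.ordering DESC  -- keep the highest ordering per (email, user)
) s
GROUP BY name
ORDER BY name;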

Performance Issue with finding recent date of each group and joining to all records

I have following tables:
CREATE TABLE person (
id INTEGER NOT NULL,
name TEXT,
CONSTRAINT person_pkey PRIMARY KEY(id)
);
INSERT INTO person ("id", "name")
VALUES
(1, E'Person1'),
(2, E'Person2'),
(3, E'Person3'),
(4, E'Person4'),
(5, E'Person5'),
(6, E'Person6');
CREATE TABLE person_book (
id INTEGER NOT NULL,
person_id INTEGER,
book_id INTEGER,
receive_date DATE,
expire_date DATE,
CONSTRAINT person_book_pkey PRIMARY KEY(id)
);
/* Data for the 'person_book' table (Records 1 - 9) */
INSERT INTO person_book ("id", "person_id", "book_id", "receive_date", "expire_date")
VALUES
(1, 1, 1, E'2016-01-18', NULL),
(2, 1, 2, E'2016-02-18', E'2016-10-18'),
(3, 1, 4, E'2016-03-18', E'2016-12-18'),
(4, 2, 3, E'2017-02-18', NULL),
(5, 3, 5, E'2015-02-18', E'2016-02-23'),
(6, 4, 34, E'2016-12-18', E'2018-02-18'),
(7, 5, 56, E'2016-12-28', NULL),
(8, 5, 34, E'2018-01-19', E'2018-10-09'),
(9, 5, 57, E'2018-06-09', E'2018-10-09');
CREATE TABLE book (
id INTEGER NOT NULL,
type TEXT,
CONSTRAINT book_pkey PRIMARY KEY(id)
) ;
/* Data for the 'book' table (Records 1 - 8) */
INSERT INTO book ("id", "type")
VALUES
( 1, E'Btype1'),
( 2, E'Btype2'),
( 3, E'Btype3'),
( 4, E'Btype4'),
( 5, E'Btype5'),
(34, E'Btype34'),
(56, E'Btype56'),
(67, E'Btype67');
My query should list the names of all persons, and for persons who have received one of the book types (book_id IN (2, 4, 34, 56, 67)), it should display the book type and expire date of the most recently received one; if a person hasn't received such a book type, it should display blanks for book type and expire date.
My query looks like this:
SELECT p.name,
pb.expire_date,
b.type
FROM
(SELECT p.id AS person_id, MAX(pb.receive_date) recent_date
FROM
Person p
JOIN person_book pb ON pb.person_id = p.id
WHERE pb.book_id IN (2, 4, 34, 56, 67)
GROUP BY p.id
)tmp
JOIN person_book pb ON pb.person_id = tmp.person_id
AND tmp.recent_date = pb.receive_date AND pb.book_id IN
(2, 4, 34, 56, 67)
JOIN book b ON b.id = pb.book_id
RIGHT JOIN Person p ON p.id = pb.person_id
The (correct) result:
name | expire_date | type
---------+-------------+---------
Person1 | 2016-12-18 | Btype4
Person2 | |
Person3 | |
Person4 | 2018-02-18 | Btype34
Person5 | 2018-10-09 | Btype34
Person6 | |
The query works fine but since I'm right joining a small table with a huge one, it's slow. Is there any efficient way of rewriting this query?
My local PostgreSQL version is 9.3.18; but the query should work on version 8.4 as well since that's our productions version.
Problems with your setup
My local PostgreSQL version is 9.3.18; but the query should work on version 8.4 as well since that's our productions version.
That makes two major problems before even looking at the query:
Postgres 8.4 is just too old. Especially for "production". It reached EOL in July 2014. No more security upgrades, hopelessly outdated. Urgently consider upgrading to a current version.
It's a loaded footgun to use very different versions for development and production: it invites confusion and errors that go undetected. We have seen more than one desperate request here on SO stemming from this folly.
Better query
This equivalent should be substantially simpler and faster (works in pg 8.4, too):
SELECT p.name, pb.expire_date, b.type
FROM (
SELECT DISTINCT ON (person_id)
person_id, book_id, expire_date
FROM person_book
WHERE book_id IN (2, 4, 34, 56, 67)
ORDER BY person_id, receive_date DESC NULLS LAST
) pb
JOIN book b ON b.id = pb.book_id
RIGHT JOIN person p ON p.id = pb.person_id;
To optimize read performance, this partial multicolumn index with matching sort order would be perfect:
CREATE INDEX ON person_book (person_id, receive_date DESC NULLS LAST)
WHERE book_id IN (2, 4, 34, 56, 67);
In modern Postgres versions (9.2 or later) you might append book_id, expire_date to the index columns to get index-only scans. See:
How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
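A sketch of that index-only-scan variant (the same partial index, with book_id and expire_date simply appended as trailing columns):
CREATE INDEX ON person_book (person_id, receive_date DESC NULLS LAST, book_id, expire_date)
WHERE book_id IN (2, 4, 34, 56, 67);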
About DISTINCT ON:
Select first row in each GROUP BY group?
About DESC NULLS LAST:
PostgreSQL sort by datetime asc, null first?

postgresql json: keys as values

I just discovered the json capabilities of postgresql but have trouble understanding how to generate json with queries. I hope the question I am asking makes sense and please excuse me if I am missing something obvious.
My problem: how to generate JSON with some values being keys for other values.
Here is an example:
drop table if exists my_table;
create table my_table(id int, sale_year int, sale_qty int);
insert into my_table values (10, 2007, 2);
insert into my_table values (10, 2008, 1);
insert into my_table values (10, 2009, 0);
insert into my_table values (20, 2009, 2);
insert into my_table values (30, 2011, 1);
insert into my_table values (30, 2012, 3);
The following statement
SELECT id, json_agg(to_json(my_table)) FROM public.my_table group by id;
gives me a json per id (e.g. for id = 20)
20, [{"id":20, "sale_year": 2009, "sale_qty": 2}]
My question is: is it possible to return JSON with the following structure?
{"2009": 2}
I think you want something like this:
select id, json_agg(json_build_object(sale_year, sale_qty))
from my_table
group by id
order by id;
This returns:
id | json_agg
---+-------------------------------------------
10 | [{"2007" : 2}, {"2008" : 1}, {"2009" : 0}]
20 | [{"2009" : 2}]
30 | [{"2011" : 1}, {"2012" : 3}]
I hope that this will help someone else. In some cases, one would want to get not an array of jsonb data but a single jsonb object. Inspired by this post, this is an example of how to do it:
with tx1 as (
    select *
    from (values
        (10, 2007, 2),
        (10, 2008, 1),
        (10, 2009, 0),
        (20, 2009, 2),
        (30, 2011, 1),
        (30, 2012, 3)
    ) as t (id, sale_year, sale_qty)
),
tx2 as (
    select id,
           jsonb_agg(json_build_object(sale_year, sale_qty)) as x_data
    from tx1
    group by id
    order by id
)
SELECT
    id,
    x_data,
    jo.obj
FROM tx2
CROSS JOIN LATERAL (
    SELECT JSON_OBJECT_AGG(jt.key, jt.value) obj
    FROM JSONB_ARRAY_ELEMENTS(x_data) je
    CROSS JOIN LATERAL JSONB_EACH(je.value) jt
) jo
This gives one merged object per id:
{ "2007" : 2, "2008" : 1, "2009" : 0 }
{ "2009" : 2 }
{ "2011" : 1, "2012" : 3 }