Select columns outside of group by - postgresql

I am looking to select the first and last click by each ID, along with the corresponding source. Here is a sample table:
ID Click Source
--------------------------
1 1 Google
1 2 Facebook
1 3 Yahoo
2 1 Google
2 2 Yahoo
3 1 Facebook
4 1 Yahoo
5 1 Pinterest
5 2 Google
Here is the desired result:
ID First Last
-------------------------
1 Google Yahoo
2 Google Yahoo
3 Facebook Facebook
4 Yahoo Yahoo
5 Pinterest Google
I've already managed to get the first click by simply setting click=1 in the where clause. I am not able to get MAX(click) without grouping by ID and Source. When I include the Source in the group by I don't get the results I want.

You can join two derived tables getting the first and last click per id using DISTINCT ON on the common ID.
SELECT f.id,
       f.source "first",
       s.source "last"
FROM (SELECT DISTINCT ON (id)
             id,
             source
      FROM elbat
      ORDER BY id ASC,
               click ASC) f
INNER JOIN (SELECT DISTINCT ON (id)
                   id,
                   source
            FROM elbat
            ORDER BY id ASC,
                     click DESC) s
        ON s.id = f.id;

sticky bit's solution is quite good, but you can also do this with window functions. You should test to see which one works better for you:
SELECT DISTINCT id,
       first_value(source) OVER (PARTITION BY id ORDER BY click) AS "first",
       last_value(source) OVER (PARTITION BY id ORDER BY click
                                RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "last"
FROM your_table
ORDER BY id;
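To see why the explicit frame matters for last_value, here is a small sketch that runs the same kind of window query from Python. It is only an illustration under assumptions: SQLite (3.25 or later, via the standard sqlite3 module) stands in for PostgreSQL, and the table name clicks and the aliases first_source and last_source are made up for the example. Without the UNBOUNDED FOLLOWING frame, the default frame ends at the current row, so last_value would simply return the current row's source.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL; "clicks" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER, click INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        (1, 1, "Google"), (1, 2, "Facebook"), (1, 3, "Yahoo"),
        (2, 1, "Google"), (2, 2, "Yahoo"),
        (3, 1, "Facebook"),
        (4, 1, "Yahoo"),
        (5, 1, "Pinterest"), (5, 2, "Google"),
    ],
)

# The named window covers the whole partition, so last_value sees every row of the id.
query = """
SELECT DISTINCT
       id,
       first_value(source) OVER w AS first_source,
       last_value(source)  OVER w AS last_source
FROM clicks
WINDOW w AS (PARTITION BY id ORDER BY click
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data from the question this should reproduce the desired result table, one row per ID.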


how to make atomic rows using function regexp_split_to_table #postgresql

I have a table that stores the amenities of a room (wifi, TV, etc.). A room can have many amenities, and I want a column in which every amenity is atomic:
id | amenity_name
-----------------
1 | tv
2 | wifi
3 | bed
4 | Smoking allowed
Current table:
id | Another header
-------------------
1 | Wifi,Breakfast
2 | Wifi,Kitchen,Smoking allowed,Pets allowed,Heating,Washer,Essentials,Lock on bedroom door,24-hour check-in,Hangers,Hair dryer,Laptop friendly workspace
I have tried using regexp_split_to_table, but I can't work out how to use it.
Any ideas?
Thanks.
Try a lateral join:
SELECT tab.id, a.name
FROM tab
CROSS JOIN LATERAL regexp_split_to_table(tab.amenity_name, ',') AS a(name);
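As a rough mental model of what the lateral join produces, the following sketch does the same row expansion in plain Python. It is not PostgreSQL code: the rooms list and the split_amenities helper are hypothetical names, and the sample rows are shortened from the question's current table.

```python
# Hypothetical sample mirroring the question's current table: (id, comma-separated amenities).
rooms = [
    (1, "Wifi,Breakfast"),
    (2, "Wifi,Kitchen,Smoking allowed"),
]


def split_amenities(rows):
    """Yield one (id, amenity) pair per list element, as the lateral join does per input row."""
    for room_id, amenities in rows:
        # Split on the comma only; like the SQL above, nothing is trimmed.
        for name in amenities.split(","):
            yield room_id, name


atomic_rows = list(split_amenities(rooms))
for row in atomic_rows:
    print(row)
```

Each input row is paired with every element of its own split list, which is what CROSS JOIN LATERAL expresses in the query.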

Top N rows by group in ClickHouse

What is the proper way to query top N rows by group in ClickHouse?
Let's take as an example a table tbl with columns id2, id4 and v3, and N=2.
I tried the following:
SELECT
    id2,
    id4,
    v3 AS v3
FROM tbl
GROUP BY
    id2,
    id4
ORDER BY v3 DESC
LIMIT 2 BY
    id2,
    id4
but I get this error:
Received exception from server (version 19.3.4):
Code: 215. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Column v3 is not under aggregate function and not in GROUP BY..
I could put v3 into GROUP BY, and that does seem to work, but it is not efficient to group by a metric.
There is the any aggregate function, but we actually want all values (limited to 2 by the LIMIT BY clause), not an arbitrary one, so it doesn't sound like a proper solution here.
SELECT
    id2,
    id4,
    any(v3) AS v3
FROM tbl
GROUP BY
    id2,
    id4
ORDER BY v3 DESC
LIMIT 2 BY
    id2,
    id4
You can use aggregate functions like this:
SELECT
    id2,
    id4,
    arrayJoin(arraySlice(arrayReverseSort(groupArray(v3)), 1, 2)) v3
FROM tbl
GROUP BY
    id2,
    id4
You can also do it the way you would do it in "normal" SQL as described in this thread
While vladimir's solution works for many cases, it didn't work for mine. I have a table that looks like this:
column | group by
++++++++++++++++++++++
A | Yes
B | Yes
C | No
Now, imagine column A identifies the user and column B stands for whatever action a user could do, e.g. on your website or in your online game. Column C is the sum of how often the user has done this particular action. Vladimir's solution would allow me to get columns A and C, but not the action the user has done (column B), meaning I would know how often a user has done something, but not what.
The reason for this is that it doesn't make sense to group by both A and B. Every row would be a unique group and you aren't able to find the top K rows since every group has only 1 member. The result is the same table you query against. Instead, if you group only by A, you can apply vladimir's solution but would get only columns A and C. You can't output column B because it's not part of the Group By statement as explained.
If you would like to get the top 2 (or top 5, or top 100) actions a user has done, you might look for a solution like this:
SELECT rs.id2, rs.id4, rs.v3
FROM (
    SELECT id2, id4, v3,
           row_number() OVER (PARTITION BY id2, id4 ORDER BY v3 DESC) AS Rank
    FROM tbl
) rs
WHERE Rank <= 2
Note: To use this, you have to set allow_experimental_window_functions = 1.
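The row_number() pattern itself is standard SQL, so it can be tried out without a ClickHouse server. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module (SQLite 3.25 or later) instead of ClickHouse, the sample rows are invented, and the rn alias and top_n variable are hypothetical names.

```python
import sqlite3

# Assumption: SQLite stands in for ClickHouse; the data below is made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id2 TEXT, id4 INTEGER, v3 REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("a", 1, 10.0), ("a", 1, 30.0), ("a", 1, 20.0),
        ("b", 2, 5.0),
        ("b", 3, 7.0), ("b", 3, 9.0),
    ],
)

top_n = 2
# Number the rows of each (id2, id4) group by descending v3, then keep the first top_n.
top_rows = conn.execute(
    """
    SELECT id2, id4, v3
    FROM (
        SELECT id2, id4, v3,
               row_number() OVER (PARTITION BY id2, id4 ORDER BY v3 DESC) AS rn
        FROM tbl
    )
    WHERE rn <= ?
    ORDER BY id2, id4, v3 DESC
    """,
    (top_n,),
).fetchall()
for row in top_rows:
    print(row)
```

Groups with fewer than top_n rows are returned in full, and every selected column is available because no GROUP BY is involved.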

Finding top searched country and ip from a table

I have a table "user" with columns ip, os, country and browser. I want to find the ip, os, country and browser with the maximum count. Is there a query for that in PostgreSQL?
The current query I'm using is:
SELECT *
FROM
(
    SELECT COUNT(ip),ip FROM user GROUP BY ip
    UNION ALL
    SELECT COUNT(os),os FROM user GROUP BY os
    UNION ALL
    SELECT COUNT(country),country FROM user GROUP BY country
    UNION ALL
    SELECT COUNT(browser),browser FROM user GROUP BY browser
) user
It shows every ip, os, country and browser together with its count.
What I really want is, for each column, the value with the maximum count.
Is it possible to do that in a single query?
I'm expecting something like this:
os count ip count
linux 50 xx:xx:xx:xx 95
-- USER is a reserved word in PostgreSQL, so the table name is double-quoted.
SELECT *
FROM
    (SELECT COUNT(ip) AS cnt_ip, ip FROM "user" GROUP BY ip ORDER BY 1 DESC LIMIT 1) AS t_ip,
    (SELECT COUNT(os) AS cnt_os, os FROM "user" GROUP BY os ORDER BY 1 DESC LIMIT 1) AS t_os,
    (SELECT COUNT(country) AS cnt_country, country FROM "user" GROUP BY country ORDER BY 1 DESC LIMIT 1) AS t_country,
    (SELECT COUNT(browser) AS cnt_browser, browser FROM "user" GROUP BY browser ORDER BY 1 DESC LIMIT 1) AS t_browser
You may use HAVING and ALL for that. For readability, I'll show it for just one column:
-- "user" is double-quoted because USER is a reserved word in PostgreSQL.
SELECT COUNT(ip), ip
FROM "user"
GROUP BY ip
HAVING COUNT(ip) >= ALL
(
    SELECT COUNT(ip)
    FROM "user"
    GROUP BY ip
)
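One practical difference from the LIMIT 1 approach is that a HAVING filter keeps every value tied for the maximum count. The sketch below illustrates that behaviour, with some swaps that should be read as assumptions: it runs on SQLite through Python's sqlite3 module, and because SQLite has no ALL quantifier, the filter is written as a comparison against MAX over the grouped counts. The sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only two of the four columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (ip TEXT, os TEXT)')
conn.executemany(
    'INSERT INTO "user" VALUES (?, ?)',
    [
        ("10.0.0.1", "linux"), ("10.0.0.1", "linux"),
        ("10.0.0.2", "windows"), ("10.0.0.2", "linux"),
        ("10.0.0.3", "macos"),
    ],
)

# Keep every ip whose count equals the largest per-ip count (ties included).
top_ips = conn.execute(
    """
    SELECT ip, COUNT(ip) AS cnt
    FROM "user"
    GROUP BY ip
    HAVING COUNT(ip) = (
        SELECT MAX(c) FROM (SELECT COUNT(ip) AS c FROM "user" GROUP BY ip)
    )
    ORDER BY ip
    """
).fetchall()
for row in top_ips:
    print(row)
```

Here two addresses share the highest count, so both rows come back, whereas ORDER BY ... LIMIT 1 would return only one of them.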

select last of an item for each user in postgres

I want to get the last entry for each user, but the customer_id is a hash ('ASAG#...'), and ordering by customer_id destroys the query. Is there an alternative?
Select Distinct On (l.customer_id)
    l.customer_id
    ,l.created_at
    ,l.text
From likes l
Order By l.customer_id, l.created_at Desc
Your current query already appears to be working, q.v. here:
Demo
I don't know why your current query is not generating the results you expect. It should return one record for every customer, namely the most recent one, given your ORDER BY clause.
In any case, if it does not do what you want, an alternative would be to use ROW_NUMBER() with a partition by customer. The inner query numbers the records within each customer, with the value 1 going to that customer's most recent record. The outer query then retains only the latest record.
SELECT
    t.customer_id,
    t.created_at,
    t.text
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) rn
    FROM likes
) t
WHERE t.rn = 1
To speed up the inner query which uses ROW_NUMBER() you can try adding a composite index on the customer_id and created_at columns:
CREATE INDEX yourIdx ON likes (customer_id, created_at);
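To check that hash-like customer IDs do not get in the way of the ROW_NUMBER() approach, a quick sketch like the following can help. It makes several assumptions: SQLite (3.25 or later, via Python's sqlite3) stands in for PostgreSQL, and the customer IDs, timestamps and texts are invented for the example.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (customer_id TEXT, created_at TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?, ?)",
    [
        ("ab#1", "2020-01-01", "first"),
        ("ab#1", "2020-03-01", "latest"),
        ("ef#3", "2020-02-01", "only"),
        ("cd#2", "2020-01-15", "older"),
        ("cd#2", "2020-01-20", "newer"),
    ],
)

# rn = 1 marks the newest row inside each customer_id partition.
latest = conn.execute(
    """
    SELECT customer_id, created_at, text
    FROM (
        SELECT *,
               row_number() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
        FROM likes
    )
    WHERE rn = 1
    ORDER BY customer_id
    """
).fetchall()
for row in latest:
    print(row)
```

The customer_id only defines the partitions here, so its sort order has no effect on which row is picked within a customer.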

Group output of single column postgresql

First off I'm a total SQL noob - Thanks in advance for any assistance you can offer.
I have a FortiAnalyzer that uses a Postgres DB to store firewall logs. The Analyzer is then used to report on usage etc.
Basically I need to write a custom query that can show the Top 10 Users by bandwidth used for the top 10 Websites/destinations per user.
I can get all of the relevant information out of the unit, but I cannot get the output formatted correctly.
I would be happy with the output showing a username 10 times with the top 10 sites next to the username. First prize however would be to show the username in Column A only once, then in column B and C the destination address and bandwidth used respectively.
Here is the query I have so far:
select coalesce(nullifna(`user`), `src`) as user_src,
coalesce(hostname, dstname, 'unknown') as web_site,
sum(rcvd + sent)/1024 as bandwidth from $log
where $filter and user is not null and status in ('passthrough', 'filtered')
group by `user_src` , web_site order by user_src desc
Once the query is linked to a report chart, I then have options to limit the output by x value. I could, for example, limit the user_src column to 100 (i.e. 10 users with 10 outputs each).
I hope this is clear to you... If not, I will do my best to answer any questions.
I start with a table aggregated at the website, user_src level. Then it is not difficult to get the top X websites for the top Y users. You will need window functions to get the desired result.
Sample data:
create table test (web_site varchar, user_src varchar, bandwidth numeric);
insert into test values
('a','s1',18),
('b','s1',12),
('c','s1',13),
('d','s2',14),
('e','s2',15),
('f','s2',16),
('g','s3',17),
('h','s3',18),
('i','s3',19)
;
Get top X websites for top Y users:
with cte as (
    select
        user_src,
        web_site,
        bandwidth,
        dense_rank() over(order by site_bandwidth desc) as user_rank,
        dense_rank() over(partition by user_src order by bandwidth desc) as website_rank
    from
        test
        join (select user_src, sum(bandwidth) site_bandwidth from test group by user_src) a using (user_src)
)
select
    *
from
    cte
where
    user_rank <= 2
    and website_rank <= 2
order by
    user_rank,
    website_rank
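For anyone who wants to check the ranking logic locally, the sketch below runs essentially the same query against the sample data. It is an illustration under assumptions: SQLite (3.25 or later, via Python's sqlite3) stands in for PostgreSQL, and the per-user total is called user_bandwidth here, a renamed alias chosen for clarity.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows are the sample data from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (web_site varchar, user_src varchar, bandwidth numeric)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("a", "s1", 18), ("b", "s1", 12), ("c", "s1", 13),
        ("d", "s2", 14), ("e", "s2", 15), ("f", "s2", 16),
        ("g", "s3", 17), ("h", "s3", 18), ("i", "s3", 19),
    ],
)

# user_rank orders users by their total bandwidth; website_rank orders sites within a user.
ranked = conn.execute(
    """
    WITH cte AS (
        SELECT user_src,
               web_site,
               bandwidth,
               dense_rank() OVER (ORDER BY user_bandwidth DESC) AS user_rank,
               dense_rank() OVER (PARTITION BY user_src ORDER BY bandwidth DESC) AS website_rank
        FROM test
        JOIN (SELECT user_src, sum(bandwidth) AS user_bandwidth
              FROM test
              GROUP BY user_src) a USING (user_src)
    )
    SELECT user_src, web_site, bandwidth
    FROM cte
    WHERE user_rank <= 2
      AND website_rank <= 2
    ORDER BY user_rank, website_rank
    """
).fetchall()
for row in ranked:
    print(row)
```

With this data the totals are 43 for s1, 45 for s2 and 54 for s3, so s3 and s2 are the top two users and each contributes its two largest sites.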