Finding top searched country and ip from a table - postgresql

I have a table "user" with columns ip, os, country and browser. I want to find the ip, os, country and browser with the maximum count. Is there a query for that in PostgreSQL?
The current query I'm using is:
SELECT *
FROM
(
    SELECT COUNT(ip), ip FROM "user" GROUP BY ip
    UNION ALL
    SELECT COUNT(os), os FROM "user" GROUP BY os
    UNION ALL
    SELECT COUNT(country), country FROM "user" GROUP BY country
    UNION ALL
    SELECT COUNT(browser), browser FROM "user" GROUP BY browser
) t
It shows every ip, os, country and browser together with its count.
What I really want is, for each column, the value with the maximum count and that count.
Is it possible to do that in a single query?
I'm expecting something like this:
os     count  ip           count
linux  50     xx:xx:xx:xx  95

-- "user" is a reserved word in PostgreSQL, so the table name must be double-quoted.
-- Each derived table returns one row, so the cross join yields a single row.
SELECT *
FROM
    (SELECT COUNT(ip) AS cnt_ip, ip FROM "user" GROUP BY ip ORDER BY 1 DESC LIMIT 1) AS t_ip,
    (SELECT COUNT(os) AS cnt_os, os FROM "user" GROUP BY os ORDER BY 1 DESC LIMIT 1) AS t_os,
    (SELECT COUNT(country) AS cnt_country, country FROM "user" GROUP BY country ORDER BY 1 DESC LIMIT 1) AS t_country,
    (SELECT COUNT(browser) AS cnt_browser, browser FROM "user" GROUP BY browser ORDER BY 1 DESC LIMIT 1) AS t_browser
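
To check the shape of the result, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in for PostgreSQL here, and the sample rows are invented for illustration; the question does not show any data.

```python
import sqlite3

# Hypothetical sample rows; the real "user" table is not shown in the question.
rows = [
    ("10.0.0.1", "linux", "IN", "firefox"),
    ("10.0.0.1", "linux", "IN", "chrome"),
    ("10.0.0.2", "linux", "US", "chrome"),
    ("10.0.0.1", "windows", "IN", "chrome"),
]

conn = sqlite3.connect(":memory:")
# The table name is double-quoted, as it must be in PostgreSQL.
conn.execute('CREATE TABLE "user" (ip TEXT, os TEXT, country TEXT, browser TEXT)')
conn.executemany('INSERT INTO "user" VALUES (?, ?, ?, ?)', rows)

# Each derived table returns exactly one row (the most frequent value),
# so the comma (cross) join produces a single wide row.
query = '''
SELECT *
FROM
  (SELECT COUNT(ip) AS cnt_ip, ip FROM "user" GROUP BY ip ORDER BY 1 DESC LIMIT 1) AS t_ip,
  (SELECT COUNT(os) AS cnt_os, os FROM "user" GROUP BY os ORDER BY 1 DESC LIMIT 1) AS t_os,
  (SELECT COUNT(country) AS cnt_country, country FROM "user" GROUP BY country ORDER BY 1 DESC LIMIT 1) AS t_country,
  (SELECT COUNT(browser) AS cnt_browser, browser FROM "user" GROUP BY browser ORDER BY 1 DESC LIMIT 1) AS t_browser
'''
top = conn.execute(query).fetchone()
print(top)
```

Note that LIMIT 1 keeps only one value per column even when several values tie for the maximum count.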

You can use HAVING and ALL for that. For readability, I'll show it for just one column. Unlike LIMIT 1, this returns every value that ties for the maximum count:
SELECT COUNT(ip), ip
FROM "user"
GROUP BY ip
HAVING COUNT(ip) >= ALL
(
    SELECT COUNT(ip)
    FROM "user"
    GROUP BY ip
)

Related

How do I get all records created by 10% of users?

I have traffic logs from my site.
I want to sample traffic from 10% of the user base.
But each record in the database is a visit, and each customer can have many visits. Getting only 10% of traffic would be incorrect, because 20% of users may generate 80% of traffic.
Table structure is simple
user_id, page
How do I get traffic from a random 10% of customers without too many nested subqueries?
If using MySQL you can try:
/* Calculate 10% of the users, rounding up to account for values below 1 */
SET @limit = CEIL((SELECT COUNT(DISTINCT(user_id)) FROM TRAFFIC) / 10);
/* Prepare a statement for getting the traffic.
   Without an ORDER BY, LIMIT picks an arbitrary set of users, not a random one. */
PREPARE STMT FROM 'SELECT *
FROM TRAFFIC T
INNER JOIN (
    SELECT DISTINCT(user_id)
    FROM TRAFFIC
    LIMIT ?
) U
ON T.user_id = U.user_id';
/* Execute the statement using the pre-computed limit. */
EXECUTE STMT USING @limit;
Here's a similar implementation in PostgreSQL (based on feedback):
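SELECT *
FROM TRAFFIC T
INNER JOIN (
    SELECT DISTINCT user_id
    FROM TRAFFIC
    -- 10.0 forces numeric division; COUNT(...) / 10 is integer division and truncates before CEIL
    LIMIT CEIL((SELECT COUNT(DISTINCT user_id) FROM TRAFFIC) / 10.0)::bigint
) U
ON T.user_id = U.user_id;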
If your users are stored in a different table (and the log table's user_id is a foreign key to that), you can use the tablesample option to get roughly 10% of the users in a sub-select. The system method samples whole table pages, so the fraction is approximate:
select *
from the_table
where user_id in (select id
                  from users
                  tablesample system (10));
If you don't have such a table, Jake's query (without the prepared statement) is probably the way to go.
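
For a sample that is actually random, one option is to shuffle the distinct users before applying the limit. The following is a minimal sketch of that idea, run against an in-memory SQLite database from Python as a stand-in for PostgreSQL. The data is invented (25 users, where user n has n visits), and the limit is computed in Python and passed as a parameter, similar to the prepared statement above.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (user_id INTEGER, page TEXT)")
# Hypothetical data: 25 users, user n has n visits.
visits = [(uid, f"/page/{v}") for uid in range(1, 26) for v in range(uid)]
conn.executemany("INSERT INTO traffic VALUES (?, ?)", visits)

n_users = conn.execute("SELECT COUNT(DISTINCT user_id) FROM traffic").fetchone()[0]
sample_size = math.ceil(n_users / 10)  # 10% of users, rounded up

# Pick sample_size users at random, then fetch every visit of those users.
sampled = conn.execute(
    """
    SELECT t.user_id, t.page
    FROM traffic AS t
    JOIN (
        SELECT user_id
        FROM traffic
        GROUP BY user_id
        ORDER BY random()
        LIMIT ?
    ) AS u ON u.user_id = t.user_id
    """,
    (sample_size,),
).fetchall()

sampled_users = {uid for uid, _ in sampled}
print(sorted(sampled_users), len(sampled))
```

Sorting by random() touches every distinct user, so on a very large table the tablesample approach is likely to be cheaper.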

Select columns outside of group by

I am looking to select the first and last click by each ID, along with the corresponding source. Here is a sample table:
ID   Click   Source
--------------------------
1    1       Google
1    2       Facebook
1    3       Yahoo
2    1       Google
2    2       Yahoo
3    1       Facebook
4    1       Yahoo
5    1       Pinterest
5    2       Google
Here is the desired result:
ID   First       Last
-------------------------
1    Google      Yahoo
2    Google      Yahoo
3    Facebook    Facebook
4    Yahoo       Yahoo
5    Pinterest   Google
I've already managed to get the first click by simply setting click=1 in the where clause. I am not able to get MAX(click) without grouping by ID and Source. When I include the Source in the group by I don't get the results I want.
You can join two derived tables getting the first and last click per id using DISTINCT ON on the common ID.
SELECT f.id,
       f.source "first",
       s.source "last"
FROM (SELECT DISTINCT ON (id)
             id,
             source
      FROM elbat
      ORDER BY id ASC,
               click ASC) f
INNER JOIN (SELECT DISTINCT ON (id)
                   id,
                   source
            FROM elbat
            ORDER BY id ASC,
                     click DESC) s
        ON s.id = f.id;
sticky bit's solution is quite good, but you can also do this with window functions. You should test to see which one works better for you:
select distinct id,
       first_value(source) OVER (partition by id order by click) as "first",
       -- the explicit frame is needed: the default frame ends at the current row
       last_value(source) OVER (partition by id order by click
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as "last"
FROM your_table
ORDER BY id;
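
The explicit frame on last_value is the part that is easy to miss. Here is a small sketch that runs the window-function query on the sample data from the question, using an in-memory SQLite database from Python as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later). The table name clicks and the column aliases are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER, click INTEGER, source TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", [
    (1, 1, "Google"), (1, 2, "Facebook"), (1, 3, "Yahoo"),
    (2, 1, "Google"), (2, 2, "Yahoo"),
    (3, 1, "Facebook"),
    (4, 1, "Yahoo"),
    (5, 1, "Pinterest"), (5, 2, "Google"),
])

# Frame covers the whole partition, so last_value sees the final click.
full_frame = conn.execute("""
    SELECT DISTINCT id,
           first_value(source) OVER (PARTITION BY id ORDER BY click) AS first_source,
           last_value(source)  OVER (PARTITION BY id ORDER BY click
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_source
    FROM clicks
    ORDER BY id
""").fetchall()

# Default frame stops at the current row, so last_value is just the current source.
default_frame = conn.execute("""
    SELECT DISTINCT id,
           first_value(source) OVER (PARTITION BY id ORDER BY click) AS first_source,
           last_value(source)  OVER (PARTITION BY id ORDER BY click) AS last_source
    FROM clicks
    ORDER BY id
""").fetchall()

for row in full_frame:
    print(row)
```

With the full frame the query returns one row per id, matching the desired result. With the default frame, DISTINCT cannot collapse the rows because last_source differs on every click.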

How to write proper/efficient query

I have a question about the right way of writing the query.
I have an employees table with, let's say, 4 columns: employee_id, department, salary, email.
Some records have no email address. I'd like to find the most efficient way to write a SQL query, using a window function, that returns each department's salary sum for those records, divided by the total salary of all records without an email address.
I have 2 solutions, and presumably only one of them is efficient. Can anyone advise?
select department, sum(salary) as total
from employees
where email is null
group by 1
option 1
select a.department, a.total/(select sum(salary) from employees where email is null)
from (
    select department, sum(salary) as total
    from employees
    where email is null
    group by 1
) a
option 2
select a.department, a.total/sum(a.total) over()
from (
    select department, sum(salary) as total
    from employees
    where email is null
    group by 1
) a
I guess that query 2 is more efficient, but is it the right way? And is it valid to leave the OVER clause empty?
I just started using PostgreSQL instead of MySQL 5.6.
Your second query is better.
The first query has to scan employees twice, while the second query only scans the (hopefully smaller) result set of the subquery to calculate the sum.
It is perfectly valid to leave the OVER clause empty. That just means the window is the whole result set, so all result rows will get the same value (which is what you want).
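
As a quick check that both options return the same shares, here is a minimal sketch using an in-memory SQLite database from Python as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later). The rows are invented. I also multiply by 1.0, on the assumption that salary is an integer column; otherwise the division would be integer division and truncate to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER, email TEXT)"
)
# Hypothetical rows; employee 4 has an email and is excluded by the filter.
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "sales", 100, None),
    (2, "sales", 300, None),
    (3, "it", 600, None),
    (4, "it", 500, "a@example.com"),
])

# Option 1: scalar subquery scans employees a second time for the grand total.
option_1 = conn.execute("""
    SELECT a.department,
           a.total * 1.0 / (SELECT SUM(salary) FROM employees WHERE email IS NULL) AS share
    FROM (SELECT department, SUM(salary) AS total
          FROM employees WHERE email IS NULL GROUP BY department) a
    ORDER BY a.department
""").fetchall()

# Option 2: empty OVER () sums the already aggregated totals.
option_2 = conn.execute("""
    SELECT a.department,
           a.total * 1.0 / SUM(a.total) OVER () AS share
    FROM (SELECT department, SUM(salary) AS total
          FROM employees WHERE email IS NULL GROUP BY department) a
    ORDER BY a.department
""").fetchall()

print(option_1)
print(option_2)
```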

Getting group by attribute in nested query

I am trying to find the most frequent value in a postgresql table. The problem is that I also want to "group by" in that table and only get the most frequent from the values that have the same name.
So I have the following query:
select name,
(SELECT value FROM table where name=name GROUP BY value ORDER BY COUNT(*) DESC limit 1)
as mfq from table group by name;
So, I am using where name=name, trying to get the outside group by attribute "name", but it doesn't seem to work. Any ideas on how to do it?
Edit: for example in the following table:
name value
a 3
a 3
a 3
b 2
b 2
I want to get:
name value
a 3
b 2
but the above statement gives:
name value
a 3
b 3
instead, since where doesn't work correctly.
There is a dedicated function in PostgreSQL for this case: the mode() ordered-set aggregate:
select name, mode() within group (order by value) mode_value
from table
group by name;
which returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results) -- which is the same behavior as with your order by count(*) desc limit 1.
It is available from PostgreSQL 9.4+.
http://rextester.com/GHGJH15037
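If you want your query to work, you need table aliases. In your subquery, both sides of name=name resolve to the inner table's column, so the condition is true for every row and the outer name is never used. Table aliases and qualified column names are always a good idea: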
select t.name,
       (select t2.value
        from table t2
        where t2.name = t.name
        group by t2.value
        order by COUNT(*) desc
        limit 1
       ) as mfq
from table t
group by t.name;
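
To see the difference the aliases make, here is a small sketch that runs both versions on the sample data from the question, using an in-memory SQLite database from Python as a stand-in for PostgreSQL. The table name samples is my own, since table is only a placeholder in the question. SQLite has no mode() aggregate, so only the correlated subquery is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (name TEXT, value INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?)", [
    ("a", 3), ("a", 3), ("a", 3),
    ("b", 2), ("b", 2),
])

# Unqualified: both "name" references bind to the inner table, so the
# subquery ignores the outer row and returns the overall most frequent value.
unqualified = conn.execute("""
    SELECT name,
           (SELECT value FROM samples WHERE name = name
            GROUP BY value ORDER BY COUNT(*) DESC LIMIT 1) AS mfq
    FROM samples
    GROUP BY name
    ORDER BY name
""").fetchall()

# Qualified: t2.name = t.name correlates the subquery with the outer group.
qualified = conn.execute("""
    SELECT t.name,
           (SELECT t2.value FROM samples t2 WHERE t2.name = t.name
            GROUP BY t2.value ORDER BY COUNT(*) DESC LIMIT 1) AS mfq
    FROM samples t
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

print(unqualified)
print(qualified)
```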

Group output of single column postgresql

First off I'm a total SQL noob - Thanks in advance for any assistance you can offer.
I have a FortiAnalyzer that uses a Postgres DB to store firewall logs. The Analyzer is then used to report on usage etc.
Basically I need to write a custom query that can show the Top 10 Users by bandwidth used for the top 10 Websites/destinations per user.
I can get all of the relevant information out of the unit, but I cannot get the output formatted correctly.
I would be happy with the output showing a username 10 times with the top 10 sites next to the username. First prize however would be to show the username in Column A only once, then in column B and C the destination address and bandwidth used respectively.
Here is the query I have so far:
select coalesce(nullifna(`user`), `src`) as user_src,
coalesce(hostname, dstname, 'unknown') as web_site,
sum(rcvd + sent)/1024 as bandwidth from $log
where $filter and user is not null and status in ('passthrough', 'filtered')
group by `user_src` , web_site order by user_src desc
Once the query is linked to a report chart, I then have options to limit the output by x value. I could, for example, limit the user_src column to 100 rows (i.e. 10 users with 10 outputs each).
I hope this is clear. If not, I will do my best to answer any questions.
I start with a table aggregated at the (web_site, user_src) level. From there it is not difficult to get the top X websites for the top Y users. You will need window functions to get the desired result.
Sample data:
create table test (web_site varchar, user_src varchar, bandwidth numeric);
insert into test values
('a','s1',18),
('b','s1',12),
('c','s1',13),
('d','s2',14),
('e','s2',15),
('f','s2',16),
('g','s3',17),
('h','s3',18),
('i','s3',19)
;
Get top X websites for top Y users:
with cte as (
    select
        user_src,
        web_site,
        bandwidth,
        -- rank users by their total bandwidth (site_bandwidth is the per-user sum)
        dense_rank() over(order by site_bandwidth desc) as user_rank,
        -- rank each user's websites by bandwidth
        dense_rank() over(partition by user_src order by bandwidth desc) as website_rank
    from
        test
        join (select user_src, sum(bandwidth) site_bandwidth from test group by user_src) a using (user_src)
)
select
    *
from
    cte
where
    user_rank <= 2
    and website_rank <= 2
order by
    user_rank,
    website_rank
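
For reference, here is a minimal sketch that runs this query on the sample data above, using an in-memory SQLite database from Python as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later). The per-user totals are s1 = 43, s2 = 45 and s3 = 54, so with both limits set to 2 the result should hold the top two sites of s3 and s2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (web_site TEXT, user_src TEXT, bandwidth NUMERIC)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("a", "s1", 18), ("b", "s1", 12), ("c", "s1", 13),
    ("d", "s2", 14), ("e", "s2", 15), ("f", "s2", 16),
    ("g", "s3", 17), ("h", "s3", 18), ("i", "s3", 19),
])

top = conn.execute("""
    WITH cte AS (
        SELECT user_src,
               web_site,
               bandwidth,
               -- users ranked by their total bandwidth
               dense_rank() OVER (ORDER BY site_bandwidth DESC) AS user_rank,
               -- each user's sites ranked by bandwidth
               dense_rank() OVER (PARTITION BY user_src ORDER BY bandwidth DESC) AS website_rank
        FROM test
        JOIN (SELECT user_src, SUM(bandwidth) AS site_bandwidth
              FROM test GROUP BY user_src) a USING (user_src)
    )
    SELECT user_src, web_site, bandwidth, user_rank, website_rank
    FROM cte
    WHERE user_rank <= 2 AND website_rank <= 2
    ORDER BY user_rank, website_rank
""").fetchall()

for row in top:
    print(row)
```

Because dense_rank gives tied rows the same rank, a tie at the cut-off can return more than X sites or Y users. If a hard cap is needed, row_number() is the usual alternative.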