I see a discrepancy in the count because of the group by clause, please advise how to go about this.
Total records in the table are 4638 when I do count(you_id), but when I run this script I'm getting only 4544 records.
SELECT
thumbnails,
CR1,
monetize,
c_id,
you_id,
vendor,
ratio * 100 AS percent
FROM
(
SELECT thumbnails, c_id,you_id, vendor,CR1,monetize,timestamp,
count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM testing
GROUP by 1,2,3,4,5,6,7
);
Related
I want to calculate the ratio of some features in postgresql but using data from 2 different tables.
From table 1 I want to query all the count of all rows:
select count(*) as total from table1
and then perform a calculation with the total calculate above, like this:
select count(class) / total as ratio
from table2
So the total comes table1 and count(class) comes from table2:
How can I do this as there's no common fields to join them?
Use subqueries:
select
(select count(class) from table2) /
(select count(*) as total from table1) as ratio
It could be that count(class) should be count(distinct class), if not count(class) can be just count(*).
I have a table and want to calculate the percentage of total by store_id which each (category_id, store_id) subtotal represents. My code is below:
WITH
example_table (name, store_id)
AS
(
select name, store_id
from category
join film_category using (category_id)
join film using (film_id)
join inventory using (film_id)
join rental using (inventory_id)
)
SELECT name, store_id, cast(count(*) as numeric)/(SELECT count(*) FROM example_table)
FROM example_table
GROUP BY name, store_id
ORDER BY name, store_id
This code actually works, as in, it doesn't throw an error, only they're not the results I'm looking for. Here each of the subtotals is divided by the total across both stores and all 16 names. Instead, I want the subtotals divided by their respective store totals or divided by their respective name totals.
I'm wondering how to perform calculations on those subtotals in general.
Thanks in advance,
I believe you need to explore the possibilities of using aggregate functions combined with an OVER(PARTITION BY ...) e.g.
SELECT DISTINCT
name, store_id, store_id_count, name_count
FROM (
select name, store_id
, count(*) over(partition by store_id) as store_id_count
, count(*) over(partition by name) as name_count
from category
join film_category using (category_id)
join film using (film_id)
join inventory using (film_id)
join rental using (inventory_id)
) AS example_table
When using aggregate function with the over clause you get the wanted counts on each row of the result, and it seems that in this case you need this. Note that select distinct has been used simply to reduce the final number of rows returned, you might still need to use a group by but I am not sure if you do.
Once you have the needed values within the derived table (aliases as example_table) then it should be a simple matter of some arithmetic in the overall select clause.
What I have tried to solve this is:
SELECT AVG(amount) FROM (SELECT amount FROM payment ORDER BY amount LIMIT 100);
This also did not work.
SELECT AVG(highest_amount) FROM (SELECT amount AS highest_amount FROM
payment ORDER BY amount LIMIT 100);
Sorry for asking silly questions. I am a newbie. :(
If you give the subquery in your first attempt an alias, it will work:
SELECT AVG(amount)
FROM
(
SELECT amount
FROM payment
ORDER BY amount
LIMIT 100
) t;
If you want an alternative method, which can be used without using LIMIT, then we can try using row number:
SELECT AVG(amount)
FROM
(
SELECT amount, ROW_NUMBER() OVER (ORDER BY amount) rn
FROM payment
) t
WHERE rn <= 100;
The easiest way to solve this kind of problem is to use WITH queries, also know as Common Table Expression, https://www.postgresql.org/docs/10/static/queries-with.html the result is clear and easy to understand.
WITH top100 AS
(
SELECT
amount
FROM
payment
ORDER BY
amount DESC
LIMIT
100
)
SELECT
avg(amout)
FROM
top100;
I have the following query:
SELECT *
FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Price ASC) as RowNum
from Offers) r
where RowNum = 1
Offers table contains about 10 million records. But there are only ~4000 distinct codes there. So I need to get the row with the lowest price for each code and there will be only 4000 rows in the result.
I have an Index on (Code, Price) columns with all other columns in INCLUDE statement.
The query runs for 2 minutes. And if I look at the execution plan, I see Index scan with 10M actual rows. So, I guess it scans the whole index to get needed values.
Why MSSQL do the whole index scan? Is it because subquery needs the whole data? How to avoid this scan? Is there a SQL hint to process only the first row in partition?
Is there another way to optimize such query?
After trying multiple different solutions, I've found the fastest query with CROSS APPLY statement:
SELECT C.*
FROM (SELECT DISTINCT Code from Offers) A
CROSS APPLY (SELECT TOP 1 *
FROM Offers B
WHERE A.Code = B.Code
ORDER by Price) C
It take ~1 second to run.
Try creating an index on ( Code, Price ) without including the other columns and then (assuming that there is a unique Id column):
select L.*
from Offers as L inner join
( select Id,
Row_Number() over ( partition by Code order by Price ) as RN
from Offers ) as R on R.Id = L.Id and R.RN = 1
An index scan on a smaller index ought to help.
Second guess would be to get the Id of the row with the lowest Price for each Code explicitly: Get distinct Code values, get Id of top 1 (to avoid problems with duplicate prices) Min( Price ) row for that Code, join with Offers to get complete rows. Again, the more compact index should help.
Not sure if you'll get any significant performance gains, but you may want to try the WITH TIES clause
Example
Select Top 1 with Ties *
From Offers
Order By Row_Number() over (Partition By Code Order By Price)
First off I'm a total SQL noob - Thanks in advance for any assistance you can offer.
I have a FortiAnalyzer that uses a Postgres DB to store firewall logs. The Analyzer is then used to report on usage etc.
Basically I need to write a custom query that can show the Top 10 Users by bandwidth used for the top 10 Websites/destinations per user.
I can get all of the relevant information out of the unit, but I cannot get the output formatted correctly.
I would be happy with the output showing a username 10 times with the top 10 sites next to the username. First prize however would be to show the username in Column A only once, then in column B and C the destination address and bandwidth used respectively.
Here is the query I have so far:
select coalesce(nullifna(`user`), `src`) as user_src,
coalesce(hostname, dstname, 'unknown') as web_site,
sum(rcvd + sent)/1024 as bandwidth from $log
where $filter and user is not null and status in ('passthrough', 'filtered')
group by `user_src` , web_site order by user_src desc
Once the query is linked to a report chart, I them have options to limit output by x value. I could for example limit this to limit the user_src column to 100 (i.e 10 Users with 10 outputs each)
I hope this is clear to you... If not, I will do my best to answer any questions.
I start with table aggregated on website, user_src level. Than it is not difficult to get top X users for top Y sites. You will need to use window function to get desired result.
Sample data:
create table test (web_site varchar, user_src varchar, bandwidth numeric);
insert into test values
('a','s1',18),
('b','s1',12),
('c','s1',13),
('d','s2',14),
('e','s2',15),
('f','s2',16),
('g','s3',17),
('h','s3',18),
('i','s3',19)
;
Get top X websites for top Y users:
with cte as (
select
user_src,
web_site,
bandwidth,
dense_rank() over(order by site_bandwidth desc) as user_rank,
dense_rank() over(partition by user_src order by bandwidth desc) as website_rank
from
test
join (select user_src, sum(bandwidth) site_bandwidth from test group by user_src) a using (user_src)
)
select
*
from
cte
where
user_rank <= 2
and website_rank <=2
order by
user_rank,
website_rank
SQLFiddle