SQL Server 2005 Query - Count of a Sum - tsql

I'm trying to do a sum of a count. I'm running the below query. I want a sum of BrowserCount however I want to return the datatable which is returned by this query. Is that possible should I be using a return value? Or is there another way? I realize there's been a decent amount of posts on this but I couldn't get this to run using them.
SELECT UA.Browser_ID
, B.Browser_Name_NM
, COUNT(B.Browser_Name_NM) AS BrowserCount
FROM llc.User_Agent_TB AS UA
LEFT JOIN llc.Browser_TB AS B ON UA.Browser_ID = B.Browser_ID
GROUP BY B.Browser_Name_NM
, UA.Browser_ID
ORDER BY BrowserCount DESC
This is sql server 2005 so I can't do a group of a set. I also have tried to get a union of two queries to work and it keeps giving me a syntax error.

This might be what you're after:
SELECT UA.Browser_ID
, B.Browser_Name_NM
, COUNT(B.Browser_Name_NM) AS BrowserCount
FROM llc.User_Agent_TB AS UA
LEFT JOIN llc.Browser_TB AS B ON UA.Browser_ID = B.Browser_ID
GROUP BY B.Browser_Name_NM
, UA.Browser_ID
WITH ROLLUP
ORDER BY BrowserCount DESC
See here for some more detailed examples, including how to show a more meaningful summary line
https://web.archive.org/web/20211020145951/https://www.4guysfromrolla.com/articles/073003-1.aspx

Related

Postgres SQL query group by get most recent record instead of an aggregate

This is a current postgres query I have:
sql = """
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
avg(cms.incremental_opens),
avg(cms.incremental_clicks),
avg(cms.incremental_conversions)
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id
"""
whereby I get the average incremental_opens, incremental_clicks, and incremental_conversions per campaign group from the cms table. However, instead of the average, I want the most recent values for the 3 fields. See the cms table screenshot below - I want the values from the record with the greatest (i.e. most recent) event_id (instead of an average for all records) for a given group).
How can I do this? Thanks
It sounds like you want a lateral join.
FROM
experiments.variant_metric_snapshot vms
CROSS JOIN LATERAL (select * from experiments.campaign_metric_snapshot cms where vms.campaign_id = cms.campaign_id order by event_id desc LIMIT 1) cms
WHERE...
If you are after a quick and dirty solution you can use array_agg function with minimal change to your query.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
(array_agg(cms.incremental_opens ORDER BY cms.event_id DESC))[1] AS incremental_opens,
..
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;

T-SQL Query not returning proper results

I have a query that I want to capture Sales for Parts. I am expecting to get the full results from the Parts table and if there are no Sales for that Part in the timeframe, I want to see a 0 in the Sales column. I am not seeing that. I am just getting the Parts that had Sales.
SELECT
Part,
Sum(Sales)
FROM
dbo.Parts
LEFT OUTER JOIN
dbo.SalesData ON Part = Part
WHERE
SalesDate > '2011-12-31'
GROUP BY
Part
ORDER BY
Part
What am I doing wrong?
I believe this is because your WHERE clause is removing all the parts that don't have sales because they won't have a SalesDate.
Try:-
SELECT
Part,
Sum(Sales)
FROM
dbo.Parts
LEFT OUTER JOIN
dbo.SalesData ON Part = Part
AND SalesDate > '2011-12-31'
GROUP BY
Part
ORDER BY
Part

select distinct from 2 columns but only 1 is duplicate

select a.subscriber_msisdn, war.created_datetime from
(
select distinct subscriber_msisdn from wiz_application_response
where application_item_id in
(select id from wiz_application_item where application_id=155)
and created_datetime between '2012-10-07 00:00' and '2012-11-15 00:00:54'
) a
left outer join wiz_application_response war on (war.subscriber_msisdn=a.subscriber_msisdn)
the sub select returns 11 rows but when joined return 18 (with duplicates). The objective of this query is only add the date column to the 11 rows of the sub select.
Based on your description, it stands to reason that there are multiple created_datetime values for some of the subscriber_msisdn values which is what prompted you to use the distinct in the subquery to begin with. By joining the sub query to the original table you are defeating this. A cleaner way to write the query would be:
SELECT
war.subscriber_msisdn
, war.created_datetime
FROM
wiz_application_response war
LEFT JOIN wiz_application_item wai
ON war.application_item_id = wai.id
AND wai.application_id = 155
WHERE
war.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
This should return only the rows from the war table that satisfy the criteria based on the wai table. It should not be and outer join unless you wanted to return all the rows from war table that satisfied the created_datetime parameter regardless of the application_item_id parameter.
This is my best guess based on the limited information I have about your tables and what I’m assuming you’re trying to accomplish. If this doesn’t get you what you are after, I will continue to offer other ideas based on additional information you could provide. Hope this works.
Can most probably simplified to this:
SELECT DISTINCT ON (1)
r.subscriber_msisdn, r.created_datetime
FROM wiz_application_item i
JOIN wiz_application_response r ON r.application_item_id = i.id
WHERE i.application_id = 155
AND i.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
ORDER BY 1, 2 DESC -- to pick the latest created_datetime
Details depend on missing information.
More explanation here.

Postgresql Faulty Syntax on select/join/group

What about the following is not proper syntax for Postgresql?
select p.*, SUM(vote) as votes_count
FROM votes v, posts p
where p.id = v.`voteable_id`
AND v.`voteable_type` = 'Post'
group by v.voteable_id
order by votes_count DESC limit 20
I am in the process of installing postgresql locally but wanted to get this out sooner :)
Thank you
MySQL is a lot looser in its interpretation of standard SQL than PostgreSQL is. There are two issues with your query:
Backtick quoting is a MySQL thing.
Your GROUP BY is invalid.
The first one can be fixed by simply removing the offending quotes. The second one requires more work; from the fine manual:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
This means that every column mentioned in your SELECT either has to appear in an aggregate function or in the GROUP BY clause. So, you have to expand your p.* and make sure that all those columns are in the GROUP BY, you should end up with something like this but with real columns in place of p.column...:
select p.id, p.column..., sum(v.vote) as votes_count
from votes v, posts p
where p.id = v.voteable_id
and v.voteable_type = 'Post'
group by p.id, p.column...
order by votes_count desc
limit 20
This is a pretty common problem when moving from MySQL to anything else.

Entity framework: query executing 'select from' for no reason

I'm having some issues with the entity framework. I'm executing a simple select from a view in the database. However, when I view the SQL that EF generates, it is executing the query twice using a select from. Is this the way it is supposed to operate? It seems very inefficient.
var reads = (from rt in ctx.C2kReadsToTransfer
where rt.ReadDt > fromDate
&& rt.ReadDt < toDate
select rt);
This gets translated into the following SQL
SELECT
[Extent1].[AMRID] AS [AMRID]
, [Extent1].[Comments] AS [Comments]
, [Extent1].[ExternalSystemType] AS [ExternalSystemType]
, [Extent1].[LastReadDt] AS [LastReadDt]
, [Extent1].[ReadDt] AS [ReadDt]
, [Extent1].[Reading] AS [Reading]
, [Extent1].[Units] AS [Units]
, [Extent1].[Transferred] AS [Transferred]
FROM
(SELECT
[ReadsToTransfer].[AMRID] AS [AMRID]
, [ReadsToTransfer].[Comments] AS [Comments]
, [ReadsToTransfer].[ExternalSystemType] AS [ExternalSystemType]
, [ReadsToTransfer].[LastReadDt] AS [LastReadDt]
, [ReadsToTransfer].[ReadDt] AS [ReadDt]
, [ReadsToTransfer].[Reading] AS [Reading]
, [ReadsToTransfer].[Transferred] AS [Transferred]
, [ReadsToTransfer].[Units] AS [Units]
FROM [dbo].[ReadsToTransfer] AS [ReadsToTransfer])
AS [Extent1]
That seems to be very inefficient, especially when the table contains close to 250 million rows as ours does. Also, if I tack a .Take(2000) onto the end of the code, it simply puts a 'select top 2000' on only the first select. Thus, making it select the top 2000 of the inside select which is the entire table.
Any thoughts on this?
That seems to be very inefficient
I don't think so... the outer SELECT is just a projection (actually an identity projection) of the inner SELECT, and a projection has a negligible performance impact...
Regarding the TOP 2000 clause, the fact that it is on the outer SELECT doesn't mean that the DB will read all rows from the inner SELECT ; it will read them as long as they are requested by the outer SELECT, then stop.
Just try to run the query manually, with or without the outer SELECT : I bet you won't find any significant difference in performance.