Issue with the count in PostgreSQL - postgresql

I want the count of one column. My query selects 5 columns, but it gives the wrong count because I have included all of those columns in the GROUP BY clause. I don't want one particular column in the GROUP BY clause.
If I remove that column from GROUP BY clause it throws the following error:
ERROR: column "pt.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT distinct on (pu.id) pu.id, pt.name as package_name, c...
E.g.:
SELECT DISTINCT ON (a) a,b,c,count(d),e
FROM table GROUP BY a,b,c,d,e ORDER BY a
From this I want to remove e from the GROUP BY.
How can I remove that column from GROUP BY so that I can get correct count?

Updated after rereading the question.
You are mixing GROUP BY and DISTINCT ON. What you want (how I understand it) can be done with a window function combined with a DISTINCT ON:
SELECT DISTINCT ON (a)
a, b, c
, count(d) OVER (PARTITION BY a, b, c) AS d_ct
, e
FROM tbl
ORDER BY a, d_ct DESC;
Window functions require PostgreSQL 8.4 or later.
What happens here?
Count in d_ct how many identical sets of (a,b,c) there are in the table with non-null values for d.
Pick exactly one row per a. If you don't ORDER BY more than just a, an arbitrary row will be picked.
In my example I ORDER BY d_ct DESC in addition, so a pseudo-random row out of the set with the highest d_ct will be picked.
Another, slightly different interpretation of what you might need, with GROUP BY:
SELECT DISTINCT ON (a)
a, b, c
, count(d) AS d_ct
, min(e) AS min_e -- aggregate e in some way
FROM tbl
GROUP BY a, b, c
ORDER BY a, d_ct DESC;
GROUP BY is applied before DISTINCT ON, so the result is very similar to the one above, only the value for e / min_e is different.
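To see the difference between the two counting steps, here is a minimal sketch using Python's bundled sqlite3 module as a stand-in for PostgreSQL. SQLite has no DISTINCT ON, so only the count(d) part is shown; the sample rows are invented for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INT, b INT, c INT, d INT, e TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 10, "x"),
        (1, 1, 1, None, "y"),  # NULL in d is not counted by count(d)
        (1, 1, 1, 30, "z"),
        (1, 2, 2, 40, "w"),
        (2, 5, 5, 50, "v"),
    ],
)

# Window variant: every row survives and carries the count of its (a, b, c) set.
windowed = conn.execute(
    """
    SELECT a, b, c, e, count(d) OVER (PARTITION BY a, b, c) AS d_ct
    FROM tbl
    ORDER BY a, b, c, e
    """
).fetchall()

# GROUP BY variant: one row per (a, b, c); e has to be aggregated somehow.
grouped = conn.execute(
    """
    SELECT a, b, c, count(d) AS d_ct, min(e) AS min_e
    FROM tbl
    GROUP BY a, b, c
    ORDER BY a, b, c
    """
).fetchall()

for row in windowed:
    print("window:", row)
for row in grouped:
    print("group: ", row)
```

In both variants the count for a given (a, b, c) is the same; the window version keeps each row's own e, while the grouped version has to collapse e into a single value.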

Related

Please suggest an sql query based on the requirement

I have a dynamic SQL statement in which I need to select two columns (say A and B) from the table. I need to generate the result set only if column B has at least one nonzero value. If there is no nonzero value in column B, the result set should be empty.
Should be that simple, just as you wrote the rule -
select a, b
from the_table
where exists (select from the_table where b <> 0);
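A quick way to check that behaviour is a small sketch with Python's sqlite3 module. The helper name and the sample rows are hypothetical, and because SQLite does not accept an empty select list, the subquery is written as SELECT 1 here.

```python
import sqlite3


def rows_if_any_nonzero(values):
    """Load (a, b) pairs into a fresh table and run the EXISTS query on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (a INT, b INT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?)", values)
    # The EXISTS condition does not depend on the outer row, so it is either
    # true for every row or false for every row.
    return conn.execute(
        """
        SELECT a, b
        FROM the_table
        WHERE EXISTS (SELECT 1 FROM the_table WHERE b <> 0)
        ORDER BY a
        """
    ).fetchall()


all_zero = rows_if_any_nonzero([(1, 0), (2, 0)])
one_nonzero = rows_if_any_nonzero([(1, 0), (2, 7)])
print(all_zero)
print(one_nonzero)
```

Note that a NULL in b does not count as nonzero, because b <> 0 evaluates to NULL for that row.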

Selecting all rows who belong to a group with some properties

I use PostgreSQL for a web application, and I've run into a type of query I can't think of a way to write efficiently.
What I'm trying to do is select all rows from a table that belong to a group which, when the table is grouped a certain way, meets some criteria. For example, the naive way to structure this query might be something like this:
SELECT *
FROM table T
JOIN (
SELECT iT.a, iT.b, SUM(iT.c) AS sum
FROM table iT
GROUP BY iT.a, iT.b
) TG ON (TG.a = T.a AND TG.b = T.b)
WHERE TG.sum > 100;
The problem I'm having is that this effectively doubles the time it takes the query to execute, since it's essentially selecting the rows from that table twice.
How can I structure queries of this type efficiently?
You can try a window function, although I don't know whether it is more efficient. I would guess it is, since it avoids the join. Compare the plans of this query and yours with EXPLAIN:
select *
from (
select
a, b,
sum(c) over(partition by a, b) as sum
from t
) s
where "sum" > 100
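As a sanity check that the two formulations select the same rows, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample data and the alias total are made up for illustration, and the sketch says nothing about relative speed on a real PostgreSQL table.

```python
import sqlite3

# Hypothetical sample data: groups are identified by (a, b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 60), (1, 1, 50), (1, 2, 40), (2, 1, 101), (2, 2, 100)],
)

# Join against the aggregated subquery, as in the question.
join_rows = conn.execute(
    """
    SELECT t.a, t.b, t.c
    FROM t
    JOIN (SELECT a, b, SUM(c) AS total FROM t GROUP BY a, b) tg
      ON tg.a = t.a AND tg.b = t.b
    WHERE tg.total > 100
    ORDER BY t.a, t.b, t.c
    """
).fetchall()

# Window function: the group sum is attached to every row, then filtered.
window_rows = conn.execute(
    """
    SELECT a, b, c
    FROM (SELECT a, b, c, SUM(c) OVER (PARTITION BY a, b) AS total FROM t) s
    WHERE total > 100
    ORDER BY a, b, c
    """
).fetchall()

print(join_rows)
print(window_rows)
```

The filter on the window result has to sit in an outer query, because window functions are evaluated after WHERE.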

How to specify two expressions in the select list when the subquery is not introduced with EXISTS

I have a query that uses a subquery and I am having a problem returning the expected results. The error I receive is..."Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." How can I rewrite this to work?
SELECT
a.Part,
b.Location,
b.LeadTime
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
AND
Date IN (SELECT Location, MAX(Date) FROM dbo.Vendor GROUP BY Location)
GROUP BY
a.Part,
b.Location,
b.LeadTime
ORDER BY
a.Part
I think something like this may be what you're looking for. You didn't say what version of SQL Server--this works in SQL 2005 and up:
SELECT
p.Part,
p.Location, -- from *p*, otherwise if no match we'll get a NULL
v.LeadTime
FROM
dbo.Parts p
OUTER APPLY (
SELECT TOP (1) * -- * here is okay because we specify columns outside
FROM dbo.Vendor v
WHERE p.Location = v.Location -- the correlation part
ORDER BY v.Date DESC
) v
WHERE
p.Location IN ('A','B','C')
ORDER BY
p.Part
;
Now, your query can be repaired as is by adding the "correlation" part to change your query into a correlated subquery as demonstrated in Kory's answer (you'd also remove the GROUP BY clause). However, that method still requires an additional and unnecessary join, hurting performance, plus it can only pull one column at a time. This method allows you to pull all the columns from the other table, and has no extra join.
Note: this gives logically the same results as Lamak's answer, however I prefer it for a few reasons:
When there is an index on the correlation columns (Location, here) this can be satisfied with seeks, but the Row_Number solution has to scan (I believe).
I prefer the way this expresses the intent of the query more directly and succinctly. In the Row_Number method, one must get out to the outer condition to see that we are only grabbing the rn = 1 values, then bop back into the CTE to see what that is.
Using CROSS APPLY or OUTER APPLY, all the other tables not involved in the single-inner-row-per-outer-row selection are outside where (to me) they belong. We aren't squishing concerns together. Using Row_Number feels a bit like throwing a DISTINCT on a query to fix duplication rather than dealing with the underlying issue. I guess this is basically the same issue as the previous point worded in a different way.
The moment you have TWO tables from which you wish to pull the most recent value, the Row_Number() solution blows up completely. With this syntax, you just easily add another APPLY clause, and it's crystal clear what you're doing. There is a way to use Row_Number for the multiple tables scenario by moving the other tables outside, but I still don't prefer that syntax.
Using this syntax allows you to perform additional joins based on whether the selected row exists or not (in the case that no matching row was found). In the Row_Number solution, you can only reasonably do that NOT NULL checking in the outer query--so you are forced to split up the query into multiple, separated parts (you don't want to be joining to values you will be discarding!).
P.S. I strongly encourage you to use aliases that hint at the table they represent. Please don't use a and b. I used p for Parts and v for Vendor--this helps you and others make sense of the query more quickly in the future.
If I understood you correctly, you want the rows with the max date for locations A, B and C. Now, assuming SQL Server 2005+, you can do this:
;WITH CTE AS
(
SELECT
a.Part,
b.Location,
b.LeadTime,
RN = ROW_NUMBER() OVER(PARTITION BY a.Part ORDER BY [Date] DESC)
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
)
SELECT Part,
Location,
LeadTime
FROM CTE
WHERE RN = 1
ORDER BY Part
In your subquery you need to correlate the Location and Part to the outer query.
Example:
Date = (SELECT MAX(Date)
FROM dbo.Vendor v
WHERE v.Location = b.Location
AND v.Part = b.Part
)
So this will bring back one date for each location and part.
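The correlated form can be tried out with a small sketch in Python's sqlite3 module. The rows are invented, the dbo schema prefix is dropped because SQLite has no such schema, and the join to Parts is left out to keep the example short.

```python
import sqlite3

# Hypothetical Vendor rows: (Part, Location, LeadTime, Date).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Vendor (Part TEXT, Location TEXT, LeadTime INT, "Date" TEXT)'
)
conn.executemany(
    "INSERT INTO Vendor VALUES (?, ?, ?, ?)",
    [
        ("P1", "A", 5, "2020-01-01"),
        ("P1", "A", 7, "2020-06-01"),  # latest row for (P1, A)
        ("P1", "B", 3, "2020-03-01"),
        ("P2", "A", 9, "2019-12-31"),
        ("P2", "D", 1, "2021-01-01"),  # location D is filtered out
    ],
)

# The subquery is correlated on Location and Part, so it returns a single
# MAX(Date) for the row being tested rather than a two-column list.
latest = conn.execute(
    """
    SELECT b.Part, b.Location, b.LeadTime
    FROM Vendor b
    WHERE b.Location IN ('A', 'B', 'C')
      AND b."Date" = (SELECT MAX(v."Date")
                      FROM Vendor v
                      WHERE v.Location = b.Location
                        AND v.Part = b.Part)
    ORDER BY b.Part, b.Location
    """
).fetchall()

print(latest)
```

If two rows for the same part and location share the same latest date, this form returns both of them.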

TSQL selecting unique value from multiple ranges in a column

A question from a beginner.
I have two tables. The first (A) contains Start_time, End_time, Status. The second (B) contains Timestamp, Error_code. The second table is logged automatically by the system every few seconds, so it contains lots of non-unique values of Error_code (the code changes randomly, but within a time range from table A). What I need is to select the unique error codes for every time range (in my case, every row) in table A:
A.Start_time, A.End_time, B.Error_code.
I have come to this:
select A.Start_time,
A.End_time,
B.Error_code
from B
inner join A
on B.Timestamp between A.Start_time and A.End_time
This is wrong, I know.
Any thoughts are welcome.
If your query gives a lot of duplicates, use DISTINCT to remove them:
select DISTINCT A.Start_time, A.End_time, B.Error_code
from B
inner join A on B.Timestamp between A.Start_time and A.End_time
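The effect of DISTINCT on the range join can be seen in a small sketch with Python's sqlite3 module. The integer timestamps and error codes below are invented sample data; real tables would use proper datetime values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (Start_time INT, End_time INT, Status TEXT)")
conn.execute('CREATE TABLE B ("Timestamp" INT, Error_code INT)')
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [(0, 10, "run"), (11, 20, "stop")])
conn.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(1, 100), (2, 100), (5, 200), (12, 300), (15, 300), (25, 400)],
)

# Same join with and without DISTINCT; {distinct} is filled in below.
query = """
    SELECT {distinct} A.Start_time, A.End_time, B.Error_code
    FROM B
    INNER JOIN A ON B."Timestamp" BETWEEN A.Start_time AND A.End_time
    ORDER BY A.Start_time, B.Error_code
"""

plain = conn.execute(query.format(distinct="")).fetchall()
deduped = conn.execute(query.format(distinct="DISTINCT")).fetchall()

print(len(plain), "rows without DISTINCT")
print(deduped)
```

The log entry at timestamp 25 falls outside every range in A, so the inner join drops it in both versions.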

Select query showing incorrect order in DB2

The rows come back in a different order from the one in which they were inserted: the third record returned is the first record inserted, the first record is second, the third one is fourth, and so on.
I am using the following query to fetch the data:
SELECT A, B, C, D, E, F FROM MYTABLE WHERE A = 'SOMEPGM' ORDER BY F
F has duplicate values...
Why does the first record become the third record in the result?
You are ordering by "MGRSEQ", but there are rows with duplicate MGRSEQ values, so rows that tie on it can come back in any order. You need to specify another column to get a consistent ordering. Ordering without an explicit ORDER BY clause is not guaranteed.
Try this:
SELECT "MGRROUT", "MGRTYP", "MGRRRN", "MGRNUM", "MGROPC",
"MGRVAR1", "MGRCOMP", "MGRVAR2", "MGREXC", "MGRSEQ", MGRCAT1
FROM "XPGMLOGIC" WHERE "MGRPGM" = 'BARSCSLMS'
ORDER BY "MGRSEQ", "MGRNUM" DESC
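The tie-breaker idea can be sketched with Python's sqlite3 module. The table logic and its columns seq and num are hypothetical stand-ins for "MGRSEQ" and "MGRNUM"; the point is only that the combined sort key is unique, so there is exactly one valid order.

```python
import sqlite3

# Hypothetical rows: seq has duplicates, num is unique within each seq.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logic (seq INT, num INT)")
conn.executemany(
    "INSERT INTO logic VALUES (?, ?)",
    [(10, 1), (20, 2), (10, 3), (20, 4)],
)

# ORDER BY seq alone leaves the order of tied rows up to the database.
# Adding num as a second key makes the ordering total.
ordered = conn.execute(
    "SELECT seq, num FROM logic ORDER BY seq, num DESC"
).fetchall()

print(ordered)
```

Whichever column is added, it should make the full sort key unique; otherwise the remaining ties can still come back in any order.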