OR statement in select part of query in Postgres - postgresql

I have a query
SELECT
cd.signoffdate,
min(cmp.dsignoff) as dsignoff
FROM clients AS c
LEFT JOIN campaigns AS cmp ORDER BY dsignoff;
If I want to have something like this built into the postgres query will it work and how do I do it
if the cd.signoffdate is empty it should take min(cmp.dsignoff) as dsignoff as the value and then order by this column, so in other words it should order by dsignoff and cd.signoffdate and tread it as one column, is this possible and how?

Your query could look like this:
SELECT c.client_id, COALESCE(c.signoffdate, min(cmp.dsignoff)) AS signoff
FROM clients c
LEFT JOIN campaigns cmp ON cmp.client_id = c.client_id -- join condition!
GROUP BY c.client_id, cd.signoffdate -- group by!
ORDER BY COALESCE(c.signoffdate, min(cmp.dsignoff));
Or, with simplified syntax:
SELECT c.client_id, COALESCE(c.signoffdate, min(cmp.dsignoff)) AS signoff
FROM clients c
LEFT JOIN campaigns cmp USING (client_id)
GROUP BY 1, cd.signoffdate
ORDER BY 2;
Major points:
Used alias c, but referenced as cd.
No join condition leads to cross join, probably not intended.
Missing GROUP BY.
I assume that you want to group by the primary key column of clients and call it client_id.
I also assume that client_id links the two tables together.
COALESCE() serves as fallback in case signoffdate IS NULL.

ORDER BY coalesce(cd.signoffdate, min(cmp.dsignoff));
But don't you need some GROUP BY in your original query?

You can use COALESCE
SELECT COALESCE(cd.signoffdate, min(cmp.dsignoff)) as dsignoff
I'm not sure if you can order by coalesce in Postgres - might be worth just ordering by both columns

Related

Postgresql optional join

Is it possible in Postgres to have an optional join?
My use case is something like
select ...
from a
inner join b using (b_id)
where b.type in (...)
a is a very large reporting table. b is used to filter a, BUT the most common use case is that we will want all b.types, and therefore all the b records in the join. In other words, in most cases we don't want to filter by b at all, and would not need the join in that case, but the filtering optionality still needs to be there in cases when the user wants to filter by type.
So is it possible to invoke the join optionally, and save the join effort in cases when we just want all of a?
If not, what's my next best option? IF ... THEN or CTE with a union of separate queries?
If you don't need any of b's columns, there is no need to JOIN table b, You can filter by using EXISTS(SELECT .. FROM b WHERE ...).
If you want to conditionally exclude a part of the WHERE clause, you could use the following construct: (the ignore_b boolean will function as an on/off switch)
-- $ignore_b is a Boolean flag
-- when True, the optimiser will ignore the exists(...)
SELECT ...
FROM a
WHERE ( $ignore_b OR EXISTS (
SELECT *
FROM b
WHERE b.b_id = a.some_id
AND b.type in (1,2,3,4,5)
)
);
In our example, you are still filtering based on b, based on whether a row with that b_id exists in b in the first place.
Postgresql will remove unneeded joins under very specific circumstances. You write the join as a left join, so that no rows of A can be removed due to the absence of corresponding rows in B. The column B.b_id is a declared unique or primary key, so that no rows of A can be duplicated due to duplicate matches in B. And of course, no column of B can referenced in the query (except the reference to the key column in the left join condition).
In those cases, you can just always write the LEFT JOIN, and PostgreSQL will figure out that it can skip it.
You can argue that if you have a declared foreign key constraint on the join condition, then you shouldn't need the JOIN to be a LEFT JOIN in order to implement this optimization. I think that that argument is correct, but PostgreSQL does not implement it that way.
I would just do it programatically. If you are already programmatically adding references to B in the WHERE clause, you should be able to do it for the join as well.

Should I do ORDER BY twice when selecting from subquery?

I have SQL query (code below) which selects some rows from subquery. In subquery I perform ORDER BY.
The question is: will order of subquery be preserved in parent query?
Is there some spec/document or something which proves that?
SELECT sub.id, sub.name, ot.field
FROM (SELECT t.id, t.name
FROM table t
WHERE t.something > 10
ORDER BY t.id
LIMIT 25
) sub
LEFT JOIN other_table ot ON ot.table_id = sub.id
/**order by id?**/```
will order of subquery be preserved in parent query
It might happen, but you can not rely on that.
For example, if the optimizer decides to use a hash join between your derived table and other_table then the order of the derived table will not be preserved.
If you want a guaranteed sort order, then you have to use an order by in the outer query as well.

Left outer join using 2 of 3 tables in Postgresql

I need to show all clients entered into the system for a date range.
All clients are assigned to a group, but not necessarily to a staff.
When I run the query as such:
SELECT
clients.name_lastfirst_cs,
to_char (clients.date_intake,'MM/DD/YY')AS Date_Created,
clients.client_id,
clients.display_intake,
staff.staff_name_cs,
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
clients.zrud_staff = staff.zzud_staff AND
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 121 entries
if I comment out:
SELECT
clients.name_lastfirst_cs,
to_char (clients.date_intake,'MM/DD/YY')AS Date_Created,
clients.client_id,
clients.display_intake,
-- staff.staff_name_cs, -- Line Commented out
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
-- clients.zrud_staff = staff.zzud_staff AND --Line commented out
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 173 entries
I know I need to do an outer join to capture all clients regardless of if there
is a staff assigned, but each attempt has failed. I have done outer joins with
two tables, but adding a third has twisted my brain.
Thanks for any suggestions
I have no way of testing this (or of knowing that it is right) but what I read in your query is that you want something similar to this:
SELECT --I just used short aliases. I choose something other than the table name so I know it is an alias "c" for client etc...
c.name_lastfirst_cs,
to_char (c.date_intake,'MM/DD/YY')AS Date_Created,
c.client_id,
c.display_intake,
s.staff_name_cs,
g.name,
l.zrud_client AS "link_client",--I'm selecting some data here so that I can debug later, you can just filter this out with another select if you need to
l.zzud_group AS "link_group" --Again, so I can see these relationships
FROM
public.clients c
LEFT OUTER JOIN staff s ON --is staff required? If it isn't then outer join (optional)
s.zzud_staff = c.zrud_staff --so we linked staff to clients here
LEFT OUTER JOIN public.link_group l ON --this looks like a lookup table to me so we select the lookup record
l.zrud_client = c.zzud_client -- this is how I define the lookup, a client id
LEFT OUTER JOIN public.groups g ON --then we use that to lookup a group
g.zzup_group = l.zrud_group --which is defined by this data here
WHERE -- the following must be true
c.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
Now for the why: I've basically moved your where clause to JOIN x ON y=z syntax. In my experience this is a better way to write an maintain queries as it allows you to specify relationships between tables rather than doing a big-ol'-join and trying to filter that data with the where clause. Keep in mind each condition is REQUIRED not optional so when you say you want records with the following conditions you're going to get them (and if I read this right--I probably don't as I don't have a schema in-front of me) if a record is missing a link-table record OR a staff member you're going to filter it out.
Alternatively (possibly significantly slower) You can SELECT anything so you can chain it like:
SELECT
*
FROM
(
SELECT
*
FROM
public.clients
WHERE
x condition
)
WHERE
y condition
OR
SELECT * FROM x WHERE x.condition IN (SELECT * FROM y)
In your case this tactic probably won't be easier than a standard join syntax.
^And some serious opinion here: I recommend you use the join syntax I outlined above here. It is functionally the same as joining and specifying a where clause, but as you noted, if you don't understand the relationships it can cause a Cartesian join. http://www.tutorialspoint.com/sql/sql-cartesian-joins.htm . Lastly, I tend to specify what type of join I want. I write INNER JOIN and OUTER JOIN a lot in my queries because it helps the next person (usually me) figure out what the heck I meant. If it is optional use an outer join, if it is required use an inner join (default).
Good luck! There are much better SQL developers out there and there's probably another way to do it.

How to specify two expressions in the select list when the subquery is not introduced with EXISTS

I have a query that uses a subquery and I am having a problem returning the expected results. The error I receive is..."Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." How can I rewrite this to work?
SELECT
a.Part,
b.Location,
b.LeadTime
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
AND
Date IN (SELECT Location, MAX(Date) FROM dbo.Vendor GROUP BY Location)
GROUP BY
a.Part,
b.Location,
b.LeadTime
ORDER BY
a.Part
I think something like this may be what you're looking for. You didn't say what version of SQL Server--this works in SQL 2005 and up:
SELECT
p.Part,
p.Location, -- from *p*, otherwise if no match we'll get a NULL
v.LeadTime
FROM
dbo.Parts p
OUTER APPLY (
SELECT TOP (1) * -- * here is okay because we specify columns outside
FROM dbo.Vendor v
WHERE p.Location = v.Location -- the correlation part
ORDER BY v.Date DESC
) v
WHERE
p.Location IN ('A','B','C')
ORDER BY
p.Part
;
Now, your query can be repaired as is by adding the "correlation" part to change your query into a correlated subquery as demonstrated in Kory's answer (you'd also remove the GROUP BY clause). However, that method still requires an additional and unnecessary join, hurting performance, plus it can only pull one column at a time. This method allows you to pull all the columns from the other table, and has no extra join.
Note: this gives logically the same results as Lamak's answer, however I prefer it for a few reasons:
When there is an index on the correlation columns (Location, here) this can be satisfied with seeks, but the Row_Number solution has to scan (I believe).
I prefer the way this expresses the intent of the query more directly and succinctly. In the Row_Number method, one must get out to the outer condition to see that we are only grabbing the rn = 1 values, then bop back into the CTE to see what that is.
Using CROSS APPLY or OUTER APPLY, all the other tables not involved in the single-inner-row-per-outer-row selection are outside where (to me) they belong. We aren't squishing concerns together. Using Row_Number feels a bit like throwing a DISTINCT on a query to fix duplication rather than dealing with the underlying issue. I guess this is basically the same issue as the previous point worded in a different way.
The moment you have TWO tables from which you wish to pull the most recent value, the Row_Number() solution blows up completely. With this syntax, you just easily add another APPLY clause, and it's crystal clear what you're doing. There is a way to use Row_Number for the multiple tables scenario by moving the other tables outside, but I still don't prefer that syntax.
Using this syntax allows you to perform additional joins based on whether the selected row exists or not (in the case that no matching row was found). In the Row_Number solution, you can only reasonably do that NOT NULL checking in the outer query--so you are forced to split up the query into multiple, separated parts (you don't want to be joining to values you will be discarding!).
P.S. I strongly encourage you to use aliases that hint at the table they represent. Please don't use a and b. I used p for Parts and v for Vendor--this helps you and others make sense of the query more quickly in the future.
If I understood you corrrectly, you want the rows with the max date for locations A, B and C. Now, assuming SQL Server 2005+, you can do this:
;WITH CTE AS
(
SELECT
a.Part,
b.Location,
b.LeadTime,
RN = ROW_NUMBER() OVER(PARTITION BY a.Part ORDER BY [Date] DESC)
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
)
SELECT Part,
Location,
LeadTime
FROM CTE
WHERE RN = 1
ORDER BY Part
In your subquery you need to correlate the Location and Part to the outer query.
Example:
Date = (SELECT MAX(Date)
FROM dbo.Vender v
WHERE v.Location = b.Location
AND v.Part = b.Part
)
So this will bring back one date for each location and part

Postgresql Faulty Syntax on select/join/group

What about the following is not proper syntax for Postgresql?
select p.*, SUM(vote) as votes_count
FROM votes v, posts p
where p.id = v.`voteable_id`
AND v.`voteable_type` = 'Post'
group by v.voteable_id
order by votes_count DESC limit 20
I am in the process of installing postgresql locally but wanted to get this out sooner :)
Thank you
MySQL is a lot looser in its interpretation of standard SQL than PostgreSQL is. There are two issues with your query:
Backtick quoting is a MySQL thing.
Your GROUP BY is invalid.
The first one can be fixed by simply removing the offending quotes. The second one requires more work; from the fine manual:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
This means that every column mentioned in your SELECT either has to appear in an aggregate function or in the GROUP BY clause. So, you have to expand your p.* and make sure that all those columns are in the GROUP BY, you should end up with something like this but with real columns in place of p.column...:
select p.id, p.column..., sum(v.vote) as votes_count
from votes v, posts p
where p.id = v.voteable_id
and v.voteable_type = 'Post'
group by p.id, p.column...
order by votes_count desc
limit 20
This is a pretty common problem when moving from MySQL to anything else.