Left outer join using 2 of 3 tables in Postgresql - postgresql

I need to show all clients entered into the system for a date range.
All clients are assigned to a group, but not necessarily to a staff.
When I run the query as such:
SELECT
clients.name_lastfirst_cs,
to_char (clients.date_intake,'MM/DD/YY')AS Date_Created,
clients.client_id,
clients.display_intake,
staff.staff_name_cs,
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
clients.zrud_staff = staff.zzud_staff AND
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 121 entries
if I comment out:
SELECT
clients.name_lastfirst_cs,
to_char (clients.date_intake,'MM/DD/YY')AS Date_Created,
clients.client_id,
clients.display_intake,
-- staff.staff_name_cs, -- Line Commented out
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
-- clients.zrud_staff = staff.zzud_staff AND --Line commented out
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 173 entries
I know I need to do an outer join to capture all clients regardless of if there
is a staff assigned, but each attempt has failed. I have done outer joins with
two tables, but adding a third has twisted my brain.
Thanks for any suggestions

I have no way of testing this (or of knowing that it is right) but what I read in your query is that you want something similar to this:
SELECT --I just used short aliases. I choose something other than the table name so I know it is an alias "c" for client etc...
c.name_lastfirst_cs,
to_char (c.date_intake,'MM/DD/YY')AS Date_Created,
c.client_id,
c.display_intake,
s.staff_name_cs,
g.name,
l.zrud_client AS "link_client",--I'm selecting some data here so that I can debug later, you can just filter this out with another select if you need to
l.zzud_group AS "link_group" --Again, so I can see these relationships
FROM
public.clients c
LEFT OUTER JOIN staff s ON --is staff required? If it isn't then outer join (optional)
s.zzud_staff = c.zrud_staff --so we linked staff to clients here
LEFT OUTER JOIN public.link_group l ON --this looks like a lookup table to me so we select the lookup record
l.zrud_client = c.zzud_client -- this is how I define the lookup, a client id
LEFT OUTER JOIN public.groups g ON --then we use that to lookup a group
g.zzup_group = l.zrud_group --which is defined by this data here
WHERE -- the following must be true
c.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
Now for the why: I've basically moved your where clause to JOIN x ON y=z syntax. In my experience this is a better way to write an maintain queries as it allows you to specify relationships between tables rather than doing a big-ol'-join and trying to filter that data with the where clause. Keep in mind each condition is REQUIRED not optional so when you say you want records with the following conditions you're going to get them (and if I read this right--I probably don't as I don't have a schema in-front of me) if a record is missing a link-table record OR a staff member you're going to filter it out.
Alternatively (possibly significantly slower) You can SELECT anything so you can chain it like:
SELECT
*
FROM
(
SELECT
*
FROM
public.clients
WHERE
x condition
)
WHERE
y condition
OR
SELECT * FROM x WHERE x.condition IN (SELECT * FROM y)
In your case this tactic probably won't be easier than a standard join syntax.
^And some serious opinion here: I recommend you use the join syntax I outlined above here. It is functionally the same as joining and specifying a where clause, but as you noted, if you don't understand the relationships it can cause a Cartesian join. http://www.tutorialspoint.com/sql/sql-cartesian-joins.htm . Lastly, I tend to specify what type of join I want. I write INNER JOIN and OUTER JOIN a lot in my queries because it helps the next person (usually me) figure out what the heck I meant. If it is optional use an outer join, if it is required use an inner join (default).
Good luck! There are much better SQL developers out there and there's probably another way to do it.

Related

MySQL Between advise [duplicate]

Want to improve this post? Provide detailed answers to this question, including citations and an explanation of why your answer is correct. Answers without enough detail may be edited or deleted.
This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 3 years ago.
I have this table for documents (simplified version here):
id
rev
content
1
1
...
2
1
...
1
2
...
1
3
...
How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ..]. I'm using MySQL.
Currently I use checks in the while loop to detect and over-write old revs from the resultset. But is this the only method to achieve the result? Isn't there a SQL solution?
At first glance...
All you need is a GROUP BY clause with the MAX aggregate function:
SELECT id, MAX(rev)
FROM YourTable
GROUP BY id
It's never that simple, is it?
I just noticed you need the content column as well.
This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one the questions I answered in my current job's technical interview.
It is, actually, so common that Stack Overflow community has created a single tag just to deal with questions like that: greatest-n-per-group.
Basically, you have two approaches to solve that problem:
Joining with simple group-identifier, max-value-in-group Sub-query
In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
SELECT id, MAX(rev) rev
FROM YourTable
GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
Left Joining with self, tweaking join conditions and filters
In this approach, you left join the table with itself. Equality goes in the group-identifier. Then, 2 smart moves:
The second join condition is having left side value less than right value
When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.
So you end up with:
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
Conclusion
Both approaches bring the exact same result.
If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.
Both approaches are SQL ANSI compatible, thus, will work with your favorite RDBMS, regardless of its "flavor".
Both approaches are also performance friendly, however your mileage may vary (RDBMS, DB Structure, Indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which make most of sense to you.
My preference is to use as little code as possible...
You can do it using IN
try this:
SELECT *
FROM t1 WHERE (id,rev) IN
( SELECT id, MAX(rev)
FROM t1
GROUP BY id
)
to my mind it is less complicated... easier to read and maintain.
I am flabbergasted that no answer offered SQL window function solution:
SELECT a.id, a.rev, a.contents
FROM (SELECT id, rev, contents,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) ranked_order
FROM YourTable) a
WHERE a.ranked_order = 1
Added in SQL standard ANSI/ISO Standard SQL:2003 and later extended with ANSI/ISO Standard SQL:2008, window (or windowing) functions are available with all major vendors now. There are more types of rank functions available to deal with a tie issue: RANK, DENSE_RANK, PERSENT_RANK.
Yet another solution is to use a correlated subquery:
select yt.id, yt.rev, yt.contents
from YourTable yt
where rev =
(select max(rev) from YourTable st where yt.id=st.id)
Having an index on (id,rev) renders the subquery almost as a simple lookup...
Following are comparisons to the solutions in #AdrianCarneiro's answer (subquery, leftjoin), based on MySQL measurements with InnoDB table of ~1million records, group size being: 1-3.
While for full table scans subquery/leftjoin/correlated timings relate to each other as 6/8/9, when it comes to direct lookups or batch (id in (1,2,3)), subquery is much slower then the others (Due to rerunning the subquery). However I couldnt differentiate between leftjoin and correlated solutions in speed.
One final note, as leftjoin creates n*(n+1)/2 joins in groups, its performance can be heavily affected by the size of groups...
I can't vouch for the performance, but here's a trick inspired by the limitations of Microsoft Excel. It has some good features
GOOD STUFF
It should force return of only one "max record" even if there is a tie (sometimes useful)
It doesn't require a join
APPROACH
It is a little bit ugly and requires that you know something about the range of valid values of the rev column. Let us assume that we know the rev column is a number between 0.00 and 999 including decimals but that there will only ever be two digits to the right of the decimal point (e.g. 34.17 would be a valid value).
The gist of the thing is that you create a single synthetic column by string concatenating/packing the primary comparison field along with the data you want. In this way, you can force SQL's MAX() aggregate function to return all of the data (because it has been packed into a single column). Then you have to unpack the data.
Here's how it looks with the above example, written in SQL
SELECT id,
CAST(SUBSTRING(max(packed_col) FROM 2 FOR 6) AS float) as max_rev,
SUBSTRING(max(packed_col) FROM 11) AS content_for_max_rev
FROM (SELECT id,
CAST(1000 + rev + .001 as CHAR) || '---' || CAST(content AS char) AS packed_col
FROM yourtable
)
GROUP BY id
The packing begins by forcing the rev column to be a number of known character length regardless of the value of rev so that for example
3.2 becomes 1003.201
57 becomes 1057.001
923.88 becomes 1923.881
If you do it right, string comparison of two numbers should yield the same "max" as numeric comparison of the two numbers and it's easy to convert back to the original number using the substring function (which is available in one form or another pretty much everywhere).
Unique Identifiers? Yes! Unique identifiers!
One of the best ways to develop a MySQL DB is to have each id AUTOINCREMENT (Source MySQL.com). This allows a variety of advantages, too many to cover here. The problem with the question is that its example has duplicate ids. This disregards these tremendous advantages of unique identifiers, and at the same time, is confusing to those familiar with this already.
Cleanest Solution
DB Fiddle
Newer versions of MySQL come with ONLY_FULL_GROUP_BY enabled by default, and many of the solutions here will fail in testing with this condition.
Even so, we can simply select DISTINCT someuniquefield, MAX( whateverotherfieldtoselect ), ( *somethirdfield ), etc., and have no worries understanding the result or how the query works :
SELECT DISTINCT t1.id, MAX(t1.rev), MAX(t2.content)
FROM Table1 AS t1
JOIN Table1 AS t2 ON t2.id = t1.id AND t2.rev = (
SELECT MAX(rev) FROM Table1 t3 WHERE t3.id = t1.id
)
GROUP BY t1.id;
SELECT DISTINCT Table1.id, max(Table1.rev), max(Table2.content) : Return DISTINCT somefield, MAX() some otherfield, the last MAX() is redundant, because I know it's just one row, but it's required by the query.
FROM Employee : Table searched on.
JOIN Table1 AS Table2 ON Table2.rev = Table1.rev : Join the second table on the first, because, we need to get the max(table1.rev)'s comment.
GROUP BY Table1.id: Force the top-sorted, Salary row of each employee to be the returned result.
Note that since "content" was "..." in OP's question, there's no way to test that this works. So, I changed that to "..a", "..b", so, we can actually now see that the results are correct:
id max(Table1.rev) max(Table2.content)
1 3 ..d
2 1 ..b
Why is it clean? DISTINCT(), MAX(), etc., all make wonderful use of MySQL indices. This will be faster. Or, it will be much faster, if you have indexing, and you compare it to a query that looks at all rows.
Original Solution
With ONLY_FULL_GROUP_BY disabled, we can use still use GROUP BY, but then we are only using it on the Salary, and not the id:
SELECT *
FROM
(SELECT *
FROM Employee
ORDER BY Salary DESC)
AS employeesub
GROUP BY employeesub.Salary;
SELECT * : Return all fields.
FROM Employee : Table searched on.
(SELECT *...) subquery : Return all people, sorted by Salary.
GROUP BY employeesub.Salary: Force the top-sorted, Salary row of each employee to be the returned result.
Unique-Row Solution
Note the Definition of a Relational Database: "Each row in a table has its own unique key." This would mean that, in the question's example, id would have to be unique, and in that case, we can just do :
SELECT *
FROM Employee
WHERE Employee.id = 12345
ORDER BY Employee.Salary DESC
LIMIT 1
Hopefully this is a solution that solves the problem and helps everyone better understand what's happening in the DB.
Another manner to do the job is using MAX() analytic function in OVER PARTITION clause
SELECT t.*
FROM
(
SELECT id
,rev
,contents
,MAX(rev) OVER (PARTITION BY id) as max_rev
FROM YourTable
) t
WHERE t.rev = t.max_rev
The other ROW_NUMBER() OVER PARTITION solution already documented in this post is
SELECT t.*
FROM
(
SELECT id
,rev
,contents
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rank
FROM YourTable
) t
WHERE t.rank = 1
This 2 SELECT work well on Oracle 10g.
MAX() solution runs certainly FASTER that ROW_NUMBER() solution because MAX() complexity is O(n) while ROW_NUMBER() complexity is at minimum O(n.log(n)) where n represent the number of records in table !
Something like this?
SELECT yourtable.id, rev, content
FROM yourtable
INNER JOIN (
SELECT id, max(rev) as maxrev
FROM yourtable
GROUP BY id
) AS child ON (yourtable.id = child.id) AND (yourtable.rev = maxrev)
I like to use a NOT EXIST-based solution for this problem:
SELECT
id,
rev
-- you can select other columns here
FROM YourTable t
WHERE NOT EXISTS (
SELECT * FROM YourTable t WHERE t.id = id AND rev > t.rev
)
This will select all records with max value within the group and allows you to select other columns.
SELECT *
FROM Employee
where Employee.Salary in (select max(salary) from Employee group by Employe_id)
ORDER BY Employee.Salary
Note: I probably wouldn't recommend this anymore in MySQL 8+ days. Haven't used it in years.
A third solution I hardly ever see mentioned is MySQL specific and looks like this:
SELECT id, MAX(rev) AS rev
, 0+SUBSTRING_INDEX(GROUP_CONCAT(numeric_content ORDER BY rev DESC), ',', 1) AS numeric_content
FROM t1
GROUP BY id
Yes it looks awful (converting to string and back etc.) but in my experience it's usually faster than the other solutions. Maybe that's just for my use cases, but I have used it on tables with millions of records and many unique ids. Maybe it's because MySQL is pretty bad at optimizing the other solutions (at least in the 5.0 days when I came up with this solution).
One important thing is that GROUP_CONCAT has a maximum length for the string it can build up. You probably want to raise this limit by setting the group_concat_max_len variable. And keep in mind that this will be a limit on scaling if you have a large number of rows.
Anyway, the above doesn't directly work if your content field is already text. In that case you probably want to use a different separator, like \0 maybe. You'll also run into the group_concat_max_len limit quicker.
I think, You want this?
select * from docs where (id, rev) IN (select id, max(rev) as rev from docs group by id order by id)
SQL Fiddle :
Check here
NOT mySQL, but for other people finding this question and using SQL, another way to resolve the greatest-n-per-group problem is using Cross Apply in MS SQL
WITH DocIds AS (SELECT DISTINCT id FROM docs)
SELECT d2.id, d2.rev, d2.content
FROM DocIds d1
CROSS APPLY (
SELECT Top 1 * FROM docs d
WHERE d.id = d1.id
ORDER BY rev DESC
) d2
Here's an example in SqlFiddle
I would use this:
select t.*
from test as t
join
(select max(rev) as rev
from test
group by id) as o
on o.rev = t.rev
Subquery SELECT is not too eficient maybe, but in JOIN clause seems to be usable. I'm not an expert in optimizing queries, but I've tried at MySQL, PostgreSQL, FireBird and it does work very good.
You can use this schema in multiple joins and with WHERE clause. It is my working example (solving identical to yours problem with table "firmy"):
select *
from platnosci as p
join firmy as f
on p.id_rel_firmy = f.id_rel
join (select max(id_obj) as id_obj
from firmy
group by id_rel) as o
on o.id_obj = f.id_obj and p.od > '2014-03-01'
It is asked on tables having teens thusands of records, and it takes less then 0,01 second on really not too strong machine.
I wouldn't use IN clause (as it is mentioned somewhere above). IN is given to use with short lists of constans, and not as to be the query filter built on subquery. It is because subquery in IN is performed for every scanned record which can made query taking very loooong time.
Since this is most popular question with regard to this problem, I'll re-post another answer to it here as well:
It looks like there is simpler way to do this (but only in MySQL):
select *
from (select * from mytable order by id, rev desc ) x
group by id
Please credit answer of user Bohemian in this question for providing such a concise and elegant answer to this problem.
Edit: though this solution works for many people it may not be stable in the long run, since MySQL doesn't guarantee that GROUP BY statement will return meaningful values for columns not in GROUP BY list. So use this solution at your own risk!
If you have many fields in select statement and you want latest value for all of those fields through optimized code:
select * from
(select * from table_name
order by id,rev desc) temp
group by id
How about this:
SELECT all_fields.*
FROM (SELECT id, MAX(rev) FROM yourtable GROUP BY id) AS max_recs
LEFT OUTER JOIN yourtable AS all_fields
ON max_recs.id = all_fields.id
This solution makes only one selection from YourTable, therefore it's faster. It works only for MySQL and SQLite(for SQLite remove DESC) according to test on sqlfiddle.com. Maybe it can be tweaked to work on other languages which I am not familiar with.
SELECT *
FROM ( SELECT *
FROM ( SELECT 1 as id, 1 as rev, 'content1' as content
UNION
SELECT 2, 1, 'content2'
UNION
SELECT 1, 2, 'content3'
UNION
SELECT 1, 3, 'content4'
) as YourTable
ORDER BY id, rev DESC
) as YourTable
GROUP BY id
Here is a nice way of doing that
Use following code :
with temp as (
select count(field1) as summ , field1
from table_name
group by field1 )
select * from temp where summ = (select max(summ) from temp)
I like to do this by ranking the records by some column. In this case, rank rev values grouped by id. Those with higher rev will have lower rankings. So highest rev will have ranking of 1.
select id, rev, content
from
(select
#rowNum := if(#prevValue = id, #rowNum+1, 1) as row_num,
id, rev, content,
#prevValue := id
from
(select id, rev, content from YOURTABLE order by id asc, rev desc) TEMP,
(select #rowNum := 1 from DUAL) X,
(select #prevValue := -1 from DUAL) Y) TEMP
where row_num = 1;
Not sure if introducing variables makes the whole thing slower. But at least I'm not querying YOURTABLE twice.
here is another solution hope it will help someone
Select a.id , a.rev, a.content from Table1 a
inner join
(SELECT id, max(rev) rev FROM Table1 GROUP BY id) x on x.id =a.id and x.rev =a.rev
None of these answers have worked for me.
This is what worked for me.
with score as (select max(score_up) from history)
select history.* from score, history where history.score_up = score.max
Here's another solution to retrieving the records only with a field that has the maximum value for that field. This works for SQL400 which is the platform I work on. In this example, the records with the maximum value in field FIELD5 will be retrieved by the following SQL statement.
SELECT A.KEYFIELD1, A.KEYFIELD2, A.FIELD3, A.FIELD4, A.FIELD5
FROM MYFILE A
WHERE RRN(A) IN
(SELECT RRN(B)
FROM MYFILE B
WHERE B.KEYFIELD1 = A.KEYFIELD1 AND B.KEYFIELD2 = A.KEYFIELD2
ORDER BY B.FIELD5 DESC
FETCH FIRST ROW ONLY)
Sorted the rev field in reverse order and then grouped by id which gave the first row of each grouping which is the one with the highest rev value.
SELECT * FROM (SELECT * FROM table1 ORDER BY id, rev DESC) X GROUP BY X.id;
Tested in http://sqlfiddle.com/ with the following data
CREATE TABLE table1
(`id` int, `rev` int, `content` varchar(11));
INSERT INTO table1
(`id`, `rev`, `content`)
VALUES
(1, 1, 'One-One'),
(1, 2, 'One-Two'),
(2, 1, 'Two-One'),
(2, 2, 'Two-Two'),
(3, 2, 'Three-Two'),
(3, 1, 'Three-One'),
(3, 3, 'Three-Three')
;
This gave the following result in MySql 5.5 and 5.6
id rev content
1 2 One-Two
2 2 Two-Two
3 3 Three-Two
You can make the select without a join when you combine the rev and id into one maxRevId value for MAX() and then split it back to original values:
SELECT maxRevId & ((1 << 32) - 1) as id, maxRevId >> 32 AS rev
FROM (SELECT MAX(((rev << 32) | id)) AS maxRevId
FROM YourTable
GROUP BY id) x;
This is especially fast when there is a complex join instead of a single table. With the traditional approaches the complex join would be done twice.
The above combination is simple with bit functions when rev and id are INT UNSIGNED (32 bit) and combined value fits to BIGINT UNSIGNED (64 bit). When the id & rev are larger than 32-bit values or made of multiple columns, you need combine the value into e.g. a binary value with suitable padding for MAX().
Explanation
This is not pure SQL. This will use the SQLAlchemy ORM.
I came here looking for SQLAlchemy help, so I will duplicate Adrian Carneiro's answer with the python/SQLAlchemy version, specifically the outer join part.
This query answers the question of:
"Can you return me the records in this group of records (based on same id) that have the highest version number".
This allows me to duplicate the record, update it, increment its version number, and have the copy of the old version in such a way that I can show change over time.
Code
MyTableAlias = aliased(MyTable)
newest_records = appdb.session.query(MyTable).select_from(join(
MyTable,
MyTableAlias,
onclause=and_(
MyTable.id == MyTableAlias.id,
MyTable.version_int < MyTableAlias.version_int
),
isouter=True
)
).filter(
MyTableAlias.id == None,
).all()
Tested on a PostgreSQL database.
I used the below to solve a problem of my own. I first created a temp table and inserted the max rev value per unique id.
CREATE TABLE #temp1
(
id varchar(20)
, rev int
)
INSERT INTO #temp1
SELECT a.id, MAX(a.rev) as rev
FROM
(
SELECT id, content, SUM(rev) as rev
FROM YourTable
GROUP BY id, content
) as a
GROUP BY a.id
ORDER BY a.id
I then joined these max values (#temp1) to all of the possible id/content combinations. By doing this, I naturally filter out the non-maximum id/content combinations, and am left with the only max rev values for each.
SELECT a.id, a.rev, content
FROM #temp1 as a
LEFT JOIN
(
SELECT id, content, SUM(rev) as rev
FROM YourTable
GROUP BY id, content
) as b on a.id = b.id and a.rev = b.rev
GROUP BY a.id, a.rev, b.content
ORDER BY a.id

How to specify two expressions in the select list when the subquery is not introduced with EXISTS

I have a query that uses a subquery and I am having a problem returning the expected results. The error I receive is..."Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." How can I rewrite this to work?
SELECT
a.Part,
b.Location,
b.LeadTime
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
AND
Date IN (SELECT Location, MAX(Date) FROM dbo.Vendor GROUP BY Location)
GROUP BY
a.Part,
b.Location,
b.LeadTime
ORDER BY
a.Part
I think something like this may be what you're looking for. You didn't say what version of SQL Server--this works in SQL 2005 and up:
SELECT
p.Part,
p.Location, -- from *p*, otherwise if no match we'll get a NULL
v.LeadTime
FROM
dbo.Parts p
OUTER APPLY (
SELECT TOP (1) * -- * here is okay because we specify columns outside
FROM dbo.Vendor v
WHERE p.Location = v.Location -- the correlation part
ORDER BY v.Date DESC
) v
WHERE
p.Location IN ('A','B','C')
ORDER BY
p.Part
;
Now, your query can be repaired as is by adding the "correlation" part to change your query into a correlated subquery as demonstrated in Kory's answer (you'd also remove the GROUP BY clause). However, that method still requires an additional and unnecessary join, hurting performance, plus it can only pull one column at a time. This method allows you to pull all the columns from the other table, and has no extra join.
Note: this gives logically the same results as Lamak's answer, however I prefer it for a few reasons:
When there is an index on the correlation columns (Location, here) this can be satisfied with seeks, but the Row_Number solution has to scan (I believe).
I prefer the way this expresses the intent of the query more directly and succinctly. In the Row_Number method, one must get out to the outer condition to see that we are only grabbing the rn = 1 values, then bop back into the CTE to see what that is.
Using CROSS APPLY or OUTER APPLY, all the other tables not involved in the single-inner-row-per-outer-row selection are outside where (to me) they belong. We aren't squishing concerns together. Using Row_Number feels a bit like throwing a DISTINCT on a query to fix duplication rather than dealing with the underlying issue. I guess this is basically the same issue as the previous point worded in a different way.
The moment you have TWO tables from which you wish to pull the most recent value, the Row_Number() solution blows up completely. With this syntax, you just easily add another APPLY clause, and it's crystal clear what you're doing. There is a way to use Row_Number for the multiple tables scenario by moving the other tables outside, but I still don't prefer that syntax.
Using this syntax allows you to perform additional joins based on whether the selected row exists or not (in the case that no matching row was found). In the Row_Number solution, you can only reasonably do that NOT NULL checking in the outer query--so you are forced to split up the query into multiple, separated parts (you don't want to be joining to values you will be discarding!).
P.S. I strongly encourage you to use aliases that hint at the table they represent. Please don't use a and b. I used p for Parts and v for Vendor--this helps you and others make sense of the query more quickly in the future.
If I understood you corrrectly, you want the rows with the max date for locations A, B and C. Now, assuming SQL Server 2005+, you can do this:
;WITH CTE AS
(
SELECT
a.Part,
b.Location,
b.LeadTime,
RN = ROW_NUMBER() OVER(PARTITION BY a.Part ORDER BY [Date] DESC)
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
)
SELECT Part,
Location,
LeadTime
FROM CTE
WHERE RN = 1
ORDER BY Part
In your subquery you need to correlate the Location and Part to the outer query.
Example:
Date = (SELECT MAX(Date)
FROM dbo.Vender v
WHERE v.Location = b.Location
AND v.Part = b.Part
)
So this will bring back one date for each location and part

select distinct from 2 columns but only 1 is duplicate

select a.subscriber_msisdn, war.created_datetime from
(
select distinct subscriber_msisdn from wiz_application_response
where application_item_id in
(select id from wiz_application_item where application_id=155)
and created_datetime between '2012-10-07 00:00' and '2012-11-15 00:00:54'
) a
left outer join wiz_application_response war on (war.subscriber_msisdn=a.subscriber_msisdn)
the sub select returns 11 rows but when joined return 18 (with duplicates). The objective of this query is only add the date column to the 11 rows of the sub select.
Based on your description, it stands to reason that there are multiple created_datetime values for some of the subscriber_msisdn values which is what prompted you to use the distinct in the subquery to begin with. By joining the sub query to the original table you are defeating this. A cleaner way to write the query would be:
SELECT
war.subscriber_msisdn
, war.created_datetime
FROM
wiz_application_response war
LEFT JOIN wiz_application_item wai
ON war.application_item_id = wai.id
AND wai.application_id = 155
WHERE
war.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
This should return only the rows from the war table that satisfy the criteria based on the wai table. It should not be and outer join unless you wanted to return all the rows from war table that satisfied the created_datetime parameter regardless of the application_item_id parameter.
This is my best guess based on the limited information I have about your tables and what I’m assuming you’re trying to accomplish. If this doesn’t get you what you are after, I will continue to offer other ideas based on additional information you could provide. Hope this works.
Can most probably simplified to this:
SELECT DISTINCT ON (1)
r.subscriber_msisdn, r.created_datetime
FROM wiz_application_item i
JOIN wiz_application_response r ON r.application_item_id = i.id
WHERE i.application_id = 155
AND i.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
ORDER BY 1, 2 DESC -- to pick the latest created_datetime
Details depend on missing information.
More explanation here.

Nested select statement in FROM clause? Inner Join statements? or just table name?

I'm building a query that needs data from 5 tables.
I've been told by a DBA in the past that specifying a list of columns vs getting all columns (*) is preferred from some performance/memory aspect.
I've also been told that the database performs a JOIN operation behind the scenes when there's a list of tables in the FROM clause, to create one table (or view).
The existing database has very little data at the moment, as we're at a very initial point. So not sure I can measure the performance hit in practice.
I am not a database pro. I can get what data I need. The dillema is, at what price.
Added: At the moment I'm working with MS SQL Server 2008 R2.
My questions are:
Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anythig? or is it just extra select statements that create more work for the server and for us)?
Is there yet a better way to do this?
Below are two queries that both get the slice of data that I need. One includes more nested select statements.
I apologize if they are not written in a standard form or helplessly overcomplicated - hopefully you can decipher. I try to keep them organized as much as possible.
Insights would be most appreciated as well.
Thanks for checking this out.
5 tables: devicepool, users, trips, TripTracker, and order
Query 1 (more select statements):
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
(
SELECT
tripid,
[status] tripstatus,
stops,
devid,
groupid
FROM
trips
)
AS [base2]
ON base.devid = base2.devid AND base2.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
(
SELECT
[status] [orderstatus],
[address] [destaddress],
[tripid],
stopnumber orderstopnumber
FROM [order]
)
AS [orders]
ON orders.orderstopnumber = tracker.stopnumber)
Query 2:
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
trips
ON base.devid = trips.devid AND trips.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
[order]
ON [order].stopnumber = tracker.stopnumber)
Is there a performance difference and why, between the following: a.
SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel
that this might be a performance hit) b. SELECT ... FROM tbl1 inner
join tbl2 on ... inner join tbl3 on ... etc (would this be more
explicit to the server and save on performance/memory)? c. SELECT ...
FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this
save anythig? or is it just extra select statements that create more
work for the server and for us)?
a) and b) should result in the same query plan (although this is db-specific). b) is much preferred for portability and readability over a). c) is a horrible idea, that hurts readability and if anything will result in worse peformance. Let us never speak of it again.
Is there yet a better way to do this?
b) is the standard approach. In general, writing the plainest ANSI SQL will result in the best performance, as it allows the query parser to easily understand what you are trying to do. Trying to outsmart the compiler with tricks may work in a given situation, but does not mean that it will still work when the cardinality or amount of data changes, or the database engine is upgraded. So, avoid doing that unless you are absolutely forced to.

OR statement in select part of query in Postgres

I have a query
SELECT
cd.signoffdate,
min(cmp.dsignoff) as dsignoff
FROM clients AS c
LEFT JOIN campaigns AS cmp ORDER BY dsignoff;
If I want to have something like this built into the postgres query will it work and how do I do it
if the cd.signoffdate is empty it should take min(cmp.dsignoff) as dsignoff as the value and then order by this column, so in other words it should order by dsignoff and cd.signoffdate and tread it as one column, is this possible and how?
Your query could look like this:
SELECT c.client_id, COALESCE(c.signoffdate, min(cmp.dsignoff)) AS signoff
FROM clients c
LEFT JOIN campaigns cmp ON cmp.client_id = c.client_id -- join condition!
GROUP BY c.client_id, cd.signoffdate -- group by!
ORDER BY COALESCE(c.signoffdate, min(cmp.dsignoff));
Or, with simplified syntax:
SELECT c.client_id, COALESCE(c.signoffdate, min(cmp.dsignoff)) AS signoff
FROM clients c
LEFT JOIN campaigns cmp USING (client_id)
GROUP BY 1, cd.signoffdate
ORDER BY 2;
Major points:
Used alias c, but referenced as cd.
No join condition leads to cross join, probably not intended.
Missing GROUP BY.
I assume that you want to group by the primary key column of clients and call it client_id.
I also assume that client_id links the two tables together.
COALESCE() serves as fallback in case signoffdate IS NULL.
ORDER BY coalesce(cd.signoffdate, min(cmp.dsignoff));
But don't you need some GROUP BY in your original query?
You can use COALESCE
SELECT COALESCE(cd.signoffdate, min(cmp.dsignoff)) as dsignoff
I'm not sure if you can order by coalesce in Postgres - might be worth just ordering by both columns