SQL Left Join, Cross Join - Find a Missing Value - tsql

This should be really easy, but I just can't spot the answer (I'll blame it on the fact that I've been SQL coding for almost 14 hours now).
I have a table called report with a "Label" field and a "ReportYear" field. I know that I have three possible "Label" values, but not every label will have an entry for every year.
I can select a distinct label list and I get:
'Regular'
'Special'
'None'
Doing some manual data selection here, I know that 2011 and 2010 have 0 entries with the label 'Special', but I need a query to pull this information together, so that it comes out:
'Regular' - '2012' - '5'
'Regular' - '2011' - '2'
'Regular' - '2010' - '1'
'Special' - '2012' - '3'
'Special' - '2011' - '0'
'Special' - '2010' - '0'
'None' - '2012' - '10'
'None' - '2011' - '5'
'None' - '2010' - '2'
Hopefully that makes sense.
I know I can SELECT Count(*), Label FROM (SELECT DISTINCT Label FROM Report) t1 ... but then? LEFT JOIN Report t2 ON t1.Label=t2.Label? CROSS JOIN?
My brain is fried.
Help?

SELECT
L.Label,
Y.ReportYear,
ReportCount = Count(R.Label)
FROM
(SELECT DISTINCT Label FROM dbo.Report) L
CROSS JOIN (SELECT DISTINCT ReportYear FROM dbo.Report) Y
LEFT JOIN dbo.Report R
ON L.Label = R.Label
AND Y.ReportYear = R.ReportYear
GROUP BY
L.Label,
Y.ReportYear
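Here is a minimal runnable sketch of the cross-join-then-left-join idea. It makes two assumptions. First, Python's built-in sqlite3 module stands in for SQL Server. Second, the rows are made up to match the counts in the question. The T-SQL `ReportCount = Count(...)` alias form is written as `AS ReportCount`, because SQLite does not accept the T-SQL form.

```python
import sqlite3

# In-memory database standing in for dbo.Report (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Report (Label TEXT, ReportYear INTEGER)")
rows = (
    [("Regular", 2012)] * 5 + [("Regular", 2011)] * 2 + [("Regular", 2010)]
    + [("Special", 2012)] * 3
    + [("None", 2012)] * 10 + [("None", 2011)] * 5 + [("None", 2010)] * 2
)
conn.executemany("INSERT INTO Report VALUES (?, ?)", rows)

# Pair every label with every year, then LEFT JOIN back to the data.
# COUNT(R.Label) only counts matched rows, so missing combinations give 0.
query = """
SELECT L.Label, Y.ReportYear, COUNT(R.Label) AS ReportCount
FROM (SELECT DISTINCT Label FROM Report) AS L
CROSS JOIN (SELECT DISTINCT ReportYear FROM Report) AS Y
LEFT JOIN Report AS R
  ON L.Label = R.Label
 AND Y.ReportYear = R.ReportYear
GROUP BY L.Label, Y.ReportYear
ORDER BY L.Label, Y.ReportYear DESC
"""
result = conn.execute(query).fetchall()
for label, year, count in result:
    print(label, year, count)
```

The key detail is counting `R.Label` instead of `*`. `COUNT(*)` would report 1 for the unmatched 'Special' rows, because the outer join still produces one NULL-extended row for each of them.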
Now, this isn't really ideal because you're doing an entire table scan, twice, just to get the labels and years.
Unless I am misunderstanding things, it seems to me that you should normalize the Report table so it has a ReportLabelID column and then have a ReportLabel table with the distinct labels. Then you can put the ReportLabel table in place of the DISTINCT query above.
You can also eliminate the ReportYear subquery by parameterizing the whole query to accept BeginYear and EndYear, or something like that. Alternatively, you could get Min(ReportYear) and Max(ReportYear) from the table. If you have an index on the column, the engine may be able to turn those into seeks (two separate queries might be needed to get this). You would then use a numbers table, or an on-the-fly numbers table, to generate the sequence of years between them.
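The sketch below is a hedged illustration of the last two suggestions combined. It assumes SQLite via Python's sqlite3 rather than SQL Server. It also uses a hypothetical `ReportLabel` lookup table and a hypothetical index name, neither of which comes from the original. A recursive CTE plays the role of the on-the-fly numbers table: it generates every year from MIN to MAX. As a result, a year with no rows at all (2011 here) still appears, which `SELECT DISTINCT ReportYear` cannot produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Report (Label TEXT, ReportYear INTEGER);
CREATE INDEX IX_Report_ReportYear ON Report (ReportYear);
-- Hypothetical lookup table replacing SELECT DISTINCT Label FROM Report.
CREATE TABLE ReportLabel (Label TEXT PRIMARY KEY);
INSERT INTO ReportLabel VALUES ('Regular'), ('Special'), ('None');
-- Hypothetical data with a gap: nothing at all was reported in 2011.
INSERT INTO Report VALUES ('Regular', 2010), ('Special', 2012), ('None', 2012);
""")

query = """
WITH RECURSIVE Years(ReportYear, MaxYear) AS (
    -- Anchor row: the first year plus the last year to stop at.
    SELECT MIN(ReportYear), MAX(ReportYear) FROM Report
    UNION ALL
    -- Recursive step: add one year until MaxYear is reached.
    SELECT ReportYear + 1, MaxYear FROM Years WHERE ReportYear < MaxYear
)
SELECT L.Label, Y.ReportYear, COUNT(R.Label) AS ReportCount
FROM ReportLabel AS L
CROSS JOIN Years AS Y
LEFT JOIN Report AS R
  ON R.Label = L.Label AND R.ReportYear = Y.ReportYear
GROUP BY L.Label, Y.ReportYear
ORDER BY L.Label, Y.ReportYear
"""
result = conn.execute(query).fetchall()
print(result)
```

In SQL Server the same shape works with `WITH` (no `RECURSIVE` keyword) or with a permanent numbers table. If you parameterize the query, the BeginYear/EndYear values would replace the MIN/MAX anchor.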
Once you make those two changes, the query will then perform significantly better.

Related

Oracle Sql - Discarding outer select if inner select returns null, and avoiding multiple rows

Pre-Info: In our company a person is marked * if he is actively working. And there are people who changed their departments.
For a report I use two tables, COMPANY_PERSON_ALL and trifm_izinler4, joined on the person_id field as below.
I want to discard (not list) the row if the first inner select returns null.
I also want to prevent the second inner select from returning multiple departments.
select izn.person_id, izn.adi_soyadi, izn.company_id,
(select a.employee_status from COMPANY_PERSON_ALL a where a.employee_status = '*' and a.person_id = izn.person_id) as Status,
(select a.org_code from COMPANY_PERSON_ALL a where a.person_id = izn.person_id) as Department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
where trunc(rapor_tarihi) = trunc(SYSDATE)
Can you help me overcome these two problems with the inner select statements?
Assuming you only want to see the department from the active person record, you can just join the two tables instead of using subquery expressions, and filter on that status:
select izn.person_id, izn.adi_soyadi, izn.company_id,
a.employee_status as status, a.org_code as department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
join company_person_all a on a.person_id = izn.person_id
where rapor_tarihi >= trunc(SYSDATE)
-- and rapor_tarihi < trunc(SYSDATE) + 1 -- probably not needed
and a.employee_status = '*'
I've also changed the date comparison; if you compare using trunc(rapor_tarihi) then a normal index on that column can't be used, so it's generally better to compare the original value against a range. Since you're comparing against today's date you probably only need to look for values greater than midnight today, but if that column can have future dates then you can put an upper bound on the range of midnight tomorrow - which I've included but commented out.
If a person can be active in more than one department at a time then this will show all of those, but your wording suggests people are only active in one at a time. If you want to see a department for all active users, but not necessarily the one that has the active flag (or if there can be more than one active), then it's a bit more complicated, and you need to explain how you would want to choose which to show.
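Below is a small runnable check of the join-plus-filter approach and the half-open date range. It is a sketch under several assumptions. SQLite (through Python's sqlite3) stands in for Oracle, and the dates are ISO-8601 text, so they compare correctly as strings. A fixed "today" replaces `trunc(SYSDATE)` so that the output is deterministic. The column subset and the sample people are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_person_all (person_id INTEGER, employee_status TEXT, org_code TEXT);
CREATE TABLE trifm_izinler4 (person_id INTEGER, adi_soyadi TEXT, rapor_tarihi TEXT);
CREATE INDEX ix_izn_tarih ON trifm_izinler4 (rapor_tarihi);
-- Person 1 changed departments: old record inactive, new record active ('*').
INSERT INTO company_person_all VALUES (1, NULL, 'OLD'), (1, '*', 'NEW');
-- Person 2 is no longer active anywhere.
INSERT INTO company_person_all VALUES (2, NULL, 'HR');
INSERT INTO trifm_izinler4 VALUES
  (1, 'Person One', '2024-05-10 08:30:00'),
  (2, 'Person Two', '2024-05-10 09:00:00'),
  (1, 'Person One', '2024-05-09 17:00:00');
""")

# Half-open range [today, tomorrow) on the raw column, so an index on
# rapor_tarihi stays usable (unlike wrapping the column in trunc()).
today, tomorrow = "2024-05-10", "2024-05-11"
rows = conn.execute("""
SELECT izn.person_id, izn.adi_soyadi,
       a.employee_status AS status, a.org_code AS department
FROM trifm_izinler4 izn
JOIN company_person_all a ON a.person_id = izn.person_id
WHERE izn.rapor_tarihi >= ? AND izn.rapor_tarihi < ?
  AND a.employee_status = '*'
""", (today, tomorrow)).fetchall()
print(rows)
```

Person 2 disappears because no active record exists. Person 1 shows only the active department, and yesterday's row is excluded by the range.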

Postgresql subqueries using a calculated column

I am new to this platform and need to get a value using a column I already calculated. I know I need a subquery, but am confused by the proper syntax.
SELECT well_id, reported_date, oil,
(EXTRACT(EPOCH FROM age(reported_date,
LAG(reported_date) OVER w))/3600)::int as hourly_rate,
(oil/hourly_rate)::double precision as six
FROM public.production
WINDOW w AS (PARTITION BY well_id ORDER BY well_id, reported_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
The error I am getting is
ERROR: column "hourly_rate" does not exist
LINE 4: (oil/hourly_rate)::double precision as six
^
HINT: Perhaps you meant to reference the column "production.hour_rate".
SQL state: 42703
Character: 171
Which I understand... I have tried brackets, naming the subqueries, and other tactics. I know this is a syntax thing; can someone please give me a hand? Thank you.
I'm a bit confused by your notation, but it looks like there are parenthesis issues: your FROM clause is not linked to the SELECT.
In my opinion, the best way to manage subqueries is to write something like this:
WITH query1 AS (
select col1, col2
from table1
),
query2 as (
select col1, col2
from query1
(additional clauses)
)
select (what you want)
from query2
(additional statements)
Then you can manipulate your data progressively, including aggregations, until it is organised the way the final select needs it.
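The following sketch applies that CTE pattern to the question's calculation. It assumes SQLite via Python's sqlite3 instead of PostgreSQL and uses made-up sample readings. It measures the interval with `julianday(...) * 24` in place of PostgreSQL's `EXTRACT(EPOCH FROM age(...)) / 3600`. The first step computes the interval. The second step can then refer to it by name, because inside the CTE it is an ordinary column rather than an alias in the same select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (well_id INTEGER, reported_date TEXT, oil REAL)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 00:00:00", 100.0),
        (1, "2024-01-01 12:00:00", 60.0),
        (1, "2024-01-02 12:00:00", 48.0),
    ],
)

query = """
WITH intervals AS (
    -- Step 1: hours since the previous reading of the same well.
    SELECT well_id, reported_date, oil,
           (julianday(reported_date)
            - julianday(LAG(reported_date) OVER (
                  PARTITION BY well_id ORDER BY reported_date))) * 24 AS hours
    FROM production
)
-- Step 2: "hours" is now an ordinary column of the CTE, so it can be reused.
SELECT well_id, reported_date, oil, hours, oil / hours AS oil_per_hour
FROM intervals
ORDER BY well_id, reported_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first reading of each well has no predecessor, so both `hours` and `oil_per_hour` are NULL there. A zero interval would also need guarding (for example with `NULLIF(hours, 0)`) in a real dataset.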
You cannot reference a column alias elsewhere in the same select list. You need to repeat the original calculation in that column, so your updated query would look like this:
SELECT well_id, reported_date, oil,
(EXTRACT(EPOCH FROM age(reported_date, LAG(reported_date) OVER w))/3600)::int as hourly_rate,
(Oil/(EXTRACT(EPOCH FROM age(reported_date, LAG(reported_date) OVER w))/3600))::double precision as six
FROM public.production
WINDOW w AS (PARTITION BY well_id ORDER BY well_id, reported_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)

Using EXCEPT and flagging column differences

What I'm looking to do is select data from a Postgres table that does not appear in another. Both tables have identical columns, except that one uses boolean where the other uses varchar(1). The issue is that the data in those columns does not match up.
I know I can do this with a SELECT EXCEPT SELECT statement, which I have implemented and is working.
What I would like to do is find a method to flag the columns that do not match up. One idea I had is to add a marker character in front of the data in the fields that do not match.
For example, if updateflag differs between the two tables, I would get '* f' instead of 'f'.
SELECT id, number, "updateflag" from dbc.person
EXCEPT
SELECT id, number, "updateflag"::bool from dbg.person;
Should I be joining the two tables together after executing this statement, to identify the differences in what's returned?
I have tried to research methods to implement this but have not found anything on the topic.
I prefer a full outer join for this
select *
from dbc.person p1
full join dbg.person p2 on p1.id = p2.id
where p1 is distinct from p2;
The id column is assumed the primary key column that "links" the two tables together.
This will only return rows where at least one column is different.
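The sketch below combines the outer-join comparison with the '* ' marker the question asked for. It makes a few assumptions. Python's sqlite3 stands in for PostgreSQL. The tables `p_dbc` and `p_dbg` are hypothetical stand-ins for dbc.person and dbg.person. The sample rows are made up. Older SQLite versions lack FULL JOIN, so the full join is emulated with a LEFT JOIN plus the unmatched right-hand rows. SQLite's `IS NOT` plays the role of PostgreSQL's null-safe `IS DISTINCT FROM`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p_dbc (id INTEGER PRIMARY KEY, number TEXT, updateflag TEXT);
CREATE TABLE p_dbg (id INTEGER PRIMARY KEY, number TEXT, updateflag TEXT);
INSERT INTO p_dbc VALUES (1, '100', 'f'), (2, '200', 't'), (3, '300', 'f');
INSERT INTO p_dbg VALUES (1, '100', 'f'), (2, '201', 'f'), (4, '400', 't');
""")

rows = conn.execute("""
-- Rows present in p_dbc: prefix '* ' to every column that differs.
SELECT a.id AS id,
       CASE WHEN a.number IS NOT b.number
            THEN '* ' || COALESCE(a.number, 'NULL') ELSE a.number END AS number,
       CASE WHEN a.updateflag IS NOT b.updateflag
            THEN '* ' || COALESCE(a.updateflag, 'NULL') ELSE a.updateflag END AS updateflag
FROM p_dbc a
LEFT JOIN p_dbg b ON a.id = b.id
WHERE a.number IS NOT b.number OR a.updateflag IS NOT b.updateflag
UNION ALL
-- Rows only present in p_dbg: every p_dbc column is missing.
SELECT b.id, '* NULL', '* NULL'
FROM p_dbg b
LEFT JOIN p_dbc a ON a.id = b.id
WHERE a.id IS NULL
ORDER BY 1
""").fetchall()
print(rows)
```

Identical rows (id 1) drop out, and each remaining row shows p_dbc's value with the mismatching columns marked.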
If you want to see the differences, you could use a feature of the hstore extension:
select hstore(p1) - hstore(p2) as columns_diff_p1,
hstore(p2) - hstore(p1) as columns_diff_p2
from dbc.person p1
full join dbg.person p2 on p1.id = p2.id
where p1 is distinct from p2;
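To make the result of `hstore(p1) - hstore(p2)` concrete, here is a plain-Python analogue. It is not hstore itself. It assumes that each row has already been converted to a dict of text values, which is what `hstore(record)` produces for a matched row. The hstore `-` operator removes the pairs whose key and value both match in the right operand, so what remains is exactly the set of differing columns.

```python
def hstore_minus(left, right):
    """Mimic hstore's `left - right`: keep the pairs of `left` whose key is
    missing from `right` or whose value differs there."""
    return {k: v for k, v in left.items() if k not in right or right[k] != v}


# Hypothetical rows with the same id, already converted to text values.
p1 = {"id": "2", "number": "200", "updateflag": "t"}
p2 = {"id": "2", "number": "201", "updateflag": "f"}
print(hstore_minus(p1, p2))  # {'number': '200', 'updateflag': 't'}
print(hstore_minus(p2, p1))  # {'number': '201', 'updateflag': 'f'}
```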

Using a list of search patterns in LIKE or IN expression

The question: I have a list of sales quotations and many of them are not valid as they are simply in the system for practice or training. Usually the quotation name contains the word 'Test' or 'Dummy'. (In a couple of instances the quote_name contains 'Prova' - which happens to be Italian for 'Test').
Given that I cannot easily control the list of strings to search for, I decided to maintain the list in a second table - 'Terms to Search for'. A simple one column table with a list of terms ('Test', 'Prova', 'Dummy', ...).
In Amazon Redshift, I tried a simple CASE statement:
CASE WHEN UPPER(vx.quote_name) LIKE ('%' + UPPER(terms.term) + '%') THEN 'Y' ELSE 'N' END AS "Any DPS"
However, that seems to only get the first search term in the list.
Also, for the same quotation, which can have multiple rows due to multiple items being sold, I usually get one row set to 'Y' and the rest set to 'N'.
I modified the statement to:
---- #4a: get a list of the quotes whose quote_names match the patterns in the list
SELECT
vx.master_quote_number,
'Y' AS "Any DPS"
FROM t_quotes vx, any_dps_search_families terms
WHERE UPPER(vx.prod_fmly) IN ('%'+ UPPER(terms.term) +'%');
--- 4b: merge Any DPS results back in
select vx.*, dps."Any DPS"
from t_quotes vx
LEFT JOIN transform_data_4 dps ON (vx.master_quote_number = dps.master_quote_number)
But that isn't doing it either.
Environment: Amazon Redshift (which is mostly like Postgres). An answer to this in Postgres would be ideal. I can switch this clause to MySQL if needed, but I'd rather not.
This is a case for lateral joins (untested):
SELECT vx.master_quote_number
FROM any_dps_search_families terms
CROSS JOIN LATERAL (SELECT master_quote_number
FROM t_quotes
WHERE UPPER(prod_fmly)
LIKE ('%' || UPPER(terms.term) || '%')
) vx;
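The following is an alternative to the lateral join above, not the answer's own method. It also addresses the "one row 'Y', the rest 'N'" symptom from the question. The approach evaluates an `EXISTS` against the whole terms table for every quote, then collapses the item rows with `DISTINCT`, so each quote gets a single flag. It is a sketch assuming SQLite via Python's sqlite3 and the question's `quote_name` column; swap in `prod_fmly` if that is the column you need. The sample rows are made up. String concatenation uses `||`, as in PostgreSQL and Redshift, rather than `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_quotes (master_quote_number INTEGER, quote_name TEXT);
CREATE TABLE any_dps_search_families (term TEXT);
-- Quote 1 has two item rows; the others are practice/training quotes.
INSERT INTO t_quotes VALUES
  (1, 'Customer A rollout'), (1, 'Customer A rollout'),
  (2, 'TEST quote for training'), (3, 'Prova offerta'), (4, 'dummy order');
INSERT INTO any_dps_search_families VALUES ('Test'), ('Prova'), ('Dummy');
""")

rows = conn.execute("""
SELECT DISTINCT q.master_quote_number,
       CASE WHEN EXISTS (
                SELECT 1
                FROM any_dps_search_families t
                WHERE UPPER(q.quote_name) LIKE ('%' || UPPER(t.term) || '%'))
            THEN 'Y' ELSE 'N' END AS "Any DPS"
FROM t_quotes q
ORDER BY q.master_quote_number
""").fetchall()
print(rows)
```

Because the subquery checks every term, one matching term is enough for 'Y', and adding a new term to the table needs no query change.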

Left outer join using 2 of 3 tables in Postgresql

I need to show all clients entered into the system for a date range.
All clients are assigned to a group, but not necessarily to a staff.
When I run the query as such:
SELECT
clients.name_lastfirst_cs,
to_char (clients.date_intake,'MM/DD/YY')AS Date_Created,
clients.client_id,
clients.display_intake,
staff.staff_name_cs,
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
clients.zrud_staff = staff.zzud_staff AND
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 121 entries
If I comment out the staff column and its join condition:
SELECT
clients.name_lastfirst_cs,
to_char (clients.date_intake,'MM/DD/YY')AS Date_Created,
clients.client_id,
clients.display_intake,
-- staff.staff_name_cs, -- Line Commented out
groups.name
FROM
public.clients,
public.groups,
public.staff,
public.link_group
WHERE
-- clients.zrud_staff = staff.zzud_staff AND --Line commented out
clients.zzud_client = link_group.zrud_client AND
groups.zzud_group = link_group.zrud_group AND
clients.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
ORDER BY
groups.name ASC,
clients.client_id ASC,
staff.staff_name_cs ASC
I get 173 entries
I know I need to do an outer join to capture all clients regardless of whether a staff member is assigned, but each attempt has failed. I have done outer joins with two tables, but adding a third has twisted my brain.
Thanks for any suggestions
I have no way of testing this (or of knowing that it is right) but what I read in your query is that you want something similar to this:
SELECT -- I used short aliases; I chose something other than the table name so it's obvious they are aliases ("c" for client, etc.)
c.name_lastfirst_cs,
to_char (c.date_intake,'MM/DD/YY')AS Date_Created,
c.client_id,
c.display_intake,
s.staff_name_cs,
g.name,
l.zrud_client AS "link_client",--I'm selecting some data here so that I can debug later, you can just filter this out with another select if you need to
l.zrud_group AS "link_group" --Again, so I can see these relationships
FROM
public.clients c
LEFT OUTER JOIN staff s ON --is staff required? If it isn't then outer join (optional)
s.zzud_staff = c.zrud_staff --so we linked staff to clients here
LEFT OUTER JOIN public.link_group l ON --this looks like a lookup table to me so we select the lookup record
l.zrud_client = c.zzud_client -- this is how I define the lookup, a client id
LEFT OUTER JOIN public.groups g ON --then we use that to lookup a group
g.zzud_group = l.zrud_group --which is defined by this data here
WHERE -- the following must be true
c.date_intake BETWEEN (now() - '8 days'::interval)::timestamp AND now()
Now for the why: I've basically moved your WHERE clause into JOIN x ON y = z syntax. In my experience this is a better way to write and maintain queries. It lets you specify the relationships between tables rather than doing a big-ol' join and trying to filter that data with the WHERE clause. Keep in mind that every condition in your original WHERE clause is REQUIRED, not optional, so you only get rows that satisfy all of them. If I read this right (I may not, since I don't have the schema in front of me), a client that is missing a link-table record OR a staff member gets filtered out.
Alternatively (though possibly much slower), you can SELECT from anything, including a subquery, so you can chain it like this:
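Here is a small runnable check of that difference, built on a few assumptions. SQLite via Python's sqlite3 stands in for PostgreSQL. Only the columns needed for the joins are included, and the two clients are made up. A fixed date window replaces `now() - '8 days'::interval` so the result is deterministic. The table name `groups` is quoted to be safe as an identifier. Client C2 has no staff member. The original comma-join drops C2, while the LEFT JOIN version keeps C2 with a NULL staff name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (zzud_client INTEGER, client_id TEXT, zrud_staff INTEGER, date_intake TEXT);
CREATE TABLE staff (zzud_staff INTEGER, staff_name_cs TEXT);
CREATE TABLE "groups" (zzud_group INTEGER, name TEXT);
CREATE TABLE link_group (zrud_client INTEGER, zrud_group INTEGER);
INSERT INTO staff VALUES (10, 'Smith');
INSERT INTO "groups" VALUES (100, 'Group A');
-- Client C1 has a staff member, client C2 does not.
INSERT INTO clients VALUES (1, 'C1', 10, '2024-05-08'), (2, 'C2', NULL, '2024-05-09');
INSERT INTO link_group VALUES (1, 100), (2, 100);
""")
start, end = "2024-05-02", "2024-05-10"  # hypothetical intake window

# Original style: every WHERE condition is required, so C2 is lost.
inner = conn.execute("""
SELECT c.client_id
FROM clients c, staff s, link_group l, "groups" g
WHERE c.zrud_staff = s.zzud_staff
  AND c.zzud_client = l.zrud_client
  AND g.zzud_group = l.zrud_group
  AND c.date_intake BETWEEN ? AND ?
""", (start, end)).fetchall()

# Explicit joins: staff (and the group lookup) are optional.
outer = conn.execute("""
SELECT c.client_id, s.staff_name_cs, g.name
FROM clients c
LEFT JOIN staff s ON s.zzud_staff = c.zrud_staff
LEFT JOIN link_group l ON l.zrud_client = c.zzud_client
LEFT JOIN "groups" g ON g.zzud_group = l.zrud_group
WHERE c.date_intake BETWEEN ? AND ?
ORDER BY g.name, c.client_id
""", (start, end)).fetchall()
print(inner, outer)
```

Since every client is assigned to a group, the last two joins could just as well be inner joins. Only the staff join has to be a LEFT JOIN.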
SELECT
*
FROM
(
SELECT
*
FROM
public.clients
WHERE
x condition
) AS sub
WHERE
y condition
OR
SELECT * FROM x WHERE x.some_col IN (SELECT y.some_col FROM y)
In your case this tactic probably won't be easier than a standard join syntax.
And some serious opinion here: I recommend you use the join syntax I outlined above. For inner joins it is functionally the same as joining and specifying a WHERE clause. However, as you noticed, if you don't understand the relationships it can cause a Cartesian join: http://www.tutorialspoint.com/sql/sql-cartesian-joins.htm . Lastly, I tend to specify what type of join I want. I write INNER JOIN and OUTER JOIN a lot in my queries because it helps the next person (usually me) figure out what the heck I meant. If the related row is optional, use an outer join; if it is required, use an inner join (the default).
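A tiny illustration of the Cartesian-join point follows. It assumes SQLite via Python's sqlite3 and uses made-up row counts. If a table stays in a comma-separated FROM list without any condition linking it to the others, every row of one table pairs with every row of the other. Stating the relationship explicitly avoids that, and making it an outer join keeps the client without a staff member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (zzud_client INTEGER, zrud_staff INTEGER);
CREATE TABLE staff (zzud_staff INTEGER);
INSERT INTO clients VALUES (1, 10), (2, 11), (3, NULL);
INSERT INTO staff VALUES (10), (11), (12), (13);
""")

# Comma join with no linking condition: 3 clients x 4 staff rows.
cartesian = conn.execute("SELECT COUNT(*) FROM clients, staff").fetchone()[0]
# Relationship stated in ON: only matching pairs survive.
joined = conn.execute("""
SELECT COUNT(*) FROM clients c
INNER JOIN staff s ON s.zzud_staff = c.zrud_staff
""").fetchone()[0]
# Optional relationship: every client is kept, matched or not.
left = conn.execute("""
SELECT COUNT(*) FROM clients c
LEFT OUTER JOIN staff s ON s.zzud_staff = c.zrud_staff
""").fetchone()[0]
print(cartesian, joined, left)  # 12 2 3
```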
Good luck! There are much better SQL developers out there and there's probably another way to do it.