AdventureWorks2014 - Create an extract of latest hires based upon Job title - tsql

I was wondering if anyone could help.
I am trying to write some code that returns a list of the latest hires for each job title, using the AdventureWorks2012 database.
So far, I have the following:
SELECT DISTINCT HREDH.BusinessEntityID,
HRE.JobTitle,
hre.HireDate
FROM [HumanResources].[EmployeeDepartmentHistory] HREDH
INNER JOIN HumanResources.Employee HRE ON HREDH.BusinessEntityID = HRE.BusinessEntityID
AND hre.BusinessEntityID = (
SELECT TOP 1 BusinessEntityID
FROM HumanResources.Employee hre2
WHERE hre2.JobTitle = hre.JobTitle
ORDER BY HireDate DESC
)
ORDER BY HRE.JobTitle
This appears to work fine, but I am sure there is a better way to do it (without using SELECT DISTINCT at the beginning of the statement).
I am trying my best to learn SQL by myself, so any help from the vast pool of knowledge on here would be greatly appreciated!
Thanks,

This will return all rows from the join of the two tables where the hire date for the job title is equal to the highest date for that job title.
SELECT
HREDH.BusinessEntityID
,HRE.JobTitle
,hre.HireDate
FROM [HumanResources].[EmployeeDepartmentHistory] AS HREDH
JOIN HumanResources.Employee AS HRE
ON HREDH.BusinessEntityID = HRE.BusinessEntityID
WHERE HRE.HireDate = (SELECT MAX(HireDate) FROM HumanResources.Employee AS _HRE WHERE HRE.JobTitle = _HRE.JobTitle)
ORDER BY HRE.JobTitle
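To illustrate where the duplicates come from, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server. The table layout and sample rows are made up for the demo (only the column names follow the question). It assumes, as the queries above suggest, that every selected column lives in Employee, so the join to EmployeeDepartmentHistory can be dropped, which removes the need for DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (BusinessEntityID INTEGER PRIMARY KEY, JobTitle TEXT, HireDate TEXT);
CREATE TABLE EmployeeDepartmentHistory (BusinessEntityID INTEGER, DepartmentID INTEGER);
INSERT INTO Employee VALUES
  (1, 'Buyer',    '2010-01-05'),
  (2, 'Buyer',    '2012-03-01'),
  (3, 'Designer', '2011-07-15');
-- Employee 2 changed department once, so has two history rows (hypothetical data).
INSERT INTO EmployeeDepartmentHistory VALUES (1, 10), (2, 10), (2, 20), (3, 30);
""")

# Same shape as the answer above: the history join repeats employee 2.
joined = conn.execute("""
SELECT h.BusinessEntityID, e.JobTitle, e.HireDate
FROM EmployeeDepartmentHistory AS h
JOIN Employee AS e ON h.BusinessEntityID = e.BusinessEntityID
WHERE e.HireDate = (SELECT MAX(e2.HireDate) FROM Employee AS e2
                    WHERE e2.JobTitle = e.JobTitle)
ORDER BY e.JobTitle
""").fetchall()

# Querying Employee alone gives one row per latest hire, with no DISTINCT.
latest_hires = conn.execute("""
SELECT e.BusinessEntityID, e.JobTitle, e.HireDate
FROM Employee AS e
WHERE e.HireDate = (SELECT MAX(e2.HireDate) FROM Employee AS e2
                    WHERE e2.JobTitle = e.JobTitle)
ORDER BY e.JobTitle
""").fetchall()

print(len(joined), latest_hires)
```

If two people with the same job title share the latest hire date, both are returned; whether that is wanted depends on the requirement.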


Postgres SQL query group by get most recent record instead of an aggregate

This is a current postgres query I have:
sql = """
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
avg(cms.incremental_opens),
avg(cms.incremental_clicks),
avg(cms.incremental_conversions)
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id
"""
With this I get the average incremental_opens, incremental_clicks, and incremental_conversions per campaign group from the cms table. However, instead of the average, I want the most recent values for the 3 fields. See the cms table screenshot below: I want the values from the record with the greatest (i.e. most recent) event_id for a given group, instead of an average over all records.
How can I do this? Thanks
It sounds like you want a lateral join.
FROM
experiments.variant_metric_snapshot vms
CROSS JOIN LATERAL (
SELECT *
FROM experiments.campaign_metric_snapshot cms
WHERE vms.campaign_id = cms.campaign_id
ORDER BY event_id DESC
LIMIT 1
) cms
WHERE...
If you are after a quick and dirty solution, you can use the array_agg function with minimal change to your query.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
(array_agg(cms.incremental_opens ORDER BY cms.event_id DESC))[1] AS incremental_opens,
..
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;

How can I query for the previous claim for each claim. Window Function PostgreSQL

I have a table of claims and I want to attach each patient's previous claim. I've been able to do it with a select statement, but my dataset is 50+ million records and I'm hoping that there is a more efficient way to do this. From my understanding, this query will need to scan the full table for each record. Would a window function be better? Could sorting the large table help at all?
http://www.sqlfiddle.com/#!17/09a53/6/0
select
(select b."fill_date" from t1 b
where b.user_id = a.user_id and b.fill_date < a.fill_date
order by b.fill_date desc
limit 1) as prior_fill_date,
a.* from t2 a
Thanks for the help
Please give this a try:
select *,
lag(fill_date)
over (partition by user_id order by fill_date)
as prior_fill_date
from "sql_notebook_results_T42E95sESnn0"
order by user_id, fill_date;
This sorts only once. If performance is still not good enough, then you will need to look at adding an index on (user_id, fill_date).
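As a rough check that the window function gives the same answer as the correlated subquery, here is a small sketch run against SQLite (3.25 or later, which supports window functions) through Python's sqlite3 module rather than PostgreSQL. The claims table, the index name and the sample rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, user_id INTEGER, fill_date TEXT);
INSERT INTO claims (user_id, fill_date) VALUES
  (1, '2020-01-10'), (1, '2020-02-10'), (1, '2020-03-10'),
  (2, '2020-01-20'), (2, '2020-04-01');
-- The index suggested above (hypothetical name).
CREATE INDEX claims_user_fill_idx ON claims (user_id, fill_date);
""")

# Window-function version: one pass over each user's claims in date order.
with_lag = conn.execute("""
SELECT user_id, fill_date,
       LAG(fill_date) OVER (PARTITION BY user_id ORDER BY fill_date) AS prior_fill_date
FROM claims
ORDER BY user_id, fill_date
""").fetchall()

# Correlated-subquery version from the question, for comparison.
with_subquery = conn.execute("""
SELECT a.user_id, a.fill_date,
       (SELECT b.fill_date FROM claims b
        WHERE b.user_id = a.user_id AND b.fill_date < a.fill_date
        ORDER BY b.fill_date DESC
        LIMIT 1) AS prior_fill_date
FROM claims a
ORDER BY a.user_id, a.fill_date
""").fetchall()

print(with_lag == with_subquery)
```

One difference to keep in mind: if a user has two claims on the same fill_date, lag() returns that same date for the second one, while the subquery with a strict < skips to an earlier date.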

Grouping by attributes and counting, postgreSQL

I have written the following code that counts how many instances of each book_id there are in the table soldBooks.
SELECT book_id, sum(counter) AS no_of_books_sold, sum(retail_price) AS generated_revenue
FROM(
SELECT book_id,1 AS counter, retail_price
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN stock ON (shipments.isbn = stock.isbn)
) AS soldBooks
GROUP BY book_id
As you can see, I used a "counter" in order to solve my problem. But I am sure there must be a better, more built in way of achieving the same result! There must be some way to group a table together by a given attribute, and to create a new column displaying the count of EACH attribute. Can somebody share this with me?
Thanks!
SELECT book_id,
COUNT(book_id) AS no_books_sold,
SUM(retail_price) AS gen_rev
FROM shipments
JOIN editions ON (shipments.isbn=editions.isbn)
JOIN stock ON (shipments.isbn=stock.isbn)
GROUP BY book_id
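Here is a small sketch comparing the hand-made counter with the built-in COUNT aggregate, run on SQLite through Python's sqlite3 module. The three tables are cut down to the columns the queries touch, and the rows are made up, so treat the schema as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE editions  (isbn TEXT PRIMARY KEY, book_id INTEGER);
CREATE TABLE stock     (isbn TEXT PRIMARY KEY, retail_price INTEGER);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, isbn TEXT);
INSERT INTO editions VALUES ('A', 1), ('B', 1), ('C', 2);
INSERT INTO stock    VALUES ('A', 10), ('B', 20), ('C', 5);
INSERT INTO shipments (isbn) VALUES ('A'), ('A'), ('B'), ('C');
""")

# The question's approach: add a constant 1 per row and sum it.
with_counter = conn.execute("""
SELECT book_id, SUM(counter), SUM(retail_price)
FROM (SELECT book_id, 1 AS counter, retail_price
      FROM shipments
      LEFT JOIN editions ON shipments.isbn = editions.isbn
      LEFT JOIN stock ON shipments.isbn = stock.isbn) AS soldBooks
GROUP BY book_id
ORDER BY book_id
""").fetchall()

# The answer's approach: let COUNT do the counting.
with_count = conn.execute("""
SELECT book_id, COUNT(book_id), SUM(retail_price)
FROM shipments
JOIN editions ON shipments.isbn = editions.isbn
JOIN stock ON shipments.isbn = stock.isbn
GROUP BY book_id
ORDER BY book_id
""").fetchall()

print(with_counter, with_count)
```

The two only agree when every shipment has a matching edition and stock row, as in this sample. With the inner joins, unmatched shipments are dropped; with the original left joins they would land in a NULL book_id group.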

select distinct from 2 columns but only 1 is duplicate

select a.subscriber_msisdn, war.created_datetime from
(
select distinct subscriber_msisdn from wiz_application_response
where application_item_id in
(select id from wiz_application_item where application_id=155)
and created_datetime between '2012-10-07 00:00' and '2012-11-15 00:00:54'
) a
left outer join wiz_application_response war on (war.subscriber_msisdn=a.subscriber_msisdn)
The subselect returns 11 rows, but when joined it returns 18 (with duplicates). The objective of this query is only to add the date column to the 11 rows of the subselect.
Based on your description, it stands to reason that there are multiple created_datetime values for some of the subscriber_msisdn values which is what prompted you to use the distinct in the subquery to begin with. By joining the sub query to the original table you are defeating this. A cleaner way to write the query would be:
SELECT
war.subscriber_msisdn
, war.created_datetime
FROM
wiz_application_response war
INNER JOIN wiz_application_item wai
ON war.application_item_id = wai.id
AND wai.application_id = 155
WHERE
war.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
This should return only the rows from the war table that satisfy the criteria based on the wai table. It should not be an outer join unless you wanted to return all the rows from the war table that satisfied the created_datetime parameter regardless of the application_item_id parameter.
This is my best guess based on the limited information I have about your tables and what I’m assuming you’re trying to accomplish. If this doesn’t get you what you are after, I will continue to offer other ideas based on additional information you could provide. Hope this works.
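To make the duplication concrete, here is a toy reproduction on SQLite via Python's sqlite3 module, with a cut-down wiz_application_response table and invented rows. The last query assumes that "the date" wanted per subscriber is the latest created_datetime; the question does not say which one to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wiz_application_response (
    subscriber_msisdn TEXT,
    created_datetime  TEXT
);
-- Subscriber 111 responded twice (hypothetical data).
INSERT INTO wiz_application_response VALUES
  ('111', '2012-10-08 09:00'),
  ('111', '2012-10-09 09:00'),
  ('222', '2012-10-10 09:00');
""")

distinct_subscribers = conn.execute(
    "SELECT DISTINCT subscriber_msisdn FROM wiz_application_response"
).fetchall()

# Joining the distinct list back to the table brings every response row back.
joined_back = conn.execute("""
SELECT a.subscriber_msisdn, war.created_datetime
FROM (SELECT DISTINCT subscriber_msisdn FROM wiz_application_response) AS a
LEFT OUTER JOIN wiz_application_response AS war
  ON war.subscriber_msisdn = a.subscriber_msisdn
""").fetchall()

# One row per subscriber, keeping the latest date (an assumption about the goal).
one_per_subscriber = conn.execute("""
SELECT subscriber_msisdn, MAX(created_datetime) AS created_datetime
FROM wiz_application_response
GROUP BY subscriber_msisdn
ORDER BY subscriber_msisdn
""").fetchall()

print(len(distinct_subscribers), len(joined_back), one_per_subscriber)
```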
This can most probably be simplified to:
SELECT DISTINCT ON (1)
r.subscriber_msisdn, r.created_datetime
FROM wiz_application_item i
JOIN wiz_application_response r ON r.application_item_id = i.id
WHERE i.application_id = 155
AND r.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
ORDER BY 1, 2 DESC -- to pick the latest created_datetime
Details depend on missing information.

a dual variable not in statement?

I have the need to look at two tables that share two variables and get a list of the data from one table that does not have matching data in the other table. Example:
Table A
xName
Date
Place
xAmount
Table B
yName
Date
Place
yAmount
I need to be able to write a query that will check Table A and find entries that have no corresponding entry in Table B. If it were a one-variable issue I could use a NOT IN statement, but I can't think of a way to do that with two variables. A left join does not appear to work either, since filtering by a specific date or place name is not practical when we are talking about thousands of dates and hundreds of place names.
Thanks in advance to anyone who can help out.
SELECT TableA.Date,
TableA.Place,
TableA.xName,
TableA.xAmount,
TableB.yName,
TableB.yAmount
FROM TableA
LEFT OUTER JOIN TableB
ON TableA.Date = TableB.Date
AND TableA.Place = TableB.Place
WHERE TableB.yName IS NULL
OR TableB.yAmount IS NULL
SELECT * FROM A WHERE NOT EXISTS
(SELECT 1 FROM B
WHERE A.xName = B.yName AND A.Date = B.Date AND A.Place = B.Place AND A.xAmount = B.yAmount)
In Oracle:
select xName , xAmount from tableA
MINUS
select yName , yAmount from tableB
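For comparison, here is a small sketch of the two-column anti-join, run on SQLite through Python's sqlite3 module. It assumes that "corresponding entry" means the same Date and Place only (the two shared columns named in the question), and the sample rows are made up. Both the NOT EXISTS form and the LEFT JOIN ... IS NULL form are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (xName TEXT, Date TEXT, Place TEXT, xAmount INTEGER);
CREATE TABLE TableB (yName TEXT, Date TEXT, Place TEXT, yAmount INTEGER);
INSERT INTO TableA VALUES
  ('ann', '2020-01-01', 'Oslo', 10),
  ('bob', '2020-01-01', 'Rome', 20),
  ('cy',  '2020-01-02', 'Oslo', 30);
INSERT INTO TableB VALUES
  ('x', '2020-01-01', 'Oslo', 5),
  ('y', '2020-01-02', 'Rome', 7);
""")

# NOT EXISTS: keep A rows for which no B row shares both Date and Place.
not_exists = conn.execute("""
SELECT a.xName, a.Date, a.Place, a.xAmount
FROM TableA AS a
WHERE NOT EXISTS (SELECT 1 FROM TableB AS b
                  WHERE b.Date = a.Date AND b.Place = a.Place)
ORDER BY a.xName
""").fetchall()

# LEFT JOIN on both columns, then keep rows where nothing matched.
left_join = conn.execute("""
SELECT a.xName, a.Date, a.Place, a.xAmount
FROM TableA AS a
LEFT OUTER JOIN TableB AS b
  ON b.Date = a.Date AND b.Place = a.Place
WHERE b.Date IS NULL
ORDER BY a.xName
""").fetchall()

print(not_exists)
```

Testing a join column (b.Date) for NULL avoids misreading a matched B row whose yName or yAmount happens to be NULL as "no match".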