AdventureWorks SQL conflicting results issue - tsql

I'm working with the AdventureWorks example DB. We're running SQL Server 2008 R2, so I assume that's the edition of AdventureWorks (I have read-only access). I'm trying to get a list of sales managers so that I can then determine a couple of employee/manager relationships.
Two slightly different queries give me two different sets of three people. The names differ, but the job titles are the same and CurrentFlag is 1 (active) in both. I notice that in one result set the ContactID and EmployeeID are the same, but I'm not sure what that indicates.
So the question is: why am I getting completely different results from these two queries? I would think I'd get six results for each, since both queries filter on the Employee table's Title column.
SQL Query 1:
select
c.FirstName,
c.LastName,
c.ContactID,
e.EmployeeID,
e.Title,
c.Title,
e.CurrentFlag
from Person.Contact c
inner join HumanResources.Employee e
on c.ContactID = e.ContactID
where
e.Title like '%Sales Manager%'
SQL Query 2:
SELECT
e.EmployeeID,
(c.FirstName + ' ' + c.LastName) as 'First Name and Last Name',
e.Title
FROM HumanResources.Employee e
INNER JOIN Person.Contact c
ON e.EmployeeID = c.ContactID
Where
e.Title LIKE '%Manager%'
AND
e.Title LIKE '%Sales%'
ORDER BY e.EmployeeID;
UPDATE: These are my results:
SQL Query 1:
FirstName LastName ContactID EmployeeID e.Title c.Title CurrentFlag
------- ------- ---- --- ---------------------------- ---- --
Stephen Jiang 1011 268 North American Sales Manager NULL 1
Amy Alberts 1013 284 European Sales Manager NULL 1
Syed Abbas 1012 288 Pacific Sales Manager Mr. 1
SQL Query 2:
--- --- ----------- ---------------------------- --- --
268 268 Gary Drury North American Sales Manager Mr. 1
284 284 John Emory European Sales Manager Mr. 1
288 288 Julie Estes Pacific Sales Manager Ms. 1

The only difference I can see is between this:
where
e.Title like '%Sales Manager%'
And this:
Where
e.Title LIKE '%Manager%'
AND
e.Title LIKE '%Sales%'
The first query says: bring me all titles that contain 'Sales Manager'. So you could get, for example, this output:
Account Sales Manager
some Sales Manager
Sales Manager something else
The second query says: bring me all titles that contain both 'Manager' and 'Sales', anywhere and in any order. So you could get, for example:
Sales Account Manager
some Sales some Manager some
Sales Manager some else thing
Manager Sales
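To see the difference between the two filters concretely, here is a small sketch that runs both WHERE clauses against an in-memory SQLite table through Python's sqlite3 module (a stand-in for SQL Server, used only so the example is self-contained). The first three titles come from the results posted above; the other three are hypothetical titles added to show the cases the second filter also catches.

```python
import sqlite3

# The first three titles appear in the question's results; the rest are hypothetical.
titles = [
    "North American Sales Manager",
    "European Sales Manager",
    "Pacific Sales Manager",
    "Sales Account Manager",  # both words, but not adjacent
    "Manager of Sales",       # both words, in the opposite order
    "Marketing Manager",      # no 'Sales' at all
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Title TEXT)")
conn.executemany("INSERT INTO Employee (Title) VALUES (?)", [(t,) for t in titles])

# Query 1's filter: the exact substring 'Sales Manager'.
q1 = [r[0] for r in conn.execute(
    "SELECT Title FROM Employee WHERE Title LIKE '%Sales Manager%'")]

# Query 2's filter: both words, anywhere and in any order.
q2 = [r[0] for r in conn.execute(
    "SELECT Title FROM Employee "
    "WHERE Title LIKE '%Manager%' AND Title LIKE '%Sales%'")]

print(q1)
print(q2)
```

On its own, the first filter can only ever match a subset of what the second one matches; it cannot by itself explain two disjoint sets of names.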
And this join cannot be correct:
INNER JOIN Person.Contact c
ON e.EmployeeID = c.ContactID
Don't you mean:
INNER JOIN Person.Contact c
ON e.ContactID = c.ContactID

The first query matches the rows where the substring "Sales Manager" is present. The second one can also match rows like "Managers of Sales Dep", because it doesn't care about the positions of the words in the string.
I believe the results of the first query are a subset of the results of the second one.
UPDATE
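You join on different columns (e.ContactID = c.ContactID in the first query, e.EmployeeID = c.ContactID in the second), so it's expected that you get different results. The second query pairs each employee with whichever contact happens to have a ContactID equal to that EmployeeID, which is why the names differ and why the two ID columns in its output are identical. Join on e.ContactID = c.ContactID, as the first query does.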
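A minimal sketch of the join problem, again using SQLite through Python's sqlite3 as a stand-in for SQL Server. The schema is reduced to the columns that matter, and the IDs and names are taken from the two result sets posted in the question (T-SQL's `+` string concatenation becomes `||` in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact  (ContactID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, ContactID INTEGER, Title TEXT);

-- Contacts seen in the two result sets.
INSERT INTO Contact VALUES
  (1011, 'Stephen', 'Jiang'), (1013, 'Amy', 'Alberts'), (1012, 'Syed', 'Abbas'),
  (268,  'Gary',    'Drury'), (284,  'John', 'Emory'),  (288,  'Julie', 'Estes');

-- Each employee points at its contact through Employee.ContactID.
INSERT INTO Employee VALUES
  (268, 1011, 'North American Sales Manager'),
  (284, 1013, 'European Sales Manager'),
  (288, 1012, 'Pacific Sales Manager');
""")

SQL = """
SELECT e.EmployeeID, c.FirstName || ' ' || c.LastName
FROM Employee e
JOIN Contact c ON {cond}
ORDER BY e.EmployeeID
"""

# Correct: follow the foreign key from Employee to Contact.
right = conn.execute(SQL.format(cond="e.ContactID = c.ContactID")).fetchall()
# Query 2's join: matches an unrelated contact whose ID equals the EmployeeID.
wrong = conn.execute(SQL.format(cond="e.EmployeeID = c.ContactID")).fetchall()

print(right)
print(wrong)
```

With these rows, `right` reproduces the names from query 1 and `wrong` reproduces the names from query 2, even though the Title filter is identical.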

Related

join 2 tables with different dates into one date column

I have two tables: a_table and b_table. They contain closing records and checkout records, which, for each customer, can occur on different dates. I would like to combine these two tables so that there is only one date field, one customer field, one close field and one check field.
a_table
time_modified customer_name
2021-05-03 Ben
2021-05-08 Ben
2021-07-10 Jerry
b_table
time_modified account_id
2021-05-06 Ben
2021-07-08 Jerry
2021-07-12 Jerry
Expected result
date account_id_a close check
2021-05-03 Ben 1 0
2021-05-06 Ben 0 1
2021-05-08 Ben 1 0
2021-07-08 Jerry 0 1
2021-07-10 Jerry 1 1
2021-07-12 Jerry 0 1
The query so far:
with a_table as (
select rz.time_modified::date, rz.customer_name,
case when rz.time_modified::date is not null then 1 else 0 end as close
from schema.rz
),
b_table as (
select bo.time_modified::date, bo.customer_name,
case when bo.time_modified::date is not null then 1 else 0 end as check
from schema.bo
)
SELECT (CURRENT_DATE::TIMESTAMP - (i * interval '1 day'))::date as date,
a.*, b.*
FROM generate_series(1,2847) i
left join a_table a
on a.time_modified = i.date
left join b_table b
on b.time_modified = i.date
The query above returns:
SQL Error [500310] [0A000]: [Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
You just need to do a UNION rather than a JOIN.
A JOIN combines the columns of two tables side by side, whereas a UNION appends the rows of the second table to those of the first.
First off, the error you are getting is due to the use of the generate_series() function in a query where its results need to be combined with table data. generate_series() is a leader-node-only function, and its results cannot be used on the compute nodes. You will need to generate the number series you want in another way. See How to Generate Date Series in Redshift for possible ways to do this.
I'm not sure I follow your query entirely, but it seems like you want to UNION the tables, not JOIN them. You haven't defined what rz and bo are, so it is a bit confusing. However, a UNION plus some calculation for close and check seems like the way to go.
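A minimal sketch of the UNION approach, run in SQLite through Python's sqlite3 rather than Redshift, and using the sample a_table and b_table from the question. Each side of the UNION ALL tags its rows with a close or check flag; GROUP BY with MAX then merges a customer's close and check records that fall on the same date into one row with both flags set. CHECK is a reserved word in SQLite (and, as far as I know, in PostgreSQL-derived dialects such as Redshift), so it is quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_table (time_modified TEXT, customer_name TEXT);
CREATE TABLE b_table (time_modified TEXT, account_id TEXT);
INSERT INTO a_table VALUES ('2021-05-03','Ben'), ('2021-05-08','Ben'), ('2021-07-10','Jerry');
INSERT INTO b_table VALUES ('2021-05-06','Ben'), ('2021-07-08','Jerry'), ('2021-07-12','Jerry');
""")

rows = conn.execute("""
SELECT date, customer, MAX(close) AS close, MAX("check") AS "check"
FROM (
    -- closing records: close = 1, check = 0
    SELECT time_modified AS date, customer_name AS customer, 1 AS close, 0 AS "check"
    FROM a_table
    UNION ALL
    -- checkout records: close = 0, check = 1
    SELECT time_modified, account_id, 0, 1
    FROM b_table
) u
GROUP BY date, customer       -- same customer and date -> one row, both flags
ORDER BY customer, date
""").fetchall()

for r in rows:
    print(r)
```

With this sample data, Jerry's 2021-07-10 row comes out as close = 1, check = 0, since b_table has no record for Jerry on that date. No generate_series() is needed unless you also want rows for dates with no activity.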

How to simplify a join of 2 tables in HIVE and count values

I have two tables in HIVE, "orders" and "customers". I want to get the top n names of the users who placed the most orders (with status "CLOSED"). The orders table has the key order_customer_id and a column order_status; the customers table has the key customer_id, and the name is split across two columns, customer_fname and customer_lname.
ORDERS
order_customer_id, order_status
1,CLOSED
2,CLOSED
3,INPROGRESS
1,INPROGRESS
1,CLOSED
2,CLOSED
CUSTOMERS
customer_id, customer_fname, customer_lname
1,Mickey, Mouse
2,Henry, Ford
3,John, Doe
I tried this code:
select c.customer_id, count(o.order_customer_id) as COUNT, concat(c.customer_fname," ",c.customer_lname) as FULLNAME from customers c join orders o on c.customer_id=o.order_customer_id where o.order_status='CLOSED' group by c.customer_id,FULLNAME order by COUNT desc limit 10;
This does not work; it returns an error.
I was able to get the result by first creating a third table:
create table id_sum as select o.order_customer_id,count(o.order_id) as COUNT from orders o join customers c on c.customer_id=o.order_customer_id where order_status='CLOSED' group by o.order_customer_id;
1833 6
5493 5
1363 5
1687 5
569 4
1764 4
1345 4
Then I joined the tables:
select s.*,concat(c.customer_fname," " ,c.customer_lname) from id_sum s join customers c on s.order_customer_id = c.customer_id order by count desc limit 20;
This produced the desired output:
customer_id, order_count, full_name
1833 6 Ronald Smith
5493 5 Mary Cochran
1363 5 Kathy Rios
1687 5 Jerry Ellis
569 4 Mary Frye
1764 4 Megan Davila
1345 4 Adam Wilson
Is there a way to write this as a single statement, or more efficiently?
The subquery with alias sq creates a relation with two columns, order_count and order_customer_id, holding the number of CLOSED orders for each customer. This is then joined with the CUSTOMERS table. The result is sorted in descending order of order_count and limited to the top 10 rows.
SELECT c.customer_id, sq.order_count, concat(c.customer_fname," " ,c.customer_lname) as full_name
FROM CUSTOMERS c JOIN (
SELECT COUNT(*) as order_count, order_customer_id FROM ORDERS
WHERE order_status = 'CLOSED'
GROUP BY order_customer_id
) sq on c.customer_id = sq.order_customer_id
ORDER BY sq.order_count desc LIMIT 10
;
The idea is to use a subquery instead of a third table.
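Here is the same subquery approach as a runnable sketch, using SQLite through Python's sqlite3 instead of Hive and the sample ORDERS and CUSTOMERS rows from the question. Two assumptions differ from the Hive version: `||` is used for string concatenation (portable across SQLite versions, unlike concat()), and customer_id is added as a tie-breaker so rows with equal counts come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_customer_id INTEGER, order_status TEXT);
CREATE TABLE customers (customer_id INTEGER, customer_fname TEXT, customer_lname TEXT);
INSERT INTO orders VALUES
  (1,'CLOSED'), (2,'CLOSED'), (3,'INPROGRESS'), (1,'INPROGRESS'), (1,'CLOSED'), (2,'CLOSED');
INSERT INTO customers VALUES (1,'Mickey','Mouse'), (2,'Henry','Ford'), (3,'John','Doe');
""")

top = conn.execute("""
SELECT c.customer_id, sq.order_count,
       c.customer_fname || ' ' || c.customer_lname AS full_name
FROM customers c
JOIN (
    -- derived table: CLOSED orders per customer, replacing the id_sum table
    SELECT COUNT(*) AS order_count, order_customer_id
    FROM orders
    WHERE order_status = 'CLOSED'
    GROUP BY order_customer_id
) sq ON c.customer_id = sq.order_customer_id
ORDER BY sq.order_count DESC, c.customer_id
LIMIT 10
""").fetchall()

print(top)
```

John Doe has no CLOSED orders, so he never appears in the derived table and the inner join drops him.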

comparing each record of a table for some columns

I have a table TEST with records like these:
ID USERNAME IPADDRS CONNTIME country
8238237 XYZ 10.16.199.20 11:00:00 USA
8255237 XYZ 10.16.199.20 11:00:00 UK
485337 ABC 10.16.199.22 12:25:00 UK
8238237 ABC 10.16.199.23 02:45:00 INDIA
I need to compare the records and get the ID of each record whose country is UK and that has the same USERNAME, IPADDRS and CONNTIME as another record.
That is, USERNAME, IPADDRS and CONNTIME must be equal, and the final filter is on country UK.
So for the table above, the output would be ID = 8255237.
I appreciate your help. Thanks!
Well, SQL is declarative, so you should describe what you want. How about this?
select a.id from ip a where a.country='UK' and (a.username,a.ipaddrs,a.conntime) in (select username,ipaddrs,conntime from ip where country<>'UK')
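Basically, you select the ID of each UK record that matches the required triplet, where the matching record must not be from the UK. This is basic SQL and should run on most systems, although some (SQL Server, for example) do not support a multi-column IN like this; an EXISTS subquery works there instead. Disclaimer: you might need indexes for performance.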
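Try this:
SELECT a.ID FROM (SELECT ID, USERNAME, IPADDRS, CONNTIME, COUNTRY,
  -- non-UK rows are numbered first, so a UK row gets seq > 1 only if another row shares its triplet
  ROW_NUMBER() OVER (PARTITION BY USERNAME, IPADDRS, CONNTIME ORDER BY CASE WHEN COUNTRY = 'UK' THEN 1 ELSE 0 END) AS seq
FROM EMP_IP) a WHERE a.COUNTRY = 'UK' AND a.seq > 1;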
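For comparison, the first answer's idea can also be written with a correlated EXISTS instead of the multi-column IN, which is the form that works on systems without row-value IN. This is a sketch run in SQLite through Python's sqlite3, using the question's sample rows; the lowercase table name test is just the question's TEST table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, username TEXT, ipaddrs TEXT, conntime TEXT, country TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", [
    (8238237, "XYZ", "10.16.199.20", "11:00:00", "USA"),
    (8255237, "XYZ", "10.16.199.20", "11:00:00", "UK"),
    (485337,  "ABC", "10.16.199.22", "12:25:00", "UK"),
    (8238237, "ABC", "10.16.199.23", "02:45:00", "INDIA"),
])

ids = [r[0] for r in conn.execute("""
SELECT a.id
FROM test a
WHERE a.country = 'UK'
  AND EXISTS (              -- another, non-UK row with the same triplet
      SELECT 1
      FROM test b
      WHERE b.username = a.username
        AND b.ipaddrs  = a.ipaddrs
        AND b.conntime = a.conntime
        AND b.country <> 'UK')
""")]

print(ids)
```

Row 485337 is also UK, but no other row shares its IPADDRS, so only 8255237 is returned.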

counting in sql in subquery in the table

DNO DNAME
----- -----------
1 Research
2 Finance
EN ENAME CITY SALARY DNO JOIN_DATE
-- ---------- ---------- ---------- ---------- ---------
E1 Ashim Kolkata 10000 1 01-JUN-02
E2 Kamal Mumbai 18000 2 02-JAN-02
E3 Tamal Chennai 7000 1 07-FEB-04
E4 Asha Kolkata 8000 2 01-MAR-07
E5 Timir Delhi 7000 1 11-JUN-05
Task: find all departments that have more than 3 employees.
My attempt:
select deptt.dname
from deptt,empl
where deptt.dno=empl.dno and (select count(empl.dno) from empl group by empl.dno)>3;
Here is the solution:
select deptt.dname
from deptt,empl
where deptt.dno=empl.dno
group by deptt.dname having count(1)>3;
select
*
from departments d
inner join (
select dno from employees group by dno having count(*) > 3
) e on d.dno = e.dno
There are many approaches to this problem, but almost all of them use GROUP BY and the HAVING clause. HAVING lets you filter on the results of aggregate functions; here it keeps only the groups whose count is greater than 3.
In the query structure above, the GROUP BY is applied to the employees table only, and the result (known as a derived table) is then joined to the departments table with an INNER JOIN. Because an inner join keeps only matching rows, this filters the departments table down to the departments with a count() greater than 3.
An advantage of this query structure is that fewer rows are joined, and all columns of the departments table are available for reporting. A disadvantage is that the count() of employees per department isn't visible.
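A sketch of the derived-table version against the question's sample data, run in SQLite through Python's sqlite3 (the JOIN_DATE column is omitted because the query doesn't use it). As one possible variation, COUNT(*) is also selected inside the derived table so the count becomes visible, and the threshold is passed as a parameter through a hypothetical helper, departments_with_more_than.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deptt (dno INTEGER, dname TEXT);
CREATE TABLE empl (en TEXT, ename TEXT, city TEXT, salary INTEGER, dno INTEGER);
INSERT INTO deptt VALUES (1,'Research'), (2,'Finance');
INSERT INTO empl VALUES
  ('E1','Ashim','Kolkata',10000,1), ('E2','Kamal','Mumbai',18000,2),
  ('E3','Tamal','Chennai',7000,1),  ('E4','Asha','Kolkata',8000,2),
  ('E5','Timir','Delhi',7000,1);
""")

def departments_with_more_than(n):
    """Departments whose employee count is strictly greater than n."""
    sql = """
    SELECT d.dname, e.emp_count
    FROM deptt d
    JOIN (SELECT dno, COUNT(*) AS emp_count   -- aggregate on employees only
          FROM empl
          GROUP BY dno
          HAVING COUNT(*) > ?) e              -- filter on the aggregate
      ON d.dno = e.dno
    ORDER BY d.dname
    """
    return conn.execute(sql, (n,)).fetchall()

print(departments_with_more_than(3))  # [] : Research has exactly 3, Finance 2
print(departments_with_more_than(2))  # [('Research', 3)]
```

Note that with the sample rows above, no department actually has more than 3 employees, so the task as stated returns an empty result.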

Select Single Record if ID/Name is Duplicated

This seems like a really simple problem but I can't seem to figure it out right now...
Here is a simplified view of the data that I am fetching from my current stored proc:
ID Name Class Desc
--- ----- ------ -----
84 Calvin J. 2B
53 Fred D. 3B
53 Fred D. ADJ Change/Correction
47 Mary F. 3A
47 Mary F. ADJ New Product
09 Donald M. ADJ Cancelled
21 Richard G. ADJ Bad Debt
21 Richard G. ADJ Cancelled
I need to modify my procedure to select only one record per individual. If a person has an adjustment, I only want to select the record with the adjustment and disregard the other record. Based on the above, this is the result set that I am trying to return:
ID Name Class Desc
--- ----- ------ -----
84 Calvin J. 2B
53 Fred D. ADJ Change/Correction
47 Mary F. ADJ New Product
09 Donald M. ADJ Cancelled
21 Richard G. ADJ Cancelled
Help please!
UPDATE
I just realized that there is an additional requirement for this query; if there are two adjustments where one has a description of "Bad Debt" and the other "Cancelled", the record with the "Cancelled" description needs to be selected (see updated data above).
This should do the trick:
SELECT ID, Name, Class, [Desc]
FROM (
SELECT ID, Name, Class, [Desc],
ROW_NUMBER() OVER(PARTITION BY ID
ORDER BY CASE WHEN Class = 'ADJ'
THEN 0 ELSE 1 END) rn
FROM Table1
) A
WHERE rn = 1
It looks scarier than it really is. The inner query contains an extra column computed with ROW_NUMBER(), which numbers your rows, starting over at 1 for each distinct ID (specified in the PARTITION BY). The ORDER BY, which tells ROW_NUMBER() how to order the rows within each ID, is a CASE expression that puts rows with Class = 'ADJ' before all other rows. Finally, the outer query keeps only the rows numbered 1. The result is the ADJ row if there is one for that ID, or the regular row otherwise.
Edit in response to updated requirements
If you have additional prioritization criteria then you can add those into the ORDER BY, just like you would ORDER BY in a regular query. Often it's helpful to execute just the inner query without filtering down to rn = 1 so you can see exactly how row numbers are being assigned.
Here's the updated query that should satisfy your new requirement:
SELECT ID, Name, Class, [Desc]
FROM (
SELECT ID, Name, Class, [Desc],
ROW_NUMBER() OVER(PARTITION BY ID
ORDER BY
CASE WHEN Class = 'ADJ'
THEN 0 ELSE 1 END,
CASE WHEN [Desc] = 'Cancelled'
THEN 0 ELSE 1 END) rn
FROM Table1
) A
WHERE rn = 1
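As a runnable check of the updated query, here is a sketch that loads the question's sample rows into SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). IDs are stored as text so that 09 keeps its leading zero, Calvin's empty Desc is assumed to be NULL, and an outer ORDER BY is added only to make the output order predictable. Dropping the second CASE gives the first version of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID TEXT, Name TEXT, Class TEXT, [Desc] TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    ("84", "Calvin J.", "2B", None),
    ("53", "Fred D.", "3B", None),
    ("53", "Fred D.", "ADJ", "Change/Correction"),
    ("47", "Mary F.", "3A", None),
    ("47", "Mary F.", "ADJ", "New Product"),
    ("09", "Donald M.", "ADJ", "Cancelled"),
    ("21", "Richard G.", "ADJ", "Bad Debt"),
    ("21", "Richard G.", "ADJ", "Cancelled"),
])

rows = conn.execute("""
SELECT ID, Name, Class, [Desc]
FROM (
    SELECT ID, Name, Class, [Desc],
           ROW_NUMBER() OVER (
               PARTITION BY ID
               ORDER BY CASE WHEN Class = 'ADJ' THEN 0 ELSE 1 END,          -- ADJ rows first
                        CASE WHEN [Desc] = 'Cancelled' THEN 0 ELSE 1 END     -- then 'Cancelled'
           ) AS rn
    FROM Table1
) A
WHERE rn = 1
ORDER BY ID
""").fetchall()

for r in rows:
    print(r)
```

Running only the inner SELECT (without WHERE rn = 1) shows how the row numbers are assigned, as suggested above.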