T-SQL grouping question - tsql

Every once in a while I have a scenario like this, and can never come up with the most efficient query to pull in the information:
Let's say we have a table with three columns (A int, B int, C int). My query needs to answer a question like this: "Tell me what the value of column C is for the largest value of column B where A = 5." A real world scenario for something like this would be 'A' is your users, 'B' is the date something happened, and 'C' is the value, where you want the most recent entry for a specific user.
I always end up with a query like this:
SELECT
C
FROM
MyTable
WHERE
A = 5
AND B = (SELECT MAX(B) FROM MyTable WHERE A = 5)
What am I missing to do this in a single query (as opposed to nesting them)? Some sort of 'Having' clause?

BoSchatzberg's answer works when you only care about the 1 result where A=5. But I suspect this question is the result of a more general case. What if you want to list the top record for each distinct value of A?
SELECT t1.*
FROM MyTable t1
INNER JOIN
(
SELECT A, MAX(B) AS B -- alias needed so the join below can reference t2.B
FROM MyTable
GROUP BY A
) t2 ON t1.A = t2.A AND t1.B = t2.B
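
As a rough illustration of the per-group join above, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for SQL Server purely so the example is self-contained, and the sample rows (borrowed from the #tempints test data in another answer) are an assumption. The derived column is aliased as B so the join condition can refer to it, and ties on the maximum B return more than one row for that A.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 8, 10), (2, 4, 10), (5, 8, 10), (5, 3, 10), (5, 7, 10), (5, 8, 15)],
)

# Join each row to the maximum B of its own A group.
top_rows = conn.execute(
    """
    SELECT t1.A, t1.B, t1.C
    FROM MyTable t1
    INNER JOIN (
        SELECT A, MAX(B) AS B
        FROM MyTable
        GROUP BY A
    ) t2 ON t1.A = t2.A AND t1.B = t2.B
    ORDER BY t1.A, t1.C
    """
).fetchall()

# A = 5 has two rows tied at B = 8, so both are returned.
print(top_rows)
```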

--
SELECT C
FROM MyTable
INNER JOIN (SELECT A, MAX(B) AS MAX_B FROM MyTable GROUP BY A) AS X
ON MyTable.A = X.A
AND MyTable.B = MAX_B
--
WHERE MyTable.A = 5
In this case the first section (between the comments) can also easily be moved into a view for modularity or reuse.

You can do this:
SELECT TOP 1 C
FROM MyTable
WHERE A = 5
ORDER BY b DESC
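
For comparison, a small sketch of the same "top row" idea run from Python against SQLite, which has no TOP and uses LIMIT 1 instead. The sample rows and the extra tie-breaker on C are assumptions added only so the result is deterministic; without a tie-breaker, which of the tied rows comes back is not guaranteed.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(5, 8, 10), (5, 3, 10), (5, 7, 10), (5, 8, 15)],
)

# LIMIT 1 plays the role of TOP 1; "C DESC" is a hypothetical tie-breaker.
row = conn.execute(
    """
    SELECT C
    FROM MyTable
    WHERE A = 5
    ORDER BY B DESC, C DESC
    LIMIT 1
    """
).fetchone()

# Exactly one row comes back even though two rows share the maximum B.
print(row)
```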

I think you are close (and what you have would work). You could use something like the following:
select C
, max(B)
from MyTable
where A = 5
group by C

After a little bit of testing, I don't think that this can be done without doing it the way you're already doing it (i.e. a subquery). Since you need the max of B and you can't get the value of C without also including that in a GROUP BY or HAVING clause, a subquery seems to be the best way.
create table #tempints (
a int,
b int,
c int
)
insert into #tempints values (1, 8, 10)
insert into #tempints values (1, 8, 10)
insert into #tempints values (2, 4, 10)
insert into #tempints values (5, 8, 10)
insert into #tempints values (5, 3, 10)
insert into #tempints values (5, 7, 10)
insert into #tempints values (5, 8, 15)
/* this errors out with "Column '#tempints.c' is invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause." */
select t1.c, max(t1.b)
from #tempints t1
where t1.a=5
/* this errors with "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING
clause or a select list, and the column being aggregated is an outer reference." */
select t1.c, max(t1.b)
from #tempints t1, #tempints t2
where t1.a=5 and t2.b=max(t1.b)
/* errors with "Column '#tempints.a' is invalid in the HAVING clause because it is not contained in either an aggregate
function or the GROUP BY clause." */
select c
from #tempints
group by b, c
having a=5 and b=max(b)
drop table #tempints

Related

SQL left join case statement

Need some help working out the SQL. Unfortunately the version of T-SQL is Sybase ASE, which I'm not too familiar with. In MS SQL I would use a windowed function like RANK() or ROW_NUMBER() in a subquery and join to those results ...
Here's what I'm trying to resolve
TABLE A
Id
1
2
3
TABLE B
Id,Type
1,A
1,B
1,C
2,A
2,B
3,A
3,C
4,B
4,C
I would like to return 1 row for each ID. If the ID has a type 'A' record, that is the one that should display. If it only has other types, it doesn't matter which one is returned, but it cannot be null (some arbitrary ordering, such as alphabetical, can be used to prioritize the "other" return value types).
Results:
1, A
2, A
3, A
4, B
A regular left join (ON A.id = B.id and B.type = 'A') ALMOST returns what I am looking for however it returns null for the type when I want the 'next available' type.
You can use an INNER JOIN on a subquery (FirstTypeResult) that returns the minimum type per Id. This works here because 'A' sorts before every other type.
Eg:
SELECT TABLEA.[Id], FirstTypeResult.[Type]
FROM TABLEA
JOIN (
SELECT [Id], Min([Type]) As [Type]
FROM TABLEB
GROUP BY [Id]
) FirstTypeResult ON FirstTypeResult.[Id] = TABLEA.[Id]
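
A minimal sketch of that join, run from Python against an in-memory SQLite database as a stand-in for Sybase ASE (an assumption made only so the example is runnable), using the rows listed in the question. Because TABLEA only contains ids 1 to 3, id 4 from TABLEB does not appear in the result of this particular query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Sybase ASE (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLEA (Id INTEGER)")
conn.execute("CREATE TABLE TABLEB (Id INTEGER, Type TEXT)")
conn.executemany("INSERT INTO TABLEA VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO TABLEB VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"),
     (3, "A"), (3, "C"), (4, "B"), (4, "C")],
)

# MIN(Type) picks 'A' when present, otherwise the alphabetically first type.
first_types = conn.execute(
    """
    SELECT TABLEA.Id, FirstTypeResult.Type
    FROM TABLEA
    JOIN (
        SELECT Id, MIN(Type) AS Type
        FROM TABLEB
        GROUP BY Id
    ) FirstTypeResult ON FirstTypeResult.Id = TABLEA.Id
    ORDER BY TABLEA.Id
    """
).fetchall()
print(first_types)
```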

After doing CTE Select Order By and then Update, Update results are not ordered the same (TSQL)

The code is roughly like this:
WITH cte AS
(
SELECT TOP 4 id, due_date, check
FROM table_a a
INNER JOIN table_b b ON a.linkid = b.linkid
WHERE
b.status = 1
AND due_date > GetDate()
ORDER BY due_date, id
)
UPDATE cte
SET check = 1
OUTPUT
INSERTED.id,
INSERTED.due_date
Note: the actual data has same due_date.
When I ran the SELECT statement only inside the cte, I could get the result, for ex: 1, 2, 3, 4.
But after the UPDATE statement, the updated results are: 4, 1, 2, 3
Why is this (order-change) happening?
How to keep or re-order the results back to 1,2,3,4 in this same 1 query?
In MSDN https://msdn.microsoft.com/pl-pl/library/ms177564(v=sql.110).aspx you can read that
There is no guarantee that the order in which the changes are applied
to the table and the order in which the rows are inserted into the
output table or table variable will correspond.
That means you can't solve your problem with a single query, but you can still do what you need in one batch. Because OUTPUT doesn't guarantee the order, you have to save its rows into another table and order them after the update. This code returns your output values in the order you expect:
-- table variable that captures the OUTPUT rows
declare @outputTable table (id int, due_date date);
with cte as (
select top 4 id, due_date, [check] -- CHECK is a reserved word, so it is bracketed
from table_a a
inner join table_b b on a.linkid = b.linkid
where b.status = 1
and due_date > GetDate()
order by due_date, id
)
update cte
set [check] = 1
output inserted.id, inserted.due_date
into @outputTable;
-- the explicit ORDER BY is what guarantees the order of the final result
select *
from @outputTable
order by due_date, id;

Check SQL Server table values against themselves

Imagine I had this table:
create table #tmpResults ( intItemId int, strTitle nvarchar(100), intWeight float )
insert into #tmpResults values (1, 'Item One', 7)
insert into #tmpResults values (2, 'Item One v1', 6)
insert into #tmpResults values (3, 'Item Two', 6)
insert into #tmpResults values (4, 'Item Two v1', 7)
And a function, which we'll call fn_Lev that takes two strings, compares them to one another and returns the number of differences between them as an integer (i.e. the Levenshtein distance).
What's the most efficient way to query that table, check the fn_Lev value of each strTitle against all the other strTitles in the table, and delete rows that are similar to one another by a Levenshtein distance of 3, preferring to keep the rows with higher intWeights?
So after the delete, #tmpResults should contain
1 Item One 7
4 Item Two v1 7
I can think of ways to do this, but nothing that isn't horribly slow (i.e iterative). I'm sure there's a faster way?
Cheers,
Matt
SELECT strvalue= CASE
WHEN t1.intweight >= t2.intweight THEN t1.strtitle
ELSE t2.strtitle
END,
dist = Fn_lev(t1.strtitle, t2.strtitle)
FROM #tmpResults AS t1
INNER JOIN #tmpResults AS t2
ON t1.intitemid < t2.intitemid
WHERE Fn_lev(t1.strtitle, t2.strtitle) = 3
This performs a self join that matches each pair of rows only once. It excludes matching a row with itself or with the reverse of a previous match, i.e. if A<->B is a match then B<->A isn't returned.
The case statement selects the highest weighted result.
If I've understood you correctly, you can use a cross join
SELECT t1.intItemId AS Id1, t2.intItemId AS Id2, fn_Lev(t1.strTitle, t2.strTitle) AS Lev
FROM #tmpResults AS t1
CROSS JOIN #tmpResults AS t2
The cross join will give you the results of every combination of rows between the left and right side of the join (hence it doesn't need any ON clause, as it is matching everything to everything else). You can then use the result of the SELECT to choose which to delete.
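
To make the pairwise idea concrete, here is a hedged sketch in Python using an in-memory SQLite database. SQLite stands in for SQL Server, the `levenshtein` function is a hypothetical stand-in for fn_Lev, and "a distance of 3" is read as "3 or less"; all three are assumptions. Each pair is compared once via `t1.intItemId < t2.intItemId`, and the lower-weighted row of every close pair is deleted.

```python
import sqlite3


def levenshtein(a, b):
    """Dynamic-programming edit distance (hypothetical stand-in for fn_Lev)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("fn_Lev", 2, levenshtein)
conn.execute(
    "CREATE TABLE tmpResults (intItemId INTEGER, strTitle TEXT, intWeight REAL)"
)
conn.executemany(
    "INSERT INTO tmpResults VALUES (?, ?, ?)",
    [(1, "Item One", 7), (2, "Item One v1", 6),
     (3, "Item Two", 6), (4, "Item Two v1", 7)],
)

# For every close pair, pick the id of the row with the lower weight.
losers = conn.execute(
    """
    SELECT DISTINCT CASE WHEN t1.intWeight >= t2.intWeight
                         THEN t2.intItemId ELSE t1.intItemId END
    FROM tmpResults AS t1
    INNER JOIN tmpResults AS t2 ON t1.intItemId < t2.intItemId
    WHERE fn_Lev(t1.strTitle, t2.strTitle) <= 3
    """
).fetchall()
conn.executemany("DELETE FROM tmpResults WHERE intItemId = ?", losers)

remaining = conn.execute(
    "SELECT intItemId, strTitle FROM tmpResults ORDER BY intItemId"
).fetchall()
print(remaining)
```

The comparison is still quadratic in the number of rows, since every pair has to be scored once; the self join only avoids scoring each pair twice.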

Concatenated columns should not match in 2 tables

I'll just put this in layman's terms since I'm a complete noobie:
I have 2 tables A and B, both having 2 columns of interest namely: employee_number and salary.
What I am looking to do is to extract rows of 'combination' of employee_number and salary from A that are NOT present in B, but each of employee_number and salary should be present in both.
I am looking to do it with the following 2 conditions (please forgive the wrong function names; this is just to present the problem 'eloquently'):
1.) A.unique(employee_number) exists in B.unique(employee_number) AND A.unique(salary) exists in B.unique(salary)
2.) A.concat(employee_number,salary) <> B.concat(employee_number,salary)
Note: A and B are in different databases, so I'm looking to use dblink to do this.
This is what I tried doing:
SELECT distinct * FROM dblink('dbname=test1 port=5432
host=test01 user=user password=password','SELECT employee_number,salary, employee_number||salary AS ENS FROM empsal.A')
AS A(employee_number int8, salary integer, ENS numeric)
LEFT JOIN empsalfull.B B on B.employee_number = A.employee_number AND B.salary = A.salary
WHERE A.ENS not in (select distinct employee_number || salary from empsalfull.B)
but it turned out to be wrong as I had it cross-checked by using spreadsheets and I don't get the same result.
Any help would be greatly appreciated. Thanks.
For easier understanding I left out the dblink.
Your query fails because the join selects rows in B whose employee_number and salary both equal those in A, so their concatenated values will be equal as well (if you expect this not to be true, please provide some test data).
SELECT * FROM firsttable A
JOIN secondtable B ON
(A.employee_number = B.employee_number AND A.salary != B.salary) OR
(A.salary = B.salary AND A.employee_number != B.employee_number)
If you have trouble with rows containing nulls, you might also try something like this:
AND (A.salary != B.salary OR (A.salary IS NULL AND B.salary IS NOT NULL) OR (A.salary IS NOT NULL AND B.salary IS NULL))
I think you're looking for something along these lines.
(Sample data)
create table A (
employee_number integer primary key,
salary integer not null
);
create table B (
employee_number integer primary key,
salary integer not null
);
insert into A values
(1, 20000),
(2, 30000),
(3, 20000); -- This row isn't in B
insert into B values
(1, 20000), -- Combination in A
(2, 20000), -- Individual values in A
(3, 50000); -- Only emp number in A
select A.employee_number, A.salary
from A
where (A.employee_number, A.salary) NOT IN (select employee_number, salary from B)
and A.employee_number IN (select employee_number from B)
and A.salary IN (select salary from B)
output: 3, 20000
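
As a quick cross-check of that query's logic outside the database (similar in spirit to the spreadsheet check mentioned in the question), the three conditions can be sketched with plain Python sets over the same sample rows. The variable names are hypothetical.

```python
# Same sample data as the CREATE TABLE / INSERT statements above.
a_rows = [(1, 20000), (2, 30000), (3, 20000)]
b_rows = [(1, 20000), (2, 20000), (3, 50000)]

b_pairs = set(b_rows)                        # (employee_number, salary) combinations in B
b_employees = {emp for emp, _ in b_rows}     # distinct employee numbers in B
b_salaries = {sal for _, sal in b_rows}      # distinct salaries in B

# Keep A rows whose combination is missing from B while each value exists in B.
result = [
    (emp, sal)
    for emp, sal in a_rows
    if (emp, sal) not in b_pairs
    and emp in b_employees
    and sal in b_salaries
]
print(result)
```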

MySQL: Only output some values once

I think I have done this before, but it could also be a function of PHP. What I would like is to do a MySQL query (in a MySQL client, not PHP) and get for example
Foo A
    B
    C
Bar B
    D
    E
Instead of
Foo A
Foo B
Foo C
Bar B
Bar D
Bar E
This would of course only make sense if it was ordered by that first column. Not sure if it is possible, but like I said, I seem to remember having done this once, but can't remember how or if it was through some PHP "magic"...
Update: Suddenly remembered where I had used it. What I was thinking of was the WITH ROLLUP modifier for GROUP BY. But I also discovered that it doesn't do what I was thinking here, so my question still stands. Although I don't think there is a solution now. But smart people have proved me wrong before :P
Update: Should probably also have mentioned that what I want this for is a many-to-many relationship. In the actual select Foo would be the first name of an attendee and I would also want last name and some other columns. The A, B, C, D, E are options the attendee has selected.
attendee (id, first_name, last_name, ...)
attendees_options (attendee_id, option_id)
option (id, name, description)
This will give you
Foo A,B,C
Bar B,D,E
SELECT column1, GROUP_CONCAT(column2) FROM table GROUP BY column1
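
A small sketch of the GROUP_CONCAT approach, run from Python against SQLite (which also provides GROUP_CONCAT) rather than MySQL; that swap, the `selections` table, and its column names are assumptions made for illustration. The order of items inside each concatenated string is not guaranteed here, so the example only compares the set of options per name.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE selections (first_name TEXT, option_name TEXT)")
conn.executemany(
    "INSERT INTO selections VALUES (?, ?)",
    [("Foo", "A"), ("Foo", "B"), ("Foo", "C"),
     ("Bar", "B"), ("Bar", "D"), ("Bar", "E")],
)

# One output row per first_name, with its options joined by commas.
rows = conn.execute(
    """
    SELECT first_name, GROUP_CONCAT(option_name)
    FROM selections
    GROUP BY first_name
    """
).fetchall()

options_by_name = {name: sorted(opts.split(",")) for name, opts in rows}
print(options_by_name)
```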
Tested this in SQL Server, but I think it will translate to MySQL.
create table test (
id int,
col1 char(3),
col2 char(1)
)
insert into test
(id, col1, col2)
select 1, 'Foo', 'A' union all
select 2, 'Foo', 'B' union all
select 3, 'Foo', 'C' union all
select 4, 'Bar', 'D' union all
select 5, 'Bar', 'E' union all
select 6, 'Bar', 'F'
select case when t.id = (select top 1 t2.id
from test t2
where t2.col1 = t.col1
order by t2.col1, t2.col2)
then t.col1
else ''
end as col1,
t.col2
from test t
order by t.id
drop table test
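
The same correlated-subquery trick can be tried from Python against SQLite, as a rough sketch. SQLite has no TOP, so LIMIT 1 is swapped in, and the inner ordering is reduced to col2 because col1 is already fixed by the WHERE clause; both changes are assumptions of this sketch and not part of the answer above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "Foo", "A"), (2, "Foo", "B"), (3, "Foo", "C"),
     (4, "Bar", "D"), (5, "Bar", "E"), (6, "Bar", "F")],
)

# Show col1 only on the first row of each group, and an empty string otherwise.
rows = conn.execute(
    """
    SELECT CASE WHEN t.id = (SELECT t2.id
                             FROM test t2
                             WHERE t2.col1 = t.col1
                             ORDER BY t2.col2
                             LIMIT 1)
                THEN t.col1
                ELSE ''
           END AS col1,
           t.col2
    FROM test t
    ORDER BY t.id
    """
).fetchall()

for col1, col2 in rows:
    print(col1, col2)
```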