Does GROUP BY on an int column perform better than GROUP BY on a varchar column? - tsql

Suppose I have tables like
Mytaba(aid int, name varchar(12), ...)
myTabb(bid, aid, ...)
Then I have two queries with GROUP BY that should return the same result:
Select a.aid,sum(...)
from Mytaba a join mytabb b on a.aid = b.aid
group by a.aid
Select a.name,sum(...)
from Mytaba a join mytabb b on a.aid = b.aid
group by a.name
Question: does GROUP BY on the int column perform better than GROUP BY on the varchar column?

Assuming both columns are indexed, int would have an advantage because it is smaller (4 bytes, versus up to 12 bytes plus length overhead for varchar(12)).
More rows fit in the same amount of memory.
Because of the join, the query must access aid anyway.
The two queries are only equivalent if aid and name are each unique; otherwise rows with different aid values but the same name fall into one group.
In myTabb, an index on aid would help.
If that table has a composite key, ordering it as (aid, bid) would probably result in an index seek rather than a scan, which is a good thing, and you would not need a separate index on aid.
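To make the equivalence caveat concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The amount column and the sample rows are assumptions made up for illustration; the point is only that the two GROUP BY queries diverge as soon as two aid values share a name.

```python
import sqlite3

# Hypothetical data: aid 2 and aid 3 both have the name 'bob'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mytaba (aid INTEGER PRIMARY KEY, name VARCHAR(12));
    CREATE TABLE myTabb (bid INTEGER PRIMARY KEY, aid INTEGER, amount INTEGER);
    CREATE INDEX ix_mytabb_aid ON myTabb (aid);
    INSERT INTO Mytaba VALUES (1, 'ann'), (2, 'bob'), (3, 'bob');
    INSERT INTO myTabb VALUES (10, 1, 5), (11, 2, 7), (12, 3, 1), (13, 3, 2);
""")

# Grouping by the integer key: one group per aid.
by_aid = conn.execute("""
    SELECT a.aid, SUM(b.amount)
    FROM Mytaba a JOIN myTabb b ON a.aid = b.aid
    GROUP BY a.aid
    ORDER BY a.aid
""").fetchall()

# Grouping by name: the two 'bob' rows collapse into a single group.
by_name = conn.execute("""
    SELECT a.name, SUM(b.amount)
    FROM Mytaba a JOIN myTabb b ON a.aid = b.aid
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

print(by_aid)   # three groups
print(by_name)  # two groups
```

This says nothing about timing; for that, compare the actual execution plans of both queries on the real tables.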

Related

Show the subjects per StudentNo and the count of number of subjects per student

Error: Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
SELECT Subject, StudentNo, SUM(COUNT(DISTINCT Subject)) AS NumOfSubjectPerStudent
FROM Subjects AS S
INNER JOIN STUDENTS AS ST ON S.ID = ST.ID
WHERE S.ID = ST.ID
GROUP BY ST.StudentNo, S.Subject
ORDER BY ST.StudentNo DESC
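I think you're almost there, but without knowing the structure of the Students and Subjects tables, I can only assume it should be something like this. SQL Server does not allow one aggregate nested inside another, which is what raises the error, so the SUM(COUNT(...)) is replaced with a windowed count that is evaluated after the GROUP BY:
SELECT ST.StudentNo, S.Subject, COUNT(*) OVER (PARTITION BY ST.StudentNo) AS NumOfSubjectPerStudent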
FROM Subjects AS S
INNER JOIN STUDENTS AS ST ON S.StudentId = ST.ID
GROUP BY ST.StudentNo, S.Subject
ORDER BY ST.StudentNo DESC
This assumes that the Subjects table has a StudentId column that links to the Students table's ID column.
I am also assuming that the Subjects table's ID column is the unique identifier/primary key for each subject and so shouldn't be used to JOIN against the Students table's ID column.
If my assumptions are wrong, please clarify the columns in each table and provide some sample data from each so that it is easier to help you.
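One way to list each subject next to a per-student count without nesting aggregates is to compute the count in a derived table and join it back. The sketch below runs on Python's built-in sqlite3 rather than SQL Server, and the table layout (Subjects.StudentId pointing at Students.ID) and sample rows are the same assumptions as above, not something the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: Subjects.StudentId references Students.ID.
    CREATE TABLE Students (ID INTEGER PRIMARY KEY, StudentNo TEXT);
    CREATE TABLE Subjects (ID INTEGER PRIMARY KEY, StudentId INTEGER, Subject TEXT);
    INSERT INTO Students VALUES (1, 'S001'), (2, 'S002');
    INSERT INTO Subjects VALUES (1, 1, 'Math'), (2, 1, 'Physics'), (3, 2, 'Math');
""")

# The derived table C holds one row per student with the subject count;
# joining it back repeats that count on every subject row of the student.
rows = conn.execute("""
    SELECT ST.StudentNo, S.Subject, C.NumOfSubjectPerStudent
    FROM Subjects AS S
    INNER JOIN Students AS ST ON S.StudentId = ST.ID
    INNER JOIN (
        SELECT StudentId, COUNT(DISTINCT Subject) AS NumOfSubjectPerStudent
        FROM Subjects
        GROUP BY StudentId
    ) AS C ON C.StudentId = ST.ID
    ORDER BY ST.StudentNo DESC, S.Subject
""").fetchall()

for row in rows:
    print(row)
```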

How to get the last added record for a battery with a left join in PSQL

I have a query such as
select * from batteries as b ORDER BY inserted_at desc
which gives me data such as
and I have a query such as
select voltage, datetime, battery_id from battery_readings ORDER BY inserted_at desc limit 1
which returns data as
I want to combine the two queries above so that, in one go, I get each battery's details as well as its most recently added voltage and datetime from battery_readings.
Postgres has a very useful syntax for this, called DISTINCT ON. This is different from plain DISTINCT in that it keeps only the first row of each set, defined by the sort order. In your case, it would be something like this:
SELECT DISTINCT ON (b.id)
b.id,
b.name,
b.source_url,
b.active,
b.user_id,
b.inserted_at,
b.updated_at,
v.voltage,
v.datetime
FROM batteries b
JOIN battery_readings v ON (b.id = v.battery_id)
ORDER BY b.id, v.datetime desc;
I think window functions will do what you expect.
Assuming two tables
create table battery (id int, name text);
create table bat_volt(measure_time int, battery_id int, val int);
One of the possible queries is like this:
with latest as (select battery_id, max(measure_time) over (partition by battery_id) from bat_volt)
select * from battery b join bat_volt bv on bv.battery_id=b.id where (b.id,bv.measure_time) in (select * from latest);
If your Postgres version supports LATERAL (9.3 or later), it might also make sense to try it out (when there are far more readings than batteries, it could perform better).
select * from battery b
join bat_volt bv on bv.battery_id=b.id
join lateral
(select battery_id, max(measure_time) over (partition by battery_id) from bat_volt bbv
where bbv.battery_id = b.id limit 1) lbb on (lbb.max = bv.measure_time AND lbb.battery_id = b.id);
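Since the question asks for a left join, here is a hedged sketch of a variant that keeps batteries without any reading. It uses a correlated MAX subquery in the join condition instead of DISTINCT ON or LATERAL, and it runs on Python's built-in sqlite3 as a stand-in for Postgres. The battery and bat_volt tables are the assumed ones from above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE battery (id INT, name TEXT);
    CREATE TABLE bat_volt (measure_time INT, battery_id INT, val INT);
    INSERT INTO battery VALUES (1, 'front'), (2, 'rear'), (3, 'spare');
    -- Battery 3 has no readings at all.
    INSERT INTO bat_volt VALUES (100, 1, 12), (200, 1, 11), (150, 2, 13);
""")

# Keep every battery; attach only the reading whose measure_time equals
# the maximum measure_time recorded for that battery.
latest = conn.execute("""
    SELECT b.id, b.name, bv.measure_time, bv.val
    FROM battery b
    LEFT JOIN bat_volt bv
           ON bv.battery_id = b.id
          AND bv.measure_time = (SELECT MAX(x.measure_time)
                                 FROM bat_volt x
                                 WHERE x.battery_id = b.id)
    ORDER BY b.id
""").fetchall()

for row in latest:
    print(row)
```

If two readings of one battery share the same maximum measure_time, this returns both of them, so a tie-breaker column would be needed in that case.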

T-SQL select all IDs that have value A and B

I'm trying to find all IDs in TableA that are mentioned by a set of records in TableB, where that set is defined in TableC. So far I've reached the point where a series of INNER JOINs gives me the following result:
TableA.ID | TableB.Code
-----------------------
1 | A
1 | B
2 | A
3 | B
I want to select only the ID where in this case there is an entry for both A and B, but where the values A and B are based on another Query.
I figured this should be possible with a GROUP BY TableA.ID and HAVING = ALL(Subquery on table C).
But that is returning no values.
Since you did not post your original query, I will assume it is inside a CTE. Assuming this, the query you want is something along these lines:
SELECT ID
FROM cte
WHERE Code IN ('A', 'B')
GROUP BY ID
HAVING COUNT(DISTINCT Code) = 2;
It's an extremely poor question, but you probably need to compare distinct counts against table C:
SELECT a.ID
FROM TableA a
GROUP BY a.ID
HAVING COUNT(DISTINCT a.Code) = (SELECT COUNT(*) FROM TableC)
We're guessing though.
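The counting idea can be sketched end to end as follows, using Python's built-in sqlite3 in place of SQL Server. The joined table stands in for the result of the INNER JOINs shown in the question, and TableC is assumed to hold one row per required code; both layouts are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'joined' is a hypothetical stand-in for the TableA/TableB join result.
    CREATE TABLE joined (ID INT, Code TEXT);
    CREATE TABLE TableC (Code TEXT);
    INSERT INTO joined VALUES (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
    INSERT INTO TableC VALUES ('A'), ('B');
""")

# Keep only rows whose code is required by TableC, then keep the IDs
# that cover as many distinct codes as TableC contains.
ids = conn.execute("""
    SELECT j.ID
    FROM joined j
    JOIN TableC c ON c.Code = j.Code
    GROUP BY j.ID
    HAVING COUNT(DISTINCT j.Code) = (SELECT COUNT(DISTINCT Code) FROM TableC)
    ORDER BY j.ID
""").fetchall()

print(ids)
```

Joining to TableC first matters: without it, an ID with two unrelated codes could reach the same count without actually matching the required set.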

Full outer join on multiple tables in PostgreSQL

In PostgreSQL, I have N tables, each consisting of two columns: id and value. Within each table, id is a unique identifier and value is numeric.
I would like to join all the tables using id and, for each id, create a sum of values of all the tables where the id is present (meaning the id may be present only in subset of tables).
I was trying the following query:
SELECT COALESCE(a.id, b.id, c.id) AS id,
COALESCE(a.value,0) + COALESCE(b.value,0) + COALESCE(c.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
FULL OUTER JOIN
c
ON (b.id=c.id)
But it doesn't work for cases when the id is present in a and c, but not in b.
I suppose I would have to do some bracketing like:
SELECT COALESCE(x.id, c.id) AS id, COALESCE(x.value,0) + COALESCE(c.value,0) AS value
FROM
(SELECT COALESCE(a.id, b.id) AS id, COALESCE(a.value,0) + COALESCE(b.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
) AS x
FULL OUTER JOIN
c
ON (x.id = c.id)
That is only 3 tables and the code is ugly enough already, imho. Is there an elegant, systematic way to do the join for N tables without getting lost in my code?
I would also like to point out that I did some simplifications in my example. Tables a, b, c, ..., are actually results of quite complex queries over several materialized views. But the syntactical problem remains the same.
I understood you need to sum the values from N tables and group them by id, correct?
For that I would do this:
Select x.id, sum (x.value) from (
Select * from a
Union all
Select * from b
Union all........
) as x group by x.id;
Since the N tables all have the same columns, you can union them into one big set containing every (id, value) tuple from all tables. Use UNION ALL, because plain UNION removes duplicate rows!
Then just sum all the values grouped by id.
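As a rough illustration of how this scales to N tables, the sketch below builds the UNION ALL from a list of table names and runs it on Python's built-in sqlite3 instead of PostgreSQL. The three tables and their rows are invented; id 1 is deliberately present in a and c but not in b, the case the chained FULL OUTER JOIN missed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, value NUMERIC);
    CREATE TABLE b (id INTEGER PRIMARY KEY, value NUMERIC);
    CREATE TABLE c (id INTEGER PRIMARY KEY, value NUMERIC);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (2, 5);
    INSERT INTO c VALUES (1, 1), (3, 7);
""")

# Table names are fixed identifiers chosen here, not user input,
# so formatting them into the SQL text is acceptable in this sketch.
tables = ["a", "b", "c"]
union_sql = "\n    UNION ALL\n    ".join(
    f"SELECT id, value FROM {t}" for t in tables
)
query = f"""
SELECT x.id, SUM(x.value) AS value
FROM (
    {union_sql}
) AS x
GROUP BY x.id
ORDER BY x.id
"""

totals = conn.execute(query).fetchall()
print(totals)
```

Adding another table then means appending one name to the list rather than nesting another join.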

T-SQL query one table, get presence or absence of other table value

I'm not sure what this type of query is called, so I've been unable to search for it properly. I've got two tables: Table A has about 10,000 rows and Table B has a variable number of rows.
I want to write a query that returns all of Table A's rows with an added column whose value is a boolean saying whether the row also appears in Table B.
I've written this query, which works but is slow. It doesn't use a boolean but rather a count that will be either zero or one. Any suggested improvements are gratefully accepted:
SELECT u.number,u.name,u.deliveryaddress,
(SELECT COUNT(productUserid)
FROM ProductUser
WHERE number = u.number and productid = @ProductId)
AS IsInPromo
FROM Users u
UPDATE
I've run the query with actual execution plan enabled, I'm not sure how to show the results but various costs are:
Nested Loops (left semi join): 29%
Clustered Index scan (User Table): 41%
Clustered Index Scan (ProductUser table): 29%
NUMBERS
There are 7366 users in the users table and currently 18 rows in the productUser table (although this will change and could be in the thousands)
You can use EXISTS to short circuit after the first row is found rather than COUNT-ing all matching rows.
SQL Server does not have a boolean datatype. The closest equivalent is BIT:
SELECT u.number,
u.name,
u.deliveryaddress,
CASE
WHEN EXISTS (SELECT *
FROM ProductUser
WHERE number = u.number
AND productid = @ProductId) THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END AS IsInPromo
FROM Users u
RE: "I'm not sure what this type of query is called". This will give a plan with a semi join. See Subqueries in CASE Expressions for more about this.
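For a quick way to try the EXISTS pattern, here is a small sketch on Python's built-in sqlite3 rather than SQL Server. SQLite has no BIT type, so the flag is a plain 0/1 integer, and the product id is passed as a bound parameter instead of a T-SQL variable. The sample users and product ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (number INTEGER PRIMARY KEY, name TEXT, deliveryaddress TEXT);
    CREATE TABLE ProductUser (productUserid INTEGER PRIMARY KEY,
                              number INTEGER, productid INTEGER);
    INSERT INTO Users VALUES (1, 'ann', 'A St'), (2, 'bob', 'B St'), (3, 'cy', 'C St');
    INSERT INTO ProductUser VALUES (1, 1, 42), (2, 3, 99);
""")

product_id = 42  # hypothetical promo product

# EXISTS only needs to find one matching row per user to return true.
rows = conn.execute("""
    SELECT u.number, u.name,
           CASE WHEN EXISTS (SELECT 1
                             FROM ProductUser p
                             WHERE p.number = u.number
                               AND p.productid = ?)
                THEN 1 ELSE 0 END AS IsInPromo
    FROM Users u
    ORDER BY u.number
""", (product_id,)).fetchall()

print(rows)
```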
Which management system are you using?
Try this:
SELECT u.number,u.name,u.deliveryaddress,
case when COUNT(p.productUserid) > 0 then 1 else 0 end as IsInPromo
FROM Users u
left join ProductUser p on p.number = u.number and p.productid = @ProductId
group by u.number,u.name,u.deliveryaddress
UPD: this could be faster using mssql
;with fff as
(
select distinct p.number from ProductUser p where p.productid = @ProductId
)
select u.number,u.name,u.deliveryaddress,
case when f.number is null then 0 else 1 end as IsInPromo
from Users u left join fff f on f.number = u.number
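Since you seem concerned about performance, a LEFT JOIN may perform faster if it lets the optimizer use index seeks on both tables instead of index scans; compare the execution plans to confirm. The productid filter has to go in the ON clause, because in the WHERE clause it would discard the users who are not in the promo. This assumes at most one ProductUser row per user and product:
SELECT u.number,
u.name,
u.deliveryaddress,
CASE WHEN p.number IS NULL THEN 0 ELSE 1 END AS IsInPromo
FROM Users u
LEFT JOIN ProductUser p ON p.number = u.number
AND p.productid = @ProductId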
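One thing worth checking with any LEFT JOIN variant is where the productid filter is placed, since that decides whether users without a match survive. The sketch below compares both placements on Python's built-in sqlite3, used here as a stand-in for SQL Server, with invented sample rows and a bound parameter in place of the T-SQL variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ProductUser (productUserid INTEGER PRIMARY KEY,
                              number INTEGER, productid INTEGER);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO ProductUser VALUES (1, 1, 42), (2, 3, 99);
""")

product_id = 42  # hypothetical promo product

# Filter in the ON clause: unmatched users are kept with NULLs from p.
filter_in_on = conn.execute("""
    SELECT u.number,
           CASE WHEN p.number IS NULL THEN 0 ELSE 1 END AS IsInPromo
    FROM Users u
    LEFT JOIN ProductUser p
           ON p.number = u.number AND p.productid = ?
    ORDER BY u.number
""", (product_id,)).fetchall()

# Filter in the WHERE clause: rows where p.productid is NULL or different
# are removed after the join, so it behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT u.number,
           CASE WHEN p.number IS NULL THEN 0 ELSE 1 END AS IsInPromo
    FROM Users u
    LEFT JOIN ProductUser p ON p.number = u.number
    WHERE p.productid = ?
    ORDER BY u.number
""", (product_id,)).fetchall()

print(filter_in_on)
print(filter_in_where)
```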