I have a dataset of food eaten:
create table test
(group_id integer,
food varchar,
item_type varchar);
insert into test values
(764, 'apple', 'new_food'),
(123, 'berry', 'new_food'),
(123, 'apple', 'others'),
(123, 'berry', 'others'),
(86, 'carrot', 'others'),
(86, 'carrot', 'new_food'),
(86, 'banana', 'others');
In each group, the row with item_type new_food holds the new food eaten. The previous food is whichever other food in the group differs from the new_food value.
The dataset I would like from this would be:
| group | previous_food | new_food |
|-------|---------------|----------|
| 764   | null          | apple    |
| 123   | apple         | berry    |
| 86    | banana        | carrot   |
However, I can't get the group selections correct. My attempt is currently:
select
group_id,
max(case when item_type != 'new_food' then food else null end) as previous_food,
max(case when item_type = 'new_food' then food else null end) as new_food
from test
group by group_id
However, we can't rely on the max() function to pick the correct previous food since they are not necessarily alphabetically ordered.
I just need whichever other food in the grouping != the new_food. How can I get this?
Can I avoid using a subquery or is that inevitable? The database says I can't nest aggregate functions and it is frustrating.
Here is my sqlfiddle so far: http://sqlfiddle.com/#!17/a2a46/1
EDIT: I've solved this with a subquery here: http://sqlfiddle.com/#!17/dd8b9/12 but can we do better? Surely there must be a way of doing this comparison easily within the grouping no?
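One possible way to avoid nesting aggregates is a self-join: take the new_food row of each group and left-join every other row in the same group whose food differs from it. The sketch below is not from the original post. It runs the query through Python's standard-library sqlite3 module as a stand-in for Postgres, and the helper name previous_and_new_food is made up for illustration.

```python
import sqlite3

def previous_and_new_food(rows):
    """Return (group_id, previous_food, new_food) per group using a self-join."""
    con = sqlite3.connect(":memory:")
    con.execute("create table test (group_id integer, food varchar, item_type varchar)")
    con.executemany("insert into test values (?, ?, ?)", rows)
    query = """
        select n.group_id,
               min(o.food) as previous_food,  -- any food in the group that differs from the new one
               n.food      as new_food
        from test n
        left join test o
               on o.group_id = n.group_id
              and o.food <> n.food            -- compare against the new food, not against item_type
        where n.item_type = 'new_food'
        group by n.group_id, n.food
        order by n.group_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [
    (764, 'apple', 'new_food'),
    (123, 'berry', 'new_food'),
    (123, 'apple', 'others'),
    (123, 'berry', 'others'),
    (86, 'carrot', 'others'),
    (86, 'carrot', 'new_food'),
    (86, 'banana', 'others'),
]
result = previous_and_new_food(sample)
for row in result:
    print(row)
```

Here min() only collapses duplicates of the differing food. If a group could hold several distinct previous foods, it would pick the alphabetically first one, so that case would need an explicit rule.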
Student records are updated per subject, each with its own update date. A student can be enrolled in one or multiple subjects. I would like to get, for each student, the most recent update date and status per subject.
CREATE TABLE Student
(
StudentID int,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100),
MathStatus varchar(100),
MUpdateDate datetime2,
ScienceStatus varchar(100),
SUpdateDate datetime2,
EnglishStatus varchar(100),
EUpdateDate datetime2
);
The desired query output is below. I am currently using a CTE approach, but I am trying to find a better alternative.
SELECT StudentID, FirstName, LastName, FullAddress, CityState, [SubjectStatus], UpdateDate
FROM Student
;WITH original AS
(SELECT * FROM Student)
,Math as
(
SELECT DISTINCT o.StudentID, o.FirstName, o.LastName, o.FullAddress, o.CityState,
ROW_NUMBER() OVER (PARTITION BY _o.StudentID, _o.MathStatus ORDER BY _o.MUpdateDate DESC) as rn
, _o.MathStatus as SubjectStatus, _o.MUpdateDate as UpdateDate
FROM original as o
left join original as _o on o.StudentID = _o.StudentID
where _o.MathStatus is not null and _o.MUpdateDate is not null
)
,Science AS
(
...--Same as Math
)
,English AS
(
...--Same As Math
)
SELECT * FROM Math WHERE rn = 1
UNION
SELECT * FROM Science WHERE rn = 1
UNION
SELECT * FROM English WHERE rn = 1
First: storing data in a denormalized form is not recommended. Some data model redesign might be in order. There are multiple resources about data normalization available on the web, like this one.
Now then, I made some guesses about how your source table is populated based on the query you wrote. I generated some sample data that could show how the source data is created. Besides that I also reduced the number of columns to reduce my typing efforts. The general approach should still be valid.
Sample data
create table Student
(
StudentId int,
StudentName varchar(15),
MathStat varchar(5),
MathDate date,
ScienceStat varchar(5),
ScienceDate date
);
insert into Student (StudentID, StudentName, MathStat, MathDate, ScienceStat, ScienceDate) values
(1, 'John Smith', 'A', '2020-01-01', 'B', '2020-05-01'),
(1, 'John Smith', 'A', '2020-01-01', 'B+', '2020-06-01'), -- B for Science was updated to B+ month later
(2, 'Peter Parker', 'F', '2020-01-01', 'A', '2020-05-01'),
(2, 'Peter Parker', 'A+', '2020-03-01', 'A', '2020-05-01'), -- Spider-Man would never fail Math, fixed...
(3, 'Tom Holland', null, null, 'A', '2020-05-01'),
(3, 'Tom Holland', 'A-', '2020-07-01', 'A', '2020-05-01'); -- Tom was sick for Math, but got a second chance
Solution
Your question title already contains the word unpivot. That word also exists in T-SQL as a keyword; you can learn about it in the documentation. Your own solution already uses common table expressions, so these constructions should look familiar.
Steps:
cte_unpivot = unpivot all rows, create a Subject column and place the corresponding values (SubjectStat, Date) next to it with a case expression.
cte_recent = number the rows to find the most recent row per student and subject.
Select only those most recent rows.
This gives:
with cte_unpivot as
(
select up.StudentId,
up.StudentName,
case up.[Subject]
when 'MathStat' then 'Math'
when 'ScienceStat' then 'Science'
end as [Subject],
up.SubjectStat,
case up.[Subject]
when 'MathStat' then up.MathDate
when 'ScienceStat' then up.ScienceDate
end as [Date]
from Student s
unpivot ([SubjectStat] for [Subject] in ([MathStat], [ScienceStat])) up
),
cte_recent as
(
select cu.StudentId, cu.StudentName, cu.[Subject], cu.SubjectStat, cu.[Date],
row_number() over (partition by cu.StudentId, cu.[Subject] order by cu.[Date] desc) as [RowNum]
from cte_unpivot cu
)
select cr.StudentId, cr.StudentName, cr.[Subject], cr.SubjectStat, cr.[Date]
from cte_recent cr
where cr.RowNum = 1;
Result
StudentId StudentName Subject SubjectStat Date
----------- --------------- ------- ----------- ----------
1 John Smith Math A 2020-01-01
1 John Smith Science B+ 2020-06-01
2 Peter Parker Math A+ 2020-03-01
2 Peter Parker Science A 2020-05-01
3 Tom Holland Math A- 2020-07-01
3 Tom Holland Science A 2020-05-01
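For engines without an unpivot keyword, the same three steps can be sketched with UNION ALL in place of unpivot. The example below is only an illustration under that assumption. It uses Python's standard-library sqlite3 module (window functions need SQLite 3.25 or newer) with the sample data above, and the column name UpdateDate is chosen here instead of [Date].

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Student (StudentId int, StudentName varchar(15),
                      MathStat varchar(5), MathDate date,
                      ScienceStat varchar(5), ScienceDate date);
insert into Student values
 (1, 'John Smith',   'A',  '2020-01-01', 'B',  '2020-05-01'),
 (1, 'John Smith',   'A',  '2020-01-01', 'B+', '2020-06-01'),
 (2, 'Peter Parker', 'F',  '2020-01-01', 'A',  '2020-05-01'),
 (2, 'Peter Parker', 'A+', '2020-03-01', 'A',  '2020-05-01'),
 (3, 'Tom Holland',  null, null,         'A',  '2020-05-01'),
 (3, 'Tom Holland',  'A-', '2020-07-01', 'A',  '2020-05-01');
""")

query = """
with cte_unpivot as (
    -- one branch per subject; null statuses are skipped, as unpivot would do
    select StudentId, StudentName, 'Math' as Subject,
           MathStat as SubjectStat, MathDate as UpdateDate
    from Student where MathStat is not null
    union all
    select StudentId, StudentName, 'Science', ScienceStat, ScienceDate
    from Student where ScienceStat is not null
),
cte_recent as (
    -- number the rows so the most recent row per student and subject gets 1
    select *, row_number() over (partition by StudentId, Subject
                                 order by UpdateDate desc) as RowNum
    from cte_unpivot
)
select StudentId, StudentName, Subject, SubjectStat, UpdateDate
from cte_recent
where RowNum = 1
order by StudentId, Subject
"""
recent = con.execute(query).fetchall()
for row in recent:
    print(row)
```

Each extra subject needs one more UNION ALL branch, which is the price of not having unpivot.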
I have some interesting data that I'm trying to query, but I cannot get the syntax correct. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example it is only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiple ids in a single cell. The ids column looks like this (below), with the ids stored as a single string separated by "|":
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids and see if any of the ids in it match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of trying to create a new column does_match, I tried to filter within the WHERE clause. However, the issue is that I cannot figure out how to compare all the id values in my temp table against the string blob full of ids in foo.
SELECT
ids
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_id)
Any suggestions would be helpful.
Cheers,
This would work, though I am not sure about performance:
SELECT
ids
FROM foo
JOIN temp_id
ON '|'||foo.ids||'|' LIKE '%|'||temp_id.id::varchar||'|%'
The ids list is wrapped in a pair of additional separators, so you can always search for |id|, including for the first and the last number.
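To get the does_match column from the question rather than only the matching rows, the same separator-wrapping comparison can sit inside an EXISTS. This is a minimal sketch, assuming SQLite (through Python's standard-library sqlite3 module) as a stand-in for the real database, so the cast syntax differs slightly from the query above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table temp_id (id bigint primary key);
insert into temp_id (id) values (1), (2);
create table foo (ids varchar);
insert into foo (ids) values
  ('1|9|3|4|5'), ('6|5|6|9|7'), (null), ('2|5|6|9|7'), ('9|11|12|99');
""")

query = """
select f.ids,
       exists (
           select 1
           from temp_id t
           -- wrap both sides in separators so that 1 does not match 11 or 12
           where '|' || f.ids || '|' like '%|' || cast(t.id as text) || '|%'
       ) as does_match
from foo f
order by f.rowid
"""
matches = [(ids, bool(flag)) for ids, flag in con.execute(query)]
for row in matches:
    print(row)
```

A NULL ids value makes the LIKE comparison NULL, so EXISTS finds no row and the result is false, as in the expected output.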
The following SQL (I know it's a bit of a hack) returns exactly what you expect as output. It is tested with your sample data, but I don't know how it would behave on your real data, so try it and let me know.
with seq AS ( -- create a sequence CTE to emulate Postgres' unnest
select 1 as i union all -- assuming you have at most 10 ids in the ids field,
-- feel free to modify this part
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10)
select distinct k.ids,
case -- since I can't do a max on a boolean field, used two cases
-- for 1s and 0s and converted them to boolean
when max(case
when t.id in (
select split_part(f.ids,'|',seq.i) as tt
from seq
join foo f on seq.i <= REGEXP_COUNT(f.ids, '|') + 1
where tt != '' and k.ids = f.ids)
then 1
else 0
end) = 1
then true
else false
end as does_match
from temp_id t, foo k -- k is the alias used by the correlated subquery above
group by 1
Please let me know if this works for you!
I've got three tables: users, courses, and grades, the latter of which joins users and courses with some metadata like the user's score for the course. I've created a SQLFiddle, though the site doesn't appear to be working at the moment. The schema looks like this:
CREATE TABLE users(
id INT,
name VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO users VALUES
(1, 'Beth'),
(2, 'Alice'),
(3, 'Charles'),
(4, 'Dave');
CREATE TABLE courses(
id INT,
title VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO courses VALUES
(1, 'Biology'),
(2, 'Algebra'),
(3, 'Chemistry'),
(4, 'Data Science');
CREATE TABLE grades(
id INT,
user_id INT,
course_id INT,
score INT,
PRIMARY KEY (ID)
);
INSERT INTO grades VALUES
(1, 2, 2, 89),
(2, 2, 1, 92),
(3, 1, 1, 93),
(4, 1, 3, 88);
I'd like to know how (if possible) to construct a query which specifies some users.id values (1, 2, 3) and courses.id values (1, 2, 3) and returns those users' grades.score values for those courses:
| name | Algebra | Biology | Chemistry |
|---------|---------|---------|-----------|
| Alice | 89 | 92 | |
| Beth | | 93 | 88 |
| Charles | | | |
In my application logic, I'll be receiving an array of user_ids and course_ids, so the query needs to select those users and courses dynamically by primary key. (The actual data set contains millions of users and tens of thousands of courses—the examples above are just a sample to work with.)
Ideally, the query would:
use the course titles as dynamic attributes/column headers for the users' score data
sort the row and column headers alphabetically
include empty/NULL cells if the user-course pair has no grades relationship
I suspect I may need some combination of JOINs and Postgresql's crosstab, but I can't quite wrap my head around it.
Update: learning that the terminology for this is "dynamic pivot", I found this SO answer which appears to be trying to solve a related problem in Postgres with crosstab()
I think a simple pivot query should work here, since you only have 4 courses in your data set to pivot.
SELECT t1.name,
MAX(CASE WHEN t3.title = 'Biology' THEN t2.score ELSE NULL END) AS Biology,
MAX(CASE WHEN t3.title = 'Algebra' THEN t2.score ELSE NULL END) AS Algebra,
MAX(CASE WHEN t3.title = 'Chemistry' THEN t2.score ELSE NULL END) AS Chemistry,
MAX(CASE WHEN t3.title = 'Data Science' THEN t2.score ELSE NULL END) AS Data_Science
FROM users t1
LEFT JOIN grades t2
ON t1.id = t2.user_id
LEFT JOIN courses t3
ON t2.course_id = t3.id
GROUP BY t1.name
Follow the link below for a running demo. I used MySQL because, as you have noticed, SQLFiddle seems to be perpetually busted for the other databases.
SQLFiddle
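Since the application receives the user and course ids at run time, the MAX(CASE ...) columns can also be generated in application code instead of being typed out. The following is a rough sketch of that idea, not a Postgres crosstab solution. It assumes Python's standard-library sqlite3 module as a stand-in database, and pivot_scores is a hypothetical helper name.

```python
import sqlite3

def pivot_scores(con, user_ids, course_ids):
    """Build one MAX(CASE ...) column per requested course, then run the pivot."""
    course_marks = ",".join("?" for _ in course_ids)
    titles = [row[0] for row in con.execute(
        f"select title from courses where id in ({course_marks}) order by title",
        list(course_ids))]
    # Titles are bound as parameters; aliases are double-quoted with quotes escaped.
    select_list = ["u.name"] + [
        'max(case when c.title = ? then g.score end) as "{}"'.format(t.replace('"', '""'))
        for t in titles]
    user_marks = ",".join("?" for _ in user_ids)
    sql = f"""
        select {", ".join(select_list)}
        from users u
        left join grades g on g.user_id = u.id
        left join courses c on c.id = g.course_id
        where u.id in ({user_marks})
        group by u.id, u.name
        order by u.name
    """
    rows = con.execute(sql, titles + list(user_ids)).fetchall()
    return ["name"] + titles, rows

con = sqlite3.connect(":memory:")
con.executescript("""
create table users (id int, name varchar, primary key (id));
insert into users values (1, 'Beth'), (2, 'Alice'), (3, 'Charles'), (4, 'Dave');
create table courses (id int, title varchar, primary key (id));
insert into courses values (1, 'Biology'), (2, 'Algebra'), (3, 'Chemistry'), (4, 'Data Science');
create table grades (id int, user_id int, course_id int, score int, primary key (id));
insert into grades values (1, 2, 2, 89), (2, 2, 1, 92), (3, 1, 1, 93), (4, 1, 3, 88);
""")
header, rows = pivot_scores(con, [1, 2, 3], [1, 2, 3])
print(header)
for row in rows:
    print(row)
```

The first query sorts the column headers alphabetically, the ORDER BY sorts the rows, and the left joins keep users without grades as rows of NULLs.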
I have this table:
ID Value
------------
1 car
1 moto
2 car
2 moto
3 moto
3 apple
4 gel
4 moto
5 NULL
Note that moto is common to all IDs.
I would like to obtain a single row with this result:
car*, moto, apple*, gel*
i.e.
car, apple, gel with an asterisk because each is present but NOT in all IDs
moto without an asterisk because it is COMMON to all IDs
If ID + Value are unique:
SELECT Value, CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable) THEN '*' ELSE '' END AS Asterisk FROM MyTable WHERE Value IS NOT NULL GROUP BY Value
Note that this won't group in a single line. And note that your question is wrong. ID 5 is an ID, so moto isn't common to all the IDs. It's common to all the IDs that have at least a value.
If we filter these IDs as written,
SELECT Value, CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable WHERE Value IS NOT NULL) THEN '*' ELSE '' END FROM MyTable WHERE Value IS NOT NULL GROUP BY Value
To "merge" the * with Value, simply replace the , with a +, like:
SELECT Value + CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable WHERE Value IS NOT NULL) THEN '*' ELSE '' END Value FROM MyTable WHERE Value IS NOT NULL GROUP BY Value
To make a single line, use one of the techniques from https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ I'll add that, sadly, T-SQL versions before SQL Server 2017 (which introduced STRING_AGG) don't have any native method to do it, and all the alternatives are a little ugly :-)
In general, the string aggregation part is quite common on SO (and outside of it) Concatenate row values T-SQL, tsql aggregate string for group by, Implode type function in SQL Server 2000?, How to return multiple values in one column (T-SQL)? and too many others to count :-)
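As an illustration of the whole pipeline (mark each value, then merge into one line), here is a small sketch that runs the marking query and does the concatenation in application code. It assumes Python's standard-library sqlite3 module as a stand-in for SQL Server, so string concatenation is written || instead of +, and the table name MyTable follows the queries above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table MyTable (ID int, Value varchar(20))")
con.executemany("insert into MyTable values (?, ?)", [
    (1, 'car'), (1, 'moto'), (2, 'car'), (2, 'moto'),
    (3, 'moto'), (3, 'apple'), (4, 'gel'), (4, 'moto'), (5, None),
])

query = """
select Value || case
                  when count(*) <> (select count(distinct ID)
                                    from MyTable
                                    where Value is not null)
                  then '*' else ''
                end as Marked
from MyTable
where Value is not null
group by Value
order by min(rowid)   -- keep the order of first appearance (SQLite-specific)
"""
# The merge into a single line is done client-side here.
single_line = ", ".join(row[0] for row in con.execute(query))
print(single_line)
```

As in the queries above, this relies on each ID + Value pair being unique, and ID 5 is ignored because its only value is NULL.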
I have to write a query that fills a drop-down list of countries.
USA should always be first.
The rest of the countries should be in alphabetical order.
I tried the following query
SELECT
countries_id
,countries_name
FROM get_countries
WHERE
countries_id = 138
UNION
SELECT
countries_id
,countries_name
FROM get_countries
WHERE
countries_id != 138
ORDER BY 2 ASC
Something like this maybe:
ORDER BY
CASE
WHEN upper(country_name) = 'USA' then '0'
ELSE lower(country_name)
END
Here's a complete example
create TABLE countries (country_name VARCHAR2(50));
INSERT INTO countries VALUES ('USA');
INSERT INTO countries VALUES ('India');
INSERT INTO countries VALUES ('Russia');
INSERT INTO countries VALUES ('China');
COMMIT;
SELECT country_name
FROM countries
ORDER BY
CASE
WHEN upper(country_name) = 'USA' then '0'
ELSE lower(country_name)
END
Returns:
USA
China
India
Russia
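It's been a while since I worked with Oracle, but you can try ORDER BY countries_name = 'USA' DESC, countries_name ASC. This only applies where a boolean expression is allowed as a sort key, and DESC is needed there because false sorts before true.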
Correction
Sorry that didn't work. I had "countries_name" mis-typed as "country_name", so it may work now.
You could also use ORDER BY decode(countries_name, 'USA', 0, 1), countries_name ASC.
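The decode() form has a portable equivalent: a two-part sort key whose first part is a CASE expression. A minimal sketch follows, assuming Python's standard-library sqlite3 module as a stand-in for Oracle. The id 138 for USA comes from the question, while the other ids are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table get_countries (countries_id int, countries_name varchar(50))")
con.executemany("insert into get_countries values (?, ?)", [
    (138, 'USA'),    # id taken from the question
    (1, 'India'),    # the remaining ids are made up for this example
    (2, 'Russia'),
    (3, 'China'),
])

query = """
select countries_name
from get_countries
order by case when countries_id = 138 then 0 else 1 end,  -- USA first
         countries_name                                   -- then alphabetical
"""
names = [row[0] for row in con.execute(query)]
print(names)
```

Sorting on a numeric flag first avoids depending on how '0' compares with country names, and it makes the UNION in the original attempt unnecessary.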