Usage of DISTINCT in reversed int pairs duplicates elimination - postgresql

I have a following question:
create table memorization_word_translation
(
id serial not null
from_word_id integer not null
to_word_id integer not null
);
This table stores pairs of integers, that are often in reverse order, for example:
35 36
35 37
36 35
37 35
37 39
39 37
Question is - if I make a query, for example:
select * from memorization_word_translation
where from_word_id = 35 or to_word_id = 35
I would get
35 36
35 37
36 35 - duplicate of 35 36
37 35 - duplicate of 35 37
How is to use DISTINCT in this example to filter out all duplicates even if they are reversed?
I want to keep it only like this:
35 36
35 37

You can do it with ROW_NUMBER() window function:
select from_word_id, to_word_id
from (
select *,
row_number() over (
partition by least(from_word_id, to_word_id),
greatest(from_word_id, to_word_id)
order by (from_word_id > to_word_id)::int
) rn
from memorization_word_translation
where 35 in (from_word_id, to_word_id)
) t
where rn = 1
See the demo.

demo:db<>fiddle
You could try a it with a small sorting algorithm (here a comparison) in combination with DISTINCT ON.
The DISTINCT ON clause works an arbitrary columns or terms, e.g. on a tuple. This CASE clause sorts the two columns into tuples and removes tied (ordered) ones. The source columns can be returned in your SELECT statement:
select distinct on (
CASE
WHEN (from_word_id >= to_word_id) THEN (from_word_id, to_word_id)
ELSE (to_word_id, from_word_id)
END
)
*
from memorization_word_translation
where from_word_id = 35 or to_word_id = 35

Related

Trying to partition to remove rows where two columns don't match sql

How can I filter out rows within a group that do not have matching values in two columns?
I have a table A like:
CODE
US_ID
US_PRICE
NON_US_ID
NON_US_PRICE
5109
57
10
75
10
0206
85
11
58
11
0206
85
15
33
14
0206
85
41
22
70
T100
20
10
49
NULL
T100
20
38
64
38
Within each CODE group, I want to check whether US_PRICE = NON_US_PRICE and remove that row from the resulting table.
I tried:
SELECT *,
CASE WHEN US_PRICE != NON_US_PRICE OVER (PARTITION BY CODE) END
FROM A;
but I think I am missing something when I try to partition by CODE.
I want the resulting table to look like
CODE
US_ID
US_PRICE
NON_US_ID
NON_US_PRICE
0206
85
15
33
14
0206
85
41
22
70
T100
20
10
49
NULL
For provided sample, simple WHERE clause could produce such result:
SELECT *
FROM A
WHERE US_PRICE IS DISTINCT FROM NON_US_PRICE;
IS DISTINCT FROM handles NULLs comparing to != operator.

How do I adds a substring to every column in kdb

My current columns in kdb is (Time;Buy;Sell). What should I do to change my column names to (Time_hist;Buy_hist;Sell_hist)?
Thank you!
Following expression will append "_hist" to all column names
(`$(string cols t),\:"_hist") xcol t
where t is table.
string cols t - retrieves all column names and converts them to strings
(string cols t),\:"_hist" appends "_hist" to each column name on the left
colnames xcol t renames table column names. See xcol for more details
You can use xcol to rename columns in kdb:
q)tab:([] Time:(.z.t-10;.z.t-5;.z.t);Buy:23 35 42;Sell:22 33 40)
q)tab
Time Buy Sell
---------------------
15:51:50.746 23 22
15:51:50.751 35 33
15:51:50.756 42 40
q)`Time_hist`Buy_hist`Sell_hist xcol tab
Time_hist Buy_hist Sell_hist
-------------------------------
15:51:50.746 23 22
15:51:50.751 35 33
15:51:50.756 42 40
More documentation can be found at:
https://code.kx.com/q/ref/cols/

Postgres with recursive - result to include both grandparent and grandchildren

My family tree contains children and parents. I can write a query that returns the grandchildren for a given person using postgres with recursive. But how can I include the grandparent in the results? My target is to report the numbers of grandchildren, grouped by grandparent id. How do I refer to the non-recursive part of the query in the final results?
edited - example table:
child parent
11 null
12 null
13 null
21 11
22 11
31 12
32 12
33 12
41 13
81 21
82 21
83 21
91 33
92 33
non-recursive part of query:
select distinct child where parent is null -- as grandparent
desired result:
grandparent, grandchildren
11 3
12 2
13 0
this will do:
with recursive zzz AS (
select child AS grandparent, child AS current_child, 1 AS grand_level
FROM thetable AS tt
where parent is null
UNION
SELECT grandparent, tt.child, grand_level+1
FROM thetable AS tt
JOIN zzz
ON tt.parent = zzz.current_child
)
SELECT grandparent, COUNT(DISTINCT current_child)FILTER(WHERE grand_level = 3) AS grandchildren
FROM zzz
GROUP BY grandparent;

Ordering row having same values in two columns on top

I have data similar to one given below:
ID UserID PlayerID Name
1 56 21 A
2 57 34 B
3 77 77 C
4 65 23 D
5 77 77 E
I want the rows with same value in UserID and PlayerID column to be at the top.
I have currently done this:
select * from tblTest
order by abs(UserID - PlayerID ) asc
Any better way to achieve this result?
Try this
SELECT * From tblTest
Order By Case When UserID = PlayerID Then 0 Else 1 End

Subselect and Max

Alright, I've been trying to conceptualize this for a better part of the afternoon and still cannot figure out how to structure this subselect.
The data that I need to report are ages for a given student major grouped by the past 3 fiscal years. Each fiscal year has 3 semesters (summer, fall, spring). I need to have my query grouped on the fiscalyear and agerange fields and then count the distinct student id's.
I currently have this for my SQL statement:
Select COUNT(distinct StuID), AgeRange, FiscalYear
from tblStatic
where Campus like 'World%' and (enrl_act like 'REG%' or enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by FiscalYear, AgeRange
order by FiscalYear, AgeRange
So this is all fine and dandy except it doesn't match my headcount of students for the fiscalyear. The reason being, that people may cross over in the age ranges during the fiscal year and is adding them to my count twice.
How can I use a subselect to resolve this duplicate entry? The field I have been trying to get working is my semester field and using a max to find the max semester during a fiscalyear for a given student.
Data Sample:
Count AgeRange FiscalYear
3 1 to 19 09/10
20 20 to 23 09/10
60 24 to 29 09/10
96 30 to 39 09/10
34 40 to 49 09/10
14 50 to 59 09/10
3 60+ 09/10
2 1 to 19 10/11
24 20 to 23 10/11
73 24 to 29 10/11
109 30 to 39 10/11
43 40 to 49 10/11
11 50 to 59 10/11
2 60+ 10/11
1 1 to 19 11/12
17 20 to 23 11/12
75 24 to 29 11/12
123 30 to 39 11/12
44 40 to 49 11/12
14 50 to 59 11/12
2 60+ 11/12
Solution: (Just got this working and produced my headcounts that match what they are suppose to be)
Select COUNT(distinct S.StuID), AR.AgeRange, S.FiscalYear
from tblStatic S
INNER JOIN
( Select S.StuID, MIN(AgeRange) as AgeRange
From tblStatic S
Group By S.StuID) AR on S.StuID=AR.StuID
where Campus like 'World%' and (enrl_act like 'REG%' or
enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by S.FiscalYear, AR.AgeRange
order by S.FiscalYear, AR.AgeRange
Replace each student's age range with its maximum (or minimum, if you like) age range that fiscal year, then count them:
;
WITH sourceData AS (
SELECT
StudID,
MaxAgeRangeThisFiscalYear = MAX(AgeRange) OVER
(PARTITION BY StudID, FiscalYear),
FiscalYear
FROM tblStatic
WHERE Campus LIKE 'World%'
AND (enrl_act LIKE 'REG%' OR enrl_act LIKE 'SCH%')
AND StuMaj = 'LAWSC'
AND FiscalYear IN ('09/10', '10/11', '11/12')
)
SELECT
FiscalYear,
AgeRange = MaxAgeRangeThisFiscalYear,
Count = COUNT(DISTINCT StudID)
FROM sourceData
GROUP BY
FiscalYear,
MaxAgeRangeThisFiscalYear
ORDER BY
FiscalYear,
MaxAgeRangeThisFiscalYear