Use Group By without Aggregate - postgresql

I have the data below:
UserId Val txt
100 10 A
200 25 B
100 30 GV
300 15 BHG
200 20 BGV
and want to write a query that returns the row with min(val) for each user.
Expected result:
100 10 A
200 20 BGV
300 15 BHG

Try this:
SELECT DISTINCT ON (userID) *
FROM your_table
ORDER BY userID, val;
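To see why this returns the whole row holding the minimum rather than just the minimum value, the same logic can be sketched outside the database. This is a plain-Python illustration of the idea, not PostgreSQL code, and the function name distinct_on_first is hypothetical: sort by (userID, val), then keep the first row of each userID.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (user_id, val, txt).
rows = [
    (100, 10, "A"),
    (200, 25, "B"),
    (100, 30, "GV"),
    (300, 15, "BHG"),
    (200, 20, "BGV"),
]


def distinct_on_first(rows):
    """Keep the first row per user_id after sorting by (user_id, val)."""
    # Equivalent of ORDER BY userID, val.
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Equivalent of DISTINCT ON (userID): first row of each user_id group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


result = distinct_on_first(rows)
for row in result:
    print(row)
```

If two rows of the same user tie on val, which one comes first depends on the sort order, so adding a further ORDER BY column is the usual way to make the choice deterministic.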

Related

Taking N-samples from each group in PostgreSQL

I have a table containing data that has a column named id that looks like below:
id  value 1  value 2  value 3
1   244      550      1000
1   251      551      700
1   540      60       1200
... ...      ...      ...
2   19       744      2000
2   10       903      100
2   44       231      600
2   120      910      1100
... ...      ...      ...
I want to take 50 sample rows for each id that exists; if a group has fewer than 50 rows, I want to take all of them.
For example, I would like at most 50 randomly selected data points for id = 1, for id = 2, and so on.
I could not find a previous question like this one. My attempt so far was to work through the logic: iterate over the ids, limit each per-id query to 50 rows, and UNION ALL the results:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But this clearly does not work: UNION ALL needs one branch per id, and I do not have a list of id values to substitute for X in tbl.id = X.
Is there a way to do this by gathering the list of unique id values and combining the results with UNION ALL, or is there a better way?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
    select *, row_number() over (partition by id order by random()) as u
    from schema.table
) as a
where u <= 50;
Example (limiting to 3, with a row number for each id so you can see that the selection is random):
Setup:
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection:
select * from (
    select *, row_number() over (partition by id order by random()) as u
    from foo
) as a
where u <= 3;
Output:
id  value1  idrow  u
1   542     6      1
1   24      86     2
1   155     74     3
2   505     95     1
2   100     46     2
2   422     33     3
3   966     88     1
3   747     89     2
3   664     19     3
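For readers who want to check the logic outside the database, here is a minimal Python sketch of the same idea: partition the rows by id, then pick up to n rows at random from each partition. It is an illustration only, not what PostgreSQL does internally; the function name sample_per_group and the toy data are assumptions.

```python
import random
from collections import defaultdict


def sample_per_group(rows, n, key=lambda row: row[0], rng=random):
    """Return up to n randomly chosen rows for each distinct key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)  # plays the role of PARTITION BY id
    sampled = []
    for group_id in sorted(groups):
        members = groups[group_id]
        # min() covers groups smaller than n: they are returned whole.
        sampled.extend(rng.sample(members, min(n, len(members))))
    return sampled


# Hypothetical data: id 1 has 100 rows, id 2 has only 2 rows.
rows = [(1, i) for i in range(100)] + [(2, i) for i in range(2)]
picked = sample_per_group(rows, 3)
print(picked)
```

Groups with fewer than n rows come back complete, which matches the "take the entire set" requirement in the question.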
If you want 50 rows (or fewer) from each group of ids, you can use a window function.
From the question: "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query:
with data as (
    select row_number() over (partition by id order by random()) as rn, *
    from table_name
)
select * from data where rn <= 50 order by id;
Fiddle.
What you describe, a UNION ALL whose branches are not all specified ahead of time, is essentially a LATERAL join, and that is one way to solve the problem. But unless you already have a table of all distinct ids, you have to compute that list on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.*
from uniq
cross join lateral (
    select * from test where test.id = uniq.id order by random() limit 3
) foo;
This could be slower or faster than the window function method, depending on your system, your data, and your indexes. In my hands, it was quite a bit faster, even with the need to compute the list of distinct ids dynamically.

Add condition to where clause in q/kdb+

Table Tab
minThreshold maxThreshold point
-------------------------------
1000         10000        10
wClause,:enlist((';~:;<);`qty;Tab[`minThreshold])
I am trying to incorporate the maxThreshold column into the where clause as well, so that both conditions hold:
qty >= minThreshold
qty <= maxThreshold
Something like:
wClause,:enlist((';~:;<);`qty;Tab[`minThreshold]);Tab[`maxThreshold])
q)Tab:([] minThreshold:500 1000;maxThreshold:700 2000;point:5 10)
q)Tab
minThreshold maxThreshold point
-------------------------------
500          700          5
1000         2000         10
q)select from Tab where minThreshold>=900,maxThreshold<=2500
minThreshold maxThreshold point
-------------------------------
1000         2000         10
q)parse"select from Tab where minThreshold>=900,maxThreshold<=2500"
?
`Tab
,(((';~:;<);`minThreshold;900);((';~:;>);`maxThreshold;2500))
0b
()
q)?[Tab;((>=;`minThreshold;900);(<=;`maxThreshold;2500));0b;()]
minThreshold maxThreshold point
-------------------------------
1000         2000         10
See the whitepaper for more information on functional selects:
https://code.kx.com/q/wp/parse-trees/
Is your problem that
(1) you have a Where phrase that works for functional qSQL and you want to extend it, or
(2) you want to select rows of a table where the value of a quantity falls within an upper and lower bound?
If (2) you can use Join Each to get the bounds for each row, and within to test the quantity.
q)show t:([]lwr:1000 900 150;upr:10000 25000 500;qty:10 1000 450)
lwr  upr   qty
---------------
1000 10000 10
900  25000 1000
150  500   450
q)select from t where qty within' lwr{x,y}'upr
lwr upr   qty
--------------
900 25000 1000
150 500   450
Above we use {x,y} because in qSQL queries comma does not denote Join.
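As a cross-check of the row-by-row test, here is the same selection sketched in plain Python (not q). It assumes the bounds are inclusive at both ends, which is how within behaves; the helper name within is reused here only for readability.

```python
# Same rows as table t above: (lwr, upr, qty).
t = [
    (1000, 10000, 10),
    (900, 25000, 1000),
    (150, 500, 450),
]


def within(value, lower, upper):
    """Inclusive range check, mirroring q's within."""
    return lower <= value <= upper


# Keep the rows whose qty lies between that row's own lwr and upr.
selected = [row for row in t if within(row[2], row[0], row[1])]
print(selected)
```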

how to get list from 1 to 10 and from 11 to 20 values (postgres)

SELECT * FROM users WHERE user_name = 'Andrew' ORDER BY age DESC
This is my query. The table has 3 columns:
user_name, age, id
The table contains 30 rows with the same name ('Andrew'). I want to order them by age as shown above and fetch rows 1 to 10, then 11 to 20, then 21 to 30. How can I do that?
/get-users/:from/:to
You may use LIMIT and OFFSET, e.g. to get records 11-20 you could do this:
SELECT *
FROM users
WHERE user_name = 'Andrew'
ORDER BY age DESC
OFFSET 10 LIMIT 10;
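If the :from and :to parts of the route are meant as 1-based, inclusive row positions (an assumption; the question does not say), they can be translated into OFFSET and LIMIT values with a little arithmetic. A minimal sketch, with a hypothetical helper name page_bounds:

```python
def page_bounds(first, last):
    """Translate 1-based inclusive positions into (offset, limit)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # Rows before `first` are skipped; the range length is the limit.
    return first - 1, last - first + 1


for first, last in [(1, 10), (11, 20), (21, 30)]:
    offset, limit = page_bounds(first, last)
    print(f"rows {first}-{last}: OFFSET {offset} LIMIT {limit}")
```

When several rows share the same age, the pages are only stable if the ORDER BY also includes a unique column such as id, for example ORDER BY age DESC, id.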

Postgresql Query for display of records every 45 days

I have a table that has data of user_id and the timestamp they joined.
If I need to display the data month-wise I could just use:
select
    count(user_id),
    date_trunc('month',(to_timestamp(users.timestamp))::timestamp)::date
from
    users
group by 2
date_trunc accepts units such as 'second', 'day' and 'week', so I can group the data by those periods.
How do I group the data by an "n-day" period, say 45 days?
Basically, I need to display the number of users per 45-day period.
Any suggestion or guidance appreciated!
Currently I get:
Date Users
2015-03-01 47
2015-04-01 72
2015-05-01 123
2015-06-01 132
2015-07-01 136
2015-08-01 166
2015-09-01 129
2015-10-01 189
I would like the data in 45-day intervals, something like:
Date Users
2015-03-01 85
2015-04-15 157
2015-05-30 192
2015-07-14 229
2015-08-28 210
2015-10-12 294
UPDATE:
I used the following to get the output, but one problem remains. I'm getting values that are offset.
with
new_window as (
    select
        generate_series as cohort
        , lag(generate_series, 1) over () as cohort_lag
    from
        (
            select
                *
            from
                generate_series('2015-03-01'::date, '2016-01-01', '45 day')
        )
        t
)
select
    --cohort
    cohort_lag -- This worked. !!!
    , count(*)
from
    new_window
    join users on
        user_timestamp <= cohort
        and user_timestamp > cohort_lag
group by 1
order by 1
But the output I am getting is:
Date Users
2015-04-15 85
2015-05-30 157
2015-07-14 193
2015-08-28 225
2015-10-12 210
Basically, the users displayed at 2015-03-01 should be the users who joined between 2015-03-01 and 2015-04-15, and so on.
But I seem to be getting the number of users up to each date, i.e. 85 users up to 2015-04-15, which is not the result I want.
Any help here?
Try this query :
SELECT to_char(i::date,'YYYY-MM-DD') as date, 0 as users
FROM generate_series('2015-03-01', '2015-11-30','45 day'::interval) as i;
OUTPUT :
date users
2015-03-01 0
2015-04-15 0
2015-05-30 0
2015-07-14 0
2015-08-28 0
2015-10-12 0
2015-11-26 0
This looks like a hot mess, and it might be better wrapped in a function where you could use some variables, but would something like this work?
with number_of_intervals as (
    select
        min (timestamp)::date as first_date,
        ceiling (extract (days from max (timestamp) - min (timestamp)) / 45)::int as num
    from users
),
intervals as (
    select
        generate_series(0, num - 1, 1) int_start,
        generate_series(1, num, 1) int_end
    from number_of_intervals
),
date_spans as (
    select
        n.first_date + 45 * i.int_start as interval_start,
        n.first_date + 45 * i.int_end as interval_end
    from
        number_of_intervals n
        cross join intervals i
)
select
    d.interval_start, count (*) as user_count
from
    users u
    join date_spans d on
        u.timestamp >= d.interval_start and
        u.timestamp < d.interval_end
group by
    d.interval_start
order by
    d.interval_start
With this sample data:
User Id timestamp derived range count
1 3/1/2015 3/1-4/15
2 3/26/2015 "
3 4/4/2015 "
4 4/6/2015 " (4)
5 5/6/2015 4/16-5/30
6 5/19/2015 " (2)
7 6/16/2015 5/31-7/14
8 6/27/2015 "
9 7/9/2015 " (3)
10 7/15/2015 7/15-8/28
11 8/8/2015 "
12 8/9/2015 "
13 8/22/2015 "
14 8/27/2015 " (5)
Here is the output:
2015-03-01 4
2015-04-15 2
2015-05-30 3
2015-07-14 5
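The bucketing arithmetic behind that output can be checked independently of the database. The sketch below is plain Python, not part of the query: it assigns each join date from the sample data to a 45-day bucket counted from the earliest date, using integer division on the day difference. The function name bucket_start is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Join dates from the sample data above.
joined = [
    date(2015, 3, 1), date(2015, 3, 26), date(2015, 4, 4), date(2015, 4, 6),
    date(2015, 5, 6), date(2015, 5, 19),
    date(2015, 6, 16), date(2015, 6, 27), date(2015, 7, 9),
    date(2015, 7, 15), date(2015, 8, 8), date(2015, 8, 9),
    date(2015, 8, 22), date(2015, 8, 27),
]


def bucket_start(day, first_day, width_days=45):
    """Start date of the fixed-width bucket that contains `day`."""
    # Buckets are half-open: [start, start + width_days).
    index = (day - first_day).days // width_days
    return first_day + timedelta(days=index * width_days)


first_day = min(joined)
counts = Counter(bucket_start(day, first_day) for day in joined)
for start in sorted(counts):
    print(start, counts[start])
```

Each count is labelled with the start of its interval, which is the labelling the question asks for, and the counts agree with the query output above (4, 2, 3, 5).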

T-SQL: How to use MIN

I have the following simple table, called TableA
ID SomeVal
1 10
1 20
1 30
2 40
2 50
3 60
I want to select only those rows where SomeVal is the smallest value for the same ID value. So my results should look like this:
1 10
2 40
3 60
I think I need Group By in my SQL but am not sure how. Thanks for your help.
SELECT ID, MIN(SomeVal)
FROM [TableName]
GROUP BY ID
Group by will perform the aggregate function (MIN) for every unique value that is grouped by, and return the result.
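To try the query against the sample data without a SQL Server instance, it can be run in an in-memory SQLite database from Python. This is only a stand-in sketch: SQLite is used as an assumption for convenience, and the ORDER BY is added here so the rows print in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER, SomeVal INTEGER)")
conn.executemany(
    "INSERT INTO TableA (ID, SomeVal) VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (2, 40), (2, 50), (3, 60)],
)

# One output row per distinct ID, carrying the smallest SomeVal of that group.
rows = conn.execute(
    "SELECT ID, MIN(SomeVal) FROM TableA GROUP BY ID ORDER BY ID"
).fetchall()
print(rows)
conn.close()
```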
I think this will do what you need:
Select ID, Min(SomeVal)
From MyTable
Group By ID