How to find the number of values in a column that are less than each particular value in PostgreSQL?

I have a column which looks something like this:

Quantity
20
40
10
25

For each value, I need the total number of values in that column which are less than it, like:

Quantity  Value
20        1
40        3
10        0
25        2

Join the table to itself on values less than the current value:
select a.quantity, count(distinct b.id) as value
from mytable a
left join mytable b on b.quantity < a.quantity
group by a.quantity;
Selecting count(distinct b.id) handles non-unique quantities, and also the lowest value: that row has no rows to join to, so the join returns NULL, which count() does not count.
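For comparison, the same counts can come from a window function instead of a self-join. A minimal sketch, assuming the same hypothetical mytable (note it returns one row per input row rather than one per distinct quantity):

-- rank() counts peers together, so it equals 1 + the number of rows
-- with a strictly smaller quantity; subtracting 1 gives the count we want.
select quantity,
       rank() over (order by quantity) - 1 as value
from mytable;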

Taking N-samples from each group in PostgreSQL

I have a table with a column named id; the data looks like this:
id   value 1  value 2  value 3
1    244      550      1000
1    251      551      700
1    540      60       1200
...  ...      ...      ...
2    19       744      2000
2    10       903      100
2    44       231      600
2    120      910      1100
...  ...      ...      ...
I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points.
For example I would like a maximum 50 data points randomly selected from id = 1, id = 2 etc...
I cannot find any similar previous questions, but I have taken a stab at logically working through a solution where I iterate over the ids, union all the queries, and limit each to 50:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But it's obvious that this type of solution cannot work: UNION ALL requires writing out a branch for each id, and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering that list of unique id values and unioning all the results, or is there a more optimal way to do this?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select *
from (
    select *, row_number() over (partition by id order by random()) as u
    from schema.table
) as a
where u <= 50;
Example (limiting to 3, with a row number within each id so you can see the randomness of the selection):

Setup:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
    id int,
    value1 int,
    idrow int
);

INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection:

select *
from (
    select *, row_number() over (partition by id order by random()) as u
    from foo
) as a
where u <= 3;
Output:

id  value1  idrow  u
1   542     6      1
1   24      86     2
1   155     74     3
2   505     95     1
2   100     46     2
2   422     33     3
3   966     88     1
3   747     89     2
3   664     19     3
In case you are looking to get 50 (or fewer) rows from each group of ids, you can use windowing.
From the question: "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query:

with data as (
    select row_number() over (partition by id order by random()) as rn,
           *
    from table_name
)
select * from data where rn <= 50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.*
from uniq
cross join lateral
    (select * from test where test.id = uniq.id order by random() limit 3) foo;
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.
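Adapted to the question's actual target of 50 rows per id, the same LATERAL pattern would look roughly like this (a sketch; schema.table is the placeholder name from the question):

-- One subquery execution per distinct id, each returning at most
-- 50 randomly ordered rows for that id.
with uniq as (select distinct id from schema.table)
select s.*
from uniq
cross join lateral
    (select *
     from schema.table t
     where t.id = uniq.id
     order by random()
     limit 50) s;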

PostgreSQL SELECT COUNT returning a bunch of 1s

The following query returns the correct number of rows of nameids that I am looking for (75). But when I put COUNT(DISTINCT nameid) at the top instead, it returns 145 rows of 1s rather than the row count of my query (75). It just says:
1
1
1
..
1
(145 rows)
What am I doing wrong?
SELECT DISTINCT nameid
FROM shop
WHERE yearid >= 2000
GROUP BY nameid, yearid
HAVING SUM(spend) > 98;
You should not use the same column in GROUP BY and inside the aggregate function; that way you only ever obtain 1, because the distinct count of a value grouped by that same value is 1.
If you want to count the DISTINCT nameid for each year with SUM(spend) > 98, you should use:
SELECT yearid, COUNT(DISTINCT nameid)
FROM shop
WHERE yearid >= 2000
GROUP BY yearid
HAVING SUM(spend) > 98;
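If the goal is the single number itself (how many nameids the original query returns, i.e. 75), another option is to wrap that query and count its rows. A sketch, reusing the query from the question:

SELECT COUNT(*)
FROM (SELECT DISTINCT nameid
      FROM shop
      WHERE yearid >= 2000
      GROUP BY nameid, yearid
      HAVING SUM(spend) > 98) AS matching;  -- "matching" is just a subquery alias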

How can I fetch the next n rows after a particular column value in PostgreSQL?

I have a result set from which I want to get next n rows (or previous n rows) after (before) the row that matches a particular cell value.
So for example, here is my data:
A  B   C
1  10  100
2  20  200
3  30  300
4  40  400
5  50  500
6  60  600
I want to get the next 3 rows after the row where C=300, including the C=300 row itself, so my output should be:

A  B   C
3  30  300
4  40  400
5  50  500
6  60  600
With FETCH and OFFSET I would need to know the exact position of the row, and I would first have to search for where the condition (C=300) resides, so I cannot assume it will be the 3rd row. My base query is:

select *
from table
order by C asc;
Assuming you've got a table named sample, you could use a nested query and window functions to solve your issue, something like:
select *
from (
    select *, lag(c, 3) over (order by c asc) as three_back
    from sample
    where sample.c >= 300
) t
where coalesce(three_back, 300) = 300;
If your rows are ordered by the column value you are interested in then
SELECT *
FROM table_name
WHERE column_name >= x
ORDER BY column_name
LIMIT n
should do it. If not you’ll have to get creative
If your column values are unique and you want to order by another value then
SELECT *
FROM table_name
WHERE other_column >= (
    SELECT other_column
    FROM table_name
    WHERE column_value = x
)
ORDER BY other_column
LIMIT n
If your column values are not unique, you can use SELECT MIN(other_column) in the inner select. This finds the first occurrence (ordering by the other column) and then retrieves it plus the next n - 1 rows.
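A more general sketch that avoids both assumptions: number the rows once with row_number(), find the row number where the condition matches, and take that row plus the next n - 1 (here n = 4, and sample is the table name assumed in the first answer; this presumes exactly one row has c = 300):

with numbered as (
    select *, row_number() over (order by c) as rn
    from sample
)
select a, b, c
from numbered
-- keep the matching row and the 3 rows that follow it
where rn between (select rn from numbered where c = 300)
             and (select rn from numbered where c = 300) + 3;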

PGSQL duplicate record in same column

I have a table and I want to know where duplicate records are present in the same columns. These are my columns, and I want to get the records where group_id or week differ for the same code, fweek and newcode:
Id  newcode  fweek    code  group_id  week
1   343001   2016-01  343   100       8
2   343002   2016-01  343   100       8
3   343001   2016-01  343   101       08

The required record is:

Id  newcode  fweek    code  group_id  week
3   343001   2016-01  343   101       08
To find the duplicate values I have joined the table to itself, and we group the results by code, fweek and newcode so that we get every duplicate row if more than one exists. I have used max() to get the last inserted row.
IS DISTINCT FROM is just inequality that also treats NULL as a comparable value; if you don't need to compare NULLs, use the <> operator instead.
select r.*
from your_table r
where r.id in (select max(r.id)
               from your_table r
               join your_table r2
                 on r2.code = r.code
                and r2.fweek = r.fweek
                and r2.newcode = r.newcode
               where r2.group_id is distinct from r.group_id
                  or r2.week is distinct from r.week
               group by r.code, r.fweek, r.newcode
               having count(*) > 1);
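A shorter formulation of the same idea, as a sketch against the same hypothetical your_table: keep the highest id of every (code, fweek, newcode) group in which group_id or week takes more than one value. Note that count(distinct ...) ignores NULLs, so NULL-versus-value differences are handled slightly differently than with IS DISTINCT FROM:

select *
from your_table
where id in (select max(id)
             from your_table
             group by code, fweek, newcode
             -- more than one distinct group_id or week in the group
             having count(distinct group_id) > 1
                 or count(distinct week) > 1);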

PostgreSQL difference between rows

My data:
id  value
1   10
1   20
1   60
2   10
3   10
3   30
How to compute column 'change'?
id  value  change  | my comment: how to compute it
1   10     10      | 20-10
1   20     40      | 60-20
1   60     40      | default_value-60; in this example default_value=100
2   10     90      | default_value-10
3   10     20      | 30-10
3   30     70      | default_value-30
In other words: if the row is the last one for its id, compute 100 - value; otherwise compute next_value - current_value.
You can access the value of the "next" (or "previous") row using a window function. The concept of a "next" row only makes sense if you have a column to define an order on the rows. You said you have a date column on which you can order the result. I used the column name your_date_column for this. You need to replace that with the actual column name of course.
select id,
       value,
       lead(value, 1, 100) over (partition by id order by your_date_column) - value as change
from the_table
order by id, your_date_column;
lead(value, 1, 100) says: take the column value of the "next" row (that's the 1). If there is no such row, use the default value 100 instead.
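A quick, self-contained way to try this out: the sample data below comes from the question, with a made-up ordering column standing in for your_date_column:

with the_table (id, value, your_date_column) as (
    values (1, 10, 1), (1, 20, 2), (1, 60, 3),
           (2, 10, 1),
           (3, 10, 1), (3, 30, 2)
)
select id,
       value,
       -- next value within the id, defaulting to 100 on the last row
       lead(value, 1, 100) over (partition by id order by your_date_column) - value as change
from the_table
order by id, your_date_column;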
Join on a subquery and use ROW_NUMBER to find the last value per group
WITH cte AS (
    SELECT id, value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn,
           LEAD(value) OVER (PARTITION BY id ORDER BY date) - value AS change
    FROM t
)
SELECT cte.id, cte.value,
       CASE WHEN cte.change IS NULL THEN 100 - cte.value ELSE cte.change END AS change
FROM cte
LEFT JOIN (SELECT id, MAX(rn) AS mrn
           FROM cte
           GROUP BY id) AS x
    ON x.mrn = cte.rn AND cte.id = x.id;