I use a PostgreSQL database and have a cards table.
Each record (card) in this table has an integer card_drop_rate value.
For example:
id | card_name | card_drop_rate
---+-----------+---------------
1  | card1     | 34
2  | card2     | 16
3  | card3     | 54
The max drop rate is 34 + 16 + 54 = 104.
According to my application logic, I need to pick a random value between 0 and 104 and then retrieve the card that matches this number, for example:
random value: 71
card1 range: 0 - 34 (0 + 34)
card2 range: 34 - 50 (34 + 16)
card3 range: 50 - 104 (50 + 54)
So my card is card3, because 71 falls in the range 50 - 104.
What is the proper way to reflect this structure in PostgreSQL? I'll need to query this data often, so performance is the number one criterion for this solution.
The following query works fine:
SELECT
b.id,
b.card_drop_rate
FROM (SELECT a.id, sum(a.card_drop_rate) OVER(ORDER BY id) - a.card_drop_rate as rate, card_drop_rate FROM cards as a) b
WHERE b.rate < 299 ORDER BY id DESC LIMIT 1
You can do this using cumulative sums and random(). With the half-open ranges from the question (0 - 34, 34 - 50, 50 - 104), no "+ 1" adjustments are needed; it is something like this:
with c as (
    select c.*,
           -- lower bound of each card's range: running total minus its own rate
           sum(card_drop_rate) over (order by id) - card_drop_rate as threshold
    from cards c
),
r as (
    -- one random value in [0, total drop rate)
    select random() * sum(card_drop_rate) as which_card
    from cards
)
select c.*
from c cross join
     r
where r.which_card >= c.threshold
order by c.threshold desc
limit 1;
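To sanity-check the cumulative-range logic outside the database, here is a minimal Python sketch. It is not PostgreSQL code; the names CARDS, pick_card and random_card are made up for illustration, and the ranges are assumed to be half-open, so 34 belongs to card2.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Sample data from the question: (card_name, card_drop_rate).
CARDS = [("card1", 34), ("card2", 16), ("card3", 54)]


def pick_card(cards, value):
    """Return the name of the card whose range [low, high) contains value."""
    # Running totals are the upper bounds of the ranges: [34, 50, 104].
    upper_bounds = list(accumulate(rate for _, rate in cards))
    if not 0 <= value < upper_bounds[-1]:
        raise ValueError("value must be in [0, total drop rate)")
    # bisect_right gives the index of the first upper bound greater than value.
    return cards[bisect_right(upper_bounds, value)][0]


def random_card(cards):
    """Draw a uniform value in [0, total) and map it to a card."""
    total = sum(rate for _, rate in cards)
    return pick_card(cards, random.random() * total)


print(pick_card(CARDS, 71))  # card3, as in the question's example
```

The binary search over running totals mirrors what the SQL does with the highest threshold not exceeding the random value.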
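For performance, I would simply take the cards and generate a new table with 104 slots (slotid 0 to 103), one slot per unit of drop rate. Assign each card to its slots and build an index on the slot number. Then get a value using the query below. The scalar subquery makes random() evaluate once instead of once per row, so the index can be used:
select s.*
from slots s
where s.slotid = (select floor(random() * 104)::int);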
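The slot idea can be sketched in Python as well. This is only an illustration of the lookup-table layout, assuming one slot per unit of drop rate; build_slots and random_card_from_slots are hypothetical names, and the list index plays the role of the indexed slotid column.

```python
import random

# Sample data from the question: (card_name, card_drop_rate).
CARDS = [("card1", 34), ("card2", 16), ("card3", 54)]


def build_slots(cards):
    """Expand each card into as many consecutive slots as its drop rate."""
    slots = []
    for name, rate in cards:
        slots.extend([name] * rate)
    return slots


SLOTS = build_slots(CARDS)


def random_card_from_slots(slots):
    """Pick one slot uniformly; the position acts like slotid."""
    return slots[random.randrange(len(slots))]


print(len(SLOTS), SLOTS[71])  # 104 card3
```

The trade-off is space for speed: the table grows with the total drop rate, but each draw is a single indexed lookup.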
I have a table containing data that has a column named id that looks like below:
id   value 1   value 2   value 3
1    244       550       1000
1    251       551       700
1    540       60        1200
...  ...       ...       ...
2    19        744       2000
2    10        903       100
2    44        231       600
2    120       910       1100
...  ...       ...       ...
I want to take 50 sample rows per id, but if fewer than 50 rows exist for a group, simply take the entire set of data points.
For example, I would like a maximum of 50 data points randomly selected from id = 1, id = 2, etc.
I cannot find any previous questions similar to this, but I have tried to at least work through the solution logically: iterate over the ids and UNION ALL the per-id queries, each limited to 50:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But it's obvious that you cannot use this type of solution because UNION ALL requires aggregating outputs from one id to the next and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering that list of unique id values and union all results or is there a more optimal way this could be done?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
select *, row_number() over (partition by id order by random()) as u
from schema.table
) as a
where u <= 50;
Example (limiting to 3, and some row number for each id so you can see the selection randomness):
setup
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection
select * from (
select *, row_number() over (partition by id order by random()) as u
from foo
) as a
where u <= 3;
Output:
id   value1   idrow   u
1    542      6       1
1    24       86      2
1    155      74      3
2    505      95      1
2    100      46      2
2    422      33      3
3    966      88      1
3    747      89      2
3    664      19      3
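For comparison, the same "up to n random rows per id" rule can be written in plain Python. This is a sketch of the logic only, not of how PostgreSQL executes the window function; sample_per_group and the ROWS data are invented for illustration.

```python
import random
from collections import defaultdict


def sample_per_group(rows, key, n):
    """Return up to n randomly chosen rows for every distinct value of key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sampled = []
    for group_key in sorted(groups):
        members = groups[group_key]
        # Groups with fewer than n rows are returned in full.
        sampled.extend(random.sample(members, min(n, len(members))))
    return sampled


# Hypothetical data: id 1 has 100 rows, id 2 has only 2.
ROWS = [{"id": 1, "idrow": i} for i in range(1, 101)]
ROWS += [{"id": 2, "idrow": i} for i in range(1, 3)]

picked = sample_per_group(ROWS, "id", 3)
print([row["id"] for row in picked])  # [1, 1, 1, 2, 2]
```

Taking min(n, group size) corresponds to the u <= 50 filter: small groups simply never reach the cutoff.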
If you want 50 (or fewer) rows from each group of ids, you can use a window function:
From question - "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query -
with data as (
select row_number() over (partition by id order by random()) rn,
* from table_name)
select * from data where rn<=50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.* from uniq cross join lateral
(select * from test where test.id=uniq.id order by random() limit 3) foo
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.
I have a large SQL statement (PostgreSQL version 11) with many CTEs. I want to use the results of an intermediate CTE to create a pivoted set of results and join it with another CTE.
Below is a small part of my query; the CTE "previous_months_actual_sales" is the one I need to pivot.
,last_24 as
(
SELECT l_24m::DATE + (interval '1' month * generate_series(0,24)) as last_24m
FROM last_24_month_start LIMIT 24
)
,previous_months_actual_sales as
(
SELECT TO_CHAR(created_at,'YYYY-MM') as dates
,b.code,SUM(quantity) as qty
FROM base b
INNER JOIN products_sold ps ON ps.code=b.code
WHERE TO_CHAR(created_at,'YYYY-MM')
IN(SELECT TO_CHAR(last_24m,'YYYY-MM') FROM last_24)
GROUP BY b.code,TO_CHAR(created_at,'YYYY-MM')
)
SELECT * FROM previous_months_actual_sales
The result of the CTE "previous_months_actual_sales" is shown below:
dates code qty
"2018-04" "0009" 23
"2018-05" "0009" 77
"2018-06" "0008" 44
"2018-07" "0008" 1
"2018-08" "0009" 89
The expected output based on the above result is:
code     2018-04  2018-05  2018-06  2018-07  2018-08
"0009"   23       77                         89
"0008"                     44       1
Is there a way to achieve this?
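One common way to get this shape is conditional aggregation: one SUM(CASE ...) column per month, grouped by code. The sketch below is an assumption-laden illustration, not a tested PostgreSQL answer: it uses Python's built-in sqlite3 module as a stand-in database so it is self-contained, it loads the sample rows into a plain table named after the CTE, and it hard-codes the month list (MONTHS), because the output columns have to be known when the query text is built.

```python
import sqlite3

# Sample rows from the question: (dates, code, qty).
ROWS = [
    ("2018-04", "0009", 23),
    ("2018-05", "0009", 77),
    ("2018-06", "0008", 44),
    ("2018-07", "0008", 1),
    ("2018-08", "0009", 89),
]
# Assumption: the months to pivot on are known up front.
MONTHS = ["2018-04", "2018-05", "2018-06", "2018-07", "2018-08"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE previous_months_actual_sales (dates TEXT, code TEXT, qty INTEGER)"
)
conn.executemany("INSERT INTO previous_months_actual_sales VALUES (?, ?, ?)", ROWS)

# One conditional aggregate per month; months without sales stay NULL.
columns = ",\n       ".join(
    "SUM(CASE WHEN dates = '{m}' THEN qty END) AS \"{m}\"".format(m=m)
    for m in MONTHS
)
PIVOT_SQL = (
    "SELECT code,\n       " + columns + "\n"
    "FROM previous_months_actual_sales\n"
    "GROUP BY code\n"
    "ORDER BY code DESC"
)

pivoted = conn.execute(PIVOT_SQL).fetchall()
for row in pivoted:
    print(row)
```

The generated SELECT uses only standard SQL, so the same statement shape should carry over to the end of the CTE chain; the month list would still have to be supplied by the application.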
I have a result set from which I want to get the next n rows (or previous n rows) after (before) the row that matches a particular cell value.
So for example, here is my data:
A B C
1 10 100
2 20 200
3 30 300
4 40 400
5 50 500
6 60 600
I want the next 3 rows after the row where C=300, including the C=300 row itself, so my output should be:
A B C
3 30 300
4 40 400
5 50 500
6 60 600
With FETCH and OFFSET you need to know the exact position of the row. Here I have to search for the row where the condition (C=300) holds, so I cannot assume that it will be the 3rd row.
select *
from table
order by C asc
Assuming you've got a table named sample, you could use a nested query and window functions to solve your issue, something like:
select *
from (
select *, lag(c,3) over(order by c asc) as three_back
from sample
where sample.c >= 300
) t
where coalesce(three_back,300) = 300
If your rows are ordered by the column value you are interested in then
SELECT *
FROM table_name
WHERE column_name >= x
ORDER BY column_name
LIMIT n
should do it. If not, you'll have to get creative.
If your column values are unique and you want to order by another column, then
SELECT *
FROM table_name
WHERE other_column >= (
SELECT other_column
FROM table_name
WHERE column_name = x
)
ORDER BY other_column
LIMIT n
If your column values are not unique, you can use
SELECT MIN(other_column)
in the inner select. This finds the first occurrence (using the other column to order by) and then retrieves that row plus the next (n - 1) rows.
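The subquery pattern can be tried against the question's sample data. This sketch uses Python's built-in sqlite3 module as a stand-in database (an assumption made so it runs without a server); the table name sample, the lowercase column names and the helper rows_from_match are illustrative, with column a used for ordering and column c for the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (a INTEGER, b INTEGER, c INTEGER)")
# Sample data from the question: (1, 10, 100) ... (6, 60, 600).
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(i, i * 10, i * 100) for i in range(1, 7)],
)


def rows_from_match(conn, match_value, extra_rows):
    """Return the first row where c = match_value plus the next extra_rows rows."""
    sql = """
        SELECT a, b, c
        FROM sample
        WHERE a >= (SELECT MIN(a) FROM sample WHERE c = ?)
        ORDER BY a
        LIMIT ?
    """
    # LIMIT counts the matching row itself, hence extra_rows + 1.
    return conn.execute(sql, (match_value, extra_rows + 1)).fetchall()


result = rows_from_match(conn, 300, 3)
print(result)  # [(3, 30, 300), (4, 40, 400), (5, 50, 500), (6, 60, 600)]
```

If no row matches, MIN returns NULL, the comparison is never true, and the result is empty.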
Apologies if this has been answered elsewhere, I'm afraid I need a little more clarification/brushing up on the UPDATE FROM clause in PostgreSQL.
Basically I have a temporary table with some intermediary computed stuff that I want to use to update the main table. This temporary table includes two foreign keys and a score, such as:
score fk_offer fk_searchprofile
65 1764 12345
...
I tested the rows to be updated with a select (the table temp_offerids_with_score contains the offers that need to be updated):
SELECT s.pkid, tmp.fk_offer, s.fk_category, tmp.score, tmp.fk_searchprofile
FROM
temp_weighted_scores_offers AS tmp
INNER JOIN sc_sp_o_c_score AS s
ON tmp.fk_offer = s.fk_offer
WHERE
tmp.fk_offer IN (SELECT fk_offer FROM temp_offerids_with_score)
AND
s.fk_category = 1
AND s.fk_searchprofile = 12345;
This correctly returns the expected number of rows (in this case 10):
pkid fk_offer fk_category score fk_searchprofile
1 47 1 78 12345
2 137 1 64 12345
3 247 1 50 12345
...
However, if I use the same in an UPDATE FROM:
UPDATE sc_sp_o_c_score
SET score = tmp.score
FROM
temp_weighted_scores_offers AS tmp
INNER JOIN sc_sp_o_c_score AS s
ON tmp.fk_offer = s.fk_offer
WHERE
tmp.fk_offer IN (SELECT fk_offer FROM temp_offerids_with_score)
AND
s.fk_category = 1
AND s.fk_searchprofile = 12345;
the whole table, over 32000 rows, gets updated with the same (wrong, of course) score overall.
pkid fk_offer fk_searchprofile fk_category score
1 47 12345 1 104
2 137 12345 1 104
3 247 12345 1 104
What am I missing?
Thanks, Julian
EDIT: just in case this could be of any help - for the record, I'm migrating things from SQL Server here, where this is in fact a valid construct.
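You are also joining the table to be updated to itself (through the reference to sc_sp_o_c_score in the FROM clause). In PostgreSQL, that FROM-clause copy s is a separate instance that is not tied to the rows being updated, so every row of the target matches and gets the same score. Take the self-join out and refer to the target table directly, and you should be good. The columns are qualified because fk_offer and fk_searchprofile exist in both tables:
UPDATE sc_sp_o_c_score AS s
SET score = tmp.score
FROM temp_weighted_scores_offers AS tmp
WHERE tmp.fk_offer = s.fk_offer
AND tmp.fk_offer IN (SELECT fk_offer FROM temp_offerids_with_score)
AND s.fk_category = 1
AND s.fk_searchprofile = 12345;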
Hi, I have the following table:
Person   Score   Score_type
1        30      A
1        35      A
1        15      B
1        16      B
2        74      A
2        68      A
2        40      B
2        39      B
For each person and score type, I want to pick out the maximum score to obtain a table like:
Person   Score   Score_type
1        35      A
1        16      B
2        74      A
2        40      B
I can do this using multiple select statements, but this will be cumbersome, especially later on, so I was wondering if there is a function which can help me do this. I have used the partition function before, but only to label sequences in a table.
select person,
score_type,
max(score) as score
from scores
group by person, score_type
order by person, score_type;
With "partition function" I guess you mean window functions. They can indeed be used for this as well:
select person,
score_type,
score
from (
select person,
score_type,
score,
row_number() over (partition by person, score_type order by score desc) as rn
from scores
) t
where rn = 1
order by person, score_type;
Using the max() aggregate function along with the grouping by person and score_type should do the trick.
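As a quick check of the GROUP BY approach against the question's data, here is a small sketch that runs the aggregate query through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, so the example needs no server), and the table name scores follows the answer's query.

```python
import sqlite3

# Sample data from the question: (person, score, score_type).
SCORES = [
    (1, 30, "A"), (1, 35, "A"), (1, 15, "B"), (1, 16, "B"),
    (2, 74, "A"), (2, 68, "A"), (2, 40, "B"), (2, 39, "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (person INTEGER, score INTEGER, score_type TEXT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", SCORES)

# One output row per (person, score_type) pair, carrying the highest score.
best = conn.execute(
    """
    SELECT person, score_type, MAX(score) AS score
    FROM scores
    GROUP BY person, score_type
    ORDER BY person, score_type
    """
).fetchall()

print(best)  # [(1, 'A', 35), (1, 'B', 16), (2, 'A', 74), (2, 'B', 40)]
```

The aggregate form is enough when only the grouped columns and the maximum are needed; the row_number() variant is useful when other columns of the winning row must come along.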