PostgreSQL: create random integers between 1 and 100, optionally NULL

I want to generate a table with 1000 rows of:
-- random int between `1-100` (including 1 and 100)
-- random int between `1-100`, and also NULLs
-- random float between `0-100` (including 0 and 100)
-- random float between `0-100`, and also NULLs
-- random Male (M) and Female (F), i.e. M/F values
-- random M/F values, including NULL/empty
-- random city names from a list (e.g. newyork, london, mumbai, dubai)
-- random city names from a list, including NULL/empty
Currently I know:
create table foo as
select random() as test
from generate_series(1,1000) s(i);
How can I do this?

You can use multiplication and type casts or CASE expressions.
To get an integer between 42 and 1001:
42 + CAST (floor(random() * 960) AS integer)
I cannot think of a way to generate a double precision value that includes the upper bound, but then you never need that with double precision.
To get m or f evenly distributed:
CASE WHEN random() < 0.5 THEN 'm' ELSE 'f' END
For the cities, select a random entry from a lookup table.
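Putting those pieces together, here is a sketch of the full 1000-row table (column names, the ~10% NULL rate, and the inline city list are illustrative; a real lookup table would be better for many cities):

```sql
CREATE TABLE foo AS
SELECT
  1 + floor(random() * 100)::integer AS rand_int,                    -- 1..100 inclusive
  CASE WHEN random() < 0.1 THEN NULL
       ELSE 1 + floor(random() * 100)::integer END AS rand_int_null, -- 1..100 or NULL
  random() * 100 AS rand_float,                                      -- [0, 100)
  CASE WHEN random() < 0.1 THEN NULL
       ELSE random() * 100 END AS rand_float_null,
  CASE WHEN random() < 0.5 THEN 'M' ELSE 'F' END AS gender,
  CASE WHEN random() < 0.1 THEN NULL
       WHEN random() < 0.5 THEN 'M' ELSE 'F' END AS gender_null,
  (ARRAY['newyork','london','mumbai','dubai'])[1 + floor(random() * 4)::integer] AS city,
  CASE WHEN random() < 0.1 THEN NULL
       ELSE (ARRAY['newyork','london','mumbai','dubai'])[1 + floor(random() * 4)::integer]
  END AS city_null
FROM generate_series(1, 1000) s(i);
```

As noted above, the float columns stop just short of 100 because random() returns a value in [0, 1).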

Related

Does postgres optimize same expressions in CASE

I want to extract the number from a text column of a database (for example, the text is 'Test 900 test')
select substring('Test 900 g' from '[0-9]+\.?[0-9]*')
The in-text number can be a float (from 0 to 1) or an integer (>100), and I want to cast it to a single float format with CASE the following way
select case when substring('Test 900 g' from '[0-9]+\.?[0-9]*')::numeric >= 100
then substring('Test 900 g' from '[0-9]+\.?[0-9]*')::numeric / 10000
else substring('Test 900 g' from '[0-9]+\.?[0-9]*')::numeric
end as single_format
Will Postgres recalculate the substring value every time in this CASE construction or will this expression be optimized and calculated only one time for each row?
That is not easy to answer for your query as written.
The expression should be calculated only once in the following query:
select case
when num >= 100
then num / 10000
else num
end as single_format
from CAST(substring('Test 900 g' from '[0-9]+\.?[0-9]*') AS numeric) AS num
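An equivalent and perhaps more familiar spelling moves the expression into a subquery, so it is still written (and evaluated) only once per row:

```sql
-- Same logic: extract the number once, then branch on it
SELECT CASE
         WHEN num >= 100 THEN num / 10000
         ELSE num
       END AS single_format
FROM (
  SELECT substring('Test 900 g' FROM '[0-9]+\.?[0-9]*')::numeric AS num
) AS s;
```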

numeric range data type postgresql

I have a strange situation in the design of my DB: the value of a field can be either a single integer or a range of numbers. To explain with an example:
the column age can be a single number (18) or a range (18-30). How can I represent this in PostgreSQL?
Thx!
An integer range can represent both a single integer value and a range. The single value:
select int4range(18,18,'[]');
int4range
-----------
[18,19)
The ")" in the result above means the upper bound is exclusive.
The range:
select int4range(18,30,'[]');
int4range
-----------
[18,31)
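As a sketch, a column of that type can then be queried with the containment operator (table and column names are illustrative):

```sql
CREATE TABLE person (name text, age int4range);

INSERT INTO person VALUES
  ('alice', int4range(18, 18, '[]')),  -- exactly 18
  ('bob',   int4range(18, 30, '[]'));  -- 18 to 30

-- @> tests whether the range contains a value
SELECT name FROM person WHERE age @> 21;  -- matches only bob
```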
There are a couple of different ways to do this:
-- Store a VARCHAR
-- Store two values, a lower bound and an upper bound
-- If there is only a select set of ranges, create a lookup table for that set and store a foreign key to it.
You can pack both into a bigger number, for example 18 x 1000 + 0 = 18000 for 18 and 18 x 1000 + 30 = 18030 for (18, 30).
When you retrieve it, compute first = number / 1000 (integer division) and second = number % 1000 (equivalently, number - first x 1000).
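A quick sketch of that encode/decode arithmetic in SQL (integer division and modulo recover both bounds):

```sql
SELECT 18 * 1000 + 30 AS packed,  -- 18030
       18030 / 1000   AS first,   -- 18 (integer division truncates)
       18030 % 1000   AS second;  -- 30
```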
You can also store them as a point http://www.postgresql.org/docs/9.4/static/datatype-geometric.html#AEN6730.

Divide records into groups - quick solution

I need to use an UPDATE command to divide rows (selected from a subselect) in a PostgreSQL table into groups, identified by an integer value in one of the columns. These groups should all be the same size. The source table contains billions of records.
For example, I need to divide 213 selected rows into groups of 50 records each. The result will be:
1 - 50. row => 1
51 - 100. row => 2
101 - 150. row => 3
151 - 200. row => 4
201 - 213. row => 5
There is no problem doing this with a loop (or with PostgreSQL window functions), but I need it to be very efficient and quick. I can't use the id sequence because there may be gaps in the ids.
I had the idea of using a random integer generator and setting it as the default value for a row, but that is not usable when I need to adjust the group size.
The query below should display 213 rows with a group-number from 0-4. Just add 1 if you want 1-5
SELECT i, (row_number() OVER () - 1) / 50 AS grp
FROM generate_series(1001,1213) i
ORDER BY i;
create temporary sequence s minvalue 0 start with 0;
select *, nextval('s') / 50 grp
from t;
drop sequence s;
I think it has the potential to be faster than the row_number version by @Richard, but the difference may not matter depending on the specifics.
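Since the question asks for an UPDATE, one hedged sketch (assuming a table t with an id primary key and an integer grp column) joins the numbered rows back to the table:

```sql
-- Assign 1-based group numbers in blocks of 50, ordered by id
UPDATE t
SET grp = g.grp
FROM (
  SELECT id,
         (row_number() OVER (ORDER BY id) - 1) / 50 + 1 AS grp
  FROM t
) AS g
WHERE t.id = g.id;
```

Replace the inner SELECT with your subselect if only a subset of rows should be grouped.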

Postgresql - VALUE between two columns

I have a long list of six digit numbers (e.g. 123456)
In my postgresql DB I have a table with two columns start_value and end_value. The table has rows with start and end values which are 9 digits in length and represent a range of numbers i.e. start_value might be 123450000 and end_value might be 123459999.
I need to match each of the six-digit numbers with the row in the DB table whose range it falls in.
For many numbers in my list I can simply run the following
SELECT * FROM table WHERE start_value=(number + 000)
However, this does not cover numbers which fall inside a range, but do not match this pattern.
I have been trying statements such as:
SELECT * FROM table WHERE start_value > (number + 000) AND end_value < (number + 999)
But this doesn't work because some rows cover larger ranges than xxxxx0000 to xxxxx9999 and so the statement above may return 20 rows or none.
Any pointers would be most welcome!
EDIT: the Data Type of the columns are numeric(25)
Assuming number is numeric:
select *
from table
where number * 1000 between start_value and end_value
Ok, so if I'm understanding correctly, first you need to pad your search value to 9 digits. You can do that with this - 12345 * (10 ^ (9 - length(12345::text))).
length(12345::text) gets the number of digits you currently have, then it subtracts that from 9 and multiplies your search value by 10 to the power of the result. Then you just throw it in your search. The resulting query looks something like this -
SELECT * FROM table WHERE (12345 * (10 ^ (9 - length(12345::text)))) > start_value AND (12345 * (10 ^ (9 - length(12345::text)))) < end_value
You could also use the BETWEEN operator, but it is inclusive, which doesn't match the example query you have.
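A small worked sketch of the multiply-and-compare approach (table contents are illustrative):

```sql
CREATE TABLE ranges (start_value numeric(25), end_value numeric(25));
INSERT INTO ranges VALUES (123450000, 123459999), (500000000, 599999999);

-- 123456 is six digits, so multiplying by 1000 pads it to nine
SELECT * FROM ranges
WHERE 123456 * 1000 BETWEEN start_value AND end_value;
-- matches the (123450000, 123459999) row
```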
POSTGRESQL
Sometimes we get stuck on data type casting problems and null value exceptions.
SELECT *
FROM TABLE
WHERE COALESCE(number::int8, 0::int8) * 1000 BETWEEN start_value::int8 AND end_value::int8
;
number::int8 casts number to a bigint; start_value::int8 and end_value::int8 do the same for the bounds.
COALESCE(number::int8, 0::int8) returns number, or zero if the value is NULL, to avoid exceptions.

Generate a random number of non-duplicated random numbers in [1, 1001] through a loop

I need to generate a random number of non-duplicated random numbers in plpgsql. The numbers shall fall in the range [1, 1001]. However, the code below generates numbers exceeding 1001.
directed2number := trunc(Random()*7+1);
counter := directed2number
while counter > 0
loop
to_point := trunc((random() * 1/directed2number - counter/directed2number + 1) * 1001 +1);
...
...
counter := counter - 1;
end loop;
If I understand right
You need a random number (1 to 8) of random numbers.
The random numbers span 1 to 1001.
The random numbers need to be unique. None shall appear more than once.
CREATE OR REPLACE FUNCTION x.unique_rand_1001()
RETURNS SETOF integer AS
$body$
DECLARE
nrnr int := trunc(random()*7+1); -- number of numbers
BEGIN
RETURN QUERY
SELECT (1000 * random())::integer + 1
FROM generate_series(1, nrnr*2)
GROUP BY 1
LIMIT nrnr;
END;
$body$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT x.unique_rand_1001();
Numbers are made unique by the GROUP BY. I generate twice as many numbers as needed to provide enough in case duplicates are removed. With the given dimensions of the task (max. 8 out of 1001 numbers) it is astronomically unlikely that not enough numbers remain. Worst case scenario: fewer numbers are returned.
I wouldn't approach the problem that way in PostgreSQL.
From a software engineering point of view, I think I'd separate generating a random integer between x and y, generating 'n' of those integers, and guaranteeing the result is a set.
-- Returns a random integer in the interval [n, m].
-- Not rigorously tested. For rigorous testing, see Knuth, TAOCP vol 2.
CREATE OR REPLACE FUNCTION random_integer(integer, integer)
RETURNS integer AS
$BODY$
select cast(floor(random()*($2 - $1 +1)) + $1 as integer);
$BODY$
LANGUAGE sql VOLATILE;
Then to select a single random integer between 1 and 1000,
select random_integer(1, 1000);
To select 100 random integers between 1 and 1000,
select random_integer(1, 1000)
from generate_series(1,100);
You can guarantee uniqueness in either application code or in the database. Ruby implements a Set class. Other languages have similar capabilities under various names.
One way to do this in the database uses a local temporary table. Erwin's right about the need to generate more integers than you need, to compensate for the removal of duplicates. This code generates 20, and selects the first 8 rows in the order they were inserted.
create local temp table unique_integers (
id serial primary key,
n integer unique
);
insert into unique_integers (n)
select random_integer(1, 1001) n
from generate_series(1, 20)
on conflict (n) do nothing;
select n
from unique_integers
order by id
fetch first 8 rows only;