postgres random using setseed - postgresql

I would like to add a column with a random number using setseed to a table.
The original table structure (test_input) col_a,col_b,col_c
Desired output (test_output) col_a, col_b, col_c, random_id
The following returns the same random_id on all rows instead of a different value in each row.
select col_a,col_b,col_c,setseed(0.5),(
select random() from generate_series(1,100) limit 1
) as random_id
from test_input
Could you help me modify the query that uses setseed and returns a different random_id in each row?

You have to use setseed differently. Also, generate_series() is misused in your example. You need something like:
select setseed(0.5);
select col_a,col_b,col_c, random() as random_id from test_input;
If you want the same random number assigned to the same row on every run, you have to sort the rows first. Quoting the documentation:
If the ORDER BY clause is specified, the returned rows are sorted in
the specified order. If ORDER BY is not given, the rows are returned
in whatever order the system finds fastest to produce.
You can use:
select setseed(0.5);
select *, random() as random_id from (
select col_a,col_b,col_c from test_input order by col_a, col_b, col_c) a;
Here I assume that the combination of col_a, col_b, col_c is unique. If that is not the case, you will first have to add another column with a unique ID to the table and sort by that column in the query above.
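Why the sort matters can be sketched outside the database. The snippet below is only a Python analogy, not PostgreSQL code: Python's seeded generator stands in for setseed()/random(), and the helper name assign_random_ids and the sample rows are made up for illustration. The numbers it produces will not match what PostgreSQL returns.

```python
import random


def assign_random_ids(rows, seed=0.5):
    """Attach a seeded random number to each row, in sorted row order.

    Python's generator stands in for PostgreSQL's setseed()/random() here;
    the actual numbers differ from what the database would produce.
    """
    rng = random.Random(seed)   # analogous to: select setseed(0.5);
    ordered = sorted(rows)      # analogous to: order by col_a, col_b, col_c
    return {row: rng.random() for row in ordered}


# Hypothetical rows (col_a, col_b, col_c); the combination is assumed unique.
rows = [(2, "b", "y"), (1, "a", "x"), (3, "c", "z")]

first = assign_random_ids(rows)
second = assign_random_ids(list(reversed(rows)))  # same rows, different input order
print(first == second)
```

Because the rows are sorted before any number is drawn, the order in which they arrive no longer affects which row gets which value.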

Related

Selecting an entry from PostgreSQL table based on time and id using psycopg2

I have the following table in PostgreSQL DB:
DB exempt
I need a PostgreSQL command to get a specific value from tbl column, based on time_launched and id columns. More precisely, I need to get a value from tbl column which corresponds to a specific id and latest (time-wise) value from time_launched column. Consequently, the request should return "x" as an output.
I've tried those requests (using psycopg2 module) but they did not work:
db_object.execute("SELECT * FROM check_ids WHERE id = %s AND MIN(time_launched)", (id_variable,))
db_object.execute(SELECT DISTINCT on(id, check_id) id, check_id, time_launched, tbl, tbl_1 FROM check_ids order by id, check_id time_launched desc)
Looks like a simple ORDER BY with a LIMIT 1 should do the trick:
SELECT tbl
FROM check_ids
WHERE id = %s
ORDER BY time_launched DESC
LIMIT 1
The WHERE clause filters results by the provided id, the ORDER BY clause sorts them in reverse chronological order, and LIMIT 1 returns only the first (most recent) row.
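For a quick way to try the query without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module instead of psycopg2. The sample rows and the helper name latest_tbl are assumptions for illustration; note that sqlite3 uses ? as the placeholder where psycopg2 uses %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE check_ids (id INTEGER, time_launched TEXT, tbl TEXT)")
# Hypothetical sample data; ISO timestamps sort correctly as text.
conn.executemany(
    "INSERT INTO check_ids VALUES (?, ?, ?)",
    [
        (1, "2021-01-01 10:00:00", "a"),
        (1, "2021-01-02 10:00:00", "x"),
        (2, "2021-01-03 10:00:00", "b"),
    ],
)


def latest_tbl(conn, id_variable):
    """Return tbl from the most recently launched row for the given id, or None."""
    row = conn.execute(
        # sqlite3 uses ? placeholders; psycopg2 would use %s here.
        "SELECT tbl FROM check_ids WHERE id = ? "
        "ORDER BY time_launched DESC LIMIT 1",
        (id_variable,),
    ).fetchone()
    return row[0] if row else None


print(latest_tbl(conn, 1))
```

With psycopg2 the same statement would go through cursor.execute with %s and the same one-element tuple of parameters.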

Tiebreaker criterion of the mode() in postgres

When using the mode() aggregation function, which tiebreaker criterion does the method use?
select mode() within group (order by my_field) FROM my_table
I couldn't find any documentation related to that
What happens if several values in the column occur the same number of times?
select my_field, count(*) FROM my_table group by 1
status  count
4096    24
4098    24
In this example above, I am getting 4096, but I would like to confirm if it actually gets the lowest result, or if this is happening for another reason
UPDATE:
I still don't know how to fix this so that it's not an arbitrary choice, for now I'm using another order by
select mode() within group (order by my_field) FROM my_table order by my_field
Per the docs, it is arbitrary:
mode () WITHIN GROUP ( ORDER BY anyelement ) → anyelement
Computes the mode, the most frequent value of the aggregated argument
(arbitrarily choosing the first one if there are multiple
equally-frequent values). The aggregated argument must be of a
sortable type.
https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE
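If the tie needs to be broken predictably, one option is to skip mode() and state the tiebreaker yourself: group by the value, order by the count descending, then by the value. The sketch below runs that query against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a server; the sample data mirrors the counts from the question plus one made-up rarer value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_field INTEGER)")
# Two values tied at 24 occurrences each, as in the question, plus a rarer one.
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(4098,)] * 24 + [(4096,)] * 24 + [(5000,)] * 3,
)

DETERMINISTIC_MODE = """
    SELECT my_field
    FROM my_table
    GROUP BY my_field
    ORDER BY count(*) DESC, my_field ASC  -- ties go to the smallest value
    LIMIT 1
"""

mode_value = conn.execute(DETERMINISTIC_MODE).fetchone()[0]
print(mode_value)
```

Swapping ASC for DESC in the second sort key would make ties go to the largest value instead.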

Selecting random value from a table with multiple entries to insert into another table in hive

I need to select a random value from the above table whenever a number has multiple entries (for example 3333, 4444, 6666). Currently I am using the code below, which gives a biased final result.
insert into com_n3
select distinct number,min(district)
from com_n2
The result will contain more numbers with "A" as the district. I need an unbiased random way to select from multiple entries.
You can get a random record per number using the following query.
select number, district
from
(
select *, row_number() over (partition by number order by rand()) as rank
from
temp.com_n2
) a
where a.rank=1
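The same pattern can be tried locally. This is a sketch against SQLite (via Python's sqlite3) rather than Hive, so random() replaces rand() and the alias is rn instead of rank; it assumes an SQLite build with window functions (3.25 or later). The sample rows are invented.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE com_n2 (number INTEGER, district TEXT)")
# Hypothetical data: 3333 and 4444 each have two candidate districts.
conn.executemany(
    "INSERT INTO com_n2 VALUES (?, ?)",
    [(1111, "A"), (3333, "A"), (3333, "B"), (4444, "A"), (4444, "C")],
)

PICK_ONE = """
    SELECT number, district
    FROM (
        SELECT number, district,
               row_number() OVER (PARTITION BY number ORDER BY random()) AS rn
        FROM com_n2
    ) a
    WHERE a.rn = 1
"""


def pick_one_per_number(conn):
    """Return {number: district} with one randomly chosen district per number."""
    return dict(conn.execute(PICK_ONE).fetchall())


# Repeat the draw to see that number 3333 gets both "A" and "B" over time.
counts = Counter(pick_one_per_number(conn)[3333] for _ in range(200))
print(sorted(counts))
```

Each number keeps exactly one row, and because the numbering inside each partition is shuffled on every run, no district is systematically favoured the way min(district) favours "A".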

Postgres: Distinct but only for one column

I have a table in pgsql with names (more than 1 million rows), but it also contains many duplicates. I select 3 fields: id, name, metadata.
I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this in many steps to save some memory in my PHP script.
But how can I do that so it only gives me a list with no duplicate names?
For example [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important and must be unique in the list every time I do the select, and it must be random.
I tried GROUP BY name, but then it expects me to have id and metadata in the GROUP BY as well or in an aggregate function, and I don't want to have them filtered somehow.
Does anyone know how to fetch many columns but do a distinct on only one column?
To do a distinct on only one (or n) column(s):
select distinct on (name)
name, col1, col2
from names
This will return any of the rows containing the name. If you want to control which of the rows will be returned you need to order:
select distinct on (name)
name, col1, col2
from names
order by name, col1
This will return, for each name, the first row when ordered by col1.
distinct on:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
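To make the "first row of each set" rule concrete, here is a small pure-Python emulation of select distinct on (name) ... order by name, col1. It only illustrates the semantics described in the quote and is not how PostgreSQL implements it; the function name distinct_on_name and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on_name(rows):
    """Emulate: select distinct on (name) ... order by name, col1.

    rows are (name, col1, col2) tuples; after sorting by (name, col1),
    only the first row of each name group is kept.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))  # order by name, col1
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


# Hypothetical sample data.
names = [
    ("Michael Fox", 2, "1989-02-23"),
    ("Ann Lee", 5, "1990-01-01"),
    ("Michael Fox", 1, "2003-03-03"),
]
print(distinct_on_name(names))
```

Without the sort step, which row comes first in each group would depend on input order, which is the unpredictability the documentation warns about.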
Does anyone know how to fetch many columns but do a distinct on only one column?
You want the DISTINCT ON clause.
You didn't provide sample data or a complete query so I don't have anything to show you. You want to write something like:
SELECT DISTINCT ON (name) fields, id, name, metadata FROM the_table;
This will return an unpredictable (but not "random") set of rows. If you want to make it predictable, add an ORDER BY per Clodaldo's answer. If you want the row kept for each name to be truly random, you'll want to order by random() after the DISTINCT ON expression, i.e. ORDER BY name, random().
To do a distinct on n columns:
select distinct on (col1, col2) col1, col2, col3, col4 from names
SELECT NAME,MAX(ID) as ID,MAX(METADATA) as METADATA
from SOMETABLE
GROUP BY NAME

ROWID equivalent in postgres 9.2

Is there any way to get the rowid of a record in Postgres?
In Oracle I can use something like
SELECT MAX(BILLS.ROWID) FROM BILLS
Yes, there is the ctid column, which is the equivalent of rowid. But it is useless for you: rowid and ctid are physical row/tuple identifiers, so they can change after a rebuild/vacuum.
See: Chapter 5. Data Definition > 5.4. System Columns
The PostgreSQL row_number() window function can be used for most purposes where you would use rowid. Whereas in Oracle the rowid is an intrinsic numbering of the result data rows, in Postgres row_number() computes a numbering within a logical ordering of the returned data. Normally if you want to number the rows, it means you expect them in a particular order, so you would specify which column(s) to order the rows when numbering them:
select client_name, row_number() over (order by date) from bills;
If you just want the rows numbered arbitrarily you can leave the over clause empty:
select client_name, row_number() over () from bills;
If you want to calculate an aggregate over the row number you'll have to use a subquery:
select max(rownum) from (
select row_number() over () as rownum from bills
) r;
If all you need is the last item from a table, and you have a column to sort sequentially, there's a simpler approach than using row_number(). Just reverse the sort order and select the first item:
select * from bills
order by date desc limit 1;
Use a Sequence. You can choose 4 or 8 byte values.
http://www.neilconway.org/docs/sequences/
Add any unique column to your table (named rowid, perhaps).
Then prevent it from being changed by creating a BEFORE UPDATE trigger that raises an exception if someone tries to update it.
You may populate this column with a sequence, as @JohnMudd mentioned.
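The idea can be sketched with Python's built-in sqlite3 module. This is SQLite, not PostgreSQL: AUTOINCREMENT stands in for a sequence, and the trigger uses SQLite's RAISE(ABORT, ...) where PostgreSQL would need a trigger function that raises an exception. The column name row_id, the trigger name, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- AUTOINCREMENT plays the role of a PostgreSQL sequence in this sketch.
    CREATE TABLE bills (
        row_id INTEGER PRIMARY KEY AUTOINCREMENT,
        client_name TEXT
    );

    -- Reject any attempt to change row_id once the row exists.
    CREATE TRIGGER bills_row_id_immutable
    BEFORE UPDATE OF row_id ON bills
    BEGIN
        SELECT RAISE(ABORT, 'row_id is immutable');
    END;
    """
)
conn.executemany("INSERT INTO bills (client_name) VALUES (?)", [("a",), ("b",)])


def try_change_row_id(conn):
    """Return True if the update was blocked by the trigger."""
    try:
        conn.execute("UPDATE bills SET row_id = 99 WHERE client_name = 'a'")
    except sqlite3.DatabaseError:
        return True
    return False


print(try_change_row_id(conn))
print(conn.execute("SELECT max(row_id) FROM bills").fetchone()[0])
```

Unlike ctid, a column like this survives rebuilds and vacuums, so max() over it is a stable way to find the most recently inserted row.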