I know how to insert rows with random numbers into a table using a single SELECT. But how can I do this so that each iteration of a WHILE loop produces a different result?
The code I'm using looks like this. The values within one set differ from row to row, but the sets of values are the same for every iteration.
WHILE some condition is true
BEGIN
    DECLARE @GamesRandomlySorted TABLE
    (
        RandomSortId INT,
        GameId INT
    )
    INSERT INTO @GamesRandomlySorted (RandomSortId, GameId)
    SELECT Checksum(NewId()), GameId
    FROM Games
    SELECT *
    FROM @GamesRandomlySorted
    ORDER BY RandomSortId
END
Why don't you use Rand instead? According to Microsoft, if you don't specify seed as a parameter, you will get different values each time you run the script.
Also see this article - it shows how to combine NewId and Rand.
Say I have a table like posts, which has typical columns like id, body, created_at. I'd like to generate a unique string with the creation of each post, for use in something like a url shortener. So maybe a 10-character alphanumeric string. It needs to be unique within the table, just like a primary key.
Ideally there would be a way for Postgres to handle both of these concerns:
generate the string
ensure its uniqueness
And they must go hand-in-hand, because my goal is to not have to worry about any uniqueness-enforcing code in my application.
I don't claim the following is efficient, but it is how we have done this sort of thing in the past.
CREATE FUNCTION make_uid() RETURNS text AS $$
DECLARE
    new_uid text;
    done bool;
BEGIN
    done := false;
    -- keep generating candidates until one is not yet present in my_table
    WHILE NOT done LOOP
        new_uid := md5(''||now()::text||random()::text);
        done := NOT exists(SELECT 1 FROM my_table WHERE uid=new_uid);
    END LOOP;
    RETURN new_uid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;
make_uid() can be used as the default for a column in my_table. Something like:
ALTER TABLE my_table ADD COLUMN uid text NOT NULL DEFAULT make_uid();
md5(''||now()::text||random()::text) can be adjusted to taste. You could consider encode(...,'base64') except some of the characters used in base-64 are not URL friendly.
All existing answers are WRONG because they rely on a SELECT to check uniqueness while generating the code for each record. Suppose we need a unique code per record at insert time, and two concurrent INSERTs happen at the same moment (which happens more often than you think). Both can generate the same code, because at the moment of each SELECT that code did not yet exist in the table. One INSERT will succeed and the other will fail.
First, let us create a table with a code field and add a unique index:
CREATE TABLE my_table
(
code TEXT NOT NULL
);
CREATE UNIQUE INDEX ON my_table (lower(code));
Then we need a function or procedure (the same code can also go inside a trigger) in which we 1. generate a new code, 2. try to insert a new record with that code, and 3. if the insert fails, try again from step 1:
CREATE OR REPLACE PROCEDURE my_table_insert()
AS $$
DECLARE
    new_code TEXT;
BEGIN
    LOOP
        new_code := LOWER(SUBSTRING(MD5(''||NOW()::TEXT||RANDOM()::TEXT) FOR 8));
        BEGIN
            INSERT INTO my_table (code) VALUES (new_code);
            EXIT;  -- insert succeeded, leave the loop
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- code already taken: loop again and generate a new one
        END;
    END LOOP;
END;
$$ LANGUAGE PLPGSQL;
Unlike the SELECT-based solutions in this thread, this one cannot fail on a concurrent duplicate: the unique index rejects the collision and the loop retries with a new code.
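The same insert-and-retry idea can be sketched outside the database. This is a minimal illustration in Python, using an in-memory SQLite table as a stand-in for PostgreSQL; the function name `insert_with_unique_code`, the `max_attempts` limit and the use of `secrets.token_hex` are assumptions made for the example, not part of the answer above.

```python
import secrets
import sqlite3


def insert_with_unique_code(conn, make_code=lambda: secrets.token_hex(4), max_attempts=10):
    """Insert a row with a fresh code, retrying when the unique constraint rejects it."""
    for _ in range(max_attempts):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back if the INSERT raises
                conn.execute("INSERT INTO my_table (code) VALUES (?)", (code,))
            return code
        except sqlite3.IntegrityError:
            continue  # duplicate code: generate another one and try again
    raise RuntimeError("no unique code found after %d attempts" % max_attempts)


# Stand-in table: the UNIQUE constraint plays the role of the unique index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (code TEXT NOT NULL UNIQUE)")

new_code = insert_with_unique_code(conn)  # an 8-character hex string
```

The uniqueness check and the insert are a single step here, so there is no window between "check" and "write" for a concurrent session to slip into.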
Use a Feistel network. This technique works efficiently to generate unique random-looking strings in constant time without any collision.
For a version with about 2 billion possible strings (2^31) of 6 letters, see this answer.
For a 63-bit version based on bigint (9223372036854775808 distinct possible values), see this other answer.
You may change the round function as explained in the first answer to introduce a secret element to have your own series of strings (not guessable).
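To show why a Feistel network never collides, here is a small sketch in Python. It is not the code from the linked answers: the round function, the `key` parameter and the names `feistel_permute` and `feistel_invert` are assumptions chosen for illustration. Any deterministic round function gives a one-to-one mapping, which is the property that matters.

```python
def _round(half, key, i):
    # Hypothetical round function: any deterministic 16-bit function works.
    return ((half * 1366 + 150889 + key + i) % 714025) & 0xFFFF


def feistel_permute(value, rounds=3, key=0):
    """Map a 32-bit integer to another 32-bit integer, one-to-one."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value must fit in 32 bits")
    left, right = value >> 16, value & 0xFFFF
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right


def feistel_invert(value, rounds=3, key=0):
    """Undo feistel_permute by running the rounds backwards."""
    left, right = value >> 16, value & 0xFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right


# Feeding consecutive sequence values in yields distinct, scattered outputs.
scrambled = [feistel_permute(n, key=42) for n in range(1, 6)]
```

Because the mapping is invertible, feeding it the values of a sequence gives unique outputs without any lookup in the table; changing `key` here plays the role of the "secret element" mentioned above.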
Probably the easiest way is to use a sequence to guarantee uniqueness
(and append a fixed-width random number of x digits after the sequence value):
CREATE SEQUENCE test_seq;
CREATE TABLE test_table (
id bigint NOT NULL DEFAULT (nextval('test_seq')::text || (LPAD(floor(random()*100000000)::text, 8, '0')))::bigint,
txt TEXT
);
insert into test_table (txt) values ('1');
insert into test_table (txt) values ('2');
select id, txt from test_table;
However, this wastes a huge share of the key space. (Note: the maximum bigint is 9223372036854775807, so with an 8-digit random number at the end you can have only about 92 billion records (9223372036854775807 / 10^8 ≈ 92233720368). Eight digits are probably not necessary, though. Also check the maximum number for your programming environment!)
Alternatively, you can use varchar for the id and convert the above number with to_hex(), or to base36 as described below (but for base36, try not to expose it to customers, to avoid some funny string showing up!):
PostgreSQL: Is there a function that will convert a base-10 int into a base-36 string?
Check out a blog by Bruce. This gets you part way there. You will have to make sure it doesn't already exist. Maybe concat the primary key to it?
Generating Random Data Via Sql
"Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:"
SELECT
(
SELECT string_agg(x, '')
FROM (
SELECT chr(ascii('a') + floor(random() * 26)::integer)
FROM generate_series(1, 40 + b * 0)
) AS y(x)
)
FROM generate_series(1,5) as a(b);
Use the primary key in your data. If you really need a unique alphanumeric string, you can use base-36 encoding. In PostgreSQL you can use this function.
Example:
select base36_encode(generate_series(1000000000,1000000010));
GJDGXS
GJDGXT
GJDGXU
GJDGXV
GJDGXW
GJDGXX
GJDGXY
GJDGXZ
GJDGY0
GJDGY1
GJDGY2
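For readers who want to check the encoding outside the database, here is a minimal Python sketch of base-36 encoding. The function name mirrors the `base36_encode` call above, but the body is an illustration written for this example, not the linked PostgreSQL function.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 then A-Z, 36 symbols


def base36_encode(number):
    """Encode a non-negative integer as an upper-case base-36 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


print(base36_encode(1000000000))  # GJDGXS
```

Python's built-in `int("GJDGXS", 36)` performs the reverse conversion.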
I have a simple question. Suppose we have a table:
id  A    B
1   Jon  Doe
2   Foo  Bar
Is there a way to know what the next id will be, in this case 3?
The database is PostgreSQL!
Thanks a lot!
If you want to claim an ID and return it, you can use nextval(), which advances the sequence without inserting any data.
Note that if this is a SERIAL column, you need to find the sequence's name based on the table and column name, as follows:
Select nextval(pg_get_serial_sequence('my_table', 'id')) as new_id;
There is no cast-iron guarantee that you'll see these IDs come back in order (the sequence generates them in order, but multiple sessions can claim an ID and not use it yet, or roll back an INSERT and the ID will not be reused) but there is a guarantee that they will be unique, which is normally the important thing.
If you do this often without actually using the ID, you will eventually use up all the possible values of a 32-bit integer column (i.e. reach the maximum representable integer), but if you use it only when there's a high chance you will actually be inserting a row with that ID it should be OK.
To get the current value of a sequence without affecting it or needing a previous insert in the same session, you can use:
SELECT last_value FROM tablename_fieldname_seq;
An SQLfiddle to test with.
Of course, getting the current value will not guarantee that the next value you'll get is actually last_value + 1 if there are other simultaneous sessions doing inserts, since another session may have taken the serial value before you.
SELECT currval('names_id_seq') + 1;
See the docs
However, of course, there's no guarantee that it's going to be your next value. What if another client grabs it before you? You can, though, reserve one of the next values for yourself by selecting nextval from the sequence.
I'm new, so here's the process I use, having little to no prior knowledge of how Postgres/SQL work:
1. Find the sequence for your table using pg_get_serial_sequence()
SELECT pg_get_serial_sequence('person','id');
This should output something like public.person_id_seq. person_id_seq is the sequence for your table.
2. Plug the sequence from (1) into nextval()
SELECT nextval('person_id_seq');
This will output an integer value, which is the next id from the sequence. Note that calling nextval() claims that value, so a later insert that relies on the column default gets the value after it.
You can turn this into a single command as mentioned in the accepted answer above
SELECT nextval(pg_get_serial_sequence('person','id'));
If you notice that the sequence is returning unexpected values, you can set the current value of the sequence using setval()
SELECT setval(pg_get_serial_sequence('person','id'),1000);
In this example, the next call to nextval() will return 1001.
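The behaviour described in this thread (values are claimed whether or not they are used, and setval() moves the counter) can be pictured with a toy model. This Python class is only an illustration of those semantics; `ToySequence` is a made-up name and it does not model sessions, caching or any other real PostgreSQL detail.

```python
class ToySequence:
    """Toy counter mimicking nextval()/setval() as described above."""

    def __init__(self, start=1):
        self._next = start

    def nextval(self):
        # Claiming a value is permanent: it is never handed out again.
        value = self._next
        self._next += 1
        return value

    def setval(self, value):
        # After setval(n), the following nextval() returns n + 1.
        self._next = value + 1
        return value


seq = ToySequence()
claimed = [seq.nextval() for _ in range(3)]  # three sessions claim 1, 2, 3
inserted = [claimed[0], claimed[2]]          # the session holding 2 rolled back: a gap
seq.setval(1000)
after_setval = seq.nextval()                 # 1001
```

The ids that reach the table are unique but not necessarily contiguous, which is why peeking at the "next" value is only a hint.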
If there's a queue of work to do in a table that is periodically polled by a number of different worker clients, what's the best way to prevent two workers from getting the same item to work on?
Say a table like: ItemId, LastAttemptDateTime, AttemptCount, and various item details.
There is an index on LastAttemptDateTime, sorted in ascending order, and various clients query the table to grab an item to work on.
I use a stored procedure in MS SQL to do this, something like:
CREATE PROCEDURE GetNextQueueItem AS
SET NOCOUNT ON
DECLARE @ItemId INT
UPDATE myqueue SET @ItemId=ItemId, AttemptCount=AttemptCount+1, LastAttemptDateTime=GetDate()
WHERE ItemId=(SELECT TOP 1 ItemId
              FROM myqueue
              ORDER BY LastAttemptDateTime ASC)
SELECT ItemId, AttemptCount, and various item detail fields
FROM myqueue
WHERE ItemId = @ItemId
I'm fairly new to PostgreSQL and was wondering if there are alternative approaches available. (The TOP 1 will change to LIMIT 1.)
A PostgreSQL equivalent could look like this:
CREATE OR REPLACE FUNCTION get_next_queue_item()
RETURNS SETOF myqueue AS
$BODY$
BEGIN
RETURN QUERY
UPDATE myqueue
SET attempt_count = attempt_count + 1
,last_attempt_ts = now()
WHERE item_id = (
SELECT item_id
FROM myqueue
ORDER BY last_attempt_ts
LIMIT 1
)
RETURNING myqueue.*;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Major points
You only need 1 statement to do it all. UPDATE can return the updated row in the same command with the RETURNING clause.
The state of the returned row is post-update. There are ways to get the pre-update state if needed.
No need for any variables.
I changed all identifiers to lower case, which is the cleanest style in PostgreSQL.
I renamed your column LastAttemptDateTime to last_attempt_ts (ts for "timestamp", because that's the name of the timestamp / datetime type in Postgres).
As you mentioned yourself, LIMIT 1 instead of TOP 1.
I use RETURNS SETOF myqueue as return type.
myqueue is the associated row-type of the table myqueue - for every table or view a row-type of the same name is automatically created in PostgreSQL.
This declaration allows for multiple rows to be returned, but LIMIT 1 guarantees that it will only ever be one.
This return type allows for RETURN QUERY to return the resulting row directly without any intermediate step. Fast, clean.
Actually, you don't need a plpgsql function at all. You can do it with a simple SQL statement:
UPDATE myqueue
SET attempt_count = attempt_count + 1
,last_attempt_ts = now()
WHERE item_id = (
SELECT item_id
FROM myqueue
ORDER BY last_attempt_ts
LIMIT 1
)
RETURNING myqueue.*;
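The claim-one-item pattern can be tried out locally with a small Python sketch. It uses an in-memory SQLite database rather than PostgreSQL, and a select-then-update inside one write transaction instead of UPDATE ... RETURNING, so treat it as an illustration of the idea under those assumptions; `claim_next_item` is a name invented for this example.

```python
import sqlite3


def claim_next_item(conn):
    """Claim the least recently attempted item; return (item_id, attempt_count) or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT item_id FROM myqueue ORDER BY last_attempt_ts, item_id LIMIT 1"
        ).fetchone()
        claimed = None
        if row is not None:
            conn.execute(
                "UPDATE myqueue SET attempt_count = attempt_count + 1, "
                "last_attempt_ts = strftime('%Y-%m-%d %H:%M:%f', 'now') "
                "WHERE item_id = ?",
                row,
            )
            claimed = conn.execute(
                "SELECT item_id, attempt_count FROM myqueue WHERE item_id = ?", row
            ).fetchone()
        conn.execute("COMMIT")
        return claimed
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the function manage transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE myqueue (item_id INTEGER PRIMARY KEY, "
    "last_attempt_ts TEXT NOT NULL, attempt_count INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO myqueue (item_id, last_attempt_ts) VALUES (?, ?)",
    [(1, "2000-01-01 00:00:00.000"), (2, "2000-01-02 00:00:00.000"),
     (3, "2000-01-03 00:00:00.000")],
)

first = claim_next_item(conn)  # the oldest item, with its attempt count bumped
```

Each claim pushes the item to the back of the queue by refreshing its timestamp, so successive calls walk through the items in order of staleness.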
Since PostgreSQL sequences exist separately from the identity columns they increment, and can be used for other things, one nice way to do this is to have one sequence set the id on the table and another sequence hand out the items:
1. Look at the currval of the second sequence. If it's higher than or equal to the max id of the table, there are no items waiting.
2. Obtain nextval. If there is no item with a matching id, loop back to step 1 (this can happen if an insert to the table failed).
3. Obtain the row with the matching id.
This isn't the only way to skin this cat (and not the way I've used with other databases), but it has the advantage of being light on writes to the database (altering only the sequence, not the table).
We have an Oracle application that uses a standard pattern to populate surrogate keys. We have a series of extrinsic rows (that have specific values for the surrogate keys) and other rows that have intrinsic values.
We use the following Oracle trigger snippet to determine what to do with the Surrogate key on insert:
IF :NEW.SurrogateKey IS NULL THEN
SELECT SurrogateKey_SEQ.NEXTVAL INTO :NEW.SurrogateKey FROM DUAL;
END IF;
If the supplied surrogate key is null then get a value from the nominated sequence, else pass the supplied surrogate key through to the row.
I can't seem to find an easy way to do this in T-SQL. There are all sorts of approaches, but none of them use the notion of a sequence generator like Oracle and other SQL-92 compliant DBs do.
Anybody know of a really efficient way to do this in SQL Server T-SQL? By the way, we're using SQL Server 2008 if that's any help.
You may want to look at IDENTITY. This gives you a column for which the value will be determined when you insert the row.
This may mean that you have to insert the row, and determine the value afterwards, using SCOPE_IDENTITY().
There is also an article on simulating Oracle Sequences in SQL Server here: http://www.sqlmag.com/Articles/ArticleID/46900/46900.html?Ad=1
Identity is one approach, although it will generate unique identifiers at a per table level.
Another approach is to use unique identifiers, in particular NEWSEQUENTIALID(), which ensures the generated id is always bigger than the last. The problem with this approach is that you are no longer dealing with integers.
The closest way to emulate the Oracle method is to have a separate table with a counter field, and then write a stored procedure (a user-defined function cannot modify table data in SQL Server) that reads this field, increments it, and returns the value.
Here is a way to do it using a table to store your last sequence number. The stored proc is very simple, most of the stuff in there is because I'm lazy and don't like surprises should I forget something so...here it is:
----- Create the sequence value table.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[SequenceTbl]
(
    [CurrentValue] [bigint]
) ON [PRIMARY]
GO
----------------- Create the stored procedure.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE procedure [dbo].[sp_NextInSequence](@SkipCount BigInt = 1)
AS
BEGIN
    BEGIN TRANSACTION
    DECLARE @NextInSequence BigInt;
    IF NOT EXISTS
    (
        SELECT
            CurrentValue
        FROM
            SequenceTbl
    )
        INSERT INTO SequenceTbl (CurrentValue) VALUES (0);
    SELECT TOP 1
        @NextInSequence = ISNULL(CurrentValue, 0) + 1
    FROM
        SequenceTbl WITH (HoldLock);
    UPDATE SequenceTbl WITH (UPDLOCK)
    SET CurrentValue = @NextInSequence + (@SkipCount - 1);
    COMMIT TRANSACTION
    RETURN @NextInSequence
END;
GO
-------- Use the stored procedure in SQL Manager to retrieve a test value.
declare @NextInSequence BigInt
exec @NextInSequence = sp_NextInSequence;
--exec @NextInSequence = sp_NextInSequence <skipcount>;
select NextInSequence = @NextInSequence;
----- Show the current table value.
select * from SequenceTbl;
The astute will notice that there is an (optional) parameter for the stored proc. This allows the caller to reserve a block of IDs when it has more than one record that needs a unique id: using the SkipCount, the caller need make only a single call for however many IDs are needed.
The entire "IF NOT EXISTS...INSERT INTO..." block can be removed if you remember to insert a record when the table is created. If you also remember to insert that record with a value (your seed value, a number which will never be used as an ID), you can also remove the ISNULL(...) portion of the select and just use CurrentValue + 1.
Now, before anyone makes a comment, please note that I am a software engineer, not a dba! So, any constructive criticism concerning the use of "Top 1", "With (HoldLock)" and "With (UPDLock)" is welcome. I don't know how well this will scale but this works OK for me so far...
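The arithmetic of the block reservation is easy to check with a small Python sketch. It uses an in-memory SQLite table in place of SQL Server, with the read and the update wrapped in one write transaction; `next_in_sequence` and the snake_case table and column names are assumptions made for this illustration.

```python
import sqlite3


def next_in_sequence(conn, skip_count=1):
    """Reserve skip_count consecutive ids and return the first one."""
    conn.execute("BEGIN IMMEDIATE")  # hold the write lock for read + update
    try:
        (current,) = conn.execute("SELECT current_value FROM sequence_tbl").fetchone()
        conn.execute("UPDATE sequence_tbl SET current_value = ?", (current + skip_count,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return current + 1


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sequence_tbl (current_value INTEGER NOT NULL)")
conn.execute("INSERT INTO sequence_tbl (current_value) VALUES (0)")  # seed value

first_id = next_in_sequence(conn)       # reserves 1
block_start = next_in_sequence(conn, 5)  # reserves 2..6
following = next_in_sequence(conn)      # reserves 7
```

A caller that asked for a block of five owns `block_start` through `block_start + 4` and never needs to call again for those rows.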
Imagine the scene: you're updating some legacy Sybase code and come across a cursor. The stored procedure builds up a result set in a #temporary table which is all ready to be returned, except that one of the columns isn't terribly human readable; it's an alphanumeric code.
What we need to do is figure out the possible distinct values of this code, call another stored procedure to cross reference these discrete values, and then update the result set with the newly deciphered values:
declare c_lookup_codes cursor for
    select distinct lookup_code
    from #workinprogress
open c_lookup_codes
while(1=1)
begin
    fetch c_lookup_codes into @lookup_code
    if @@sqlstatus<>0
    begin
        break
    end
    exec proc_code_xref @lookup_code, @xref_code OUTPUT
    update #workinprogress
    set xref = @xref_code
    where lookup_code = @lookup_code
end
close c_lookup_codes
Now then, whilst this may give some folks palpitations, it does work. My question is, how best would one avoid this kind of thing?
_NB: for the purposes of this example, you can also imagine that the result set is in the region of 500k rows, that there are 100 distinct values of lookup_code and, finally, that it is not possible to have a table with the xref values in it, as the logic in proc_code_xref is too arcane._
You have to have an XRef table if you want to take out the cursor. Assuming you know the 100 distinct lookup values (and that they're static), it's simple to generate one by calling proc_code_xref 100 times and inserting the results into a table.
Unless you are willing to duplicate the code in the xref proc, there is no way to avoid using a cursor.
They say that if you must use a cursor, then you must have done something wrong ;-) Here's a solution without a cursor:
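declare @lookup_code char(8)
declare @xref_code varchar(255)  -- type is an assumption; match proc_code_xref's output parameter
select distinct lookup_code
into #lookup_codes
from #workinprogress
while 1=1
begin
    -- pick any remaining code; @@rowcount is 0 once the work table is empty
    select @lookup_code = lookup_code from #lookup_codes
    if @@rowcount = 0 break
    exec proc_code_xref @lookup_code, @xref_code OUTPUT
    -- apply the decoded value to every row carrying this code
    update #workinprogress
    set xref = @xref_code
    where lookup_code = @lookup_code
    delete #lookup_codes
    where lookup_code = @lookup_code
end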
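Whichever loop is used, the point is that the expensive lookup runs once per distinct code (about 100 calls) rather than once per row (about 500k). A rough Python sketch of that shape, with `apply_xref` and the `xref` callable as hypothetical stand-ins for the update loop and proc_code_xref:

```python
def apply_xref(rows, xref):
    """Fill in row['xref'], calling xref once per distinct lookup_code."""
    distinct_codes = {row["lookup_code"] for row in rows}
    decoded = {code: xref(code) for code in distinct_codes}  # one call per code
    for row in rows:
        row["xref"] = decoded[row["lookup_code"]]
    return rows


# Stand-in for the arcane procedure: any function of the code will do here.
work_in_progress = [
    {"lookup_code": "A1", "xref": None},
    {"lookup_code": "B2", "xref": None},
    {"lookup_code": "A1", "xref": None},
]
apply_xref(work_in_progress, lambda code: "decoded-" + code)
```

The dictionary plays the role an XRef table would play in the database: built once from the distinct codes, then joined back to every row.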