Get next available auto_increment ID in PostgreSQL - A better approach? - postgresql

I'm new to postgreSQL, so would really appreciate any pointers from the community.
I am updating some functionality in the CMS of a pretty old site I've just inherited. Basically, I need the ID of an article before it is inserted into the database. Is there anyway anyway to check the next value that will be used by a sequence before a database session (insert) has begun?
At first I thought I could use SELECT max(id) from tbl_name, however as the id is auto incremented from a sequence and articles are often deleted, it obviously won't return a correct id for the next value in the sequence.
As the article isn't in the database yet, and a database session hasn't started, it seems I can't use the currval() functionality of postgreSQL. Furthermore if I use nextval() it auto increments the sequence before the data is inserted (the insert also auto-incrementing the sequence ending up with the sequence being doubly incremented).
The way I am getting around it at the moment is as follows:
function get_next_id()
{
$SQL = "select nextval('table_id_seq')";
$response = $this->db_query($SQL);
$arr = pg_fetch_array($query_response, NULL, PGSQL_ASSOC);
$id = (empty($arr['nextval'])) ? 'NULL' : intval($arr['nextval']);
$new_id = $id-1;
$SQL = "select setval('table_id_seq', {$new_id})";
$this->db_query($SQL);
return $id;
}
I use SELECT nextval('table_id_seq') to get the next ID in the sequence. As this increments the sequence I then immediately use SELECT setval('table_id_seq',$id) to set the sequence back to it's original value. That way when the user submits the data and the code finally hits the INSERT statement, it auto increments and the ID before the insert and after the insert are identical.
While this works for me, I'm not too hot on postgreSQL and wonder if it could cause any problems down the line, or if their isn't a better method? Is there no way to check the next value of a sequence without auto-incrementing it?
If it helps I'm using postgresql 7.2

Folks - there are reasons to get the ID before inserting a record. For example, I have an application that stores the ID as part of the text that is inserted into another field. There are only two ways to do this.
1) Regardless of the method, get the ID before inserting to include in my INSERT statement
2) INSERT, get the the ID (again, regardless of how (SELECT ... or from INSERT ... RETURNING id;)), update the record's text field that includes the ID
Many of the comments and answers assumed the OP was doing something wrong... which is... wrong. The OP clearly stated "Basically, I need the ID of an article before it is inserted into the database". It should not matter why the OP wants/needs to do this - just answer the question.
My solution opted to get the ID up front; so I do nextval() and setval() as necessary to achieve my needed result.

Disclaimer: Not sure about 7.2 as I have never used that.
Apparently your ID column is defined to get its default value from the sequence (probably because it's defined as serial although I don't know if that was available in 7.x).
If you remove the default but keep the sequence, then you can retrieve the next ID using nextval() before inserting the new row.
Removing the default value for the column will require you to always provide an ID during insert (by retrieving it from the sequence). If you are doing that anyway, then I don't see a problem. If you want to cater for both scenarios, create a before insert trigger (does 7.x have them?) that checks if the ID column has a value, if not retrieve a new value from the sequence otherwise leave it alone.
The real question though is: why do you need the ID before insert. You could simply send the row to the server and then get the generated id by calling curval()
But again: you should really (I mean really) talk to the customer to upgrade to a recent version of Postgres

Related

Postgres count(*) optimization idea

I'm currently working on a project involving keeping track of users and their actions with my database (PostgreSQL as the RDMS), and I have run into an issue when trying to perform COUNT(*) on occurrences of each user. What I want is to be able to, efficiently, count the number of times each user appears from every record, and also be able to achieve looking at counts on a particular date range.
So, the problem is how do we achieve counting the total number of times a user appears from the tables contents, and how do we count the total number on a date range.
What I've tried
As you might know, Postgres doesn't support COUNT(*) very well using indices, so we have to consider other ways to reduce the # of records it looks at in order to speed up the query. So my first approach is to create a table to keep track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, but I dont want continually refresh the materialized view with my count query). Here is what I've come up with:
CREATE TABLE users_counts(user varchar(65536), counter int default 0, day date);
CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE user = NEW.user AND day = DATE(NEW.date_);
What this does is every time a new record is inserted into my 'main_table', we update the current users_counts table to increment the records whose date is equal to the new records date, and the user names are the same.
NOTE: the date_ column in 'main_table' is a timestamp so I must cast the new records date_ to be a DATE type.
The problem is, what if the user column value doesn't already exist in my new table 'users_count' for the current day, then nothing is updated.
Here is my question:
How do I write the rule such that we check if a user exists for the current day, if so increment that counter, otherwise insert new row with user, day, and counter of 1;
I also would like to know if my approach makes sense to do, or is there any ideas I am missing that I just haven't thought about. As my database grows, it is increasingly inefficient to perform counting, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to actually figure this out by creating a separate RULE but I'm not sure if this is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts(user, counter, day)
SELECT NEW.user, 1, DATE(NEW.date)
WHERE NOT EXISTS (SELECT user FROM users.log_messages WHERE user = NEW.user_);
Basically, an insert happens if the user doesn't already exist in my CACHED table called user_counts, and the first rule above updates the count.
What I'm unsure of is how do I know when which rule is called first, the update rule or insert.. And there must be a better way, how do I combine the two rules? Can this be done with a function?
It is true that postgresql is notoriously slow when it comes to count(*) queries. However if you do have a where clause that limits the number of entries the query will be much faster. If you are using postgresql 9.2 or newer this query will be just as fast as it's in mysql because of index only scans which was added in 9.2 but it's best to explain analyze your query to make sure.
Does my solution make sense?
Very much so provided that your explain analyze show that index only scans are not being used. Trigger based solutions like the one that you have adapted find wide usage. But as you have realized the problem with the initial state arises (whether to do an update or an insert).
which rule is called first
Multiple rules on the same table and same event type are applied in
alphabetical name order.
from http://www.postgresql.org/docs/9.1/static/sql-createrule.html
the same applies for triggers. If you want a particular rule to be executed first change it's name so that it comes up higher in the alphabetical order.
how do I combine the two rules?
One solution is to modify your rule to perform an upsert (Look right at the bottom of that page for a sample upsert ). The other is to populate the counter table with initial values. The trick is to create the trigger at the same time to avoid errors. This blog post explains it really well.
While the initial setup will be slow each individual insert will probably be faster. The two opposing factors being the slowness of a WHERE NOT EXISTS query vs the overhead of catching an exception.
Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don't
use EXCEPTION without need.
Source the postgresql documentation page linked above.

Code to assign an ID when button is clicked

I have designed a simple database to keep track of company contacts. Now, I am building a form to allow employees to add contacts to the database.
On the form itself, I have all the columns except the primary key (contactID) tied to a text box. I would like the contactID value to be (the total number of entered contacts + 1) when the Add button is clicked. Basically, the first contact entered will have a contactID of 1 (0 + 1 = 1). Maybe the COUNT command factors in?
So, I am looking for assistance with what code I should place in the .Click event. Perhaps it would help to know how similar FoxPro is to SQL.
Thanks
The method you recommend for assigning ContactIDs is not a good idea. If two people are using the application at the same time, they could each create a record with the same ContactID.
My recommendation is that you use VFP's AutoIncrementing Integer capability. That is, set the relevant column to be Integer (AutoInc) in the Table Designer. Then, each new row gets the next available value, but you don't have to do any work to make it happen.
There are various ways to do this. Probably the simplest is to attempt to lock the table with flock() when saving, and if successful do:
calc max id_field to lnMax
Then when inserting your new record use lnMax+1 as the id_field value. Don't forget to
unlock all
... after saving. You'll want to ensure that 'id_field' has an index tag on it, and that you handle the case where someone else might have the table locked.
You can also do it more 'automagically' with a stored procedure.

Insert record in table if does not exist in iPhone app

I am obtaining a json array from a url and inserting data into a table. Since the contents of the url are subject to change, I want to make a second connection to a url and check for updates and insert new records in y table using sqlite3.
The issues that I face are:
1) My table doesn't have a primary key
2) The url lists the changes on the same day. Hence, if I run my app multiple times, when I insert values in my database, I get duplicate entries. I want to keep a check for the day duplicated entries that should be removed. The problem can be solved by adding a constraint, but since the url itself has duplicated values, I find it difficult.
The only way I can see you can do it if you have no primary key or something you can use that is unique to each record, is when you get your new data in you go through the new entries where for each one you check if the exact same data exists in the database already. If it doesn't then you add it, if it does then you skip over it.
You could even do something like create a unique key yourself for each entry which is a concatenation of each column of the table. That way you can quickly do the check for if the entry already exists in the database.
I see two possibilities depending on your setup:
You have a column setup as UNIQUE (this can be through a PRIMARY KEY or not). In this case, you can use the ON CONFLICT clause:
http://www.sqlite.org/lang_conflict.html
If you find this construct a little confusing, you can instead use "INSERT OR REPLACE" or "INSERT OR IGNORE" as described here:
http://www.sqlite.org/lang_insert.html
You do not have a column setup as UNIQUE. In this case, you will need to SELECT first to verify for duplicate data, and based on the result INSERT, UPDATE, or do nothing.
A more common & robust way to handle this is to associate a timestamp with each data item on the server. When your app interrogates the server it provides the timestamp corresponding to the last time it synced. The server then queries its database and returns all values that are timestamped later than the timestamp provided by the app. Then it also returns a new timestamp value for the app to store, to use on the next sync.

mysql retrieve partial column

I am using TinyMCE to allow users to write in what will eventually be submitted to a database.
TinyMCE includes a button which will insert a "pagebreak" (actually just an HTML comment <!--pagebreak-->).
How can I later pull back everything up to the first occurrence of the pagebreak "tag"? There might be a significant bit of data after that tag which I could discard via php later, but it seems like I shouldn't even bring it back. So I can I SELECT just part of a column if I never know how far in to the data the <!--pagebreak--> will be?
EDIT:
The below answer does work, and will get me where I need to go.. I am just curious as to if there is an equally simple way to handle bringing back the whole column if it is short and thus the pagebreak tag doesn't exist. With the below solution, if no pagebreak is found, an empty string is returned.
If not, then I can easily handle it in the code.
Something like this should get you everything up to the <!--:
SELECT SUBSTRING(my_col FROM 1 FOR POSITION('<!--' IN my_col)-1) AS a_chunk
FROM my_table
Reference for string functions.
Edit:
Just putting OMG Ponies' words into SQL:
SELECT CASE WHEN position('<!--' IN my_col) > 0
THEN SUBSTRING(my_col FROM 1 FOR POSITION('<!--' IN my_col)-1)
ELSE my_col END AS a_chunk
FROM my_table
It sounds like you'll also want a check on the length of the text; whether or not there is a page break. You can use CHARACTER_LENGTH() for that.
why not create another column in your mysql table to store integer value of the position where <!--pagebreak--> is. You'll calculate this value in your script and store this value at same time when you insert the html generated by tinymce.
Then in later retrievals use the value in that column to select the substring from 1 up to the value. This should make your query simpler and maybe improve query performance?

Sybase select variable logic

Ok, I have a question relating to an issue I've previously had. I know how to fix it, but we are having problems trying to reproduce the error.
We have a series of procedures that create records based on other records. The records are linked to the primary record by way of a link_id. In a procedure that grabs this link_id, the query is
select #p_link_id = id --of the parent
from table
where thingy_id = (blah)
Now, there are multiple rows in the table for the activity. Some can be cancelled. The code I have doesn't disinclude cancelled rows in the select statement, so if there are previously cancelled rows, those ids will appear in the select. There is always going to be one 'open' record that is selected if I disinclude cancelled rows. (append where status != 'C')
This solves this issue. However, I need to be able to reproduce the issue in our development environment.
I've gone through a process where I've entered a whole heap of data, opening, cancelling, etc to try and get this select statement to return an invalid id. However, whenever I run the select, the ids are in order (sequence generated), but in the case where this error occured, the select statement returned what seems to be the first value into the variable.
For example.
ID Status
1 Cancelled
2 Cancelled
3 Cancelled
4 Open
Given the above, if I do a select for the ID I want, I want to get '4'. In the error, the result is 1. However, even if I enter in 10 cancelled records, I still get the last one in the select.
In oracle, I know that if you select into a variable and more than one record is returned, you get an error (I think). Sybase apparently can assign multiple values into a variable without erroring.
I'm thinking that there's either something to do with how the data is selected from the table, where the id's without a sort order don't return in ascending order, or there's a dboption where a select into a variable will save the first or last value queried.
Edit: it looks like we can reproduce this error by rolling back stored procedure changes. However, the procs don't go anywhere near this link_id column. Is it possible that changes to the database architecture could break an index or something?
If more than one row is returned, the value that is stored will be the last value in the list, according to this.
If you haven't specified an order for retrieval via ORDER BY, then the order returned will be at the convenience of the database engine. It may very well vary by the database instance. It may be in the order created, or even appear "random" because of where the data is placed within the database block structure.
The moral of the story:
Always make singleton SELECTs return a single row
When #1 can't be done, use an ORDER BY to make sure the one you care about comes last