Occasional PostgreSQL "Duplicate key value violates unique constraint" error from Go insert

I have a table with the unique constraint
CREATE UNIQUE INDEX "bd_hash_index" ON "public"."bodies" USING btree ("hash");
I also have a Go program that takes "body" values on a channel, filters out the duplicates by hashing, and inserts only the non-duplicates into the database.
Like this:
import (
    "crypto/md5"
    "database/sql"
    "encoding/hex"
    "log"
    "strings"
    "time"
)

type Process struct {
    DB         *sql.DB
    BodiesHash map[string]bool
    Channel    chan BodyIterface
    Logger     *log.Logger
}
func (pr *Process) Run() {
    bodyInsert, err := pr.DB.Prepare("INSERT INTO bodies (hash, type, source, body, created_timestamp) VALUES ($1, $2, $3, $4, $5)")
    if err != nil {
        pr.Logger.Println(err)
        return
    }
    defer bodyInsert.Close()

    hash := md5.New()
    for p := range pr.Channel {
        nowUnix := time.Now().Unix()
        bodyString := strings.Join([]string{
            p.GetType(),
            p.GetSource(),
            p.GetBodyString(),
        }, ":")
        hash.Write([]byte(bodyString))
        bodyHash := hex.EncodeToString(hash.Sum(nil))
        hash.Reset()
        if _, ok := pr.BodiesHash[bodyHash]; !ok {
            pr.BodiesHash[bodyHash] = true
            _, err = bodyInsert.Exec(
                bodyHash,
                p.GetType(),
                p.GetSource(),
                p.GetBodyString(),
                nowUnix,
            )
            if err != nil {
                pr.Logger.Println(err, bodyString, bodyHash)
            }
        }
    }
}
But periodically I get the error
"pq: duplicate key value violates unique constraint "bd_hash_index""
in my log file. I can't imagine how this can happen, because I check the hash for uniqueness before I do an insert.
I am sure that when I call go processDebugBody.Run() the bodies table is empty.
The channel was created as a buffered channel with:
processDebugBody.Channel = make(chan BodyIterface, 1000)

When you execute a query outside of a transaction with sql.DB, it automatically retries when there is a problem with the connection; in the current implementation, up to 10 times. For example, notice maxBadConnRetries in sql.Exec.
Now, this really happens only when the underlying driver returns driver.ErrBadConn, and the specification states the following:
ErrBadConn should be returned by a driver to signal to the sql package that a driver.Conn is in a bad state (such as the server having earlier closed the connection) and the sql package should retry on a new connection.
To prevent duplicate operations, ErrBadConn should NOT be returned if there's a possibility that the database server might have performed the operation.
I think driver implementations are a little bit careless in implementing this rule, but maybe there is some logic behind it. I was studying the implementation of lib/pq the other day and noticed this scenario would be possible.
As you pointed out in the comments, you see some SSL errors issued just before the duplicates appear, so this seems like a reasonable guess.
One thing to consider is to use transactions. If you lose the connection before committing the transaction, you can be sure it will be rolled back. Also, the statements of a transaction are not retransmitted automatically on bad connections, so this problem should be solved; you will most probably see the SSL errors propagated directly to your application, though, so you'll need to retry on your own.
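For illustration, here is a minimal sketch of that transactional approach, assuming the bodies schema from the question; the function name and the abbreviated error handling are mine, not part of the original code:

import "database/sql"

// insertBody runs a single INSERT inside an explicit transaction. If the
// connection is lost before Commit, the server rolls the transaction back,
// so the statement cannot be silently replayed on a fresh connection.
func insertBody(db *sql.DB, hash, bodyType, source, body string, createdTimestamp int64) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    _, err = tx.Exec(
        "INSERT INTO bodies (hash, type, source, body, created_timestamp) VALUES ($1, $2, $3, $4, $5)",
        hash, bodyType, source, body, createdTimestamp,
    )
    if err != nil {
        tx.Rollback() // the insert is undone; decide explicitly in the caller whether to retry
        return err
    }
    return tx.Commit()
}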
I must tell you I've also been seeing SSL renegotiation errors on Postgres with Go 1.3, which is why I've disabled SSL for my internal DB for the time being (sslmode=disable in the connection string). I was wondering whether version 1.4 has solved the issue, as one item on the changelog was "The crypto/tls package now supports ALPN as defined in RFC 7301" (ALPN stands for Application-Layer Protocol Negotiation Extension).

Related

FireDAC Array DML and Returning clauses

Using FireDAC's Array DML feature, it doesn't seem possible to utilise a RETURNING clause (in my case PostgreSQL).
If I run a simple insert query such as:
With FDQuery Do
begin
  SQL.Text := 'INSERT INTO temptab(email, name) '
            + 'VALUES (''email1'', ''name1''), '
            + '(''email2'', ''name2'') '
            + 'RETURNING id';
  Open;
end;
The query returns two records containing the id for the newly inserted records.
For larger inserts I would prefer to use Array DML, but in some cases I also need to be able to get returned data.
The Open function does not have an ATimes parameter. Whilst you can call Open with Array DML, it results in the insertion and return of just the first record.
I cannot find any other properties, methods which would seem to facilitate this. I have posted on Praxis to see if anyone there has any ideas, but I have had no response. I have also posted this as a new feature request on Quality Central.
If anyone knows of a way of achieving this using Array DML, I would be grateful to hear, but my principal question is what is the most efficient route for retrieving the inserted data (principally IDs) from the DB if I persist with Array DML?
A couple of ideas occur to me, neither of which seem tremendously attractive:
Within StartTransaction and Commit and following the insertion, retrieve the id of the last inserted record and then grab backwards the requisite number. This seems to me to be a bit risky, although since it happens within a transaction it should probably be okay.
Add an integer field to the relevant table and populate each inserted record with a unique identifier and following insert retrieve the records with that identifier. Whilst this would ensure the return of the inserted records, it would be relatively inefficient unless I index the field being used to store the identifier.
Both the above would be dependent on records being inserted into the DB in the order they are supplied to the Array DML, but I assume/hope that is a given.
I would appreciate views on the best (ie most efficient and reliable) of the above options and any suggestions as to alternative even better options even if those entail abandoning Array DML where a Returning clause is needed.
You actually can get all returned IDs. You can tell FireDAC to store the result values in parameters with {INTO }. See for example the following code:
FDQuery.SQL.Text := 'INSERT into tablename (fieldname) values (:p1) returning id {into :p2}';
FDQuery.Params.ArraySize := 2;
FDQuery.Params[0].AsStrings[0] := 'one';
FDQuery.Params[0].AsStrings[1] := 'two';
FDQuery.Params[1].ParamType := ptInputOutput;
FDQuery.Params[1].DataType := ftLargeInt;
FDQuery.Execute(2,0);
ID1 := FDQuery.Params[1].AsLargeInts[0];
ID2 := FDQuery.Params[1].AsLargeInts[1];
This works when one row is returned per Array DML element. I think it will not work for more than one row, but I've not tested it; if it does, you would have to know which result corresponds to which Array DML element.
Note that FireDAC throws an AV when zero rows are returned for one or more elements in the Array DML, for example when you UPDATE a row that was deleted in the meantime. The AV has nothing to do with the Array DML itself; when a plain FDQuery.Execute is called, you'll get an AV as well.
I've suggested another option earlier on the Delphipraxis forum, but that is a suboptimal solution, as it uses a temp table to store the IDs:
https://en.delphipraxis.net/topic/4693-firedac-array-dml-returning-values-from-inserted-records/

Is there a way to remove the RETURNING clause while creating records with go-gorm?

I'm using go-gorm with a postgres 11 DB and facing an issue where I need to remove the RETURNING clause entirely when creating records (that statement seems to be included by default). I just want to insert records and get nothing back, except for errors.
I have some complex relations on the database that won't support RETURNING statements, so when I try to insert like this (code simplified for brevity):
type Cargo struct {
    Id   int64 `gorm:"primaryKey"`
    Name string
}

dsnString := fmt.Sprintf("host=%s ...")
db, _ := gorm.Open(postgres.New(postgres.Config{DSN: dsnString}), &gorm.Config{})
cargo := Cargo{Name: "test"}
db.Create(cargo)
I get the error "ERROR: cannot perform INSERT RETURNING on relation Cargo".
I tried creating the db connection with the parameter WithoutReturning: true:
db, _ := gorm.Open(postgres.New(postgres.Config{DSN: dsnString, WithoutReturning: true}), &gorm.Config{})
But then when I try db.Create(cargo) I get a different error: "LastInsertId is not supported by this driver". It seems to be still trying to get the last inserted id anyway.
In go-pg I could use db.Model(x).Returning("null").Insert(cargo) but I couldn't find a way to do it with go-gorm. Any help is greatly appreciated.
The only two ways that I can get gorm to not use the RETURNING clause with Postgres are:
A model that does not declare a primary key
That means getting rid of any field named ID/Id and of any field tagged gorm:"primaryKey".
type Cargo struct {
    Name string
}

db.Create(&Cargo{Name: "Test"})
Using Create from map with Table()
In this case you would represent your model as a map[string]interface{} instead of as a struct and use it like this:
db.Table("cargos").Create(map[string]interface{}{
"name": "Test",
})
As it stands gorm doesn't support this use case very well. If you can't restructure your views to support RETURNING and these options aren't doing it for you, I suggest adding a feature request in the gorm repo.
You can modify the default returning behavior with
db.Clauses(clause.Returning{}).Create(&cargo)
Here is the doc link: https://gorm.io/docs/update.html#Returning-Data-From-Modified-Rows

How to determine if row already exists on an Insert

My goal is to determine whether the error happened because the record already exists, according to our database index which ensures uniqueness.
The problem is that I don't see an error type that corresponds to a record already existing. Do I really have to read from the DB first to see if the record exists?
I googled for:
unique constraint violation error postgres golang
I see something like this on a Reddit thread:
if pgerr, ok := err.(*pq.Error); ok {
    if pgerr.Code == "23505" {
        // handle duplicate insert
    }
}
That seems like it could work but I am looking for a best practice here..
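Building on that snippet, here is a sketch of a slightly more defensive check: errors.As (Go 1.13+) also matches wrapped errors, and lib/pq's ErrorCode.Name() maps "23505" to the readable condition name "unique_violation". The helper name is my own, not from the question:

import (
    "errors"

    "github.com/lib/pq"
)

// isUniqueViolation reports whether err (possibly wrapped) is a PostgreSQL
// unique_violation error, i.e. SQLSTATE 23505.
func isUniqueViolation(err error) bool {
    var pqErr *pq.Error
    if errors.As(err, &pqErr) {
        return pqErr.Code.Name() == "unique_violation"
    }
    return false
}

A caller can then branch on isUniqueViolation(err) after an Exec instead of matching the code string inline.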

Postgres 'if not exists' fails because the sequence exists

I have several counters in an application I am building, and I am trying to get them to be created dynamically by the application as required.
For a simplistic example, if someone types a word into a script, it should return the number of times that word has been entered previously. Here is an example of the SQL that may be executed if they typed the word example.
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example')
This would return 1 the first time it ran, 2 the second time, etc.
The problem is when 2 people click the button at the same time.
First, please note that a lot more is happening in my application than just these statements, so the chances of them overlapping is much more significant than it would be if this was all that was happening.
1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example') -- returns 1 to user.
1> COMMIT; -- unblocks second connection
2> ERROR: duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(example, 109649) already exists.
I was under the impression that by using "IF NOT EXISTS" the statement would just be a no-op if the sequence already exists, but it seems to have this race condition where that is not the case. I say race condition because if these two transactions are not executed at the same time, it works as one would expect.
I have noticed that IF NOT EXISTS is fairly new to Postgres, so maybe they haven't worked out all of the kinks yet?
EDIT:
The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)
Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong. This is especially true for sequences, which have special properties, e.g. regarding their behavior with respect to transactions. Specifically, if you increment a sequence (with the help of nextval) in the middle of a transaction and then roll that transaction back, the value of the sequence will not be rolled back. Most likely, this is not the kind of behavior you want for your data.

In your example, imagine that a user tries to add a word. This results in the corresponding sequence being incremented. Now imagine that the transaction does not complete for some reason (e.g. the computer crashes) and it gets rolled back. You would end up with the word not being added to the database, but with the sequence being incremented.
For the particular example that you mentioned, there is an easy solution: create an ordinary table to store all the "sequences". Something like this would do it:
CREATE TABLE word_frequency (
    word      text NOT NULL UNIQUE,
    frequency integer NOT NULL
);
Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.
Edit: Here's how the above solution works. If a new word is added, run the following query (the "UPSERT" syntax is available in Postgres 9.5+ only):
INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
This query will insert a new word into word_frequency with frequency 1, or, if the word already exists, it will increment the existing frequency by 1. Now what happens if two transactions try to do that at the same time? Consider the following scenario:
client 1 client 2
-------- --------
BEGIN
BEGIN
UPSERT ('foo',1)
UPSERT ('foo',1) <====
COMMIT
COMMIT
What will happen is that as soon as client 2 tries to increment the frequency for foo (marked with the arrow above), that operation blocks because the row was modified by a different transaction. When client 1 commits, client 2 is unblocked and continues without any errors. This is exactly how we wanted it to work. Also note that PostgreSQL uses row-level locking to implement this behavior, so other insertions will not be blocked.
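Since the other questions on this page use Go, here is a minimal sketch of driving that UPSERT from database/sql with lib/pq; the function name is mine, and the word_frequency table is assumed to exist as defined above:

import "database/sql"

// bumpWordFrequency inserts the word with frequency 1, or atomically
// increments the existing row, and returns the resulting frequency.
func bumpWordFrequency(db *sql.DB, word string) (int, error) {
    var frequency int
    err := db.QueryRow(`
        INSERT INTO word_frequency (word, frequency)
        VALUES ($1, 1)
        ON CONFLICT (word)
        DO UPDATE SET frequency = word_frequency.frequency + excluded.frequency
        RETURNING frequency`, word).Scan(&frequency)
    return frequency, err
}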
EDIT: The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence).
It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users that are only inserting rows (a sequence is normally only used on insert), there is the possibility of some contention on the sequence, but realistically there will be other bottlenecks long before the sequence gets in the way.
I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not), the sequence will not be the bottleneck.
As a reference, DDL is not transaction-safe in most databases.

Using specs2 and FakeApplication() to test database fails evolution inserts

This is for Play! Framework 2.0.
I'm trying to write a simple test case to ensure my user model is functioning properly and persisting data in my database. I'd like to run it in memory if possible so I can get a fresh start with every new run.
The issue I have is that my evolutions run (tables are created, data is inserted), but I can't query the data as being there. First, my code.
CREATE TABLE user_data (
    id SERIAL PRIMARY KEY,
    user_name varchar(256) UNIQUE NOT NULL,
    email varchar(256) NOT NULL,
    password varchar(256) NOT NULL,
    edits int NOT NULL,
    reports int NOT NULL,
    active BOOLEAN NOT NULL
);

INSERT INTO user_data(user_name, email, password, edits, reports, active) VALUES ('user1', 'user1@email.com', '12345678', 0, 0, true);
In application.conf
db.default.driver=org.postgresql.Driver
db.default.url="postgres://user:password@localhost:5432/ME"
In build.scala
val appDependencies = Seq(
  // Add your project dependencies here,
  "postgresql" % "postgresql" % "9.1-901-1.jdbc4"
)
The test code
class User_dataSpec extends Specification {

  "The Database" should {
    "persist data properly" in {
      running(FakeApplication(additionalConfiguration = inMemoryDatabase())) {
        //User_data.findAll().length must beEqualTo(1)
        //Create users
        User_data.create("user1", "password1", "email@test1.com") must beEqualTo(1)
        User_data.create("user2", "password2", "email@test2.com") must beEqualTo(2)
        User_data.create("user1", "password3", "email@test3.com") must beEqualTo(0)
        //Count users
        User_data.findAll().length must beEqualTo(2)
        //Verify users exist
        User_data.exists("user1") must beTrue
        User_data.exists("user2") must beTrue
        //Verify user doesn't exist
        User_data.exists("user3") must beFalse
        //Find users by ID
        User_data.findUser(1).get.user_name must beEqualTo("user1")
        User_data.findUser(2).get.user_name must beEqualTo("user2")
        //Fail to find users by ID
        User_data.findUser(3) must beNone
        //Find users by user_name
        User_data.findUser("user1").get.user_name must beEqualTo("user1")
        User_data.findUser("user2").get.user_name must beEqualTo("user2")
        //Fail to find users by user_name
        User_data.findUser("user3") must beNone
        //Authenticate users
        User_data.authenticate("user1", "password1") must beTrue
        User_data.authenticate("user2", "password2") must beTrue
        //Fail to authenticate users
        User_data.authenticate("user1", "password2") must beFalse
        User_data.authenticate("user3", "passwordX") must beFalse
        //Confirm the user was inserted properly
        val user = User_data.findUser("user1")
        user.get.user_name must beEqualTo("user1")
        user.get.email must beEqualTo("email@test1.com")
        user.get.password must beEqualTo("password1")
        user.get.edits must beEqualTo(0)
        user.get.reports must beEqualTo(0)
        user.get.active must beTrue
      }
    }
  }
}
This code passes as written; however, it shouldn't. If I uncomment the first test case inside the running block, which checks that findAll() returns a length of 1, it fails immediately. If I change this to use a persisted PostgreSQL DB on my machine, it still fails immediately, yet when I look at that database, my user_data table contains the single row inserted by the evolution, and the play_evolutions table has the entry for my evolution marked with state = "applied" and last problem = "".
Any help would be appreciated, thanks.
(P.S., I am a first time poster, but will do my best to accept an answer as soon as possible for those willing to lend their help)
* UPDATED *
As Jakob stated, the reason the evolutions fail is probably that SQL written for PostgreSQL is incompatible with H2. You can solve this by using a separate PostgreSQL database for testing as per the original answer, or by putting H2 into a compatibility mode, which may fix the problem (see Fixtures in Play! 2 for Scala).
I think this is a bug in Play Framework 2.0:
https://play.lighthouseapp.com/projects/82401/tickets/295-20test-testhelpers-method-evolutionfor-do-wrong-if-fakeapplication-with-inmemroydatabase
The problem with evolutions and H2 is that H2 is not compatible with everything you can do in Postgres or MySQL, for example, so evolutions will run fine in prod but fail in test. I had this problem in a project and eventually solved it by simply not using evolutions, and using Liquibase for the DB stuff instead.
Alternatively, one needs to make sure that the SQL you write can be run on H2; in that case evolutions will work fine. I don't remember exactly what the problem with H2 was (something about indexes, I think).