Azure Search Indexer error: "Document key cannot be missing or empty." - change-tracking

I have an index in Azure Search which is synced to a SQL Server table through Change Tracking.
I randomly start getting this error after I make some changes to the table (not always, and unfortunately I can't reproduce it consistently):
[
{
"key": null,
"errorMessage": "Document key cannot be missing or empty."
}
]
I have checked my table and there are no null values in the column that Azure Search uses for the key (enforced by a SQL NOT NULL constraint). The only fix I have found is deleting the index and recreating it from scratch. Not even deleting all the documents and running the indexer again gets rid of the error.
[UPDATE - Solved]
As Eugene's answer highlighted, the problem was that the SQL table tracked by Azure Search had a primary key that wasn't mapped to the Azure Search key (we were using another unique column as the Azure key instead). This setup does not work with the "SQL Integrated Change Tracking Policy" mode, because the indexer cannot reference deleted rows (the indexer fails if you run it again after deleting some rows in the tracked SQL table).
After setting the primary key of the SQL table to be the same as the indexed Azure key, everything seems to run smoothly, even on deletes.

In this case, the search index's key field is not the same as the primary key column in the table. In that situation, deletion tracking using the SQL integrated change tracking policy is not supported, because the changes table doesn't contain values for the column that maps to the index key field. Inserts and updates will work correctly, though.
If possible, consider making the table and index keys the same.
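To make the reasoning above concrete, here is a toy model in Python. It is not how Azure Search or SQL Server is implemented; it only assumes, as the answer says, that the change record for a deleted row carries the table's primary key and nothing else. The table, the `sku` column and the helper names are made up for illustration.

```python
# Toy model only: NOT Azure Search's or SQL Server's real implementation.
# Assumption: a change record for a deleted row carries just the primary key.

table = {
    1: {"id": 1, "sku": "A-100"},
    2: {"id": 2, "sku": "B-200"},
}
changes = []  # stand-in for the change tracking "changes table"


def delete_row(pk):
    """Delete a row and record the change, keeping only the primary key."""
    del table[pk]
    changes.append({"op": "D", "id": pk})


def document_key(change, key_column):
    """Work out the search document key for a change, or None if it is lost."""
    if key_column in change:
        return change[key_column]
    # Any other column would have to be read from the row, which is gone.
    row = table.get(change["id"])
    return row[key_column] if row else None


delete_row(2)
key_when_mapped_to_pk = document_key(changes[0], "id")
key_when_mapped_to_sku = document_key(changes[0], "sku")
print(key_when_mapped_to_pk, key_when_mapped_to_sku)
```

With the index key mapped to the primary key, the delete can still be identified. With the key mapped to another column, the key comes back empty, which matches the "Document key cannot be missing or empty" symptom in the question.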

There might be an empty value in your PK column. If so, delete it in Azure SQL:
DELETE FROM table_name WHERE pk = '';

Related

SQLAlchemy, directly inserting primary keys seems to disable key auto generation

I am trying to populate some tables using data that I extracted from Google BigQuery. For that purpose I essentially normalized a flattened table into multiple tables that include the primary key of each row in the multiple tables. The important point is that I need to load those primary keys in order to satisfy foreign key references.
Having inserted this data into tables, I then try to add new rows to these tables. I don't specify the primary key, presuming that Postgres will auto-generate those key values.
However, I always get a 'duplicate key value violates unique constraint "xxx_pkey"' type error, e.g.
duplicate key value violates unique constraint "collection_pkey" DETAIL: Key (id)=(1) already exists.
It seems this is triggered by including the primary key in the data when initializing the table. That is, explicitly setting primary keys somehow seems to disable or reset the expected auto-generation of the primary key. I was expecting that new rows would be assigned primary keys starting from the highest value already in the table.
Interestingly, I get the same error whether I try to add a row via SQLAlchemy or from the psql console.
So, is this expected? And if so, is there some way to get the system to auto-generate keys again? There must be some hidden Postgres state that controls this... the schema is unchanged by directly inserting keys, but the behavior is changed by that action.
I am happy to provide additional information.
Thanks

Imported data, duplicate key value violates unique constraint

I am migrating data from MSSQL.
I created the database in PostgreSQL via an Npgsql-generated migration. I moved the data across, and now when the code tries to insert a value I get
'duplicate key value violates unique constraint'
Npgsql tries to insert a row with Id 1; however, the table already has Ids over a thousand.
Npgsql.EntityFrameworkCore.PostgreSQL is 2.2.3 (latest)
In my context builder, I have
modelBuilder.ForNpgsqlUseIdentityColumns();
In which direction should I dig to resolve such an issue?
The code runs fine if the database is empty and doesn't have any imported data
Thank you
The values inserted during the migration contained the primary key value, so the sequence behind the column was never advanced and is still at its starting point. A normal insert (one that doesn't specify the PK value) calls the sequence and gets 1, which already exists in the table.
To fix it, you can bump the sequence to the current max value.
SELECT setval(
pg_get_serial_sequence('myschema.mytable','mycolumn'),
max(mycolumn))
FROM myschema.mytable;
If you already know the sequence name, you can shorten it to
SELECT setval('my_sequence_name', max(mycolumn))
FROM myschema.mytable;
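If it helps to see why this happens, here is a small pure-Python model of a sequence-backed key column. It is only a sketch of the behaviour described above, not PostgreSQL code; the `Sequence` and `Table` classes are made up for illustration and just mimic `nextval` and `setval`.

```python
# Toy model of a sequence-backed primary key (illustrative, not PostgreSQL itself).

class Sequence:
    def __init__(self):
        self.last_value = 0

    def nextval(self):
        """Advance the counter and return the new value."""
        self.last_value += 1
        return self.last_value

    def setval(self, value):
        """Move the counter so that the next nextval() returns value + 1."""
        self.last_value = value


class Table:
    def __init__(self):
        self.rows = {}
        self.seq = Sequence()

    def insert(self, data, pk=None):
        # An explicit key bypasses the sequence entirely, as in the import.
        if pk is None:
            pk = self.seq.nextval()
        if pk in self.rows:
            raise KeyError(f"duplicate key value: id={pk}")
        self.rows[pk] = data
        return pk


t = Table()
for imported_pk in (1, 2, 3):
    t.insert("imported", pk=imported_pk)  # sequence is never touched

try:
    t.insert("new row")  # sequence hands out 1, which is already taken
    collided = False
except KeyError:
    collided = True

t.seq.setval(max(t.rows))  # same idea as SELECT setval(..., max(mycolumn))
new_id = t.insert("new row")
print(collided, new_id)
```

The import with explicit keys never advances the counter, so the first ordinary insert collides. After bumping the counter to the current maximum, new rows continue from there.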

Errors creating constraint trigger

Let me start by saying that I’m a Linux/Unix admin. That being said my manager has tasked me with moving older PostgreSQL databases to a RedHat server running 8.4.20. I was successful moving a 7.2.1 db but I’m running into issues moving a 7.4.20 db.
I use pg_dump -c filename and psql < filename. For the problematic db everything runs until I get to a CREATE CONSTRAINT TRIGGER statement. If I run it as it is in the file I get:
NOTICE: ignoring incomplete trigger group for constraint "" FOREIGN KEY data(ups) REFERENCES upsinfo(ups)
DETAIL: Found referenced table's DELETE trigger.
CREATE TRIGGER
If I run set schema 'pg_catalog'; I get:
ERROR: relation "upsinfo" does not exist
The tables (I think) involved are:
CREATE TABLE upsinfo (
ups text NOT NULL,
ipaddr inet,
rcomm text,
wcomm text,
reachable boolean,
managed boolean,
comments text,
region text
);
CREATE TABLE data (
date timestamp with time zone,
ups text,
mib text,
value text
);
The problematic trigger statement:
CREATE CONSTRAINT TRIGGER "<unnamed>"
AFTER DELETE ON upsinfo
FROM data
NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW
EXECUTE PROCEDURE "RI_FKey_cascade_del"('<unnamed>', 'data', 'upsinfo', 'UNSPECIFIED', 'ups', 'ups');
I know that the RI_FKey_cascade_del function is defined differently in the different versions of pg_catalog. Note that search_path is set to ‘public, pg_catalog’ so I’m also confused why I have to set the schema.
Again I’m not a real PostgreSQL DBA so try to be kind.
Oof, those are really old postgres versions, including the version you're upgrading to (8.4 was released in 2009, and support ended in 2014).
The short answer is that, as long as upsinfo and data are being created and populated, you're probably fine, and good to go. But one of your foreign key relationships is broken.
The long answer, well, let me see if I can explain what is going on (or, at least, what I think is going on).
I'm guessing that the original table definition of data included something like FOREIGN KEY (ups) REFERENCES upsinfo (ups) ON DELETE CASCADE. That causes postgres to automatically create some constraint triggers: (1) every time there's a new row for data, make sure that its ups column matches an existing row in upsinfo, and (2) every time you delete a row from upsinfo, delete the corresponding rows in data, based on the matching ups value.
That (not very informative) error message can come up when the foreign key relationship doesn't work. In order for a foreign key to make sense, the referenced value needs to be unique -- there should be only one row in upsinfo for each distinct value of ups. In order for postgres to know that, there needs to be a unique index or primary key on upsinfo.ups.
In this case, one of a couple things could be breaking it:
There's no primary key or unique index on upsinfo.ups (postgres should not have allowed a foreign key, but may have in very old versions)
There used to be a unique index, but it hadn't properly enforced uniqueness, so it didn't get successfully imported (a bug, again likely from a very old version)
In either case, if that foreign key relationship is important, you can try to fix it once the import is complete. Start by trying to make a unique index on upsinfo.ups, and see if you have problems. If you do, resolve the duplicate entries, and try again till it works. Then issue something like:
ALTER TABLE data
ADD FOREIGN KEY (ups) REFERENCES upsinfo (ups) ON DELETE CASCADE;
Of course, if things are working, it's possible you don't need to fix the foreign key, in which case you're probably able to ignore those errors and just move forward.
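For the "make a unique index, resolve duplicates, try again" step, a rough sketch of the check might look like the following. It uses Python's built-in sqlite3 module as a stand-in so it can run anywhere; on the real server you would run the same GROUP BY query through psql. The sample rows, the index name upsinfo_ups_key, and the choice to keep the first copy of each duplicate are all assumptions for illustration, and rowid is SQLite-specific.

```python
import sqlite3

# In-memory SQLite database standing in for the imported PostgreSQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE upsinfo (ups TEXT NOT NULL, region TEXT);
    INSERT INTO upsinfo VALUES ('ups1', 'east'), ('ups2', 'west'), ('ups1', 'north');
""")

# 1. List the ups values that appear more than once.
duplicates = conn.execute(
    "SELECT ups, COUNT(*) FROM upsinfo GROUP BY ups HAVING COUNT(*) > 1"
).fetchall()

# 2. Creating the unique index fails while duplicates exist.
try:
    conn.execute("CREATE UNIQUE INDEX upsinfo_ups_key ON upsinfo (ups)")
    index_created_first_try = True
except sqlite3.DatabaseError:
    index_created_first_try = False

# 3. Resolve duplicates (here: keep the first row per ups; which row to keep
#    is a data decision), then retry the index.
conn.execute(
    "DELETE FROM upsinfo WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM upsinfo GROUP BY ups)"
)
conn.execute("CREATE UNIQUE INDEX upsinfo_ups_key ON upsinfo (ups)")
remaining = conn.execute("SELECT COUNT(*) FROM upsinfo").fetchone()[0]
print(duplicates, index_created_first_try, remaining)
```

Once the unique index exists, the ALTER TABLE ... ADD FOREIGN KEY statement above has something valid to reference.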
Hope that helps, and good luck!
This trigger seems to be part of an ON DELETE CASCADE foreign key constraint. If I were you, I would delete all such statements and replace them with a proper constraint definition on the target table.
Table definition should then look like this:
CREATE TABLE bookings (
boo_id serial NOT NULL,
boo_hotelid character varying NOT NULL,
boo_roomid integer NOT NULL,
CONSTRAINT pk_bookings
PRIMARY KEY (boo_id),
CONSTRAINT fk_bookings_boo_roomid
FOREIGN KEY (boo_roomid)
REFERENCES rooms (roo_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
) WITHOUT OIDS;
And this part is what will internally create the trigger:
CONSTRAINT fk_bookings_boo_roomid
FOREIGN KEY (boo_roomid)
REFERENCES rooms (roo_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
But, to be honest, I do not understand upgrading to an unsupported version. You know that Postgres is at version 9.5 now, right?

EF db first and table without key

I am trying to use Entity Framework DB first to do quick prototyping of a reporting website for a huge db. The problem is that one of the tables doesn't have a key. I got 'Error 159: EntityType has no key defined'. If I add a key in the model designer, I get 'Error 3024: Must specify mapping for all key properties'. My question is whether there is a way to work around this WITHOUT adding a key to the table. The table is not in our control.
A huge table which does not have a key? Neither you nor the table owner would be able to search for anything in this table without a full table scan. Also, it is basically impossible to UPDATE a single row without a primary key.
You really have to either create a synthetic key or ask the owner to do that. As a workaround, you might be able to find some existing column (or 2-3 columns) that is unique enough to be used as a unique key. If it is unique but has no actual index, that would still be bad for performance; you should create such an index.
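To test whether a column (or a combination of 2-3 columns) is actually unique before treating it as a key, something along these lines could work. It is a sketch using Python's built-in sqlite3 as a stand-in database; the report_rows table, its columns and the is_unique_key helper are made up for illustration. The same COUNT comparison can be run directly against the real table.

```python
import sqlite3


def is_unique_key(conn, table, columns):
    """Return True if no two rows share the same values in `columns`.

    Table and column names are interpolated into the SQL, so only pass
    trusted identifiers. NULLs are not checked here; key columns should
    also be non-null.
    """
    cols = ", ".join(columns)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table}) AS d"
    ).fetchone()[0]
    return total == distinct


# Hypothetical keyless table for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_rows (customer TEXT, day TEXT, amount REAL);
    INSERT INTO report_rows VALUES
        ('a', '2020-01-01', 1.0),
        ('a', '2020-01-02', 2.0),
        ('b', '2020-01-01', 3.0);
""")

customer_alone = is_unique_key(conn, "report_rows", ["customer"])
customer_and_day = is_unique_key(conn, "report_rows", ["customer", "day"])
print(customer_alone, customer_and_day)
```

Keep in mind that being unique today does not guarantee it stays unique, since nothing in the database enforces it.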

Using "rowversion" as primary key column

I am using SQL Server 2012 and I want to create a "changes" table - it will be populated with data from another table when that table's column values are changed.
I am adding "datetime2" and "rowversion" columns to the "changes" table in order to track when the changes are made.
Is it ok to use "rowversion" as the primary key?
I have read here that it will change if the current row is updated, which is why it is not a good candidate for a primary key: it would make foreign keys invalid.
Anyway, if it won't be used as a foreign key and the rows of the "changes" table will never be updated (only new rows will be inserted), is it ok to use "rowversion" as the PK, or should I use an additional column?
Some good info here:
Careful reading of the MSDN page also shows that duplicate rowversion values are possible if SELECT INTO statements are used improperly. Something to watch out for there.
I would stick with an Identity field in the original data, carried over into the change tracking table that has its own Identity field.
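A rough sketch of that layout, using Python's built-in sqlite3 as a stand-in so it can run anywhere. In SQL Server the two auto-generated columns would be IDENTITY columns and changed_at would be datetime2; the items/item_changes tables, the price column and the trigger name are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source table with its own auto-generated id.
    CREATE TABLE items (
        item_id INTEGER PRIMARY KEY AUTOINCREMENT,
        price   REAL
    );

    -- Changes table: its own auto-generated key, plus the source id carried over.
    CREATE TABLE item_changes (
        change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        item_id    INTEGER NOT NULL,
        old_price  REAL,
        new_price  REAL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Insert one change row whenever price is updated.
    CREATE TRIGGER items_price_changed AFTER UPDATE OF price ON items
    BEGIN
        INSERT INTO item_changes (item_id, old_price, new_price)
        VALUES (OLD.item_id, OLD.price, NEW.price);
    END;
""")

conn.execute("INSERT INTO items (price) VALUES (10.0)")
conn.execute("UPDATE items SET price = 12.5 WHERE item_id = 1")
conn.execute("UPDATE items SET price = 15.0 WHERE item_id = 1")

changes = conn.execute(
    "SELECT change_id, item_id, old_price, new_price "
    "FROM item_changes ORDER BY change_id"
).fetchall()
print(changes)
```

The change_id gives a stable, ordered key for the changes table, and item_id ties each change back to the original row without relying on rowversion.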