Is there any way to make a unique index ignore old data? - postgresql

What I need to do is block rows from being duplicated on the values of number, serie and model.
My initial thought was to do something like this:
CREATE UNIQUE INDEX idx_fiscal_number
ON Fiscal(number, serie, model);
The problem is that this database is really old, and there's a lot of data already duplicated. So my question is:
Is there any way to make this unique index start validating from now on and accept the old data that is already there?

You can try to create a partial index if you can express the condition "old" with existing columns. For example, if you have a column "age":
CREATE UNIQUE INDEX idx_fiscal_number_partial
ON fiscal(number, serie, model)
WHERE age < 1000;
See https://www.postgresql.org/docs/current/indexes-partial.html.
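If the table has no such column but does have a creation timestamp, the same idea works with a date cutoff. A minimal sketch, assuming a column named created_at (that name is not from the question):
-- Only rows created on or after the cutoff must be unique;
-- the pre-existing duplicates simply fall outside the partial index.
CREATE UNIQUE INDEX idx_fiscal_number_new_rows
ON fiscal (number, serie, model)
WHERE created_at >= DATE '2024-01-01';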

Related

How to SET jsonb_column = json_build_array( string_column ) in Sequelize UPDATE?

I'm converting a one-to-one relationship into a one-to-many relationship. The old relationship was just a foreign key on the parent record. The new relationship will be an array of foreign keys on the parent record.
(Using Postgres dialect, BTW.)
First I'll add a new JSONB column, which will hold an array of UUIDs.
Then I'll run a query to update all existing rows such that the value from the old column is now stored in the new column (as the first element in an array).
Finally, I'll remove the old column.
I'm looking for help with step 2: writing the update statement that will update all rows, setting the value of the new column based on the value of the old column. Basically, I'm trying to figure out how to express this SQL query using Sequelize:
UPDATE "myTable"
SET "newColumn" = json_build_array("oldColumn")
-- ^^ this really works, btw
Where:
newColumn is type JSONB, and should hold an array (of UUIDs)
oldColumn is type UUID
names are double-quoted because they're mixed case in the DB (shrug)
Expressed using Sequelize sugar, that might be something like:
const { models } = require('../sequelize')
await models.MyModel.update({ newColumn: [ 'oldColumn' ] })
...except that would result in saving an array that contains the string "oldColumn" rather than an array whose first element is the value in that row's oldColumn column.
My experience, and the Sequelize documentation, are focused on working with individual rows via the standard instance methods. I could do that here, but it'd be a lot better to have the database engine do the work internally instead of forcing every row to make a round trip to Node and back.
Looking for whatever is the most Sequelize-idiomatic way of doing this, if there is one.
Any help is appreciated.
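For reference, a plain-SQL sketch of the three steps described above, using the table and column names from the question (jsonb_build_array is used instead of json_build_array since the target column is jsonb, so no cast is needed); the Sequelize-idiomatic wrapper for step 2 is what is being asked for:
-- Step 1: add the new JSONB column
ALTER TABLE "myTable" ADD COLUMN "newColumn" jsonb;
-- Step 2: copy each row's old UUID into the new column as a one-element array
UPDATE "myTable" SET "newColumn" = jsonb_build_array("oldColumn");
-- Step 3: drop the old column once the data has been verified
ALTER TABLE "myTable" DROP COLUMN "oldColumn";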

Maintain N latest points in Postgres

My purpose is to keep the N latest points per user in Postgres, in one row. When a new point is added, the oldest point should be removed.
For example, say each row has N+1 columns: 1 userid and N integers.
Now, when a new integer arrives for a user, I want to remove the oldest entry and add the new one. Obviously the issue here is performance. Since I am adding only one new integer, I want some way to do this fast.
I tried one very naive way, keeping two columns:
userid, json
where json was a list of all the integers. I would remove the first entry, append the new one, and write the json back to Postgres. Unsurprisingly, it does not perform well.
Please suggest a good way to do it. Does Postgres have some min-heap type of data structure that can do this in better than linear complexity?
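One possible sketch, assuming an integer[] column and Postgres 9.6+ array-slice syntax (the table and column names below are made up):
-- One row per user; the latest values are kept in an int[] column.
CREATE TABLE user_points (
    userid bigint PRIMARY KEY,
    points integer[] NOT NULL DEFAULT '{}'
);
-- Append a new value (here 42) and keep only the last 10 elements.
-- After the append the array has cardinality(points) + 1 elements,
-- so the slice starts at cardinality(points) + 2 - 10.
UPDATE user_points
SET points = (points || 42)[greatest(cardinality(points) - 8, 1):]
WHERE userid = 1;
-- Only this one row is rewritten, so the cost stays constant per insert.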

Sphinx centralize multiple tables into a single index

I do have multiple tables (MySQL) and I want to have a single index for them.
Each table has a primary key of auto-increment int type.
The structure of the collected data is the same for each table (so no conflict), but since the IDs collide it seems that I have to query each index separately (unless you can give me a hint on how to avoid the ID collisions).
The question is: if I query each index separately, does it mean that the weights of the returned results are comparable between indexes?
unless you can give me a hint of how to avoid ID collision
See for example
http://sphinxsearch.com/forum/view.html?id=13078
You can just arrange for the ids to be offset differently. The 'sphinx document id' doesn't have to match the real primary key, but having a simple mapping makes the application simpler.
You have a choice between one index, one source (using a single SQL query to UNION all the tables together); one index, many sources (a source per table, all feeding one index); or many indexes (one index per table, each with its own source). Whichever way you choose will give the same query results.
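A sketch of the offset idea for the one-index, one-source variant (the table names are made up): each branch shifts its primary key so the resulting Sphinx document ids never collide, and id % 3 still tells you which table a hit came from.
-- used as the single sql_query feeding one Sphinx source
SELECT id * 3 + 0 AS id, title, body FROM table_a
UNION ALL
SELECT id * 3 + 1 AS id, title, body FROM table_b
UNION ALL
SELECT id * 3 + 2 AS id, title, body FROM table_c;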
If I query each index separately does it means that the weight of returned results are comparable between indexes?
Pretty much. The difference should be negligible, so it doesn't matter which way round you do it.

H2: Insert is slow because of index on column

I am using the h2 database to store data.
Each record has to be unique in the database (unique in the sense that the combination of timestamp, name, message, etc. doesn't appear twice in the table). Therefore one column in the table is a hash of the data in the record. To speed up checking whether a record already exists, I created an index on the hash column. Indeed, searching for a record with a given hash is very fast.
But here is the problem: while inserting 10k records is fast enough in the beginning (it takes about a second), it gets awfully slow once there are already a million records in the database (it takes a minute). This is probably because the new hashes need to be integrated into the existing index b-tree.
Is there any way to speed this up or is there a better way to ensure uniqueness of data records in the table?
Edit: To be more concrete:
Let's say my records are transactions which have the following fields:
timestamp, type, sender, recipient, amount, message
A transaction should only appear once in the table, so before inserting a new transaction I have to check whether it is already there. Since the SHA-256 hash of all fields is unique, my idea was to add a column 'hash' to the table that stores the hash of the fields. Before inserting a new record, I calculate the hash of the fields and query the table for that hash.
An index has its own overhead. If you have a table with lots of insertions, I would suggest avoiding an index on it, as maintaining the hash index adds overhead to every insert.
May I know what you mean by "one column in the table is the hash of the data in the record"?
You can create a unique key constraint (here it would be a composite key over those 3 mentioned columns). Let me know the requirements; maybe we can give you a better, simpler way of doing it :)
Danyal
Man, querying all the records, checking them for duplicates and then inserting the new row is probably not a good way to do this :). The overhead will keep increasing as the number of records grows.
Create a unique key constraint (check http://www.h2database.com/html/grammar.html ) on the combination of these fields; you don't need to compute the hash, the database will handle that internally. Just try to add a duplicate record: you will get an exception. Catch the exception and show an error message saying the insertion was a duplicate.
Once you create the unique index, it won't allow you to insert any duplicate records. It is pretty secure and safe.
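A minimal sketch of that approach in H2 (the table and column names are assumed from the question's field list, not taken from the real schema):
-- The composite unique constraint makes H2 reject duplicates by itself;
-- no separate hash column or pre-insert lookup is needed.
CREATE TABLE transactions (
    ts        TIMESTAMP NOT NULL,
    tx_type   VARCHAR(32) NOT NULL,
    sender    VARCHAR(64) NOT NULL,
    recipient VARCHAR(64) NOT NULL,
    amount    DECIMAL(18, 2) NOT NULL,
    message   VARCHAR(255),
    CONSTRAINT uq_transaction UNIQUE (ts, tx_type, sender, recipient, amount, message)
);
-- Inserting the same row twice now fails with a unique-constraint violation
-- that the application can catch and report as a duplicate.
INSERT INTO transactions VALUES (TIMESTAMP '2024-01-01 12:00:00', 'transfer', 'alice', 'bob', 10.00, 'lunch');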
Indexing randomly distributed data is bad for performance. Once there are more entries in the index than fit in the cache, updating the index gets very slow, especially when using a hard disk, because seeks on a hard disk are very slow. This, in combination with the random distribution of the data, leads to very bad performance. With solid state disks it's a bit better, because random-access reads are faster there.

indexes needed for table inheritance in postgres?

This is a fairly simple question, but it's one I can't find a firm answer on.
I have a parent table in PostgreSQL and several child tables that have been defined. A trigger has been established, and data is inserted into a child table only if a field, say field x, meets a certain criterion.
When I query the parent table with a condition on x, PostgreSQL knows to go straight to the child table that is related to that particular value of x.
That all being said, I don't need to specify a particular index on the column x, do I? PostgreSQL already knows how to sort on it, and by adding an index on x to the parent, PostgreSQL would therefore be generating separate indexes on x for each of the new child tables.
Creating that index is a bit redundant, right?
Creating an index on the child table for x, if x only has one value (or a very, very small number of values), is probably a loss, yes. The planner would scan the whole table anyway.
If x is a timestamp and you're specifying a timeframe that may not be a whole partition, or if x is another range or set of values, an index would be a win most likely.
Edit: when I say one value or a range of values, I mean per child table.
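As a concrete illustration of the second case, following the classic date-partitioning pattern (all names below are made up): if queries typically ask for a sub-range inside one child, an index on the partitioning column inside each child is worthwhile.
CREATE TABLE measurements (logdate date NOT NULL, payload text);
-- Child holds one year of data; the CHECK constraint is what lets the
-- planner skip the other children (constraint exclusion).
CREATE TABLE measurements_2024 (
    CHECK (logdate >= DATE '2024-01-01' AND logdate < DATE '2025-01-01')
) INHERITS (measurements);
-- Worth creating when queries hit only a fraction of this child's rows,
-- e.g. a single week inside the year; redundant if every query reads
-- the whole child anyway.
CREATE INDEX measurements_2024_logdate_idx ON measurements_2024 (logdate);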