Neo4j Best Practice for Index on Relationship Property? - postgresql

I'm trying to migrate a PostgreSQL database to Neo4j and have the following m-n relationship in PostgreSQL:
user-(blocked)-user
So in PostgreSQL I have an extra table "blocked" that has the following columns: userid1, userid2, blockedsince, blockeduntil
I also have an index on blocked.blockeduntil to search for rows that eventually must be removed when the blocking is over.
How can I build this relationship including the index in Neo4j?
I already included all user-rows in a node type "user" in Neo4j.
Next step would have been to create a relationship called "blocked" from user to user.
So basically like this: (user)-[:blocked]->(user)
I would have added the 2 relationship properties "blockeduntil" and "blockedsince" to the relationship. But it does not seem to be possible to create an index on the relationship property blocked.blockeduntil
Original code for PostgreSQL:
CREATE TABLE user (
UserId bigserial NOT NULL PRIMARY KEY,
...
);
CREATE TABLE blocked(
UserId1 bigint NOT NULL references user(UserId),
UserId2 bigint NOT NULL references user(UserId),
BlockedSince timestamp NOT NULL,
BlockedUntil timestamp NOT NULL,
PRIMARY KEY (UserId1, UserId2)
);
CREATE INDEX "IdxBlocked" on blocked(blockeduntil);
My first approach in Neo4j:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
MATCH (u1:user {userid: toInteger(row.userid1)})
MATCH (u2:user {userid: toInteger(row.userid2)})
MERGE (u1)-[b:BLOCKED]->(u2)
ON CREATE SET b.blockedsince = localdatetime(replace(row.blockedsince, " ", "T")), b.blockeduntil = localdatetime(replace(row.blockeduntil, " ", "T"));
What is the best practice to achieve this relationship including the index on blockeduntil? Would it be better to create a new node type called "blocked" like this?
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///blocked.csv" AS row
CREATE (:blocked{userid1: toInteger(row.userid1), userid2: toInteger(row.userid2), blockedsince: localdatetime(replace(row.blockedsince, " ", "T")),
blockeduntil: localdatetime(replace(row.blockeduntil, " ", "T"))});
And then create an index on blocked.blockeduntil like this?
CREATE INDEX ON :blocked(blockeduntil);
During my research I stumbled upon explicit indexes Explicit Indexes but they seem to be deprecated. Also I'm not sure if Full Text Indexes are the right choice here.

Ok, it seems that I found an official answer from a Neo4J staff member.
https://community.neo4j.com/t/how-can-i-use-index-in-relationship/1627/2
From the post:
We instead recommend refactoring your model. If you need to perform an index lookup of something, that usually suggests that thing would be better modeled as a node. As a node you are able to create the indexes and constraints you need for quick lookup.
So I will model "blocked" as a node and create an index on blocked.blockeduntil.

Related

I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has

I will explain the problem with an example:
I am designing a specific case of referential integrity in a table. In the model there are two tables, enterprise and document. We register the companies and then someone insert the documents associated with it. The name of the enterprise is variable. When it comes to recovering the documents, I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has. The solution that I thought was to register the company again in each change with the same code, the updated name in this way would have the expected result, but I am not sure if it is the best solution. Can someone make a suggestion?
There are several possible solutions and it is hard to determine which one will exactly be the easiest.
Side comment: your question is limited to managing names efficiently but I would like to comment the fact that your DB is sensitive to files being moved, renamed or deleted. Your database will not be able to keep records up-to-date if anything happen at OS level. You should consider to do something about it too.
Amongst the few solution I considered, the one that is best normalized is the schema below:
CREATE TABLE Enterprise
(
IdEnterprise SERIAL PRIMARY KEY
, Code VARCHAR(4) UNIQUE
, IdName INTEGER DEFAULT -1 /* This will be used to get a single active name */
);
CREATE TABLE EnterpriseName (
IDName SERIAL PRIMARY KEY
, IdEnterprise INTEGER NOT NULL REFERENCES Enterprise(IdEnterprise) ON UPDATE NO ACTION ON DELETE CASCADE
, Name TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName(IdName) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document
(
IdDocument SERIAL PRIMARY KEY
, IdName INTEGER NOT NULL REFERENCES EnterpriseName(IDName) ON UPDATE NO ACTION ON DELETE NO ACTION
, FilePath TEXT NOT NULL
, Description TEXT
);
Using flag and/or timestamps or moving the enterprise name to the document table are appealing solutions, but only at first glance.
Especially, the part where you have to ensure a company always has 1, and 1 only "active" name is no easy thing to do.
Add a date range to your enterprise: valid_from, valid_to. Initialise to -infinity,+infinity. When you change the name of an enterprise, instead: update existing rows where valid_to = +infinity to be now() and insert the new name with valid_from = now(), valid_to = +infinity.
Add a date field to the document, something like create_date. Then when joining to enterprise you join on ID and d.create_date between e.valid_from and e.valid_to.
This is a simplistic approach and breaks things like uniqueness for your id and code. To handle that you could record the name in a separate table with the id,from,to,name. Leaving your original table with just the id and code for uniqueness.

Indexing jsonb data for pattern matching searches

This is a follow-up to:
Pattern matching on jsonb key/value
I have a table as follows
CREATE TABLE "PreStage".transaction (
transaction_id serial NOT NULL,
transaction jsonb
CONSTRAINT pk_transaction PRIMARY KEY (transaction_id)
);
The content in my transaction jsonb column looks like
{"ADDR": "abcd", "CITY": "abcd", "PROV": "",
"ADDR2": "",
"ADDR3": "","CNSNT": "Research-NA", "CNTRY": "NL", "EMAIL": "#.com",
"PHONE": "12345", "HCO_NM": "HELLO", "UNQ_ID": "",
"PSTL_CD": "1234", "HCP_SR_NM": "", "HCP_FST_NM": "",
"HCP_MID_NM": ""}
I need search query like:
SELECT transaction AS data FROM "PreStage".transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%';
But I need to give my user flexibility to search any key/value on the fly.
An answer to the previous question suggested to create index as:
CREATE INDEX idxgin ON "PreStage".transaction
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);
Which works, but I wanted to index other keys, too. Hence was trying something like:
CREATE INDEX idxgin ON "PreStage".transaction USING gin
((transaction->>'HCP_FST_NM'),(transaction->>'HCP_LST_NM') gin_trgm_ops)
Which doesn't work. What would be the best indexing approach here or will I have to create a separate index for each key in which case the approach will not be generic if a new key/value pair is added to the data.
The syntax error that #jjanes pointed out aside,
for a mix of some popular keys (contained in many rows and / or searched often) plus many more rare keys (contained in few rows and / or rarely searched, new keys might pop up dynamically) I suggest this combination:
Trigram indexes for popular keys
It does not seem like you are going to combine multiple keys in one search often, and a single index with many keys would grow very big and slow. So I would create a separate index for each popular key. Make it a partial index for keys that are not contained in most rows:
CREATE INDEX trans_idxgin_HCP_FST_NM ON transaction -- contained in most rows
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);
CREATE INDEX trans_idxgin_ADDR ON transaction -- not in most rows
USING gin ((transaction->>'ADDR') gin_trgm_ops)
WHERE transaction ? 'ADDR';
Etc. Like detailed in my previous answer:
Pattern matching on jsonb key/value
Basic jsonb GIN index
If you have many different keys and / or new keys are added dynamically, you can cover the rest with a basic (default) jsonb_ops GIN index:
CREATE INDEX trans_idxgin ON "PreStage".transaction USING gin (transaction);
Among other things, this supports the search for keys. But you cannot use it for pattern matching on values.
What's the proper index for querying structures in arrays in Postgres jsonb?
Query
Combine predicates addressing both indexes:
SELECT transaction AS data
FROM "PreStage".transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction ? 'HCP_FST_NM'; -- even if that seems redundant.
The second condition happens to match our partial indexes as well.
So either there is a specific trigram index for the given (popular / common) key, or there is at least an index to find (the few) rows containing the rare key - and then filter for matching values. The same query should give you the best of both worlds.
Be sure to run the latest version of Postgres, there have been various updates for cost estimates recently. It will be crucial that Postgres works with good estimates and current table statistics to choose the best query plan.
There is no built in index that does precisely what you want, searching for an exact key and a corresponding wild-card matching value, without specifying ahead of time which key(s) to use. It should be possible to create an extension which would do this, but it would be an awful lot of work, and I don't know of any that exist.
Your best option that works out of the box might be to cast the jsonb to text and index that text:
create index on transaction using gin ((transaction::text) gin_trgm_ops);
And then add a secondary condition to your query:
SELECT transaction AS data FROM transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction::text ilike '%neer%';
Now it can use the index to find anything containing 'neer', and then later re-check that 'neer' occurs in the value for the 'HCP_FST_NM' key, as opposed to just some other place in the JSONB.
If your query word occurs in lots of places other than in the value of the desired key, then this might not give you very good performance. For example, if someone searched for:
transaction->>'EMAIL' ilike '%ADDR%'
AND transaction::text ilike '%ADDR%';
The the index would return every row, assuming all records have the same structure as what you show, because every row contains 'ADDR' because used as a key. Then every row would fail the other condition check, but only after doing a lot of work.

Check current TTL on collection columns in Cassandra

Lets assume I have a Column Family with following schema:
CREATE TABLE users (
user_id timeuuid,
name varchar,
last_name varchar,
children list,
phone_numbers map,
PRIMARY KEY(user_id)
);
Then I insert a row into this CF with "USING TTL 60000". When I want to verify if any of these columns still has TTL set I get error: "Cannot use selection function ttl on collections".
My question is: how to get TTL on elements of a column that is defined as collection ?
Cheers!
I reproduced your problem -- naturally getting the very same result. The problem is that either (1) in collections TTL's are element-wise (one TTL per entry in collection) and (2) I found no way of getting entries from Maps or Lists.
Of course I can delete one element -- but selecting it or it's TTL was not possible. Even the Datastax' CQL driver v2 has not provided the metadata for that.
So you may change your data structure for that. If this was 'just' for testing purposes you have to trust Cassandra doing this well enough.

EF db first and table without key

I am trying to use Entity Framework DB first to do quick prototyping of a reporting website for a huge db. The problem is one of the tables doesn't have a key. I got an 'Error 159: EntityType has no key defined'. If I add a key on the model designer, I got 'Error 3024: Must specify mapping for all key properties'. My question is whether there is a way to workaround this WITHOUT adding a key to the table. The table is not in our control.
Huge table which does not have a key? It would not be possible for you or for table owner to search for anything in this table without using full table scan. Also, it is basically impossible to use UPDATE by single row without having primary key.
You really have to either create synthetic key, or ask owner to do that. As a workaround, you might be able to find some existing column (or 2-3 columns) which is unique enough that it can be used as unique key. If it is unique but does not have actual index created, that would be still not good for performance - you should create such index.

How to maintain record history on table with one-to-many relationships?

I have a "services" table for detailing services that we provide. Among the data that needs recording are several small one-to-many relationships (all with a foreign key constraint to the service_id) such as:
service_owners -- user_ids responsible for delivery of service
service_tags -- e.g. IT, Records Management, Finance
customer_categories -- ENUM value
provider_categories -- ENUM value
software_used -- self-explanatory
The problem I have is that I want to keep a history of updates to a service, for which I'm using an update trigger on the table, that performs an insert into a history table matching the original columns. However, if a normalized approach to the above data is used, with separate tables and foreign keys for each one-to-many relationship, any update on these tables will not be recognised in the history of the service.
Does anyone have any suggestions? It seems like I need to store child keys in the service table to maintain the integrity of the service history. Is a delimited text field a valid approach here or, as I am using postgreSQL, perhaps arrays are also a valid option? These feel somewhat dirty though!
Thanks.
If your table is:
create table T (
ix int identity primary key,
val nvarchar(50)
)
And your history table is:
create table THistory (
ix int identity primary key,
val nvarchar(50),
updateType char(1), -- C=Create, U=Update or D=Delete
updateTime datetime,
updateUsername sysname
)
Then you just need to put an update trigger on all tables of interest. You can then find out what the state of any/all of the tables were at any point in history, to determine what the relationships were at that time.
I'd avoid using arrays in any database whenever possible.
I don't like updates for the exact reason you are saying here...you lose information as it's over written. My answer is quite simple...don't update. Not sure if you're at a point where this can be implemented...but if you can I'd recommend using the main table itself to store historical (no need for a second set of history tables).
Add a column to your main header table called 'active'. This can be a character or a bit (0 is off and 1 is on). Then it's a bit of trigger magic...when an update is preformed, you insert a row into the table identical to the record being over-written with a status of '0' (or inactive) and then update the existing row (this process keeps the ID column on the active record the same, the newly inserted record is the inactive one with a new ID).
This way no data is ever lost (admittedly you are storing quite a few rows...) and the history can easily be viewed with a select where active = 0.
The pain here is if you are working on something already implemented...every existing query that hits this table will need to be updated to include a check for the active column. Makes this solution very easy to implement if you are designing a new system, but a pain if it's a long standing application. Unfortunately existing reports will include both off and on records (without throwing an error) until you can modify the where clause