Neo4j MERGE Query

MERGE (n:Person { id: 1000 })
MERGE (m:Item { id: 2000 })
SET m.name = 'xyz'
MERGE (n)-[r:Buy]->(m)
I am trying to upload this kind of data (around 10k+ rows) to Neo4j, but the MERGE query is very slow to execute, since each MERGE needs a full scan of the node space to verify that no other node exists with the given property.
Is there any way to solve this?

If you define a unique constraint on the id properties of the :Person and :Item nodes, those properties will be indexed, and the MERGE lookups will become index lookups instead of full scans.
The syntax you'll need will look something like this:
CREATE CONSTRAINT ON (person:Person) ASSERT person.id IS UNIQUE
CREATE CONSTRAINT ON (item:Item) ASSERT item.id IS UNIQUE
You can find out more about unique constraints here:
http://docs.neo4j.org/chunked/stable/query-constraints.html
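For bulk loads of this size, a common follow-up technique (a sketch, not from the original answer; the $rows parameter and its field names are assumptions, and parameter syntax varies by Neo4j version) is to send the rows as a parameter and UNWIND them, so a whole batch runs as one statement:

UNWIND $rows AS row
MERGE (n:Person { id: row.personId })
MERGE (m:Item { id: row.itemId })
SET m.name = row.name
MERGE (n)-[:Buy]->(m)

With the unique constraints in place, each MERGE then becomes an index lookup rather than a scan of the node space.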

Related

Is there any way to make the unique index ignore old data?

What I need to do is to prevent rows from being duplicated across the columns number, serie and model.
My initial thought was to do something like this:
CREATE UNIQUE INDEX idx_fiscal_number
ON Fiscal(number, serie, model);
The problem is that this database is really old, and a lot of the data is already duplicated. So my question is:
Is there any way to make this unique index start validating now and accept the old data that is already there?
You can try to create a partial index, if you can express the condition "old" with existing columns. For example, if you have a column "age":
CREATE UNIQUE INDEX idx_fiscal_number_partial
ON fiscal(number, serie, model)
WHERE age < 1000;
See https://www.postgresql.org/docs/current/indexes-partial.html.
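To illustrate the effect (a sketch; the values are made up): rows outside the index predicate are never checked, while rows matching it are validated from now on:

-- age >= 1000 ("old") falls outside the predicate, so duplicates are tolerated:
INSERT INTO fiscal (number, serie, model, age) VALUES (1, 'A', 'X', 2000);
INSERT INTO fiscal (number, serie, model, age) VALUES (1, 'A', 'X', 2000);
-- age < 1000 ("new") is covered by the unique index:
INSERT INTO fiscal (number, serie, model, age) VALUES (1, 'A', 'X', 10);
INSERT INTO fiscal (number, serie, model, age) VALUES (1, 'A', 'X', 20);  -- fails: duplicate key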

Sphinx centralize multiple tables into a single index

I have multiple tables (MySQL) and I want a single index covering all of them.
Each table has a primary key of int autoincrement type.
The structure of the collected data is the same for each table (so no conflict), but since the IDs collide, it seems that I have to query each index separately (unless you can give me a hint on how to avoid ID collisions).
The question is: if I query each index separately, are the weights of the returned results comparable between indexes?
unless you can give me a hint on how to avoid ID collisions
See for example
http://sphinxsearch.com/forum/view.html?id=13078
You can just arrange for the IDs to be offset differently per table. The 'sphinx document id' doesn't have to match the real primary key, but having a simple mapping keeps the application simpler.
You have a choice between one index with one source (using a single SQL query to UNION all the tables together), one index with many sources (a source per table, all building one index), or many indexes (one index per table, each with its own source). Whichever way you choose will give the same query results.
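A sketch of the offset idea from the linked forum post (table and column names are hypothetical): multiply each table's primary key by the number of tables and add a per-table offset, so document IDs never collide and the original row stays recoverable:

-- Three hypothetical tables, each with an int auto-increment PK `id`.
SELECT id * 3 + 0 AS doc_id, title, body FROM table_a
UNION ALL
SELECT id * 3 + 1, title, body FROM table_b
UNION ALL
SELECT id * 3 + 2, title, body FROM table_c

Then doc_id % 3 tells you which table a hit came from, and doc_id / 3 (integer division) recovers the original primary key.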
If I query each index separately, are the weights of the returned results comparable between indexes?
Pretty much. The difference should be negligible, so it doesn't matter which way round you do it.

Difference between MERGE and CREATE UNIQUE in Neo4j

I'm trying to figure out what the difference is between MERGE and CREATE UNIQUE. So far I know these features:
MERGE
I'm able to create a node if the pattern doesn't exist.
MERGE (n { name:"X" }) RETURN n;
This creates a node "n" with the name property, an empty node "m", and a RELATED relationship.
MERGE (n { name:"X" })-[:RELATED]->(m) RETURN n, m;
CREATE UNIQUE
I'm not able to create a node like this.
CREATE UNIQUE (n { name:"X" }) RETURN n;
If node "n" exists, CREATE UNIQUE makes an empty node "m" and a RELATED relationship.
MATCH (n { name: 'X' }) CREATE UNIQUE (n)-[:RELATED]->(m) RETURN n, m;
If this pattern already exists, nothing is created; the query only returns the pattern.
From my point of view, MERGE and CREATE UNIQUE look like quite similar queries, but with CREATE UNIQUE you can't create the start node of a relationship. I would be grateful if someone could explain this issue and compare these queries. Thanks.
CREATE UNIQUE has slightly more obscure semantics than MERGE. MERGE was developed as an alternative with more intuitive behavior than CREATE UNIQUE; if in doubt, MERGE is usually the right choice.
The easiest way to think of MERGE is as a MATCH-or-create. That is, if something in the database would MATCH the pattern you are using in MERGE, then MERGE will just return that pattern. If nothing matches, the MERGE will create all missing elements in the pattern, where a missing element means any unbound identifier.
Given
MATCH (a {uid:123})
MERGE (a)-[r:LIKES]->(b)-[:LIKES]->(c)
"a" is a bound identifier from the perspective of the MERGE. This means cypher somehow already knows which node it represents.
This statement can have two outcomes. Either the whole pattern already exists, and nothing will be created, or parts of the pattern are missing, and a whole new set of relationships and nodes matching the pattern will be created.
Examples
// Before merge:
(a)-[:LIKES]->()-[:LIKES]->()
// After merge:
(a)-[:LIKES]->()-[:LIKES]->()
// Before merge:
(a)-[:LIKES]->()-[:OWNS]->()
// After merge:
(a)-[:LIKES]->()-[:OWNS]->()
(a)-[:LIKES]->()-[:LIKES]->()
// Before merge:
(a)
// After merge:
(a)-[:LIKES]->()-[:LIKES]->()
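To make the match-or-create behaviour concrete, here is a minimal sketch (the uid value is taken from the example above) that can be run twice in a row:

// Start with a single bound node.
CREATE ({uid: 123});

// First run: no full pattern matches, so b and c are created.
MATCH (a {uid: 123})
MERGE (a)-[:LIKES]->(b)-[:LIKES]->(c)
RETURN b, c;

// Second run: the whole pattern now matches, so nothing new is created.
MATCH (a {uid: 123})
MERGE (a)-[:LIKES]->(b)-[:LIKES]->(c)
RETURN b, c;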

Postgres: n:m intermediate table with type

I have a table called "Tag" which consists of Id, Name and Description columns.
Now let's say I have the tables Character (C), Movie (M), Series (S), etc.
And I want to be able to tag entries in C, M, and S with multiple tags, where one tag may be used for multiple entries.
So I could realize it like this:
T -> TC <- C
T -> TM <- M
T -> TS <- S
Where TC, TM, TS are the intermediate tables.
I was wondering if I could combine TC, TM, TS into one table with a type column added and still use foreign keys.
As of yet I haven't found a way to do it.
Or is this something I shouldn't be doing?
As the comments above suggested, you can't combine multiple tables into a single one while keeping plain foreign keys. If you want a single view of the "tag relationships", you can pull the needed information into a View. This way, you only need to write the longer query once and can then use the view like a single table. Keep in mind that you can't insert data into a view (there are ways to do so, but they are a little advanced).
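A sketch of such a view (table and column names are assumptions; adjust to your schema):

CREATE VIEW tag_usage AS
SELECT tag_id, target_id, 'character' AS target_type FROM tag_character
UNION ALL
SELECT tag_id, target_id, 'movie' FROM tag_movie
UNION ALL
SELECT tag_id, target_id, 'series' FROM tag_series;

Each underlying table keeps its own real foreign keys to Tag and to its target table; the view only provides the combined, read-only picture.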

What's the utility of the array type?

I'm a total newbie with PostgreSQL, but I have good experience with MySQL. I was reading the documentation and discovered that PostgreSQL has an array type. I'm quite confused, since I can't understand in which context this type can be useful within an RDBMS. Why would I choose this type instead of using a classical one-to-many relationship?
Thanks in advance.
I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array; each number in the array is the branch number for that node. Then, you can do things like this:
SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.
PostgreSQL will compare arrays element by element in the natural fashion, so ORDER BY path will dump the tree in a sensible linear display order; then you check the length of path to figure out a node's depth, which gives you the indentation needed to get the rendering right.
The above approach gets you from the database to the rendered page with one pass through the data.
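A minimal sketch of that layout (table and column names are assumptions):

-- Each comment stores the branch numbers from the root down to itself.
CREATE TABLE nodes (
    id      serial  PRIMARY KEY,
    tree    int     NOT NULL,  -- which thread this comment belongs to
    path    int[]   NOT NULL,  -- branch numbers from root to this node
    content text    NOT NULL
);

-- Depth falls out of the array length and drives the indentation.
SELECT id, content, array_length(path, 1) AS depth
FROM nodes
WHERE tree = 42
ORDER BY path;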
PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.
Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.
One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.
I've been using them successfully to aggregate recursive tree references using triggers.
For instance, suppose you have a tree of categories, and you want to find products in any of the categories (1,2,3) or any of their subcategories.
One way to do it is to use an ugly WITH RECURSIVE statement. Doing so will produce a plan stuffed with merge/hash joins on entire tables and an occasional materialize:
with recursive subcategories as (
    select id
    from categories
    where id in (1, 2, 3)
    union all
    ...
)
select products.*
from products
join product2category on ...
join subcategories on ...
group by products.id, ...
order by ... limit 10;
Another is to pre-aggregate the needed data:
categories (
    id int,
    parents int[]  -- (array_agg(parent_id) from parents) || id
)
products (
    id int,
    categories int[]  -- array_agg(category_id) from product2category
)
index on categories using gin (parents)
index on products using gin (categories)
select products.*
from products
where categories && array(
    select id from categories where parents && array[1, 2, 3]
)
order by ... limit 10;
One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.
To work around it, I increased the statistics target on the gin-indexed columns, and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveals that using && for the specified values will yield an incorrect plan, I rewrite the applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:
select products.*
from products
where arrayoverlap(categories, array(
    select id from categories where arrayoverlap(parents, array[1, 2, 3])
))
order by ... limit 10;
(The same goes for the <@ operator...)
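For reference, a sketch of how one might raise the statistics target and inspect the element statistics PostgreSQL gathers for an array column (names taken from the example above):

-- collect more detailed statistics for the array column
alter table products alter column categories set statistics 1000;
analyze products;

-- element frequencies for array columns; useful for judging whether
-- the planner's guess for && is plausible
select most_common_elems, most_common_elem_freqs
from pg_stats
where tablename = 'products' and attname = 'categories';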