neo4j creating random empty nodes when merging - merge

I'm trying to create a new node with label C and relationships from a-->c and b-->c, but if and only if the whole pattern a-->c,b-->c does exist.
a and b already exist (merged before the rest of the query).
The below query is a portion of the query I want to write to accomplish this.
However, it creates a random empty node devoid of properties and labels and attaches the relationship to that node instead. This shouldn't be possible and is certainty not what I want. How do I stop that from happening?
merge (a: A {id: 1})
merge (b: B {id:1})
with *
call {with a, b
match (a)-[:is_required]->(dummy:C), (a)-[:is_required]->(b)
with count(*) as cnt
where cnt = 0
merge (temp: Temporary {id: 12948125})
merge (a)-[:is_required]->(temp)
return temp
}
return *
Thanks

I think there are a couple of problems here:
There are restrictions on how you can use variables introduced with WITH in a sub-query. This article helps to explain them https://neo4j.com/developer/kb/conditional-cypher-execution/
I think you may be expecting the WHERE to introduce conditional flow like IF does in other languages. WHERE is a filter (maybe FILTER would have been a better choice of keyword than WHERE). In this case you are filtering out 'cnt's where they are 0, but then never reference cnt again, so the merge (temp: Temporary {id: 12948125}) and merge (a)-[:is_required]->(temp) always get executed. The trouble is, due to the above restrictions on using variables inside sub-queries, the (a) node you are trying to reference doesn't exist, it's not the one in the outer query. Neo4j then just creates an empty node, with no properties or labels and links it to the :Temporary node - this is completely valid and why you are getting empty nodes.
This query should result in what you intend:
merge (a: A {id: 1})
merge (b: B {id:1})
with *
// Check if a is connected to b or :C (can't use a again otherwise we'd overwrite it)
optional match(x:A {id: 1}) where exists((a)-[:is_required]->(:C)) or exists((a)-[:is_required]->(b))
with *, count(x) as cnt
// use a case to 'fool' foreach into creating the extra :Temporary node required if a is not related to b or :C
foreach ( i in case when cnt = 0 then [1] else [] end |
merge (temp: Temporary {id: 12948125})
merge (a)-[:is_required]->(temp)
)
with *
// Fetch the :Temporary node if it was created
optional match (a)-[:is_required]->(t:Temporary)
return *
There are apoc procedures you could use to perform conditional query execution (they are mentioned in the linked article). You could also play around with looking for a path from (a) and check its length, rather than introduce a new MATCH and the variable x then checking for the existance of related nodes.

If anyone is having the same problem, the answer is that the Neo4j browser is display nonexistent nodes. The query executes fineā€¦

Related

Merge vertex list in gremlin orientDb

I am a newbie in the graph databases world, and I made a query to get leaves of the tree, and I also have a list of Ids. I want to merge both lists of leaves and remove duplicates in a new one to sum property of each. I cannot merge the first 2 sets of vertex
g.V().hasLabel('Group').has('GroupId','G001').repeat(
outE().inV()
).emit().hasLabel('User').as('UsersList1')
.V().has('UserId', within('001','002')).as('UsersList2')
.select('UsersList1','UsersList2').dedup().values('petitions').sum().unfold()
Regards
There are several things wrong in your query:
you call V().has('UserId', within('001','002')) for every user that was found by the first part of the traversal
the traversal could emit more than just the leafs
select('UsersList1','UsersList2') creates pairs of users
values('petitions') tries to access the property petitions of each pair, this will always fail
The correct approach would be:
g.V().has('User', 'UserId', within('001','002')).fold().
union(unfold(),
V().has('Group','GroupId','G001').
repeat(out()).until(hasLabel('User'))).
dedup().
values('petitions').sum()
I didn't test it, but I think the following will do:
g.V().union(
hasLabel('Group').has('GroupId','G001').repeat(
outE().inV()
).until(hasLabel('User')),
has('UserId', within('001','002')))
.dedup().values('petitions').sum()
In order to get only the tree leaves, it is better to use until. Using emit will output all inner tree nodes as well.
union merges the two inner traversals.

Merge neo4j relationships into one while returning the result if certain condition satisfies

My use case is:
I have to return whole graph in result but the condition is
If there are more than 1 relationship in between two particular nodes in the same direction then I have to just merge it into 1 relationship. For ex: Lets say there are two nodes 'm' and 'n' and there are 3 relations in between these nodes say r1, r2, r3 (in the same direction) then when I get the result after firing cypher query I should get only 1 relation in between 'n' and 'm'.
I need to perform some operations on top of it like the resultant relation that we got from merging all the relations should contain the properties and their values that I want to retain. Actually I will retain all the properties of any one of the relations that are merging depending upon the timestamp field that is one of the properties in relation.
Note : I have same properties throughout all my relations (The number of properties and name of properties are same across all relations. Values may differ for sure)
Any help would be appreciated. Thanks in advance.
You mean something like this?
Delete all except the first
MATCH (a)-[r]->(b)
WITH a,b,type(r) as type, collect(r) as rels
FOREACH (r in rels[1..] | DELETE r)
Ordering by timestamp first
MATCH (a)-[r]->(b)
WITH a,r,b
ORDER BY r.timestamp DESC
WITH a,b,type(r) as type, collect(r) as rels
FOREACH (r in rels[1..] | DELETE r)
If you want to do all those operations virtually just on query results you'd do them in your programming language of choice.

Cypher - WHERE and AND with 2 ids

I have a node with id 1 and a node with id 2 in the database and they are linked to each other. Why when I run this query
MATCH (a)-[r]-(b) WHERE id(a)=1 AND id(b)=2 RETURN *;
Nothing is returned?
Solution
I use GrapheneDB. Usually GrapheneDB presents the system node id on the node graphic but when you have an attribute id it presents that instead. When I ran the query I was using the graphic id which wasn't actually the system id so id(a) didn't give the expected result.
Because the WHERE clause is evaluated for each candidate result, and the entire clause must evaluate to true.
Also, putting MATCH (a)-[r]-(b) will only find parts of the graph where those two nodes are related.
If you just want to find nodes 1 and 2, you can do this:
MATCH n
WHERE id(n) = 1 OR id(n) = 2
RETURN n
However, you should not be using node ids. They are deprecated, and being phased out. There are lots of other ways to find and identify nodes that don't rely on their internal identifier. If you open a new question with your actual scenario, we could help you write a better query.

Difference between merge and create unique in Neo4j

I'm trying to figure out what is the difference between MERGE and CREATE UNIQUE. I know these features:
MERGE
I'm able to create node, if doesn't exist pattern.
MERGE (n { name:"X" }) RETURN n;
This create node "n" with property name, empty node "m" and relationship RELATED.
MERGE (n { name:"X" })-[:RELATED]->(m) RETURN n, m;
CREATE UNIQUE
I'm not able to create node like this.
CREATE UNIQUE (n { name:"X" }) RETURN n;
If exists node "n", create unique makes empty node "m" and relationship RELATED.
MATCH (n { name: 'X' }) CREATE UNIQUE (n)-[:RELATED]->(m) RETURN n, m;
If this pattern exists, nothing created, only returns pattern.
From my point of view, I see MERGE and CREATE UNIQUE are quite same queries, but with CREATE UNIQUE you can't create start node in relationship. I would be grateful, if someone could explain this issue and compare these queries, thx.
CREATE UNIQUE has slightly more obscure semantics than MERGE. MERGE was developed as an alternative with more intuitive behavior than CREATE UNIQUE; if in doubt, MERGE is usually the right choice.
The easiest way to think of MERGE is as a MATCH-or-create. That is, if something in the database would MATCH the pattern you are using in MERGE, then MERGE will just return that pattern. If nothing matches, the MERGE will create all missing elements in the pattern, where a missing element means any unbound identifier.
Given
MATCH (a {uid:123})
MERGE (a)-[r:LIKES]->(b)-[:LIKES]->(c)
"a" is a bound identifier from the perspective of the MERGE. This means cypher somehow already knows which node it represents.
This statement can have two outcomes. Either the whole pattern already exists, and nothing will be created, or parts of the pattern are missing, and a whole new set of relationships and nodes matching the pattern will be created.
Examples
// Before merge:
(a)-[:LIKES]->()-[:LIKES]->()
// After merge:
(a)-[:LIKES]->()-[:LIKES]->()
// Before merge:
(a)-[:LIKES]->()-[:OWNS]->()
// After merge:
(a)-[:LIKES]->()-[:OWNS]->()
(a)-[:LIKES]->()-[:LIKES]->()
// Before merge:
(a)
// After merge:
(a)-[:LIKES]->()-[:LIKES]->()

what's the utility of array type?

I'm totally newbie with postgresql but I have a good experience with mysql. I was reading the documentation and I've discovered that postgresql has an array type. I'm quite confused since I can't understand in which context this type can be useful within a rdbms. Why would I have to choose this type instead of using a classical one to many relationship?
Thanks in advance.
I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array, each number in the array is the branch number for that node. Then, you can do things like this:
SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.
PostgreSQL will compare arrays element by element in the natural fashion so ORDER BY path will dump the tree in a sensible linear display order; then, you check the length of path to figure out a node's depth and that gives you the indentation to get the rendering right.
The above approach gets you from the database to the rendered page with one pass through the data.
PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.
Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.
One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.
I've been using them successfully to aggregate recursive tree references using triggers.
For instance, suppose you've a tree of categories, and you want to find products in any of categories (1,2,3) or any of their subcategories.
One way to do it is to use an ugly with recursive statement. Doing so will output a plan stuffed with merge/hash joins on entire tables and an occasional materialize.
with recursive categories as (
select id
from categories
where id in (1,2,3)
union all
...
)
select products.*
from products
join product2category on...
join categories on ...
group by products.id, ...
order by ... limit 10;
Another is to pre-aggregate the needed data:
categories (
id int,
parents int[] -- (array_agg(parent_id) from parents) || id
)
products (
id int,
categories int[] -- array_agg(category_id) from product2category
)
index on categories using gin (parents)
index on products using gin (categories)
select products.*
from products
where categories && array(
select id from categories where parents && array[1,2,3]
)
order by ... limit 10;
One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.
To work around it, I increased the statistics on the gin-indexed column and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveal that using && for the specified values will return an incorrect plan, I rewrite applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:
select products.*
from products
where arrayoverlap(cat_id, array(
select id from categories where arrayoverlap(parents, array[1,2,3])
))
order by ... limit 10;
(The same goes for the <# operator...)