Cypher - WHERE and AND with 2 ids - nosql

I have a node with id 1 and a node with id 2 in the database and they are linked to each other. Why when I run this query
MATCH (a)-[r]-(b) WHERE id(a)=1 AND id(b)=2 RETURN *;
Nothing is returned?
Solution
I use GrapheneDB. Usually GrapheneDB presents the system node id on the node graphic but when you have an attribute id it presents that instead. When I ran the query I was using the graphic id which wasn't actually the system id so id(a) didn't give the expected result.

Because the WHERE clause is evaluated for each candidate result, and the entire clause must evaluate to true.
Also, putting MATCH (a)-[r]-(b) will only find parts of the graph where those two nodes are related.
If you just want to find nodes 1 and 2, you can do this:
MATCH n
WHERE id(n) = 1 OR id(n) = 2
RETURN n
However, you should not be using node ids. They are deprecated, and being phased out. There are lots of other ways to find and identify nodes that don't rely on their internal identifier. If you open a new question with your actual scenario, we could help you write a better query.

Related

neo4j creating random empty nodes when merging

I'm trying to create a new node with label C and relationships from a-->c and b-->c, but if and only if the whole pattern a-->c,b-->c does exist.
a and b already exist (merged before the rest of the query).
The below query is a portion of the query I want to write to accomplish this.
However, it creates a random empty node devoid of properties and labels and attaches the relationship to that node instead. This shouldn't be possible and is certainty not what I want. How do I stop that from happening?
merge (a: A {id: 1})
merge (b: B {id:1})
with *
call {with a, b
match (a)-[:is_required]->(dummy:C), (a)-[:is_required]->(b)
with count(*) as cnt
where cnt = 0
merge (temp: Temporary {id: 12948125})
merge (a)-[:is_required]->(temp)
return temp
}
return *
Thanks
I think there are a couple of problems here:
There are restrictions on how you can use variables introduced with WITH in a sub-query. This article helps to explain them https://neo4j.com/developer/kb/conditional-cypher-execution/
I think you may be expecting the WHERE to introduce conditional flow like IF does in other languages. WHERE is a filter (maybe FILTER would have been a better choice of keyword than WHERE). In this case you are filtering out 'cnt's where they are 0, but then never reference cnt again, so the merge (temp: Temporary {id: 12948125}) and merge (a)-[:is_required]->(temp) always get executed. The trouble is, due to the above restrictions on using variables inside sub-queries, the (a) node you are trying to reference doesn't exist, it's not the one in the outer query. Neo4j then just creates an empty node, with no properties or labels and links it to the :Temporary node - this is completely valid and why you are getting empty nodes.
This query should result in what you intend:
merge (a: A {id: 1})
merge (b: B {id:1})
with *
// Check if a is connected to b or :C (can't use a again otherwise we'd overwrite it)
optional match(x:A {id: 1}) where exists((a)-[:is_required]->(:C)) or exists((a)-[:is_required]->(b))
with *, count(x) as cnt
// use a case to 'fool' foreach into creating the extra :Temporary node required if a is not related to b or :C
foreach ( i in case when cnt = 0 then [1] else [] end |
merge (temp: Temporary {id: 12948125})
merge (a)-[:is_required]->(temp)
)
with *
// Fetch the :Temporary node if it was created
optional match (a)-[:is_required]->(t:Temporary)
return *
There are apoc procedures you could use to perform conditional query execution (they are mentioned in the linked article). You could also play around with looking for a path from (a) and check its length, rather than introduce a new MATCH and the variable x then checking for the existance of related nodes.
If anyone is having the same problem, the answer is that the Neo4j browser is display nonexistent nodes. The query executes fineā€¦

Gremlin query to find the count of a label for all the nodes

Sample query
The following query returns me the count of a label say
"Asset " for a particular id (0) has >>>
g.V().hasId(0).repeat(out()).emit().hasLabel('Asset').count()
But I need to find the count for all the nodes that are present in the graph with a condition as above.
I am able to do it individually but my requirement is to get the count for all the nodes that has that label say 'Asset'.
So I am expecting some thing like
{ v[0]:2
{v[1]:1}
{v[2]:1}
}
where v[1] and v[2] has a node under them with a label say "Asset" respectively, making the overall count v[0] =2 .
There's a few ways you could do it. It's maybe a little weird, but you could use group()
g.V().
group().
by().
by(repeat(out()).emit().hasLabel('Asset').count())
or you could do it with select() and then you don't build a big Map in memory:
g.V().as('v').
map(repeat(out()).emit().hasLabel('Asset').count()).as('count').
select('v','count')
if you want to maintain hierarchy you could use tree():
g.V(0).
repeat(out()).emit().
tree().
by(project('v','count').
by().
by(repeat(out()).emit().hasLabel('Asset')).select(values))
Basically you get a tree from vertex 0 and then apply a project() over that to build that structure per vertex in the tree. I had a different way to do it using union but I found a possible bug and had to come up with a different method (actually Gremlin Guru, Daniel Kuppitz, came up with the above approach). I think the use of project is more natural and readable so definitely the better way. Of course as Mr. Kuppitz pointed out, with project you create an unnecessary Map (which you just get rid of with select(values)). The use of union would be better in that sense.

Neo4j Import Tool ID Spaces in Cypher Query

I'm using the Neo4j Import Tool to import some nodes, and am trying to understand how ID spaces and labels work together and affect the behavior of cypher queries that match on nodes with specific IDs.
So for example, suppose I load nodes into two ID spaces ID_SPACE_X and ID_SPACE_Y:
x_nodes.csv:
id:ID(ID_SPACE_X),field1:string,field2:long,:LABEL
1,"foo",42,A
y_nodes.csv:
id:ID(ID_SPACE_Y),field1:string,:LABEL
1,"bar",A
Then I perform the following Cypher query:
MATCH (n:A {id:1}) RETURN n;
Which node is returned? Can you express the ID space in a cypher query in order to return the right node? Or must labels assigned to nodes in one ID space be exclusive to that ID space?
Thanks for any help.
ID spaces are only meaningful to the Import Tool. They enable the tool to correctly detect uniqueness errors.
They have no connection to node labels.
So, your example is buggy. You are telling the Import Tool that it is OK to create 2 A nodes with the same id property value. This will cause 2 such nodes to be created.

Slow query with order and limit clause but only if there are no records

I am running the following query:
SELECT * FROM foo WHERE name = 'Bob' ORDER BY address DESC LIMIT 25 OFFSET 1
Because I have records in the table with name = 'Bob' the query time is fast on a table of 10M records (<.5 seconds)
However, if I search for name = 'Susan' the query takes over 45 seconds. I have no records in the table where name = 'Susan'.
I have an index on each of name and address. I've vacuumed the table, analyzed it and have even tried to re-write the query:
SELECT * FROM (SELECT * FROM foo WHERE name = 'Bob' ORDER BY address DESC) f LIMIT 25 OFFSET 1
and can't find any solution. I'm not really sure how to proceed. Please note this is different than this post as my slowness only happens when there are no records.
EDIT:
If I take out the ORDER BY address then it runs quickly. Obviously, I need that there. I've tried re-writing it (with no success):
SELECT * FROM (SELECT * FROM foo WHERE name = 'Bob') f ORDER BY address DESC LIMIT 25 OFFSET 1
Examine the execution plan to see which index is being used. In this case, the separate indexes for name and address are not enough. You should create a combined index of name, then address for this query.
Think of an index as a system maintained copy of certain columns, in a different order from the original. In this case, you want to first find matches by name, then tie-break on address, then take until you have enough or run out of name matches.
By making name first in the multi-column index, the index will be sorted by name first. Then address will serve as our tie-breaker.
Under the original indexes, if the address index is the one chosen then the query's speed will vary based on how quickly it can find matches.
The plan (in english) would be: Proceed through all of the rows which happen to already be sorted by address, discard any that do not match the name, keep going until we have enough.
So if you do not get 25 matches, you read the whole table!
With my proposed multi-column index, the plan (in English) would be: Proceed through all of the name matching rows which happen to already be sorted by address. Start with the first one and take them until you have enough. If you run out, stop.
Since the situation is that a query without the Order By is much faster than the one with the Order By clause; I'd make 2 queries:
-One without the order by, limit 1, to know if you have at least one record.
In the case you have at least one, it's safe to run the query with Order by.
-If there's no record, no need to run the second query.
Yes, it's not a solution, but it will let you deliver your project. Just ensure you create a ticket to handle the technical debt after delivery ;) otherwise your lead developer will set you on fire.
Then, to solve the real technical problem, it will be useful to know which indices you have created. Without these it will be very hard to give you a proper solution!

Retrieve first value with Xquery using a wildcard

In an XmlData column in SQL Server 2008 that has no schema assigned to it, how can I pull the first item at a particular node level? For example, I have:
SELECT
XmlData.value('//*/*[1]','NVARCHAR(6)')
FROM table
where XmlData.Exist('//*/*[1]') = 1
I assume this does not work because if there are multiple nodes with different names at the 2nd level, the first of each of those could be returned (and the value() requires that a singleton be selected.
Since I don't know what the names of any nodes will be, is there a way to always select whatever the first node is at the 2nd level?
I found the answer by chaining Xquery .query() and .value()
XMLDATA.query('//*/*[1]').value('.[1]','NVARCHAR(6)')
This returns the value of the first node and works perfectly for my needs.