Neo4j - merge all nodes that have same label(s) or property value

In Neo4j, how can I merge all nodes by a single property i.e. node1.property1=node2.property2? Or, how can I merge all nodes that have the same label name?
In the manual, it talks about this for individual nodes:
MATCH (person:Person)
MERGE (city:City { name: person.bornIn })
RETURN person.name, person.bornIn, city
Three nodes labeled City are created, each of which contains a name
property with the value of New York, Ohio, and New Jersey,
respectively. Note that even though the MATCH clause results in three
bound nodes having the value New York for the bornIn property, only a
single New York node (i.e. a City node with a name of New York) is
created. As the New York node is not matched for the first bound node,
it is created. However, the newly-created New York node is matched and
bound for the second and third bound nodes.
Result
5 rows
Nodes created: 3
Properties set: 3
Labels added: 3
My problem is that my node labels are variable, so I would have to repeat the above query for every node that has a different label (the label in the above example is Person). In my case, if nodes have the same property value, they will also have the same label.
UPDATE:
So, now I'm not sure my original question is what I need after all (I will clean up the question once things come into focus better).
My problem is that I have two node-edge-node instances
(node1) -[relation1]-> (node2)
and
(node2) -[relation2]->(node3)
where node1, node2, etc. are the labels for each node-edge-node. Note, node2 may have some different property values across different instances and there may be many nodes with label node2, each having exactly one relationship as shown above. Some properties will always be the same (unique identifiers related to the label name) though.
With that said, I'd like to run the query:
MATCH (n1: node1) -[r1: relation1]->
(n2: node2) -[r2: relation2]-> (n3: node3)
RETURN n1, r1, n2, r2, n3
but since there are many nodes with label node2 but none of them are connected (or merged?), the above query returns nothing. So, how can I merge all nodes with the same label so the query works as I would like?

You can tweak the query from the manual so that it works for any node that has a label from a specified collection (e.g., ['Person', 'Foo', 'Bar']). (The query below assumes that all such nodes have name and bornIn properties.)
MATCH (person)
WHERE ANY(x IN LABELS(person) WHERE x IN ['Person', 'Foo', 'Bar'])
MERGE (city:City { name: person.bornIn })
RETURN person.name, person.bornIn, city;
The above query can be tweaked to pass the list of labels in a parameter, which is more efficient if there are multiple sets of labels.
Another way of doing the same thing, if the set of labels does not change:
MATCH (person)
WHERE person:Person OR person:Foo OR person:Bar
MERGE (city:City { name: person.bornIn })
RETURN person.name, person.bornIn, city
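As a rough sketch of the parameterized variant mentioned above: the label list is passed as a query parameter, so the query text stays the same for every set of labels. The parameter name `labels`, the `$labels` placeholder style, and the helper `build_city_merge` are assumptions made for illustration. The helper only builds the query text and the parameter map; how they are sent to the server depends on the driver in use.

```python
# Hypothetical helper: builds the query text and parameter map only.
# It does not talk to a database.
CITY_MERGE_QUERY = """
MATCH (person)
WHERE ANY(x IN labels(person) WHERE x IN $labels)
MERGE (city:City { name: person.bornIn })
RETURN person.name, person.bornIn, city
"""


def build_city_merge(labels):
    """Return (query, params) for the given collection of label names."""
    # De-duplicate while keeping the original order.
    unique = list(dict.fromkeys(labels))
    return CITY_MERGE_QUERY, {"labels": unique}


query, params = build_city_merge(["Person", "Foo", "Bar", "Foo"])
print(params)
```

Because the labels travel as data rather than being spliced into the query string, the same query text can be reused for each label set.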

Related

Combine Grafana metrics with mismatched labels

I have two metrics (relating to memory usage in my Kubernetes pods) defined as follows:
kube_pod_container_resource_limits_memory_bytes{app="kube-state-metrics",container="foo",instance="10.244.0.7:8080",job="kubernetes-endpoints",kubernetes_name="kube-state-metrics",kubernetes_namespace="monitoring",namespace="test",node="aks-nodepool1-25518080-0",pod="foo-cb9bc5fb5-2bghz"}
container_memory_working_set_bytes{agentpool="nodepool1",beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="Standard_A2",beta_kubernetes_io_os="linux",container_name="foo",failure_domain_beta_kubernetes_io_region="westeurope",failure_domain_beta_kubernetes_io_zone="1",id="/kubepods/burstable/pod5b0099a9-eeff-11e8-884b-ca2011a99774/eeb183b21e2b3226a32de41dd85d7a2e9fc8715cf31ea7109bfbb2cae7c00c44",image="#sha256:6d6003ba86a0b7f74f512b08768093b4c098e825bd7850db66d11f66bc384870",instance="aks-nodepool1-25518080-0",job="kubernetes-cadvisor",kubernetes_azure_com_cluster="MC_test.planned.bthbygg.se_bthbygg-test_westeurope",kubernetes_io_hostname="aks-nodepool1-25518080-0",kubernetes_io_role="agent",name="k8s_foo_foo-cb9bc5fb5-2bghz_test_5b0099a9-eeff-11e8-884b-ca2011a99774_0",namespace="test",pod_name="foo-cb9bc5fb5-2bghz",storageprofile="managed",storagetier="Standard_LRS"}
I want to combine these two into a percentage, by doing something like
container_memory_working_set_bytes{namespace="test"}
/ kube_pod_container_resource_limits_memory_bytes{namespace="test"}
but that gives me no data back, presumably because there are no matching labels to join the data sets on. As you can see, I do have matching label values, but the label names don't match.
Is there some way I can formulate my query to join these on e.g. pod == pod_name, without having to change the metrics at the other end (where they are exported)?

You can use the PromQL label_replace function to create a new, matching label from the original labels.
For instance, the expression below adds a container_name="foo" label to the first metric, which can then be used for the join:
label_replace(
kube_pod_container_resource_limits_memory_bytes,
"container_name", "$1", "container", "(.*)")
You can use the above pattern to create any new labels needed for the matching.
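To make the effect of the renaming concrete, here is a small Python model of what label_replace does to a label set, and of a division that matches series on a chosen set of labels. It is a simplified sketch, not Prometheus itself: only `$1`-style references are handled, the sample values 128 and 512 are made up, and the helpers `key` and `divide` are hypothetical. In PromQL the matching labels would be restricted with the `on(pod_name, container_name)` modifier.

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Simplified model of PromQL label_replace for one series' labels."""
    # PromQL regexes are fully anchored, hence fullmatch.
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)  # no match: labels are returned unchanged
    # Translate "$1" style references into Python's "\1" style.
    template = re.sub(r"\$(\d+)", r"\\\1", replacement)
    out = dict(labels)
    out[dst] = match.expand(template)
    return out


def key(labels, on):
    """Tuple of the label values used for matching."""
    return tuple(labels.get(name, "") for name in on)


def divide(left, right, on):
    """Divide left by right for series whose `on` labels agree."""
    index = {key(labels, on): value for labels, value in right}
    return {
        key(labels, on): value / index[key(labels, on)]
        for labels, value in left
        if key(labels, on) in index
    }


# Label values taken from the question; numeric values are invented.
limits = [({"namespace": "test", "pod": "foo-cb9bc5fb5-2bghz",
            "container": "foo"}, 512.0)]
usage = [({"namespace": "test", "pod_name": "foo-cb9bc5fb5-2bghz",
           "container_name": "foo"}, 128.0)]

renamed = []
for labels, value in limits:
    labels = label_replace(labels, "pod_name", "$1", "pod", "(.*)")
    labels = label_replace(labels, "container_name", "$1", "container", "(.*)")
    renamed.append((labels, value))

ratios = divide(usage, renamed, on=("pod_name", "container_name"))
print(ratios)
```

Running `divide` on the original `limits` instead of `renamed` yields an empty result, which mirrors the "no data" symptom described in the question.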

Cypher MERGE with two uniqueness constraints

Suppose I specify two uniqueness constraints on a label Person in Cypher:
CREATE CONSTRAINT ON (p:Person)
ASSERT p.name IS UNIQUE;
CREATE CONSTRAINT ON (p:Person)
ASSERT p.id_number IS UNIQUE;
If I run the following MERGE command
MERGE (p:Person {name: "Alice", id_number: 153})
the behavior is:
if there is a node with the name Alice and id_number 153, it is returned
if there is a node with the name Alice xor id_number 153, there is an error because we cannot create a new node and maintain both uniqueness constraints
if there is a node with neither the name Alice nor id_number 153, a new node is created with these properties.
I want to change the xor behavior so that we do
if there is a node with the name Alice or id_number 153, it is returned
if there is a node with neither the name Alice nor id_number 153, a new node is created with these properties.
Any idea how to achieve this in Cypher?

What should happen if you have one node with the name Alice and another different node with the id_number 153? That's kind of the central philosophical problem with this schema. Setting that aside, your closest bet is going to be manually adapting MERGE logic like so:
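OPTIONAL MATCH (p:Person)
WHERE p.name = 'Alice' OR p.id_number = 153
WITH COLLECT(p) AS ps
// Create the node only when nothing matched; FOREACH keeps the row alive either way
FOREACH (ignored IN CASE SIZE(ps) WHEN 0 THEN [1] ELSE [] END |
  MERGE (:Person {name: 'Alice', id_number: 153}))
WITH ps
OPTIONAL MATCH (q:Person {name: 'Alice', id_number: 153})
// Prefer a pre-existing match, otherwise the node that was just created
RETURN COALESCE(HEAD(ps), q) AS p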
You may not actually need uniqueness constraints in your case; they are often used casually in place of regular indexes, but they are really only necessary if you have to worry about asynchronous writes (which, even then, can be managed other ways). Otherwise you just need to be disciplined in query writing so that you use MERGE instead of CREATE and don't MERGE patterns with unbound nodes that should be unique.
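The behaviour asked for in the question (return a node that matches on either key, otherwise create one) can be modelled outside the database as a plain get-or-create. The Python sketch below is only an illustration of that logic: the list of dicts stands in for the Person nodes, and the function name `get_or_create_person` is hypothetical.

```python
def get_or_create_person(people, name, id_number):
    """Return (person, created) matching on name OR id_number."""
    matches = [
        p for p in people
        if p["name"] == name or p["id_number"] == id_number
    ]
    if matches:
        # Several different matches is the ambiguous case raised above;
        # this sketch simply takes the first one.
        return matches[0], False
    person = {"name": name, "id_number": id_number}
    people.append(person)
    return person, True


people = [{"name": "Alice", "id_number": 999}]
existing, created = get_or_create_person(people, "Alice", 153)
print(existing, created)
```

Note that the existing node is returned unchanged, so its id_number stays at its old value; whether it should be updated instead is a design decision the question leaves open.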

Merge neo4j relationships into one while returning the result if certain condition satisfies

My use case is:
I have to return the whole graph in the result, with one condition:
If there is more than one relationship between two particular nodes in the same direction, I have to merge them into a single relationship. For example, say there are two nodes 'm' and 'n' with three relationships r1, r2, r3 between them (in the same direction); when I run the Cypher query, the result should contain only one relationship between 'm' and 'n'.
The merged relationship should carry the properties and values that I want to retain. Specifically, I will keep all the properties of one of the merged relationships, chosen by the timestamp field, which is one of the relationship's properties.
Note: all my relationships have the same properties (the number and names of the properties are the same across all relationships; the values may differ).
Any help would be appreciated. Thanks in advance.

You mean something like this?
Delete all except the first
MATCH (a)-[r]->(b)
WITH a,b,type(r) as type, collect(r) as rels
FOREACH (r in rels[1..] | DELETE r)
Ordering by timestamp first
MATCH (a)-[r]->(b)
WITH a,r,b
ORDER BY r.timestamp DESC
WITH a,b,type(r) as type, collect(r) as rels
FOREACH (r in rels[1..] | DELETE r)
If you want to do all those operations virtually just on query results you'd do them in your programming language of choice.
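For the client-side route, a minimal Python sketch could look like the following. It assumes the query results have already been turned into dicts with `start`, `end`, `type` and `timestamp` keys (these names, and the KNOWS type in the sample data, are made up for illustration) and keeps, per start node, end node and type, the relationship with the latest timestamp.

```python
def collapse_relationships(rels):
    """Keep one relationship per (start, end, type): the newest one."""
    latest = {}
    for rel in rels:
        key = (rel["start"], rel["end"], rel["type"])
        if key not in latest or rel["timestamp"] > latest[key]["timestamp"]:
            latest[key] = rel
    return list(latest.values())


# Sample data: three parallel m->n relationships and one n->m.
rels = [
    {"start": "m", "end": "n", "type": "KNOWS", "timestamp": 1},
    {"start": "m", "end": "n", "type": "KNOWS", "timestamp": 3},
    {"start": "m", "end": "n", "type": "KNOWS", "timestamp": 2},
    {"start": "n", "end": "m", "type": "KNOWS", "timestamp": 5},
]

for rel in collapse_relationships(rels):
    print(rel)
```

Grouping by type mirrors the Cypher above; dropping `rel["type"]` from the key would merge parallel relationships regardless of their type. Nothing is deleted from the database this way, since only the returned result is reduced.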

Cypher - WHERE and AND with 2 ids

I have a node with id 1 and a node with id 2 in the database and they are linked to each other. Why when I run this query
MATCH (a)-[r]-(b) WHERE id(a)=1 AND id(b)=2 RETURN *;
Nothing is returned?
Solution
I use GrapheneDB. Usually GrapheneDB presents the system node id on the node graphic but when you have an attribute id it presents that instead. When I ran the query I was using the graphic id which wasn't actually the system id so id(a) didn't give the expected result.

Because the WHERE clause is evaluated for each candidate result, and the entire clause must evaluate to true.
Also, putting MATCH (a)-[r]-(b) will only find parts of the graph where those two nodes are related.
If you just want to find nodes 1 and 2, you can do this:
MATCH (n)
WHERE id(n) = 1 OR id(n) = 2
RETURN n
However, you should not be using node ids. They are deprecated, and being phased out. There are lots of other ways to find and identify nodes that don't rely on their internal identifier. If you open a new question with your actual scenario, we could help you write a better query.

Retrieve first value with Xquery using a wildcard

In an XmlData column in SQL Server 2008 that has no schema assigned to it, how can I pull the first item at a particular node level? For example, I have:
SELECT
XmlData.value('//*/*[1]','NVARCHAR(6)')
FROM table
where XmlData.Exist('//*/*[1]') = 1
I assume this does not work because, if there are multiple nodes with different names at the 2nd level, the first of each of those could be returned (and value() requires that a singleton be selected).
Since I don't know what the names of any nodes will be, is there a way to always select whatever the first node is at the 2nd level?

I found the answer by chaining the XQuery methods .query() and .value():
XMLDATA.query('//*/*[1]').value('.[1]','NVARCHAR(6)')
This returns the value of the first node and works perfectly for my needs.