Merge Neo4j relationships into one when returning the result, if a certain condition is satisfied


My use case is:
I have to return the whole graph in the result, but with one condition:
If there is more than one relationship between two particular nodes in the same direction, I have to merge them into a single relationship. For example, say there are two nodes 'm' and 'n' with three relationships between them, r1, r2 and r3 (all in the same direction); after running the Cypher query I should get only one relationship between 'm' and 'n'.
I also need the merged relationship to carry the properties and values that I want to retain. Specifically, I will keep all the properties of one of the merged relationships, chosen by the timestamp field that is one of the relationship's properties.
Note: all my relationships have the same properties (the number and names of the properties are the same across all relationships; only the values differ).
Any help would be appreciated. Thanks in advance.

You mean something like this?
Delete all except the first
MATCH (a)-[r]->(b)
WITH a,b,type(r) as type, collect(r) as rels
FOREACH (r in rels[1..] | DELETE r)
Ordering by timestamp first
MATCH (a)-[r]->(b)
WITH a,r,b
ORDER BY r.timestamp DESC
WITH a,b,type(r) as type, collect(r) as rels
FOREACH (r in rels[1..] | DELETE r)
If you want to do all of this virtually, only on the query results and without changing the stored graph, you'd do it in your programming language of choice.
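As a rough illustration of doing the merge on the client side, here is a minimal Python sketch. It assumes each relationship from the query result has already been converted to a plain dict with hypothetical keys `start`, `end`, `type` and `props` (these names are not from any driver API), and it keeps the relationship with the newest `timestamp` for each directed pair of nodes.

```python
def merge_parallel(relationships):
    """Keep one relationship per directed (start, end) pair: the newest one."""
    newest = {}
    for rel in relationships:
        key = (rel["start"], rel["end"])  # direction matters: (m, n) != (n, m)
        kept = newest.get(key)
        if kept is None or rel["props"]["timestamp"] > kept["props"]["timestamp"]:
            newest[key] = rel
    return list(newest.values())


# Hypothetical query result: three m->n relationships and one n->m relationship.
rows = [
    {"start": "m", "end": "n", "type": "R1", "props": {"timestamp": 10, "weight": 1}},
    {"start": "m", "end": "n", "type": "R2", "props": {"timestamp": 30, "weight": 2}},
    {"start": "m", "end": "n", "type": "R3", "props": {"timestamp": 20, "weight": 3}},
    {"start": "n", "end": "m", "type": "R1", "props": {"timestamp": 5, "weight": 4}},
]
merged = merge_parallel(rows)
```

To mirror the Cypher above, which also groups by `type(r)`, the relationship type could be added to the grouping key.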

Related

neo4j creating random empty nodes when merging

I'm trying to create a new node with label C and relationships from a-->c and b-->c, but if and only if the whole pattern a-->c,b-->c does exist.
a and b already exist (merged before the rest of the query).
The below query is a portion of the query I want to write to accomplish this.
However, it creates a random empty node devoid of properties and labels and attaches the relationship to that node instead. This shouldn't be possible and is certainly not what I want. How do I stop that from happening?
merge (a: A {id: 1})
merge (b: B {id:1})
with *
call {with a, b
match (a)-[:is_required]->(dummy:C), (a)-[:is_required]->(b)
with count(*) as cnt
where cnt = 0
merge (temp: Temporary {id: 12948125})
merge (a)-[:is_required]->(temp)
return temp
}
return *
Thanks

I think there are a couple of problems here:
There are restrictions on how you can use variables introduced with WITH in a sub-query. This article helps to explain them https://neo4j.com/developer/kb/conditional-cypher-execution/
I think you may be expecting the WHERE to introduce conditional flow like IF does in other languages. WHERE is a filter (maybe FILTER would have been a better choice of keyword than WHERE). In this case you are filtering out 'cnt's where they are 0, but then never reference cnt again, so the merge (temp: Temporary {id: 12948125}) and merge (a)-[:is_required]->(temp) always get executed. The trouble is, due to the above restrictions on using variables inside sub-queries, the (a) node you are trying to reference doesn't exist, it's not the one in the outer query. Neo4j then just creates an empty node, with no properties or labels and links it to the :Temporary node - this is completely valid and why you are getting empty nodes.
This query should result in what you intend:
merge (a: A {id: 1})
merge (b: B {id:1})
with *
// Check if a is connected to b or :C (can't use a again otherwise we'd overwrite it)
optional match(x:A {id: 1}) where exists((a)-[:is_required]->(:C)) or exists((a)-[:is_required]->(b))
with *, count(x) as cnt
// use a case to 'fool' foreach into creating the extra :Temporary node required if a is not related to b or :C
foreach ( i in case when cnt = 0 then [1] else [] end |
merge (temp: Temporary {id: 12948125})
merge (a)-[:is_required]->(temp)
)
with *
// Fetch the :Temporary node if it was created
optional match (a)-[:is_required]->(t:Temporary)
return *
There are APOC procedures you could use to perform conditional query execution (they are mentioned in the linked article). You could also play around with looking for a path from (a) and checking its length, rather than introducing a new MATCH and the variable x and then checking for the existence of related nodes.
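For readers unfamiliar with the FOREACH/CASE idiom used in the query above, the idea can be mimicked in Python: loop over a list that holds one element when the condition is true and no elements otherwise, so the body runs once or not at all. This is only an analogy; `foreach_if` and `created` are hypothetical names.

```python
def foreach_if(condition, action):
    """Run action once if condition holds, by looping over [1] or []."""
    # Mirrors: FOREACH (i IN CASE WHEN cond THEN [1] ELSE [] END | ...)
    for _ in ([1] if condition else []):
        action()


created = []

cnt = 0  # no matching pattern found
foreach_if(cnt == 0, lambda: created.append("Temporary"))  # body runs once

cnt = 3  # pattern already exists
foreach_if(cnt == 0, lambda: created.append("Temporary"))  # body is skipped
```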

If anyone is having the same problem, the answer is that the Neo4j browser is displaying nonexistent nodes. The query executes fine…

Does datajoint "join" operator require tables to have the same value at shared secondary attributes?

I am a researcher in Loren Frank's lab at UCSF. When calling populate on a Computed datajoint table that depends on two upstream tables which have entries that have the same value for shared primary attributes but different values for a shared secondary attribute ("analysis_file_name"), I get the following error:
~/anaconda3/envs/nwb_datajoint/lib/python3.8/site-packages/datajoint/condition.py in assert_join_compatibility(expr1, expr2)
63 if not isinstance(expr1, U) and not isinstance(expr2, U): # dj.U is always compatible
64 try:
---> 65 raise DataJointError(
66 "Cannot join query expressions on dependent attribute `%s`" % next(
67 r for r in set(expr1.heading.secondary_attributes).intersection(
DataJointError: Cannot join query expressions on dependent attribute `analysis_file_name`
To make clear why this situation is arising, our lab currently has a workflow where all datajoint tables that store data in nwb files have the secondary attribute "analysis_file_name", which contains the name of the analysis file storing the data. Consequently, entries across two tables can share values at primary attributes, but differ in the value at the secondary attribute "analysis_file_name". The above error seems to arise when "joining" two such tables, e.g. during the autopopulation of a third table that depends on those tables. Could folks from datajoint clarify whether to join two tables (e.g. during autopopulation of a third table that depends on those tables), it must be the case that entries which have the same value at shared primary attributes also have the same value at shared secondary attributes? Thanks for any clarification on this.

This is indeed a bug and I filed it here: issue 980
We are working on the solution PR981
The problem is that the key_source did not project out the secondary attributes before joining.
In the meantime, the workaround is to override the key_source attribute projecting out the secondary attributes in one of the tables. For example, if the two parents are A and B then you need to define the method:
@property
def key_source(self):
    return A.proj() * B
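To see why projecting out the secondary attributes helps, here is a small sketch in plain Python that does not use DataJoint at all. Rows are dicts, `proj` and `natural_join` are hypothetical helpers, and `session_id` is an invented primary key. Note that DataJoint itself raises an error instead of matching on the shared secondary attribute; the naive join below only illustrates how that attribute gets in the way.

```python
def proj(rows, primary_key):
    """Keep only the primary-key attributes of each row."""
    return [{k: row[k] for k in primary_key} for row in rows]


def natural_join(left, right):
    """Join rows that agree on every attribute name they share."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[k] == right_row[k] for k in shared):
                joined.append({**left_row, **right_row})
    return joined


# Same primary key value, different value of the shared secondary attribute.
a_rows = [{"session_id": 1, "analysis_file_name": "a_001.nwb"}]
b_rows = [{"session_id": 1, "analysis_file_name": "b_001.nwb"}]

naive = natural_join(a_rows, b_rows)                        # nothing matches
fixed = natural_join(proj(a_rows, ["session_id"]), b_rows)  # joins on session_id only
```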

Algebra Relational sql GROUP BY SORT BY ORDER BY

What are the equivalents of GROUP BY, SORT BY and ORDER BY in relational algebra?

Neither is possible in relational algebra but people have been creating some "extensions" for these operations (Note: in the original text, part of the text is written as subscript).
GROUP BY, According to the book Fundamentals of Database Systems (Elmasri, Navathe 2011 6th ed):
Another type of request that cannot be expressed in the basic relational algebra is to
specify mathematical aggregate functions on collections of values from the database.
...
We can define an AGGREGATE FUNCTION operation, using the symbol ℑ (pronounced
script F), to specify these types of requests as follows:
<grouping attributes> ℑ <function list> (R)
where <grouping attributes> is a list of attributes of the relation specified in R, and <function list> is a list of (<function> <attribute>) pairs. In each such pair,
<function> is one of the allowed functions—such as SUM, AVERAGE, MAXIMUM,
MINIMUM,COUNT—and <attribute> is an attribute of the relation specified by R. The resulting relation has the grouping attributes plus one attribute for each element in the function list.
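The quoted definition can be made concrete with a short Python sketch. It assumes a relation is a list of dicts; `aggregate`, the `employees` data and the output attribute names are invented for illustration.

```python
from collections import defaultdict


def aggregate(relation, grouping, functions):
    """<grouping> F <function list> (R): one output row per group."""
    groups = defaultdict(list)
    for row in relation:
        groups[tuple(row[a] for a in grouping)].append(row)
    result = []
    for key, rows in groups.items():
        out = dict(zip(grouping, key))  # the grouping attributes
        for name, (func, attr) in functions.items():
            out[name] = func([r[attr] for r in rows])  # one attribute per function
        result.append(out)
    return result


employees = [
    {"dno": 5, "salary": 30000},
    {"dno": 5, "salary": 40000},
    {"dno": 4, "salary": 25000},
]
summary = aggregate(
    employees,
    ["dno"],
    {"count_salary": (len, "salary"), "sum_salary": (sum, "salary")},
)
```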
ORDER BY (SORT BY), John L. Donaldson's lecture notes* (not available anymore):
Since a relation is a set (or a bag), there is no ordering defined for a relation. That is, two relations are the same if they contain the same tuples, irrespective of ordering. However, a user frequently wants the output of a query to be listed in some particular order. We can define an additional operator τ which sorts a relation if we are willing to allow an operator whose output is not a relation, but an ordered list of tuples.
For example, the expression
τLastName,FirstName(Student)
generates a list of all the Student tuples, ordered by LastName (as the primary sort key) then FirstName (as a secondary sort key). (The secondary sort key is used only if two tuples agree on the primary sort key. A sorting operation can list any number of sort keys, from most significant to least significant.)
*John L. Donaldson's (Emeritus Professor) lecture notes from the course CSCI 311 Database Systems at the Oberlin College Computer Science. Referenced 2015. Checked 2022 and not available anymore.
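A minimal sketch of the τ operator described above, assuming a relation is a list of dicts. The function name `tau` and the sample data are invented; the result is an ordered list, not a relation.

```python
def tau(relation, sort_keys):
    """Return the tuples as a list ordered by the given keys, most significant first."""
    return sorted(relation, key=lambda row: tuple(row[k] for k in sort_keys))


students = [
    {"LastName": "Smith", "FirstName": "Zoe"},
    {"LastName": "Adams", "FirstName": "Bob"},
    {"LastName": "Smith", "FirstName": "Al"},
]
ordered = tau(students, ["LastName", "FirstName"])
```

The two Smith tuples agree on the primary sort key, so FirstName decides their relative order.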

You can use the projection π on the columns you want to group the table by, without aggregating (the PROJECT operation removes any duplicate tuples),
as follows:
π c1,c2,c3 (R)
where c1, c2, c3 are columns (attributes) and R is the table (the relation).
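As a sketch of why projection behaves like grouping without aggregation, the following Python function keeps only the requested attributes and drops duplicate tuples. The relation is assumed to be a list of dicts, and `project` and the column `c4` are made up for the example.

```python
def project(relation, attributes):
    """pi attributes (R): keep the listed attributes, remove duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each combination appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


R = [
    {"c1": 1, "c2": "x", "c3": True, "c4": 10},
    {"c1": 1, "c2": "x", "c3": True, "c4": 20},
    {"c1": 2, "c2": "y", "c3": False, "c4": 30},
]
groups = project(R, ["c1", "c2", "c3"])
```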

According to this SQL to relational algebra converter tool, we have:
SELECT agents.agent_code, agents.agent_name, SUM(orders.advance_amount)
FROM agents, orders
WHERE agents.agent_code = orders.agent_code
GROUP BY agents.agent_code, agents.agent_name
ORDER BY agents.agent_code
Written as relational algebra operators, roughly:
τ agents.agent_code
γ agent_code, agent_name, SUM(advance_amount)
σ agents.agent_code = orders.agent_code (agents × orders)

How do I implement object-persistence not involving loading to memory?

I have a Graph object (this is in Perl) for which I compute its transitive closure (i.e. for solving the all-pairs shortest paths problem).
From this object, I am interested in computing:
Shortest path from any vertices u -> v.
Distance matrix for all vertices.
General reachability questions.
General graph features (density, etc).
The graph has about 2000 vertices, so computing the transitive closure (using Floyd-Warshall's algorithm) takes a couple hours. Currently I am simply caching the serialized object (using Storable, so it's pretty efficient already).
My problem is, deserializing this object still takes a fair amount of time (a minute or so), and consumes about 4GB of RAM. This is unacceptable for my application.
Therefore I've been thinking about how to design a database schema to hold this object in 'unfolded' form. In other words, precompute the all-pairs shortest paths, and store those in an appropriate manner. Then, perhaps use stored procedures to retrieve the necessary information.
My other problem is, I have no experience with database design, and have no clue about implementing the above, hence my post. I'd also like to hear about other solutions that I may be disregarding. Thanks!

To start with, sounds like you need two entities: vertex and edge and perhaps a couple tables for results. I would suggest a table that stores node-to-node information. If A is reachable from Y the relationship gets the reachable attribute. So here goes
Vertex:
any coordinates (x,y,...)
name: string
any attributes of a vertex*
Association:
association_id: ID
association_type: string
VertexInAssociation:
vertex: (constrained to Vertex)
association: (constrained to association)
AssociationAttributes:
association_id: ID (constrained to association)
attribute_name: string
attribute_value: variable -- possibly string
* You might also want to store vertex attributes in a table as well, depending on how complex they are.
The reason that I'm adding the complexity of Association is that an edge is not felt to be directional and it simplifies queries to consider both vertexes to just be members of a set of vertexes "connected-by-edge-x"
Thus an edge is simply an association of edge type, which would have an attribute of distance. A path is an association of path type, and it might have an attribute of hops.
There might be other more optimized schemas, but this one is conceptually pure--even if it doesn't make the first-class concept of "edge" a first class entity.
To create a minimal edge you would need to do this:
begin transaction
select associd = max(association_id) + 1 from Association
insert into Association ( association_id, association_type )
values( associd, 'edge' )
insert
into VertexInAssociation( association_id, vertex_id )
select associd, ? -- $vertex->[0]->{id}
UNION select associd, ? -- $vertex->[1]->{id}
insert into AssociationAttributes ( association_id, attribute_name, attribute_value )
select associd, 'length', 1
UNION select associd, 'distance', ? -- $edge->{distance}
commit
You might also want to make association types classes of sorts, so that the "edge" association automatically gets counted as a "reachable" association. Otherwise, you might want to insert UNION select associd, 'reachable', 'true' in there as well.
And then you could query a union of reachable associations of both vertexes and dump them as reachable associations to the other node if they did not exist and dump existing length attribute value + 1 into the length attribute.
You'd probably want an ORM for all that, though, and just manipulate it inside Perl.
my $v1 = Vertex->new( 'V', x => 23, y => 89, red => 'hike!' );
my $e = Edge->new( $v1, $v2 ); # perhaps Edge knows how to calculate distance.
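The schema and the edge insert above can be tried out with an in-memory SQLite database. The sketch below uses Python's standard `sqlite3` module rather than Perl; the snake_case table names, the `vertex_id` column and the helper functions are assumptions made for the example, and it relies on SQLite's auto-assigned row id instead of `max(association_id) + 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex (vertex_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE association (
    association_id INTEGER PRIMARY KEY,
    association_type TEXT NOT NULL
);
CREATE TABLE vertex_in_association (
    association_id INTEGER NOT NULL REFERENCES association(association_id),
    vertex_id INTEGER NOT NULL REFERENCES vertex(vertex_id)
);
CREATE TABLE association_attributes (
    association_id INTEGER NOT NULL REFERENCES association(association_id),
    attribute_name TEXT NOT NULL,
    attribute_value TEXT
);
""")


def add_association(conn, kind, vertex_ids, attributes):
    """Insert an association, its member vertices and its attributes atomically."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO association (association_type) VALUES (?)", (kind,)
        )
        assoc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO vertex_in_association VALUES (?, ?)",
            [(assoc_id, v) for v in vertex_ids],
        )
        conn.executemany(
            "INSERT INTO association_attributes VALUES (?, ?, ?)",
            [(assoc_id, name, str(value)) for name, value in attributes.items()],
        )
    return assoc_id


def distance_between(conn, a, b):
    """Look up the 'distance' attribute of an association containing both vertices."""
    row = conn.execute(
        """
        SELECT attr.attribute_value
        FROM association_attributes AS attr
        JOIN vertex_in_association AS va ON va.association_id = attr.association_id
        JOIN vertex_in_association AS vb ON vb.association_id = attr.association_id
        WHERE va.vertex_id = ? AND vb.vertex_id = ?
          AND attr.attribute_name = 'distance'
        """,
        (a, b),
    ).fetchone()
    return None if row is None else float(row[0])


conn.executemany("INSERT INTO vertex VALUES (?, ?)", [(1, "u"), (2, "v")])
edge_id = add_association(conn, "edge", [1, 2], {"length": 1, "distance": 4.2})
```

Because both vertices are simply members of the association, `distance_between(conn, 1, 2)` and `distance_between(conn, 2, 1)` return the same value, and only the requested row is read rather than the whole graph.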

Relations With No Attributes

Aheo asks if it is ok to have a table with just one column. How about one with no columns, or, given that this seems difficult to do in most modern "relational" DBMSes, a relation with no attributes?

There are exactly two relations with no attributes, one with an empty tuple, and one without. In The Third Manifesto, Date and Darwen (somewhat) humorously name them TABLE_DEE and TABLE_DUM (respectively).
They are useful to the extent that they are the identity of a variety of relational operators, playing roles equivalent to 1 and 0 in ordinary algebra.

A table with a single column is a set -- as long as you don't care about ordering the values, or associating any other info with them, it seems fine. You can check for membership in it, and basically that's all you can do. (If you don't have a UNIQUE constraint on the single column I guess you could also count number of occurrences... a multiset).
But what in blazes would a table with no columns (or a relation with no attributes) mean -- or, how would it be any good?!

DEE and cartesian product form a monoid. In practice, if you have Date's relational summarize operator, you'd use DEE as your grouping relation to obtain grand-totals. There are many other examples where DEE is practically useful, e.g. in a functional setting with a binary join operator you'd get n-ary join = foldr join dee
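The fold mentioned above can be sketched in Python. As a simplifying assumption, a relation is modelled here as a frozenset of tuples, each tuple being a frozenset of (attribute, value) pairs, so headings of empty relations are not tracked. `relation`, `join` and `join_all` are hypothetical helpers, and `functools.reduce` is a left fold, which gives the same result as `foldr` because natural join is associative and commutative.

```python
from functools import reduce

# TABLE_DEE: no attributes, exactly one (empty) tuple.
DEE = frozenset({frozenset()})


def relation(*rows):
    """Build a relation from dicts."""
    return frozenset(frozenset(row.items()) for row in rows)


def join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return frozenset(out)


def join_all(relations):
    """n-ary join as a fold of the binary join, starting from DEE."""
    return reduce(join, relations, DEE)


R = relation({"a": 1, "b": 2}, {"a": 2, "b": 3})
S = relation({"b": 2, "c": 9})
joined = join_all([R, S])
```

Starting the fold from DEE means that joining an empty list of relations yields DEE itself, and joining a single relation returns that relation unchanged.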

"There are exactly two relations with no attributes, one with an empty tuple, and one without. In The Third Manifesto, Date and Darwen (somewhat) humorously name them TABLE_DEE and TABLE_DUM (respectively).
They are useful to the extent that they are the identity of a variety of relational operators, playing roles equivalent to 1 and 0 in ordinary algebra."
And of course they also play the role of "TRUE" and "FALSE" in boolean algebra. Meaning that they are useful when propositions such as "The shop is open" and "The alarm is set" are to be represented in a database.
A consequence of this is that they can also be usefully employed in any expression of the relational algebra for their properties of "acting as an IF/ELSE" : joining to TABLE_DUM means retaining no tuples at all from the other argument, joining to TABLE_DEE means retaining them all. So joining R to a relvar S which can be equal to either TABLE_DEE or TABLE_DUM, is the RA equivalent of "if S then R else FI", with FI standing for the empty relation.
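The "if S then R else FI" behaviour described above can be checked with a small Python sketch. It assumes relations are frozensets of tuples, with each tuple a frozenset of (attribute, value) pairs; `join`, `relation`, `products`, `shop_is_open` and `alarm_is_set` are illustrative names, not part of any library.

```python
DEE = frozenset({frozenset()})  # no attributes, one empty tuple: TRUE
DUM = frozenset()               # no attributes, no tuples: FALSE


def relation(*rows):
    """Build a relation from dicts."""
    return frozenset(frozenset(row.items()) for row in rows)


def join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return frozenset(out)


products = relation({"sku": "A1"}, {"sku": "B2"})

shop_is_open = DEE   # the proposition "The shop is open" holds
alarm_is_set = DUM   # the proposition "The alarm is set" does not hold

kept_all = join(products, shop_is_open)   # every tuple of products survives
kept_none = join(products, alarm_is_set)  # no tuple survives
```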

Hm. So the lack of "real-world examples" got to me, and I tried my best. Perhaps surprisingly, I got half way there!
cjs=> CREATE TABLE D ();
CREATE TABLE
cjs=> SELECT COUNT (*) FROM D;
count
-------
0
(1 row)
cjs=> INSERT INTO D () VALUES ();
ERROR: syntax error at or near ")"
LINE 1: INSERT INTO D () VALUES ();

A table with a single column would make sense as a simple lookup. Let's say you have a list of strings you want to filter user-entered text against. That table would store the words you want to filter out.

It is difficult to see the utility of TABLE_DEE and TABLE_DUM from an SQL database perspective. After all, it is not guaranteed that your favorite DB vendor allows you to create one or the other.
It is also difficult to see the utility of TABLE_DEE and TABLE_DUM in relational algebra. One has to look beyond that. To get a flavor of how these constants can come alive, consider relational algebra put into proper mathematical shape, that is, as close as possible to Boolean algebra. D&D Algebra A is a step in this direction. Then one can express the classic relational algebra operations via more fundamental ones, and those two constants become really handy.