I want to query a collection in ArangoDB using AQL, and at each node in the query, expand the node using a traversal.
I have attempted to do this by calling the traversal as a subquery using a LET statement within the collection query.
The result set for the traversal is empty, even though the query completes.
FOR ne IN energy
FILTER ne.identifier == "12345"
LET ne_edges = (
FOR v, e IN 1..1 ANY ne relation
RETURN e
)
RETURN MERGE(ne, {"edges": ne_edges})
[
{
"value": 123.99,
"edges": []
}
]
I have verified there are edges, and the traversal returns correctly when it is not executed as a subquery.
It seems as if the initial query is completing before a result is returned from the subquery, giving the result above.
What am I missing? Or is there a better way?
I can think of two ways to do this. The first is easier to understand, but the second is more compact. For the examples below, I have a vertex collection test2 and an edge collection testEdge that links parent and child items within test2.
Using Collect:
let seed = (FOR testItem IN test2
FILTER testItem._id in ['test2/Q1', 'test2/Q3']
RETURN testItem._id)
let traversal = (FOR seedItem in seed
FOR v, e IN 1..1 ANY seedItem
testEdge
RETURN {seed: seedItem, e_to: e._to})
for t in traversal
COLLECT seeds = t.seed INTO groups = t.e_to
return {myseed: seeds, mygroups: groups}
Above, we first get the items we want to traverse from (seed). Then we perform the traversal and return an object that holds the seed _id and the _to value of each related edge.
Finally, we use COLLECT ... INTO to group the results by seed.
Using array expansion:
FOR testItem IN test2
FILTER testItem._id in ['test2/Q1', 'test2/Q3']
LET testEdges = (
FOR v, e IN 1..1 ANY testItem testEdge
RETURN e
)
RETURN {myseed: testItem._id, mygroups: testEdges[*]._to}
This time we combine the seed search and the traversal by using the LET statement. Then we use array expansion ([*]) to pull the _to value out of every edge.
In either case, I end up with something that looks like this:
[
{
"myseed": "test2/Q1",
"mygroups": [
"test2/Q1-P5-2",
"test2/Q1-P6-3",
"test2/Q1-P4-1"
]
},
{
"myseed": "test2/Q3",
"mygroups": [
"test2/Q3",
"test2/Q3"
]
}
]
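For readers less familiar with COLLECT ... INTO, here is a minimal sketch in plain JavaScript of the grouping that both variants perform. The rows array is hypothetical sample data shaped like the {seed, e_to} objects from the first query, and groupBySeed is an illustrative helper, not part of ArangoDB.

```javascript
// Hypothetical traversal rows, shaped like the {seed, e_to} objects returned above.
const rows = [
  { seed: "test2/Q1", e_to: "test2/Q1-P5-2" },
  { seed: "test2/Q1", e_to: "test2/Q1-P6-3" },
  { seed: "test2/Q3", e_to: "test2/Q3" },
  { seed: "test2/Q1", e_to: "test2/Q1-P4-1" },
  { seed: "test2/Q3", e_to: "test2/Q3" },
];

// Group rows by seed, keeping the e_to values in arrival order.
function groupBySeed(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.seed)) groups.set(row.seed, []);
    groups.get(row.seed).push(row.e_to);
  }
  return Array.from(groups, ([myseed, mygroups]) => ({ myseed, mygroups }));
}

const grouped = groupBySeed(rows);
console.log(JSON.stringify(grouped, null, 2));
```

The array expansion variant skips this step because each seed already carries its own edge array; testEdges[*]._to is then just a projection over that array.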
I'm using Neo4j version 3.0.7
I'm reading a list of edges from a dataset and I need to pass those edges batch-wise using the REST API.
I used the following query format to create multiple nodes (if they don't already exist) and their relationships in Neo4j through a single Cypher query via the REST API. I obtain the two vertices of an edge, and the node properties are set according to the vertex IDs of those vertices.
{
"query":
"MATCH (n { name: 0 }), (m { name:1 })
CREATE (n)-[:X]->(m)
WITH count(*) as dummy
MATCH (n { name: 0 }), (m { name: 6309 })
CREATE (n)-[:X]->(m)"
}
This approach works correctly for a batch of 10 edges but when I try to send a batch of 1000 edges (nodes and their relationships) through a single Cypher query, I get a StackOverflowError exception. Is there a better approach to achieve this task?
Thank you for your help.
The error obtained from the response:
{
"exception" : "StackOverflowError",
"fullname" : "java.lang.StackOverflowError",
"stackTrace" : [ "scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:151) ..."
}
You can use UNWIND to get a single query:
UNWIND [[0,1], [0,6309]] AS pair
MATCH (n {name: pair[0]}), (m {name: pair[1]})
CREATE (n)-[:X]->(m)
Insert your node pairs after UNWIND as a list of two-element lists. As the query uses the name property for finding the nodes, it is worth adding an index on it. For example, if you have Person nodes, index them with:
CREATE INDEX ON :Person(name)
(See also the Cypher reference card.)
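If the edge list is long, the pairs can be sent as a query parameter and split into batches instead of being pasted into the query text. The sketch below only builds the request bodies. It assumes the endpoint accepts a params object next to query (as the legacy Cypher REST endpoint does) and uses the {pairs} parameter syntax of that Neo4j generation; buildBatchPayloads, the batch size and the sample edges are hypothetical.

```javascript
// Split an edge list into fixed-size batches and build one request body per batch.
function buildBatchPayloads(edges, batchSize) {
  // The query text stays constant; only the parameter changes per batch.
  const query =
    "UNWIND {pairs} AS pair " +
    "MATCH (n {name: pair[0]}), (m {name: pair[1]}) " +
    "CREATE (n)-[:X]->(m)";
  const payloads = [];
  for (let i = 0; i < edges.length; i += batchSize) {
    payloads.push({ query, params: { pairs: edges.slice(i, i + batchSize) } });
  }
  return payloads;
}

// Hypothetical sample edges as [from, to] pairs of name values.
const edges = [[0, 1], [0, 6309], [1, 42]];
const payloads = buildBatchPayloads(edges, 2);
console.log(JSON.stringify(payloads[0]));
```

Keeping the query text identical across batches also lets the server reuse the cached query plan.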
I have documents with four fields: A, B, C, D. Now I need to find documents where at least three fields match. For example:
Query: A=a, B=b, C=c, D=d
Returned documents:
a,b,c,d (four of four met)
a,b,c (three of four met)
a,b,d (another three of four met)
a,c,d (another three of four met)
b,c,d (another three of four met)
So far I created something like:
(A=a AND B=b AND C=c)
OR (A=a AND B=b AND D=d)
OR (A=a AND C=c AND D=d)
OR (B=b AND C=c AND D=d)
But this is ugly and error-prone.
Is there a better way to achieve it? Also, query performance matters.
I'm using Spring Data but I believe it does not matter. My current code:
Criteria c = new Criteria();
Criteria ca = Criteria.where("A").is(doc.getA());
Criteria cb = Criteria.where("B").is(doc.getB());
Criteria cc = Criteria.where("C").is(doc.getC());
Criteria cd = Criteria.where("D").is(doc.getD());
c.orOperator(
new Criteria().andOperator(ca,cb,cc),
new Criteria().andOperator(ca,cb,cd),
new Criteria().andOperator(ca,cc,cd),
new Criteria().andOperator(cb,cc,cd)
);
Query query = new Query(c);
return operations.find(query, Document.class, "documents");
Currently we cannot do this directly in MongoDB, since it has no functionality that supports permutations or combinations of the query parameters.
But we can simplify the query by breaking the condition into parts.
Use the aggregation pipeline:
$project with records (A=a AND B=b) --> This gives the records in which two conditions match. (Our objective is to find the records that match 3 out of 4 or 4 out of 4 of the given conditions.)
Next in the pipeline, use the OR condition (C=c OR D=d) to find the final set of records, which yields our expected result.
Hope it Helps!
The way you have it, you have to spell out all permutations in your query. You can use the aggregation framework to do this without enumerating all combinations, and it is generic enough to work with any K. The downside is that I think you need MongoDB 3.2+, and Spring Data doesn't support these operations yet: $filter, $concatArrays
But you can do it pretty easily with the Java driver.
[
{
$project:{
totalMatched:{
$size:{
$filter:{
input:{
$concatArrays:[ ["$A"], ["$B"], ["$C"],["$D"]]
},
as:"attr",
cond:{
$eq:["$$attr","a"]
}
}
}
}
}
},
{
$match:{
totalMatched:{ $gte:3 }
}
}
]
All you are doing is concatenating the values of all the fields you need to check into a single array. Then you select the subset of those elements that are equal to the value you are looking for (or that satisfy any other condition you want, for that matter), and finally you get the size of that array for each document.
Now all you need to do is to $match the documents that have a size of greater than or equal to what you want.
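When each field has its own expected value (A=a, B=b, C=c, D=d), the same counting idea can be written with one $cond per field summed by $add. This is a sketch of that variation, not the pipeline from the answer above; buildAtLeastKPipeline and countMatches are hypothetical helper names, and the second function only mirrors the counting logic in plain JavaScript.

```javascript
// Build a pipeline that keeps documents where at least k of the wanted fields match.
function buildAtLeastKPipeline(wanted, k) {
  const fields = Object.keys(wanted);
  // One 0/1 term per field: 1 when the field equals its wanted value.
  const hits = fields.map((field) => ({
    $cond: [{ $eq: ["$" + field, wanted[field]] }, 1, 0],
  }));
  const projection = { totalMatched: { $add: hits } };
  for (const field of fields) projection[field] = 1; // keep the compared fields
  return [{ $project: projection }, { $match: { totalMatched: { $gte: k } } }];
}

// Plain JavaScript equivalent of the per-document count, for illustration only.
function countMatches(doc, wanted) {
  return Object.keys(wanted).filter((f) => doc[f] === wanted[f]).length;
}

const wanted = { A: "a", B: "b", C: "c", D: "d" };
const pipeline = buildAtLeastKPipeline(wanted, 3);
console.log(JSON.stringify(pipeline));
console.log(countMatches({ A: "a", B: "b", C: "x", D: "d" }, wanted));
```

The resulting array can be passed to db.collection.aggregate(...). Like the pipeline above, it still computes the count for every document, so it cannot use an index for the matching itself.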
I'm working with a dataset composed of probabilistically encrypted elements that are indistinguishable from random samples. This way, sequential encryptions of the same number result in different ciphertexts. However, these are still comparable through a special function that applies algorithms like SHA256 to compare two ciphertexts.
I want to add a list of the described ciphertexts to a MongoDB database and index it using a tree-based structure (e.g., AVL). I can't simply apply the default indexing of the database because, as described, the records must be comparable using the special function.
An example: Suppose I have a database db and a collection c composed of the following document type:
{
"_id":ObjectId,
"r":string
}
Moreover, let F(int,string,string) be the following function:
F(h,l,r) = ( SHA256(l | r) + h ) % 3
where the operator | is a standard concatenation function.
I want to execute the following query in an efficient way, such as in a collection with some suitable indexing:
db.c.find( { F(h,l,r) :{ $eq: 0 } } )
for h and l chosen arbitrarily but not constants. I.e.: Suppose I want to find all records that satisfy F(h1,l1,r), for some pair (h1, l1). Later, in another moment, I want to do the same but using (h2, l2) such that h1 != h2 and l1 != l2. h and l may assume any value in the set of integers.
How can I do that?
You can execute this query using the $where operator, but a $where query can't use an index, so query performance depends on the size of your dataset.
db.c.find({$where: function() { return F(1, "bb", this.r) == 0; }})
Before executing the code above, you need to store your function F on the MongoDB server:
db.system.js.save({
_id: "F",
value: function(h, l, r) {
// the body of function
}
})
Links:
store javascript function on server
I've tried a solution that stores the result of the function in your collection, so I changed the schema as shown below:
{
"_id": ObjectId,
"r": {
"_key": F(H, L, value),
"value": String
}
}
The field r._key holds the value of F(h,l,r) for constant h and l, and the field r.value is the original r field.
You can then create an index on the field r._key, and your query condition becomes:
db.c.find( { "r._key" : 0 } )
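As a rough sketch of how the key could be computed before inserting a document, the snippet below implements F in Node.js. It assumes that SHA256(l | r) means the digest of the concatenated strings read as one large unsigned integer, which the question does not spell out; withKey and the sample values are hypothetical.

```javascript
const crypto = require("crypto");

// F(h, l, r) = (SHA256(l | r) + h) % 3
// Assumption: the hex digest is interpreted as a big unsigned integer.
function F(h, l, r) {
  const hex = crypto.createHash("sha256").update(l + r, "utf8").digest("hex");
  const sum = BigInt("0x" + hex) + BigInt(h);
  // Normalise so the result stays in 0..2 even when h is negative.
  return Number(((sum % 3n) + 3n) % 3n);
}

// Wrap the original r value together with its precomputed key.
function withKey(H, L, value) {
  return { r: { _key: F(H, L, value), value: value } };
}

const doc = withKey(1, "bb", "some-ciphertext");
console.log(doc);
```

Because the key is fixed at write time, this only helps for the one (h, l) pair chosen in advance; a different pair needs its own stored key or a fallback to $where.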
So far I have come across ways of selecting random documents, but my problem is a bit more of a pickle. So here goes.
I have a collection that contains, say, 1000+ documents (products).
Each document has a more or less generic format. For simplicity, say it is:
{"_id":{},"name":"Product1","groupid":5}
The groupid is a number, say between 1 and 20, denoting the group the product belongs to.
Now, if my query input is something like an array of {groupid->weight}, e.g. {[{"2":4},{"7":6}]}, plus another parameter n (say n=10), then I need to be able to pick 4 random documents that belong to groupid 2 and 6 random documents that belong to groupid 7.
The only solution I can think of is to run 'm' subqueries, where m is the length of the array in the query input.
How do I accomplish this in an efficient manner in MongoDB, probably using MapReduce?
Picking n random documents for each group:
Group the records by the groupid field. Emit the groupid as key and the record as value.
For each group, pick n random documents from the values array.
Let,
var parameter = {"5":1,"6":2}; //groupid->weight, keep it as an Object.
be the input to the map reduce functions.
The map function emits only those group ids that we have provided in the parameter object.
var map = function map(){
if(parameter.hasOwnProperty(this.groupid)){
emit(this.groupid,this);
}
}
For each group, the reduce function picks random records based on the parameter object in scope.
var reduce = function(key,values){
var length = values.length;
// never ask for more documents than the group contains
var wanted = Math.min(parameter[key], length);
var docs = [];
var added = [];
while(docs.length < wanted){
var index = Math.floor(Math.random()*length);
// only take an index that has not been picked yet
if(added.indexOf(index) == -1){
docs.push(values[index]);
added.push(index);
}
}
return {result:docs};
}
Invoking map reduce on the collection, by passing the parameter object in scope.
db.collection.mapReduce(map,
reduce,
{out: "sam",
scope:{"parameter":{"5":1,"6":2,"n":10}}})
To get the dumped output:
db.sam.find({},{"_id":0,"value.result":1}).pretty()
When you bring the parameter n into picture, you need to specify the number of documents for each group as a ratio, or else that parameter is not necessary at all.
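The per-group random pick can also be written without the retry loop by using a partial Fisher-Yates shuffle, which touches each chosen slot exactly once. The sketch below is plain JavaScript for illustration; sample is a hypothetical helper name, and the destructuring swap may need rewriting for older shell versions.

```javascript
// Pick up to n distinct elements uniformly at random (partial Fisher-Yates shuffle).
function sample(values, n) {
  const pool = values.slice(); // copy so the caller's array is left untouched
  const count = Math.min(n, pool.length);
  for (let i = 0; i < count; i++) {
    // Choose a random position from the not-yet-fixed tail and swap it into slot i.
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

const picked = sample(["p1", "p2", "p3", "p4", "p5"], 3);
console.log(picked);
```

Asking for more elements than the group holds simply returns the whole group in random order instead of looping forever.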
My query looks like this:
var x = db.collection.aggregate(...);
I want to know the number of items in the result set. The documentation says that this function returns a cursor. However, it has far fewer methods/fields than the cursor returned by db.collection.find().
for (var k in x) print(k);
Produces
_firstBatch
_cursor
hasNext
next
objsLeftInBatch
help
toArray
forEach
map
itcount
shellPrint
pretty
No count() method! Why is this cursor different from the one returned by find()? itcount() returns some type of count, but the documentation says "for testing only".
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count, like this:
var cnt = x.hasNext() ? x.next().cnt : 0;
Is there a more straightforward way to get this count? As in db.collection.find(...).count()?
Barno's answer is correct to point out that itcount() is a perfectly good method for counting the number of results of the aggregation. I just wanted to make a few more points and clear up some other points of confusion:
No count() method! Why is this cursor different from the one returned by find()?
The trick with the count() method is that it counts the number of results of find() on the server side. itcount(), as you can see in the code, iterates over the cursor, retrieving the results from the server, and counts them. The "it" is for "iterate". There's currently (as of MongoDB 2.6), no way to just get the count of results from an aggregation pipeline without returning the cursor of results.
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count
Yes. This is a reasonable way to get the count of results and should be more performant than itcount() since it does the work on the server and does not need to send the results to the client. If the point of the aggregation within your application is just to produce the number of results, I would suggest using the $group stage to get the count. In the shell and for testing purposes, itcount() works fine.
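As a small sketch of that suggestion, the helpers below append the counting $group stage to an existing pipeline and read the count from the result, treating an empty result as zero. withCountStage, readCount and the $match stage are hypothetical; only the $group stage comes from the question.

```javascript
// Append a counting stage so the server returns one document holding the total.
function withCountStage(pipeline) {
  return pipeline.concat([{ $group: { _id: null, cnt: { $sum: 1 } } }]);
}

// Read the count from the aggregation output; no documents means a count of 0.
function readCount(results) {
  return results.length > 0 ? results[0].cnt : 0;
}

// Hypothetical pipeline with a single $match stage.
const basePipeline = [{ $match: { status: "active" } }];
const countPipeline = withCountStage(basePipeline);
console.log(JSON.stringify(countPipeline));
```

In the shell this would be used along the lines of readCount(db.collection.aggregate(countPipeline).toArray()).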
Where have you read that itcount() is "for testing only"?
If in the mongo shell I do
var p = db.collection.aggregate(...);
printjson(p.help)
I receive
function () {
// This is the same as the "Cursor Methods" section of DBQuery.help().
print("\nCursor methods");
print("\t.toArray() - iterates through docs and returns an array of the results")
print("\t.forEach( func )")
print("\t.map( func )")
print("\t.hasNext()")
print("\t.next()")
print("\t.objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)")
print("\t.itcount() - iterates through documents and counts them")
print("\t.pretty() - pretty print each document, possibly over multiple lines")
}
If I do
printjson(p)
I find that
"itcount" : function (){
var num = 0;
while ( this.hasNext() ){
num++;
this.next();
}
return num;
}
This function
while ( this.hasNext() ){
num++;
this.next();
}
It is very similar to var cnt = x.hasNext() ? x.next().cnt : 0; and this while loop is perfectly adequate for counting.