Limit number of items in group().by() in gremlin query

I am trying to run a Gremlin query that groups vertices of a certain label by a field (say 'displayName'), limits the number of groups to n, and also limits the number of items in each group to n.
Is there a way to achieve that?
Since group().by() returns all the groups in a single map, I tried using unfold() and then applying limit() to the inner items. I managed to limit the number of groups returned, but couldn't limit the number of items in each group.
Here's the query I used to limit the number of groups:
g.V().hasLabel('customLabel').group().by('displayName').unfold().limit(n)
// Expected result (when n == 2):
[
{
"displayName1": [
{ // item 1 in first group
},
{ // item 2 in first group
}
]
},
{
"displayName2": [
{ // item 1 in second group
},
{ // item 2 in second group
}
]
}
]
// Actual result (when n == 2):
[
{
"displayName1": [
{ // item 1 in first group
},
{ // item 2 in first group
},
... // all the items are included in the result
]
},
{
"displayName2": [
{ // item 1 in second group
},
{ // item 2 in second group
},
... // all the items are included in the result
]
}
]
With the query above I get only 2 groups, "displayName1" and "displayName2", but each one contains all of its items rather than just 2 as expected.

To limit the number of items per group, add a second by() modulator that defines how the values for each key are collected. The trailing unfold().limit(n) still limits the number of groups:
g.V().hasLabel('customLabel')
.group()
.by('displayName')
.by(identity().limit(n).fold())
.unfold().limit(n)
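To make the two limits easier to see, here is a minimal sketch in plain JavaScript that mirrors the shape of the result on in-memory objects. It is not Gremlin and does not talk to a graph; groupWithLimits and the sample data are hypothetical names used only for illustration. Note that in a real graph, which groups and items survive the limits depends on the order the traversal returns them.

```javascript
// Plain-JavaScript model of:
//   group().by(key).by(identity().limit(n).fold()).unfold().limit(n)
// It only mirrors the result shape; it is not a Gremlin implementation.
function groupWithLimits(items, keyName, n) {
  const groups = new Map();
  for (const item of items) {
    const key = item[keyName];
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    const bucket = groups.get(key);
    // Inner limit: keep at most n items per group.
    if (bucket.length < n) {
      bucket.push(item);
    }
  }
  // Outer limit: keep at most n groups, in first-seen order.
  return Array.from(groups.entries())
    .slice(0, n)
    .map(([key, values]) => ({ [key]: values }));
}

// Hypothetical sample data standing in for the vertices.
const vertices = [
  { id: 1, displayName: 'a' },
  { id: 2, displayName: 'a' },
  { id: 3, displayName: 'a' },
  { id: 4, displayName: 'b' },
  { id: 5, displayName: 'c' },
];

const limited = groupWithLimits(vertices, 'displayName', 2);
console.log(JSON.stringify(limited, null, 2));
```

With n = 2, the third 'a' item is dropped by the inner limit and the 'c' group is dropped by the outer limit.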

Related

Only keep items that match string in array in Azure Data Factory

I have a very large array, similar to this one:
{
"name":"latest_test",
"value":[
{
"name":"2016-06-27-12Z",
"type":"Folder"
},
{
"name":"2016-06-28-00Z",
"type":"Folder"
},
{
"name":"2016-06-28-12Z",
"type":"Folder"
},
{
"name":"2016-06-29-00Z",
"type":"Folder"
},
{
"name":"2016-06-29-12Z",
"type":"Folder"
}
]
}
I only want to keep the items that have 2016-06-29 in their name, so that I end up with a new array that consists only of 2016-06-29-00Z and 2016-06-29-12Z.
I tried to use a filter with @contains(item(), '2016-06-29') but this returns 0 items.

item() is the entire object for an element in the array. To filter on a property, you have to specify that property as well.
Change the filter condition to
@contains(item().name, '2016-06-29')
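The difference between testing the whole object and testing its name property can be sketched in plain JavaScript. This is only an analogy, not Azure Data Factory code: it assumes that contains() on an object checks for a key, while contains() on a string checks for a substring. The value array is a shortened, hypothetical copy of the one in the question.

```javascript
// Shortened, hypothetical copy of the array from the question.
const value = [
  { name: '2016-06-28-12Z', type: 'Folder' },
  { name: '2016-06-29-00Z', type: 'Folder' },
  { name: '2016-06-29-12Z', type: 'Folder' },
];

// Testing the whole object: no element has a key called '2016-06-29',
// so nothing is kept.
const byObject = value.filter((item) => '2016-06-29' in item);

// Testing the name property: a substring match on the string itself.
const byName = value.filter((item) => item.name.includes('2016-06-29'));

console.log(byObject.length);
console.log(byName.map((item) => item.name));
```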

How to sort and then pull a specific group of objects from a mongodb query

I'm currently pulling every embedded object that matches the given filters from my MongoDB collection. I then sort them and .slice the result to get the 10 objects the user actually needs for the page they are on. It looks like this:
const data = await EachWay.aggregate([
{
$unwind: "$data"
},
{
$match: filters
},
{
$project: {
"_id": 0,
"bookmaker": 1,
"sport": 1,
"data": 1
}
}
])
const start_index = (current_page - 1) * posts_per_page // included in results
const end_index = current_page * posts_per_page // won't be included in results
const shortened = data.sort((a, b) => {
if(Number(a.data.rating) < Number(b.data.rating)){
return 1
} else if(Number(a.data.rating) > Number(b.data.rating)){
return -1
} else {
return 0
}
}).slice(start_index, end_index).map(item => {
let result = item.data
result.bookmaker = item.bookmaker
result.sport = item.sport
return result
})
If the user is on page 3, this returns the 21st to the 30th post.
I feel this is inefficient: I have to pull every matching post from the database first (in some cases 100,000 posts), and then I only send 10 of them to the client. Is there a way to sort the embedded objects and pull only the 21st-30th items from the database?
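One possible direction, sketched under assumptions rather than tested against this collection, is to move the sort and the paging into the aggregation pipeline with $sort, $skip and $limit. buildPagePipeline is a hypothetical helper, and the example filter is made up. The $addFields/$toDouble stage assumes rating may be stored as a string (the question converts it with Number()) and requires MongoDB 4.0 or later; it will fail on values that cannot be converted.

```javascript
// Builds an aggregation pipeline that sorts and pages inside MongoDB.
// buildPagePipeline is a hypothetical helper; its parameters correspond to
// filters, current_page and posts_per_page in the question.
function buildPagePipeline(filters, currentPage, postsPerPage) {
  return [
    { $unwind: '$data' },
    { $match: filters },
    // Assumption: rating may be a string, so convert it before sorting.
    // Drop this stage and sort on 'data.rating' if it is already numeric.
    { $addFields: { ratingNumber: { $toDouble: '$data.rating' } } },
    { $sort: { ratingNumber: -1 } },
    // Skip the posts of earlier pages, then keep one page worth of posts.
    { $skip: (currentPage - 1) * postsPerPage },
    { $limit: postsPerPage },
    { $project: { _id: 0, bookmaker: 1, sport: 1, data: 1 } },
  ];
}

// Hypothetical filter; page 3 with 10 posts per page.
const pipeline = buildPagePipeline({ sport: 'football' }, 3, 10);
console.log(JSON.stringify(pipeline, null, 2));

// Possible usage with the EachWay model from the question:
// const data = await EachWay.aggregate(
//   buildPagePipeline(filters, current_page, posts_per_page)
// );
```

For page 3 this skips 20 documents and returns at most 10, so only the requested page leaves the database. The final .map() that flattens item.data can stay in application code.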

mongodb query to get related data from relational collections

I have 6 collections: A, B, C, D, E and F.
Collection A has a one-to-many relation with collection B: A is the "one" side and B is the "many" side.
Collection A links to collection B through its coll_B_list field.
Collection B links back to collection A through its coll_A field.
collection A:
[
{id:"a1", coll_B_list:["b1", "b2", "b3", "b4"]},
{id:"a2", coll_B_list:["b5", "b6", "b7", "b8"]}
]
Collection B has a one-to-one relation with each of the collections C, D, E and F.
Collection B links to collection C through its coll_C field.
Collection B links to collection D through its coll_D field.
Collection B links to collection E through its coll_E field.
Collection B links to collection F through its coll_F field.
Collections C, D, E and F link back to collection B through their coll_B field.
collection B:
[
{id:"b1", date:"2020-06-01", coll_A:"a1", coll_C:"c1", coll_D:"d2"},
{id:"b2", date:"2020-07-01", coll_A:"a1", coll_C:"c2"},
{id:"b3", date:"2020-05-01", coll_A:"a1", coll_C:"c3", coll_D:"d3", coll_E:"e1", coll_F:"f1"},
{id:"b4", date:"2020-08-01", coll_A:"a1", coll_C:"c4", coll_D:"d1", coll_E:"e2"},
{id:"b5", date:"2020-04-01", coll_A:"a2", coll_C:"c5", coll_D:"d4", coll_E:"e3"},
{id:"b6", date:"2020-01-01", coll_A:"a2", coll_C:"c6", coll_D:"d5", coll_E:"e4", coll_F:"f2"},
{id:"b7", date:"2020-10-01", coll_A:"a2", coll_C:"c7"},
{id:"b8", date:"2020-09-01", coll_A:"a2", coll_C:"c8", coll_D:"d6"}
]
collection C:
[
{id:"c1", coll_B:"b1", value:"v1"},
{id:"c2", coll_B:"b2", value:"v2" },
{id:"c3", coll_B:"b3", value:"v3" },
{id:"c4", coll_B:"b4", value:"v4" },
{id:"c5", coll_B:"b5", value:"v5" },
{id:"c6", coll_B:"b6", value:"v6" },
{id:"c7", coll_B:"b7", value:"v7" },
{id:"c8", coll_B:"b8", value:"v8" }
]
collection D:
[
{id:"d1", coll_B:"b4", value:"v9" },
{id:"d2", coll_B:"b1", value:"v10" },
{id:"d3", coll_B:"b3", value:"v11" },
{id:"d4", coll_B:"b5", value:"v12" },
{id:"d5", coll_B:"b6", value:"v13" },
{id:"d6", coll_B:"b8", value:"v14" }
]
collection E:
[
{id:"e1", coll_B:"b3", value:"v15" },
{id:"e2", coll_B:"b4", value:"v16" },
{id:"e3", coll_B:"b5", value:"v17" },
{id:"e4", coll_B:"b6", value:"v18" }
]
collection F:
[
{id:"f1", coll_B:"b3", value:"v19" },
{id:"f2", coll_B:"b6", value:"v20" }
]
I want a query whose result looks like this:
The coll_B_list field should be sorted in descending order by the date field.
For the collections C, D, E and F:
- if a related document exists, show it
- otherwise show the nearest one
Result:
[
{
id:"a1",
coll_B_list:[
{
id:"b4",
date:"2020-08-01",
coll_C:{id:"c4", value:"v4"},
coll_D:{id:"d1", value:"v9"},
coll_E:{id:"e2", value:"v16"},
coll_F:{id:"f1", value:"v19"} // doesnt exist so get the nearest value
},
{
id:"b2",
date:"2020-07-01",
coll_C:{id:"c2", value:"v2"},
coll_D:{id:"d2", value:"v10"}, // doesnt exist so get the nearest value
coll_E:{id:"e1", value:"v15"}, // doesnt exist so get the nearest value
coll_F:{id:"f1", value:"v19"} // doesnt exist so get the nearest value
},
{
id:"b1",
date:"2020-06-01",
coll_C:{id:"c1", value:"v1"},
coll_D:{id:"d2", value:"v10"},
coll_E:{id:"e1", value:"v15"}, // doesnt exist so get the nearest value
coll_F:{id:"f1", value:"v19"} // doesnt exist so get the nearest value
},
{
id:"b3",
date:"2020-05-01",
coll_C:{id:"c3", value:"v3"},
coll_D:{id:"d3", value:"v11"},
coll_E:{id:"e1", value:"v15"},
coll_F:{id:"f1", value:"v19"} // doesnt exist so get the nearest value
}
]
},
{
id:"a2",
coll_B_list:[
{
id:"b7",
date:"2020-10-01",
coll_C:{id:"c7", value:"v7"},
coll_D:{id:"d6", value:"v14"}, // doesnt exist so get the nearest value
coll_E:{id:"e3", value:"v17"}, // doesnt exist so get the nearest value
coll_F:{id:"f2", value:"v20"} // doesnt exist so get the nearest value
},
{
id:"b8",
date:"2020-09-01",
coll_C:{id:"c8", value:"v8"},
coll_D:{id:"d6", value:"v14"},
coll_E:{id:"e3", value:"v17"}, // doesnt exist so get the nearest value
coll_F:{id:"f2", value:"v20"} // doesnt exist so get the nearest value
},
{
id:"b5",
date:"2020-04-01",
coll_C:{id:"c5", value:"v5"},
coll_D:{id:"d4", value:"v12"},
coll_E:{id:"e3", value:"v17"},
coll_F:{id:"f2", value:"v20"} // doesnt exist so get the nearest value
},
{
id:"b6",
date:"2020-01-01",
coll_C:{id:"c6", value:"v6"},
coll_D:{id:"d5", value:"v13"},
coll_E:{id:"e4", value:"v18"},
coll_F:{id:"f2", value:"v20"}
}
]
}
]
COLLECTIONS IN MONGOPLAYGROUND
NESTED $lookup in MONGOPLAYGROUND
NESTED $lookup in MONGOPLAYGROUND That Sorted

Mongodb multikey index

I want to make sure I'm creating the right index for my data structure (documents in MongoDB).
My document contains a list of items, and each item contains a list of locations.
I'm searching for items by their location, and I want to create the right index for this kind of search.
{ # <- document
"items": [
{ # <- first item
"name": "Item Name",
"locations": [
{ # <- first location
"sw":{"lat":0, "lng":0},
"ne":{"lat":0, "lng":0},
},
{ # <- second location
"sw":{"lat":0, "lng":0},
"ne":{"lat":0, "lng":0},
},
{ # <- more locations...
...
},
]
},
{ # <- second item
...
}
]
}
I'm searching for items using the lat and lng values, and I've used "ensureIndex" with the following keys to create the indexes:
db.<collection>.ensureIndex({"items.locations.sw.lat":1});
db.<collection>.ensureIndex({"items.locations.sw.lng":1});
db.<collection>.ensureIndex({"items.locations.ne.lat":1});
db.<collection>.ensureIndex({"items.locations.ne.lng":1});
Is this the right way?
Thanks

mongodb query with group()?

This is my collection structure:
coll{
id:...,
fieldA:{
fieldA1:[
{
...
}
],
fieldA2:[
{
text: "ciao",
},
{
text: "hello",
},
]
}
}
I want to extract all the fieldA2 text values in my collection, but if a value appears two or more times I want to show it only once.
I tried this:
db.runCommand({distinct:'coll',key:'fieldA.fieldA2.text'})
but it doesn't work: it returns all the fieldA1 values in the collection.
So I tried:
db.coll.group( {
key: { 'fieldA.fieldA2.text': 1 },
cond: { },
reduce: function ( curr, result ) { },
initial: { }
} )
but this returns an empty array.
How can I do this and also see the execution time? Thank you very much.

Since you are running 2.0.4 (I recommend upgrading), I think you have to run this through map/reduce (maybe there is a better way). Something like:
map = function(){
    for (var i in this.fieldA.fieldA2) {
        // emit once per text value so that identical text values are grouped together
        emit(this.fieldA.fieldA2[i].text, 1);
    }
}
// reduce receives the key first and the list of emitted values second
reduce = function(key, values){
    // simple count of how many times this text value was seen
    var count = 0;
    for (var index in values) {
        count += values[index];
    }
    return count;
}
This will then give you a collection of documents where _id is the unique text value from fieldA2 and the value field is the number of times it appeared in the collection.
Again, this is a draft and is not tested.
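To illustrate the shape of that output without a MongoDB instance, here is a minimal simulation of the map and reduce phases in plain JavaScript. The docs array is hypothetical sample data shaped like the collection in the question; the sketch only models the counting logic and does not call mapReduce.

```javascript
// Plain-JavaScript simulation of the map/reduce idea; no MongoDB involved.
// Hypothetical sample documents shaped like the collection in the question.
const docs = [
  { fieldA: { fieldA2: [{ text: 'ciao' }, { text: 'hello' }] } },
  { fieldA: { fieldA2: [{ text: 'hello' }] } },
];

// Map phase: emit (text, 1) for every fieldA2 entry, grouped by key.
const emitted = new Map();
for (const doc of docs) {
  for (const entry of doc.fieldA.fieldA2) {
    if (!emitted.has(entry.text)) {
      emitted.set(entry.text, []);
    }
    emitted.get(entry.text).push(1);
  }
}

// Reduce phase: sum the emitted counts for each key.
const counts = Array.from(emitted.entries()).map(([key, values]) => ({
  _id: key,
  value: values.reduce((sum, v) => sum + v, 0),
}));

console.log(counts);
```

Each distinct text value appears once as an _id, which is what removes the duplicates; the value field is just the occurrence count.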

I think the answer is simpler than map/reduce. If you just want distinct values plus the execution time, the following should work:
var startTime = new Date();
var values = db.coll.distinct('fieldA.fieldA2.text');
var endTime = new Date();
print("Took " + (endTime - startTime) + " ms");
That would result in a values array with a list of distinct fieldA.fieldA2.text values:
[ "ciao", "hello", "yo", "sayonara" ]
And a reported execution time:
Took 2 ms