How do I mimic a "SELECT _id" in Mongoid and Ruby?

Currently I am doing the following:
responses = Response.where(user_id: current_user.uid)
qids = []
responses.each { |r| qids << r._id }
return qids
Any better way of doing this?

Use .only() to retrieve less data from MongoDB.
qids = Response.only(:_id).where(user_id: current_user.uid).map(&:_id)
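Depending on your Mongoid version, pluck may be an even closer match for a SQL-style SELECT _id, since it returns just the values instead of instantiating model objects (a one-line sketch; check that your Mongoid version supports pluck):
qids = Response.where(user_id: current_user.uid).pluck(:_id)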

Response.where(user_id: current_user.uid).map { |r| r._id }
That's a bit more idiomatic.
As far as Mongoid is concerned, the only 'mapping' type functionality it offers is custom map-reduce filters. You can check out the documentation.
In this case, writing such a filter would not be to your advantage: you are loading the entire dataset (lazy-loading doesn't help) and you aren't reducing anything.

A more direct solution to this problem
If you want to get the _id or anything else that is unique within the result set, it is functionally equivalent to use the distinct method. That saves you the mapping operation and seems to be much faster (the tests, and why you should perhaps take them with a grain of salt, are explained at the bottom).
Response.where(user_id: current_user.uid).distinct(:_id)
Only fall back to map if you want something non-unique and actually need the duplicate results. For instance, if your responses could be liked and you wanted an array of all likes (say you wanted to calculate some statistics about liking):
Response.where(user_id: current_user.uid).map { |r| r.likes }
Testing...
Here are some quick tests, though for more trustworthy results one should run them against a large database instead of repeating the same action; for all I know, repeating the same query over and over could benefit from all sorts of optimizations that the map step obviously cannot.
Benchmark.measure { 1000.times { Organization.where(:event_labels.ne => []).map(&:_id) } }
=> 6.320000 0.290000 6.610000 ( 6.871498)
Benchmark.measure { 1000.times { Organization.where(:event_labels.ne => []).only(:_id).map(&:_id) } }
=> 5.490000 0.140000 5.630000 ( 5.981122)
Benchmark.measure { 1000.times { Organization.where(:event_labels.ne => []).distinct(:_id) } }
=> 0.570000 0.020000 0.590000 ( 0.773239)
Benchmark.measure { 1000.times { Organization.where(:event_labels.ne => []).only(:_id) } }
=> 0.140000 0.000000 0.140000 ( 0.141278)
Benchmark.measure { 1000.times { Organization.where(:event_labels.ne => []) } }
=> 0.070000 0.000000 0.070000 ( 0.069482)
Doing map without only takes a bit longer, so using only is beneficial, although only actually seems to hurt performance slightly when you don't map at all, while having less data makes map run a bit faster. In any case, according to this test, distinct is about 10 times faster on all metrics (user, system, total, real) than the only and map combo, though it is slower than only without map.


Mutation cache update not working with vue-apollo and Hasura

I'm completely new to these technologies and am having trouble wrapping my head around them, so bear with me. My situation is that I've deployed Hasura on Heroku, added some data, and am now trying to implement some functionality where I can add and edit certain rows of a table. Specifically, I've been following this from Hasura, and this from vue-apollo.
I've implemented the adding and editing (which works), and now want to also reflect this in the table by using the update property of the mutation and updating the cache. Unfortunately, this is where I get lost. I'll paste some of my code below to make my problem clearer:
The mutation for adding a player (PLAYER_ADD_MUTATION) (same as the one in Hasura's documentation linked above):
mutation addPlayer($objects: [players_insert_input!]!) {
  insert_players(objects: $objects) {
    returning {
      id
      name
    }
  }
}
The code for the mutation in the .vue file:
addPlayer(player, currentTimestamp) {
  this.$apollo.mutate({
    mutation: PLAYER_ADD_MUTATION,
    variables: {
      objects: [
        {
          name: player.name,
          team_id: player.team.id,
          created_at: currentTimestamp,
          updated_at: currentTimestamp,
          role_id: player.role.id,
          first_name: player.first_name,
          last_name: player.last_name
        }
      ]
    },
    update: (store, { data: { addPlayer } }) => {
      const data = store.readQuery({
        query: PLAYERS
      });
      console.log(data);
      console.log(addPlayer);
      data.players.push(addPlayer);
      store.writeQuery({ query: PLAYERS, data });
    }
  });
},
I don't really get the update part of the mutation. In most examples, the { data: { x } } bit uses the function's name in place of x, so I did that as well, even though I don't really get why (it's pretty confusing to me, at least). When logging data, the array of players is logged, but when logging addPlayer, undefined is logged.
I'm probably doing something wrong that is very simple for others, but I'm obviously not sure what. Maybe the mutation isn't returning the correct thing (although I'd assume it wouldn't log undefined in that case), or maybe it isn't returning anything at all. It's especially confusing since the player is actually added to the database, so it's just the update part that isn't working; plus, most of the guides and tutorials show the same thing without much explanation.
Okay, so for anyone as stupid as me, here's basically what I was doing wrong:
Instead of addPlayer in update: (store, { data: { addPlayer } }), it should be whatever the name of the mutation is, so in this case insert_players.
By default, a mutation response from Hasura has a returning field, which is a list, so the added player is the first element of that list and you can get it like so: const addedPlayer = insert_players.returning[0];
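Putting those two points together, the update handler from my code above would become something like this (a sketch, assuming the same PLAYERS query as before):
update: (store, { data: { insert_players } }) => {
  // Read the players currently cached for the PLAYERS query
  const data = store.readQuery({ query: PLAYERS });
  // Hasura wraps the inserted rows in a `returning` list
  const addedPlayer = insert_players.returning[0];
  data.players.push(addedPlayer);
  // Write the updated list back so the table re-renders
  store.writeQuery({ query: PLAYERS, data });
}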
I didn't want to just delete my question after realising what was wrong shortly after posting it, in case this is useful to other people like me, and so I'll leave it up.

Two outputs in logstash. One for certain aggregations only

I'm trying to specify a second output in Logstash in order to save only certain aggregated data. I have no clue how to achieve this at the moment; the documentation doesn't cover such a case.
At the moment I use a single input and a single output.
Input definition (logstash-udp.conf):
input {
  udp {
    port => 25000
    codec => json
    buffer_size => 5000
    workers => 2
  }
}

filter {
  grok {
    match => [ "message", "API call happened" ]
  }
  aggregate {
    task_id => "%{example_task}"
    code => "
      map['api_calls'] ||= 0
      map['api_calls'] += 1
      map['message'] ||= event.get('message')
      event.cancel()
    "
    timeout => 60
    push_previous_map_as_event => true
    timeout_code => "event.set('aggregated_calls', event.get('api_calls') > 0)"
    timeout_tags => ['_aggregation']
  }
}
Output definition (logstash-output.conf):
output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "%{[@metadata][udp]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
What do I want to achieve now? I need to add a second, different aggregation (different data and conditions). All the non-aggregated data should be saved to Elasticsearch as it is now, but the aggregated data for this second aggregation should be saved to Postgres. I'm pretty much stuck at the moment, and searching the web for docs/examples doesn't help.
I'd suggest using multiple pipelines: https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
This way you can have one pipeline for the aggregation and a second one for the pure data.
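A minimal sketch of what the pipeline definitions could look like (the pipeline ids and config paths are made up for illustration):
# pipelines.yml
- pipeline.id: raw
  path.config: "/etc/logstash/conf.d/raw.conf"
- pipeline.id: aggregated
  path.config: "/etc/logstash/conf.d/aggregated.conf"
Each config file then has its own filter and output sections, so the aggregated pipeline can write to Postgres (for example via the community logstash-output-jdbc plugin, which has to be installed separately) while the raw pipeline keeps the elasticsearch output you already have. The _aggregation tag set by timeout_tags gives you a handle for routing events between the two.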

AngularFire2 query collection for documents that have a value within an array efficiently

I have the following model in an Angular 6 CLI/TypeScript/AngularFire app I'm trying to build. I'm new to all of those things.
export class Book {
  constructor(
    public id: string,
    public title: string,
    public genres: any[]
  ) {}
}
And I want to be able to find all books that match a genre, stored in Firebase's Cloud Firestore, using AngularFire2.
A standard query looks like this (documentation):
afs.collection('books', ref => ref.where('size', '==', 'large'))
Ideally, I want to make a call to Firebase that doesn't get all documents in the collection, so it's more efficient (tell me if that's wrong thinking). For example, something like this:
afs.collection('books', ref => ref.where(book.genres.containsAny(array of user defined genres)));
I have a limited understanding of NoSQL data modeling, but would happily change the model if there's something more effective that will stay fast with 1000 or 30,000 or even 100,000 documents.
Right now I am doing this.
filterArray = ["Genetic Engineering", "Science Fiction"];
filteredBooks: Book[] = [];
ngOnInit() {
this.db.collection<Book>('books')
.valueChanges().subscribe(books => {
for (var i=0; i < books.length; i++) {
if (books[i].genres.some(v => this.filterArray.includes(v))) {
this.filteredBooks.push(books[i]);
}
}
});
}
This works to filter the documents, but is there a more efficient way both in terms of speed and scalability (get only the matching documents instead of all)?
You're right to limit the documents first. You don't want to pull 30K documents, THEN filter them. And you are on the right path, but your formatting wasn't quite right. You want to do something like this:
afs.collection<Book>('books', ref => ref.where('genres', 'array-contains', genre))
I believe that as of right now, you cannot pass in an array like:
afs.collection<Book>('books', ref => ref.where('genres', 'array-contains', genres)) // note that genres is plural, implying a list of genres
However, it still may be better to do a for-loop through the genres and pull the books list, once for each genre, and concatenate the lists afterward.
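A rough sketch of that per-genre approach (untested; assumes afs is an injected AngularFirestore instance and Book is the model from the question):
import { combineLatest } from 'rxjs';
import { map } from 'rxjs/operators';

const genres = ['Genetic Engineering', 'Science Fiction'];

// One 'array-contains' query per genre...
const queries = genres.map(genre =>
  afs.collection<Book>('books', ref =>
    ref.where('genres', 'array-contains', genre)
  ).valueChanges()
);

// ...then merge the result lists, de-duplicating by id, since a book
// matching two of the genres will show up in two result sets.
const filteredBooks$ = combineLatest(queries).pipe(
  map(lists => {
    const byId = new Map<string, Book>();
    lists.forEach(list => list.forEach(book => byId.set(book.id, book)));
    return Array.from(byId.values());
  })
);
This still issues one query per genre, but each query only pulls matching documents rather than the whole collection.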
Now, you mentioned that you would also like a suggestion to store the data differently. I would recommend that you do NOT use an array for genres. Instead make it a map (basically, an object), like this:
author: string;
title: string;
...
genres: map
Then you can do this:
author: 'Herman Melville'
title: 'Moby Dick'
...
genres: {
  'classics': true,
  'nautical': true
}
And then you can filter the collection like this:
afs.collection<Book>('books', ref => ref.where('genres.classics', '==', true).where('genres.nautical', '==', true))
I hope this helps.

Relationship Mapping in Mongodb

I am building a game engine on Meteor JS and trying to create a way to link together a number of collections. The current 'schema' looks like this:
GameCollection = { <meta> } //This is a Collection (a Meteor MongoDB document)
Scene = {gameId: _id, <other resource ids and meta>} //This is a Collection
The issue is I need to create a map from one scene to another. These paths need to fork and merge easily. I get the feeling that I should be using a graph/triple database to represent this, but I want to stay within "Meteor's magic", and that means normal MongoDB collections. If someone has a simple-to-use alternative I would still like to hear it, but I would prefer a Meteor-esque pattern. Pushes in the right direction would also be great!
I have three specific needs:
If I am at this scene, what scene or scenes do I lead to?
If I am at this scene, give me the ids of all scenes x steps into the future, where 'x' is a variable (so I can send the lot of them down to the client).
Count and give me all possible paths so I can give a visual representation of the game.
What I am specifically looking for is: is a graph database what I need, and if not, what schema pattern should I use with MongoDB?
UPDATE:
I have confirmed that neo4j will do what I need from a logical standpoint, but I would lose the benefit of working with Meteor Collections. This means losing reactivity, which in turn breaks my live collaborative model. I really need a MongoDB-based alternative.
UPDATE 2:
I ended up trying to stick the relationship inside of the GameCollection. It seems to be working but I would like a cleaner way if possible.
map: [{          // an array of objects (relations)
  id: _id,       // key to a Scene
  toKey: _id     // leads to scene; toKey is either 'next' or some num [0..n] for multi paths
}]
So I ended up going the denormalization route. I put an array of scenes into the GameCollection.
scenes: [{ id: random_id, next: 'next_id' || [next_ids], <other resource ids and meta> }]
Then I built this monster:
getScene = function (scenes, id) {
  return _.find(scenes, function (scene) {
    return (scene.id == id)
  })
}

getNext = function (scene) {
  if (!scene) { return null }
  if (scene.type == 'dialogue') {
    return scene.next
  }
  if (scene.type == 'choice') {
    return _.pluck(scene.next, 'id')
  }
}

scenesDive = function (list, next, container, limit, depth) {
  if (!depth) {
    depth = 0
  }
  var myDepth = depth + 1
  var scene = getScene(list, next)
  if (container.indexOf(scene) != -1) { return } // This path has already been added. Go back up.
  container.push(scene)
  if (myDepth == limit) { return } // Don't dive deeper than this depth.
  var nextAoS = getNext(scene) // DIVE! (array or string)
  if (_.isArray(nextAoS)) {
    nextAoS.forEach(function (n) {
      scenesDive(list, n, container, limit, myDepth)
    });
  } else {
    scenesDive(list, nextAoS, container, limit, myDepth)
  }
}
I am sure there is a better way but this is what I am going with for now.

How should I structure my nested reactivemongo calls in my play2 application?

I'm in the process of trying to combine some nested calls with reactivemongo in my play2 application.
I get a list of objects returned from createObjects. I then loop over them, check whether each object exists in the collection, and insert it if not:
def dostuff() = Action { implicit request =>
  form.bindFromRequest.fold(
    errors => BadRequest(views.html.invite(errors)),
    form => {
      val objectsReadyForSave = createObjects(form.companyId, form.companyName, sms_pattern.findAllIn(form.phoneNumbers).toSet)
      Async {
        for (obj <- objectsReadyForSave) {
          collection.find(BSONDocument("cId" -> obj.cId.get, "userId" -> obj.userId.get)).cursor.headOption.map { maybeFound =>
            maybeFound.map { found =>
              Logger.info("Found record, do not insert")
            } getOrElse {
              collection.insert(obj)
            }
          }
        }
        Future(Ok(views.html.invite(form)))
      }
    }
  )
}
I feel that this is not as good as it could be, and it doesn't feel very "play2" or "reactivemongo".
So my question is: how should I structure my nested calls to get the result I want, and get the information of which objects have been inserted?
I am not an expert in MongoDB nor in ReactiveMongo, but it seems that you are trying to use a NoSQL database in the same way you would use a standard SQL database. Note that MongoDB is asynchronous, which means that operations may be executed at some point in the future; this is why insertion/update operations do not return the affected documents. Regarding your questions:
1. How to insert the objects if they do not exist and get the information of which objects have been inserted?
You should probably look at the MongoDB db.collection.update() method and call it with the upsert parameter set to true. If you can afford it, this will either update documents if they already exist in the database or insert them otherwise. Again, this operation does not return the affected documents, but you can check how many documents have been affected by accessing the last error. See reactivemongo.api.collections.GenericCollection#update, which returns a Future[LastError].
2. For all the objects that are inserted, how to add them to a list and then return it with the Ok() call?
Once again, inserted/updated documents will not be returned. If you really need the complete affected documents back, you will need to make another query to retrieve the matching documents.
I would probably rewrite your code this way (without error/failure handling):
def dostuff() = Action { implicit request =>
  form.bindFromRequest.fold(
    errors => BadRequest(views.html.invite(errors)),
    form => {
      val objectsReadyForSave = createObjects(form.companyId, form.companyName, sms_pattern.findAllIn(form.phoneNumbers).toSet)
      Async {
        val operations = for {
          data <- objectsReadyForSave
        } yield collection.update(BSONDocument("cId" -> data.cId.get, "userId" -> data.userId.get), data, upsert = true)
        Future.sequence(operations).map { lastErrors =>
          Ok("Documents probably inserted/updated!")
        }
      }
    }
  )
}
See also Scala Futures: http://docs.scala-lang.org/overviews/core/futures.html
This is really useful! ;)
Here's how I'd rewrite it.
def dostuff() = Action { implicit request =>
  form.bindFromRequest.fold(
    errors => BadRequest(views.html.invite(errors)),
    form => {
      createObjects(form.companyId,
                    form.companyName,
                    sms_pattern.findAllIn(form.phoneNumbers).toSet).map(ƒ)
      Ok(views.html.invite(form))
    }
  )
}
// ...
// In the model
// ...
def ƒ(obj: YourModel) = { // YourModel is a placeholder for whatever type createObjects yields
  // You need to handle the case where obj.cId or obj.userId are None
  collection.find(BSONDocument("cId" -> obj.cId.get, "userId" -> obj.userId.get))
    .cursor
    .headOption
    .map { maybeFound =>
      maybeFound map { _ =>
        Logger.info("Record found, do not insert")
      } getOrElse {
        collection.insert(obj)
      }
    }
}
There may be some syntax errors, but the idea is there.