How to continue loop in Map( /Reduce ) function? - mongodb

I have a Map function in MongoDB which I'm later using Reduce on. I use a collection which has a bunch of users in it and users own some channels. However, there are users that do not have any channels and the Map/Reduce function raises an error in my script.
map = Code("function () {"
" if(!this.channels) continue;"
" this.channels.forEach(function(z) {"
" emit(z, 1);"
" });"
"}")
When I use return instead of continue to quit the function it works flawlessly except that I don't want to end the loop. Is there any smart way around this?
Thanks for your advice and better widsom.

If you return from map, it returns only from map for this document. Maps for other documents will be executed regardless of that.
I suggest rewriting your map to this form
function () {
if(this.channels) {
this.channels.forEach(function(z) {
emit(z, 1);
});
}
}
I think, this form is more clear. It will emit something for users that have channels, and skip those that don't have any.

Related

Creating Seq after waiting for all results from map/foreach in Scala

I am trying to loop over inputs and process them to produce scores.
Just for the first input, I want to do some processing that takes a while.
The function ends up returning just the values from the 'else' part. The 'if' part is done executing after the function returns the value.
I am new to Scala and understand the behavior but not sure how to fix it.
I've tried inputs.zipWithIndex.map instead of foreach but the result is the same.
def getscores(
inputs: inputs
): Future[Seq[scoreInfo]] = {
var scores: Seq[scoreInfo] = Seq()
inputs.zipWithIndex.foreach {
case (f, i) => {
if (i == 0) {
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
scores = scores.:+(score)
})
})
} else {
scores = scores.:+(
scoreInfo(
id = "",
score = 5
)
)
}
}
}
Future {
scores
}
}
For what you need, I would drop the mutable variable and replace foreach with map to obtain an immutable list of Futures and recover to handle exceptions, followed by a sequence like below:
def getScores(inputs: Inputs): Future[List[ScoreInfo]] = Future.sequence(
inputs.zipWithIndex.map{ case (input, idx) =>
if (idx == 0)
getGeoScore(input).map(_.getOrElse(defaultScore)).recover{ case e => errorHandling(e) }
else
Future.successful(ScoreInfo("", 5))
})
To capture/print the result, one way is to use onComplete:
getScores(inputs).onComplete(println)
The part your missing is understanding a tricky element of concurrency, and that is that the order of execution when using multiple futures is not guaranteed.
If your block here is long running, it will take a while before appending the score to scores
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
// stick a println("here") in here to see what happens, for demonstration purposes only
scores = scores.:+(score)
})
})
Since that executes concurrently, your getscores function will also simultaneously continue its work iterating over the rest of inputs in your zipWithindex. This iteration, especially since it's trivial work, likely finishes well before the long-running getgeoscore(f) completes the execution of the Future it scheduled, and the code will exit the function, moving on to whatever code is next after you called getscores
val futureScores: Future[Seq[scoreInfo]] = getScores(inputs)
futureScores.onComplete{
case Success(scoreInfoSeq) => println(s"Here's the scores: ${scoreInfoSeq.mkString(",")}"
}
//a this point the call to getgeoscore(f) could still be running and finish later, but you will never know
doSomeOtherWork()
Now to clean this up, since you can run a zipWithIndex on your inputs parameter, I assume you mean it's something like a inputs:Seq[Input]. If all you want to do is operate on the first input, then use the head function to only retrieve the first option, so getgeoscores(inputs.head) , you don't need the rest of the code you have there.
Also, as a note, if using Scala, get out of the habit of using mutable vars, especially if you're working with concurrency. Scala is built around supporting immutability, so if you find yourself wanting to use a var , try using a val and look up how to work with the Scala's collection library to make it work.
In general, that is when you have several concurrent futures, I would say Leo's answer describes the right way to do it. However, you want only the first element transformed by a long running operation. So you can use the future return by the respective function and append the other elements when the long running call returns by mapping the future result:
def getscores(inputs: Inputs): Future[Seq[ScoreInfo]] =
getgeoscore(inputs.head)
.map { optInfo =>
optInfo ++ inputs.tail.map(_ => scoreInfo(id = "", score = 5))
}
So you neither need zipWithIndex nor do you need an additional future or join the results of several futures with sequence. Mapping the future just gives you a new future with the result transformed by the function passed to .map().

What's wrong with my Meteor publication?

I have a publication, essentially what's below:
Meteor.publish('entity-filings', function publishFunction(cik, queryArray, limit) {
if (!cik || !filingsArray)
console.error('PUBLICATION PROBLEM');
var limit = 40;
var entityFilingsSelector = {};
if (filingsArray.indexOf('all-entity-filings') > -1)
entityFilingsSelector = {ct: 'filing',cik: cik};
else
entityFilingsSelector = {ct:'filing', cik: cik, formNumber: { $in: filingsArray} };
return SB.Content.find(entityFilingsSelector, {
limit: limit
});
});
I'm having trouble with the filingsArray part. filingsArray is an array of regexes for the Mongo $in query. I can hardcode filingsArray in the publication as [/8-K/], and that returns the correct results. But I can't get the query to work properly when I pass the array from the router. See the debugged contents of the array in the image below. The second and third images are the client/server debug contents indicating same content on both client and server, and also identical to when I hardcode the array in the query.
My question is: what am I missing? Why won't my query work, or what are some likely reasons it isn't working?
In that first screenshot, that's a string that looks like a regex literal, not an actual RegExp object. So {$in: ["/8-K/"]} will only match literally "/8-K/", which is not the same as {$in: [/8-K/]}.
Regexes are not EJSON-able objects, so you won't be able to send them over the wire as publish function arguments or method arguments or method return values. I'd recommend sending a string, then inside the publish function, use new RegExp(...) to construct a regex object.
If you're comfortable adding new methods on the RegExp prototype, you could try making RegExp an EJSON-able type, by putting this in your server and client code:
RegExp.prototype.toJSONValue = function () {
return this.source;
};
RegExp.prototype.typeName = function () {
return "regex";
}
EJSON.addType("regex", function (str) {
return new RegExp(str);
});
After doing this, you should be able to use regexes as publish function arguments, method arguments and method return values. See this meteorpad.
/8-K/.. that's a weird regex. Try /8\-K/.
A minus (-) sign is a range indicator and usually used inside square brackets. The reason why it's weird because how could you even calculate a range between 8 and K? If you do not escape that, it probably wouldn't be used to match anything (thus your query would not work). Sometimes, it does work though. Better safe than never.
/8\-K/ matches the string "8-K" anywhere once.. which I assume you are trying to do.
Also it would help if you would ensure your publication would always return something.. here's a good area where you could fail:
if (!cik || !filingsArray)
console.error('PUBLICATION PROBLEM');
If those parameters aren't filled, console.log is probably not the best way to handle it. A better way:
if (!cik || !filingsArray) {
throw "entity-filings: Publication problem.";
return false;
} else {
// .. the rest of your publication
}
This makes sure that the client does not wait unnecessarily long for publications statuses as you have successfully ensured that in any (input) case you returned either false or a Cursor and nothing in between (like surprise undefineds, unfilled Cursors, other garbage data.

Where can I find the emit() function implementation used in MongoDB's map/reduce?

I am trying to develop a deeper understanding of map/reduce in MongoDB.
I figure the best way to accomplish this is to look at emit's actual implementation. Where can I find it?
Even better would just be a simple implementation of emit(). In the MongoDB documentation, they show a way to troubleshoot emit() by writing your own, but the basic implementation they give is really too basic.
I'd like to understand how the grouping is taking place.
I think the definition you are looking for is located here:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/commands/mr.cpp#L886
There is quite a lot of context needed though to fully understand what is going on. I confess, I do not.
1.Mongo's required JS version is no longer in O.Powell's url, which is dead. I cannot find it.
2.The below code seems to be the snippet of most interest. This cpp function, switchMode, computes the emit function to use. It is currently at;
https://github.com/mongodb/mongo/blob/master/src/mongo/db/commands/mr.cpp#L815
3.I was trying to see if emit has a default to include the _id key, which seems to occur via _mrMap, not shown here. Elsewhere it is initialized to {}, the empty map.
void State::switchMode(bool jsMode) {
_jsMode = jsMode;
if (jsMode) {
// emit function that stays in JS
_scope->setFunction("emit",
"function(key, value) {"
" if (typeof(key) === 'object') {"
" _bailFromJS(key, value);"
" return;"
" }"
" ++_emitCt;"
" var map = _mrMap;"
" var list = map[key];"
" if (!list) {"
" ++_keyCt;"
" list = [];"
" map[key] = list;"
" }"
" else"
" ++_dupCt;"
" list.push(value);"
"}");
_scope->injectNative("_bailFromJS", _bailFromJS, this);
}
else {
// emit now populates C++ map
_scope->injectNative( "emit" , fast_emit, this );
}
}

data-based buffering in Rx

Let me explain what I want to achieve first.
Lets say I have the following data incoming form the event stream
var data = new string[] {
"hello",
"Using",
"ok:michael",
"ok",
"begin:events",
"1:232",
"2:343",
"end:events",
"error:dfljsdf",
"fdl",
"error:fjkdjslf",
"ok"
};
When I subscribe the data source, I would like to get the following result
"ok:michael"
"ok"
"begin:events 1:232 2:343 end:events"
"error:dfljsdf"
"error:fjkdjslf"
"ok"
Basically, I want to get whichever data that start with ok or error and the data between begin and end.
I have tried this so far..
var data = new string[] {
"hello",
"Using",
"ok:michael",
"ok",
"begin:events",
"1:232",
"2:343",
"end:events",
"error:dfljsdf",
"fdl",
"error:fjkdjslf",
"ok"
};
var dataStream = Observable.Generate(
data.GetEnumerator(),
e => e.MoveNext(),
e => e,
e => e.Current.ToString(),
e => TimeSpan.FromSeconds(0.1));
var onelineStream = from d in dataStream
where d.StartsWith("ok") || d.StartsWith("error")
select d;
// ???
// may be need to buffer? I want to get data like "begin:events 1:232 2:343 end:events"
// but it is not working...
var multiLineStream = from list in dataStream.Buffer<string, string, string>(
bufferOpenings: dataStream.Where(d => d.StartsWith("begin")),
bufferClosingSelector: b => dataStream.Where(d => d.StartsWith("end")))
select String.Join(" ", list);
// merge two stream????
// but I have no clue how to merge these twos :(
mergeStream .Subscribe(d =>
{
Console.WriteLine(d);
Console.WriteLine();
});
Since I'm very new to Reactive programming, I can't make myself to think in reactive way. :(
Thanks in advance.
You were so, so very close to the right answer!
Essentially you had the onelineStream & multiLineStream queries just about right.
Merging them together is very easy. Just do this:
onelineStream.Merge(multiLineStream)
However, where your queries fell short was in the Observable.Generate that you used to introduce the delay between values. This creates a observable that, if you have multiple subscribers, kind of "fans out" the values.
Given your data and your definition for dataStream look how this code behaves:
dataStream.Select(x => "!" + x).Subscribe(Console.WriteLine);
dataStream.Select(x => "#" + x).Subscribe(Console.WriteLine);
You get these values:
!hello
#Using
!ok:michael
#ok
#1:232
!begin:events
#2:343
!end:events
!fdl
#error:dfljsdf
!error:fjkdjslf
#ok
Notice that some got handled by one subscription and the others got handled by the other. This means that even though your onelineStream & multiLineStream queries were just about right they would only see some of the data each and thus not behave as you expect.
You can also get race conditions that can skip and duplicate values. So it's best to avoid this kind of observable.
A better approach to introduce a delay between values is to do this:
var dataStream = data.ToObservable().Do(_ => Thread.Sleep(100));
Now this creates a "cold" observable, meaning that every new subscriber will get a fresh subscription of the observable so starting from the first value.
Your multiLineStream query will not work correctly on a cold observable.
To make the data stream a "hot" observable (which shares values amongst the subscribers) we use the Publish operator.
So, multiLineStream now looks like this:
var multiLineStream =
dataStream.Publish(ds =>
from list in ds.Buffer(
ds.Where(d => d.StartsWith("begin")),
b => ds.Where(d => d.StartsWith("end")))
select String.Join(" ", list));
You can then get your results like so:
onelineStream.Merge(multiLineStream).Subscribe(d =>
{
Console.WriteLine(d);
Console.WriteLine();
});
This is what I got:
ok:michael
ok
begin:events 1:232 2:343 end:events
error:dfljsdf
error:fjkdjslf
ok
Let me know if that works for you.

Slow Cypher neo4j results when using REST GraphDb

I am working with the neo4j-rest-graphdb an just tried to use Cypher for fetching a simple Node result.
CypherParser parser = new CypherParser();
ExecutionEngine engine = new ExecutionEngine(graphDbService);
Query query = parser.parse( "START referenceNode = node (0) " +
"MATCH referenceNode-[PRODUCTS_REFERENCE]->products-[PRODUCT]->product " +
"RETURN product.productName " +
"ORDER BY product.productId " +
"SKIP 20"
"LIMIT 10");
ExecutionResult result = engine.execute( query );
Iterator<Map<String, Object>> iterator = result.javaIterator();
What is the best practise to iterate through the result? The last line causes my service to hang for ~6 sec. Without the iterator at the end the application is quiet fast. I also tried the webadmin cypher terminal, the results are fetched within 50ms. Am i doing something wrong?
In your case all the cypher operations (graph-matching, filtering etc. would go over the wire which is awfully chatty and slow) you don't want that !
The neo4j-rest-graphdb supports remote execution of cypher out of the box:
Just do, something like shown in this testcase:
RestCypherQueryEngine queryEngine = new RestCypherQueryEngine(restGraphDatabase.getRestAPI());
final String queryString = "start n=node({reference}) return n";
Map params = MapUtil.map("reference",0);
final Node result = (Node) queryEngine.query(queryString, params).to(Node.class).single();
assertEquals(restGraphDatabase.getReferenceNode(), result);
If I understood you correctly, graphDbService is a REST graph database, correct?
If you want to use Cypher on the Server, you should instead use the CypherPlugin. Look here: http://docs.neo4j.org/chunked/snapshot/cypher-plugin.html
I hope this helps,
Andrés