MongoDB WriteConcern Java Driver

I have a simple mongo application that happens to be async (using Akka).
I send a message to an actor, which in turn writes 3 records to a database.
I'm using WriteConcern.SAFE because I want to be sure the writes happened (I also tried WriteConcern.FSYNC_SAFE).
I pause for a second to let the writes happen, then do a read--and get nothing.
So my write code might be:
collection.save( myObj, WriteConcern.SAFE )
println("--1--")
collection.save( myObj, WriteConcern.SAFE )
println("--2--")
collection.save( myObj, WriteConcern.SAFE )
println("--3--")
Then, in my test code (running outside the actor, in another thread), I print out the number of records I find:
println( collection.findAll(...) )
My output looks like this:
--1--
--2--
--3--
(pauses)
0
Indeed, if I look in the database I see no records. Sometimes I actually do see data there and the test works. Async code can be tricky, and it's possible the test code is being hit before the writes happen, so I also tried printing out timestamps to ensure the statements are executed in the order presented--they are. The data should be there. Sample output with timestamps:
Saved: brand_1 / dev 1375486024040
Saved: brand_1 / dev2 1375486024156
Saved: brand_1 / dev3 1375486024261
1375486026593 0 found
So the 3 saves clearly happened (and should have written) a full 2 seconds before the read was attempted.
I understand that with more liberal WriteConcerns you could get this behavior, but I thought the two safest ones would assure me the write actually happened before proceeding.

Subtle but simple problem. I was using a def to create my connection... which I then proceeded to call twice as if it were a val. So I actually had two different writers, which explains why my results sometimes differed. Refactoring to a val made everything predictable. Agonizing to identify, easy to understand and fix.
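To make the trap concrete, here is a minimal, driver-free Scala sketch; the connect() helper and the counter are purely illustrative stand-ins for whatever actually builds the MongoDB collection:
object DefVsVal extends App {
  var opened = 0
  def connect(): String = { opened += 1; s"connection-$opened" }   // pretend this opens a new client

  def collDef = connect()             // def: the body runs again on every reference
  println(collDef == collDef)         // false -- two distinct "connections"
  println(s"opened so far: $opened")  // 2

  val collVal = connect()             // val: evaluated once and shared
  println(collVal == collVal)         // true -- the same "connection" every time
  println(s"opened so far: $opened")  // 3 (only one more)
}
With a def, the writes and the subsequent read can easily end up on different clients, which is exactly the sometimes-there, sometimes-not behavior described above.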

Related

Why are identical SQL calls behaving differently?

I'm working on a web app in Rust. I'm using Tokio Postgres, Rocket and Tera (this may be relevant).
I'm using the following to connect to my DB; it doesn't fail in either case.
let (sql_cli, connection) = match tokio_postgres::connect("postgresql://postgres:*my_password*@localhost:8127/*AppName*", NoTls).await {
    Ok((sql_cli, connection)) => (sql_cli, connection),
    Err(e) => return Err(Redirect::to(uri!(error_display(MyError::new("Failed to make SQLClient").details)))),
};
My query is as follows. I keep my queries in a separate file (I'm self-taught and find that easier).
let query = sql_cli.query(mycharactersquery::get_characters(user_id).as_str(), &[]).await.unwrap();
The get_characters function is shown below. It takes a user ID and should return the characters that user has made in the past.
pub fn get_characters(user_id: i16) -> String {
    format!("SELECT * FROM player_characters WHERE user_id = {} ORDER BY char_id ASC;", user_id)
}
In my main file, I have one GET route, /mycharacters/<user_id>, which works and returns an HTML file. I have another GET route, /<user_id>, which returns a Tera template. The first works fine and loads the characters; the second doesn't: it just loads indefinitely. I initially thought this had to do with my lack of familiarity with Tera.
After some troubleshooting, I put some printouts in my code. The ones before and after the SQL call both print in /mycharacters/<user_id>, but only the one before writes to the terminal in /<user_id>. This makes me think that Tera isn't the issue, as the handler isn't making it past the SQL call.
I've found exactly where it is going wrong, but I don't know why as it isn't giving an error.
Could someone please let me know if there is something obvious that I am missing or provide some assistance?
P.S. The database only has 3 columns, so an actual timeout isn't the cause.
I expected both of these SQL calls to work, since I am connected to my database properly and the call is copied from the working one.

Spark accumulator causing application to silently fail

I have an application that processes records in an RDD and puts them into a cache. I put a couple of Spark accumulators in my application to keep track of processed and failed records. These stats are sent to statsD before the application closes. Here is some simple sample code:
val sc: SparkContext = new SparkContext(conf)
val jdbcDF: DataFrame = sqlContext.read.format("jdbc").options(Map(...)).load().persist(StorageLevel.MEMORY_AND_DISK)
logger.info("Processing table with " + jdbcDF.count + " rows")
val processedRecords = sc.accumulator(0L, "processed records")
val erroredRecords = sc.accumulator(0L, "errored records")
jdbcDF.rdd.foreachPartition(iterator => {
  processedRecords += iterator.length // Problematic line
  val cache = getCacheInstanceFromBroadcast()
  processPartition(iterator, cache, erroredRecords) // updates cache with iterator documents
})
submitStats(processedRecords, erroredRecords)
I built and ran this in my cluster and it appeared to be functioning correctly; the job was marked as a SUCCESS by Spark. I queried the stats using Grafana and both counts were accurate.
However, when I queried my cache, Couchbase, none of the documents were there. I've combed through both driver and executor logs to see if any errors were being thrown, but I couldn't find anything. My thinking was that this is some memory issue, but would a couple of Long accumulators really be enough to cause a problem?
I was able to get this code snippet working by commenting out the line that increments processedRecords - see the line in the snippet noted with Problematic line.
Does anyone know why commenting out that line fixes the issue? Also why is Spark failing silently and not marking the job as FAILURE?
The application isn't "failing" per se. The main problem is that an Iterator can only be traversed once.
Calling iterator.length goes through and exhausts the iterator, so by the time processPartition receives it, the iterator is already at its end and looks empty (and no records will be processed).
The Scala docs confirm that size is "the number of elements returned by it. Note: it will be at its end after this operation!" -- you can also check the source code to confirm this.
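You can see the same behavior in a plain Scala REPL, with no Spark involved at all:
val it = Iterator(1, 2, 3)
println(it.length)   // 3 -- but computing the length consumes the iterator
println(it.hasNext)  // false: anything reading `it` afterwards sees it as empty
println(it.toList)   // List()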
Workaround
If you rewrite processPartition to return a Long, that value can be fed into the accumulator (a sketch of such a rewrite follows the snippet below).
Also, sc.accumulator is deprecated in recent versions of Spark.
The workaround could look something like:
val acc = sc.longAccumulator("total processed records")
...
df.rdd.foreachPartition(iterator => {
  val cache = getCacheInstanceFromBroadcast()
  acc.add(processPartition(iterator, cache, erroredRecords))
})
...
// do something else
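For completeness, here is a hedged sketch of what a counting processPartition could look like. The CacheClient trait and its upsert method are placeholders for whatever getCacheInstanceFromBroadcast() actually returns (not a real API), and erroredRecords is assumed to have been switched to a LongAccumulator as suggested above:
import org.apache.spark.sql.Row
import org.apache.spark.util.LongAccumulator
import scala.util.control.NonFatal

// Placeholder for the real cache client returned by getCacheInstanceFromBroadcast().
trait CacheClient { def upsert(row: Row): Unit }

def processPartition(iterator: Iterator[Row],
                     cache: CacheClient,
                     erroredRecords: LongAccumulator): Long = {
  var processed = 0L
  iterator.foreach { row =>
    try {
      cache.upsert(row)   // hypothetical write to the cache
      processed += 1
    } catch {
      case NonFatal(_) => erroredRecords.add(1)
    }
  }
  processed               // the caller adds this count to the accumulator
}
Counting inside the single pass over the iterator avoids ever asking for its length up front, which is what exhausted it in the original code.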

fiveam: fail to understand why this test fails

So I'm at a loss as to why this test fails. When I run the statements in the REPL, everything appears to work correctly, but the fiveam test fails.
There is a test case in the following gist: https://gist.github.com/PuercoPop/5765844
The fiveam test fails with the following message. I don't understand why the second board is displayed differently (with newlines):
EXPECTED-BOARD evaluated to (:EMPTY :|2| :|3| :|4| :|5| :|6| :|7| :|8| :|9|),
which is not EQUAL to (:EMPTY
:|2|
:|3|
:|4|
:|5|
:|6|
:|7|
:|8|
:|9|)..
You are modifying constant data. Weird things are allowed to happen when you modify constant data. If there's even half a chance you'll be unleashing a destructive function on it (as in "modify the data..."), create your lists using (list ...) instead of '(...).

mongodb looping collection + save, objects returned several times

I'm writing a pretty big migration and had this code (CoffeeScript):
db.users.find().forEach (user) ->
  try
    # some code changing the user depending on the old state
    db.users.save(user)
    print "user_ok: #{user._id}"
  catch error
    print "user_error: #{user._id}, error was: #{error}"
Some errors occurred, but they occurred on already-processed users:
user_ok: user_1234
#many logs
user_error: user_1234 ...
How come the loop picks up already-processed objects?
I ended up doing this:
backup = { users: [] }
db.users.find().forEach (user) ->
  try
    # some code changing the user depending on the old state
    backup.users.push user
    print "user_ok: #{user._id}"
  catch error
    print "user_error: #{user._id}, error was #{error}"
# loop over backup and save
And it works nicely now, but it seems really weird. What's the reason behind all that?
When you modify an object, it might be moved by the database, so the database needs to take additional care to remember which objects have been visited already. This feature is called snapshotting; you can ask for a snapshotted query using
db.collection.find().snapshot()
However, even this doesn't make guarantees about objects that are inserted or deleted during the cursor iteration. A few more caveats are explained in the documentation.
Another option is to perform an $orderby on an invariable unique index. Ideally, that index is also monotonic, so if you are using ObjectIds as primary keys, the _id field comes in pretty handy, for example:
db.collection.find().sort({"_id" :1});

Cannot update one field at a time with VSTO for Word

When fields are nested, there is a problem.
foreach (Word.Field field in this.Application.ActiveDocument.Fields)
{
    field.Update();
    text = field.Result.Text;
}
The above code does not work.
The process starts, but winds up in an endless loop or some other process that hangs the system.
Thinking about it, I can surmise that updating a field might have an effect on the Fields collection, and thus the loop fails.
Does anyone have any ideas on implementing this?
P.S. I know there is a Document.UpdateFields() method to update ALL fields. However, there are reasons why I cannot use this and need to only update specific field types.
My apologies! I was going to give an example of a nested field but was trying to test some more before sending anyone (Jack) on a goose-chase.
I waited and waited and waited, and after a good 2 or 3 minutes, it finished. After the last field, it crashed with this message:
Object has been deleted.
The error was generated from the following line inside the loop:
string text = field.Code.Text;
The template is being tested with mergefields that are not found, because I am testing without database connectivity. It would be odd, but explainable, if it went through all the fields and then, at the end of the day, the very OUTER IF field's result were "Error! Reference source not found." But I still don't get why this could happen.
Nor do I understand why looping takes 3 minutes while a call to document.Fields.Update() will do the same thing in about 1 second and NOT result in the error described above.
Again, my apologies. I never considered that updating inside a loop would be vastly slower than a call to doc.fields.update().