Error inserting document into MongoDB from Scala

I'm trying to insert into a MongoDB database from Scala. The code below doesn't create a db or collection. I tried using the default test db too. How do I perform CRUD operations?
import com.mongodb.{BasicDBObject, Mongo}

object Store {
  def main(args: Array[String]) = {
    def addMongo(): Unit = {
      var mongo = new Mongo()
      var db = mongo.getDB("mybd")
      var coll = db.getCollection("somecollection")
      var obj = new BasicDBObject()
      obj.put("name", "Mongo")
      obj.put("type", "db")
      coll.insert(obj)
      coll.save(obj)
      println("Saved") //to print to console
    }
  }
}

At first glance things look OK in your code, although you have that stray def addMongo(): Unit = { wrapper at the top; I'll defer to the other answer for tracking down errors there. Two items of note:
1) save() and insert() are complementary operations - you only need one. insert() will always attempt to create a new document; save() will create one if the _id field isn't set, and update the existing document with that _id if it is.
2) Mongo clients do not wait for an answer to a write operation by default. It is very possible (and likely) that an error is occurring within MongoDB, causing your write to fail. The getLastError() command returns the result of the last write operation on the current connection. Because MongoDB's Java driver uses connection pools, you have to tell it to pin you to a single connection for the duration of any operation you want to run 'safely' (i.e. check the result of). This is the easiest way from the Java driver (shown here as Scala-ish sample code):
mongo.requestStart()             // pin this thread to a single connection
coll.insert(obj)                 // attempt the insert
db.getLastError().throwOnError() // throw an exception if the last write failed
mongo.requestDone()              // release the connection lock
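Alternatively (a minimal sketch, assuming a 2.x-era Java driver where WriteConcern is available): setting WriteConcern.SAFE on the collection makes every write wait for getLastError and throw on failure, so you don't need the explicit requestStart()/requestDone() bracketing.

import com.mongodb.WriteConcern

// Assumption: the DBCollection.setWriteConcern API of the 2.x Java driver.
coll.setWriteConcern(WriteConcern.SAFE) // every write now waits for getLastError
coll.insert(obj)                        // throws MongoException if the server reported an error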
Take a look at this excellent writeup on MongoDB's Write Durability, which focuses specifically on the Java Driver.
You may also want to take a look at the Scala driver I maintain (Casbah) which wraps the Java driver and provides more scala functionality.
We provide, among other things, an execute-around-method version of the safe-write concept in safely(), which makes it a lot easier to test whether writes succeeded.

You just missed the addMongo call in main. The fix is trivial:
object Store {
  def main(args: Array[String]) = {
    def addMongo(): Unit = {
      var mongo = new Mongo()
      var db = mongo.getDB("mybd")
      var coll = db.getCollection("somecollection")
      var obj = new BasicDBObject()
      obj.put("name", "Mongo")
      obj.put("type", "db")
      coll.insert(obj)
      coll.save(obj)
      println("Saved") //to print to console
    }
    addMongo() // call it!
  }
}

Related

How do I confirm I am reading the data from Mongo secondary server from Java

For performance optimisation we are trying to read data from the Mongo secondary server for selected scenarios. I am reading the data with an inline query that uses withReadPreference(ReadPreference.secondaryPreferred()); the code snippet is below.
What I want to confirm is that the data we are getting after executing the highlighted inline query really comes from the secondary server. Is there any method available to check this from Java or Spring Boot?
public User read(final String userId) {
    final ObjectId objectId = new ObjectId(userId);
    final User user = collection.withReadPreference(ReadPreference.secondaryPreferred()).findOne(objectId).as(User.class);
    return user;
}
Pretty much the same way in Java. Note we use secondary(), not secondaryPreferred(); this guarantees reads from a secondary ONLY:
import com.mongodb.ReadPreference;

{
    // This is your "regular" primary-preferred collection:
    MongoCollection<BsonDocument> tcoll = db.getCollection("myCollection", BsonDocument.class);
    // ... various operations on tcoll, then create a new
    // handle that FORCES reads from a secondary and will time out and
    // fail if no secondary can be found:
    MongoCollection<BsonDocument> xcoll = tcoll.withReadPreference(ReadPreference.secondary());
    BsonDocument f7 = xcoll.find(queryExpr).first(); // queryExpr is whatever Bson filter you are querying with
}

Parallel.Foreach and BulkCopy

I have a C# library which connects to 59 servers with the same database structure and imports data into the same table in my local db. At the moment I am retrieving data server by server in a foreach loop:
foreach (var systemDto in systems)
{
    var sourceConnectionString = _systemService.GetConnectionStringAsync(systemDto.Ip).Result;
    var dbConnectionFactory = new DbConnectionFactory(sourceConnectionString,
        "System.Data.SqlClient");
    var dbContext = new DbContext(dbConnectionFactory);
    var storageRepository = new StorageRepository(dbContext);
    var usedStorage = storageRepository.GetUsedStorageForCurrentMonth();

    var dtUsedStorage = new DataTable();
    dtUsedStorage.Load(usedStorage);

    var dcIp = new DataColumn("IP", typeof(string)) { DefaultValue = systemDto.Ip };
    var dcBatchDateTime = new DataColumn("BatchDateTime", typeof(string))
    {
        DefaultValue = batchDateTime
    };

    dtUsedStorage.Columns.Add(dcIp);
    dtUsedStorage.Columns.Add(dcBatchDateTime);

    using (var blkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        blkCopy.DestinationTableName = "dbo.tbl";
        blkCopy.WriteToServer(dtUsedStorage);
    }
}
Because there are many systems to retrieve data from, I wonder if it is possible to use a Parallel.ForEach loop. Will BulkCopy lock the table during WriteToServer, and will the next WriteToServer wait until the previous one completes?
-- EDIT 1
I've changed foreach to Parallel.ForEach, but I face one problem. Inside this loop I have an async method: _systemService.GetConnectionStringAsync(systemDto.Ip)
and this line throws an error:
System.NotSupportedException: A second operation started on this
context before a previous asynchronous operation completed. Use
'await' to ensure that any asynchronous operations have completed
before calling another method on this context. Any instance members
are not guaranteed to be thread safe.
Any ideas how I can handle this?
In general, it will be blocked and will wait until the previous operation completes.
There are some factors that may affect whether SqlBulkCopy can be run in parallel or not.
I remember when adding the Parallel feature to my .NET Bulk Operations library, I had a hard time making it work correctly in parallel, but it worked well when the table had no index (which is almost never the case).
Even when it worked, the performance gain was not that large.
Perhaps you will find more information here: MSDN - Importing Data in Parallel with Table Level Locking

orientdb: how to fully shut down a memory database (Java/Scala API)

I'm trying to write some unit test utilities for an OrientDB client in Scala.
The following is intended to take a function to operate on a DB, and it should wrap the function with code to create and destroy the DB for a single unit test.
However, there doesn't seem to be much good documentation on how to clean up a memory DB (and looking at many open source projects, people seem to simply leak databases and create new ones on a new port).
Simply calling db.close leaves the DB listening to a port and subsequent tests fail. Calling db.drop seems to work, but only if the func succeeded in adding data to the DB.
So, what cleanup is required in the finally clause?
import org.junit.Test
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx
import com.orientechnologies.orient.core.record.impl.ODocument
import com.orientechnologies.orient.core.sql.OCommandSQL

@Test
def fTest2(): Unit = {
  def withJSONDBLoan(func: ODatabaseDocumentTx => Unit): Unit = {
    val db: ODatabaseDocumentTx = new ODatabaseDocumentTx("memory:jsondb")
    db.create()
    try {
      func(db)
    } finally {
      if (!db.isClosed) {
        db.close() // Nope. DB is leaked.
      }
      // db.drop seems to close the DB but can't
      // see when to safely call this.
    }
  }

  val query1 = "insert into ouser set name='test',password='test', status='ACTIVE'"
  withJSONDBLoan { db =>
    db.command(new OCommandSQL(query1)).execute[ODocument]()
  }

  // Fails at create because the DB already exists.
  val query2 = "insert into ouser set name='test2',password='test2', status='ACTIVE'"
  withJSONDBLoan { db =>
    db.command(new OCommandSQL(query2)).execute[ODocument]()
  }
}
I tried your code and it worked for me.
Hope it helps.
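For what it's worth, here is a minimal sketch of the cleanup the question asks about, based only on the observation above that db.drop releases the in-memory storage (not a confirmed recipe): drop the database in the finally block and fall back to close() only if drop throws, so the next test can create memory:jsondb afresh.

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx

// A hedged sketch: drop (rather than close) the in-memory DB in finally.
def withJSONDBLoan(func: ODatabaseDocumentTx => Unit): Unit = {
  val db = new ODatabaseDocumentTx("memory:jsondb")
  db.create()
  try {
    func(db)
  } finally {
    try {
      db.drop() // removes the in-memory storage entirely
    } finally {
      if (!db.isClosed) db.close() // last-resort cleanup if drop throws
    }
  }
}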

Mongo c# driver freezes and never returns a value on Update()

I have a long-running operation that inserts thousands of sets of entries; each set is inserted using the code below.
After a while of this code running, the collection.Update() method freezes (does not return) and the entire process grinds to a halt.
Can't find any reasonable explanation for this anywhere.
I've looked at the mongod logs, nothing unusual, it just stops receiving requests from this process.
Mongo version: 2.4.1, C# driver version: 1.8.0
using (_mongoServer.RequestStart(_database))
{
    var collection = GetCollection<BsonDocument>(collectionName);

    // Iterate over all records
    foreach (var recordToInsert in recordsDescriptorsToInsert)
    {
        var query = new QueryDocument();
        var update = new UpdateBuilder();

        foreach (var property in recordToInsert)
        {
            var field = property.Item1;
            var value = BsonValue.Create(property.Item2);
            if (keys.Contains(field))
                query.Add(field, value);
            update.Set(field, value);
        }

        collection.Update(query, update, UpdateFlags.Upsert); // ** NEVER RETURNS **
    }
}
This may be related to CSHARP-717.
It was fixed in driver 1.8.1.

How to cache results in scala?

This page has a description of how to use Map's getOrElseUpdate method:
object WithCache {
  val cacheFun1 = collection.mutable.Map[Int, Int]()
  def fun1(i: Int) = i * i
  def catchedFun1(i: Int) = cacheFun1.getOrElseUpdate(i, fun1(i))
}
So you can use catchedFun1, which will check whether cacheFun1 contains the key and return the value associated with it. Otherwise, it will invoke fun1, cache fun1's result in cacheFun1, and then return that result.
I can see one potential danger - cacheFun1 can become too large. So should cacheFun1 somehow be cleaned up by the garbage collector?
P.S. What about scala.collection.mutable.WeakHashMap and java.lang.ref.* ?
See the Memo pattern and the Scalaz implementation of it.
Also check out an STM implementation such as Akka's.
Note that this is only local caching, so you might want to look into a distributed cache or STM such as CCSTM, Terracotta, or Hazelcast.
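For concreteness, a minimal sketch of the Memo approach (assuming Scalaz 7's scalaz.Memo; the function body is taken from the question above):

import scalaz.Memo

// mutableHashMapMemo backs the function with a mutable HashMap,
// so each distinct argument is computed only once.
val cachedFun1: Int => Int = Memo.mutableHashMapMemo { (i: Int) => i * i }

cachedFun1(3) // computes and caches 9
cachedFun1(3) // served from the cache

If your Scalaz version also provides a weak-hash-map-backed Memo variant, that speaks directly to the WeakHashMap / garbage-collection concern in the question.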
Take a look at spray caching (super simple to use)
http://spray.io/documentation/1.1-SNAPSHOT/spray-caching/
It makes the job easy and has some nice features. For example:
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import play.api.mvc._
import spray.caching.{LruCache, Cache}

// this is using Play for a controller example: getting something from a user and caching it
object CacheExampleWithPlay extends Controller {
  // this will actually create an ExpiringLruCache and hold data for 48 hours
  val myCache: Cache[String] = LruCache(timeToLive = new FiniteDuration(48, HOURS))

  def putSomeThingInTheCache(@PathParam("getSomeThing") someThing: String) = Action {
    // put received data from the user in the cache
    myCache(someThing, () => future(someThing))
    Ok(someThing)
  }

  def checkIfSomeThingInTheCache(@PathParam("checkSomeThing") someThing: String) = Action {
    if (myCache.get(someThing).isDefined)
      Ok(s"just $someThing found this in the cache")
    else
      NotFound(s"$someThing NOT found this in the cache")
  }
}
On the scala mailing list they sometimes point to the MapMaker in the Google collections library. You might want to have a look at that.
For simple caching needs, I'm still using the Guava cache solution in Scala as well.
It is lightweight and battle tested.
If it fits your requirements and constraints, generally outlined below, it could be a great option:
Willing to spend some memory to improve speed.
Expecting that keys will sometimes get queried more than once.
Your cache will not need to store more data than what would fit in RAM. (Guava caches are local to a single run of your application.
They do not store data in files, or on outside servers.)
An example of building the cache would be something like this:
import java.util.concurrent.TimeUnit
import com.google.common.cache.{CacheBuilder, CacheLoader}

lazy val cachedData = CacheBuilder.newBuilder()
  .expireAfterWrite(60, TimeUnit.MINUTES)
  .maximumSize(10)
  .build(
    new CacheLoader[Key, Data] {
      def load(key: Key): Data = {
        veryExpensiveDataCreation(key)
      }
    }
  )
To read from it, you can use something like:
def getCachedData(keyToData: Key): Data = {
  try {
    cachedData.get(keyToData)
  } catch {
    case ee: Exception => throw new YourSpecialException(ee.getMessage)
  }
}
Let me also put on the table the lightweight spray-caching, which can be used independently from Spray and provides expected-size, time-to-live, and time-to-idle eviction strategies.
We are using Scaffeine (Scala + Caffeine), and you can read about its pros/cons compared to other frameworks here.
Add it to your sbt build:
"com.github.blemale" %% "scaffeine" % "4.0.1"
Build your cache:
import com.github.blemale.scaffeine.{Cache, Scaffeine}
import scala.concurrent.duration._

val cachedItems: Cache[String, Int] =
  Scaffeine()
    .recordStats()
    .expireAfterWrite(60.seconds)
    .maximumSize(500)
    .build[String, Int]()

cachedItems.put("key", 1)       // Add items
cachedItems.getIfPresent("key") // Returns an Option
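If you want the cache to populate itself on a miss, here is a hedged sketch of Scaffeine's loading-cache variant (the loader function lookupItem below is a hypothetical placeholder for your own computation):

import com.github.blemale.scaffeine.{LoadingCache, Scaffeine}
import scala.concurrent.duration._

// lookupItem is a hypothetical loader, invoked automatically on a cache miss.
def lookupItem(key: String): Int = key.length

val loadingCache: LoadingCache[String, Int] =
  Scaffeine()
    .expireAfterWrite(60.seconds)
    .maximumSize(500)
    .build(lookupItem)

loadingCache.get("someKey") // loads via lookupItem on a miss, then caches the result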