Spring-Data-MongoDB - Updates take seconds to complete? - mongodb

I am using Spring Data MongoDB to interact with my MongoDB setup. I was testing different write concerns and noticed that with the Unacknowledged write concern, the time for updating 1000 documents was around 5-6 seconds, even though the Unacknowledged write concern doesn't wait for any acknowledgement.
I tested the same with the raw Java driver and the time was around 40 ms.
What could be the cause of this huge time difference between the raw Java driver and Spring Data MongoDB?
Note that I am using the Unacknowledged write concern and MongoDB v2.6.1 with default configurations.
Adding the code used for comparison:
Raw Java driver code:
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("testdb");
DBCollection collection = db.getCollection("product");
WriteResult wr = null;
try {
    long start = System.currentTimeMillis();
    wr = collection.update(
            new BasicDBObject("productId", new BasicDBObject("$gte", 10000000)
                    .append("$lt", 10001000)),
            new BasicDBObject("$inc", new BasicDBObject("price", 100)),
            false, true, WriteConcern.UNACKNOWLEDGED);
    long end = System.currentTimeMillis();
    System.out.println(wr + " Time taken: " + (end - start) + " ms.");
} finally {
    mongoClient.close();
}
Spring code:
Config.xml:
<mongo:mongo host="localhost" port="27017" />
<mongo:db-factory dbname="testdb" mongo-ref="mongo" />
<bean id="Unacknowledged" class="com.mongodb.WriteConcern">
    <constructor-arg name="w" type="int" value="0"/>
</bean>
<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
    <constructor-arg name="mongoDbFactory" ref="mongoDbFactory" />
    <property name="writeConcern" ref="Unacknowledged"/>
</bean>
Java code for the update function, which is part of ProductDAOImpl:
public int update(long fromProductId, long toProductId, double changeInPrice) {
    Query query = new Query(new Criteria().andOperator(
            Criteria.where("productId").gte(fromProductId),
            Criteria.where("productId").lt(toProductId)));
    Update update = new Update().inc("price", changeInPrice);
    WriteResult writeResult =
            mongoTemplate.updateMulti(query, update, Product.class);
    return writeResult.getN();
}
Accessing code:
ProductDAOImpl productDAO = new ProductDAOImpl();
productDAO.setMongoTemplate(mongoTemplate);
long start = System.currentTimeMillis();
productDAO.update(10000000, 10001000, 100);
long end = System.currentTimeMillis();
System.out.println("Time taken = " + (end - start) + " ms.");
Schema:
{
    "_id" : ObjectId("53b64d000cf273a0d95a1a3d"),
    "_class" : "springmongo.domain.Product",
    "productId" : NumberLong(6),
    "productName" : "product6",
    "manufacturer" : "company30605739",
    "supplier" : "supplier605739",
    "category" : "category30605739",
    "mfgDate" : ISODate("1968-04-26T05:00:00.881Z"),
    "price" : 665689.7224373372,
    "tags" : [
        "tag82",
        "tag61",
        "tag17"
    ],
    "reviews" : [
        {
            "name" : "name528965",
            "rating" : 6.5
        },
        {
            "name" : "name818975",
            "rating" : 7.5
        },
        {
            "name" : "name436239",
            "rating" : 3.9
        }
    ],
    "manufacturerAdd" : {
        "state" : "state55",
        "country" : "country155",
        "zipcode" : 718
    },
    "supplierAdd" : {
        "state" : "state69",
        "country" : "country69",
        "zipcode" : 691986
    }
}
Hope it helps.
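One thing worth ruling out: both snippets time a single cold call, so whichever side pays one-time costs inside the measured window (connection handshake, Spring context wiring, mapping-metadata setup) will look slower than it is in steady state. A language-neutral benchmarking sketch, written in Python purely for illustration (the `best_of` helper and the lambda in the comment are made up for the example, not part of either driver):

```python
import time

def best_of(fn, warmup=2, runs=5):
    """Call fn a few times untimed, then return the best timed run in ms.

    Warm-up runs absorb one-time costs (connection handshake, context
    initialization, metadata caching) so the timed runs measure only the
    steady-state cost of the operation itself.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    return min(times)

# e.g. best_of(lambda: productDAO.update(10000000, 10001000, 100))
```

If the Spring number collapses to driver-level times after a warm-up call, the gap was setup cost, not the update itself.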

Related

How to save the null values from a Dataset to mongodb?

I have a strict requirement to save null values to MongoDB. I am aware that storing nulls is not recommended in NoSQL, but my business requirement has this scenario.
A sample CSV file which has a null value:
a,b,c,id
,2,3,A
4,4,4,B
Code to save the CSV to MongoDB:
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("a", DataTypes.IntegerType, false),
    DataTypes.createStructField("b", DataTypes.IntegerType, true),
    DataTypes.createStructField("c", DataTypes.IntegerType, true),
    DataTypes.createStructField("id", DataTypes.StringType, true),
});
Dataset<Row> g = spark.read()
    .format("csv")
    .schema(schema)
    .option("header", "true")
    .option("inferSchema", "false")
    .load("/home/Documents/SparkLogs/a.csv");
MongoSpark.save(g
    .write()
    .option("database", "A")
    .option("collection", "b")
    .mode("overwrite"));
MongoDB output:
{
    "_id" : ObjectId("5d663b6bec20c94c990e6d0c"),
    "a" : 4,
    "b" : 4,
    "c" : 4,
    "id" : "B"
}
/* 2 */
{
    "_id" : ObjectId("5d663b6bec20c94c990e6d0d"),
    "b" : 2,
    "c" : 3,
    "id" : "A"
}
My requirement is to have the 'a' field present with a null value in it.
Saving a Dataset with MongoSpark ignores null-valued keys by default, so my workaround is to convert the Dataset to a JavaPairRDD of BSONObject.
Code
/** imports ***/
import scala.Tuple2;
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;
import com.mongodb.hadoop.MongoOutputFormat;
/** imports ***/
private static void saveToMongoDB_With_Null(Dataset<Row> ds, Configuration outputConfig, String[] cols) {
    JavaPairRDD<Object, BSONObject> document = ds
        .toJavaRDD()
        .mapToPair(f -> {
            BSONObject doc = new BasicBSONObject();
            for (String p : cols)
                doc.put(p, f.getAs(p));
            return new Tuple2<Object, BSONObject>(null, doc);
        });
    document.saveAsNewAPIHadoopFile(
        "file:///this-is-completely-unused",
        Object.class,
        BSONObject.class,
        MongoOutputFormat.class,
        outputConfig);
}
Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.uri",
    "mongodb://192.168.0.19:27017/database.collection");
outputConfig.set("mongo.output.format",
    "com.mongodb.hadoop.MongoOutputFormat");
Dataset<Row> g = spark.read()
    .format("csv")
    .schema(schema)
    .option("header", "true")
    .option("inferSchema", "false")
    .load("/home/Documents/SparkLogs/a.csv");
saveToMongoDB_With_Null(g, outputConfig, g.columns());
Needed Maven dependency:
<!-- https://mvnrepository.com/artifact/org.mongodb.mongo-hadoop/mongo-hadoop-core -->
<dependency>
    <groupId>org.mongodb.mongo-hadoop</groupId>
    <artifactId>mongo-hadoop-core</artifactId>
    <version>2.0.2</version>
</dependency>
MongoDB output after the workflow:
{
    "_id" : "a62e9b02-da97-493b-9563-fc19054df60e",
    "a" : null,
    "b" : 2,
    "c" : 3,
    "id" : "A"
}
{
    "_id" : "fed373a8-e671-44a4-8b85-7c7e2ff59585",
    "a" : 4,
    "b" : 4,
    "c" : 4,
    "id" : "B"
}
Downsides
Bringing a high-level API like Dataset down to low-level RDDs loses Spark's ability to optimize query plans, so the trade-off is performance.
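The essential difference between the two write paths can be sketched without Spark at all: the Dataset writer skips keys whose value is null, while the manual BSONObject loop writes every column. A minimal Python illustration of the two behaviors (plain dicts stand in for BSON documents here; this is an analogy, not MongoSpark's actual code):

```python
def doc_dropping_nulls(row):
    # Dataset-writer behavior: keys with null values are simply absent
    return {k: v for k, v in row.items() if v is not None}

def doc_keeping_nulls(row, cols):
    # BSONObject-loop behavior: every column is written, nulls included
    return {c: row.get(c) for c in cols}

row = {"a": None, "b": 2, "c": 3, "id": "A"}
print(doc_dropping_nulls(row))                        # {'b': 2, 'c': 3, 'id': 'A'}
print(doc_keeping_nulls(row, ["a", "b", "c", "id"]))  # {'a': None, 'b': 2, 'c': 3, 'id': 'A'}
```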

Python (Flask) MongoDB Speed Issue

I have a big speed problem on my website using Flask/MongoDB as the backend. A basic request (getting 1 user, for example) takes about 4 seconds to respond.
Here is the python code :
@users_apis.route('/profile/<string:user_id>', methods=['GET', 'PUT', 'DELETE'])
@auth_token_required
def profile(user_id):
    if request.method == "GET":
        avatar = ''
        if user_id == str(current_user.id):
            if current_user.birthday:
                age = (date.today().year - current_user.birthday.year)
            else:
                age = ''
            return make_response(jsonify({
                "id": str(current_user.id),
                "username": current_user.username,
                "email": current_user.email,
                "first_name": current_user.first_name,
                "last_name": current_user.last_name,
                "age": age,
                "birthday": current_user.birthday,
                "gender": current_user.gender,
                "city": current_user.city,
                "country": current_user.country,
                "languages": current_user.languages,
                "description": current_user.description,
                "phone_number": current_user.phone_number,
                "countries_visited": current_user.countries_visited,
                "countries_to_visit": current_user.countries_to_visit,
                "zip_code": str(current_user.zip_code),
                "address": current_user.address,
                "pictures": current_user.pictures,
                "avatar": "",
                "interests": current_user.interests,
                "messages": current_user.messages,
                "invitations": current_user.invitations,
                "events": current_user.events
            }), 200)
And my MongoDB database is built like this:
The selected user is nearly empty (has no friends, no events, no pictures...).
class BaseUser(db.Document, UserMixin):
    username = db.StringField(max_length=64, unique=True, required=True)
    email = db.EmailField(unique=True, required=True)
    password = db.StringField(max_length=255, required=True)
    active = db.BooleanField(default=True)
    joined_on = db.DateTimeField(default=datetime.now())
    roles = db.ListField(db.ReferenceField(Role), default=[])

class User(BaseUser):
    # Identity
    first_name = db.StringField(max_length=255)
    last_name = db.StringField(max_length=255)
    birthday = db.DateTimeField()
    gender = db.StringField(max_length=1, choices=GENDER, default='N')
    # Coordinates
    address = db.StringField(max_length=255)
    zip_code = db.IntField()
    city = db.StringField(max_length=64)
    region = db.StringField(max_length=64)
    country = db.StringField(max_length=32)
    phone_number = db.StringField(max_length=18)
    # Community
    description = db.StringField(max_length=1000)
    activities = db.StringField(max_length=1000)
    languages = db.ListField(db.StringField(max_length=32))
    countries_visited = db.ListField(db.StringField(max_length=32))
    countries_to_visit = db.ListField(db.StringField(max_length=32))
    interests = db.ListField(db.ReferenceField('Tags'))
    friends = db.ListField(db.ReferenceField('User'))
    friend_requests = db.ListField(db.ReferenceField('User'))
    pictures = db.ListField(db.ReferenceField('Picture'))
    events = db.ListField(db.ReferenceField('Event'))
    messages = db.ListField(db.ReferenceField('PrivateMessage'))
    invitations = db.ListField(db.ReferenceField('Invitation'))
    email_validated = db.BooleanField(default=False)
    validation_date = db.DateTimeField()
I have a Debian server with 6 GB RAM and 1 vcore at 2.4 GHz.
When I check the MongoDB log I don't see any request that takes more than 378 ms (for a search request).
If I use top during a request on my server, I see about 97% CPU use by Python for 1 second during the request.
When I check the Python server output, I see 4 seconds between the OPTIONS request and the GET request.
I finally managed to "fix" my issue.
It seems the whole problem was due to the @auth_token_required decorator.
Each request the front end made to the back end with "headers.append('Authentication-Token', currentUser.token);" created a huge delay.
I replaced @auth_token_required with @login_required.
I'm now using cookies.
Hope it helps someone.
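Before swapping auth mechanisms, it can help to confirm where the 4 seconds actually go. A small framework-agnostic timing decorator in plain Python can wrap any suspect function; the `verify_token` function below is a hypothetical stand-in for whatever helper the auth decorator calls, not part of Flask-Security:

```python
import functools
import time

def timed(fn):
    """Record the duration of each call to fn, in milliseconds."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            wrapper.timings.append((time.perf_counter() - start) * 1000.0)
    wrapper.timings = []
    return wrapper

@timed
def verify_token(token):
    # hypothetical stand-in for the real token-verification helper
    return token == "secret"

verify_token("secret")
# verify_token.timings now holds one duration; a multi-second entry here
# would point at token verification rather than MongoDB
```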

mongodb not letting writes go through

Writing to MongoDB via the Java driver, I am getting the following error:
com.mongodb.WriteConcernException: { "serverUsed" : "127.0.0.1:27017" , "err" : "_a != -1" , "n" : 0 , "connectionId" : 3 , "ok" : 1.0}
at com.mongodb.CommandResult.getWriteException(CommandResult.java:90)
at com.mongodb.CommandResult.getException(CommandResult.java:79)
at com.mongodb.CommandResult.throwOnError(CommandResult.java:131)
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:135)
at com.mongodb.DBTCPConnector.access$000(DBTCPConnector.java:39)
at com.mongodb.DBTCPConnector$1.execute(DBTCPConnector.java:186)
at com.mongodb.DBTCPConnector$1.execute(DBTCPConnector.java:181)
at com.mongodb.DBTCPConnector.doOperation(DBTCPConnector.java:210)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:181)
at com.mongodb.DBCollectionImpl.insertWithWriteProtocol(DBCollectionImpl.java:528)
at com.mongodb.DBCollectionImpl.insert(DBCollectionImpl.java:193)
at com.mongodb.DBCollectionImpl.insert(DBCollectionImpl.java:165)
at com.mongodb.DBCollection.insert(DBCollection.java:161)
at com.mongodb.DBCollection.insert(DBCollection.java:147)
at com.mongodb.DBCollection$insert.call(Unknown Source)
Can't find any reference in the docs to "err" : "_a != -1". Any thoughts?
EDIT:
Adding code I used (not all as it relies on other libraries to parse files):
MongoClient mongoClient = new MongoClient()
mongoClient.setWriteConcern(WriteConcern.SAFE)
DB db = mongoClient.getDB("vcf")
List<DBObject> documents = new ArrayList<DBObject>()
DBCollection recordsColl = db.getCollection("records")
//loop through file
BasicDBObject mongoRecord = new BasicDBObject()
//add data to mongoRecord
documents.add(mongoRecord)
//end loop
recordsColl.insert(documents)
mongoClient.close()

Logging mongodb query in java

I am using mongo-java-driver-2.9.1 to interact with MongoDB, and I want to log the queries that are fired on the MongoDB server. For example, in Java this is the code I write for inserting a document:
DBCollection coll = db.getCollection("mycollection");
BasicDBObject doc = new BasicDBObject("name", "MongoDB")
    .append("type", "database")
    .append("count", 1);
coll.insert(doc);
For this, the equivalent code in the "mongo" shell for inserting the document is:
db.mycollection.insert({
    "name" : "MongoDB",
    "type" : "database",
    "count" : 1
})
I want to log this second form. Is there any way to do it?
I think the MongoDB Java driver has no logging support, so you have to write the logging message on your own. Here is an example:
DBCollection coll = db.getCollection("mycollection");
BasicDBObject doc = new BasicDBObject("name", "MongoDB")
    .append("type", "database")
    .append("count", 1);
WriteResult insert = coll.insert(doc);
String msg;
if (insert.getError() == null) {
    msg = "insert into: " + coll + " ; Object " + doc;
} else {
    msg = "ERROR by insert into: " + coll + " ; Object " + doc;
    msg = msg + " Error message: " + insert.getError();
}
// log the message
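The message built above logs the driver's toString() form rather than the shell syntax the question asks for. Since BasicDBObject documents are essentially JSON, one option is to serialize the document yourself and wrap it in shell syntax. A sketch of the idea in Python (the `shell_insert` helper is made up for this example, and `json.dumps` stands in for the driver's own JSON serialization):

```python
import json

def shell_insert(collection_name, doc):
    """Render an insert as a mongo-shell-style statement for logging."""
    return "db.%s.insert(%s)" % (collection_name, json.dumps(doc, indent=4))

print(shell_insert("mycollection", {"name": "MongoDB", "type": "database", "count": 1}))
```

This prints a `db.mycollection.insert({ ... })` line that can be pasted straight back into the shell for debugging.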

How to get the document with max value for a field with map-reduce in pymongo?

How do I find the document with the maximum uid field with map-reduce in pymongo?
I have tried the following, but it prints out blanks:
from pymongo import Connection
from bson.code import Code

db = Connection().map_reduce_example
db.things.insert({
    "_id": "50f5fe8916174763f6217994",
    "deu": "Wie Sie sicher aus der Presse und dem Fernsehen wissen, gab es in Sri Lanka mehrere Bombenexplosionen mit zahlreichen Toten.\n",
    "uid": 13,
    "eng": "You will be aware from the press and television that there have been a number of bomb explosions and killings in Sri Lanka."
})
db.things.insert({
    "_id": "50f5fe8916234y0w3fvhv234",
    "deu": "Ich bin schwanger foo bar.\n",
    "uid": 14,
    "eng": "I am pregnant foobar."
})
db.things.insert({
    "_id": "50f5fe8916234y0w3fvhv234",
    "deu": "barbar schwarz sheep, haben sie any foo\n",
    "uid": 14,
    "eng": "barbar black sheep, have you any foo"
})
m = Code("function () { emit(this.uid, {'uid': this.uid, 'eng': this.eng}) }")
r = Code("function (key, values) { var total = 0; for (var i = 0; i < values.length; i++) { total += values[i]; } return total; }")
result = db.things.inline_map_reduce(m, r)
for r in result:
    print
An example document looks like this:
{
    "_id" : ObjectId("50f5fe8916174763f6217994"),
    "deu" : "Wie Sie sicher aus der Presse und dem Fernsehen wissen, gab es mehrere Bombenexplosionen mit zahlreichen Toten.\n",
    "uid" : 13,
    "eng" : "You will be aware from the press and television that there have been a number of bomb explosions and killings."
}
You can use find_one to find the doc with the maximum uid by sorting on that field descending:
db.things.find_one(sort=[("uid", -1)])
or, using the named constant (after import pymongo):
db.things.find_one(sort=[("uid", pymongo.DESCENDING)])
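The sort approach works because the question reduces to a max-by-key selection, something the reduce function in the question cannot produce since it sums documents instead of comparing them. The same selection expressed in plain Python over the three sample documents (no server involved, and the `eng` strings are abbreviated):

```python
docs = [
    {"uid": 13, "eng": "You will be aware from the press and television ..."},
    {"uid": 14, "eng": "I am pregnant foobar."},
    {"uid": 14, "eng": "barbar black sheep, have you any foo"},
]

# descending sort on uid, then take the first document --
# the same selection find_one(sort=[("uid", -1)]) asks the server to do
top = sorted(docs, key=lambda d: d["uid"], reverse=True)[0]
print(top["uid"])  # 14
```

Note that with ties on uid (as here), which uid-14 document you get is unspecified on the server side unless you add a secondary sort key.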