PyMongo - Name must be an instance of Str - mongodb

I'm trying to read and write from a database on MongoDB Atlas and while I can read data from my collections just fine, any attempt to write to a collection causes PyMongo to raise an exception 'name must be an instance of str'.
I'm guessing this is in reference to the MongoClient object but the thing is I am using a connection string. Can anyone help me with what I'm doing wrong?
My code is as follows: (I've got a ton of comments to help me understand better, so please excuse the lack of brevity)
def setattributes(self, rowdict):
    """ a function to create a user. Assumes that only a data
    dict is provided. strips everything else and updates.
    what the data dict contains is your problem.
    """
    with UseDatabase(self.dbconfig) as db:
        collection = db.database[self.tablename]
        locationdict = { #create a corresponding location entry
            'email' : rowdict['email'],
            'devstate' : 0,
            'location' : {
                'type': 'Point',
                'coordinates' : [ 0, 0 ]
            },
            'lastseen' : datetime.now()
        }
        try:
            res = db.insertdata(collection, rowdict) #insert user data
        except Exception as e:
            print("Error adding user to DB : %s" % e)
            return False # if you cant insert, return False
        try:
            loccollection = db.database[self.locationtable]
            resloc = db.insertdata(loccollection, locationdict)
        except Exception as e: # if the status update failed
            db.collection.remove({'email' : rowdict['email']})
            #rollback the user insert - atomicity
            return False
        return True
My Database code is as follows:
class ConnectionError(Exception):
    pass

class CredentialsError(Exception):
    pass

class UseDatabase:
    def __init__(self, config: dict):
        self.config = config

    def __enter__(self, config = atlas_conn_str):
        try:
            self.client = MongoClient(config)
            self.database = self.client['reviv']
            return self
        except:
            print("Check connection settings")
            raise ConnectionError

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.client.close()

    def insertdata(self, collection, data):
        post = data
        post_id = self.database[collection].insert_one(post).inserted_id
        return post_id

    def getdetails(self, collection, emailid):
        user = collection.find_one({'email' : emailid}, {'_id' : 0})
        return user

In your "setattributes()", you access a pymongo Collection instance by name:
collection = db.database[self.tablename]
Then in "insertdata()" you attempt to do the same thing again, but now "collection" is not a string, it's a Collection instance:
post_id = self.database[collection].insert_one(post).inserted_id
Instead, simply do:
post_id = collection.insert_one(post).inserted_id
By the way, I see that you've written some code to ensure you create and close a MongoClient for each operation. This is unnecessarily complicated, and it will slow down your application dramatically by requiring a new connection for each operation. As the FAQ says, "Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient."
I suggest you delete your UseDatabase class, make the MongoClient a module global variable, and use the MongoClient directly:
from pymongo import MongoClient

client = MongoClient(atlas_conn_str)
db = client['reviv']  # the database name used in your __enter__()

class C:
    def setattributes(self, rowdict):
        collection = db[self.tablename]
        # ... make rowdict as usual, and then:
        res = collection.insert_one(rowdict)
This code is simpler and will run much faster.
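To see why the original call fails, here is a minimal sketch that needs no database. FakeDatabase and FakeCollection are hypothetical stand-ins that mimic only the name check; they are not PyMongo classes, and the error text is simply copied from the question.

```python
class FakeCollection:
    """Hypothetical stand-in for a collection handle."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)
        return len(self.docs) - 1  # stand-in for an inserted id


class FakeDatabase:
    """Hypothetical stand-in: indexing accepts only string names."""

    def __init__(self):
        self._collections = {}

    def __getitem__(self, name):
        if not isinstance(name, str):
            raise TypeError("name must be an instance of str")
        return self._collections.setdefault(name, FakeCollection(name))


db = FakeDatabase()
users = db["users"]  # lookup by name works

try:
    db[users]  # lookup by collection object: the bug from the question
    error = None
except TypeError as exc:
    error = str(exc)

# The fix: call insert_one on the handle you already have.
users.insert_one({"email": "a@example.com"})
```

The handle returned by the first lookup is already the collection, so there is nothing left to look up a second time.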

Strict parsing into POJOs with KMongo

When I find documents in my collections and parse them into POJOs, I would like an exception to be thrown if a document in MongoDB contains additional keys that do not correspond to my POJO.
Can't find a way to configure that.
What I do
data class MyPojo(var a: Int)
val mongoClient = KMongo.createClient(...)
val collection = mongoClient...
val results = collection.aggregate<MyPojo>(...)
and if a result document is
{ "a": 1, "b": 2 }
What I get:
MyPojo(a=1)
I would like to see an exception of this sort:
kotlinx.serialization.json.JsonDecodingException: Invalid JSON...: Encountered an unknown key b
Does anyone know how to do that?
You have to specify strictMode = true in your JsonConfiguration, for example:
install(ContentNegotiation) {
    serialization(
        contentType = ContentType.Application.Json,
        json = Json(
            JsonConfiguration(
                strictMode = true,
                prettyPrint = true
            )
        )
    )
}
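The behaviour being asked for, rejecting any key the target class does not declare, can be sketched independently of KMongo and kotlinx.serialization. The Python version below is only an illustration of strict parsing; parse_strict is a hypothetical helper, and the MyPojo dataclass mirrors the Kotlin data class from the question.

```python
from dataclasses import dataclass, fields


@dataclass
class MyPojo:
    a: int


def parse_strict(cls, document):
    """Build cls from a dict, raising if the dict has keys cls does not declare."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(document) - known)
    if unknown:
        raise ValueError(f"Encountered unknown key(s): {', '.join(unknown)}")
    return cls(**document)


ok = parse_strict(MyPojo, {"a": 1})

try:
    parse_strict(MyPojo, {"a": 1, "b": 2})
    message = None
except ValueError as exc:
    message = str(exc)
```

A lenient parser would silently drop b, which is the MyPojo(a=1) result the question describes.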

How to avoid adding duplicate data in Scrapy using MongoDB?

I want to avoid adding duplicate data and instead either 1) update one field (number of views) or 2) update all the fields that have changed on the website. To do so I'm using an ID (origin_id) that I found on the website I'm scraping.
Pipelines
class MongoDBPipeline(object):
    def __init__(self):
        connection = pymongo.MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT']
        )
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        valid = True
        for data in item:
            if not data:
                valid = False
                raise DropItem("Missing {0}!".format(data))
        if valid:
            # Update item if it is in the database and insert otherwise.
            self.collection.update({'origin_id': item['origin_id']}, dict(item), upsert=True)
        return item
MongoDB record
{
    "_id" : ObjectId("59725e919a1a6b7f0350027a"),
    "origin_id" : "12256699",
    "views":"556",
    "url":"...",
    "title":"...",
}
Please let me know if you want more details ...
You need to increment the views field by 1 if a document with that origin_id already exists.
Note that you can only $set the other fields, as they hold non-numeric values.
Doing both in a single upsert also avoids an extra query that checks whether a document with that origin_id exists in the collection.
self.collection.update(
    {'origin_id': item['origin_id']},
    {
        '$set': {'url': item['url'], 'title': item['title']},
        '$inc': {'views': 1}
    },
    upsert=True)
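As a minimal sketch, the filter and update documents can be built and inspected without a database. build_upsert is a hypothetical helper and the example URL is made up. The sketch assumes views is stored as a number; the sample record in the question stores it as the string "556", which $inc cannot increment. Also note that Collection.update was removed in PyMongo 4, where update_one is the equivalent call.

```python
def build_upsert(item):
    """Return (filter, update) for an upsert keyed on origin_id."""
    query = {"origin_id": item["origin_id"]}
    update = {
        # Overwrite the descriptive fields with the freshly scraped values.
        "$set": {"url": item["url"], "title": item["title"]},
        # Bump the counter; on insert, $inc starts the field at 1.
        "$inc": {"views": 1},
    }
    return query, update


item = {"origin_id": "12256699", "url": "https://example.com/a", "title": "Example"}
query, update = build_upsert(item)

# With a real collection this would be passed as:
#   collection.update_one(query, update, upsert=True)
```

Because the filter is an equality match on origin_id, an upsert that inserts a new document also copies origin_id into it, so it does not need to appear under $set.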

Auto incrementing an indexed field in mongodb when there are multiple concurrent requests

I am trying to auto-increment an indexed field in MongoDB whenever an insertion happens. I have read many posts on SO, including this one and mongoose-auto-increment, but I don't understand how they work. Consider the scenario below.
Suppose I want to auto-increment a field counter in my collection, and the first record already exists with a counter value of 1. Now suppose three concurrent inserts hit the database. Since the counter value is 1, all three will try to set counter to 2. Whichever gets the lock first will successfully set its counter to 2. But what about the other two operations? When they acquire the lock they will also try to set the counter to 2, and since 2 is already taken, I guess mongoose will raise a duplicate key error.
Can anyone please tell me how the two posts above solve the concurrency problem for auto-incrementing an indexed field in MongoDB?
I know I am missing some concept, but what?
Thanks.
I encountered the same problem, so I ended up building my own increment that handles concurrency, and it was quite easy. Bottom line, the short answer: I use a try/catch loop when saving the document to catch the duplicate key error on my incremented field. Here is how I implemented this with mongoose in my controller/service/model architecture:
First, I need to store the auto-increment counter. It won't be a big collection, since I will never have more than a dozen concerned collections in a project, so I don't even need special indexes:
counter.model.js
// requires modules blabla...
// The mongoose schema for the counter collection
const CounterSchema = new Schema({
  // entity describes the concerned collection.field
  entity: {
    type: {
      collection: { type: String, required: true },
      field: { type: String, required: true }
    }, required: true, unique: true
  },
  // the actual counter for the collection.field
  counter: { type: Number, required: true }
});
// The mongoose-based function to query the counter
async function nextCount(collection, field) {
  let entityCounter = await CounterModel.findOne({
    entity: { collection, field }
  });
  let counter = entityCounter.counter + 1;
  entityCounter.counter = counter;
  await entityCounter.save();
  return counter;
}
// mongoose boiler plate
CounterSchema.statics.nextCount = nextCount;
const CounterModel = mongoose.model("counter", CounterSchema);
module.exports = CounterModel;
Then I made a service to use my counter model. The service also formats the auto-increment as needed. For example, accounting wants every client number to start with "411" followed by a 5-digit id, so client number 1 will actually be 41100001.
counter.service.js
// requires modules blabla ....
class CounterService {
  constructor() {}
  static async nextCount(collection, field, prefix, len) {
    // Gets next counter from db
    const counter = await CounterModel.nextCount(collection, field)
    // Formats the counter as requested in params
    let counterString = `${counter}`;
    while (counterString.length < len) {
      counterString = `0${counterString}`;
    }
    return `${prefix}${counterString}`;
  }
}
module.exports = CounterService
Then here is where we handle the concurrency: in the client model (I won't put the whole client model file here, only what is needed for the explanation). Let's assume we have the client collection with the "num" field that needs the auto-increment described before:
client.model.js
// ...
const ClientSchema = new Schema({
  firstName: ...
  lastName: ...
  num: { type: String, required: true, unique: true }
})
async function addClient(clientToAdd) {
  let newClient;
  let genuineCounter = false
  while (!genuineCounter) {
    try {
      // Gets next increment from counter service
      clientToAdd.num = await CounterService.nextCount("client", "num", "411", 5)
      newClient = new ClientModel(clientToAdd)
      await newClient.save();
      // if code reaches this point, no concurrency problem: we end the loop
      genuineCounter = true
    } catch (error) {
      // If num is duplicated, an error is caught
      // we must ensure that the duplicate error comes from the num field:
      // 11000 is the MongoDB error code for a duplicate on a unique index
      // and we check the duplicated field (could be another unique field!)
      if (error instanceof MongoError
          && error.code === 11000
          && Object.keys(error.keyValue)[0] === "num")
        genuineCounter = false
      // For any other duplicated field or error we rethrow
      else throw error
    }
  }
  return newClient
}
And here we go! If two users query the counter at the same time, the second one will keep querying the counter until the key is no longer a duplicate.
A small bonus to test it: create a small module to easily fake a delay wherever you want:
file delay.helper.js
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
exports.delay = delay;
// use anywhere after module import with:
// await delay(5000)
Then import this module into the counter module and fake some delay between counter query and counter save:
counter.model.js
// previously described file and nextCount function, now using delay()
const { delay } = require("./delay.helper")
async function nextCount(collection, field) {
  let entityCounter = await CounterModel.findOne(...)
  await delay(5000)
  ...
  await entityCounter.save()
}
Then, from your front-end project or your API endpoint, open two identical tabs and send 2 queries in a row:
Let's say the current counter in the db is 12
Query A reads counter in db = 12
Query A waits for 5 seconds
Query B reads counter in db = still 12
Query A increments and stores new client with num = 41100013; stores counter = 13
Query B increments and tries to store new client; 41100013 already exists, so the error is caught and it tries again
Query B reads counter in db = 13
Query B waits for 5 seconds, then increments and stores new client with num = 41100014 and also stores new counter = 14
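The walkthrough above can be replayed in memory as a minimal Python sketch. Everything here is a hypothetical stand-in: InMemoryStore plays the role of the counter document plus the unique index on num, DuplicateKeyError stands in for the driver's error 11000, and stale_read simulates a counter value that was read before a concurrent writer saved.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the duplicate-key error (code 11000)."""


class InMemoryStore:
    def __init__(self, counter):
        self.counter = counter
        self.nums = set()  # plays the role of the unique index on "num"

    def insert_client(self, num):
        if num in self.nums:
            raise DuplicateKeyError(num)
        self.nums.add(num)


def format_num(counter, prefix="411", length=5):
    """Zero-pad the counter to `length` digits and prepend the prefix."""
    return f"{prefix}{counter:0{length}d}"


def next_count(store, stale_read=None):
    """Read-increment-save, optionally starting from a stale read."""
    current = store.counter if stale_read is None else stale_read
    store.counter = current + 1
    return store.counter


def add_client(store, stale_read=None):
    """Retry until the insert no longer collides; returns (num, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        num = format_num(next_count(store, stale_read))
        stale_read = None  # only the first attempt uses the stale value
        try:
            store.insert_client(num)
            return num, attempts
        except DuplicateKeyError:
            continue  # someone else took this number: fetch a fresh one


store = InMemoryStore(counter=12)
query_a = add_client(store)                 # reads 12, stores 41100013
query_b = add_client(store, stale_read=12)  # also read 12, collides, retries
```

Query B needs two attempts and ends up with 41100014, leaving the counter at 14, which matches the sequence of events listed above.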

getting values from WriteResult mongo

I was trying to get familiar with the WriteResult object in mongo, but I can't access any of its values. The docs say the number of values inserted is stored in WriteResult.nInserted. Trying to access nInserted is crashing my server.
var readings = new Readings({
    val1: parseInt(Data[0]),
    val2: parseInt(Data[1]),
    val3: parseInt(Data[2]),
    val4: parseInt(Data[3]),
    val5: parseInt(Data[4]),
    val6: parseInt(Data[5]),
})
var result = readings.save(function (err, post){
    if(err){return next(err)}
    res.status(201).json(readings)
})
if(result.nInserted > 0){
    console.log('wrote to database')
}
else{
    console.log('could not write to database')
}
I know the data is being written to the database. I see it in the mongo shell.
The save method on a model instance doesn't return anything. All results are reported via the callback method, so you'd use something like this:
readings.save(function (err, doc, numberAffected){
    if(err){return next(err)}
    if (numberAffected > 0) {
        console.log('updated an existing doc');
    } else {
        console.log('added a new doc');
    }
    res.status(201).json(doc)
})
Mongoose doesn't give you access to the full WriteResult, but as long as err is null you can rest assured the save succeeded and it's only a matter of whether an existing doc was updated or a new one was added. Because you're creating a new doc here, numberAffected will always be 0.
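The underlying pattern, a function that reports its outcome only through a callback and returns nothing, can be sketched in a few lines of Python. The save function below is a hypothetical stand-in and not the Mongoose API.

```python
def save(doc, callback):
    """Hypothetical callback-style save: the outcome is reported only via callback."""
    stored = dict(doc, _id=1)  # pretend the write succeeded
    callback(None, stored)     # (error, saved document)
    # no return statement, so the caller receives None


outcomes = []
result = save({"val1": 42}, lambda err, doc: outcomes.append((err, doc)))

# The return value carries no information; the callback arguments do.
wrote = bool(outcomes) and outcomes[0][0] is None
```

Reading an attribute such as nInserted off result fails because there is no result object, which is the crash described in the question.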

Invoke db.eval in FindAndModify using MongoDB C# Client

I have the following Document:
{
"_id": 100,
"Version": 1,
"Data": "Hello"
}
I have a function which return a number from a sequence:
function getNextSequence(name) {
    var ret = db.Counter.findAndModify(
        {
            query: { _id: name },
            update: { $inc: { value: 1 } },
            new: true,
            upsert: true
        }
    );
    return ret.value;
}
I can use this for optimistic concurrency by performing the following Mongo command:
db.CollectionName.findAndModify({
    query: { "_id" : NumberLong(100), "Version" : 1 },
    update: { "$set" : {
        "Data": "Here is new data!",
        "Version" : db.eval('getNextSequence("CollectionName")') }
    },
    new: true
    }
);
This will update the document (as the _id and Version match) with the new Data field and the new number returned by the eval call.
It also returns a modified document, from which I can retrieve the new Version value if I want to make another update later (in the same 'session').
My problem is:
You cannot create an Update document using the MongoDB C# client that will serialize to this command.
I used:
var update = Update.Combine(
    new UpdateDocument("$set", doc),
    Update.Set(versionMap.ElementName, new BsonJavaScript("db.eval('getNextSequence(\"Version:CollectionName\")')"))
);
If you use what I first expected to perform this task, BsonJavaScript, you get the following document, which incorrectly sets Version to a string of JavaScript.
update: { "$set" : {
"Data": "Here is new data!",
"Version" : { "$code" : "db.eval('getNextSequence(\"Version:CollectionName\")')" }
}
}
How can I get MongoDB C# client to serialize an Update document with my db.eval function call in it?
I have tried to add a new BsonValue type in my assembly which would serialize down to db.eval(''). However, there is a BsonType enum which I cannot modify without changing the MongoDB driver itself, and I would rather not do that in case it causes compatibility or other issues.
I have also tried simply creating the Update document myself as a BsonDocument; however, FindAndModify will only accept an IMongoUpdate, which is simply a marker interface that at present I find superfluous.
I have just tried to construct the command manually by creating a BsonDocument myself to set the Value: db.eval, however I get the following exception:
A String value cannot be written to the root level of a BSON document.
I see no other way now than drop down to the Mongo stream level to accomplish this.
So I gave up with trying to get Mongo C# Client to do what I needed and instead wrote the following MongoDB function to do this for me:
db.system.js.save(
    {
        _id : "optimisticFindAndModify" ,
        value : function optimisticFindAndModify(collectionName, operationArgs) {
            var collection = db.getCollection(collectionName);
            var ret = collection.findAndModify(operationArgs);
            return ret;
        }
    }
);
This will get the collection to operate over, and execute the passed operationArgs in a FindAndModify operation.
Because I could not get the shell to set a literal value (i.e., not a "quoted string") on a JavaScript object, I had to do this in my C# code:
var counterName = "Version:" + CollectionName;
var sequenceJs = string.Format("getNextSequence(\"{0}\")", counterName);
var doc = entity.ToBsonDocument();
doc.Remove("_id");
doc.Remove(versionMap.ElementName);
doc.Add(versionMap.ElementName, "SEQUENCEJS");
var findAndModifyDocument = new BsonDocument
{
    {"query", query.ToBsonDocument()},
    {"update", doc},
    {"new", true},
    {"fields", Fields.Include(versionMap.ElementName).ToBsonDocument() }
};
// We have to strip the quotes from getNextSequence.
var findAndModifyArgs = findAndModifyDocument.ToString();
findAndModifyArgs = findAndModifyArgs.Replace("\"SEQUENCEJS\"", sequenceJs);
var evalCommand = string.Format("db.eval('optimisticFindAndModify(\"{0}\", {1})');", CollectionName, findAndModifyArgs);
var modifiedDocument = Database.Eval(new EvalArgs
{
    Code = new BsonJavaScript(evalCommand)
});
The result of this is that I can now call my Sequence Javascript, the getNextSequence function, inside the optimisticFindAndModify function.
Unfortunately I had to use a string replace in C#, as again there is no way to make a BsonDocument emit the literal db.eval call that is needed, although the Mongo shell accepts it just fine.
All is now working.
EDIT:
Although, if you really want to push boundaries, and are actually awake, you will realize that the same result can be achieved by performing an $inc on the Version field, and none of this is necessary.
However, if you want to follow the MongoDB tutorial on how to implement concurrency, or you just want to use a function in a FindAndModify, this will help you. I know I'll probably refer back to it a few times in this project!
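The $inc variant mentioned in the EDIT can be sketched as follows, in Python rather than C#. Both helpers are hypothetical: build_optimistic_update only builds the two documents, and apply_update is a tiny in-memory stand-in for a find-and-modify that returns the new document. Note that this bumps a per-document Version instead of drawing from the shared getNextSequence counter.

```python
def build_optimistic_update(doc_id, expected_version, new_data):
    """Return (filter, update): match on _id and Version, then bump Version."""
    query = {"_id": doc_id, "Version": expected_version}
    update = {"$set": {"Data": new_data}, "$inc": {"Version": 1}}
    return query, update


def apply_update(store, query, update):
    """In-memory stand-in for find-and-modify; returns the new doc or None."""
    doc = store.get(query["_id"])
    if doc is None or doc["Version"] != query["Version"]:
        return None  # someone else updated first: the filter no longer matches
    doc.update(update["$set"])
    for field, step in update["$inc"].items():
        doc[field] += step
    return dict(doc)


store = {100: {"_id": 100, "Version": 1, "Data": "Hello"}}

first = apply_update(store, *build_optimistic_update(100, 1, "Here is new data!"))
stale = apply_update(store, *build_optimistic_update(100, 1, "Lost update"))
```

The second call still expects Version 1, so it matches nothing and returns None, which is the signal to reload the document and retry. With PyMongo, the same two documents could be passed to find_one_and_update with return_document=ReturnDocument.AFTER.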