Strict parsing into POJOs with KMongo - mongodb

When I find documents in my collections and parse them into POJOs, I would like to see exceptions if additional keys are present in MongoDB that do not correspond to my POJO.
Can't find a way to configure that.
What I do
data class MyPojo(var a: Int)
val mongoClient = KMongo.createClient(...)
val collection = mongoClient...
val results = collection.aggregate<MyPojo>(...)
and if a result document is
{ "a": 1, "b": 2 }
What I get:
MyPojo(a=1)
I would like to see an exception of sort
kotlinx.serialization.json.JsonDecodingException: Invalid JSON...: Encountered an unknown key b
Does anyone know how to do that?

You have to specify strictMode = true in your JsonConfiguration, for example:
install(ContentNegotiation) {
    serialization(
        contentType = ContentType.Application.Json,
        json = Json(
            JsonConfiguration(
                strictMode = true,
                prettyPrint = true
            )
        )
    )
}

Related

Error inserting embedded documents into MongoDB using Scala

I use mongo-scala-driver 2.9.0, and this is a function saving a user's recommendation list to MongoDB. The argument streamRecs is an Array of (productId: Int, score: Double). I want to insert a document consisting of a userId and its relevant recommendation list recs. However, there is an error in the line val doc: Document = Document("userId" -> userId, "recs" -> recs). Does anyone know what went wrong?
def saveRecsToMongoDB(userId: Int, streamRecs: Array[(Int, Double)])(implicit mongoConfig: MongoConfig): Unit = {
  val streamRecsCollection = ConnHelper.mongoClient.getDatabase(mongoConfig.db).getCollection(STREAM_RECS_COLLECTION)
  streamRecsCollection.findOneAndDelete(equal("userId", userId))
  val recs: Array[Document] = streamRecs.map(item => Document("productId" -> item._1, "score" -> item._2))
  val doc: Document = Document("userId" -> userId, "recs" -> recs)
  streamRecsCollection.insertOne(doc)
}
The document I want to insert into MongoDB looks like this (it represents a user and his recommended products and scores):
{
  "_id": ****,
  "userId": ****,
  "recs": [
    { "productId": ****, "score": **** },
    { "productId": ****, "score": **** },
    { "productId": ****, "score": **** },
    { "productId": ****, "score": **** }
  ]
}
When creating a BSON document, declare the Bson type explicitly for each value in the key/value pair, like so:
/* Compose a BSON document */
val document = Document(
  "email" -> BsonString("email#domain.com"),
  "token" -> BsonString("some_random_string"),
  "created" -> BsonDateTime(org.joda.time.DateTime.now.toDate)
)
To see an example, please check https://code.linedrop.io/articles/Storing-Object-in-MongoDB.

Avro Generic Record not taking aliases into account

I have some JSON data (fasterxml.jackson objects) that I want to convert into a generic Avro record. I don't know beforehand what data I will be getting, only that there is an Avro schema available in a schema repository, so I can't have predefined classes; hence the generic record.
When I pretty-print my schema, I can see my keys/values and their aliases. However, the GenericRecord "put" method does not seem to know about these aliases.
I get the following exception: Exception in thread "main" org.apache.avro.AvroRuntimeException: Not a valid schema field: device/id
Is this by design? How can I make this schema look at the aliases as well?
schema extract:
"fields" : [ {
  "name" : "device_id",
  "type" : "long",
  "doc" : " The id of the device.",
  "aliases" : [ "deviceid", "device/id" ]
}, {
  ............
} ]
code:
def jsonToAvro(jSONObject: JsonNode, schema: Schema): GenericRecord = {
  import java.util
  val converter = new JsonAvroConverter
  println(jSONObject.toString) // correct
  println(schema.toString(true)) // correct
  println(schema.getField("device_id")) // correct
  println(schema.getField("device_id").aliases().toString) // correct
  val avroRecord = new GenericData.Record(schema)
  val iter = jSONObject.fields()
  while (iter.hasNext) {
    val entry = iter.next.asInstanceOf[util.Map.Entry[String, Nothing]]
    println(s"adding ${entry.getKey.toString} and ${entry.getValue} with ${entry.getValue.getClass.getName}") // adding device/id and 8711 with com.fasterxml.jackson.databind.node.IntNode
    avroRecord.put(entry.getKey.toString, entry.getValue) // throws
  }
  avroRecord
}
I tried on Avro 1.8.2; it still throws this exception when I read a JSON string into a GenericRecord:
org.apache.avro.AvroTypeException: Expected field name not found:
But I saw a sample that used aliases correctly two years ago:
https://www.waitingforcode.com/apache-avro/serialization-and-deserialization-with-schemas-in-apache-avro/read
So I guess Avro changed that behaviour recently.
It seems the schema is only this flexible when reading; writing Avro only looks at the current field name.
Not only that, but you are using "/" in your (JSON) field names, which is not supported as an Avro field name. Schema validation does not complain when it appears in the aliases, so that might work (I haven't tested this).
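Since the record only accepts canonical field names on write, one workaround is to rename incoming JSON keys using an alias map built from the schema before putting them into the record. A minimal sketch in plain Python; the field dicts and the incoming key are illustrative stand-ins for the real schema object, not Avro API calls:

```python
# Hypothetical extract of the schema's fields, as plain dicts
schema_fields = [
    {"name": "device_id", "type": "long", "aliases": ["deviceid", "device/id"]},
]

# Map every alias (and the canonical name itself) back to the canonical name
alias_map = {}
for field in schema_fields:
    alias_map[field["name"]] = field["name"]
    for alias in field.get("aliases", []):
        alias_map[alias] = field["name"]

# Rename incoming JSON keys before putting them into the record
incoming = {"device/id": 8711}
renamed = {alias_map.get(k, k): v for k, v in incoming.items()}
print(renamed)  # {'device_id': 8711}
```

With this, `avroRecord.put` would only ever see the canonical `device_id` name.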

PyMongo - Name must be an instance of Str

I'm trying to read and write from a database on MongoDB Atlas, and while I can read data from my collections just fine, any attempt to write to a collection causes PyMongo to raise the exception 'name must be an instance of str'.
I'm guessing this refers to the MongoClient object, but the thing is, I am using a connection string. Can anyone tell me what I'm doing wrong?
My code is as follows: (I've got a ton of comments to help me understand better, so please excuse the lack of brevity)
def setattributes(self, rowdict):
    """ a function to create a user. Assumes that only a data
    dict is provided. strips everything else and updates.
    what the data dict contains is your problem.
    """
    with UseDatabase(self.dbconfig) as db:
        collection = db.database[self.tablename]
        locationdict = {  # create a corresponding location entry
            'email': rowdict['email'],
            'devstate': 0,
            'location': {
                'type': 'Point',
                'coordinates': [0, 0]
            },
            'lastseen': datetime.now()
        }
        try:
            res = db.insertdata(collection, rowdict)  # insert user data
        except Exception as e:
            print("Error adding user to DB : %s" % e)
            return False  # if you can't insert, return False
        try:
            loccollection = db.database[self.locationtable]
            resloc = db.insertdata(loccollection, locationdict)
        except Exception as e:  # if the status update failed
            db.collection.remove({'email': rowdict['email']})
            # rollback the user insert - atomicity
            return False
        return True
My Database code is as follows:
class ConnectionError(Exception):
    pass

class CredentialsError(Exception):
    pass

class UseDatabase:
    def __init__(self, config: dict):
        self.config = config

    def __enter__(self, config=atlas_conn_str):
        try:
            self.client = MongoClient(config)
            self.database = self.client['reviv']
            return self
        except:
            print("Check connection settings")
            raise ConnectionError

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.client.close()

    def insertdata(self, collection, data):
        post = data
        post_id = self.database[collection].insert_one(post).inserted_id
        return post_id

    def getdetails(self, collection, emailid):
        user = collection.find_one({'email': emailid}, {'_id': 0})
        return user
In your "setattributes()", you access a pymongo Collection instance by name:
collection = db.database[self.tablename]
Then in "insertdata()" you attempt to do the same thing again, but now "collection" is not a string, it's a Collection instance:
post_id = self.database[collection].insert_one(post).inserted_id
Instead, simply do:
post_id = collection.insert_one(post).inserted_id
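To see where the error message comes from, here is a toy stand-in that mimics pymongo's name check when you index a database; FakeDatabase and FakeCollection are illustrative classes written for this sketch, not pymongo's:

```python
class FakeCollection:
    """Stand-in for pymongo's Collection; illustrative only."""
    def __init__(self, name):
        self.name = name

class FakeDatabase:
    """Stand-in mimicking pymongo Database's name check; illustrative only."""
    def __getitem__(self, name):
        if not isinstance(name, str):
            raise TypeError("name must be an instance of str")
        return FakeCollection(name)

db = FakeDatabase()
users = db["users"]   # indexing by a string name: fine
try:
    db[users]         # indexing by a Collection instance: the bug in the question
    caught = None
except TypeError as exc:
    caught = str(exc)
print(caught)  # name must be an instance of str
```

Real pymongo raises the same TypeError for the same reason: `database[...]` expects a collection *name*, not a Collection object.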
By the way, I see that you've written some code to ensure you create and close a MongoClient for each operation. This is unnecessarily complicated, and it will slow down your application dramatically by requiring a new connection for each operation. As the FAQ says, "Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient."
I suggest you delete your UseDatabase class, make the MongoClient a module global variable, and use the MongoClient directly:
client = MongoClient(atlas_conn_str)
db = client[locationtable]

class C:
    def setattributes(self, rowdict):
        collection = db[self.tablename]
        # ... make rowdict as usual, and then:
        res = collection.insert_one(rowdict)
This code is simpler and will run much faster.

mongoengine filter on DictField's dynamic keys

class UserThings(DynamicDocument):
    username = StringField()
    things = DictField()

dcrosta_things = UserThings(username='dcrosta')
dcrosta_things.things['foo'] = 'bar'
dcrosta_things.things['baz'] = 'quack'
dcrosta_things.save()
Results in a MongoDB document like:
{
  _id: ObjectId(...),
  _types: ["UserThings"],
  _cls: "UserThings",
  username: "dcrosta",
  things: {
    foo: "bar",
    baz: "quack"
  }
}
I'm using mongoengine, and I'm not able to query on a dict field's keys.
For example, I have a list of things, thing_list = ['foo', 'faa', 'baz', 'xyz'],
and I want to filter all UserThings that contain any of these things,
something like UserThings.objects.filter(things__in=thing_list),
but that definitely won't work. Is there any way to perform filtering on variable/dynamic keys of a DictField? If not, can we do it using pymongo/a raw query?
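There's no mongoengine keyword for "any of these keys exists in the DictField", but mongoengine does accept raw MongoDB queries via __raw__, and the matching query is just an $or over $exists clauses on the dotted key paths. A sketch of building that query; the class and collection names follow the question, and I haven't run this against a live server, so only the query-dict construction is shown executing:

```python
thing_list = ['foo', 'faa', 'baz', 'xyz']

# One $exists clause per dynamic key under "things", OR-ed together
raw_query = {"$or": [{"things.%s" % thing: {"$exists": True}}
                     for thing in thing_list]}
print(raw_query["$or"][0])  # {'things.foo': {'$exists': True}}

# With mongoengine:  UserThings.objects(__raw__=raw_query)
# With pymongo:      db.user_things.find(raw_query)
```

Note that a dotted path like "things.foo" matches documents where the `things` subdocument has a `foo` key, which is exactly the dynamic-key filtering the question asks for.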

Creating class instance properties from a dictionary?

I'm importing from a CSV and getting data roughly in the format
{ 'Field1' : 3000, 'Field2' : 6000, 'RandomField' : 5000 }
The names of the fields are dynamic. (Well, they're dynamic in that there might be more than Field1 and Field2, but I know Field1 and Field2 are always going to be there.)
I'd like to be able to pass in this dictionary into my class allMyFields so that I can access the above data as properties.
class allMyFields:
    # I think I need to include these to allow hinting in Komodo. I think.
    self.Field1 = None
    self.Field2 = None

    def __init__(self, dictionary):
        for k, v in dictionary.items():
            self.k = v
            # of course, this doesn't work. I've ended up doing this instead:
            # self.data[k] = v
            # but it's not the way I want to access the data.

q = {'Field1': 3000, 'Field2': 6000, 'RandomField': 5000}
instance = allMyFields(q)
# Ideally I could do this.
print q.Field1
Any suggestions? As far as why -- I'd like to be able to take advantage of code hinting, and importing the data into a dictionary called data as I've been doing doesn't afford me any of that.
(Since the variable names aren't resolved till runtime, I'm still going to have to throw a bone to Komodo - I think the self.Field1 = None should be enough.)
So - how do I do what I want? Or am I barking up a poorly designed, non-python tree?
You can use setattr (be careful though: not every string is a valid attribute name!):
>>> class AllMyFields:
... def __init__(self, dictionary):
... for k, v in dictionary.items():
... setattr(self, k, v)
...
>>> o = AllMyFields({'a': 1, 'b': 2})
>>> o.a
1
Edit: let me explain the difference between the above code and SilentGhost's answer. The above code snippet creates a class of which instance attributes are based on a given dictionary. SilentGhost's code creates a class whose class attributes are based on a given dictionary.
Depending on your specific situation, either of these solutions may be more suitable. Do you plan to create one or more class instances? If the answer is one, you may as well skip object creation entirely and only construct the type (and thus go with SilentGhost's answer).
>>> q = { 'Field1' : 3000, 'Field2' : 6000, 'RandomField' : 5000 }
>>> q = type('allMyFields', (object,), q)
>>> q.Field1
3000
The docs for type explain well what's going on here (see "use as a constructor").
Edit: in case you need instance variables, the following also works:
>>> a = q() # first instance
>>> a.Field1
3000
>>> a.Field1 = 1
>>> a.Field1
1
>>> q().Field1 # second instance
3000
You can also use dict.update instead of manually looping over items (and if you're looping, iteritems is better).
class allMyFields(object):
    # note: you cannot (and don't have to) use self here
    Field1 = None
    Field2 = None

    def __init__(self, dictionary):
        self.__dict__.update(dictionary)

q = {'Field1': 3000, 'Field2': 6000, 'RandomField': 5000}
instance = allMyFields(q)
print instance.Field1       # => 3000
print instance.Field2       # => 6000
print instance.RandomField  # => 5000
You could make a subclass of dict which allows attribute lookup for keys:
class AttributeDict(dict):
    def __getattr__(self, name):
        return self[name]

q = AttributeDict({'Field1': 3000, 'Field2': 6000, 'RandomField': 5000})
print q.Field1
print q.Field2
print q.RandomField
If you try to look up an attribute that dict already has (say keys or get), you'll get that dict class attribute (a method). If the key you ask for doesn't exist on the dict class, then the __getattr__ method will get called and will do your key lookup.
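That shadowing caveat is easy to demonstrate: a key named after an existing dict attribute is unreachable through attribute access, because __getattr__ only fires when normal lookup fails. A quick sketch (Python 3 syntax):

```python
class AttributeDict(dict):
    def __getattr__(self, name):
        return self[name]

d = AttributeDict({'keys': 1, 'Field1': 3000})
print(d.Field1)          # 3000 -- found via __getattr__
print(callable(d.keys))  # True -- dict's own keys() method shadows the key
print(d['keys'])         # 1 -- item access still reaches the value
```

So if your CSV headers might collide with dict method names, prefer one of the setattr-based approaches, or stick to item access for those keys.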
Use setattr for the pretty way. The quick-n-dirty way is to update the instance internal dictionary:
>>> class A(object):
... pass
...
>>> a = A()
>>> a.__dict__.update({"foo": 1, "bar": 2})
>>> a.foo
1
>>> a.bar
2
>>>
Using named tuples (Python 2.6):
>>> from collections import namedtuple
>>> the_dict = {'Field1': 3, 'Field2': 'b', 'foo': 4.9}
>>> fields = ' '.join(the_dict.keys())
>>> AllMyFields = namedtuple('AllMyFields', fields)
>>> instance = AllMyFields(**the_dict)
>>> print instance.Field1, instance.Field2, instance.foo
3 b 4.9
class SomeClass:
    def __init__(self, property1, property2):
        self.property1 = property1
        self.property2 = property2

property_dict = {'property1': 'value1',
                 'property2': 'value2'}
sc = SomeClass(**property_dict)
print(sc.__dict__)
Or you can try this
class AllMyFields:
    def __init__(self, field1, field2, random_field):
        self.field1 = field1
        self.field2 = field2
        self.random_field = random_field

    @classmethod
    def get_instance(cls, d: dict):
        return cls(**d)

a = AllMyFields.get_instance({'field1': 3000, 'field2': 6000, 'random_field': 5000})
print(a.field1)
An enhanced version of the dict subclass above: it recurses into nested dicts, so nested keys also work as attributes.
class AttributeDict(dict):
    """https://stackoverflow.com/a/1639632/6494418"""
    def __getattr__(self, name):
        return self[name] if not isinstance(self[name], dict) \
            else AttributeDict(self[name])

if __name__ == '__main__':
    d = {"hello": 1, "world": 2, "cat": {"dog": 5}}
    d = AttributeDict(d)
    print(d.cat)
    print(d.cat.dog)
    print(d.cat.items())
    """
    {'dog': 5}
    5
    dict_items([('dog', 5)])
    """
If you are open to adding a new library, pydantic is a very efficient solution. It uses Python type annotations to construct objects and validate types. Consider the following code:
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: str

data = {"name": "ahmed", "age": 36}
p = Person(**data)
pydantic: https://pydantic-docs.helpmanual.io/
A simple solution is:
field_dict = {'Field1': 3000, 'Field2': 6000, 'RandomField': 5000}

# Using dataclasses
from dataclasses import make_dataclass
field_obj = make_dataclass("FieldData", list(field_dict.keys()))(*field_dict.values())

# Using attrs
from attrs import make_class
field_obj = make_class("FieldData", list(field_dict.keys()))(*field_dict.values())
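A short usage check for the dataclasses variant. Passing values positionally relies on dicts preserving insertion order, which holds on Python 3.7+:

```python
from dataclasses import make_dataclass

field_dict = {'Field1': 3000, 'Field2': 6000, 'RandomField': 5000}

# Build the class from the key names, then instantiate from the values
FieldData = make_dataclass("FieldData", list(field_dict.keys()))
field_obj = FieldData(*field_dict.values())

print(field_obj.Field1)       # 3000
print(field_obj.RandomField)  # 5000
```

You also get a readable repr and equality for free, which the hand-rolled classes above lack.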