Exporting MongoDB to CSV using pymongo

I would like to write a script to generate a CSV file from my MongoDB database, and I would like to know the most convenient way to do it!
First, let me begin with the structure of the collections:
MyDataBase -> setting
              users
              fruits
In setting I have something like:
setting -> _id
           data
           _tenant
What I am after is making a CSV file out of the profiles in data. The profiles have fields/properties like "name", "address", "postalcode", "email", "age", etc.; not all profiles necessarily have all of these fields/properties, and some even look like collections (they have sub-branches), which I am not interested in at all!
So, my Python code so far looks like this:
import csv

myquery = db.settings.find()  # I am getting everything!
output = csv.writer(open('some.csv', 'wt'))  # writing into this file
for items in myquery[0:10]:  # first 10 entries
    a = list(items['data']['Profile'].values())  # sub-documents are imported as dictionaries, so I turn the values into a list
    tt = list()
    for chiz in a:
        if chiz is not None:
            tt.append(chiz.encode('ascii', 'ignore'))  # encoding
        else:
            tt.append("none")
    output.writerow(tt)
As mentioned, the profiles don't necessarily all have the same fields/properties, and some values are themselves collections (with sub-branches) that come back as dictionaries, so I have to convert them to lists. All in all, there are quite a few things to take care of in such a process, and it doesn't look that straightforward!
My question might sound very general, but is this a typical way to make such a report? If not, can someone make it clear?

Yes, I use the same approach.
It is clear and fast, and it works without any additional libraries.
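If the missing fields and sub-documents keep biting you, here is a minimal sketch of a slightly more defensive variant, still using only the standard library plus pymongo. The connection details and the column list are assumptions taken from the question; adjust both to your setup:

import csv
from pymongo import MongoClient

db = MongoClient()['MyDataBase']  # assumption: default localhost connection
columns = ['name', 'address', 'postalcode', 'email', 'age']

with open('some.csv', 'w', newline='') as f:
    # DictWriter fills absent keys with restval, so sparse profiles are fine
    writer = csv.DictWriter(f, fieldnames=columns, restval='none')
    writer.writeheader()
    for doc in db.settings.find().limit(10):
        profile = doc.get('data', {}).get('Profile', {})
        # keep only scalar values; drop nested sub-documents entirely
        row = {k: v for k, v in profile.items()
               if k in columns and not isinstance(v, (dict, list))}
        writer.writerow(row)

Selecting the columns up front also gives every row the same order, which the values()-based approach does not guarantee.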

Related

Jasper: listing all key/value pairs of a collection

In Jasper I have to print a collection without knowing the keys in advance, because they are programmatic and can change over time. Let me give an example of this list:
formData[0]={addictions=1, workout=0, allergies=1, selfSufficiency=0}
formData[1]={gastricNose=1, weightChange=1, diet=XXXX}
[...]
formData[12]={dailyAmount=12, latestDate={type=date, value=1542582000000}, ostomy=1, ostomyType=AA, ostomyBag=BB}
How can I print a list of all these keys/values so that they all get listed properly? I can already print these objects raw in a list, but then they obviously look pretty much like the example I wrote above; instead, I need Jasper to cycle through each element by itself and retrieve the key and the value for each one.
This is what I'm trying to print:
addictions: 1
workout: 0
[...]
gastricNose: 1
weightChange: 1
[...]
dailyAmount: 12
latestDate: 18/11/2018
[...]
In all the posts I've looked into, I'm required to know at least the names of the keys, but in my case I can't. And I cannot modify the data source structure either (for instance, reformatting the collection as {key:"addictions", value:"1"} is not an option).
Thanks for your hints

How can you filter on a custom value created during dehydration?

During dehydration I create a custom value:
def dehydrate(self, bundle):
    bundle.data['custom_field'] = ["add lots of stuff and return an int"]
    return bundle
that I would like to filter on.
/?format=json&custom_field__gt=0...
however I get an error that the "[custom_field] field has no 'attribute' for searching with."
Maybe I'm misunderstanding custom filters, but in both build_filters and apply_filters I can't seem to get access to my custom field to filter on it. In the examples I've seen, it seems like I'd have to redo in build_filters all the work done in dehydrate, e.g.
for all the items:
    item['custom_field'] = ["add lots of stuff and return an int"]
    filter on item and add to pk_list
orm_filters["pk__in"] = [i.pk for i in pk_list]
which seems wrong, as I'm doing the work twice. What am I missing?
The problem is that dehydration is "per object" by design, while filters are applied per object_list. That's why you will have to filter manually and redo the dehydrate work.
You can imagine it like this:
# Whole table
[obj1, obj2, obj3, obj4, obj5, obj6]
# filter operations
[...]
# After filtering
[obj1, obj3, obj6]
# Returning
[dehydrate(obj1), dehydrate(obj3), dehydrate(obj6)]
Also consider the flip side: if your filter matches, say, 100 objects, it would be quite inefficient to trigger dehydrate on the whole table of, for instance, 100,000 records.
And creating a new column in the model could be a candidate solution if you plan to use a lot of filters, ordering, etc. It sounds like this field holds a kind of statistic, so if a new column is not an option, maybe Django aggregation could ease your pain a little.
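To make the "redo the work in build_filters" route concrete, here is a minimal sketch, assuming a Tastypie ModelResource; compute_custom_field() and MyModel are hypothetical stand-ins for your shared logic and your model:

from tastypie.resources import ModelResource
from myapp.models import MyModel  # hypothetical model


def compute_custom_field(obj):
    # hypothetical: the same "lots of stuff" done in dehydrate()
    return 42


class MyResource(ModelResource):
    class Meta:
        queryset = MyModel.objects.all()

    def dehydrate(self, bundle):
        bundle.data['custom_field'] = compute_custom_field(bundle.obj)
        return bundle

    def build_filters(self, filters=None):
        filters = filters or {}
        # pull the custom parameter out before the ORM sees it
        raw = filters.pop('custom_field__gt', None)
        orm_filters = super(MyResource, self).build_filters(filters)
        if raw is not None:
            # QueryDict.pop returns a list of values
            threshold = int(raw[0] if isinstance(raw, (list, tuple)) else raw)
            orm_filters['pk__in'] = [obj.pk for obj in MyModel.objects.all()
                                     if compute_custom_field(obj) > threshold]
        return orm_filters

Note that this walks the whole table, which is exactly the inefficiency described above; it is only reasonable for small data sets, which is why a real column (or aggregation) is the better long-term answer.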

dataFrame keying using pandas groupby method

I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to use, on my own data, an example I saw in one of Wes's videos and notebooks. I have a CSV file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I load it into a DataFrame and then group it by "filePath" and "vp"; the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in examples, but when I do
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed in getting a result when I put in the filePath, but as I understood and saw in previous examples, I should be able to use the vp keys as well, shouldn't I?
Sorry if this is a trivial one; I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to "partially" index a MultiIndex series. I simply did it the wrong way.
The first level of the index is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name I have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it into a DataFrame, you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a Series, boolean indexing can help; something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible; you could test for multiple values at once, for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]
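Putting the options together, here is a minimal, self-contained sketch; the file name scores.csv is an assumption, standing in for the CSV shown in the question:

import pandas as pd

df = pd.read_csv('scores.csv')  # columns: filePath, vp, score
res = df.groupby(['filePath', 'vp']).size()

# partial indexing works on the *outer* level only
print(res['E:\\Audio\\7168965711_5601_4.wav'])

# slice the *inner* level with .xs (this works on the Series directly too)
print(res.xs('Cust_4062144116', level=1))

# or with boolean indexing on the index values
print(res[res.index.get_level_values(1) == 'Cust_4062144116'])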

Web2py - Multiple tables read-only form

I've searched around the web for a way to achieve this and found multiple solutions. Most of them had messy code, and all of them had drawbacks. Some ideas involved setting default values for all the db fields based on a record. Others worked by appending multiple SQLFORMs, which resulted in differences in indentation on the page (because it's 2 HTML tables in 1 form).
I'm looking for a compact and elegant way of providing a read-only representation of a record based on a join on two tables. Surely there must be some simple way to achieve this, right? The Web2py book only contains an example of an insert form. It's this kind of neat solution I am looking for.
In the future I will probably need multi-table forms that provide update functionality as well, but for now I'll be happy if I can get a simple read-only form for a record.
I would greatly appreciate any suggestions.
This seems to work for me:
def test():
fields = [db.tableA[field] for field in db.tableA.keys() \
if type(db.tableA[field]) == type(db.tableA.some_field)]
fields += [db.tableB[field] for field in db.tableB.keys() \
if type(db.tableB[field]) == type(db.tableB.some_field)]
ff = []
for field in fields:
ff.append(Field(field.name, field.type))
form = SQLFORM.factory(*ff, readonly=True)
return dict(form=form)
You could add in field.required, field.requires validators, etc. Also, since you're using SQLFORM.factory, you should be able to validate it and do updates/inserts. Just make sure that the form you are building with this method contains all of the information necessary to validate the form for update; I believe you can add it easily to the Field instantiation above.
EDIT: Oh yeah, and you need to get the values of the record in question to pre-populate the form based on a record id (after the form is defined)... Also, I just realized that instead of those list comprehensions, you can simply use SQLFORM.factory and provide the two tables:
def test():
    form = SQLFORM.factory(db.tableA, db.tableB, readonly=True)
    record = ...  # query for your record, probably based on an id in request.args(0)
    for field in record.keys():
        if ...:  # test whether this really is a field
            form.vars[field] = record[field]
    return dict(form=form)
Some tweaking will be required since I only provided pseudo-code for the pre-population, but look at http://web2py.com/books/default/chapter/29/7#Pre-populating-the-form and the SQLFORM/SQLFORM.factory sections.
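For completeness, a minimal sketch of how that pre-population could be filled in, assuming a hypothetical reference field tableB.tableA_id linking the two tables and the record id arriving in request.args(0) (this runs inside a web2py controller, where db, request and SQLFORM are already in scope):

def test():
    form = SQLFORM.factory(db.tableA, db.tableB, readonly=True)
    # hypothetical join: tableB.tableA_id references tableA
    row = db((db.tableA.id == request.args(0)) &
             (db.tableB.tableA_id == db.tableA.id)).select().first()
    if row:
        for tablename in ('tableA', 'tableB'):
            for name, value in row[tablename].as_dict().items():
                if name in db[tablename].fields:  # skip non-field attributes
                    form.vars[name] = value
    return dict(form=form)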

MongoPasswordField setPassword + save

I've started working a little bit with Lift + Scala + MongoRecord, but I found a small annoyance:
Usually, to easily create a record (document), I just do:
User.createRecord.loginName("user").firstName("Name").lastName("LastName").save
But when I use the MongoPasswordField it is impossible to do it in just one line:
val userRecord = User.createRecord.loginName("user").firstName("Name").lastName("LastName")
userRecord.password.setPassword("SomePassword")
userRecord.save
Source code for the field is at http://scala-tools.org/mvnsites/liftweb-2.2/framework/scaladocs/lift-persistence/lift-mongodb-record/src/main/scala/net/liftweb/mongodb/record/field/MongoPasswordField.scala.html
Is there any way of doing this in just one line?
Or at least, can the field code be modified in some way to actually allow doing this?
I think you could do this:
User.createRecord.loginName("user").firstName("Name").lastName("LastName").password(Password("Some password")).save