I am new to MongoDB and have been assigned a task to extract data from MongoDB, create a CSV file, and load it into an Oracle database. Below is the data in MongoDB:
{
"_id":"69ajdslsdfdksjfef9",
"col1":"456780",
"refNum":"ref001",
"clients":{
"CLI_1": "9876547390",
"CLI_2": "fsdfasl"
},
"names":[
{
"first":"dfsakfj",
"middle":"hgfgas",
"last":"komdssdfsd"
},
{
"first":"dfskdajf",
"middle": "fgjfgjfl",
"last": "ghfghsdklfg"
}]
}
Second document from the collection:
{
"_id":"69ajdslsdfdksjfef9",
"col1":"456780",
"refNum":"ref001",
"clients":{
"CLI_1": "9876547390",
"CLI_2": "fsdfasl"
},
"names":[
{
"first":"dfsakfj",
"middle":"hgfgas",
"last":"komdssdfsd"
}]
}
I am using pymongo to query the collection and build a DataFrame before generating the CSV file. I can create the DataFrame, but I cannot parse "names" properly because the number of entries in the array differs between documents. Could anyone please share how to parse it and create a CSV with the fields _id, first, middle, last?
Thank You
MongoDB has an aggregation pipeline stage named $unwind that flattens the names array into one document per element, which makes it suitable for CSV output, as follows:
db.collection.aggregate([
{
$unwind: "$names"
},
{
$project: {
_id: 1,
first: "$names.first",
middle: "$names.middle",
last: "$names.last"
}
}
])// _id, first, middle, last
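If the flattening has to happen on the client instead (for example, after the documents have already been fetched), the same unwind-and-project step can be sketched in plain JavaScript. This is only a sketch under assumptions: unwindNames and toCsv are hypothetical helper names, not MongoDB or driver APIs; the sample documents mirror the question's data but use made-up _id values; and the CSV writer does naive quoting only.

```javascript
// Hypothetical helpers for illustration; not part of MongoDB or any driver.

// Emit one flat row per element of `names`, in the spirit of $unwind + $project.
// In this sketch, documents without a `names` array simply produce no rows.
function unwindNames(docs) {
  return docs.flatMap((doc) =>
    (Array.isArray(doc.names) ? doc.names : []).map((n) => ({
      _id: doc._id,
      first: n.first,
      middle: n.middle,
      last: n.last,
    }))
  );
}

// Naive CSV writer: quotes every value and doubles embedded quotes.
function toCsv(rows, fields) {
  const quote = (v) =>
    '"' + (v === undefined || v === null ? "" : String(v)).replace(/"/g, '""') + '"';
  const lines = rows.map((row) => fields.map((f) => quote(row[f])).join(","));
  return [fields.join(","), ...lines].join("\n");
}

// Sample data shaped like the question's documents (the _id values are made up).
const sampleDocs = [
  {
    _id: "a1",
    names: [
      { first: "dfsakfj", middle: "hgfgas", last: "komdssdfsd" },
      { first: "dfskdajf", middle: "fgjfgjfl", last: "ghfghsdklfg" },
    ],
  },
  { _id: "a2", names: [{ first: "dfsakfj", middle: "hgfgas", last: "komdssdfsd" }] },
];

const rows = unwindNames(sampleDocs);
const csv = toCsv(rows, ["_id", "first", "middle", "last"]);
console.log(csv);
```

Each element of names becomes its own row, so a document with two names contributes two CSV lines that share the same _id.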
I have a beginner MongoDB question, but I am really stuck on this.
I have a dataset of some medical data,
and I need to calculate the average and the standard deviation for each column and store them in a table like the one represented below.
So, I have made an aggregation
var average_age = db.ilp.aggregate(
{$match: {"age":{$ne:-1}}},
{"$group":{
"_id": null,
"avg_age": {"$avg":"$age"}
}
})
I tried to store it in a variable, and I get the following output for this variable:
{ "_id" : null, "avg_age" : 44.74614065180103 }
But when I want to insert this into a new collection, by db.test.insert(average_age), I get the following output:
After I make a desired collection, I want to redirect it to a csv file.
I appreciate any help you could provide.
P.S. Also, can you tell me why the variable average_age outputs the correct value shown above the first time I call it, but outputs nothing when I call it again, as if it were empty?
You need to add the $out stage as the last stage of the pipeline.
db.ilp.aggregate([
{$match: {"age":{$ne:-1}}},
{"$group":{
"_id": null,
"avg_age": {"$avg":"$age"}
}},
{$out: "collection-name"}
])
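When you assign the result of db.ilp.aggregate(...) or call db.ilp.find(...), MongoDB returns a cursor. A cursor can be iterated only once, which is why the variable prints the result the first time and nothing afterwards.
cursor.toArray() returns an array that contains all the documents from a cursor:
var average_age = db.ilp.aggregate(...).toArray();
Note: Do not use the $out stage if you want the results returned to the shell; with $out they are written to the target collection instead.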
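The "empty on the second call" behaviour from the P.S. comes down to a cursor being a one-shot iterator. The sketch below imitates that with a plain JavaScript generator. fakeCursor is a hypothetical stand-in, not the shell's cursor class, but it illustrates why copying the results into an array once (which is what toArray() does) makes them reusable.

```javascript
// Hypothetical stand-in for a cursor: a generator can be consumed only once.
function* fakeCursor(docs) {
  for (const doc of docs) yield doc;
}

const aggregateResult = [{ _id: null, avg_age: 44.74614065180103 }];

const cursor = fakeCursor(aggregateResult);
const firstPass = Array.from(cursor);  // consumes the iterator
const secondPass = Array.from(cursor); // nothing left to read

console.log(firstPass.length, secondPass.length);

// Materialising once into an array keeps the documents available for reuse.
const results = Array.from(fakeCursor(aggregateResult));
console.log(results[0].avg_age);
console.log(results[0].avg_age); // can be read again
```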
Aside from Valijon's excellent answer, you also have the option of creating a view and using mongoexport to save it as a CSV file directly. This has the advantage that you can export the view many times, and the averages are recomputed every time you export it:
db.createView('ilp_averages', 'ilp', [
{ $match: { "age": { $ne:-1 } } },
{"$group": {
"_id": null,
"avg_age": {"$avg":"$age"}
}}
])
And export with:
mongoexport --host <mongodbinstance>:<port> --type csv --fields avg_age -d=<yourdb> -c=ilp_averages -o=ilp_averages.csv
Note that CSV export needs an explicit field list (--fields), and that the correct syntax for mongoexport depends on the version of MongoDB that you are using; there could be slight variations.
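The question also asks for standard deviations, which the pipelines above do not compute. The arithmetic behind an average and a population standard deviation, with the -1 sentinel filtered out as in the $match stage, can be sketched as follows. columnStats and the sample ages are made up for illustration. Inside an aggregation pipeline, the $stdDevPop accumulator (available since MongoDB 3.2) plays the role of the manual calculation here.

```javascript
// Hypothetical helper: average and population standard deviation of one field,
// skipping non-numeric values and the -1 "missing" sentinel.
function columnStats(docs, field) {
  const values = docs
    .map((d) => d[field])
    .filter((v) => typeof v === "number" && v !== -1);
  if (values.length === 0) return { avg: null, stdDevPop: null };

  const avg = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - avg) ** 2, 0) / values.length;
  return { avg, stdDevPop: Math.sqrt(variance) };
}

// Made-up sample; the -1 entry is ignored.
const sample = [{ age: 40 }, { age: 50 }, { age: -1 }, { age: 60 }];
const stats = columnStats(sample, "age");

// One CSV line per column: name, average, standard deviation.
console.log(`age,${stats.avg},${stats.stdDevPop}`);
```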
I am using Mongo 3.2.14
I have a mongo collection that looks like this:
{
'_id':...
'field1':...
'field2':...
'field3':...
etc...
}
I want to aggregate this way:
db.collection.aggregate([
{ '$match':{} },
{ '$project':{
'field1':1,
'field2':1,
'field3':1,
etc...(all fields)
}}
])
Is there a way to include all fields in the $project stage without listing each field one by one? (I have around 30 fields, and the number is growing.)
I have found info on this here:
MongoDB $project: Retain previous pipeline fields
Include all existing fields and add new fields to document
how to not write every field one by one in project
However, I'm using mongo 3.2.14 and I don't need to create a new field, so, I think I cannot use $addFields. But, if I could, can someone show me how to use it?
Basically, if you want all attributes of your documents to be passed to the next stage, you can skip the $project stage. But if you want all the attributes except the "_id" value, then you can pass
{ $project: { _id: 0 } }
which will return every value except the _id.
And if by any chance you have embedded lists or nested arrays that you want to flatten, you can use the $unwind stage.
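To make the effect of the exclusion projection concrete: { $project: { _id: 0 } } keeps every field except _id. A plain JavaScript analogue, using a hypothetical withoutId helper and made-up field values, might look like this:

```javascript
// Hypothetical helper: copy a document, leaving out `_id`.
// The original document is not modified.
function withoutId(doc) {
  const { _id, ...rest } = doc;
  return rest;
}

// Made-up document with the field names from the question.
const original = { _id: 1, field1: "a", field2: "b", field3: "c" };
const projected = withoutId(original);

console.log(projected);
```

No field list is needed, so the result keeps up automatically as new fields are added to the documents.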
You can use $replaceRoot (available from MongoDB 3.4):
db.collection.aggregate([
{ "$match": {} },
{ "$project": { "document": "$$ROOT" } },
{ "$replaceRoot": { "newRoot": "$document" } }
])
This way you get the exact document back with all of its fields, and you don't need to add each field one by one in the $project stage. Try using $replaceRoot at the end of the pipeline.
I have an abccollection collection whose documents look like:
{
id: 123,
focusPoint: 89652.33
}
I want to retrieve all focusPoint values so that output looks like:
[89652.33, 89999.223, 99666.45, ...]
If you want to get a list without duplicates:
db.abccollection.distinct("focusPoint")
Otherwise, keeping duplicates:
db.abccollection.find({}, { _id: 0, focusPoint: 1 }).map((doc) => doc.focusPoint)
With that you will retrieve all the documents in the abccollection collection and project just the focusPoint field. The raw output of that query (before the map) will be an array of documents with a single field:
[{ focusPoint: 89652.33 }, { focusPoint: 89999.223 }, ...]
If you don't want to do the .map in the shell, you can also do it in your application.
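Doing the extraction in the application could look like the sketch below. It assumes the projected documents have already been fetched into an array; the sample values come from the question, with one duplicate added for illustration. The Set-based step is a client-side analogue of distinct, not the same operation.

```javascript
// Documents as they would come back from the projection { _id: 0, focusPoint: 1 }.
// The repeated value is made up to show the difference between the two results.
const fetched = [
  { focusPoint: 89652.33 },
  { focusPoint: 89999.223 },
  { focusPoint: 89652.33 },
];

// Keep duplicates: one value per document, in document order.
const withDuplicates = fetched.map((doc) => doc.focusPoint);

// Drop duplicates on the client: a Set keeps the first occurrence of each value.
const withoutDuplicates = [...new Set(withDuplicates)];

console.log(withDuplicates);
console.log(withoutDuplicates);
```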
I have a MongoDB collection with the structure below.
"randomstring" means the key really is a random string; it is different in each document of the collection.
{
"notrandom":{
"randomstring":{
"randomstring":{
"randomstring":{
"notrandom2":"data"
}
}
}
}
}
How can I project this data out?
Something like:
db.mydb.aggregate( "notrandom[0][0].notrandom2":1}} , ] )
What I'm trying to achieve is a collection of all the notrandom2 values.
If you want to move notrandom2 to a higher level in the document structure,
you can use a $project stage like this (with each randomstring replaced by the actual key name):
{
$project:{
_id:1,
notrandom2:"$notrandom.randomstring.randomstring.randomstring.notrandom2"
// list all other fields with field:1 if you want them to appear further down the pipeline
}}
If this field is part of an array, you need to $unwind first and then $project.
You can use the following query:
db.mydb.find({<findquery (if you have any)>},{"notrandom.randomstring.randomstring.randomstring.notrandom2" : 1}).
toArray(function(err, result)
{
console.log(result); //Array of objects with `notrandom2` values
})
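Both answers assume the intermediate key names are known. If they really do differ per document, one hedged alternative is to fetch the documents and walk them in application code. collectValues below is a hypothetical helper, and the random-looking keys in the sample are invented for illustration.

```javascript
// Hypothetical helper: depth-first search for every value stored under `key`,
// whatever the names of the intermediate keys are.
function collectValues(node, key, out = []) {
  if (node === null || typeof node !== "object") return out;
  for (const [k, v] of Object.entries(node)) {
    if (k === key) out.push(v);
    else collectValues(v, key, out);
  }
  return out;
}

// Sample documents with invented random keys between notrandom and notrandom2.
const nestedDocs = [
  { notrandom: { x7f: { q2: { zz9: { notrandom2: "data" } } } } },
  { notrandom: { a1: { b2: { c3: { notrandom2: "more data" } } } } },
];

const values = nestedDocs.flatMap((doc) =>
  collectValues(doc.notrandom, "notrandom2")
);
console.log(values);
```

The trade-off is that every document has to be transferred to the client before filtering, so this fits small collections better than large ones.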
Suppose we have a collection of MongoDB documents with the following structure:
{
id_str: "some_value",
text: "some_text",
some_field: "some_other_value"
}
I would like to filter such documents so as to obtain the ones with distinct text values.
I learned from the MongoDB documentation how to extract unique field values from a collection, using the distinct operation. Thus, by performing the following query:
db.myCollection.distinct("text")
I would obtain an array containing the distinct text values:
["first_distinct_text", "second_distinct_text",...]
However, this is not the result that I would like to obtain. Instead, I would like to have the following:
{ "id_str": "a_sample_of_id_having_first_distinct_text",
"text": "first_distinct_text"}
{ "id_str": "a_sample_of_id_having_second_distinct_text",
"text": "second_distinct_text"}
I am not sure if this can be done with a single query.
I found a similar question which, however, does not fully solve my problem.
Do you have any hint on how to solve this problem?
Thanks.
You should look into making an aggregate query using the $group stage, and probably using the $first operator.
Maybe something along the lines of:
db.myCollection.aggregate([
{ $group: { _id: "$text", id_str: { $first: "$id_str" } } },
{ $project: { _id: 0, id_str: 1, text: "$_id" } }
])
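What grouping with $first does can be pictured with a small client-side sketch: keep the first document seen for each distinct text value. firstPerText is a hypothetical helper and the sample data is made up; which document counts as "first" depends on the order in which the documents arrive.

```javascript
// Hypothetical helper: keep the first document seen for each distinct `text`,
// similar in spirit to $group with the $first accumulator.
function firstPerText(docs) {
  const seen = new Map();
  for (const doc of docs) {
    if (!seen.has(doc.text)) {
      seen.set(doc.text, { id_str: doc.id_str, text: doc.text });
    }
  }
  return [...seen.values()];
}

// Made-up sample: two documents share the same text.
const tweets = [
  { id_str: "1", text: "first_distinct_text", some_field: "x" },
  { id_str: "2", text: "second_distinct_text", some_field: "y" },
  { id_str: "3", text: "first_distinct_text", some_field: "z" },
];

const unique = firstPerText(tweets);
console.log(unique);
```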
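Try the following. Note that it groups by the (text, id_str) pair, so a text value that appears with several id_str values yields several results:
db.myCollection.aggregate([{$group: {_id: {'text': "$text", 'id_str': '$id_str'}}}])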
More information here: http://docs.mongodb.org/manual/reference/method/db.collection.aggregate/