mongoimport doesn't import the object properly - mongodb

In MongoDB v2.2, when I try to import one simple JSON document like this from my .json file into an empty collection, I get 13 objects imported. Here is what I'm doing.
This is the data (I've shortened the field names to protect it):
[
{
"date" : ISODate("2012-08-01T00:00:00Z"),
"start" : ISODate("2012-08-01T00:00:00Z"),
"xxx" : 1,
"yyt" : 5,
"p" : 6,
"aam" : 20,
"dame" : "denon",
"33" : 10,
"xxt" : 8,
"col" : 3,
"rr" : [
{ "name" : "Plugin 1", "count" : 1 },
{ "name" : "Plugin 2", "count" : 1 },
{ "name" : "Plugin 3", "count" : 1 }
],
"xkx" : { "y" : 0, "n" : 1 },
"r" : { "y" : 0, "n" : 1 },
"po" : { "y" : 0, "n" : 1 },
"pge" : { "posts" : 0, "pages" : 1 },
"pol" : { "y" : 0, "n" : 1 },
"lic" : { "y" : 0, "n" : 1 },
"count" : 30,
"tx" : [
{ "zone" : -7, "count" : 1 }
],
"yp" : "daily",
"ons" : [
{ "version" : "9.6.8", "count" : 1 }
],
"ions" : [
{ "version" : "10.0.3", "count" : 1 }
]
}
]
With this command:
mongoimport --db development_report --collection xxx --username xxx --password xxx --file /Users/Alex/Desktop/daily2.json --type json --jsonArray --stopOnError --journal
I get this weird response:
Mon Sep 3 12:09:12 imported 13 objects
and these 13 new documents end up in the collection instead of one:
{ "_id" : ObjectId("5044114815e24c08bcdc988e") }
{ "_id" : ObjectId("5044114815e24c08bcdc988f"), "name" : "Plugin 1", "count" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9890"), "name" : "Plugin 2", "count" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9891"), "name" : "Plugin 3", "count" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9892"), "y" : 0, "n" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9893"), "y" : 0, "n" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9894"), "y" : 0, "n" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9895"), "posts" : 0, "pages" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9896"), "y" : 0, "n" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9897"), "y" : 0, "n" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9898"), "zone" : -7, "count" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc9899"), "version" : "9.6.8", "count" : 1 }
{ "_id" : ObjectId("5044114815e24c08bcdc989a"), "version" : "10.0.3", "count" : 1 }
What am I doing wrong?

The problem you are having is with the two ISODate fields at the start of your document.
JSON does not have a "date" type, so mongoimport does not handle the ISODate fields in your document. You would need to convert these to extended JSON like so:
[
{
"date" : { "$date" : 1343779200000 },
"start" : { "$date" : 1343779200000 },
...
And your import will work.
The reason this comes about is that MongoDB supports more types than are available in the JSON spec. You can see more information in the documentation. There is also an open ticket to make mongoimport handle all the formats MongoDB does.
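As a sanity check on the conversion, the $date value is just the Unix epoch time in milliseconds. A minimal sketch in plain Python (not part of mongoimport; the helper name is my own) that produces the extended-JSON form:

```python
from datetime import datetime

def to_extended_json_date(iso_string):
    """Convert an ISO-8601 UTC timestamp to the {"$date": ms} extended-JSON form."""
    dt = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    return {"$date": int(dt.timestamp() * 1000)}

print(to_extended_json_date("2012-08-01T00:00:00Z"))  # {'$date': 1343779200000}
```

This reproduces the 1343779200000 value shown above for 2012-08-01T00:00:00Z.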

This is really frustrating; I couldn't get anywhere fast with the import tool, so I used the load() function within the mongo shell to load a script which inserted my records.
> load('/Users/Alex/Desktop/daily.json');
I obviously had to modify the JSON file to include insert commands like so:
>db.mycollection.insert(
{ DOCUMENT 1 },
...
{ DOCUMENT N }
);

This is really late, but in case it can help anyone else: you should not be passing a JSON array. Simply list one JSON document per line, and each line will create a separate document. The file below would insert two documents:
{ "date" : { "$date": 1354320000000 }, "xxx" : 1, "yyt" : 5, ... }
{ "date" : { "$date": 1354320000000 }, "xxx" : 2, "yyt" : 6, ... }
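If your data already lives in a JSON array, converting it to this one-document-per-line format is a one-liner. A small sketch (plain Python; the function name is my own):

```python
import json

def array_to_json_lines(docs):
    """Render a list of documents as newline-delimited JSON, one document per line,
    which mongoimport accepts without the --jsonArray flag."""
    return "\n".join(json.dumps(doc) for doc in docs)

docs = [{"xxx": 1, "yyt": 5}, {"xxx": 2, "yyt": 6}]
print(array_to_json_lines(docs))
```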

Related

MongoDB - how to optimise find query with regex search, with sort

I need to execute the following query:
db.S12_RU.find({"venue.raw":a,"title":/b|c|d|e/}).sort({"year":-1}).skip(X).limit(Y);
where X and Y are numbers.
The number of documents in my collection is:
208915369
Currently, this sort of query takes about 6 minutes to execute.
I have the following indexes:
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_"
},
{
"v" : 2,
"key" : {
"venue.raw" : 1
},
"name" : "venue.raw_1"
},
{
"v" : 2,
"key" : {
"venue.raw" : 1,
"title" : 1,
"year" : -1
},
"name" : "venue.raw_1_title_1_year_-1"
}
]
A standard document looks like this:
{ "_id" : ObjectId("5fc25fc091e3146fb10484af"), "id" : "1967181478", "title" : "Quality of Life of Swedish Women with Fibromyalgia Syndrome, Rheumatoid Arthritis or Systemic Lupus Erythematosus", "authors" : [ { "name" : "Carol S. Burckhardt", "id" : "2052326732" }, { "name" : "Birgitha Archenholtz", "id" : "2800742121" }, { "name" : "Kaisa Mannerkorpi", "id" : "240289002" }, { "name" : "Anders Bjelle", "id" : "2419758571" } ], "venue" : { "raw" : "Journal of Musculoskeletal Pain", "id" : "49327845" }, "year" : 1993, "n_citation" : 31, "page_start" : "199", "page_end" : "207", "doc_type" : "Journal", "publisher" : "Taylor & Francis", "volume" : "1", "issue" : "", "doi" : "10.1300/J094v01n03_20" }
Is there any way to make this query execute in a few seconds?

MongoDB dataset : pairs not reducing or problem with script

I'm new to programming and MongoDB and learning as I go. I'm attempting a map-reduce on a dataset using MongoDB. So far I've converted the CSV to JSON and imported it into MongoDB using Compass.
In Compass the data now looks like this:
_id :5bc4e11789f799178470be53
slug :"bitcoin"
symbol :"BTC"
name :"Bitcoin"
date :"2013-04-28"
ranknow :"1"
open :"135.3"
high :"135.98"
low :"132.1"
close :"134.21"
volume :"0"
market :"1500520000"
close_ratio :"0.5438"
spread :"3.88"
I've added each field as an index as follows. Is this the right process so I can run a map-reduce against the data?
db.testmyCrypto.getIndices()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"slug" : 1
},
"name" : "slug_1",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"symbol" : 2
},
"name" : "symbol_2",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"name" : 3
},
"name" : "name_3",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"data" : 4
},
"name" : "data_4",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"ranknow" : 4
},
"name" : "ranknow_4",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"ranknow" : 5
},
"name" : "ranknow_5",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"open" : 6
},
"name" : "open_6",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"high" : 7
},
"name" : "high_7",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"low" : 8
},
"name" : "low_8",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"volume" : 9
},
"name" : "volume_9",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"market" : 10
},
"name" : "market_10",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"close_ratio" : 11
},
"name" : "close_ratio_11",
"ns" : "myCrypto.testmyCrypto"
},
{
"v" : 2,
"key" : {
"spread" : 13
},
"name" : "spread_13",
"ns" : "myCrypto.testmyCrypto"
}
]
I've scrapped the above and am now doing the following, from the linked map-reduce documentation. Is this the correct output?
> db.testmyCrypto.mapReduce(function() { emit( this.slug, this.symbol ); }, function(key, values) { return Array.sum( values ) },
... {
... query: { date:"2013-04-28" },
... out: "Date 04-28"
... }
... )
{
"result" : "Date 04-28",
"timeMillis" : 837,
"counts" : {
"input" : 0,
"emit" : 0,
"reduce" : 0,
"output" : 0
},
"ok" : 1
}
I've added the "key value pairs" but I don't seem to be able to get anything from the data.
> db.testmyCrypto.mapReduce(function() { emit( this.slug, this.symbol, this.name, this.date, this.ranknow, this.open, this.high, this.low, this.close, this.volume, this.market, this.close_ratio, this.spread ); }, function(key, values) { return Array.sum( values ) }, { query: { slug:"bitcoin" }, out: "Date 04-28" } )
{
"result" : "Date 04-28",
"timeMillis" : 816,
"counts" : {
"input" : 0,
"emit" : 0,
"reduce" : 0,
"output" : 0
},
"ok" : 1 }
>
If you are trying to sum values, they need to be numeric (when importing data into Mongo, try to set the type for the values):
db.collectionName.mapReduce(
function() {
emit(
this.slug,
this.open
)
},
function(keySlug, valueOpen) {
return Array.sum(valueOpen)
},
{
query: { date:"2013-04-28" },
out: "Date 04-28"
}
)
This map-reduce will return the sum of the open values for each slug, filtered by date.
P.S. You can do the same thing with aggregation.
If you have any questions, let me know.
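To see why numeric types matter, here is a plain-Python sketch of the same grouping (sample values are hypothetical; note the explicit float() conversion standing in for setting numeric types at import time):

```python
def sum_open_by_slug(docs, date):
    """Group documents by slug and sum the 'open' field, filtered by date.
    float() converts the string-typed CSV values; without a numeric type
    the map-reduce above cannot sum them."""
    totals = {}
    for doc in docs:
        if doc["date"] == date:
            totals[doc["slug"]] = totals.get(doc["slug"], 0.0) + float(doc["open"])
    return totals

docs = [
    {"slug": "bitcoin", "date": "2013-04-28", "open": "135.3"},
    {"slug": "bitcoin", "date": "2013-04-28", "open": "134.0"},
    {"slug": "litecoin", "date": "2013-04-28", "open": "4.3"},
]
print(sum_open_by_slug(docs, "2013-04-28"))
```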

MongoDB query to get CPU usage

Using MongoDB, I know that I can use the command
db.serverStatus()
which will return a lot of information about the current mongo instance, including memory information:
"mem" : {
"bits" : 64,
"resident" : 4303,
"virtual" : 7390,
...
}
Is there anything similar, or anything in this output that I may be missing, that will also report CPU usage details?
i.e.
"cpu" : {
"usr" : 32,
"wa" : 16,
"id" : 52
}
You could try the top command and check whether its output gives you the necessary information. Switch to the admin database and issue:
db.runCommand( { top: 1 } )
{
"totals" : {
"note" : "all times in microseconds",
"Orders.orders" : {
"total" : {
"time" : 107211,
"count" : 56406
},
"readLock" : {
"time" : 107205,
"count" : 56405
},
"writeLock" : {
"time" : 6,
"count" : 1
},
"queries" : {
"time" : 105,
"count" : 1
},
"getmore" : {
"time" : 0,
"count" : 0
},
"insert" : {
"time" : 0,
"count" : 0
},
"update" : {
"time" : 0,
"count" : 0
},
"remove" : {
"time" : 0,
"count" : 0
},
"commands" : {
"time" : 0,
"count" : 0
}
}, ... (rest clipped, as it gives per-database stats)

Nested conditional MongoDB query

I'm having a hard time trying to run some nested queries with a conditional statement on an item inside an array.
This is what my documents look like.
I would like to get a summary such as sum, average, and alarmedCount (counting every time Channels.AlarmStatus == "alarmed") for each channel, grouped by Channels.ChannelId. I got sum and average to work but can't get the right query for alarmedCount.
{
"_id" : "55df8e4cd8afa4ccer1915ee"
"location" : "1",
"Channels" : [{
"_id" : "55df8e4cdsafa4cc0d1915r1",
"ChannelId" : 1,
"Value" : 14,
"AlarmStatus" : "normal"
},
{
"_id" : "55df8e4cdsafa4cc0d1915r9",
"ChannelId" : 2,
"Value" : 20,
"AlarmStatus" : "alarmed"
},
{
"_id" : "55df8e4cdsafa4cc0d1915re",
"ChannelId" : 3,
"Value" : 10,
"AlarmStatus" : "alarmed"},
]
}
{
"_id" : "55df8e4cd8afa4ccer1915e0"
"location" : "1",
"Channels" : [{
"_id" : "55df8e4cdsafa4cc0d19159",
"ChannelId" : 1,
"Value" : 50,
"AlarmStatus" : "normal"
},
{
"_id" : "55df8e4cdsafa4cc0d1915re",
"ChannelId" : 2,
"Value" : 16,
"AlarmStatus" : "normal"
},
{
"_id" : "55df8e4cdsafa4cc0d1915g7",
"ChannelId" : 3,
"Value" : 9,
"AlarmStatus" : "alarmed"},
]
}
I got it to group them and show some calculations using this aggregation:
db.records.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$Channels.ChannelId",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
the result looks like this:
{
"result" : [
{
"_id" : 1,
"documentSum" : 64,
"documentAvg" : 32
},
{
"_id" : 2,
"documentSum" : 36,
"documentAvg" : 18
},
{
"_id" : 3,
"documentSum" : 19,
"documentAvg" : 9.5
},
],
"ok" : 1.0000000000000000
}
I would like to get this type of result
{
"result" : [
{
"_id" : 1,
"documentSum" : 64,
"documentAvg" : 32,
"AlarmedCount" : 0
},
{
"_id" : 2,
"documentSum" : 36,
"documentAvg" : 18,
"AlarmedCount" : 1
},
{
"_id" : 3,
"documentSum" : 19,
"documentAvg" : 9.5,
"AlarmedCount" : 2
}
],
"ok" : 1.0000000000000000
}
Use a $project step between your $unwind and $group steps to convert the field AlarmStatus to 1 or 0 depending on its value (note the field names in your documents are AlarmStatus and ChannelId):
$project: {
"Channels.ChannelId" : 1,
"Channels.Value" : 1,
"Channels.AlarmCount" : { $cond: {
if: { $eq: ["$Channels.AlarmStatus", "alarmed"] },
then: 1,
else: 0 }
}
}
Then sum the newly created field to get the aggregated count:
$group : {
"_id" : "$Channels.ChannelId",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" },
"AlarmCount" : { "$sum" : "$Channels.AlarmCount" }
}
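As a cross-check of what the $unwind/$project/$group pipeline should produce, here is a plain-Python sketch of the same per-channel summary, run against the two sample documents from the question (trimmed to the relevant fields):

```python
def summarize_channels(records):
    """Mimic the pipeline: per ChannelId, sum and average Value and count
    channels whose AlarmStatus is 'alarmed'."""
    stats = {}
    for rec in records:
        for ch in rec["Channels"]:  # the $unwind step
            s = stats.setdefault(ch["ChannelId"], {"sum": 0, "n": 0, "alarmed": 0})
            s["sum"] += ch["Value"]
            s["alarmed"] += 1 if ch["AlarmStatus"] == "alarmed" else 0  # the $cond
            s["n"] += 1
    return {cid: {"documentSum": s["sum"],
                  "documentAvg": s["sum"] / s["n"],
                  "AlarmedCount": s["alarmed"]}
            for cid, s in stats.items()}

records = [
    {"Channels": [{"ChannelId": 1, "Value": 14, "AlarmStatus": "normal"},
                  {"ChannelId": 2, "Value": 20, "AlarmStatus": "alarmed"},
                  {"ChannelId": 3, "Value": 10, "AlarmStatus": "alarmed"}]},
    {"Channels": [{"ChannelId": 1, "Value": 50, "AlarmStatus": "normal"},
                  {"ChannelId": 2, "Value": 16, "AlarmStatus": "normal"},
                  {"ChannelId": 3, "Value": 9, "AlarmStatus": "alarmed"}]},
]
print(summarize_channels(records))
```

This reproduces the desired result from the question: sums 64/36/19, averages 32/18/9.5, and AlarmedCount 0/1/2 for channels 1, 2, and 3.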

MongoDB show not all elements in subdocument

I have the document of the following structure:
{
"_id" : ObjectId("50b8f881065f90c025000014"),
"education" : {
"schoolCountry" : 4,
"schoolTown" : -1,
"uniCountry" : 4,
"uniTown" : -1
},
"info" : {
"ava" : "auto.jpg",
"birthday" : ISODate("1942-04-01T21:00:00Z"),
"email" : "mail@gmail.com",
"name" : "name",
"sex" : 1,
"surname" : "surname"
}
}
I am trying to output only the surname and name.
The only thing I was able to achieve is this:
db.COLL.find({ }, {
"_id" : 0,
"education" : 0,
"info" : 1
})
My idea of showing only the elements I need from the subdocument failed:
db.COLL.find({ }, {
"_id" : 0,
"education" : 0,
"info.surname" : 1,
"info.name" : 1,
})
But hiding (info.email : 0) works. Is it possible to achieve my goal without hiding all unneeded fields?
You can't mix including and excluding fields, aside from turning off _id (which is included by default).
So just request the info.surname and info.name fields:
db.coll.find({ }, {
"_id" : 0,
"info.surname" : 1,
"info.name" : 1,
})
Sample output:
{ "info" : { "name" : "name", "surname" : "surname" } }