Entity insertions into Orion increasingly slower - fiware-orion

When I have a lot of information in Orion Context Broker (stored in MongoDB), inserting more information becomes increasingly slower.
E.g.: at this moment I have roughly 3GB of information in Orion, and sending 50MB more takes about 15 minutes; when Orion was empty, the same load finished in about 1 minute.
admin 0.000GB
config 0.000GB
local 0.000GB
orion 2.932GB
Is this normal? I mean, is it expected that insertions become increasingly slower?
Extra info: Linux VPS with 2 cores and 8GB RAM.
Index information:
> use orion
switched to db orion
> show collections
entities
> db.entities.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "orion.entities"
    },
    {
        "v" : 2,
        "key" : {
            "location.coords" : "2dsphere"
        },
        "name" : "location.coords_2dsphere",
        "ns" : "orion.entities",
        "2dsphereIndexVersion" : 3
    },
    {
        "v" : 2,
        "key" : {
            "expDate" : 1
        },
        "name" : "expDate_1",
        "ns" : "orion.entities",
        "expireAfterSeconds" : 0
    }
]

To speed up Orion DB operations, you should create the performance-optimizing indexes recommended in the Orion performance tuning documentation. In particular:
{_id.servicePath: 1, _id.id: 1, _id.type: 1} (note that this is a compound index and key order matters in this case)
creDate
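For reference, a minimal mongo shell sketch of creating both recommended indexes on the entities collection:

// compound index: key order matters, servicePath first as recommended
db.entities.createIndex({ "_id.servicePath": 1, "_id.id": 1, "_id.type": 1 })
db.entities.createIndex({ creDate: 1 })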

Now it is fast... I used this (as you said): db.entities.createIndex({"_id.id": 1, "_id.type": 1, "_id.servicePath": 1})
Thank you!

Related

Poor performance when bulk deleting a large collection in MongoDB

I have a single standalone MongoDB installation on a Linux machine.
The database contains a collection with 181 million documents. This collection is by far the largest collection in the database (approx. 90%).
The size of the collection is currently 3.5 TB.
I'm running MongoDB version 4.0.10 (WiredTiger).
The collection has 2 indexes:
One on _id
One on 2 fields, used when deleting documents (see the snippet below)
When benchmarking bulk deletion on this collection, we used the following snippet:
db.getCollection('Image').deleteMany(
    { $and: [
        { "CameraId" : 1 },
        { "SequenceNumber" : { $lt: 153000000 } }
    ]}
)
To see the state of the deletion operation, I ran a simple test deleting 1000 documents while watching the operation using currentOp(). It shows the following:
"command" : {
"q" : {
"$and" : [
{
"CameraId" : 1.0
},
{
"SequenceNumber" : {
"$lt" : 153040000.0
}
}
]
},
"limit" : 0
},
"planSummary" : "IXSCAN { CameraId: 1, SequenceNumber: 1 }",
"numYields" : 876,
"locks" : {
"Global" : "w",
"Database" : "w",
"Collection" : "w"
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(877),
"w" : NumberLong(877)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(877)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(877)
}
}
}
It seems to be using the correct index, but the number and type of locks worry me. As I interpret this, it acquires 1 global lock for each deleted document from a single collection.
When using this approach it has taken over a week to delete 40 million documents. This cannot be the expected performance.
I realise other designs exist, such as bundling documents into larger chunks and storing them using GridFS, but the current design is what it is, and I want to make sure that what I see is expected before changing my design, restructuring the data, or even considering clustering, etc.
Any suggestions on how to increase performance of bulk deletions, or is this expected?
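One commonly used mitigation (sketched here as an illustration, not taken from this thread) is to delete in bounded batches, so each operation stays short and lock acquisition is spread out over time. A rough mongo shell sketch, with the batch size as an assumed tunable:

var BATCH = 50000;  // assumed batch size; tune for your hardware
var total = 0;
while (true) {
    // fetch only the _ids of the next batch, using the CameraId/SequenceNumber index
    var ids = db.getCollection('Image')
        .find({ CameraId: 1, SequenceNumber: { $lt: 153000000 } }, { _id: 1 })
        .limit(BATCH)
        .toArray()
        .map(function (d) { return d._id; });
    if (ids.length === 0) break;
    total += db.getCollection('Image').deleteMany({ _id: { $in: ids } }).deletedCount;
    print("deleted so far: " + total);
}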

MongoDB single-node performance

I use MongoDB for an internal ADMIN-type application used by my team.
Mongo is installed on 1 box with no replica sets.
The ADMIN application inserts 70K to 100K documents per day, and we maintain 4 months of data. The DB has ~100 million documents at any given time.
When the application was deployed, all ran fine for a few days. As the data accumulated toward the 4-month limit, I started seeing severe performance issues with MongoDB.
I installed MongoDB 3.0.4 as-is on a Linux box and did not fine-tune any optimization settings.
Are there any optimization settings I need to adjust?
The ADMIN application has schedulers which run every half hour to insert and purge outdated data. Given the collection below, with indexes defined on createdDate, env, messageId, and sourceSystem, I see some queries taking 30 minutes to respond.
Sample query: count of documents with a given env and sourceSystem, within a given range of dates. The ADMIN app uses Grails, and the above query is created using GORM. It used to work fine in the beginning, but performance degraded over time. I tried restarting the application as well; it didn't help. I believe using MongoDB as-is (like a dev mode) might be causing the performance issues. Any suggestions on what to tweak in the settings (perhaps CPU/memory limits, etc.)?
{
    "_id" : ObjectId("5575e388e4b001976b5e570f"),
    "createdDate" : ISODate("2015-06-07T05:00:34.040Z"),
    "env" : "prod",
    "messageId" : "f684b34d-a480-42a0-a7b8-69d6d18f39e5",
    "payload" : "JSON or XML DATA",
    "sourceSystem" : "sourceModule"
}
Update:
Indices:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "messageId" : 1
        },
        "name" : "messageId_1",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "createdDate" : 1
        },
        "name" : "createdDate_1",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "sourceSystem" : 1
        },
        "name" : "sourceSystem_1",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "env" : 1
        },
        "name" : "env_1",
        "ns" : "admin.Message"
    }
]
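Worth noting: the indexes above are all single-field, so the sample query (env + sourceSystem equality plus a createdDate range) cannot be served by a single index scan. A hedged sketch of a compound index matching that query shape, with equality fields first and the range field last; verify the field order against the actual queries before creating it:

// hypothetical compound index, not in the original post
db.Message.createIndex({ env: 1, sourceSystem: 1, createdDate: 1 })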

What is the best MongoDB sharding key for my schema?

I am designing a MongoDB collection which can save statistics for daily volume.
Here is my DB schema:
mongos> db.arq.findOne()
{
    "_id" : ObjectId("553b78637e6962c36d67c728"),
    "ip" : NumberLong(635860665),
    "ts" : ISODate("2015-04-25T00:00:00Z"),
    "values" : {
        "07" : 2,
        "12" : 1
    },
    "daily_ct" : 5
}
mongos>
And here are my indexes:
mongos> db.arq.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "Query_Volume.test"
    },
    {
        "v" : 1,
        "key" : {
            "ip" : 1
        },
        "name" : "ip_1",
        "ns" : "Query_Volume.test"
    },
    {
        "v" : 1,
        "key" : {
            "ts" : 1
        },
        "name" : "ts_1",
        "expireAfterSeconds" : 15552000,
        "ns" : "Query_Volume.test"
    }
]
mongos>
Note: I have a timestamp index since I need to use the TTL mechanism.
But do you have any suggestions for the shard key?
You have multiple options:
{ts: 1} Your timestamp. The data of certain ranges will be located together, but the key is monotonically increasing, and I'm not sure whether the TTL index will clean up shard chunks. This means the write load switches from shard to shard: you have one shard with a high write load whereas the other shards get no writes for that data. This pattern works nicely if you query contiguous time ranges but has downsides in writing.
{ts: "hashed"} Hash-based sharding. The data will be sharded more or less evenly across the shards. Hash-based sharding distributes the write load but involves all shards (more or less) when querying for data.
You will need to test what fits best for your reads and writes. The sharding key depends on the data structure and the read/write patterns of your application.
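For illustration, the two options as mongos shell commands; choose one. The namespace Query_Volume.test is taken from the index output above:

sh.enableSharding("Query_Volume")  // if not already enabled for the database
// Option 1: range sharding on the timestamp (monotonically increasing key)
sh.shardCollection("Query_Volume.test", { ts: 1 })
// Option 2: hashed sharding, spreading writes evenly across shards
sh.shardCollection("Query_Volume.test", { ts: "hashed" })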

My mongo query takes more than 5000 milliseconds to run

I have MySQL as my primary DB and MongoDB as a secondary database.
I ran a query on production and it runs for more than 5 seconds.
Here is the explain output for my query:
{
    "cursor" : "BtreeCursor host_1_type_1 multi",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 313566,
    "nscannedObjectsAllPlans" : 313553,
    "nscannedAllPlans" : 627118,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 14,
    "nChunkSkips" : 0,
    "millis" : 6555,
    "indexBounds" : {
        "host" : [
            [
                "",
                { }
            ],
            [
                /shannisideup/i,
                /shannisideup/i
            ]
        ],
        "type" : [
            [
                "ambassador-profile",
                "ambassador-profile"
            ]
        ]
    },
    "server" : "mongoserver:27017"
}
I've added these indexes:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "db.visitor",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "host" : 1,
            "type" : 1
        },
        "ns" : "db.visitor",
        "name" : "host_1_type_1"
    }
]
But I still don't know why it is so slow when running this query:
db.visitor.find({ host:/shannisideup/i, type:"ambassador-profile" }).limit(1)
FYI, I run my apps and my MongoDB server on separate servers in the AWS cloud.
MongoDB runs on an EC2 m3.medium; I've tried raising the open-file limit as the MongoDB website suggested. My MongoDB data is on a separate 100GB disk mounted via /dev/sdf.
I run MongoDB version 2.4.5.
When my MongoDB runs, the CPU load is almost always 100%.
My MMS stats for opcounters are:
command: 19.11
query: 13.79
update: 0.03
delete: 0.00001
getmore: 0
insert: 0.03
My highest page-faults value is 0.02.
What can I do to optimize my MongoDB query to less than 1 second?
I'll walk through a few things that should help, although I want to make clear up front that a case-insensitive regex search (host: /shannisideup/i) cannot use an index, so there are limits to what you can do with this data model and search. You can see that it scans a large number of objects ("nscanned" : 313566) just to return 1 document.
Things to do:
Upgrade your instance with a faster CPU - the 100% CPU load during this case-insensitive search is a clear indicator that at times your database is CPU-bound. A faster CPU will help. More memory (which typically comes with Amazon EC2 instances when you move to a faster CPU) won't hurt either. Same with a faster disk (SSD).
Lowercase your host field before storing it in MongoDB - if you store all your hosts in lowercase and then lowercase your search string prior to searching, you can get rid of the case-insensitive flag on the regex. If you can combine that with a prefix operator in your regex, you will be able to use the index, which would help.
Consider using a dedicated text search product (like Solr or Elasticsearch) for the host query. MongoDB is limited in its text search capabilities, particularly with regard to wildcard searches. You may find that something like Elasticsearch or Solr provides better performance.
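As a sketch of the second suggestion: assuming the host values are stored lowercased, an anchored, case-sensitive prefix regex can use the host_1_type_1 index (the search string here is illustrative):

// anchored (^) and case-sensitive, so the index on host can bound the scan
db.visitor.find({ host: /^shannisideup/, type: "ambassador-profile" }).limit(1)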
A few links that you may find useful:
http://docs.mongodb.org/manual/reference/operator/query/regex/
http://selectnull.leongkui.me/2014/02/02/mongodb-elasticsearch-perfect-nosql-duo/

Insert operation became very slow for MongoDB

The client is pymongo.
The program has been running for one week. It was indeed very fast to insert data before: about 10 million records / 30 minutes.
But today I found that the insert operation has become very, very slow.
There are about 120 million records in the goods collection now.
> db.goods.count()
123535156
And the indexes for the goods collection are as follows:
db.goods.getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "shop.goods",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "item_id" : 1,
            "updated_at" : -1
        },
        "unique" : true,
        "ns" : "shop.goods",
        "name" : "item_id_1_updated_at_-1"
    },
    {
        "v" : 1,
        "key" : {
            "updated_at" : 1
        },
        "ns" : "shop.goods",
        "name" : "updated_at_1"
    },
    {
        "v" : 1,
        "key" : {
            "item_id" : 1
        },
        "ns" : "shop.goods",
        "name" : "item_id_1"
    }
]
And there is enough RAM and CPU.
Someone told me it is because there are too many records, but didn't tell me how to solve this problem. I was a bit disappointed with MongoDB.
There will be more data to store in the future (about 50 million new records per day). Is there any solution?
I met the same situation on another server (less data this time, about 40 million in total); the current insert speed is about 5 records per second.
> db.products.stats()
{
    "ns" : "c2c.products",
    "count" : 42389635,
    "size" : 554721283200,
    "avgObjSize" : 13086.248164203349,
    "storageSize" : 560415723712,
    "numExtents" : 283,
    "nindexes" : 3,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1.0000000000132128,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 4257185968,
    "indexSizes" : {
        "_id_" : 1375325840,
        "product_id_1" : 1687460992,
        "created_at_1" : 1194399136
    },
    "ok" : 1
}
I don't know if it is your problem, but keep in mind that MongoDB has to update every index on each insert. So if you have many indexes and many documents, performance can be lower than expected.
Maybe you can speed up insert operations using sharding. You don't mention it in your question, so I guess you are not using it.
Anyway, could you provide us with more information? You can use db.goods.stats(), db.serverStatus(), or any of these other methods to gather information about the performance of your database.
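For example, in the mongo shell; the dropIndex call below is an assumption worth verifying against your query patterns, since item_id is already the prefix of the unique item_id_1_updated_at_-1 index shown above:

db.goods.stats()       // collection size and per-index sizes
db.serverStatus()      // server-wide counters (memory, connections, etc.)
// Likely redundant: queries on item_id alone can use the compound index's
// item_id prefix, so dropping this index removes one index update per insert.
// Verify before dropping.
db.goods.dropIndex("item_id_1")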
Another possible problem is IO. Depending on your scenario, Mongo might be busy trying to grow or allocate storage files for the given namespace (i.e., DB) for the subsequent insert statements. If your test pattern has been add records / delete records / add records / delete records, you are likely reusing existing allocated space. If your app is now running longer than before, you might be in the situation I described.
Hope this sheds some light on your situation.
I had a very similar problem.
First you need to determine your bottleneck (CPU, memory, or disk IO). I used several Unix tools (such as top, iotop, etc.) to detect the bottleneck. In my case I found insertion speed was limited by IO, because mongod often sat at 99% IO usage. (Note: my original DB used the MMAPv1 storage engine.)
My workaround was to change the storage engine to WiredTiger (either by dumping your original DB with mongodump and restoring it with mongorestore into the WiredTiger format, or by starting a new mongod with the WiredTiger engine and then resyncing from other replica set members). My insertion speed went back to normal after doing that.
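A quick way to confirm which engine a mongod is actually running, before and after such a migration:

// prints e.g. { "name" : "wiredTiger", ... } once the switch has been made
db.serverStatus().storageEngine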
However, I am still not sure why mongod with MMAPv1 suddenly saturated IO after the data size reached a certain point.