MongoDB indexing

We have a MongoDB collection containing nearly 40 million records. The current size of the collection is 5 GB. The data stored in this collection contains the following fields:
_id: "MongoDB id"
userid: "user id" (int)
mobile: "user's mobile number" (int)
transaction: "transaction id" (int)
sms: "message sent to user's mobile" (text)
created_dt: "unix timestamp of the transaction"
Apart from the index on _id (created by default), we have defined separate indexes on the mobile and transaction fields.
However, the following query takes anywhere between 60 and 120 seconds to complete:
{
    mobile: <users mobile number>
}
I access MongoDB using RockMongo. MongoDB is hosted on a server with 16GB RAM. Nearly 8GB RAM on this server is free.
What is it that I am doing wrong here?
Update:
Output of explain:
{
    "cursor" : "BasicCursor",
    "nscanned" : 37145516,
    "nscannedObjects" : 37145516,
    "n" : 37145516,
    "millis" : 296040,
    "nYields" : 1343,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
    }
}
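A "cursor" : "BasicCursor" with empty indexBounds means the query is scanning all 37 million documents rather than using an index on mobile. One quick check is to force the index with hint() (a sketch; the mobile number is a placeholder, and if no index with that key spec exists, hint() fails with a "bad hint" error, which is itself diagnostic):
db.transaction_sms_details.find({ mobile: 9876543210 }).hint({ mobile: 1 }).explain()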
Output of mongostat at the time of the query:
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
13 2 0 0 0 1 0 168g 336g 6.86g 1 1 0 0|0 1|0 21k 1k 19 11:30:04
16 0 0 0 0 1 0 168g 336g 6.88g 0 0.1 0 0|0 1|0 21k 1k 19 11:30:05
14 0 0 0 0 1 0 168g 336g 6.86g 0 0 0 0|0 1|0 29k 1k 19 11:30:06
10 0 0 0 0 1 0 168g 336g 6.86g 0 0 0 0|0 1|0 19k 1k 19 11:30:07
16 0 0 0 0 1 0 168g 336g 6.88g 0 0.1 0 0|0 1|0 21k 1k 19 11:30:08
9 0 0 0 0 1 0 168g 336g 6.89g 0 0 0 0|0 1|0 13k 1k 19 11:30:09
19 0 0 0 0 1 0 168g 336g 6.89g 0 0 0 0|0 1|0 27k 1k 19 11:30:10
12 0 0 0 0 1 0 168g 336g 6.89g 1 1.2 0 0|0 1|0 24k 1k 19 11:30:11
17 0 0 0 0 1 0 168g 336g 6.89g 1 1.7 0 0|0 1|0 31k 1k 19 11:30:12
15 0 0 0 0 1 0 168g 336g 6.89g 0 0 0 0|0 1|0 19k 1k 19 11:30:13
Update 2:
Until recently, we also stored another collection of about 1.3 billion documents on the same MongoDB server. That collection has since been dropped, which may explain the mapped / vsize columns in the mongostat output above.
The server also stores 6 other collections that receive frequent inserts. The total storage size is currently about 35 GB.
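As an aside, dropping a collection does not shrink database files that were already allocated, which is why mapped / vsize can stay this large. If disk space and downtime allow, the space can be reclaimed with a repair (a sketch; this rewrites the entire database and blocks it while running):
db.repairDatabase()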
Update 3:
Indexes defined on the collection. Created using RockMongo.
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "mymongodb.transaction_sms_details",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "_transaction_mobile_" : 1
        },
        "ns" : "mymongodb.transaction_sms_details",
        "background" : 1,
        "name" : "mobile"
    },
    {
        "v" : 1,
        "key" : {
            "_transaction_transaction_" : 1
        },
        "ns" : "mymongodb.transaction_sms_details",
        "background" : 1,
        "name" : "transaction"
    }
]

The keys generated by RockMongo are apparently incorrect:
"_transaction_mobile_" : 1
"_transaction_transaction_" : 1
I don't know what's wrong with RockMongo, but I think this can fix the issue:
db.xxx.dropIndexes();
db.xxx.ensureIndex({mobile: 1});
db.xxx.ensureIndex({transaction: 1});
Note: this may take a VERY long time. Don't do it on a running production machine.
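If a maintenance window is hard to come by, the rebuild can at least run in the background (a sketch; xxx is the same placeholder collection name, and background builds take longer and produce slightly less compact indexes):
db.xxx.ensureIndex({mobile: 1}, {background: true});
db.xxx.ensureIndex({transaction: 1}, {background: true});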

Related

Show full system.string[] in CSV file

I am trying to get the WordWheelQuery key out of HKU into a CSV. This is the original output:
$reg.PSProvider.Description
MRUListEx : {3, 0, 0, 0...}
0 : {104, 0, 97, 0...}
1 : {97, 0, 99, 0...}
2 : {107, 0, 117, 0...}
3 : {97, 0, 112, 0...}
I want to get each property into its own row under the corresponding heading (property name). This is as far as I've gotten:
$reg = Get-ItemProperty -Path REGISTRY::HKEY_USERS\*\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\WordWheelQuery
foreach ($reg_properties in $reg.PSObject.Properties) { $reg_properties.Name; $reg_properties.Value }
#{Name=$reg.Name; Expression={$reg.Value}}
MRUListEx
3
0
0
0
2
0
0
0
1
0
0
0
0
0
0
0
255
255
255
255
0
104
0
97
0
108
0
32
0
108
0
101
0
111
0
110
0
97
0
114
0
100
0
0
0
1
97
0
99
0
97
0
100
0
101
0
109
0
105
0
99
0
0
0
2
107
0
117
0
112
0
100
0
102
0
46
0
110
0
101
0
116
0
95
0
104
0
97
0
108
0
45
0
108
0
101
0
111
0
110
0
97
0
114
0
100
0
45
0
103
0
117
0
105
0
116
0
97
0
114
0
45
0
109
0
101
0
116
0
104
0
111
0
100
0
45
0
98
0
111
0
111
0
107
0
45
0
49
0
46
0
112
0
100
0
102
0
0
0
3
97
0
112
0
112
0
100
0
97
0
116
0
97
0
0
0
PSPath
Microsoft.PowerShell.Core\Registry::HKEY_USERS\S-1-5-21-xxxxxxxx29-xxxxxxx50-54xxxxxxxxx9-1001\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\WordWheelQuery
PSParentPath
Microsoft.PowerShell.Core\Registry::HKEY_USERS\S-1-5-21-xxxxxxxx29-xxxxxxx50-54xxxxxxxxx9-1001\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER
PSChildName
WordWheelQuery
PSProvider
When I export to a CSV, I still get system.string[] instead of the values, and for the life of me I can't get around it.
EDIT: For reference, here is the length of each property
PS C:\> $reg.MRUListEx | Measure-Object
Count : 20
Average :
Sum :
Maximum :
Minimum :
Property :
PS C:\> $reg.0 | Measure-Object
Count : 24
Average :
Sum :
Maximum :
Minimum :
Property :
PS C:\> $reg.1 | Measure-Object
Count : 18
Average :
Sum :
Maximum :
Minimum :
Property :
PS C:\> $reg.2 | Measure-Object
Count : 94
Average :
Sum :
Maximum :
Minimum :
Property :
PS C:\> $reg.3 | Measure-Object
Count : 16
Average :
Sum :
Maximum :
Minimum :
Property :
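For what it's worth, the system.string[] text appears because Export-Csv calls ToString() on each array-valued property. A workaround is to flatten every property into a single string before exporting (a sketch, assuming the values arrive as byte arrays; WordWheelQuery entries are UTF-16LE strings, while MRUListEx is a DWORD list):
$reg = Get-ItemProperty -Path REGISTRY::HKEY_USERS\*\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\WordWheelQuery
$rows = foreach ($prop in $reg.PSObject.Properties) {
    if ($prop.Name -like 'PS*') { continue }   # skip PSPath, PSParentPath, etc.
    $bytes = [byte[]]$prop.Value
    $text = if ($prop.Name -eq 'MRUListEx') {
        $bytes -join ','                       # MRU order list: keep raw bytes
    } else {
        # decode the UTF-16LE search string and trim the trailing NUL
        [System.Text.Encoding]::Unicode.GetString($bytes).TrimEnd([char]0)
    }
    [pscustomobject]@{ Name = $prop.Name; Value = $text }
}
$rows | Export-Csv wordwheelquery.csv -NoTypeInformation
Each value then lands in its own row as readable text instead of System.String[].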

HALog - Connect and response times percentiles

When I run the following command to parse haproxy logs, the output doesn't contain any headers, and I'm not able to understand the meanings of the numbers in each of the columns.
Command: halog -pct < haproxy.log > percentiles.txt
The output that I see is:
0.1 3493 18 0 0 0
0.2 6986 25 0 0 0
0.3 10479 30 0 0 0
0.4 13972 33 0 0 0
0.5 17465 37 0 0 0
0.6 20958 40 0 0 0
0.7 24451 43 0 0 0
0.8 27944 46 0 0 0
0.9 31438 48 0 0 0
1.0 34931 49 0 0 0
1.1 38424 50 0 0 0
1.2 41917 51 0 0 0
1.3 45410 52 0 0 0
1.4 48903 53 0 0 0
1.5 52396 55 0 0 0
1.6 55889 56 0 0 0
1.7 59383 57 0 0 0
1.8 62876 58 0 0 0
1.9 66369 60 0 0 0
2.0 69862 61 0 0 0
3.0 104793 74 0 0 0
4.0 139724 80 0 1 0
5.0 174656 89 0 1 0
6.0 209587 94 0 1 0
7.0 244518 100 0 1 0
8.0 279449 106 0 1 0
9.0 314380 112 0 1 0
10.0 349312 118 0 1 0
15.0 523968 144 0 1 0
20.0 698624 168 0 1 0
25.0 873280 180 0 2 0
30.0 1047936 190 0 2 0
35.0 1222592 200 0 3 0
40.0 1397248 210 0 3 0
45.0 1571904 220 0 4 0
50.0 1746560 230 0 6 0
55.0 1921216 241 0 7 0
60.0 2095872 258 0 9 0
65.0 2270528 279 0 10 0
70.0 2445184 309 0 16 0
75.0 2619840 354 1 18 0
80.0 2794496 425 1 20 0
85.0 2969152 545 1 22 0
90.0 3143808 761 1 39 1
91.0 3178740 821 1 80 1
92.0 3213671 921 1 217 1
93.0 3248602 1026 1 457 1
94.0 3283533 1190 1 683 1
95.0 3318464 1408 1 889 1
96.0 3353396 1721 1 1107 1
97.0 3388327 2181 1 1328 1
98.0 3423258 2902 1 1555 1
98.1 3426751 3000 1 1580 1
98.2 3430244 3094 1 1607 1
98.3 3433737 3196 1 1635 1
98.4 3437231 3301 1 1666 1
98.5 3440724 3420 1 1697 1
98.6 3444217 3550 1 1731 1
98.7 3447710 3690 1 1770 1
98.8 3451203 3848 1 1815 1
98.9 3454696 4030 1 1864 1
99.0 3458189 4249 1 1923 2
99.1 3461682 4490 1 1993 2
99.2 3465176 4766 2 2089 2
99.3 3468669 5085 2 2195 2
99.4 3472162 5441 3 2317 97
99.5 3475655 5899 5 2440 365
99.6 3479148 6517 11 2567 817
99.7 3482641 7403 14 2719 1555
99.8 3486134 8785 16 2992 2779
99.9 3489627 11650 997 3421 4931
100.0 3493121 85004 4008 20914 71716
The first column looks to be the percentile (like P50, P90, P99, etc.), but what are the values in the 2nd, 3rd, 4th, 5th, and 6th columns? Also, are they totals (halog reports total times when given other options), averages, or maximums?
<percentile> <request count> <Request Time*> <Connect Time**> <Response Time***> <Data Time****>
* Referred to as TR in the documentation.
** Referred to as Tc in the documentation.
*** Referred to as Tr in the documentation.
**** Referred to as Td in the documentation.
The source provides some good pointers.
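halog itself never prints a header, but one can be prepended to match the mapping above (a sketch; the column names are mine, and the times are in milliseconds, as in the haproxy log fields they come from):
( echo "percentile count req_time_ms conn_time_ms resp_time_ms data_time_ms"; halog -pct < haproxy.log ) > percentiles.txt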

How to avoid locking in MongoDB

I have a collection with concurrent reads, and parts of the application also update the same collection. Under load, each read and update operation takes a long time, and it gets slower over time.
Here is a log of some queries:
nscanned:4 nupdated:2 keyUpdates:3 numYields: 1 locks(micros) w:2475463 10247ms
nscanned:4 nupdated:2 keyUpdates:2 numYields: 1 locks(micros) w:2077481 1054ms
The collection has only 70K records.
There are roughly 10 concurrent reads and writes.
This is what I have already done:
Sharding with a 3-member replica set
The shard key is hashed, and sharding is enabled at both the db and the collection level
Each replica member has enough power and RAM
Queries are bounded by an index, and db.collection.find().explain() has this output:
{
    "cursor" : "BtreeCursor fp.ulc_1_c_1_p_1",
    "isMultiKey" : true,
    "n" : 0,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "fp.ulc" : [
            [
                "0ca01c47c984b5583d455e42aafded2c",
                "0ca01c47c984b5583d455e42aafded2c"
            ]
        ],
        "c" : [
            [
                false,
                false
            ]
        ],
        "p" : [
            [
                1372062247612,
                1.7976931348623157e+308
            ]
        ]
    }
}
I have also tried setting the read preference to secondary, but after a period of time that also becomes slow.
I have also noticed locking in mongostat; here is its output:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
*0 *0 6 *0 4 2|0 0 54.4g 109g 1.74g 0 collectDb:199.7% 0 6|0 0|1 3k 130k 21 set1 PRI 08:27:55
*0 *0 15 *0 11 8|0 1 54.4g 109g 1.74g 0 collectDb:200.1% 0 6|0 0|1 11k 357k 21 set1 PRI 08:27:58
7 *0 34 *0 18 26|0 0 54.4g 109g 1.75g 0 collectDb:202.9% 0 6|0 0|1 36k 362k 21 set1 PRI 08:28:00
1 *0 13 *0 8 7|0 0 54.4g 109g 1.75g 0 collectDb:192.3% 0 6|0 0|1 12k 287k 21 set1 PRI 08:28:03
1 *0 9 *0 7 8|0 0 54.4g 109g 1.75g 0 collectDb:196.1% 0 6|0 0|1 5k 258k 21 set1 PRI 08:28:04
5 *0 20 *0 10 13|0 0 54.4g 109g 1.75g 0 collectDb:207.7% 0 6|0 0|1 23k 214k 21 set1 PRI 08:28:08
8 *0 38 *0 21 29|0 0 54.4g 109g 1.74g 0 collectDb:215.9% 0 5|0 0|1 40k 548k 21 set1 PRI 08:28:12
6 *0 44 *0 24 22|0 0 54.4g 109g 1.75g 0 collectDb:199.5% 0 3|0 0|1 45k 509k 21 set1 PRI 08:28:15
2 4 27 *0 11 28|0 0 54.4g 109g 1.75g 0 collectDb:169.2% 0 6|0 0|1 21k 318k 21 set1 PRI 08:28:18
2 *0 29 *0 18 20|0 0 54.4g 109g 1.74g 0 collectDb:255.5% 0 5|0 0|1 28k 588k 21 set1 PRI 08:28:24
So I finally figured out some good ways to reduce locking in MongoDB.
What I did:
Updated MongoDB to the latest stable production release, 2.4.8.
Moved to EBS volumes with provisioned IOPS (2000) in a RAID 10 configuration.
Monitored slow queries in the mongod.log file, as well as iowait for each drive.
Added some multikey and compound indexes, following the MongoDB index docs (see the sketch after this list).
Watched the RAM consumption on each EC2 instance, including the primary and secondary members of the replica set.
Changed the instance type to an EBS-optimized one with a Gigabit Ethernet interface and more than 16 GB of RAM on each server, so that RAM is available most of the time for the indexes and the current data set.
The Amazon documentation on instance types and their best use cases is a good read for understanding your requirements better.
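For illustration, a compound index matching the BtreeCursor fp.ulc_1_c_1_p_1 plan in the explain output above could be built like this (a sketch; the field names come from that output, and collection is a placeholder):
db.collection.ensureIndex({ "fp.ulc": 1, "c": 1, "p": 1 }, { background: true })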
Locking is a major issue in MongoDB, but they are working on collection-level locking, so an upcoming version may solve most of the performance degradation caused by it.
Here is the JIRA link where you can check the status.

MongoDB replica set on a single machine for better reads?

I have a single mongod (2.0.2) running on a server (Intel Xeon, 8 cores, 2.1 GHz, 32 GB RAM). Here are some sample stats from mongostat (a usual, calm day):
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn repl time
0 186 3 0 0 5 0 42.6g 87.8g 22.6g 2 0.8 0 0|0 0|0 35k 1m 319 M 20:36:00
0 177 3 0 0 4 0 42.6g 87.8g 22.5g 2 0.7 0 0|0 0|0 28k 993k 319 M 20:36:30
0 181 3 0 0 3 0 42.6g 87.8g 22.6g 1 0.6 0 0|0 0|1 28k 878k 319 M 20:37:00
0 177 4 0 0 4 0 42.6g 87.8g 22.6g 2 0.7 0 0|0 0|0 31k 851k 319 M 20:37:30
0 171 2 0 0 3 0 42.6g 87.8g 22.6g 2 0.4 0 0|0 1|0 25k 912k 319 M 20:38:00
0 133 1 0 0 3 0 42.6g 87.8g 22.5g 0 0.3 0 0|0 0|0 20k 673k 319 M 20:38:30
0 180 3 0 0 4 0 42.6g 87.8g 22.5g 1 0.6 0 0|0 1|0 29k 890k 319 M 20:39:00
But sometimes, when there are 500-600 users online (I store visit counters in Mongo, so there are a lot of updates when visitors come), queries jump to ~500 per second, and the read queue grows quickly and stays at around 40-50 for a few minutes, which makes scripts time out.
Can adding a replica set member on the same machine (I don't have any more physical servers) help me? I want to set the read preference to point to the secondary member so that writes on the primary instance do not block the reads.
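For reference, directing reads at a secondary in a 2.0-era shell looks roughly like this (a sketch; visits and the query are made up, and rs.slaveOk() is the pre-2.2 mechanism that read preferences later formalized):
// on a connection to the secondary
rs.slaveOk()
db.visits.find({ page: "/home" })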

MongoDB statistics

I'm running a MongoDB instance using a replica set, and when there are a lot of inserts I see very weird statistics for faults and locked %.
How can locked % be more than 100?
Where do the faults happen? I have no logs mentioning any fault; does someone have a clue what it means?
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
9 0 0 0 1 4 0 70.3g 141g 4.77g 20 124 0 0|0 0|1 1m 2m 10 socialdb M 18:49:49
18 0 0 0 3 1 0 70.3g 141g 4.77g 17 73.8 0 0|0 0|1 1m 2m 10 socialdb M 18:49:50
21 0 0 0 1 5 0 70.3g 141g 4.77g 18 104 0 0|0 0|1 1m 1m 10 socialdb M 18:49:51
20 0 0 0 3 1 0 70.3g 141g 4.78g 18 98.8 0 0|0 0|1 1m 3m 10 socialdb M 18:49:52
172 0 0 0 5 4 0 70.3g 141g 4.79g 133 72.8 0 0|0 0|0 7m 12m 10 socialdb M 18:49:53
76 0 0 0 3 1 0 70.3g 141g 4.8g 114 65.1 0 0|0 0|1 6m 10m 10 socialdb M 18:49:54
54 0 0 0 4 4 1 70.3g 141g 4.81g 45 90.6 0 0|0 0|1 2m 8m 10 socialdb M 18:49:55
85 0 0 0 4 2 0 70.3g 141g 4.84g 101 98.1 0 0|0 0|1 6m 11m 10 socialdb M 18:49:56
77 0 0 0 3 4 0 70.3g 141g 4.82g 78 74.5 0 0|0 0|1 4m 9m 10 socialdb M 18:49:57
72 0 0 0 3 1 0 70.3g 141g 4.84g 111 95.7 0 0|0 0|1 6m 10m 10 socialdb M 18:49:58
Is there a better (standard) monitoring tool, ideally free?
Not sure about the other two, but this could be the answer to your first question if you are using v2.2:
http://docs.mongodb.org/manual/reference/mongostat/
The above page mentions:
locked:
The percent of time in a global write lock.
(Changed in version 2.2: the locked db field replaces the locked % field to provide more appropriate data about database-specific locks.)
locked db:
New in version 2.2.
The percent of time in the per-database context-specific lock. mongostat will report the database that has spent the most time since the last mongostat call with a write lock.
This value represents the amount of time the database held a database-specific lock plus the time that mongod spent in the global lock. Because of this, and because of the sampling method, you may see values greater than 100%.
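As for the faults column: it counts page faults, i.e. how often per sample mongod had to go to disk because the data it touched was not mapped into RAM, so nothing about them is written to the logs. The same counter is visible from the shell (a sketch; extra_info.page_faults is reported on Linux):
db.serverStatus().extra_info.page_faults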