why is this VRP problem taking so long to solve with the AUTOMATIC first solution strategy? - or-tools

I've modified the data in https://developers.google.com/optimization/routing/vrptw to my specific problem - see below.
With FirstSolutionStrategy.PATH_CHEAPEST_ARC or FirstSolutionStrategy.AUTOMATIC, it takes 42 seconds to solve this tiny problem!
If I switch to FirstSolutionStrategy.PATH_MOST_CONSTRAINED_ARC, it solves within 3 ms.
Why is the solver so sensitive to this setting? Is there a mode in which the solver tries all strategies in parallel and selects the fastest or the best one?
def create_data_model():
    """Stores the data for the problem."""
    data = {}
    data['time_matrix'] = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1000, 99999, 99999, 99999, 99999, 99999, 73, 104, 113, 117, 99999],
        [1000, 99999, 99999, 99999, 99999, 99999, 84, 99999, 122, 125, 99999],
        [1000, 99999, 99999, 99999, 99999, 99999, 97, 75, 122, 125, 99999],
        [1000, 99999, 99999, 99999, 99999, 99999, 99999, 61, 109, 114, 103],
        [1000, 99999, 99999, 99999, 99999, 99999, 47, 99999, 83, 89, 77],
        [1000, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99, 92],
        [1000, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 88, 94],
        [1000, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 42, 64],
        [1000, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 51],
        [1000, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999]]
    data['time_windows'] = [(0, 10000), (435, 450), (450, 465), (450, 465), (475, 490), (480, 495), (540, 555), (540, 555), (600, 615), (720, 735), (810, 825)]
    data['num_vehicles'] = 5
    data['depot'] = 0
    return data
I also changed the time dimension:
routing.AddDimension(
    transit_callback_index,
    10000,  # allow waiting time
    10000,  # maximum time per vehicle
    True,   # force start cumul to zero
    time)
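There is no built-in mode that races all first-solution strategies against each other, but you can emulate one by looping over the strategies yourself, giving each its own time budget, and keeping the best result. A minimal pure-Python sketch of such a harness; `solve_with_strategy` is a hypothetical stand-in for building `RoutingSearchParameters` with the given `FirstSolutionStrategy` value, setting a time limit, and calling `routing.SolveWithParameters(...)`:

```python
import time

def pick_best_strategy(strategies, solve_with_strategy):
    """Try each first-solution strategy with its own wall-clock budget
    and return the fastest one that produced a solution.

    `solve_with_strategy(name)` is a hypothetical callback: it should
    configure the search parameters with that strategy, solve, and
    return the solution (or None if none was found in time).
    """
    results = {}
    for name in strategies:
        start = time.perf_counter()
        solution = solve_with_strategy(name)
        results[name] = (time.perf_counter() - start, solution)
    # Prefer strategies that actually found a solution, then the fastest.
    solved = [n for n in strategies if results[n][1] is not None]
    best = min(solved or strategies, key=lambda n: results[n][0])
    return best, results
```

For a problem this small, running the strategies sequentially with a per-strategy time limit of a few seconds is usually cheap enough.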


MongoDB does not follow the index prefix rule

MongoDB often uses an index that includes fields that are not in the filter; those fields get the bounds [MaxKey, MinKey]. This is not what I expected from the index prefix documentation.
The query looks like this:
db.Wager.aggregate([
    {"$match": {
        "CompanyId": 1341,
        "UpdatedDateUtc": {"$gte": ISODate("2021-03-25T23:35:00Z")}
    }},
    {"$project": {"CompanyId": 1, "UpdatedDateUtc": 1, "_id": 0}},
    {"$group": {"_id": 1, "n": {"$sum": 1}}}
])
I expected it to use this index:
{ "CompanyId": -1, "UpdatedDateUtc": -1, "WagerEventDateUtc": -1 }
but it always uses this index:
{ "CompanyId": 1, "BetDateUtc": -1, "WagerEventDateUtc": -1, "UpdatedDateUtc": -1 }
Here is the explain output:
{
"nReturned": 101,
"executionTimeMillisEstimate": 0,
"totalKeysExamined": 101,
"totalDocsExamined": 0,
"executionStages": {
"stage": "PROJECTION_COVERED",
"nReturned": 101,
"executionTimeMillisEstimate": 0,
"works": 101,
"advanced": 101,
"needTime": 0,
"needYield": 0,
"saveState": 3,
"restoreState": 2,
"isEOF": 0,
"transformBy": {
"CompanyId": 1,
"UpdatedDateUtc": 1,
"_id": 0
},
"inputStage": {
"stage": "IXSCAN",
"nReturned": 101,
"executionTimeMillisEstimate": 0,
"works": 101,
"advanced": 101,
"needTime": 0,
"needYield": 0,
"saveState": 3,
"restoreState": 2,
"isEOF": 0,
"keyPattern": {
"CompanyId": 1,
"BetDateUtc": -1,
"WagerEventDateUtc": -1,
"UpdatedDateUtc": -1
},
"indexName": "CompanyId_1_BetDateUtc_-1_WagerEventDateUtc_-1_UpdatedDateUtc_-1",
"isMultiKey": false,
"multiKeyPaths": {
"CompanyId": [],
"BetDateUtc": [],
"WagerEventDateUtc": [],
"UpdatedDateUtc": []
},
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds": {
"CompanyId": [
"[1341.0, 1341.0]"
],
"BetDateUtc": [
"[MaxKey, MinKey]"
],
"WagerEventDateUtc": [
"[MaxKey, MinKey]"
],
"UpdatedDateUtc": [
"[new Date(9223372036854775807), new Date(1616715300000)]"
]
},
"keysExamined": 101,
"seeks": 1,
"dupsTested": 0,
"dupsDropped": 0
}
}
},
{
"nReturned": 101,
"executionTimeMillisEstimate": 0,
"totalKeysExamined": 101,
"totalDocsExamined": 0,
"executionStages": {
"stage": "PROJECTION_COVERED",
"nReturned": 101,
"executionTimeMillisEstimate": 0,
"works": 101,
"advanced": 101,
"needTime": 0,
"needYield": 0,
"saveState": 10242,
"restoreState": 10242,
"isEOF": 0,
"transformBy": {
"CompanyId": 1,
"UpdatedDateUtc": 1,
"_id": 0
},
"inputStage": {
"stage": "IXSCAN",
"nReturned": 101,
"executionTimeMillisEstimate": 0,
"works": 101,
"advanced": 101,
"needTime": 0,
"needYield": 0,
"saveState": 10242,
"restoreState": 10242,
"isEOF": 0,
"keyPattern": {
"CompanyId": -1,
"UpdatedDateUtc": -1,
"WagerEventDateUtc": -1
},
"indexName": "CompanyId_-1_UpdatedDateUtc_-1_WagerEventDateUtc_-1",
"isMultiKey": false,
"multiKeyPaths": {
"CompanyId": [],
"UpdatedDateUtc": [],
"WagerEventDateUtc": []
},
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds": {
"CompanyId": [
"[1341.0, 1341.0]"
],
"UpdatedDateUtc": [
"[new Date(9223372036854775807), new Date(1616715300000)]"
],
"WagerEventDateUtc": [
"[MaxKey, MinKey]"
]
},
"keysExamined": 101,
"seeks": 1,
"dupsTested": 0,
"dupsDropped": 0
}
}
}
I think MongoDB chose the wrong index because "saveState" and "restoreState" are too high, but I'm not sure, because I don't know what these two fields mean.
The answer to why it chooses one index over another is that it runs a short test and calculates which index returns the most results per unit of "work", with bonus points for completing during the test or for not using a blocking sort.
If two or more indexes are tied after that, it randomly picks one.
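If the plan ranker keeps picking the wrong index, you can sidestep plan selection entirely by supplying a hint. A sketch in Python/pymongo terms, with the pipeline and index spec taken from the question; the driver call itself is commented out since it needs a live server, and `db` is assumed to be a pymongo database handle:

```python
import datetime

# The pipeline from the question, as plain Python dicts.
# (ISODate("2021-03-25T23:35:00Z"); tzinfo omitted for brevity.)
pipeline = [
    {"$match": {
        "CompanyId": 1341,
        "UpdatedDateUtc": {"$gte": datetime.datetime(2021, 3, 25, 23, 35)},
    }},
    {"$project": {"CompanyId": 1, "UpdatedDateUtc": 1, "_id": 0}},
    {"$group": {"_id": 1, "n": {"$sum": 1}}},
]

# The index we want to force. pymongo's aggregate() accepts a hint
# keyword (requires MongoDB >= 3.6):
wanted_index = {"CompanyId": -1, "UpdatedDateUtc": -1, "WagerEventDateUtc": -1}
# db.Wager.aggregate(pipeline, hint=wanted_index)
```

A hint trades flexibility for predictability: the server will use that index even if the data distribution later changes, so re-check it after schema or workload changes.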

MongoDB indexed text search only works for exact match

I have a field 'user_name' populated with data.
This code gives me no results:
history = db.history
history.create_index([('user_name', 'text')])
history.find({'$text' : {'$search' : 'a'}})
But when I specify the exact name, it works
history.find({'$text' : {'$search' : 'exact name'}})
Here is the output of explain() for the 'a' search:
{
"executionSuccess": true,
"nReturned": 0,
"executionTimeMillis": 0,
"totalKeysExamined": 0,
"totalDocsExamined": 0,
"executionStages": {
"stage": "TEXT",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 1,
"advanced": 0,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"indexPrefix": {},
"indexName": "user_name_text",
"parsedTextQuery": { "terms": [], "negatedTerms": [], "phrases": [], "negatedPhrases": [] },
"textIndexVersion": 3,
"inputStage": {
"stage": "TEXT_MATCH",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 0,
"advanced": 0,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"docsRejected": 0,
"inputStage": {
"stage": "FETCH",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 0,
"advanced": 0,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"docsExamined": 0,
"alreadyHasObj": 0,
"inputStage": { "stage": "OR", "nReturned": 0, "executionTimeMillisEstimate": 0, "works": 0, "advanced": 0, "needTime": 0, "needYield": 0, "saveState": 0, "restoreState": 0, "isEOF": 1, "dupsTested": 0, "dupsDropped": 0 }
}
}
},
"allPlansExecution": []
}
Here is the output of explain() for exact match of username ('akkcess'):
{
"executionSuccess": true,
"nReturned": 39,
"executionTimeMillis": 1,
"totalKeysExamined": 39,
"totalDocsExamined": 39,
"executionStages": {
"stage": "TEXT",
"nReturned": 39,
"executionTimeMillisEstimate": 0,
"works": 40,
"advanced": 39,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"indexPrefix": {},
"indexName": "user_name_text",
"parsedTextQuery": { "terms": ["akkcess"], "negatedTerms": [], "phrases": [], "negatedPhrases": [] },
"textIndexVersion": 3,
"inputStage": {
"stage": "TEXT_MATCH",
"nReturned": 39,
"executionTimeMillisEstimate": 0,
"works": 40,
"advanced": 39,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"docsRejected": 0,
"inputStage": {
"stage": "FETCH",
"nReturned": 39,
"executionTimeMillisEstimate": 0,
"works": 40,
"advanced": 39,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"docsExamined": 39,
"alreadyHasObj": 0,
"inputStage": {
"stage": "OR",
"nReturned": 39,
"executionTimeMillisEstimate": 0,
"works": 40,
"advanced": 39,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"dupsTested": 39,
"dupsDropped": 0,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 39,
"executionTimeMillisEstimate": 0,
"works": 40,
"advanced": 39,
"needTime": 0,
"needYield": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"keyPattern": { "_fts": "text", "_ftsx": 1 },
"indexName": "user_name_text",
"isMultiKey": false,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "backward",
"indexBounds": {},
"keysExamined": 39,
"seeks": 1,
"dupsTested": 0,
"dupsDropped": 0
}
}
}
}
},
"allPlansExecution": []
}
Do you have any idea why it behaves this way?
According to the docs and tutorials, it should work.
"a" is almost surely a stop word. Almost every natural language text would include it. Therefore if it was searched for, you'd get every single document in the result set. Since this isn't very useful, text search drops stop words like "a" from the query.
Separately, MongoDB text search does include exact matching functionality, but it requires the query to be quoted which you haven't done therefore you are using the regular stemmed matching, not exact matching in your posted query.
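In pymongo terms, the quoting means embedding literal double quotes inside the $search string. A small sketch (the find() call is commented out because it needs the live collection with the text index):

```python
def phrase(term):
    # Wrapping the term in embedded double quotes makes $text treat it
    # as an exact phrase instead of a set of stemmed, stop-word-filtered
    # terms.
    return '"{}"'.format(term)

exact_query = {'$text': {'$search': phrase('exact name')}}
# history.find(exact_query)
```

Note that even a quoted phrase is matched against the tokenized index, so for true prefix or substring matching on a single field a regular index with a $regex anchored at the start of the string is often the better tool.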

Same query with different execution time in mongodb

I've executed one and the same query several times, and the execution time differs from about a second to more than 20 seconds.
MongoDB version is 3.2.10.
Below is the explain output of a fast and a slow run.
Fast query:
{
"executionTimeMillis": 309,
"allPlansExecution": [
{
"shardName": "sh001-rs",
"allPlans": []
}
],
"totalKeysExamined": 18478,
"nReturned": 15096,
"executionStages": {
"executionTimeMillis": 309,
"shards": [
{
"shardName": "sh001-rs",
"executionSuccess": true,
"executionStages": {
"needYield": 0,
"docsExamined": 18378,
"saveState": 144,
"restoreState": 144,
"isEOF": 1,
"inputStage": {
"saveState": 144,
"isEOF": 1,
"seenInvalidated": 0,
"keysExamined": 18478,
"nReturned": 18378,
"invalidates": 0,
"keyPattern": {
"_id": 1
},
"isUnique": true,
"needTime": 99,
"isMultiKey": false,
"executionTimeMillisEstimate": 30,
"dupsTested": 0,
"restoreState": 144,
"direction": "forward",
"indexName": "_id_",
"isSparse": false,
"advanced": 18378,
"stage": "IXSCAN",
"dupsDropped": 0,
"needYield": 0,
"isPartial": false,
"indexBounds": {
"_id": []
},
"works": 18478,
"indexVersion": 1
},
"nReturned": 15096,
"needTime": 3381,
"filter": {
"available": {
"$gt": 0
}
},
"executionTimeMillisEstimate": 180,
"alreadyHasObj": 0,
"invalidates": 0,
"works": 18478,
"advanced": 15096,
"stage": "FETCH"
}
}
],
"nReturned": 15096,
"totalKeysExamined": 18478,
"totalChildMillis": 251,
"totalDocsExamined": 18378,
"stage": "SINGLE_SHARD"
},
"totalDocsExamined": 18378
}
Slow query:
{
"executionTimeMillis": 16139,
"allPlansExecution": [
{
"shardName": "sh001-rs",
"allPlans": []
}
],
"totalKeysExamined": 18478,
"nReturned": 15096,
"executionStages": {
"executionTimeMillis": 16139,
"shards": [
{
"shardName": "sh001-rs",
"executionSuccess": true,
"executionStages": {
"needYield": 0,
"docsExamined": 18378,
"saveState": 677,
"restoreState": 677,
"isEOF": 1,
"inputStage": {
"saveState": 677,
"isEOF": 1,
"seenInvalidated": 0,
"keysExamined": 18478,
"nReturned": 18378,
"invalidates": 0,
"keyPattern": {
"_id": 1
},
"isUnique": true,
"needTime": 99,
"isMultiKey": false,
"executionTimeMillisEstimate": 270,
"dupsTested": 0,
"restoreState": 677,
"direction": "forward",
"indexName": "_id_",
"isSparse": false,
"advanced": 18378,
"stage": "IXSCAN",
"dupsDropped": 0,
"needYield": 0,
"isPartial": false,
"indexBounds": {
"_id": []
},
"works": 18478,
"indexVersion": 1
},
"nReturned": 15096,
"needTime": 3381,
"filter": {
"available": {
"$gt": 0
}
},
"executionTimeMillisEstimate": 14518,
"alreadyHasObj": 0,
"invalidates": 0,
"works": 18478,
"advanced": 15096,
"stage": "FETCH"
}
}
],
"nReturned": 15096,
"totalKeysExamined": 18478,
"totalChildMillis": 16076,
"totalDocsExamined": 18378,
"stage": "SINGLE_SHARD"
},
"totalDocsExamined": 18378
}
The results from db.stats():
{
"raw" : {
"sh001-rs/host101-prod:27017,host102-prod:27018" : {
"db" : "records",
"collections" : 2,
"objects" : 124335,
"avgObjSize" : 48253.87085695902,
"dataSize" : 5999645033,
"storageSize" : 5008375808,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 17960960,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000018")
}
},
"sh002-rs/host101-prod:27018,host102-prod:27017" : {
"db" : "records",
"collections" : 2,
"objects" : 100643,
"avgObjSize" : 58044.42780918693,
"dataSize" : 5841765348,
"storageSize" : 4884041728,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 13737984,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000002")
}
},
"sh003-rs/host103-prod:27017,host104-prod:27018" : {
"db" : "records",
"collections" : 2,
"objects" : 191296,
"avgObjSize" : 31400.14176459518,
"dataSize" : 6006721519,
"storageSize" : 5967814656,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 32346112,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000012")
}
},
"sh004-rs/host103-prod:27018,host104-prod:27017" : {
"db" : "records",
"collections" : 2,
"objects" : 100904,
"avgObjSize" : 58444.951716482996,
"dataSize" : 5897329408,
"storageSize" : 5684531200,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 14114816,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff000000000000000c")
}
},
"sh005-rs/host105-prod:27017,host106-prod:27018" : {
"db" : "records",
"collections" : 16,
"objects" : 851626,
"avgObjSize" : 10900.204212882181,
"dataSize" : 9282897313,
"storageSize" : 7225233408,
"numExtents" : 0,
"indexes" : 43,
"indexSize" : 31690752,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff000000000000000e")
}
},
"sh006-rs/host105-prod:27018,host106-prod:27017" : {
"db" : "records",
"collections" : 2,
"objects" : 100946,
"avgObjSize" : 58688.667386523484,
"dataSize" : 5924386218,
"storageSize" : 7723163648,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 13565952,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000059")
}
},
"sh007-rs/host107-prod:27017,host108-prod:27018" : {
"db" : "records",
"collections" : 2,
"objects" : 100988,
"avgObjSize" : 58563.519497366025,
"dataSize" : 5914212707,
"storageSize" : 4643889152,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 14073856,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff000000000000000c")
}
},
"sh008-rs/host107-prod:27018,host108-prod:27017" : {
"db" : "records",
"collections" : 2,
"objects" : 100747,
"avgObjSize" : 58695.07362005817,
"dataSize" : 5913352582,
"storageSize" : 4877357056,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 13676544,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000002")
}
},
"sh009-rs/host109-prod:27017,host110-prod:27018" : {
"db" : "records",
"collections" : 4,
"objects" : 69101,
"avgObjSize" : 152884.28821580007,
"dataSize" : 10564457200,
"storageSize" : 16441020352,
"numExtents" : 32,
"indexes" : 17,
"indexSize" : 26171376,
"fileSize" : 19251855360,
"nsSizeMB" : 16,
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"dataFileVersion" : {
"major" : 4,
"minor" : 22
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000000")
}
},
"sh010-rs/host110-prod:27017,host113-prod:27018" : {
"db" : "records",
"collections" : 4,
"objects" : 69148,
"avgObjSize" : 152176.07311852838,
"dataSize" : 10522671104,
"storageSize" : 16439971776,
"numExtents" : 32,
"indexes" : 17,
"indexSize" : 26269488,
"fileSize" : 19251855360,
"nsSizeMB" : 16,
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"dataFileVersion" : {
"major" : 4,
"minor" : 22
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000002")
}
},
"sh011-rs/host109-prod:27018,host111-prod:27017" : {
"db" : "records",
"collections" : 2,
"objects" : 77687,
"avgObjSize" : 75111.53102835738,
"dataSize" : 5835189511,
"storageSize" : 5171572736,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 9543680,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000002")
}
},
"sh012-rs/host114-prod:27017,host115-prod:27018" : {
"db" : "records",
"collections" : 4,
"objects" : 91151,
"avgObjSize" : 115459.23068315213,
"dataSize" : 10524224336,
"storageSize" : 16454213568,
"numExtents" : 32,
"indexes" : 17,
"indexSize" : 42793184,
"fileSize" : 19251855360,
"nsSizeMB" : 16,
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"dataFileVersion" : {
"major" : 4,
"minor" : 22
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000028")
}
},
"sh013-rs/host114-prod:27018,host115-prod:27017" : {
"db" : "records",
"collections" : 2,
"objects" : 99992,
"avgObjSize" : 58494.27406192495,
"dataSize" : 5848959452,
"storageSize" : 6180712448,
"numExtents" : 0,
"indexes" : 17,
"indexSize" : 13615104,
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff00000000000000a5")
}
},
"sh014-rs/host111-prod:27018,host113-prod:27017" : {
"db" : "records",
"collections" : 4,
"objects" : 91498,
"avgObjSize" : 114842.1660801329,
"dataSize" : 10507828512,
"storageSize" : 16454213568,
"numExtents" : 32,
"indexes" : 17,
"indexSize" : 42646016,
"fileSize" : 19251855360,
"nsSizeMB" : 16,
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"dataFileVersion" : {
"major" : 4,
"minor" : 22
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000004")
}
}
},
"objects" : 2170062,
"avgObjSize" : 48193.523844940835,
"dataSize" : 104583640243,
"storageSize" : 123156111104,
"numExtents" : 128,
"indexes" : 264,
"indexSize" : 312205824,
"fileSize" : 77007421440,
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"ok" : 1
}
I've noticed that it is mainly the numbers of "saveState" and "restoreState" that differ. What could be the reason for this variance in execution time?
Thanks in advance.
I don't know if this is sufficient as an answer for you, but it is possible to get different timings when running the same query.
It depends on:
how many other operations are happening on your server at the moment
whether the requested documents, and in this case the index, are already in memory (RAM)
As for the two fields: "saveState" and "restoreState" count how many times the query execution paused (yielded) and resumed, for example to let other operations proceed or while waiting for data to be read from disk, so a high count usually points to contention or disk I/O.
The MongoDB documentation on explain() results describes these output fields in more detail.
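When comparing two explain outputs like the ones above, it can help to diff just the stage counters programmatically rather than by eye. A small sketch; the field names are the ones that appear in the explain output, and the sample numbers below are the FETCH-stage values from the question:

```python
def diff_stage(fast, slow, keys=("saveState", "restoreState",
                                 "executionTimeMillisEstimate")):
    """Return the counters that differ between two execution stages,
    as {field: (fast_value, slow_value)}."""
    return {k: (fast.get(k), slow.get(k))
            for k in keys if fast.get(k) != slow.get(k)}

# FETCH-stage counters from the fast and slow runs in the question:
fast_fetch = {"saveState": 144, "restoreState": 144,
              "executionTimeMillisEstimate": 180}
slow_fetch = {"saveState": 677, "restoreState": 677,
              "executionTimeMillisEstimate": 14518}
print(diff_stage(fast_fetch, slow_fetch))
```

Here the slow run yielded roughly 4.5x more often while fetching the same 18378 documents, which is consistent with the documents having to be pulled from disk rather than served from cache.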

How to do a query based pagination in a MongoDB collection according to a unique attribute?

I have a collection named ranking which documents look like this:
{ "_id" : ObjectId("55dde5e4827ff4e65b684f94"), "round" : 0, "teamId" : "55a9c261a02911b85fdff231", "teamName" : "FORZA JUVE", "userId" : "55a9c209291bc40561cc97d2", "signupDate" : ISODate("2015-07-18T03:03:37Z"), "userName" : "Noé González Ramírez", "signupPoints" : 0, "lfPoints" : 559, "roundPoints" : [ 110, 99, 91, 65, 64, 61, 69 ], "roundRankings" : [ ], "ranking" : 1 }
{ "_id" : ObjectId("55dde591827ff4e65b6846d6"), "round" : 0, "teamId" : "55a58179f82921e922244402", "teamName" : "Dream ★ Team", "userId" : "55a5809fd18e541b2176412f", "signupDate" : ISODate("2015-07-14T21:35:27Z"), "userName" : "Fabio Dionicio López Lazcano", "signupPoints" : 0, "lfPoints" : 555, "roundPoints" : [ 105, 91, 108, 60, 53, 77, 61 ], "roundRankings" : [ ], "ranking" : 2 }
{ "_id" : ObjectId("55dde593827ff4e65b6847d3"), "round" : 0, "teamId" : "55a58889f82921e922245511", "teamName" : "Camel Toes", "userId" : "55a58868f82921e9222454b1", "signupDate" : ISODate("2015-07-14T22:08:40Z"), "userName" : "Luis Fridman Tabak", "signupPoints" : 0, "lfPoints" : 555, "roundPoints" : [ 74, 96, 99, 76, 69, 70, 71 ], "roundRankings" : [ ], "ranking" : 3 }
{ "_id" : ObjectId("55dde60a827ff4e65b6853d5"), "round" : 0, "teamId" : "55ac7ca61532c9284e2a89a7", "teamName" : "S.T.A.R.S.", "userId" : "55ac7be31532c9284e2a895c", "signupDate" : ISODate("2015-07-20T04:41:07Z"), "userName" : "Daniel Ivan Estudillo Lopez", "signupPoints" : 0, "lfPoints" : 555, "roundPoints" : [ 61, 94, 105, 73, 90, 50, 82 ], "roundRankings" : [ ], "ranking" : 4 }
{ "_id" : ObjectId("55dde5d2827ff4e65b684db3"), "round" : 0, "teamId" : "55a857b8fad3a3124ca72edf", "teamName" : "EFRASTEAM", "userId" : "55a8575dfad3a3124ca72ebd", "signupDate" : ISODate("2015-07-17T01:16:13Z"), "userName" : "EFRAÍN BARRIENTOS RODRÍGUEZ", "signupPoints" : 0, "lfPoints" : 549, "roundPoints" : [ 102, 89, 92, 58, 81, 60, 67 ], "roundRankings" : [ ], "ranking" : 5 }
{ "_id" : ObjectId("55dde5c1827ff4e65b684baa"), "round" : 0, "teamId" : "55a800a4ac9c9c3302be1dee", "teamName" : "Lombardi FC", "userId" : "55a722eac0dbc6e577489d34", "signupDate" : ISODate("2015-07-16T03:20:10Z"), "userName" : "J Alejandro Padilla", "signupPoints" : 0, "lfPoints" : 546, "roundPoints" : [ 103, 98, 65, 47, 77, 59, 97 ], "roundRankings" : [ ], "ranking" : 6 }
{ "_id" : ObjectId("55dde5a1827ff4e65b68485f"), "round" : 0, "teamId" : "55a5c2e88a5f0fd66b355ef6", "teamName" : "FC Barcelona", "userId" : "55a5c0458a5f0fd66b355d6f", "signupDate" : ISODate("2015-07-15T02:07:01Z"), "userName" : "Daniel Alvarez Nunez", "signupPoints" : 0, "lfPoints" : 545, "roundPoints" : [ 94, 101, 82, 63, 71, 68, 66 ], "roundRankings" : [ ], "ranking" : 7 }
{ "_id" : ObjectId("55dde55d827ff4e65b684197"), "round" : 0, "teamId" : "55a54fbfef67112c685cf78d", "teamName" : "RUCHINTER FC *", "userId" : "55a54f713711fed46621fb01", "signupDate" : ISODate("2015-07-14T18:05:37Z"), "userName" : "RAUL VERA LUGO", "signupPoints" : 0, "lfPoints" : 544, "roundPoints" : [ 84, 109, 87, 41, 84, 57, 82 ], "roundRankings" : [ ], "ranking" : 8 }
{ "_id" : ObjectId("55dde583827ff4e65b684661"), "round" : 0, "teamId" : "55a57327f82921e9222418ae", "teamName" : "SPARTA FC *", "userId" : "55a571b2f82921e9222413a2", "signupDate" : ISODate("2015-07-14T20:31:46Z"), "userName" : "Alejandro Padilla", "signupPoints" : 0, "lfPoints" : 544, "roundPoints" : [ 86, 98, 93, 29, 80, 65, 93 ], "roundRankings" : [ ], "ranking" : 9 }
{ "_id" : ObjectId("55dde592827ff4e65b684739"), "round" : 0, "teamId" : "55a59941b07aeb00512822d8", "teamName" : "Minions", "userId" : "55a59400d18e541b21766a0e", "signupDate" : ISODate("2015-07-14T22:58:08Z"), "userName" : "Carlos Zarate", "signupPoints" : 0, "lfPoints" : 541, "roundPoints" : [ 100, 91, 69, 66, 76, 68, 71 ], "roundRankings" : [ ], "ranking" : 10 }
As you can see, they have a ranking attribute, which is an integer; that attribute is NOT (and cannot be) assigned on creation.
I need to obtain all documents with round set to 0, ordered by ranking in ascending order. In other words, I have to paginate over all the documents (there are about 300K).
I know there's an approach where you query for documents with _id greater than the last result of the previous page.
That is why I tried the following indexes:
{ranking: 1, _id: 1, round: 1}
{ranking: 1, _id: 1}
So first I query the first n elements for the first page, for example:
db.ranking.find({round: 0}).sort({ranking: 1}).limit(3)
Which outputs:
{ "_id" : ObjectId("55dde5e4827ff4e65b684f94"), "round" : 0, "teamId" : "55a9c261a02911b85fdff231", "teamName" : "FORZA JUVE", "userId" : "55a9c209291bc40561cc97d2", "signupDate" : ISODate("2015-07-18T03:03:37Z"), "userName" : "Noé González Ramírez", "signupPoints" : 0, "lfPoints" : 559, "roundPoints" : [ 110, 99, 91, 65, 64, 61, 69 ], "roundRankings" : [ ], "ranking" : 1 }
{ "_id" : ObjectId("55dde591827ff4e65b6846d6"), "round" : 0, "teamId" : "55a58179f82921e922244402", "teamName" : "Dream ★ Team", "userId" : "55a5809fd18e541b2176412f", "signupDate" : ISODate("2015-07-14T21:35:27Z"), "userName" : "Fabio Dionicio López Lazcano", "signupPoints" : 0, "lfPoints" : 555, "roundPoints" : [ 105, 91, 108, 60, 53, 77, 61 ], "roundRankings" : [ ], "ranking" : 2 }
{ "_id" : ObjectId("55dde593827ff4e65b6847d3"), "round" : 0, "teamId" : "55a58889f82921e922245511", "teamName" : "Camel Toes", "userId" : "55a58868f82921e9222454b1", "signupDate" : ISODate("2015-07-14T22:08:40Z"), "userName" : "Luis Fridman Tabak", "signupPoints" : 0, "lfPoints" : 555, "roundPoints" : [ 74, 96, 99, 76, 69, 70, 71 ], "roundRankings" : [ ], "ranking" : 3 }
And then if I try to get the following page:
db.lineuppointsrecord.find({round: 0, _id: {$gt: ObjectId("55dde593827ff4e65b6847d3")}}).sort({ranking: 1}).limit(3)
it incorrectly outputs:
{ "_id" : ObjectId("55dde5e4827ff4e65b684f94"), "round" : 0, "teamId" : "55a9c261a02911b85fdff231", "teamName" : "FORZA JUVE", "userId" : "55a9c209291bc40561cc97d2", "signupDate" : ISODate("2015-07-18T03:03:37Z"), "userName" : "Noé González Ramírez", "signupPoints" : 0, "lfPoints" : 559, "roundPoints" : [ 110, 99, 91, 65, 64, 61, 69 ], "roundRankings" : [ ], "ranking" : 1 }
{ "_id" : ObjectId("55dde60a827ff4e65b6853d5"), "round" : 0, "teamId" : "55ac7ca61532c9284e2a89a7", "teamName" : "S.T.A.R.S.", "userId" : "55ac7be31532c9284e2a895c", "signupDate" : ISODate("2015-07-20T04:41:07Z"), "userName" : "Daniel Ivan Estudillo Lopez", "signupPoints" : 0, "lfPoints" : 555, "roundPoints" : [ 61, 94, 105, 73, 90, 50, 82 ], "roundRankings" : [ ], "ranking" : 4 }
{ "_id" : ObjectId("55dde5d2827ff4e65b684db3"), "round" : 0, "teamId" : "55a857b8fad3a3124ca72edf", "teamName" : "EFRASTEAM", "userId" : "55a8575dfad3a3124ca72ebd", "signupDate" : ISODate("2015-07-17T01:16:13Z"), "userName" : "EFRAÍN BARRIENTOS RODRÍGUEZ", "signupPoints" : 0, "lfPoints" : 549, "roundPoints" : [ 102, 89, 92, 58, 81, 60, 67 ], "roundRankings" : [ ], "ranking" : 5 }
The first result is the doc with ranking 1, which was already present on the first page from the initial query!
I know I could use skip and limit, but how should I perform the query for best performance?
Page based on unique values
If each ranking is unique in the collection, as in your example documents, and every required document has a ranking, you could query each time using the last ranking of the previous results e.g.:
db.ranking.find({round: 0}).sort({ranking: 1}).limit(3)
db.ranking.find({round: 0, ranking : {$gt : 3}}).sort({ranking: 1}).limit(3)
db.ranking.find({round: 0, ranking : {$gt : 6}}).sort({ranking: 1}).limit(3)
If ranking is not unique in the collection, i.e. if there can be ties where multiple teams have the same ranking, this approach won't work, as it is dependant on the next ranking always being greater than the previous one. If you try to use it, you could end up skipping documents.
This solution can use the index { round: 1, ranking: 1 }
Range based paging
If ranking is unique, always incremented by 1 and there are no gaps, you could even query by range as you know the rankings you expect:
db.ranking.find({round: 0, ranking : {$gte : 1, $lte: 3 }})
db.ranking.find({round: 0, ranking : {$gte : 4, $lte: 6 }})
db.ranking.find({round: 0, ranking : {$gte : 7, $lte: 9 }})
If ranking is not unique in the collection, you can still do this but would need to handle the fact that it might result in pages of varying size or even empty pages, especially for small page sizes.
This solution can also use the index { round: 1, ranking: 1 }
Paging over non-unique values
For completeness, here is a query that handles paging over non-unique rankings and is covered by an index. It needs to check that the ranking is greater, OR that the ranking is the same and the _id is greater. I'd recommend the other solutions where possible, as both the required index and the query are much simpler.
db.ranking.createIndex({round: 1, ranking: 1, _id: 1})
db.ranking.find({$or : [
    {round: 0, ranking : { $gt : 3}},
    {round: 0, ranking : 3, _id : { $gt : new ObjectId("55dde593827ff4e65b6847d3")}}
]})
.sort({round: 1, ranking: 1, _id: 1})
.limit(3)
You can sort by multiple fields: first by round, then by ranking. This will sort all of round 0 by ranking, then round 1 by ranking, then round 2 by ranking, and so on.
You just specify multiple fields in the sort object like so:
db.ranking.find({}).sort({round: 1, ranking: 1})
Then you can paginate like normal.
As others mentioned, you can (and should here) sort by multiple fields.
In your original query you have
.sort({ranking: 1})
which leaves the order of _id undefined. But you actually rely on that order! So make it explicit:
.sort({ranking: 1, _id: 1})
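The tie-breaking query from the accepted approach can be generated from the last document of the previous page. A sketch that builds the filter as plain Python dicts (the driver call itself is omitted; it pairs with the sort {round: 1, ranking: 1, _id: 1} and the matching compound index):

```python
def next_page_filter(round_no, last_ranking, last_id):
    # Seek-style pagination keyed on (ranking, _id): either the ranking
    # strictly increased, or it tied and the _id breaks the tie.
    return {"$or": [
        {"round": round_no, "ranking": {"$gt": last_ranking}},
        {"round": round_no, "ranking": last_ranking,
         "_id": {"$gt": last_id}},
    ]}
```

Unlike skip(), this stays O(page size) per request no matter how deep you page, because each query seeks directly to the continuation point in the index.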

Probit model in winbugs

I conducted an analysis using a logit model and now want to do the same using a probit model. Can anyone turn this WinBUGS logit model into a WinBUGS probit model?
model
{
    for (i in 1:n) {
        # Linear regression on logit
        logit(p[i]) <- alpha + b.sex*sex[i] + b.age*age[i]
        # Likelihood function for each data point
        frac[i] ~ dbern(p[i])
    }
    alpha ~ dnorm(0.0, 1.0E-4)  # Prior for intercept
    b.sex ~ dnorm(0.0, 1.0E-4)  # Prior for slope of sex
    b.age ~ dnorm(0.0, 1.0E-4)  # Prior for slope of age
}
Data
list(sex=c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,
1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1,
0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1),
age= c(69, 57, 61, 60, 69, 74, 63, 68, 64, 53, 60, 58, 79, 56, 53, 74, 56, 76, 72,
56, 66, 52, 77, 70, 69, 76, 72, 53, 69, 59, 73, 77, 55, 77, 68, 62, 56, 68, 70, 60,
65, 55, 64, 75, 60, 67, 61, 69, 75, 68, 72, 71, 54, 52, 54, 50, 75, 59, 65, 60, 60,
57, 51, 51, 63, 57, 80, 52, 65, 72, 80, 73, 76, 79, 66, 51, 76, 75, 66, 75, 78, 70,
67, 51, 70, 71, 71, 74, 74, 60, 58, 55, 61, 65, 52, 68, 75, 52, 53, 70),
frac=c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
1, 0, 1, 1, 0, 0, 1, 0, 0, 1),
n=100)
Initial Values
list(alpha=0, b.sex=1, b.age=1)
WinBUGS accepts multiple types of link functions (see page 15 in the WinBUGS manual). For a probit model, change your linear regression equation to:
probit(p[i]) <- alpha + b.sex*sex[i] + b.age*age[i]
I would recommend centering the age variable, otherwise you may well run into convergence problems; so something like:
probit(p[i]) <- alpha + b.sex*sex[i] + b.age*(age[i] - mean(age[]))
Alternatively, for a probit model (if the probit link gives you trap errors), you could use phi, the standard normal CDF function:
p[i] <- phi(alpha + b.sex*sex[i] + b.age*(age[i] - mean(age[])))
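For reference, the phi() in that last line is just the standard normal CDF, so probit(p) <- eta and p <- phi(eta) describe the same model. A quick Python check of the function (using the error-function identity; this is a numerical illustration, not WinBUGS code):

```python
import math

def phi(x):
    # Standard normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(phi(0.0))   # 0.5 by symmetry
print(phi(1.96))  # ~0.975, the familiar 95% quantile
```

Compared to the logistic CDF used by logit(), phi() has lighter tails, so probit coefficients come out on a different scale (roughly logit estimates divided by 1.6-1.8), even though the fitted probabilities are usually very close.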