Why does `{"$numberInt": "1"}` need to be converted to `1` before restoring data with `mongorestore`? - mongodb

I have some dumped BSON and JSON files from a MongoDB server running on Google Cloud Platform (GCP), and I want to restore the data into a new local server running version 4.0.3. However, I got errors showing that the indexes could not be restored. I had to convert {"$numberInt": "1"} in the JSON files to 1 to make the restore succeed. Why do I need to go to the trouble of fixing the format of the dumped files? Is it due to the different versions of the source and target servers, or due to something I did not do correctly?
I have googled and searched Stack Overflow, but I did not find anyone discussing this problem. The MongoDB release notes also do not mention any changes related to it.
Here is a JSON example that mongorestore version 4.0.3 cannot accept:
{
    "options": {},
    "indexes": [
        {
            "v": {
                "$numberInt": "2"
            },
            "key": {
                "_id": {
                    "$numberInt": "1"
                }
            },
            "name": "_id_",
            "ns": "demo.item"
        },
        {
            "v": {
                "$numberInt": "2"
            },
            "key": {
                "itemId": {
                    "$numberDouble": "1.0"
                }
            },
            "name": "itemId_1",
            "ns": "demo.item"
        }
    ],
    "uuid": "8ce4755612da4d048b0fd38a793f2b55"
}
And this is the accepted one, which I converted myself:
{
    "options": {},
    "indexes": [
        {
            "v": 2,
            "key": {
                "_id": 1
            },
            "name": "_id_",
            "ns": "demo.item"
        },
        {
            "v": 2,
            "key": {
                "itemId": 1.0
            },
            "name": "itemId_1",
            "ns": "demo.item"
        }
    ],
    "uuid": "8ce4755612da4d048b0fd38a793f2b55"
}
And here is the script I use to do the conversion.
Questions:
Why does mongorestore not accept the dump files created by mongodump?
Is there any way to avoid modifying the dumped files manually?

You need to use mongorestore version 4.2+, which supports the Extended JSON v2.0 format (Canonical or Relaxed mode). See reference here.
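If upgrading is not possible, a rough workaround is to strip the Extended JSON v2 numeric wrappers out of the dumped *.metadata.json files before restoring with 4.0.3. Below is a minimal sketch of that idea (the set of wrappers handled and the file handling are assumptions for illustration, not the asker's own script):

import json
import sys

# Extended JSON v2 numeric wrappers and the plain types to convert them to.
WRAPPERS = {"$numberInt": int, "$numberLong": int, "$numberDouble": float}

def unwrap(value):
    # Recursively replace {"$numberInt": "1"}-style wrappers with plain numbers.
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in WRAPPERS:
                return WRAPPERS[key](inner)
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value

if __name__ == "__main__":
    # Usage: python unwrap_metadata.py dump/demo/item.metadata.json
    path = sys.argv[1]
    with open(path) as f:
        doc = json.load(f)
    with open(path, "w") as f:
        json.dump(unwrap(doc), f)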

Related

MongoDB update_one vs update_many - Improve speed

I have a collection of ca. 10,000 docs, where each doc has the following format:
{
"_id": {
"$oid": "631edc6e207c89b932a70a26"
},
"name": "Ethereum",
"auditInfoList": [
{
"coinId": "1027",
"auditor": "Fairyproof",
"auditStatus": 2,
"reportUrl": "https://www.fairyproof.com/report/Covalent"
}
],
"circulatingSupply": 122335921.0615,
"cmcRank": 2,
"dateAdded": "2015-08-07T00:00:00.000Z",
"id": 1027,
"isActive": 1,
"isAudited": true,
"lastUpdated": 1662969360,
"marketPairCount": 6085,
"quotes": [
{
"name": "USD",
"price": 1737.1982544180462,
"volume24h": 14326453277.535921,
"marketCap": 212521748520.66168,
"percentChange1h": 0.62330307,
"percentChange24h": -1.08847937,
"percentChange7d": 10.96517745,
"lastUpdated": 1662966780,
"percentChange30d": -13.49374496,
"percentChange60d": 58.25153862,
"percentChange90d": 42.27475921,
"fullyDilluttedMarketCap": 212521748520.66,
"marketCapByTotalSupply": 212521748520.66168,
"dominance": 20.0725,
"turnover": 0.0674117,
"ytdPriceChangePercentage": -53.9168
}
],
"selfReportedCirculatingSupply": 0,
"slug": "ethereum",
"symbol": "ETH",
"tags": [
"mineable",
"pow",
"smart-contracts",
"ethereum-ecosystem",
"coinbase-ventures-portfolio",
"three-arrows-capital-portfolio",
"polychain-capital-portfolio",
"binance-labs-portfolio",
"blockchain-capital-portfolio",
"boostvc-portfolio",
"cms-holdings-portfolio",
"dcg-portfolio",
"dragonfly-capital-portfolio",
"electric-capital-portfolio",
"fabric-ventures-portfolio",
"framework-ventures-portfolio",
"hashkey-capital-portfolio",
"kenetic-capital-portfolio",
"huobi-capital-portfolio",
"alameda-research-portfolio",
"a16z-portfolio",
"1confirmation-portfolio",
"winklevoss-capital-portfolio",
"usv-portfolio",
"placeholder-ventures-portfolio",
"pantera-capital-portfolio",
"multicoin-capital-portfolio",
"paradigm-portfolio",
"injective-ecosystem"
],
"totalSupply": 122335921.0615
}
I'm pulling an updated version of it and, to avoid duplicates, I'm doing the following using update_one:
for doc in new_doc_list:
    CRYPTO_TEMPORARY_LIST.update_one(
        { "name" : doc['name']},
        { "$set": {
            "lastUpdated": doc['lastUpdated']
            }
        },
        upsert=True)
The problem is it's too slow.
I'm trying to figure out how to improve speed by using update_many but can't figure out how to set it up.
I basically want to update every document by name. Completely replacing the doc, and not just the "lastUpdated" field, would be even better.
Thanks guys <3
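A hedged sketch of one common way to speed this up: since every document needs its own filter and values, update_many cannot express this in a single call, but pymongo's bulk_write lets you send all the per-document upserts in one batch. The connection details below are placeholders; the collection and list names come from the question:

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")            # hypothetical connection
CRYPTO_TEMPORARY_LIST = client["crypto"]["temporary_list"]   # hypothetical db/collection names
new_doc_list = []  # the freshly pulled documents, as in the question

operations = [
    UpdateOne(
        {"name": doc["name"]},
        # Replace every field except _id, not just lastUpdated.
        {"$set": {k: v for k, v in doc.items() if k != "_id"}},
        upsert=True,
    )
    for doc in new_doc_list
]

if operations:
    result = CRYPTO_TEMPORARY_LIST.bulk_write(operations, ordered=False)
    print(result.upserted_count, result.modified_count)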

Poor Write perfomance with MongoDB 5.0.8 in a PSA (Primary-Secondary-Arbiter) setup

I have some write-performance struggles with MongoDB 5.0.8 in a PSA (Primary-Secondary-Arbiter) deployment when one data-bearing member goes down.
I am aware of the "Mitigate Performance Issues with PSA Replica Set" page and the procedure to temporarily work around this issue.
However, in my opinion, the manual intervention described here should not be necessary during operation. So what can I do to ensure that the system continues to run efficiently even if a node fails? In other words, as in MongoDB 4.x with the option "enableMajorityReadConcern=false".
As I understand it, the problem has something to do with the defaultRWConcern. When configuring a PSA replica set in MongoDB you are forced to set the defaultRWConcern; otherwise the following message appears when rs.addArb is called:
MongoServerError: Reconfig attempted to install a config that would
change the implicit default write concern. Use the setDefaultRWConcern
command to set a cluster-wide write concern and try the reconfig
again.
So I did
db.adminCommand({
    "setDefaultRWConcern": 1,
    "defaultWriteConcern": {
        "w": 1
    },
    "defaultReadConcern": {
        "level": "local"
    }
})
I would expect that this configuration causes no lag when reading/writing to a PSA System with only one data bearing node available.
But I observe "slow query" messages in the mongod log like this one:
{
"t": {
"$date": "2022-05-13T10:21:41.297+02:00"
},
"s": "I",
"c": "COMMAND",
"id": 51803,
"ctx": "conn149",
"msg": "Slow query",
"attr": {
"type": "command",
"ns": "<db>.<col>",
"command": {
"insert": "<col>",
"ordered": true,
"txnNumber": 4889253,
"$db": "<db>",
"$clusterTime": {
"clusterTime": {
"$timestamp": {
"t": 1652430100,
"i": 86
}
},
"signature": {
"hash": {
"$binary": {
"base64": "bEs41U6TJk/EDoSQwfzzerjx2E0=",
"subType": "0"
}
},
"keyId": 7096095617276968965
}
},
"lsid": {
"id": {
"$uuid": "25659dc5-a50a-4f9d-a197-73b3c9e6e556"
}
}
},
"ninserted": 1,
"keysInserted": 3,
"numYields": 0,
"reslen": 230,
"locks": {
"ParallelBatchWriterMode": {
"acquireCount": {
"r": 2
}
},
"ReplicationStateTransition": {
"acquireCount": {
"w": 3
}
},
"Global": {
"acquireCount": {
"w": 2
}
},
"Database": {
"acquireCount": {
"w": 2
}
},
"Collection": {
"acquireCount": {
"w": 2
}
},
"Mutex": {
"acquireCount": {
"r": 2
}
}
},
"flowControl": {
"acquireCount": 1,
"acquireWaitCount": 1,
"timeAcquiringMicros": 982988
},
"readConcern": {
"level": "local",
"provenance": "implicitDefault"
},
"writeConcern": {
"w": 1,
"wtimeout": 0,
"provenance": "customDefault"
},
"storage": {},
"remote": "10.10.7.12:34258",
"protocol": "op_msg",
"durationMillis": 983
}
The collection involved here is under proper load with about 1000 reads and 1000 writes per second from different (concurrent) clients.
MongoDB 4.x with "enableMajorityReadConcern=false" performed normally here, and I did not notice any loss of performance in my application. MongoDB 5.x cannot manage that, and in my application data piles up that I cannot write away in a performant way.
So my question is whether I can get the MongoDB 4.x behaviour back. A write guarantee from the single data-bearing node that is still available in the failure scenario would be OK for me. But having to manually reconfigure the faulty node in a failure scenario should really be avoided.
Thanks for any advice!
In the end we changed the setup to a PSS layout.
This was also recommended in the MongoDB Community Forum.

MongoDB Query doesn't return with a sort

I have the query:
db.changes.find(
    {
        $or: [
            { _id: ObjectId("60b1e8dc9d0359001bb80441") },
            { _oid: ObjectId("60b1e8dc9d0359001bb80441") },
        ],
    },
    {
        _id: 1,
    }
);
which returns almost instantly.
But the moment I add a sort, the query doesn't return. The query just runs. The longest I could tolerate the query running was over 30 minutes, so I'm not entirely sure whether it eventually returns.
db.changes
    .find(
        {
            $or: [
                { _id: ObjectId("60b1e8dc9d0359001bb80441") },
                { _oid: ObjectId("60b1e8dc9d0359001bb80441") },
            ],
        },
        {
            _id: 1,
        }
    )
    .sort({ _id: -1 });
I have the following indexes:
[
    {
        "_oid" : 1
    },
    {
        "_id" : 1
    }
]
and this is what db.currentOp() returns:
{
"host": "xxxx:27017",
"desc": "conn387",
"connectionId": 387,
"client": "xxxx:55802",
"appName": "MongoDB Shell",
"clientMetadata": {
"application": {
"name": "MongoDB Shell"
},
"driver": {
"name": "MongoDB Internal Client",
"version": "4.0.5-18-g7e327a9017"
},
"os": {
"type": "Linux",
"name": "Ubuntu",
"architecture": "x86_64",
"version": "20.04"
}
},
"active": true,
"currentOpTime": "2021-09-24T15:26:54.286+0200",
"opid": 71111,
"secs_running": NumberLong(23),
"microsecs_running": NumberLong(23860504),
"op": "query",
"ns": "myDB.changes",
"command": {
"find": "changes",
"filter": {
"$or": [
{
"_id": ObjectId("60b1e8dc9d0359001bb80441")
},
{
"_oid": ObjectId("60b1e8dc9d0359001bb80441")
}
]
},
"sort": {
"_id": -1.0
},
"projection": {
"_id": 1.0
},
"lsid": {
"id": UUID("38c4c09b-d740-4e44-a5a5-b17e0e04f776")
},
"$readPreference": {
"mode": "secondaryPreferred"
},
"$db": "myDB"
},
"numYields": 1346,
"locks": {
"Global": "r",
"Database": "r",
"Collection": "r"
},
"waitingForLock": false,
"lockStats": {
"Global": {
"acquireCount": {
"r": NumberLong(2694)
}
},
"Database": {
"acquireCount": {
"r": NumberLong(1347)
}
},
"Collection": {
"acquireCount": {
"r": NumberLong(1347)
}
}
}
}
This wasn't always a problem; it only started recently. I've also rebuilt the indexes, and nothing seems to work. I've tried using .explain(), and that also doesn't return.
Any suggestions would be welcome. For my situation, it's going to be much easier to make changes to the DB than it is to change the query.
This is happening due to the way Mongo chooses what's called a "winning plan"; I recommend you read more on this in my other answer, which explains this behavior. However, it will be interesting to see whether the Mongo team considers this specific behavior a feature or a bug.
Basically the $or operator has some special qualities, as specified:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
It seems that the addition of the sort is disrupting the use of this quality, meaning you're suddenly running a collection scan.
What I recommend you do is use the aggregation pipeline instead of the query language; I personally find it has more stable behavior, and it might work there. If not, maybe just do the sorting in code.
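To illustrate that suggestion, the same query expressed as an aggregation pipeline would look roughly like this (sketched with pymongo for brevity; the shell form is analogous, and the connection details are placeholders):

from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection
changes = client["myDB"]["changes"]

target = ObjectId("60b1e8dc9d0359001bb80441")
cursor = changes.aggregate([
    {"$match": {"$or": [{"_id": target}, {"_oid": target}]}},
    {"$sort": {"_id": -1}},
    {"$project": {"_id": 1}},
])
for doc in cursor:
    print(doc)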
The server can use a separate index for each branch of the $or, but in order to avoid an in-memory sort, the indexes used would have to find the documents already in the sort order, so that a merge sort can be used instead.
For this query, an index on {_id: 1} would find documents matching the first branch and return them in the proper order. For the second branch, an index on {_oid: 1, _id: 1} would do the same.
If you have both of those indexes, the server should be able to find the matching documents quickly, and return them without needing to perform an explicit sort.
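A minimal sketch of creating that second index (shown with pymongo; the default {_id: 1} index already covers the first branch, and names are taken from the question):

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection
changes = client["myDB"]["changes"]

# For the _oid equality branch, this compound index also returns the
# matching documents already ordered by _id, enabling a merge sort.
changes.create_index([("_oid", ASCENDING), ("_id", ASCENDING)])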

Hyperledger IROHA get_acc_ast_tx in CLI mode doesn't work

I just finished the Pluralsight course and completed the tutorial in the official project documentation without problems. Nevertheless, using the CLI I could not use the functions get_acc_ast_tx and get_acc_tx. I checked that the peer keys and the configuration files correspond to the genesis file, where admin#test is allowed to use these functions, and I get:
[2019-12-08 04:55:57.883070400] [E] [CLI/ResponseHandler/Query]: Query is stateless invalid.
The genesis file I use is the initial one of the git repository:
{
"blockV1": {
"payload": {
"transactions": [{
"payload": {
"reducedPayload": {
"commands": [{
"addPeer": {
"peer": {
"address": "127.0.0.1:10001",
"peerKey": "bddd58404d1315e0eb27902c5d7c8eb0602c16238f005773df406bc191308929"
}
}
}, {
"createRole": {
"roleName": "admin",
"permissions": ["can_add_peer", "can_add_signatory", "can_create_account", "can_create_domain", "can_get_all_acc_ast", "can_get_all_acc_ast_txs", "can_get_all_acc_detail", "can_get_all_acc_txs", "can_get_all_accounts", "can_get_all_signatories", "can_get_all_txs", "can_get_blocks", "can_get_roles", "can_read_assets", "can_remove_signatory", "can_set_quorum"]
}
}, {
"createRole": {
"roleName": "user",
"permissions": ["can_add_signatory", "can_get_my_acc_ast", "can_get_my_acc_ast_txs", "can_get_my_acc_detail", "can_get_my_acc_txs", "can_get_my_account", "can_get_my_signatories", "can_get_my_txs", "can_grant_can_add_my_signatory", "can_grant_can_remove_my_signatory", "can_grant_can_set_my_account_detail", "can_grant_can_set_my_quorum", "can_grant_can_transfer_my_assets", "can_receive", "can_remove_signatory", "can_set_quorum", "can_transfer"]
}
}, {
"createRole": {
"roleName": "money_creator",
"permissions": ["can_add_asset_qty", "can_create_asset", "can_receive", "can_transfer"]
}
}, {
"createDomain": {
"domainId": "test",
"defaultRole": "user"
}
}, {
"createAsset": {
"assetName": "coin",
"domainId": "test",
"precision": 2
}
}, {
"createAccount": {
"accountName": "admin",
"domainId": "test",
"publicKey": "313a07e6384776ed95447710d15e59148473ccfc052a681317a72a69f2a49910"
}
}, {
"createAccount": {
"accountName": "test",
"domainId": "test",
"publicKey": "716fe505f69f18511a1b083915aa9ff73ef36e6688199f3959750db38b8f4bfc"
}
}, {
"appendRole": {
"accountId": "admin#test",
"roleName": "admin"
}
}, {
"appendRole": {
"accountId": "admin#test",
"roleName": "money_creator"
}
}],
"quorum": 1
}
}
}],
"txNumber": 1,
"height": "1",
"prevBlockHash": "0000000000000000000000000000000000000000000000000000000000000000"
}
}
}
I use the Hyperledger Docker image, on macOS Catalina.
I followed the tutorial according to this manual: https://iroha.readthedocs.io/en/latest/build/index.html
Thank you very much for the help.
Unfortunately, the CLI is rather outdated – we are working on a new solution for it, but meanwhile it is better to use one of the available SDKs – for Java, Python, JS or iOS (if you prefer mobile development).
All of them contain examples, so it should not be too tricky to use those. Although, if you encounter any issues, please contact us using one of the chats here.
This is due to the outdated CLI. A newer version is being developed to replace it, but it is not ready yet.
The exact problem is that pagination metadata was added for these queries in Iroha, but the CLI was not updated to set it properly. The Protobuf transport allows the CLI to send a query without some fields that were added later, but Iroha refuses to handle it.
You can use one of client libraries that are always kept up to date: https://iroha.readthedocs.io/en/latest/develop/libraries.html.

How to update the Embedded Data which is inside of another Embedded Data?

I have a document like the one below in MongoDB:
{
    "_id": "test",
    "tasks": [
        {
            "Name": "Task1",
            "Parameter": [
                {
                    "Name": "para1",
                    "Type": "String",
                    "Value": "*****"
                },
                {
                    "Name": "para2",
                    "Type": "String",
                    "Value": "*****"
                }
            ]
        },
        {
            "Name": "Task2",
            "Parameter": [
                {
                    "Name": "para1",
                    "Type": "String",
                    "Value": "*****"
                },
                {
                    "Name": "para2",
                    "Type": "String",
                    "Value": "*****"
                }
            ]
        }
    ]
}
There is an embedded data structure (Parameter) inside another embedded data structure (tasks). Now I want to update para1 in Task1's Parameter.
I have tried many ways, but I can only use the query tasks.Parameter.Name to find para1; I cannot update it. The examples in the docs use .$. to update a value in an embedded data structure, but that doesn't work in my case.
Anyone have any ideas?
MongoDB currently only supports the positional operator once, and only for the top-level array. There is a ticket, SERVER-831, to change this behavior for your use case. You can follow the issue there and upvote it.
However, you might be able to change your approach to accomplish what you want to do. One way is to change your schema: collapse the task name into the array elements so the document looks like this:
{
    _id: test,
    tasks: [
        {
            Task: 1,
            Name: para1,
            Type: String,
            Value: *****
        },
        {
            Task: 1,
            Name: para2,
            Type: String,
            Value: *****
        },
        {
            Task: 2,
            Name: para1,
            Type: String,
            Value: *****
        },
        {
            Task: 2,
            Name: para2,
            Type: String,
            Value: *****
        }
    ]
}
Another approach that may work for you is to use $pull and $push. For instance, something like this to replace a Parameter entry (this assumes that tasks.Parameter.Name is unique within an array of Parameters):
db.test2.update({$and: [{"tasks.Name": "Task3"}, {"tasks.Parameter.Name":"para1"}]}, {$pull: {"tasks.$.Parameter": {"Name": "para1"}}})
db.test2.update({"tasks.Name": "Task3"}, {$push: {"tasks.$.Parameter": {"Name": "para3", Type: "String", Value: 1}}})
With this solution you need to be careful with regard to concurrency, as there will be a brief moment where the document doesn't exist.
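For what it's worth, on MongoDB 3.6+ the filtered positional operator $[<identifier>] with arrayFilters can target the nested element directly, avoiding the pull/push dance. A minimal sketch with pymongo (field names are taken from the question; connection and database names are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection
coll = client["mydb"]["test"]                      # hypothetical db/collection names

# "t" matches the task element, "p" matches the parameter element inside it.
coll.update_one(
    {"_id": "test"},
    {"$set": {"tasks.$[t].Parameter.$[p].Value": "newValue"}},
    array_filters=[{"t.Name": "Task1"}, {"p.Name": "para1"}],
)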