MongoDB to Cosmos DB index migration

Is there a way to migrate the indexes from MongoDB to Cosmos DB automatically? I've read that people have to do it by hand, but isn't it possible to use a tool or script, or is there any resource to read about it? I have about 200 collections to migrate, each with several indexes, so doing it manually will take me a lot of time.

You can export all the index-creation commands from the source database and then execute them in the new database, as this answer describes.
The script below loops through all the collections and constructs a createIndexes run command for each one. Copy the printed commands and execute them in the new database.
var database = 'my_new_db' // SHOULD ALWAYS MATCH DESTINATION DB NAME
db.getCollectionNames().forEach(function(collection){
    var command = {}
    var indexes = []
    var idxs = db.getCollection(collection).getIndexes()
    if(idxs.length > 1){
        idxs.forEach(function(idoc){
            if(idoc.name != '_id_'){ // skip the default _id index; it is created automatically
                // point the index's namespace at the destination database
                var ns = database + "." + idoc.ns.substr(idoc.ns.indexOf('.') + 1)
                idoc.ns = ns
                indexes.push(idoc)
            }
        })
        command['createIndexes'] = collection
        command['indexes'] = indexes
        print('db.runCommand(')
        printjson(command)
        print(')')
    }
})
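The per-collection transformation the loop performs can also be isolated as a plain function, which makes it easy to check outside the shell. A sketch; the sample index documents below are hypothetical stand-ins for what getIndexes() returns:

```javascript
// Build a createIndexes command from the output of getIndexes(),
// retargeting each index's namespace at the destination database.
// The default _id_ index is skipped: every collection gets it automatically.
function buildCreateIndexesCommand(collection, indexDocs, destDb) {
  const indexes = indexDocs
    .filter(idoc => idoc.name !== '_id_')
    .map(idoc => Object.assign({}, idoc, {
      ns: destDb + '.' + idoc.ns.substr(idoc.ns.indexOf('.') + 1)
    }));
  return { createIndexes: collection, indexes: indexes };
}

// Hypothetical sample: what getIndexes() might print for a "users" collection
const sample = [
  { v: 2, key: { _id: 1 }, name: '_id_', ns: 'old_db.users' },
  { v: 2, key: { email: 1 }, name: 'email_1', ns: 'old_db.users', unique: true }
];
const cmd = buildCreateIndexesCommand('users', sample, 'my_new_db');
```

The returned object is exactly what the script above feeds to db.runCommand().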
Note that unique indexes can only be created while the collection is still empty, as this doc says:
Azure Database Migration Service automatically migrates MongoDB
collections with unique indexes. However, the unique indexes must be
created before the migration. Azure Cosmos DB does not support the
creation of unique indexes, when there is already data in your
collections. For more information, see Unique keys in Azure Cosmos DB.

Related

Azure MongoDB + Synapse Link: Column values unexpectedly contain JSON

I'm following this tutorial and have created an Azure MongoDB (Mongo API) and a Synapse Workspace, imported the ECDC data into MongoDB and connected it in Synapse Workspace. So far so good.
However, when I query the data, e.g. the string column date_rep, I get
{"string":"2020-12-14"} instead of just 2020-12-14
The query I'm using is:
SELECT TOP 10 *
FROM OPENROWSET(PROVIDER = 'CosmosDB',
                CONNECTION = 'Account=analytcstest;Database=ecdc',
                OBJECT = 'ecds',
                SERVER_CREDENTIAL = 'analytcstest'
) WITH ( date_rep varchar(200) ) AS rows
When I don't specify the "with" clause and let it infer the schema automatically, I have the same problem:
SELECT TOP 10 *
FROM OPENROWSET(PROVIDER = 'CosmosDB',
                CONNECTION = 'Account=analytcstest;Database=ecdc',
                OBJECT = 'ecds',
                SERVER_CREDENTIAL = 'analytcstest'
) AS rows
I could parse it, of course, like this, but I don't understand why I have to do that; it isn't mentioned in the docs.
SELECT TOP 10 JSON_VALUE([date_rep], '$.string') AS [date_rep]
FROM OPENROWSET(PROVIDER = 'CosmosDB',
                CONNECTION = 'Account=analytcstest;Database=ecdc',
                OBJECT = 'ecds',
                SERVER_CREDENTIAL = 'analytcstest'
) WITH ( date_rep varchar(200) ) AS rows
I tried to reproduce this in my environment and got similar output, with each value wrapped in an object named after its data type, when I used Cosmos DB for MongoDB API.
This happens because the Azure Cosmos DB API for MongoDB stores data in a document structure via BSON, a binary-encoded serialization of JSON documents. BSON extends JSON with some optional non-JSON-native data types, such as dates and binary data.
The Cosmos DB API for MongoDB is meant to give you the MongoDB experience: you continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application at the API for MongoDB account's connection string.
When I tried the same thing with the Cosmos DB SQL API, which stores data in JSON format, it returned the expected result.
For more information on BSON, refer to this document.
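If you would rather unwrap the typed values client-side than in SQL, a small helper can flatten the wrapped shape. This is a sketch under an assumption: only the "string" tag is confirmed by the output in the question; the other type-tag names are guesses at how non-string BSON types would be wrapped.

```javascript
// Unwrap a value stored in the typed form seen in the question,
// e.g. {"string": "2020-12-14"} -> "2020-12-14".
// Type tags other than "string" are assumptions, not confirmed names.
function unwrapTypedValue(value) {
  const typeTags = ['string', 'int32', 'int64', 'float64', 'bool', 'date', 'objectId'];
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value);
    if (keys.length === 1 && typeTags.includes(keys[0])) {
      return value[keys[0]];
    }
  }
  return value; // already a plain value: pass through unchanged
}

const unwrapped = unwrapTypedValue({ string: '2020-12-14' });
const passthrough = unwrapTypedValue('already-plain');
```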

Mongoose schema definition [duplicate]

This question already has an answer here:
Why does mongoose use schema when mongodb's benefit is supposed to be that it's schema-less?
(1 answer)
Closed 5 years ago.
I am a beginner with MongoDB, trying to learn the MEAN stack, so I am using Mongoose as the ORM.
I read that MongoDB is a NoSQL database, but while using Mongoose as the ORM I am asked to create a schema first. Why is that? Ideally there shouldn't be a schema, since MongoDB is a NoSQL database.
Thanks in advance.
Mongoose is an ORM on top of MongoDB. If you are using core MongoDB you need not create any schema; you can just dump any data you want. With Mongoose you have a schema so that you have some basic key-value pairs for advanced searching and filtering, and you can update the schema at any time. If you want to go schemaless and dump whatever the response is, you can use a schema type like var someSchema = {data: Object}, drop all your data into that data key, and then easily extract whatever JSON data is inside it.
var mongoose = require('mongoose');

module.exports = mongoose.model('twitter', {
    created_at: {
        type: Date
    },
    dump: {
        type: Object
    }
});
In the above example, dump is used to save whatever JSON I get as a response from the Twitter API, and created_at contains only the creation date of the tweet. I still keep the entire data, but if I want to search tweets of a particular date I can run a find query on created_at, which will be a lot faster, and I have a fixed structure and know what to expect from a find query each time I run one. So one benefit of using the Mongoose ORM is that I don't lose data but can maximise my searching ability by creating appropriate keys.
So basically Mongoose offers you relational-style features on top of MongoDB, such as something like foreign keys: not strictly foreign keys, but you can create an id reference to another schema and later populate that field with the referenced document when you fetch data using a find query. A relational-style schema is also easy to manage. What Mongoose does is give a JSON/BSON-based database some of the power of a relational database, so you get the best of both worlds: you can easily maintain new keys, you don't need to worry about extracting every piece of data from your operation and placing it properly, you just need to see that your keys and values match, and you keep flexibility in update operations while still having a schema or table structure.
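The populate mechanism mentioned above is effectively an application-level join: look up the referenced documents by id and substitute them into the field. A plain-JavaScript sketch of the idea, with hypothetical tweet/user shapes (Mongoose does this for you via Schema refs and .populate()):

```javascript
// Simulate what populate() does: replace an id reference in each document
// with the referenced document, looked up from another "collection".
function populateField(docs, field, referencedDocs) {
  const byId = new Map(referencedDocs.map(r => [r._id, r]));
  return docs.map(d => Object.assign({}, d, { [field]: byId.get(d[field]) }));
}

// Hypothetical data: tweets referencing users by _id
const users = [{ _id: 'u1', name: 'alice' }];
const tweets = [{ _id: 't1', author: 'u1', text: 'hello' }];
const populated = populateField(tweets, 'author', users);
```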

Update Single Field on Domain Save

The save() method seems to replace an entire document/record in my database. If I know the primary key of a document in my database, what is the best way in Grails for me to update a single field on that document without first querying the database using get()?
For instance, I don't like the following code because it executes two queries when all I want to do is to update myField.
def key = "foo"
def doc = MyDomain.get(key) // This is the query I want to eliminate
doc.myField = "bar"
doc.save()
In situations where I know the primary key, I want to simply update a single field, similar to how Ruby on Rails's ActiveRecord update_attribute() method works.
Even though my specific database is MongoDB, I think the question is applicable to any database, SQL or NoSQL. If the database supports the ability to update just a single field on one record via one query, using Grails can I avoid the extra get() query to retrieve a full record from the database?
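Since the underlying database here is MongoDB, it is worth noting what the single-query version looks like at the driver level: an updateOne with a $set operator touches only the named field and never loads the document first. A sketch that builds that operation (the collection and field names come from the question; how to reach the native collection from Grails depends on the GORM version):

```javascript
// Build the filter/update pair for a single-field update. Passing these to
// updateOne(filter, update) on the native collection modifies only "myField"
// in one round trip, with no prior get()/find().
function singleFieldUpdate(id, field, value) {
  return {
    filter: { _id: id },
    update: { $set: { [field]: value } }
  };
}

const op = singleFieldUpdate('foo', 'myField', 'bar');
```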

Insert all documents from one collection into another collection in MongoDB database

I have a python script that collects data every day and inserts it into a MongoDB collection (~10M documents). Sometimes the job fails and I am left with partial data, which is not useful to me. I would like to insert the data into a staging collection first and then copy or move all documents from the staging collection into the final collection only when the job finishes and the data is complete. I cannot seem to find a straightforward solution for doing this as a "bulk" type operation, but it seems there should be one.
In SQL it would be something like this:
INSERT INTO final_table
SELECT *
FROM staging_table
I thought that db.collection.copyTo() would work for this but it seems it makes the destination collection a clone of the source collection.
Additionally, I know from this: mongodb move documents from one collection to another collection that I can do something like the following:
var documentsToMove = db.collectionA.find({});
documentsToMove.forEach(function(doc) {
    db.collectionB.insert(doc);
});
But it seems like there should be a more efficient way.
So, How can I take all documents from one collection and insert them into another collection in the most efficient manner?
NOTE: the final collection has data in it already. The new documents that I want to move over would be adding to this data, e.g if my staging collection has 2 documents and my final collection has 10 documents, I would have 12 documents in my final collection after I move the staging data over.
You can use db.cloneCollection(); see the MongoDB cloneCollection documentation.
If you no longer need the staging collection, you can simply rename it:
switch to the admin db and run
db.runCommand({renameCollection:"staging.CollectionA",to:"targetdb.CollectionB"})
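On newer servers (MongoDB 4.2+), an aggregation $merge stage can also append the staging documents into the existing target collection server-side, without pulling them through the client, which fits the "final collection already has data" requirement. A sketch of the pipeline document, using the collection names from the question:

```javascript
// $merge writes the pipeline's output into the target collection, inserting
// documents whose _id does not exist there yet. Run it in the shell as:
//   db.collectionA.aggregate(mergePipeline('collectionB'))
function mergePipeline(target) {
  return [{ $merge: { into: target, whenNotMatched: 'insert' } }];
}

const pipeline = mergePipeline('collectionB');
```

Unlike renameCollection, this leaves the staging collection in place, so you can drop it separately once you have verified the copy.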

is there any way to restore predefined schema to mongoDB?

I'm a beginner with MongoDB. I want to know whether there is any way to load a predefined schema into MongoDB (for example, like Cassandra, which uses a .cql file for this purpose).
If there is, please point me to some documentation about the structure of that file and the way to restore it.
If there is not, how can I create an index only once, when I create a collection? I think it is wrong to create the index every time I call the insert method or run my program.
P.S.: I have a multi-threaded program in which every thread inserts into and updates my Mongo collection. I want to create the index only once.
Thanks.
To create an index on a collection you use the ensureIndex command. You need to call it only once to create the index.
If you call ensureIndex repeatedly with the same arguments, only the first call will create an index, all subsequent calls will have no effect.
So if you know what indexes you're going to use for your database, you can create a script that will call that command.
An example insert_index.js file that creates 2 indexes for collA and collB collections:
db.collA.ensureIndex({ a : 1});
db.collB.ensureIndex({ b : -1});
You can call it from a shell like this:
mongo --quiet localhost/dbName insert_index.js
This will create those indexes on a database named dbName on your localhost. It's worth noticing that if your database and/or collections are not yet created, this will create both the database and the collections for which you're adding the indexes.
Edit
To clarify a little bit: MongoDB is schemaless, so there is no schema to restore.
You can only create indexes and collections (by using createCollection helper).
MongoDB is basically schemaless so there is no definition of a schema or namespaces to be restored.
In the case of indexes, these can be created at any time. There does not need to be a collection present or even the required fields for the index as this will all be sorted out as the collections are created and when documents are inserted that matches the defined fields.
Commands to create an index are generally the same with each implementation language, for example:
db.collection.ensureIndex({ a: 1, b: -1 })
This will define an index on the target collection in the target database, referencing field "a" and field "b", the latter in descending order. It works even if the collection, or even the database, does not exist yet; in that case a blank namespace is established.
Subsequent calls to the same index creation method do not re-create the index. Where the specified index is identical to one that already exists, the call is effectively skipped as a "no-operation".
As such, you can simply feed all your required index creation statements at application startup and anything that is not already present will be created. Anything that already exists will be left alone.
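The "no-op on repeat" behaviour is why startup-time creation is safe even from multiple threads or restarts. A toy model of that idempotency (this is not the server's actual implementation, just an illustration of the contract):

```javascript
// Toy model: a second call with the same collection and key spec is
// skipped, mirroring how the server treats repeated ensureIndex/createIndex
// calls with identical arguments.
function makeIndexRegistry() {
  const existing = new Set();
  return function ensureIndex(collection, keys) {
    const id = collection + ':' + JSON.stringify(keys);
    if (existing.has(id)) return 'already exists'; // no-op on repeat
    existing.add(id);
    return 'created';
  };
}

const ensure = makeIndexRegistry();
const first = ensure('collA', { a: 1 });
const second = ensure('collA', { a: 1 });
const other = ensure('collB', { b: -1 });
```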