Azure MongoDB + Synapse Link: Column values unexpectedly contain JSON - mongodb

I'm following this tutorial and have created an Azure Cosmos DB account (MongoDB API) and a Synapse workspace, imported the ECDC data into MongoDB, and connected it in the Synapse workspace. So far so good.
However, when I query the data, for example the string column date_rep, I get
{"string":"2020-12-14"} instead of just 2020-12-14
The query I'm using is:
SELECT TOP 10 *
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=analytcstest;Database=ecdc',
    OBJECT = 'ecds',
    SERVER_CREDENTIAL = 'analytcstest'
) WITH ( date_rep varchar(200) ) AS rows
When I don't specify the WITH clause, letting it infer the schema automatically, I have the same problem:
SELECT TOP 10 *
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=analytcstest;Database=ecdc',
    OBJECT = 'ecds',
    SERVER_CREDENTIAL = 'analytcstest'
) AS rows
I could parse it, of course, like this, but I don't understand why I have to do that, and it isn't mentioned in the docs:
SELECT TOP 10 JSON_VALUE([date_rep], '$.string') AS [date_rep]
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=analytcstest;Database=ecdc',
    OBJECT = 'ecds',
    SERVER_CREDENTIAL = 'analytcstest'
) WITH ( date_rep varchar(200) ) AS rows

I tried to reproduce this in my environment and got similar output, with each value wrapped in its data type, when I used the Cosmos DB API for MongoDB.
This is because the Azure Cosmos DB API for MongoDB stores data in a document structure in BSON format, a binary-encoded serialization of JSON documents. BSON extends JSON with some optional non-JSON-native data types, like dates and binary data. Because the Synapse analytical store preserves those BSON types (the full fidelity schema representation used for API for MongoDB accounts), every value is surfaced together with its data type, e.g. {"string":"2020-12-14"}, which is why you have to extract the bare value with JSON_VALUE.
The Cosmos DB API for MongoDB is meant to give you a native MongoDB experience: you continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application at the API for MongoDB account's connection string.
When I tried the same thing with the Cosmos DB SQL API, which stores data in JSON format, it returned the plain values as expected.
For more information on BSON, refer to this document.
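As a hedged illustration of that typing (the ecds collection name comes from the question; the fields and values are made up), even a plain insert in the mongo shell stores typed BSON values:
// minimal mongo shell sketch; every field gets a BSON type, even plain strings
db.ecds.insertOne({
    date_rep: "2020-12-14",   // stored as a BSON string
    cases: 746,               // stored as a BSON int32
    loaded_at: new Date()     // stored as a BSON date, a non-JSON-native type
})
// the analytical store keeps this type information, which is why date_rep
// surfaces in Synapse as {"string": "2020-12-14"} rather than a bare value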

Related

MongoDB to Cosmos DB index db migration

Is there a way to migrate the indexes from MongoDB to Cosmos DB automatically? I've read that people have to do it by hand, but isn't it possible to use a tool or script, or is there any resource to read about it? I have about 200 collections to migrate, each with several indexes, and it would take a lot of time to do manually.
You can do something like export all the createIndexes commands and then execute those commands in the new database, as this answer describes.
The script below loops through all the collections and constructs a runCommand for each one. Copy the printed commands and execute them in the new database.
var database = 'my_new_db' // SHOULD ALWAYS MATCH DESTINATION DB NAME
db.getCollectionNames().forEach(function (collection) {
    var command = {}
    var indexes = []
    var idxs = db.getCollection(collection).getIndexes()
    if (idxs.length > 1) {
        idxs.forEach(function (idoc) {
            if (idoc.name != '_id_') { // skip the default _id index
                // rewrite the namespace so it points at the destination db
                var ns = database + "." + idoc.ns.substr(idoc.ns.indexOf('.') + 1)
                idoc.ns = ns
                indexes.push(idoc)
            }
        })
        command['createIndexes'] = collection
        command['indexes'] = indexes
        print('db.runCommand(')
        printjson(command)
        print(')')
    }
})
What you need to note is that a unique index can only be created while the collection is empty, as this doc says:
Azure Database Migration Service automatically migrates MongoDB
collections with unique indexes. However, the unique indexes must be
created before the migration. Azure Cosmos DB does not support the
creation of unique indexes, when there is already data in your
collections. For more information, see Unique keys in Azure Cosmos DB.
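So, on the Cosmos DB side, you would create the unique index before loading any data; a minimal mongo shell sketch (the collection, field, and migratedDocs names are hypothetical):
// create the unique index first, on the still-empty destination collection...
db.customers.createIndex({ email: 1 }, { unique: true })
// ...and only then load the migrated documents
db.customers.insertMany(migratedDocs)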

Postgres/jOOQ replace jsonb[] element

I have a Spring application with jOOQ and a PostgreSQL database containing a table (issues) with the following two columns:
id (Long)
documents (jsonb[]) <- array of jsonb (not jsonb array)
The document json structure is on the following format:
{
    "id": (UUID),
    "name": (String),
    "owner": (String)
}
What I want to achieve is to replace the document with a matching id (normally only one) with a new document. I'm struggling with the jOOQ, or even the plain SQL, for this.
I guess I need to write some plain SQL in jOOQ to be able to do this, which is OK (kept to a minimum). My idea was the following:
Unnest the document column
Filter out the document that should be updated of the array
Append the document that should be updated
Store the whole array
The raw SQL looks like this, but it is still missing the new document to be appended:
UPDATE issues
SET documents = (
    SELECT ARRAY_AGG(doc)
    FROM issues, UNNEST(issues.documents) AS doc
    WHERE doc->>'id' != 'e4e3422f-83a4-493b-8bf9-37980d532538'
)
WHERE issues.id = 1;
My final goal is to write this in jOOQ, appending the replacement document. I'm using jOOQ 3.11.4.
You should be able to just concatenate arrays in PostgreSQL:
UPDATE issues
SET documents = (
SELECT ARRAY_AGG(doc) || '{"id":"e4e3422f-83a4-493b-8bf9-37980d532538","name":"n"}'::jsonb
FROM issues, UNNEST(issues.documents) AS doc
WHERE doc->>'id' != 'e4e3422f-83a4-493b-8bf9-37980d532538'
)
WHERE issues.id = 1
Some common array functions will be added to jOOQ in the near future (e.g. array concatenation), but I suspect you can get away with plain SQL templating for now.

Mongoose schema definition [duplicate]

This question already has an answer here: Why does mongoose use schema when mongodb's benefit is supposed to be that it's schema-less? (1 answer)
Closed 5 years ago.
I am a beginner with MongoDB, trying to learn the MEAN stack, so I am using Mongoose as the ODM.
I read that MongoDB is a NoSQL database, but while using Mongoose I am asked to create a schema first. Why is that? Ideally there shouldn't be a schema, as MongoDB is a NoSQL database.
Thanks in advance.
Mongoose is an ODM (object document mapper) on top of MongoDB. If you use the core MongoDB driver, you need not create any schema; you can just dump any data you want. In Mongoose you have a schema so that you have some basic key-value pairs for advanced searching and filtering, and you can update the schema at any time. If you want to go schemaless and dump whatever the response is, you can use a schema with a type like var someSchema = {data: Object}, drop all your data into this data key, and then easily extract whatever JSON data is inside your data field.
var mongoose = require('mongoose');

module.exports = mongoose.model('twitter', {
    created_at: {
        type: Date
    },
    dump: {
        type: Object
    }
});
In the above example, dump is used to save whatever JSON I get as a response from the Twitter API, and created_at contains only the creation date of the tweet. So I still have the entire data, but if I want to search tweets of a particular date, I can search created_at with a find query; that query will be a lot faster, and I have a fixed structure and knowledge of what to expect from a find query each time I run one. This is one of the benefits of using the Mongoose ODM: I don't lose data, but I can maximise my searching ability by creating appropriate keys.
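As a hedged sketch of that pattern (the ./twitter-model path, apiResponse, and the dates are made up; older callback-style Mongoose API, matching the model above):
var Tweet = require('./twitter-model'); // the model defined above

// store the raw API response in `dump`, keeping created_at queryable
Tweet.create({ created_at: new Date('2017-03-01'), dump: apiResponse }, function (err, doc) {
    if (err) return console.error(err);
    // later: a fast, structured search on the fixed created_at key
    Tweet.find({
        created_at: { $gte: new Date('2017-03-01'), $lt: new Date('2017-03-02') }
    }, function (err, tweets) {
        console.log(tweets.length + ' tweets from that day');
    });
});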
So basically Mongoose gives a JSON/BSON-based database some of the features of a relational one, and you get the best of both worlds. You can create something like foreign keys: not strictly foreign keys, but an id reference to another schema, which you can later populate with the referenced document's data when you fetch with a find query. A relational-style schema is also easy to manage: you don't need to worry about extracting each and every value from an operation's response and placing it properly, you just need to see that your keys and values match, and you keep flexibility in update operations while still having a schema or table structure.
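A hedged sketch of that id-reference idea (schema and field names are made up): store only the referenced document's _id, then let populate() resolve it at query time:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var User = mongoose.model('user', new Schema({ name: String }));
var Tweet = mongoose.model('tweet', new Schema({
    text: String,
    // not strictly a foreign key, but an id reference to another schema
    author: { type: Schema.Types.ObjectId, ref: 'user' }
}));

// populate() swaps the stored id for the referenced document when fetching
Tweet.findOne({}).populate('author').exec(function (err, tweet) {
    if (!err && tweet) console.log(tweet.author.name);
});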

Find hashed data with MongoDB

Is it possible with MongoDB to find data by its hash?
I'm trying to find the MongoDB equivalent of this MySQL query:
SELECT column FROM table WHERE SHA1(column) = "value"
There doesn't seem to be any such functionality (this answer is a bit old, but I couldn't find it in docs.mongodb.com either), but you can load a library that implements SHA-1 (e.g. js-sha1) into the mongo shell with the load function, then use it within your mongo operation.
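A minimal mongo shell sketch (the file path is hypothetical, the table/column names mirror the MySQL query, and it assumes the loaded library exposes a global sha1() function):
load('/path/to/sha1.js') // e.g. a local copy of js-sha1

var target = 'value'; // the hex digest you are looking for
db.table.find().forEach(function (doc) {
    // compute SHA1(column) client-side and compare, like the MySQL query
    if (sha1(doc.column) === target) {
        printjson(doc);
    }
});
Note that, unlike an indexed lookup, this hashes every document client-side and scans the whole collection; if you need this often, storing the hash in its own indexed field would be much faster.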

Using MongoDB to query selected field

I am trying to query data from my MongoDB database, but there are some fields I would like to omit, since otherwise MongoDB returns the whole document, including _id and n.
I did this to limit the query, but unfortunately only one field could be omitted, not the other one, the 'n' field. How can I omit two fields?
data = collection.find_one({"files_id": file_id}, {"_id": 0, "data": 1})
I also realized that the result of my query includes the field name (u'data') too; how can I query it so that it only returns the value? In this case it's binary data.
Example:
{u'data': Binary('\x00\x00\xed\x00\n\x00\x00\xd5\xa9\x00\x000\x00\x00\x00#\x00\x00\x0f\xff\xf0\x00\x0b\x80\x00\x00\x00
Kindly assist thanks!
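A hedged sketch of one way around both issues, shown in the mongo shell (the same projection document works with PyMongo's find_one; the fs.chunks collection name and fileId variable are assumptions based on the GridFS-looking fields):
// use an exclusion-only projection: unlike mixing 0s and 1s, excluding
// several fields at once is allowed
var doc = db.fs.chunks.findOne({ files_id: fileId }, { _id: 0, n: 0 });

// the result is still a document keyed by field name, so pull the
// binary payload out explicitly
var payload = doc.data;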