Azure DocumentDB with MongoDB Protocol Spark integration - mongodb

I want to use DocumentDB but there is no connector for PySpark. Looks like DocumentDB also supports MongoDB Protocol as mentioned here, which means all existing MongoDB drivers should work. Since there is a PySpark connector for MongoDB, I wanted to try this out:
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
This throws an error:
com.mongodb.MongoCommandException: Command failed with error 115: ''$sample' is not supported' on server example.documents.azure.com:10250. The full response is { "_t" : "OKMongoResponse", "ok" : 0, "code" : 115, "errmsg" : "'$sample' is not supported", "$err" : "'$sample' is not supported" }
It looks like the DocumentDB MongoDB API doesn't support all MongoDB features, but I can't find any documentation about this. Or am I missing something else?

I want to use DocumentDB but there is no connector for PySpark.
A preview of a Spark to DocumentDB connector (including a pyDocumentDB package) was made available in early April 2017.
Looks like DocumentDB also supports MongoDB Protocol as mentioned here, which means all existing MongoDB drivers should work
DocumentDB supports the MongoDB wire protocol for communication and reports its version as MongoDB 3.2.0, but this does not mean that it is a drop-in replacement with full support for all MongoDB features (or that DocumentDB implements features with identical behaviour and limits). A notable absence at the moment is any support for MongoDB's aggregation pipeline, which includes the $sample operator that the PySpark connector is expecting to be available given a connection to a server claiming to be MongoDB 3.2.
You can find more examples of potential compatibility issues in the comments on the DocumentDB API for MongoDB documentation you referenced in your question.

Related

How can I connect to an Atlas cluster with the SRV connection string format using ReactiveMongo?

I have a Play Scala app and an Atlas cluster that I am trying to connect to. According to ReactiveMongo, this is possible: I can add the connection string I got from Atlas to my app via mongodb.uri in my application.conf file. I have tried everything based on the instructions from ReactiveMongo and Atlas, but I am still unable to connect to the cluster. Using the mongo shell, however, I am able to connect and access my database; it simply refuses to connect via my app.
Mongo returns the error "MongoError['No primary node is available! (Supervisor-13/Connection-14)']" and logs this warning in my console: Some options were ignored because they are not supported (yet): w, retryWrites. I am using Scala 2.12 and ReactiveMongo 0.12.6 with Play 2.6.
My connection string is mongodb+srv://<username>:<password>@my-cluster.abo25.mongodb.net/my-db?retryWrites=true&w=majority
Any info or help would be greatly appreciated.
Solved my problem. It turns out the +srv string format works seamlessly from ReactiveMongo version 0.17, and I was initially on 0.16. After I upgraded (and also updated my code), I was able to connect to my cluster. I also found out that one of the user credentials I was using was wrong, so fixing that plus the upgrade got me up and running.
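Since a wrong credential was part of the problem, it can help to assemble the connection string programmatically so that special characters in the username or password are percent-encoded. The following is only a minimal sketch in Python using the standard library; the function name build_srv_uri and the sample credentials are made up for illustration, and the host is the one from the question.

```python
from urllib.parse import quote


def build_srv_uri(username, password, host, database, **options):
    """Assemble a mongodb+srv:// URI with percent-encoded credentials."""
    # Characters such as ':', '/', '?', '#', '[', ']' and '@' must be
    # percent-encoded inside the username and password.
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    uri = f"mongodb+srv://{credentials}@{host}/{database}"
    # Options are appended in the order they were passed.
    query = "&".join(f"{key}={value}" for key, value in options.items())
    return f"{uri}?{query}" if query else uri


# Hypothetical credentials, chosen to show the encoding of '@' and '/'.
uri = build_srv_uri(
    "app-user",
    "p@ss/word",
    "my-cluster.abo25.mongodb.net",
    "my-db",
    retryWrites="true",
    w="majority",
)
print(uri)
```

The same encoded string can then be pasted into mongodb.uri; whether the driver accepts the +srv scheme still depends on its version, as described above.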

How to solve 'MongoError: $where is not allowed in this atlas tier'?

I had the following query, which was working in my local MongoDB, but when I switched to Atlas, it gives me:
MongoError: $where is not allowed in this atlas tier
I looked at similar posts but could not find an answer.
await Markertag.find( { $where: 'this.markerNum.toString().match(' + search.searchText + ')' }).distinct('photoId');
Unsupported Commands in M0/M2/M5 Clusters.
The following commands exhibit special behavior in M0 Free Tier and M2/M5 shared starter clusters:
...
distinct - The $where operator is not supported.
find - The $where operator is not supported.
...
You should either upgrade to a paid tier or choose another MongoDB provider.
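Another possible route, not mentioned in the documentation quoted above, is to express the same match without $where. The sketch below builds such a filter in Python using the $expr, $regexMatch and $toString operators. It assumes a server version that has $regexMatch (MongoDB 4.2 or later) and that these operators are permitted on the tier in question, which has not been verified here. The function name marker_num_filter is made up, and unlike the original query the search text is escaped and treated as a literal.

```python
import re


def marker_num_filter(search_text):
    """Build a filter matching documents whose markerNum, converted to a
    string, contains search_text as a literal substring."""
    return {
        "$expr": {
            "$regexMatch": {
                # Convert the numeric field to a string on the server side.
                "input": {"$toString": "$markerNum"},
                # Escape the user input so it cannot act as a pattern.
                "regex": re.escape(search_text),
            }
        }
    }


query = marker_num_filter("12")
print(query)

# Hypothetical use with pymongo (not executed here):
#   photo_ids = collection.distinct("photoId", marker_num_filter("12"))
```

Building the filter from data rather than concatenating the search text into a JavaScript string also avoids injecting user input into server-side code.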

Azure Function with Cosmos MongoDB integration not saving

I have set up an Azure Function with an Azure Cosmos DB (document) output. The Cosmos database is configured to be a MongoDB.
I added the following simple code to try to add a new document:
module.exports = function (context, eventHubMessages) {
    context.bindings.document = {
        text: "Test data"
    };
    context.done();
};
When I do a test run I get success, but when I try to open the collection using Studio 3T I get:
Query failed with error code 1 and error message 'Unknown server error occurred when processing this request.'
When I use the same code to write to a DocumentDB I get success and can view the data in Azure. Do you need to use a different API to save data to MongoDB?
The DocumentDB output binding uses the DocumentDB API to connect and save information in the database. But your database (from what you are saying) uses the MongoDB API; these are different APIs (links point to the docs).
As you surely know, MongoDB has some requirements (like the existence of an "_id" attribute) that are covered when you connect to the database from a MongoDB client (either an SDK or a third-party client), but since you are communicating through the DocumentDB API, it's probably failing to fulfill those requirements.
You might want to try and use the Mongo driver in the function to connect to your Cosmos DB database through the MongoDB API.
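As a rough illustration of that suggestion, here is a minimal Python sketch of writing the document through a MongoDB driver instead of the output binding. The helper names build_document and save_document are hypothetical, and setting a UUID string as "_id" is an assumption made only to show the requirement explicitly; drivers normally generate an ObjectId when "_id" is missing. The collection argument can be any object with an insert_one() method, such as a pymongo collection.

```python
import uuid


def build_document(text):
    """Return a document carrying its own _id, as MongoDB expects."""
    # Assumption: a random UUID string is an acceptable key for this data.
    # MongoDB drivers normally generate an ObjectId when _id is missing.
    return {"_id": str(uuid.uuid4()), "text": text}


def save_document(collection, text):
    """Insert the document through any object exposing insert_one()."""
    document = build_document(text)
    collection.insert_one(document)
    return document["_id"]


# Hypothetical wiring with pymongo (not executed here); the URI, database
# and collection names are placeholders:
#   from pymongo import MongoClient
#   collection = MongoClient(COSMOS_MONGO_URI)["mydb"]["mycollection"]
#   save_document(collection, "Test data")
```

Because the write goes through the MongoDB API, the stored document should then be readable from MongoDB clients such as Studio 3T.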

Mongo Spark connector with several hosts

I am trying to connect Spark to MongoDB using mongo-spark-connector_2.10-2.0.0, but it doesn't work when I have several hosts in the URI.
My URI looks like this:
mongodb://login:password@cluster0-shard-0xxxxx:27017,cluster0-shard-0yyyyy:27017,cluster0-shard-0zzzzz:27017/database?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin
and I get errors like this:
Command failed with error 8000: 'no SNI name sent, make sure using a MongoDB 3.4+ driver/shell.' on server cluster0-shard-0xxxxx
It works fine with other URIs that only have 1 host.
The problem was that I was using Atlas Free tier that requires SNI, which is not supported by the Mongo Java driver currently used by mongo-spark-connector_2.10-2.0.0.

How to know which storage engine is used in mongodb?

Starting from version 3.0, MongoDB supports pluggable storage engines. How can I find out which storage engine is being used in a system?
The easiest way to find the storage engine currently in use is from the mongo console.
Inside the mongo console, type (you might need admin access to run this command):
db.serverStatus().storageEngine
If it returns
{ "name" : "wiredTiger" }
then the WiredTiger storage engine is being used.
Once it is confirmed that WiredTiger is being used, type
db.serverStatus().wiredTiger
to get all the configuration details of wiredTiger.
On the console, Mayank's answer makes more sense.
On the other hand, a MongoDB GUI such as MongoChef or Robomongo can also show the storage engine.
You can detect this via:
db.serverStatus().wiredTiger
At present, if this field exists, a storage engine other than the default MMAPv1 is configured, namely WiredTiger; if it is absent, WiredTiger is not in use.
This applies to the current MongoDB 3.0.x series.
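The two checks described in these answers can be combined in a small helper. The following is only a sketch in Python that works on the serverStatus result as a plain dictionary; the function name storage_engine_name is made up, and the sample dictionaries are trimmed-down stand-ins for real server output.

```python
def storage_engine_name(server_status):
    """Return the storage engine name from a serverStatus result, or None."""
    # Preferred: the explicit storageEngine.name field.
    engine = server_status.get("storageEngine")
    if isinstance(engine, dict) and "name" in engine:
        return engine["name"]
    # Fallback heuristic: a wiredTiger section only appears when
    # WiredTiger is the active engine.
    if "wiredTiger" in server_status:
        return "wiredTiger"
    return None


print(storage_engine_name({"storageEngine": {"name": "wiredTiger"}}))
print(storage_engine_name({"storageEngine": {"name": "mmapv1"}}))

# Hypothetical use with pymongo (not executed here):
#   from pymongo import MongoClient
#   status = MongoClient().admin.command("serverStatus")
#   print(storage_engine_name(status))
```

Reading storageEngine.name first keeps the answer correct for engines other than WiredTiger and MMAPv1; the fallback only covers the presence check described above.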