Mongo Java Driver 3.6: Change Streams Codec Error - mongodb

I'm trying to use the new ChangeStream feature with Mongo Java Driver 3.6, but I'm stuck. This is my error:
ChangeStreamDocument contains generic types that have not been specialised.
Top level classes with generic types are not supported by the PojoCodec.
Here's how I'm starting the changeStream:
CodecRegistry pojoCodecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(),
        fromProviders(PojoCodecProvider.builder().automatic(true).build()));
MongoDatabase database = mongoClient.getDatabase(mongoClientURI.getDatabase())
        .withCodecRegistry(pojoCodecRegistry);
collection.insertOne(Person.builder().age(100).build());
collection.insertOne(Person.builder().age(100).build());
collection.watch().forEach((Block<? super ChangeStreamDocument<Person>>) personChangeStreamDocument -> {
    System.out.println(personChangeStreamDocument.getFullDocument());
});
Person is just a POJO.
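Roughly this shape (a trimmed sketch with only the age field used above; the builder stands in for the generated one):
public class Person {
    private int age;

    public Person() { }                       // no-arg constructor needed by the PojoCodec

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public static Builder builder() { return new Builder(); }

    public static final class Builder {
        private final Person person = new Person();
        public Builder age(int age) { person.setAge(age); return this; }
        public Person build() { return person; }
    }
}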

There's currently a bug with the automatic PojoCodecProvider, as described in JAVA-2800.
The temporary workaround is to register the POJO classes manually, for example:
CodecRegistry pojoCodecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(),
        fromProviders(PojoCodecProvider.builder().register(Person.class).build()));
Additionally, please note that you will not see an insert event if the insert was performed before you opened the change stream on the collection. You can test this easily by inserting/altering documents from another thread or process once the stream is open. See also Change Streams for more information.
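Putting it together, a minimal end-to-end sketch of the workaround (the database name, collection name, and the mongoClient variable here are assumptions for illustration; change streams also require a replica set):
// Same imports/static imports as in the question.
CodecRegistry pojoCodecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(),
        fromProviders(PojoCodecProvider.builder().register(Person.class).build()));

MongoCollection<Person> people = mongoClient.getDatabase("test")    // names assumed
        .withCodecRegistry(pojoCodecRegistry)
        .getCollection("people", Person.class);

// Open the change stream first, on its own thread...
new Thread(() -> people.watch().forEach(
        (Block<? super ChangeStreamDocument<Person>>) change ->
                System.out.println(change.getFullDocument()))).start();

// ...then perform the inserts so their events are observed by the stream.
people.insertOne(Person.builder().age(100).build());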
UPDATE: The ticket JAVA-2800 has been resolved and the fix is available in MongoDB Java Driver v3.6.4+.

Related

Test both forwards and backwards compatibility of JSON schemas for both readers and writers

I am seeking to programmatically answer the question of whether two arbitrary message schemas are compatible, ideally for each of JSON, Avro and Protobuf. I am aware that internally the Kafka schema registry has such logic. Yet I want to ask programmatically, in a deployment pipeline, whether an old topic reader I am promoting between environments will be able to read the latest (or historic) messages on the topic.
I am aware that:
The Maven plugin can do this at consumer compile time, but that option isn't open to me as I am not using Java for my "exotic" consumer. Yet I can generate the JSON schema it expects.
I can invoke the schema registry API to ask whether a new schema is compatible with an old one, but I want to ask whether a reader that knows its own expected schema is compatible with the latest registered schema, and that isn't supported.
The topic can be set with FORWARD_TRANSITIVE or FULL_TRANSITIVE compatibility, and knowing that I could assume my reader will always work. Yet I do not control the many topics in a large organization run by many other teams, so I cannot enforce that the many teams with many existing topics set a correct policy.
With careful testing and change management we can manually verify compatibility. But the real world is messy, we will be doing this at scale with many inexperienced teams, and anything that can go wrong will most certainly go wrong at some point.
Other folks must have wanted to do this, so if I searched hard enough the answer should be out there; yet I have read all the Q&As I could find and couldn't find an actual working answer to any of the past times this question has been asked.
What I want is a control so that I don't promote a reader into production without, at that point in time, pulling the latest registered schema and checking that it is compatible with the (generated) schema expected by the (exotic) reader being deployed.
I am aware I could just pull the source code that the Kafka schema registry / Maven plugin uses and roll my own solution, but I feel there must be a convenient solution out there that I can easily script in a deployment pipeline to check "is this (generated) schema a subset of that (published/downloaded) one".
Okay, I cracked open the Confluent Kafka client code and wrote a simple Java CLI that uses its logic and the generic ParsedSchema class, which can be a concrete JSON, Avro, or Protobuf schema. The dependencies are:
<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-schema-registry-client</artifactId>
    <version>7.3.1</version>
</dependency>
<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-json-schema-provider</artifactId>
    <version>7.3.1</version>
</dependency>
To handle Avro and Protobuf as well, the corresponding schema provider dependencies would need to be included. The code then parses a JSON schema from a string with:
final var readSchemaString = ... // load from file
final var jsonSchemaRead = new JsonSchema(readSchemaString);
final var writeSchemaString = ... // load from file
final var jsonSchemaWrite = new JsonSchema(writeSchemaString);
Then we can validate that using my helper class:
final var validator = new Validator();
final var errors = validator.validate(jsonSchemaRead, jsonSchemaWrite);
for (var e : errors) {
    logger.error(e);
}
System.exit(errors.size());
The validator class is very basic, just using "can be read" semantics:
public class Validator {

    SchemaValidator validator = new SchemaValidatorBuilder().canBeReadStrategy().validateLatest();

    public List<String> validate(ParsedSchema reader, ParsedSchema writer) {
        return validator.validate(reader, Collections.singleton(writer));
    }
}
The full code is over on github at /simbo1905/msg-schema-read-validator
At deployment time I can simply curl the latest schema from the registry (or any historic schemas) that writers should be using. I can generate the schema for my reader on disk. I can then run that simple tool to check that things are compatible.
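For illustration, the same check can also be scripted entirely in Java against the registry client instead of curl. A sketch, reusing the Validator helper above (the registry URL, subject name, and reader schema file path are assumptions, not from my actual pipeline):
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.json.JsonSchema;

import java.nio.file.Files;
import java.nio.file.Path;

public class CheckReaderCompatibility {
    public static void main(String[] args) throws Exception {
        var registry = new CachedSchemaRegistryClient("http://schema-registry:8081", 10);

        // Latest schema registered for the topic the writers use (subject naming assumed).
        var latest = registry.getLatestSchemaMetadata("my-topic-value");
        var writerSchema = new JsonSchema(latest.getSchema());

        // Schema generated for the (exotic) reader being deployed.
        var readerSchema = new JsonSchema(Files.readString(Path.of("reader-schema.json")));

        var errors = new Validator().validate(readerSchema, writerSchema);
        errors.forEach(System.err::println);
        System.exit(errors.size());   // non-zero exit fails the pipeline step
    }
}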
Running things in a debugger, I can see that there are at least 56 different schema compatibility rules being checked for JSON Schema alone. It would not be feasible to try to code all that up oneself.
It should be relatively simple to extend the code to add the Avro and Protobuf providers to get a ParsedSchema for those, if anyone wants to be fully generic.

How to get a stream from MongoDB 3.6 $changeStream instead of pulling the cursor using $cmd.getMore?

I am a MongoDB driver developer.
Is there any way to get a stream of the changes, like WebSocket/SSE, that keeps sending data without closing the connection?
Below is the command sent to MongoDB to get new changes from the server (I am using mongodb-core#3.0.2):
{
    "getMore": "5293718446697444994",
    "collection": "event",
    "batchSize": 1
}
Is there any way to get a stream for the changes?
According to the official MongoDB driver specification, a ChangeStream is an abstraction of a TAILABLE_AWAIT cursor. You may choose to implement it as an extension of an existing tailable cursor implementation.
Extending an existing cursor implementation has the benefit that you don't have to implement the other behaviour/features that come automatically with a cursor.
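For illustration, here is what that looks like from the Java driver (the question uses mongodb-core for Node, but the cursor semantics are the same per the spec; the database and collection names are assumptions):
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.Document;

public class WatchAsStream {
    public static void main(String[] args) {
        MongoCollection<Document> collection = new MongoClient("localhost")
                .getDatabase("test")          // database name assumed
                .getCollection("event");

        // The change stream is backed by a TAILABLE_AWAIT cursor: hasNext() blocks in a
        // server-side getMore until a new event arrives, so the loop never busy-polls
        // and the cursor stays open between events.
        try (MongoCursor<ChangeStreamDocument<Document>> cursor = collection.watch().iterator()) {
            while (cursor.hasNext()) {
                System.out.println(cursor.next().getFullDocument());
            }
        }
    }
}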

How do we do a select query using the phantom driver without a table definition?

I have streaming data coming from Spark Streaming which I need to process and finally store in Cassandra. Earlier I was trying to use the Spark Cassandra connector, but it doesn't give access to the SparkStreaming context object on the workers, so I have to use a separate Cassandra Scala driver. Hence I ended up with phantom. Now, my question is: I have already defined the column family in Cassandra, so how do I do select and update queries from Scala?
I have followed this documentation (link1) but I don't understand why we need to give the table definition on the client (Scala code) side. Why can't we just give the keyspace, cluster points and column family and be done with it?
object CustomConnector {
  val hosts = Seq("IP1", "IP2")
  val Connector = ContactPoints(hosts).keySpace("KEYSPACE_NAME")
}

realTimeAgg.foreachRDD { x =>
  if (x.toLocalIterator.nonEmpty) {
    x.foreachPartition { partition =>
      // How to achieve select/insert into the Cassandra table here using phantom?
    }
  }
}
This is not yet possible using phantom. We are actively working on phantom-spark to allow you to do this, but at this point in time it is still a few months away.
In the interim, you will have to rely on the Spark Cassandra connector and use the non-type-safe API to achieve this. It's an unfortunate setup, but in the very near future this will be resolved.

JacksonDBDecoder - Server errors from MongoDB - mongojack 2.3.0

I have a problem when my application receives an error from the MongoDB server. For example:
Imagine that I do a find in MongoDB, but the response from the MongoDB server is an error because of a timeout:
{
    "$err": "MongoTimeout due to...bla bla bla...",
    "code": 50
}
JacksonDBDecoder is expecting my Java type, for example my class "Stuff" (which contains several fields like "price" and "weight"), but when it receives the previous JSON there are no matching fields for "price" and "weight", so the outcome is an empty document: { }
The empty JSON will then be processed by the mongo-java-driver classes (com.mongodb.QueryResultIterator.throwOnQueryFailure, to be exact), and it will never log the original information ("MongoTimeout due to...bla bla bla..." and 50), because the decoder could not understand the JSON from the MongoDB server.
Could you help me configure mongojack or Jackson to handle these types of responses from the MongoDB server?
Many thanks.
Regards,
Paco.
After talking with MongoDB support, they confirmed to me the following:
"The driver team read our last comments and they believe this is indeed a driver bug. Basically, they think the driver should detect that this is a query failure and use the default decoder to decode the error document rather than the custom decoder registered by MongoJack.
The most relevant part is that this bug does not exist in the 3.x driver series. So our suggestion for you is to upgrade to the 3.2.2 driver (note that MongoJack lists the 3.2 Java driver as its preferred dependency: http://mongojack.org/dependencies.html)."
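For reference, with Maven that upgrade is just bumping the driver dependency (a sketch, matching the suggested version):
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.2.2</version>
</dependency>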
So, it is not a mongojack issue ;)
Regards,
Paco.

How to rollback transaction in Grails integration tests on MongoDB

How can (or should) I configure Grails integration tests to roll back transactions automatically when using MongoDB as the datasource?
(I'm using Grails 2.2.1 + mongodb plugin 1.2.0)
For Spock integration tests I defined a MongoIntegrationSpec that gives some control over cleaning up test data.
dropDbOnCleanup = true // will drop the entire DB after every feature method is executed.
dropDbOnCleanupSpec = true // will drop the entire DB after the spec is complete.
dropCollectionsOnCleanup = ["collectionA", "collectionB", ...] // drops collections after every feature method is executed.
dropCollectionsOnCleanupSpec = ["collectionA", "collectionB", ...] // drops collections after the spec is complete.
dropNewCollectionsOnCleanup = true // after every feature method is executed, all new collections are dropped
dropNewCollectionsOnCleanupSpec = true // after the spec is complete, all new collections are dropped
Here's the source
https://github.com/onetribeyoyo/mtm/tree/dev/src/test/integration/com/onetribeyoyo/util/MongoIntegrationSpec.groovy
And the project has a couple of usage examples too.
I don't think that it's even possible, because MongoDB doesn't support transactions. You could use the suggested static transactional = 'mongo', but it helps only if you didn't flush your data (a rare situation, I think).
Instead you could clean up the database in setUp() manually. You can drop the collection for the domain that you're going to test, like:
MyDomain.collection.drop()
and (optionally) fill it with all the data required for your test.
You can use static transactional = 'mongo' in an integration test and/or a service class.
Refer to the MongoDB Plugin documentation for more details.
MongoDB does not support transactions! Hence you cannot use them. The options you have are:
1. Go through and drop the collections for the domain classes you used.
MyDomain.collection.drop() // if you use the mongoDB plugin alone, without hibernate
MyDomain.mongo.collection.drop() // if you use the mongoDB plugin with hibernate
The drawback is that you have to do it for each domain you used.
2. Drop the whole database (you don't need to create it explicitly, but you can):
String host = grailsApplication.config.grails.mongo.host
Integer port = grailsApplication.config.grails.mongo.port
String databaseName = grailsApplication.config.grails.mongo.databaseName
def mongo = new GMongo(host, port)
mongo.getDB(databaseName).dropDatabase() // this takes 0.3-0.5 seconds on my machine
The second option is easier and faster. To make this work for all your tests, extend IntegrationSpec and add the code to drop the database in the cleanup block (I am assuming you are using the Spock test framework), or do a similar thing for JUnit-style tests!
Hope this helps!