Spring Data MongoDB Reactive: How to get the result of db.runCommand() - mongodb

We're currently using Spring Data MongoDB Reactive as part of a Spring Boot application that connects to AWS DocumentDB.
I'm trying to enable change streams on the DocumentDB database at application startup, but only if they are not already enabled. To do this, I have the following code, which runs commands using the MongoClient from the reactive package; however, I don't know how to read the response of the query that checks whether the change stream is already enabled.
@Service
public class InitDB {

    private final MongoClient mongoClient;

    private static final String enableChangeStreams = """
            {modifyChangeStreams: 1,
             database: "",
             collection: "",
             enable: true}
            """;

    private static final String isChangeStreamsEnabled = """
            {aggregate: 1,
             pipeline: [{$listChangeStreams: 1},
                        {$match: {$or: [{database: "", collection: ""}]}}
             ],
             cursor: {}}
            """;

    @Autowired
    public InitDB(final MongoClient mongoClient) {
        this.mongoClient = mongoClient;
    }

    public void initializeDB() {
        runCommand(isChangeStreamsEnabled);
    }

    private void runCommand(final String command) {
        final MongoDatabase db = mongoClient.getDatabase("admin");
        final Document document = Document.parse(command);
        final Bson bson = document.toBsonDocument();
        final Publisher<Document> commandResult = db.runCommand(bson);
        // How to fetch the command result response into a variable via subscribers?
        final SubscriberHelpers.ObservableSubscriber<Document> subscriber =
                new SubscriberHelpers.ObservableSubscriber<>();
        commandResult.subscribe(subscriber);
        System.out.println(subscriber.isCompleted());
    }
}
Can someone please guide me on how to fetch the command result into a variable via subscribers?
Based on the response, if the change stream is not enabled, I want to run the enableChangeStreams command (that part of the code is yet to be built).

Since you are already resorting to a blocking primitive with SubscriberHelpers#ObservableSubscriber, you can use its await / get methods to block the calling context while waiting for the MongoDB command response:
private void runCommand(final String command) {
    final MongoDatabase db = mongoClient.getDatabase("admin");
    final Document document = Document.parse(command);
    final Bson bson = document.toBsonDocument();
    final Publisher<Document> commandResult = db.runCommand(bson);
    final SubscriberHelpers.ObservableSubscriber<Document> subscriber =
            new SubscriberHelpers.ObservableSubscriber<>();
    commandResult.subscribe(subscriber);
    // Block waiting for the command response
    System.out.println(subscriber.get(1000, TimeUnit.MILLISECONDS));
}
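To act on the response itself, you can read the documents the subscriber collected and inspect the aggregate reply. Below is a minimal, untested sketch of an extra method on the same class. It assumes the ObservableSubscriber helper from the driver's reactive quick-start examples (whose get(...) returns the list of received documents), a reasonably recent driver (for Document.getList), and the documented aggregate reply shape of { cursor: { firstBatch: [...] } }: if the batch is empty, it fires the enableChangeStreams command.

private void enableChangeStreamsIfNeeded() {
    final MongoDatabase db = mongoClient.getDatabase("admin");

    // Run the "is it already enabled?" aggregate and block for its reply.
    final SubscriberHelpers.ObservableSubscriber<Document> checkSubscriber =
            new SubscriberHelpers.ObservableSubscriber<>();
    db.runCommand(Document.parse(isChangeStreamsEnabled).toBsonDocument())
            .subscribe(checkSubscriber);
    final List<Document> replies = checkSubscriber.get(5000, TimeUnit.MILLISECONDS);

    // The aggregate reply carries any matching change streams in cursor.firstBatch.
    final List<Document> firstBatch = replies.get(0)
            .get("cursor", Document.class)
            .getList("firstBatch", Document.class);

    if (firstBatch.isEmpty()) {
        // Nothing matched, so change streams are not enabled yet: enable them.
        final SubscriberHelpers.ObservableSubscriber<Document> enableSubscriber =
                new SubscriberHelpers.ObservableSubscriber<>();
        db.runCommand(Document.parse(enableChangeStreams).toBsonDocument())
                .subscribe(enableSubscriber);
        enableSubscriber.await();
    }
}

If you would rather stay non-blocking, the same logic can be written with Project Reactor (already on the classpath with Spring Data MongoDB Reactive): wrap the publisher with Mono.from(db.runCommand(...)) and chain the conditional enable command with flatMap.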

Related

PUT requests to LogStash fail when sent using HttpClient, succeed when sent using cURL

Here is my logstash.conf file. (Apologies for not pasting the code here directly; StackOverflow does not allow posts exceeding a certain code-to-text ratio.)
My remote VM, which also hosts my ElasticSearch and LogStash servers, listens on Port 8080.
On my local machine, I periodically send zipped folders (containing JSON documents) over TCP to my remote server, which receives the data into a memory stream, unzips the folders, and sends the contents to LogStash. LogStash in turn forwards the data to ElasticSearch.
I am currently testing the workflow with some dummy data.
On my remote server, here is the method for receiving data over TCP:
private static void ReceiveAndUnzipElasticSearchDocumentFolder(int numBytesExpectedToReceive)
{
    int numBytesLeftToReceive = numBytesExpectedToReceive;
    using (MemoryStream zippedFolderStream = new MemoryStream(new byte[numBytesExpectedToReceive]))
    {
        while (numBytesLeftToReceive > 0)
        {
            // Receive data in small packets
        }
        zippedFolderStream.Unzip(afterReadingEachDocument: LogStashDataSender.Send);
    }
}
Here is the code for unzipping the received folder:
public static class StreamExtensions
{
    public static void Unzip(this Stream zippedElasticSearchDocumentFolderStream, Action<ElasticSearchJsonDocument> afterReadingEachDocument)
    {
        JsonSerializer jsonSerializer = new JsonSerializer();
        foreach (ZipArchiveEntry entry in new ZipArchive(zippedElasticSearchDocumentFolderStream).Entries)
        {
            using (JsonTextReader jsonReader = new JsonTextReader(new StreamReader(entry.Open())))
            {
                dynamic jsonObject = jsonSerializer.Deserialize<ExpandoObject>(jsonReader);
                string jsonIndexId = jsonObject.IndexId;
                string jsonDocumentId = jsonObject.DocumentId;
                afterReadingEachDocument(new ElasticSearchJsonDocument(jsonObject, jsonIndexId, jsonDocumentId));
            }
        }
    }
}
And here is the method for sending data to LogStash:
public static async void Send(ElasticSearchJsonDocument document)
{
    HttpResponseMessage response =
        await httpClient.PutAsJsonAsync(
            IsNullOrWhiteSpace(document.DocumentId)
                ? $"{document.IndexId}"
                : $"{document.IndexId}/{document.DocumentId}",
            document.JsonObject);
    try
    {
        response.EnsureSuccessStatusCode();
    }
    catch (Exception exception)
    {
        Console.WriteLine(exception.Message);
    }
    Console.WriteLine($"{response.Content}");
}
The httpClient referenced in the public static async void Send(ElasticSearchJsonDocument document) method was created using the following code:
private const string LogStashHostAddress = "http://127.0.0.1";
private const int LogStashPort = 31311;
httpClient = new HttpClient { BaseAddress = new Uri($"{LogStashHostAddress}:{LogStashPort}/") };
httpClient.DefaultRequestHeaders.Accept.Clear();
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
When I step into a new debug instance, the program runs smoothly but dies immediately after executing await httpClient.PutAsJsonAsync for each of the documents contained inside the zipped folder -- response.EnsureSuccessStatusCode(); is never hit, and neither is Console.WriteLine(exception.Message); nor Console.WriteLine($"{response.Content}");.
Here is an example of an ElasticSearchJsonDocument that is passed to the public static async void Send(ElasticSearchJsonDocument document) method:
When I ran the same PUT request using cURL, the Book index was successfully created, and I could then send a GET request to retrieve the data from ElasticSearch.
My questions are:
Why did the program die immediately (with no visible exception messages) after executing await httpClient.PutAsJsonAsync(...) for each of the JSON documents inside the received zipped folder?
What changes should I make to ensure that I can make successful PUT requests to LogStash using an HttpClient instance?
I changed my httpClient instantiation code from
httpClient = new HttpClient { BaseAddress = new Uri($"{LogStashHostAddress}:{LogStashPort}/") };
httpClient.DefaultRequestHeaders.Accept.Clear();
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
to
httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Accept.Clear();
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
And I changed await httpClient.PutAsJsonAsync(...) to
HttpResponseMessage response =
    await httpClient.PutAsJsonAsync(
        IsNullOrWhiteSpace(document.DocumentId)
            ? $"{LogStashHostAddress}:{LogStashPort}/{document.IndexId}"
            : $"{LogStashHostAddress}:{LogStashPort}/{document.IndexId}/{document.DocumentId}",
        document.JsonObject);
response.EnsureSuccessStatusCode();
It turns out that the way HttpClient's BaseAddress combines with relative request URIs is easy to get wrong, so instead of wasting more time on it, I decided to eliminate it entirely and use absolute URIs.

Update the subdocument attribute

I have a document with a schema like the one below:
{
    "Personal": [
        {
            "name": "Test_Name",
            "isActive": true
        }
    ]
}
I am trying to update it as below using the Java driver.
collections.updateMany(new Document("Personal.name", "Test_Name"), new Document("$set", new Document("Personal.$.isActive", false)))
Unfortunately, this throws an error:
"The positional operator did not find the match needed from the query. Unexpanded update: Personal.$.isActive"
But if I modify the above update to something like
collections.updateMany(new Document("Personal.name", "Test_Name"), new Document("$set", new Document("Personal.0.isActive", false)))
it works.
Can anyone help me understand what is wrong with using "$" in my first update statement? (An arrayFilters-based alternative is sketched after the code snippets below.)
Here are some more code snippets.
Creating the collection object:
collections = mongoConnection.establishConnection();
MongoConnection object:
public MongoConnection() {
    StringBuilder connectionString = new StringBuilder();
    connectionString.append("mongodb://url_with_port_and_server");
    client = new MongoClient(new MongoClientURI(connectionString.toString()));
    db = client.getDatabase("test");
    collections = db.getCollection("test");
}

public MongoCollection<Document> establishConnection() {
    return collections;
}

public void closeConnection() {
    client.close();
}
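As referenced above, here is the arrayFilters-based alternative. This is only a sketch, not an explanation of why "$" fails in the original statement, and it assumes MongoDB 3.6+ and a Java driver version that supports UpdateOptions.arrayFilters:

import static java.util.Collections.singletonList;

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class SubdocumentUpdateExample {

    // Deactivate every matching "Personal" element using the filtered positional
    // operator $[elem] instead of the classic positional operator "$".
    public static void deactivate(MongoCollection<Document> collections) {
        collections.updateMany(
                Filters.eq("Personal.name", "Test_Name"),
                Updates.set("Personal.$[elem].isActive", false),
                new UpdateOptions().arrayFilters(
                        singletonList(Filters.eq("elem.name", "Test_Name"))));
    }
}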

How to access an OrientDB embedded database from a remote client?

I have written a small application running OrientDB embedded. It works well; I can read and write to the database from the application using a plocal connection.
Now I am trying to access the same database from a remote OrientDB client (on another PC).
I am getting an error message telling me that the database is locked and can't be accessed.
Is there a workaround for this, or am I doing something wrong?
I am using Java and OrientDB 2.2.12.
You can try this code for the connection:
private static final String dbUrl = "remote:localhost/databaseName";
private static final String dbUser = "admin";
private static final String dbPassword = "admin";

public static void createDBIfDoesNotExist() throws IOException {
    OServerAdmin server = new OServerAdmin(dbUrl).connect(dbUser, dbPassword);
    if (!server.existsDatabase("plocal")) {
        server.createDatabase("graph", "plocal");
    }
    server.close();
}

public static void connectToDBIfExists() throws IOException {
    OServerAdmin server = new OServerAdmin(dbUrl).connect(dbUser, dbPassword);
    // some code
    server.close();
}
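Note that the remote: URL above only works if an OrientDB server is actually listening. When the database is opened embedded via plocal it is locked by that JVM, so a common approach is to embed the server itself in the application and have both the application and remote clients go through it. A rough sketch, assuming OrientDB 2.2.x and a standard orientdb-server-config.xml (the config path is illustrative):

import java.io.File;

import com.orientechnologies.orient.server.OServer;
import com.orientechnologies.orient.server.OServerMain;

public class EmbeddedOrientServer {

    public static void main(String[] args) throws Exception {
        // Start an embedded OrientDB server inside this JVM.
        OServer server = OServerMain.create();
        server.startup(new File("config/orientdb-server-config.xml")); // illustrative path
        server.activate();

        // Both this application and remote clients can now connect with
        // "remote:<host>/databaseName" instead of "plocal:...".

        Runtime.getRuntime().addShutdownHook(new Thread(server::shutdown));
    }
}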

Too many connections in MongoDB and Spark

My Spark Streaming application stores its data in MongoDB.
Unfortunately, each Spark worker opens too many connections while storing it in MongoDB.
The following is my Spark to Mongo DB code:
public static void main(String[] args) {
    int numThreads = Integer.parseInt(args[3]);
    String mongodbOutputURL = args[4];
    String masterURL = args[5];

    Logger.getLogger("org").setLevel(Level.OFF);
    Logger.getLogger("akka").setLevel(Level.OFF);

    // Create a Spark configuration object to establish connection between the application and spark cluster
    SparkConf sparkConf = new SparkConf().setAppName("AppName").setMaster(masterURL);

    // Configure the Spark microbatch with interval time
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(60 * 1000));

    Configuration config = new Configuration();
    config.set("mongo.output.uri", "mongodb://host:port/database.collection");

    // Set the topics that should be consumed from Kafka cluster
    Map<String, Integer> topicMap = new HashMap<String, Integer>();
    String[] topics = args[2].split(",");
    for (String topic : topics) {
        topicMap.put(topic, numThreads);
    }

    // Establish the connection between kafka and Spark
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            return tuple2._2();
        }
    });

    JavaPairDStream<Object, BSONObject> save = lines.mapToPair(new PairFunction<String, Object, BSONObject>() {
        @Override
        public Tuple2<Object, BSONObject> call(String input) {
            BSONObject bson = new BasicBSONObject();
            bson.put("field1", input.split(",")[0]);
            bson.put("field2", input.split(",")[1]);
            return new Tuple2<>(null, bson);
        }
    });

    // Store the records in database
    save.saveAsNewAPIHadoopFiles("prefix", "suffix", Object.class, Object.class, MongoOutputFormat.class, config);

    jssc.start();
    jssc.awaitTermination();
}
How can I control the number of connections at each worker?
Am I missing any configuration parameters?
Update 1:
I am using Spark 1.3 with the Java API.
I was not able to perform a coalesce() operation, but I was able to do repartition(2).
Now the number of connections is under control.
But I think the connections are not being closed, or are not reused, at the workers.
Please find the screenshot below:
Streaming interval 1 minute and 2 partitions
You can try mapPartitions, which works at the partition level instead of the record level, i.e. a task executing on one node will share one database connection instead of opening one for every record.
Also, I guess you could pre-partition the data (not the stream RDD). Spark is smart enough to utilize this to reduce shuffling.
I was able to solve the issue by using foreachRDD.
I establish the connection and close it within every DStream batch.
myRDD.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
            @Override
            public void call(Iterator<String> record) throws Exception {
                MongoClient mongo = new MongoClient("server:port");
                DB db = mongo.getDB(database);
                DBCollection targetTable = db.getCollection(collection);
                BasicDBObject doc = new BasicDBObject();
                while (record.hasNext()) {
                    String currentRecord = record.next();
                    String[] delim_records = currentRecord.split(",");
                    doc.append("column1", insert_time);
                    doc.append("column2", delim_records[1]);
                    doc.append("column3", delim_records[0]);
                    targetTable.insert(doc);
                    doc.clear();
                }
                mongo.close();
            }
        });
        return null;
    }
});
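Opening and closing a MongoClient per partition still churns connections on every micro-batch. A further refinement, sketched below on the assumption that the Java driver's MongoClient is thread-safe and pools connections internally, is to create one client lazily per executor JVM and reuse it (the host and port are placeholders):

import com.mongodb.MongoClient;

// Lazily initialised, per-JVM MongoClient so each executor reuses one pooled client
// instead of opening a new one for every partition.
public final class MongoClientHolder {

    private static volatile MongoClient client;

    private MongoClientHolder() {
    }

    public static MongoClient get() {
        if (client == null) {
            synchronized (MongoClientHolder.class) {
                if (client == null) {
                    client = new MongoClient("server", 27017); // placeholder host/port
                }
            }
        }
        return client;
    }
}

Inside foreachPartition you would then call MongoClientHolder.get() and drop the mongo.close() call, letting the driver's connection pool manage the sockets.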

Mongo db java driver - resource management with client

Background
I have a remotely hosted server that's running a Java VM with custom server code for a multiplayer real-time quiz game. The server deals with matchmaking, rooms, lobbies, etc. I'm also using a Mongo DB in the same space, which holds all the questions for the mobile phone quiz game.
This is my first attempt at such a project, and although I'm competent in Java, my Mongo skills are novice at best.
Client Singleton
My server contains a static singleton of the Mongo client:
public class ClientSingleton
{
    private static ClientSingleton uniqueInstance;

    // The MongoClient class is designed to be thread safe and shared among threads.
    // We create only 1 instance for our given database cluster and use it across
    // our application.
    private MongoClient mongoClient;
    private MongoClientOptions options;
    private MongoCredential credential;

    private final String password = "xxxxxxxxxxxxxx";
    private final String host = "xx.xx.xx.xx";
    private final int port = 38180;

    private ClientSingleton()
    {
        // Setup client credentials for DB connection (user, db name & password)
        credential = MongoCredential.createCredential("XXXXXX", "DBName", password.toCharArray());

        options = MongoClientOptions.builder()
                .connectTimeout(25000)
                .socketTimeout(60000)
                .connectionsPerHost(100)
                .threadsAllowedToBlockForConnectionMultiplier(5)
                .build();

        try
        {
            // Create client (server address(host,port), credential, options)
            mongoClient = new MongoClient(new ServerAddress(host, port),
                    Collections.singletonList(credential),
                    options);
        }
        catch (UnknownHostException e)
        {
            e.printStackTrace();
        }
    }

    /**
     * Double-checked dispatch method to initialise our client singleton class
     */
    public static ClientSingleton getInstance()
    {
        if (uniqueInstance == null)
        {
            synchronized (ClientSingleton.class)
            {
                if (uniqueInstance == null)
                {
                    uniqueInstance = new ClientSingleton();
                }
            }
        }
        return uniqueInstance;
    }

    /**
     * @return our mongo client
     */
    public MongoClient getClient() {
        return mongoClient;
    }
}
Notes here:
The Mongo client is new to me, and I understand that failure to properly utilise connection pooling is one major “gotcha” that greatly impacts Mongo DB performance. Also, creating new connections to the DB is expensive, and I should try to re-use existing connections.
I've not left the socket timeout and connect timeout at their defaults (e.g. infinite); if a connection hangs for some reason, I think it would otherwise get stuck forever.
I set the number of milliseconds the driver will wait before a connection attempt is aborted; for connections made through a Platform-as-a-Service (where the server is hosted), it is advised to use a higher timeout (e.g. 25 seconds). I also set the number of milliseconds the driver will wait for a response from the server for all types of requests (queries, writes, commands, authentication, etc.). Finally, I set threadsAllowedToBlockForConnectionMultiplier to 5, so up to 500 connection requests can wait in a FIFO queue for their turn on the DB.
Server Zone
The Zone gets a game request from the client and receives the metadata string for the quiz type, in this case "Episode 3". The Zone creates a room for the user or allows the user to join a room with that property.
Server Room
The Room then establishes a DB connection to the Mongo collection for the quiz type:
// Get client & collection
mongoDatabase = ClientSingleton.getInstance().getClient().getDB("DBName");
mongoColl = mongoDatabase.getCollection("GOT");
// Query mongo db with meta data string request
queryMetaTags("Episode 3");
Notes here:
Following a game, or I should say after a room idle period, the room gets destroyed; this idle time is currently set to 60 minutes. I believe that if connectionsPerHost is set to 100, then while this room is idle it would be holding on to valuable connection resources.
Question
Is this a good way to manage my client connections?
If I have several hundred concurrently connected games, each accessing the DB to pull questions, should I free up the client connection for other rooms to use after each request? How should this be done? I'm concerned about possible bottlenecks here!
Mongo Query FYI
// Query our collection documents' metaTag elements for a matching string
// @SuppressWarnings("deprecation")
public void queryMetaTags(String query)
{
    // Query to search all documents in current collection
    List<String> continentList = Arrays.asList(new String[]{query});

    DBObject matchFields = new BasicDBObject("season.questions.questionEntry.metaTags",
            new BasicDBObject("$in", continentList));
    DBObject groupFields = new BasicDBObject("_id", "$_id").append("questions",
            new BasicDBObject("$push", "$season.questions"));
    //DBObject unwindshow = new BasicDBObject("$unwind","$show");
    DBObject unwindsea = new BasicDBObject("$unwind", "$season");
    DBObject unwindepi = new BasicDBObject("$unwind", "$season.questions");
    DBObject match = new BasicDBObject("$match", matchFields);
    DBObject group = new BasicDBObject("$group", groupFields);

    @SuppressWarnings("deprecation")
    AggregationOutput output = mongoColl.aggregate(unwindsea, unwindepi, match, group);

    String jsonString = null;
    JSONObject jsonObject = null;
    JSONArray jsonArray = null;
    ArrayList<JSONObject> ourResultsArray = new ArrayList<JSONObject>();

    // Loop for each document in our collection
    for (DBObject result : output.results())
    {
        try
        {
            // Parse our results so we can add them to an ArrayList
            jsonString = JSON.serialize(result);
            jsonObject = new JSONObject(jsonString);
            jsonArray = jsonObject.getJSONArray("questions");
            for (int i = 0; i < jsonArray.length(); i++)
            {
                // Put each of our returned questionEntry elements into an ArrayList
                ourResultsArray.add(jsonArray.getJSONObject(i));
            }
        }
        catch (JSONException e1)
        {
            e1.printStackTrace();
        }
    }

    pullOut10Questions(ourResultsArray);
}
The way I've done this is to use Spring to create a MongoClient Bean. You can then autowire this bean wherever it is needed.
For example:
MongoConfig.java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.tescobank.insurance.telematics.data.connector.config.DatabaseProperties;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.net.UnknownHostException;

@Configuration
public class MongoConfig {

    private @Autowired DatabaseProperties properties;

    @Bean
    public MongoClient fooClient() throws UnknownHostException {
        return mongo(properties.getFooDatabaseURI());
    }

    // Helper implied by the original snippet: build a client from the configured URI
    private MongoClient mongo(String uri) throws UnknownHostException {
        return new MongoClient(new MongoClientURI(uri));
    }
}
Class requiring a MongoDB connection:
@Component
public class DatabaseUser {

    private MongoClient mongoClient;

    ....

    @Autowired
    public DatabaseUser(MongoClient mongoClient) {
        this.mongoClient = mongoClient;
    }
}
Spring will then create the connection and wire it in where required. What you've done seems very complex and perhaps tries to recreate functionality you would get for free by using a tried and tested framework such as Spring. I'd also generally try to avoid the use of Singletons if I could. I've had no performance issues using MongoDB connections like this.
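For illustration, here is a hypothetical sketch of how the wired-in client might be used from such a component; the repository name and query are made up, the database and collection names are taken from the question, and it assumes the 3.x driver API (getDatabase / MongoCollection):

import java.util.ArrayList;
import java.util.List;

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class QuestionRepository {

    private final MongoClient mongoClient;

    @Autowired
    public QuestionRepository(MongoClient mongoClient) {
        this.mongoClient = mongoClient;
    }

    // Fetch question documents tagged with the given metaTag; the single shared
    // client's internal pool handles the underlying connections.
    public List<Document> findByMetaTag(String metaTag) {
        MongoCollection<Document> collection = mongoClient
                .getDatabase("DBName")
                .getCollection("GOT");
        return collection
                .find(Filters.eq("season.questions.questionEntry.metaTags", metaTag))
                .into(new ArrayList<Document>());
    }
}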