MongoDB reactive streams forEach operation corresponding to MongoDB async driver

I have been trying to move from the MongoDB async driver to the reactive streams driver.
So far I have been successful in migrating most of the operations.
But I'm stuck with MongoIterable and trying to find a compatible equivalent in the reactive driver.
Here is the async driver code snippet that I'm trying to migrate:
String param = "hello";
database. getCollection("sample").find(Filters.eq("mongo", param)).forEach(
new Block<T>() {
#Override
public void apply(ProcessingProtectedRegion region) {
//my code to handle
}
},
//implementation of SingleResultCallback<T>
);
I'm trying to migrate the above snippet to the reactive driver, but I am not able to find an operation that behaves like the async driver's forEach(), which takes two parameters, since the reactive streams operations always need a subscriber.
Documentation of the async driver's forEach operation:
/**
 * Iterates over all documents in the view, applying the given block to each, and completing the returned future after all documents
 * have been iterated, or an exception has occurred.
 *
 * @param block    the block to apply to each document
 * @param callback a callback that completed once the iteration has completed
 */
void forEach(Block<? super TResult> block, SingleResultCallback<Void> callback);
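For comparison, here is a minimal sketch of how the same iteration might look with the reactive streams driver, which exposes find() as a Publisher that you subscribe to instead of passing a Block and a callback. The Subscriber below is a plain org.reactivestreams.Subscriber, and requesting Long.MAX_VALUE items is only an assumption to mimic forEach's "iterate everything" behaviour; real code would normally handle back-pressure.
String param = "hello";
database.getCollection("sample")
        .find(Filters.eq("mongo", param))
        .subscribe(new Subscriber<Document>() {
            @Override
            public void onSubscribe(Subscription s) {
                // request everything up front; back-pressure handling is omitted for brevity
                s.request(Long.MAX_VALUE);
            }
            @Override
            public void onNext(Document document) {
                // equivalent of Block.apply(...): handle each document here
            }
            @Override
            public void onError(Throwable t) {
                // the async driver would have completed the callback with this exception
            }
            @Override
            public void onComplete() {
                // equivalent of the SingleResultCallback firing once iteration has finished
            }
        });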

Related

Mutiny - Kafka writes happening sequentially

I am new to Quarkus. I am trying to write a REST endpoint using Quarkus reactive that receives an input, does some validation, transforms the input to a list, and then writes a message to Kafka. My understanding was that converting everything to Uni/Multi would result in the execution happening on the I/O thread in an async manner. In the IntelliJ logs, however, I could see that the code is getting executed in a sequential manner on the executor thread. The Kafka write happens sequentially in its own network thread, which is increasing latency.
@POST
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public Multi<OutputSample> send(InputSample inputSample) {
    ObjectMapper mapper = new ObjectMapper();
    // deflateMessage() converts the input to a list of InputSample
    Multi<InputSample> keys = Multi.createFrom().item(inputSample)
            .onItem().transformToMulti(array -> Multi.createFrom().iterable(deflateMessage.deflateMessage(array)))
            .concatenate();
    return keys.onItem().transformToUniAndMerge(payload -> {
        try {
            return producer.writeToKafka(payload, mapper);
        } catch (JsonProcessingException e) {
            e.printStackTrace();
        }
        return null;
    });
}
@Inject
@Channel("write")
Emitter<String> emitter;

Uni<OutputSample> writeToKafka(InputSample kafkaPayload, ObjectMapper mapper) throws JsonProcessingException {
    String inputSampleJson = mapper.writeValueAsString(kafkaPayload);
    return Uni.createFrom().completionStage(emitter.send(inputSampleJson))
            .onItem().transform(ignored -> new OutputSample("id", 200, "OK"))
            .onFailure().recoverWithItem(new OutputSample("id", 500, "INTERNAL_SERVER_ERROR"));
}
I have been on it for a couple of days and am not sure if I am doing anything wrong. Any help would be appreciated.
Thanks
Mutiny, like any other reactive library, is designed mainly around data flow control.
That being said, at its heart it offers a set of capabilities (generally through operators) to control flow execution and scheduling. This means that unless you instruct Mutiny objects to go asynchronous, they will simply execute sequentially.
Execution scheduling is controlled using two operators:
runSubscriptionOn: causes the code generating the items (generally referred to as the upstream) to execute on a thread from the specified Executor
emitOn: causes the subscribing code (generally referred to as the downstream) to execute on a thread from the specified Executor
You can then update your code as follows, causing the deflation to go asynchronous:
Multi<InputSample> keys = Multi.createFrom()
        .item(inputSample)
        .onItem()
        .transformToMulti(array -> Multi.createFrom()
                .iterable(deflateMessage.deflateMessage(array)))
        .concatenate()
        .runSubscriptionOn(Infrastructure.getDefaultExecutor()); // items will be generated and transformed on a separate thread
EDIT: Downstream on a separate thread
In order to have the full downstream, i.e. the transformation and the write to the Kafka queue, executed on a separate thread, you can use the emitOn operator as follows:
@POST
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public Multi<OutputSample> send(InputSample inputSample) {
    ObjectMapper mapper = new ObjectMapper();
    return Uni.createFrom()
            .item(inputSample)
            .onItem()
            .transformToMulti(array -> Multi.createFrom().iterable(deflateMessage.deflateMessage(array)))
            .emitOn(Executors.newFixedThreadPool(5)) // items will be emitted on a separate thread after transformation
            .onItem()
            .transformToUniAndConcatenate(payload -> {
                try {
                    return producer.writeToKafka(payload, mapper);
                } catch (JsonProcessingException e) {
                    e.printStackTrace();
                }
                return Uni.createFrom().<OutputSample>nothing();
            });
}
Multi is intended to be used when you have a source that emits items continuously until it emits a completion event, which is not your case.
From Mutiny docs:
A Multi represents a stream of data. A stream can emit 0, 1, n, or an
infinite number of items.
You will rarely create instances of Multi yourself but instead use a
reactive client that exposes a Mutiny API.
What you are looking for is a Uni<List<OutputSample>>, because your API returns one and only one item containing the complete result list.
So what you need is to send each message to Kafka without immediately waiting for its result, collecting the generated Unis, and then combining them into a single Uni.
@POST
public Uni<List<OutputSample>> send(InputSample inputSample) {
    // This could be injected directly inside your producer
    ObjectMapper mapper = new ObjectMapper();
    // Send each item to Kafka and collect the resulting Unis
    List<Uni<OutputSample>> uniList = deflateMessage(inputSample).stream()
            .map(input -> producer.writeToKafka(input, mapper))
            .collect(Collectors.toList());
    // Transform a list of Unis into a single Uni of a list
    @SuppressWarnings("unchecked") // Mutiny API fault...
    Uni<List<OutputSample>> result = Uni.combine().all().unis(uniList)
            .combinedWith(list -> (List<OutputSample>) list);
    return result;
}
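As a side note, the checked JsonProcessingException in writeToKafka can also be kept inside the reactive pipeline, so a serialization failure surfaces as a failure event on the Uni instead of leaking out of the method. This is only a sketch that reuses the emitter, InputSample and OutputSample names from the question:
Uni<OutputSample> writeToKafka(InputSample kafkaPayload, ObjectMapper mapper) {
    return Uni.createFrom().item(() -> {
                // serialization happens lazily, at subscription time
                try {
                    return mapper.writeValueAsString(kafkaPayload);
                } catch (JsonProcessingException e) {
                    throw new RuntimeException(e); // becomes a failure event on the Uni
                }
            })
            .chain(json -> Uni.createFrom().completionStage(emitter.send(json)))
            .onItem().transform(ignored -> new OutputSample("id", 200, "OK"))
            .onFailure().recoverWithItem(new OutputSample("id", 500, "INTERNAL_SERVER_ERROR"));
}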

Denormalization practices in reactive application

I am creating a reactive application with Meteor (with MongoDB as a backend).
I initially created a non-reactive-aware collection and denormalizers, e.g.:
class DocCollection extends Mongo.Collection {
    insert(doc, callback) {
        const docId = super.insert(doc, callback);
        doc = docMongo.findOne(docId); // for illustration, A
        console.log(doc);
        return docId;
    }
}
docMongo = new DocCollection();
Now, I'd like to wrap it into MongoObservable, which will facilitate listening to the changes to the collection:
export const Doc = new MongoObservable.Collection(docMongo);
Then, I define a Method:
Meteor.methods({
    add_me() {
        Doc.insert(myDoc);
    }
});
in server/main.js and call it in app.component.ts's constructor:
@Component(...)
export class AppComponent {
    constructor() {
        Meteor.call('add_me');
    }
}
I get undefined printed to the console (unless I sleep a little before findOne), so I suppose that when I looked for the doc right after insertion in my Mongo.Collection, the document wasn't yet ready to be searched for.
Why does it happen, even though I overwrote the non-reactive class and only then wrapped it in MongoObservable?
How do I typically do denormalization with a reactive collection? Should I pass observables to my denormalizers and create new ones there, or is it possible to nicely wrap the non-reactive code afterwards (like I tried and failed above)? Note that I don't want to directly pass doc inside, as in more complex scenarios it will cause other inserts/updates elsewhere for which I'd also want to wait.
How do people typically test these things? If I run a test, the code above may succeed locally, as db insertion time is small, but fail when the delay is higher.

Azure Function Queue triggered with Mediatr - DbContext error

I'm implementing a queue-triggered Azure Function. I'm using a mediator pattern library called MediatR to enhance command/query segregation, and I'm using constructor dependency injection in the Azure Function on the latest runtime (2.0.12382.0), according to the following tutorial:
https://devkimchi.com/2019/02/22/performing-constructor-injections-on-azure-functions-v2/
For each Azure Function trigger, I call a MediatR command handler, but I'm receiving this error:
"A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext, however instance members are not guaranteed to be thread safe. This could also be caused by a nested query being evaluated on the client, if this is the case rewrite the query avoiding nested invocations."
The error states that I'm trying to access the same instance of DbContext from parallel tasks. However, I only have one command handler (MediatR handler) and one query handler, and I'm using constructor injection for both.
I tried changing the MediatR service to be transient in the startup, but I still receive the same error when testing the function inside the Azure Functions emulator.
Startup Class
public class StartUp : IWebJobsStartup
{
    public void Configure(IWebJobsBuilder builder)
    {
        var configuration = new ConfigurationBuilder()
            .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
            .AddEnvironmentVariables()
            .Build();

        var connection = configuration.GetConnectionString("Default");

        builder.Services.AddDbContext<CoreDBContext>(options =>
        {
            options.UseSqlServer(connection, p =>
            {
                p.MigrationsAssembly("B12Core.Persistence");
            });
        });

        builder.Services.AddTransient(typeof(IPipelineBehavior<,>), typeof(RequestPreProcessorBehavior<,>));
        builder.Services.AddTransient(typeof(IPipelineBehavior<,>), typeof(RequestPerformanceBehaviour<,>));
        builder.Services.AddTransient(typeof(IPipelineBehavior<,>), typeof(RequestValidationBehavior<,>));

        builder.Services.AddMediatR(p =>
        {
            p.AsTransient();
        }, typeof(CreateMessageCommand).GetTypeInfo().Assembly);
    }
}
Full Error
System.Private.CoreLib: Exception while executing function: Function1. Microsoft.EntityFrameworkCore: A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext, however instance members are not guaranteed to be thread safe. This could also be caused by a nested query being evaluated on the client, if this is the case rewrite the query avoiding nested invocations.
Solved it by changing the DbContext injection lifetime to ServiceLifetime.Transient:
builder.Services.AddDbContext<CoreDBContext>(options =>
{
    options.UseSqlServer(connection, p =>
    {
        p.MigrationsAssembly("Presistence");
    });
}, ServiceLifetime.Transient);

Is there a way to use Solr's streaming API with spring data solr?

I have a use case where I need to fetch the ids of my entire Solr collection. For that, with SolrJ, I use the streaming API like this:
CloudSolrServer server = new CloudSolrServer("zkHost1:2181,zkHost2:2181,zkHost3:2181");
SolrQuery query = new SolrQuery("*:*");
server.queryAndStreamResponse(query, handler);
Where handler is a class that implements StreamingResponseCallback, omitted in my code for brevity.
Now, the Spring Data repositories abstraction gives me the ability to search by pages and by cursors, but I can't seem to find a way to handle the streaming use case.
Is there a workaround?
SolrTemplate allows access to the underlying SolrClient in a callback style, so you could use that to work around the current limitations.
The result conversion using the MappingSolrConverter available via the SolrTemplate is broken at the moment (I need to check why), but you get the idea of how to do it.
solrTemplate.execute(new SolrCallback<Void>() {
    @Override
    public Void doInSolr(SolrClient solrClient) throws SolrServerException, IOException {
        SolrQuery sq = new SolrQuery("*:*");
        solrClient.queryAndStreamResponse("collection1", sq, new StreamingResponseCallback() {
            @Override
            public void streamSolrDocument(SolrDocument doc) {
                // the bean conversion fails atm
                // ExampleSolrBean bean = solrTemplate.getConverter().read(ExampleSolrBean.class, doc);
                System.out.println(doc);
            }
            @Override
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                // do something useful
            }
        });
        return null;
    }
});
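Since the original goal was only to fetch the ids of the whole collection, the streaming callback can simply accumulate the unique key of each document. The sketch below assumes the unique key field is called "id" and reuses the solrTemplate from the snippet above:
List<String> ids = new ArrayList<>();
solrTemplate.execute(new SolrCallback<Void>() {
    @Override
    public Void doInSolr(SolrClient solrClient) throws SolrServerException, IOException {
        SolrQuery sq = new SolrQuery("*:*");
        sq.setFields("id"); // only the id field needs to be streamed back
        solrClient.queryAndStreamResponse("collection1", sq, new StreamingResponseCallback() {
            @Override
            public void streamSolrDocument(SolrDocument doc) {
                ids.add((String) doc.getFieldValue("id"));
            }
            @Override
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                // nothing to do for this use case
            }
        });
        return null;
    }
});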

Create async methods to load data from database using EF

How does one write an async method that gets data from the database, using DbContext and .NET 4.0?
public Task<List<Product>> GetProductsAsync()
{
    // Context.Set<Product>().ToList();
}
so that we can await the result of this method somewhere:
// somewhere
List<Product> products = await repository.GetProductsAsync();
What is important is that I do not want to use a thread-pool thread for this task, since this is an I/O task. I also need to support .NET Framework 4.0, and EF6 added support for async methods only in 4.5, if I am not mistaken.
Not possible. SaveAsync is wrapped with:
#if !NET40
Source: https://github.com/aspnet/EntityFramework6/blob/master/src/EntityFramework/DbContext.cs#L335
Entity Framework 6 is available as a NuGet package and also works with .NET 4. In EF6, async
DB calls can be done as in this example:
public static async Task<List<Customer>> SelectAllAsync()
{
    using (var context = new NorthWindEntities())
    {
        var query = from c in context.Customers
                    orderby c.CustomerID ascending
                    select c;
        return await query.ToListAsync();
    }
}
Here, a DbContext is instantiated in a using block, and the query searches the Northwind database for all its customers, ordered ascending by the primary key CustomerID. The method needs to be decorated with the async keyword and must return a Task of the desired type T, here a list of customers. The result is then returned by using the ToListAsync() method, and one must remember to use the await keyword. There are additional async methods in EF6, such as SaveChangesAsync, SingleOrDefaultAsync, and so on. Also, async methods in C# should follow this naming convention according to the guidelines.
To test the code above in Visual Studio, just create a new Console Application solution, type install-package EntityFramework in the Package Manager Console, and add an .edmx file pointing to a local Northwind database. The Main method will then just call the method above like this:
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = Stopwatch.StartNew();
        var task = CustomHelper.SelectAllAsync();
        task.Wait();
        Console.WriteLine("Got data!");
        List<Customer> data = task.Result;
        Console.WriteLine(data.Count);
        Console.WriteLine("Async op took: " + sw.ElapsedMilliseconds);
        sw.Stop();
        sw.Start();
        //data =
        //var data = CustomHelper.SelectAll();
        //Console.WriteLine("Got data!");
        //Console.WriteLine(data.Count);
        //Console.WriteLine("Sync operation took: " + sw.ElapsedMilliseconds);
        Console.WriteLine("Press any key to continue ...");
        Console.ReadKey();
    }
}
I see that the async operation takes a bit longer than the sync operation, so the async DB call has some time penalty, but it allows asynchronous processing: the calling code is not frozen while awaiting the result in a blocking fashion.