Do Apache Ignite REST APIs provide implicit locks?

I want to perform get and put operations on an Ignite cache using the Ignite REST API. In my application, multiple systems will be performing these operations simultaneously.
https://apacheignite.readme.io/docs/rest-api

Yes, cache writes and reads are safe to execute simultaneously from multiple threads or clients; each individual operation is applied atomically.
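For illustration, here is a minimal sketch of put/get calls against the REST API using Java's built-in HttpClient. The host, port, cache name, key, and value are placeholders, and the node is assumed to have the REST HTTP module (ignite-rest-http) enabled on its default port:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class IgniteRestDemo {
      public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8080/ignite"; // default REST connector

        // put: each REST command maps to a single cache operation,
        // which the cluster applies atomically.
        HttpRequest put = HttpRequest.newBuilder(
            URI.create(base + "?cmd=put&cacheName=myCache&key=k1&val=v1")).build();
        System.out.println(client.send(put, HttpResponse.BodyHandlers.ofString()).body());

        // get
        HttpRequest get = HttpRequest.newBuilder(
            URI.create(base + "?cmd=get&cacheName=myCache&key=k1")).build();
        System.out.println(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
      }
    }

Note that atomicity holds per command; there is no implicit lock spanning a get followed by a put, so a read-modify-write sequence is not atomic as a whole.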


Is it me or are DynamoDb Streams just really lacking?

I have an application running in multiple regions in AWS; it reads from global DynamoDB table(s). Updates occur in the background via another process, and I wanted to be able to monitor for these updates so the application can invalidate its cache (I'm not using DAX).
I was thinking I could use DynamoDB Streams for this. However, I ran into a number of roadblocks with the Spring Kinesis Streams Binder (e.g. the fact that it requires two tables, SpringIntegrationMetadataStore and SpringIntegrationLockRegistry, to be created; my company doesn't allow dynamic creation of tables, so that was fun to hunt down since I couldn't find any mention of it in the docs. 🤷‍♀️ maybe I missed it). Now I think I have found out that only one application can listen to a Kinesis stream at a time?
Is that true?
Is there a way for multiple applications, that only read from DynamoDB, to get notified when an update occurs? I was thinking each app could use DynamoDB Streams to monitor the stream for updates and invalidate its own cache. If the above is true, then I need to do something more involved or complex (use SNS/SQS for updates, ElastiCache, Redis, Kafka), which just seems like overkill for this scenario.
e.g. the fact that it requires 2 tables [SpringIntegrationMetadataStore & SpringIntegrationLockRegistry]
Well, that's how consumer group management is handled by the Spring Cloud Stream Kinesis Binder. Even if you used only the KCL, it would still require an extra table in DynamoDB. So your concern sounds more like a lack of confidence in the cloud services you use.
Now I think I have found out that only 1 application can listen to a Kinesis stream at a time?
That's not true: multiple applications can listen to the same Kinesis stream, as long as they are configured with different consumer groups.
Please familiarize yourself with Spring Cloud Stream and its model: https://docs.spring.io/spring-cloud-stream/docs/3.1.1/reference/html/spring-cloud-stream.html#_main_concepts
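For example, each application can set its own group on its binding (the binding name input and the values below are hypothetical); every group then receives all records, while instances within one group share the shards:

    # application.properties of application A (application B uses its own group)
    spring.cloud.stream.bindings.input.destination=my-dynamodb-stream
    spring.cloud.stream.bindings.input.group=cache-invalidator-a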
Another way could be an AWS Lambda trigger for DynamoDB Streams: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html
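For reference, a minimal sketch of such a handler using the aws-lambda-java-events types; how you fan the invalidation out to the apps (SNS, for instance) is up to you and only indicated in a comment:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;

    public class CacheInvalidator implements RequestHandler<DynamodbEvent, Void> {
      @Override
      public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbStreamRecord record : event.getRecords()) {
          // Each record carries the table keys (and optionally old/new images).
          // Publish an invalidation message here, e.g. to an SNS topic the
          // reader applications subscribe to.
          context.getLogger().log("stream event: " + record.getEventName()
              + " keys=" + record.getDynamodb().getKeys());
        }
        return null;
      }
    }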

Solution architecture Kafka

I am working with a third-party vendor whom I asked to provide the events generated by a website.
The vendor proposed to stream the events using Kafka... why not...
On my side (the client) I am running a 100% MSSQL/Windows production environment, and internal business wants KPIs and dashboards on website activity.
Now the question: what would be the architecture to support a PoC, so I can manage the inputs on one hand and create data marts to deliver on business needs on the other?
It's not clear what you mean by "events from a website". Your Kafka producers are typically server-side components: as API requests come in, you'd put the Kafka event production between those requests and your database calls. I would be surprised if any third party could just do that immediately.
Maybe you're looking for something like https://divolte.io/
You can also use CDC products to stream events out of your database
The architecture could be like this: the app streams events to Kafka; you write a service that reads the data from Kafka, does the transformation, and writes to a database; you then build a dashboard on top of the DB.
Alternatively, you can populate indexes in Elasticsearch and build a Kibana dashboard as well.
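A rough sketch of that consumer service, assuming a hypothetical website-events topic and local brokers (in practice the vendor's broker addresses go here):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class EventSink {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "website-events-sink");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("website-events"));
          while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
              // Transform here and write to MSSQL, e.g. via batched JDBC inserts.
              System.out.printf("event %s -> %s%n", record.key(), record.value());
            }
          }
        }
      }
    }

In production you'd batch the DB writes and coordinate offset commits with the DB transaction, but this shows the shape of the service.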
My suggestion would be to use a Lambda architecture to cater to both real-time and batch processing needs:
Architecture:
Lambda architecture is designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods.
This architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data.

Will WebFlux have any bottlenecks in such architecture?

We're currently about to migrate from a monolithic design to a microservice architecture, trying to choose the best way to replace JAX-WS with RESTful services, and considering Spring WebFlux.
We currently have a JAX-WS endpoint deployed on Tomcat EE serving requests from third-party clients. The web service endpoint makes a long-running blocking call to the database and then sends a SOAP response to the client with the data retrieved from the DB (Oracle).
The Oracle DB will soon be replaced with a NoSQL database (possibly MongoDB). Since MongoDB supports asynchronous calls, we're considering substituting the current implementation with a microservice exposing a REST endpoint based on WebFlux.
We have about 2500 req/sec at peaks, so the current endpoint often goes down with an OutOfMemoryError; that was the root cause that pushed us towards migration.
My thought is to create a non-blocking endpoint which calls MongoDB asynchronously and sends a REST response to the client (a sketch of such an endpoint follows the question list below). So I have a few questions about the basic features that WebFlux provides:
1. As far as I understand, WebFlux has built-in backpressure control at the business level (not TCP flow control), and it works generally via Reactive Streams. Since our clients are not reactive, does that mean such backpressure control is not applicable here?
2. Suppose calls to the new database remain long-running in the new architecture. Since Netty uses an event loop to serve incoming requests, is the following scenario realistic: the microservice accepts all incoming HTTP connections, invokes an async call to the DB, and subscribes the resulting Mono on a scheduler, but because the request volume keeps growing explosively, the application keeps creating new workers in the scheduler pools until it crashes?
3. Suppose calls to the database remain synchronous. Is there a way to handle them using WebFlux such that the microservice stays reachable under load?
4. Which bottlenecks can be found in such a design? Does this solution look adequate?
5. Does Netty (or Reactor Netty, or whatever) have a tool to limit the number of requests processed simultaneously? Say I want to limit the endpoint to serving no more than 100 parallel requests and to reject all requests above that point; is that possible?
6. Suppose I create a huge number of threads serving async (or maybe sync) calls to the DB. Where is the breaking point at which the application will crash or stop responding to incoming HTTP requests? What will happen there: will we run out of memory, or..?
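For context, here is a minimal sketch of the kind of non-blocking endpoint described above, assuming Spring Data's reactive MongoDB support; the Order names are illustrative:

    import org.springframework.data.mongodb.repository.ReactiveMongoRepository;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;
    import reactor.core.publisher.Mono;

    // Hypothetical MongoDB document.
    class Order {
      public String id;
      public String status;
    }

    // Spring Data derives a non-blocking implementation backed by the
    // reactive MongoDB driver.
    interface OrderRepository extends ReactiveMongoRepository<Order, String> {}

    @RestController
    class OrderController {
      private final OrderRepository repository;

      OrderController(OrderRepository repository) {
        this.repository = repository;
      }

      // No thread is held while the query runs; the Mono completes on the
      // driver's event loop and Netty writes the response when it does.
      @GetMapping("/orders/{id}")
      Mono<Order> byId(@PathVariable String id) {
        return repository.findById(id);
      }
    }

Nothing here parks a thread per request; concurrency is bounded by the event loops and the driver's connection pool, which is exactly the property the questions above are probing.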
Finally, there were no major performance issues during our pilot project. But unfortunately we didn't take into account some specific Linux (and also OpenShift) TCP tuning properties.
They may significantly affect overall performance; in our case we gained about 10 times more throughput after tuning.
So pay attention to net.core.somaxconn and the other related parameters.
I've summarized our experience in the article.
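For reference, these are plain kernel settings, e.g. in /etc/sysctl.conf; the values below are illustrative only and need to be validated against your own load:

    # Upper bound on the accept backlog of a listening socket.
    net.core.somaxconn = 4096
    # Backlog for half-open connections during the TCP handshake.
    net.ipv4.tcp_max_syn_backlog = 4096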

Can the vert.x event bus replace the need for Kafka?

I am evaluating the vert.x framework to see if I can reduce the Kafka-based communication between my microservices developed using Spring Boot.
The question is:
Can I replace:
Kafka with the vert.x event bus, and
Spring Boot microservices with vert.x-based verticles?
To answer quickly, I would say it depends on your needs.
Yes, the event bus can be a good way to natively handle communication between microservice verticles, using an asynchronous and non-blocking paradigm.
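A tiny illustration of that paradigm (the address name is arbitrary; request/reply over the bus needs Vert.x 3.8+ for eventBus().request):

    import io.vertx.core.Vertx;

    public class EventBusDemo {
      public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // A consumer listening on an arbitrary address.
        vertx.eventBus().consumer("orders.created", message -> {
          System.out.println("received: " + message.body());
          message.reply("ack");
        });

        // Request/reply over the bus; with a clustered Vertx instance the
        // same call works transparently across nodes.
        vertx.eventBus().request("orders.created", "order-42", reply -> {
          if (reply.succeeded()) {
            System.out.println("reply: " + reply.result().body());
          }
        });
      }
    }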
But in some cases you could need:
to handle some common enterprise patterns like replay mechanisms, message persistence, and transactional reads
to process some kinds of messages in chronological order
to handle communication between multiple kinds of microservices that aren't all written with the same framework/toolkit, or even the same programming language
to handle reliability, resilience, and failure recovery when all your consumers/microservices/verticles have died
to handle dynamic horizontal scalability and monitoring of your consumers/microservices/verticles
to be able to work with a single cluster deployed across multiple datacenters and regions
In those cases I'd prefer to choose Apache Kafka over the native event bus or an old-fashioned JMS-compliant system.
It's not forbidden to use both the event bus and Kafka in the same microservice architecture, according to your real needs. For example, you could have one Kafka consumer group reading a Kafka topic to handle scaling, monitoring, failure recovery, and replay mechanisms, and then handle communication between your sub-verticles through the event bus.
I'll clarify the scalability and monitoring part a little, and explain why I think it's simpler to handle with Kafka than with the native event bus and vert.x cluster mode. Kafka lets us know in real time (through JMX metrics and the describe command):
the "lag" of a topic, which corresponds to the number of unread messages
the number of consumers in each group that are listening to a topic
the number of partitions of a topic assigned to each consumer
I/O metrics
So it's possible to use an Elastic Stack or a Prometheus+Grafana setup to monitor those metrics and use them to drive dynamic scalability (for example, when the lag metric, the number of partitions, and the CPU/RAM/swap metrics of your hosts show a temporary need for more consumers).
To answer the second question, vert.x or Spring Boot, my answer won't be very objective, but I'd vote for vert.x for its performance on the JVM and especially for its simplicity. I'm a little tired of the Spring factory and its big layers of abstraction that hide a lot of issues under a mountain of annotations triggering a mountain of AOP.
Moreover, in the Java microservices world there are other alternatives to Spring Boot, like the various implementations of MicroProfile (the Thorntail project, for example).
The event bus is not persistent. You should use it for fast verticle-to-verticle communication, and more generally to dispatch events where you know you can afford to lose them if something crashes.
Kafka streams are persistent, and you should send events there either because you want other (possibly non-Vert.x) applications to consume them, and/or because you want to ensure that these events are not lost in case of failure.
A reactive (read: scalable and fault-tolerant) Vert.x application typically uses a combination of both the event bus and a replicated messaging system like AMQP / Kafka / etc.
On the question:
Can I replace spring boot microservices with vert.x based verticles?
Yes, definitely, although the two have different programming models.
If you want a more progressive approach, using Spring to structure your application while using Vert.x for resource-efficient I/O and event processing, then you can mix them; see https://github.com/vert-x3/vertx-examples/tree/master/spring-examples for examples.
Take a look at the Quarkus framework: in the workshop section you'll find Vert.x and Apache Kafka combined!

Throttle API calls to external service using Scala

I have a service exposing a REST endpoint that, after a couple of transformations, calls a third-party service, also via its REST endpoint.
I would like to implement some sort of throttling on my service to avoid being throttled by this third-party service. Note that my service's endpoint accepts a single request, not a list of them. I'm using Play, and we also have Akka Streams as a dependency.
My first thought was to have my service save the requests into a database table and then have an Akka Streams Source, leveraging the throttle function, pick up tasks, apply the transformations, and call the external service.
Is this a reasonable approach, or does it have any severe drawbacks?
Thanks!
Why save the requests to the database? Does the queue need to survive restarts and/or do you run a load-balanced setup that needs to somehow synchronize the requests?
If you don't need the above, I'd think using just a Source.queue to hold the task data would work just as well.
And maybe you already thought of this: if you want to make your endpoint more resilient, you should allow your API to send a 'sorry, busy' response and drop the request instead of queuing it when your queue grows beyond a certain size.
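A minimal sketch of that queue-plus-throttle idea, shown with the Akka Streams Java DSL for consistency with the other examples (the Scala DSL is analogous); the buffer size and rate are illustrative:

    import akka.actor.ActorSystem;
    import akka.stream.OverflowStrategy;
    import akka.stream.javadsl.Sink;
    import akka.stream.javadsl.Source;
    import akka.stream.javadsl.SourceQueueWithComplete;
    import java.time.Duration;

    public class ThrottledForwarder {
      public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("throttler");

        // Bounded in-memory queue: at most 100 pending requests; new offers
        // are rejected when full, so the caller can be told "sorry, busy".
        SourceQueueWithComplete<String> queue =
            Source.<String>queue(100, OverflowStrategy.dropNew())
                // At most 10 downstream calls per second.
                .throttle(10, Duration.ofSeconds(1))
                .to(Sink.foreach(req ->
                    System.out.println("calling third party with " + req)))
                .run(system);

        // From the Play action: offer() completes with a QueueOfferResult;
        // map a Dropped result to an HTTP 503 response.
        queue.offer("request-1").thenAccept(result ->
            System.out.println("offer result: " + result));
      }
    }

Source.queue keeps the buffer in memory, which is exactly the trade-off discussed above: simple and fast, but the queued tasks don't survive restarts and aren't shared across load-balanced instances.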