In ODM, I configured the size and other properties of the XU connection pool for improved ruleset performance. I am trying to do the same when using the Business Rules service on Bluemix but do not know how to do so.
It is not possible to configure the XU connection pool for the Business Rules service on Bluemix.
However, I am working with an XML XOM, and I found that I could increase the value of the ruleset property xmlDocumentDriverPool.maxSize to configure the size of the execution pool of XML document drivers. The default setting of 1 could cause a performance bottleneck if several clients executed a ruleset concurrently, so I set its value to the number of clients.
I have some general problems/questions regarding self managed Microservices (in Kubernetes).
The Situation:
I have a provider (Discord API) for my desired state, which tells me the count (or multiples of the count) of sharded connections (websocket -> stateful in some way) I should establish with the provider.
Currently I have a "monolithic" microservice (it can't be deployed in an autoscaling service and has to be stateful), which determines the number of connections I should have and a factor based on the currently active pods that can establish a connection to this API.
It further (by heartbeating and updating the connection target of all those pods) manages the state of every pod and achieves this target configuration.
It also handles the case of a pod being removed from the service and a change of target configuration, by rolling out the updated target and only after updating the target discontinuing the old connections.
The Cons:
This does not in any way resemble a good microservice architecture
A failure of the manager (even when persisting the current state in a cache or DB of some sort) means the target set by the target provider is not achieved, and a pod may fail without the manager handling it gracefully
The Pros:
It's "easy" to understand and maintain a centrally managed system
There is no case (assuming a running manager system) where a pod can fail and it won't be handled -> connection resumed on another pod
My Plan:
I would like these websocket connection pods to manage themselves in some way.
Theoretically there has to be a way in which a "swarm" (swarm here is just a descriptive word for pods within a service) can determine a swarm wide accepted target.
The tasks to achieve this target (or change of target) should then be allocated across the swarm by the swarm itself.
Every failure of a member of the swarm has to be recognized, and the now unhandled tasks (in my case websocket connections) have to be resumed on different members of the swarm.
Also updates of the target have to be rolled out across the swarm in a distinct manner, retaining the tasks for the old target till all tasks for the new target are handled.
My ideas so far:
As a general syncing point, a cache like Redis or a DB like MongoDB could be used.
Here the current target (and the old target, for creating earlier mentioned smooth target changes) could be stored, along with all tasks that have to be handled to achieve this desired target.
This should be relatively easy to set up and also a "voting process" for the current target could be possible - if even necessary (every swarm member checks the current target of the target provider and the target that is determined by most of the swarm members is set as the vote outcome).
But now we face the problem already mentioned in the pros for the managed system: I currently can't think of a way the failure of a swarm member can be recognized and managed by the swarm consistently.
How should a failure be determined without a constant exchange between swarm members, which I think should be avoided because:
swarms should operate entirely target-driven and interact with each other as little as possible
Kubernetes itself isn't really designed to have easy intra-service communication
Every contribution, idea or further question here helps.
My tech stack would be but isn't limited to:
Java with Micronaut for the application
gRPC as the only exchange protocol
Kubernetes as the orchestrator
Since you're on the JVM, you could use Akka Cluster to take care of failure detection between the pods (even in Kubernetes, though there's some care needed with service meshes to exempt the pod-to-pod communications from being routed through the mesh) and use (as one of many possibilities for this) Distributed Data's implementations of CRDTs to distribute state (in this case the target) among the pods.
This wouldn't require you to use Akka HTTP or Akka's gRPC implementations, so you could still use Micronaut for external interactions. It would effectively create a stateful self-organizing service which presents to Kubernetes as a regular stateless service.
If for some reason Akka isn't appealing, looking through the code and docs for its failure detection (phi-accrual) might provide some ideas for implementing a failure detector using (e.g.) periodic updates to a DB.
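As a rough illustration of the "periodic updates to a DB" idea, here is a minimal sketch of a fixed-timeout heartbeat check. It is deliberately much simpler than phi-accrual (which adapts its suspicion level to observed heartbeat intervals), and the class name `HeartbeatRegistry`, its methods, and the in-memory map standing in for a shared table in Redis or a database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a shared heartbeat table (e.g. a Redis hash or a DB table). */
class HeartbeatRegistry {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Each pod calls this periodically with its own id and the current time. */
    void heartbeat(String podId, long nowMillis) {
        lastSeenMillis.put(podId, nowMillis);
    }

    /** Returns the ids (sorted) of pods whose last heartbeat is older than the timeout. */
    List<String> suspectedDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeenMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }

    /** Removes a pod's entry so that a surviving pod can claim its connections. */
    boolean evict(String podId) {
        return lastSeenMillis.remove(podId) != null;
    }
}
```

Any pod could run `suspectedDead` on a timer and take over the connections of evicted members. In a real shared store the eviction would need to be an atomic compare-and-delete so that two pods do not both claim the same work.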
Disclaimer: I am employed by Lightbend, which provides commercial support for Akka and employs or has employed at some point most of the contributors to and maintainers of Akka.
Given a PostgreSQL database that is reasonably configured for its intended load, what factors would contribute to selecting an external/middleware connection pool (e.g. pgBouncer, pgPool) vs. a client-side connection pool (e.g. HikariCP, c3p0)? Lastly, in what instances would you apply both client-side and external connection pooling?
From my experience and understanding, the disadvantages of an external pool are:
additional failure point (including from a security standpoint)
additional latency
additional complexity in deployment
security complications w/ user credentials
In researching the question, I have come across instances where both client-side and external pooling are used. What is the motivation for such a deployment? In my mind that is compounding the majority of disadvantages for a gain that I appear to be missing.
Usually, a connection pool on the application side is a good thing for the reasons you detail. An external connection pool only makes sense if
your application server does not have a connection pool
you have several (many) instances of the application server, so that you cannot effectively limit the number of database connections with a connection pool in the application server
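The second condition comes down to simple arithmetic: each application instance can open up to its own pool maximum, so the worst case grows with the number of instances. A minimal sketch follows; the class `PoolSizing` and the numbers in the usage note are illustrative assumptions, not measurements.

```java
/** Hypothetical helper illustrating why many client-side pools cannot cap total connections. */
class PoolSizing {
    /** Worst-case number of database connections opened by all application instances together. */
    static int worstCaseConnections(int instances, int maxPoolSizePerInstance) {
        return Math.multiplyExact(instances, maxPoolSizePerInstance);
    }

    /** True if the client-side pools alone could exceed the server's connection limit. */
    static boolean exceedsLimit(int instances, int maxPoolSizePerInstance, int serverMaxConnections) {
        return worstCaseConnections(instances, maxPoolSizePerInstance) > serverMaxConnections;
    }
}
```

For example, 40 instances with a pool maximum of 10 each could open 400 connections, which would exceed a server limit of 100, whereas 4 such instances would stay within it. An external pooler placed in front of the database is one way to enforce a single global cap in the first case.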
What are the impacts of Bluemix auto-scaling in terms of resource management? For example, if a runtime is specified with 1 GB of memory and auto-scaling is set to 2 instances, does the application consume 2 GB?
Same question for the disk allocated for the runtime?
Are logs from the various instances combined automatically?
If an instance is currently serving a REST request (short), how does Auto-Scaling make sure that the request is not interrupted while being served?
When you say, "a runtime is specified with 1 GB of memory and auto-scaling is set to 2 instances" I assume that you set your group/application up such that each instance is given 1 GB of memory and you are asking what will happen if the Auto-Scaling service scales up your group/application to 2 instances.
Memory/Disk
For example if a runtime is specified with 1 GB of memory and auto-scaling is set to 2 instances, does the application consume 2 GB? Same question for the disk allocated for the runtime?
Yes, your application will now consume 2 GB of your total memory quota. The same applies for disk allocation.
The Auto-Scaling service will deploy a new instance with the same configuration as your existing instances. If you've set up your group/application such that each instance gets 1 GB of memory, then when Auto-Scaling increases your group's instance count from 1 to 2 your application will now consume 2 GB of memory, assuming that adding another GB doesn't go beyond your memory quota. The same applies with disk allocation and quota.
Logs
Are logs from the various instances combined automatically?
Yes, the logs are combined automatically.
Cloud Foundry applications combine logs as well. For more information about viewing these logs check out the documentation.
The IBM Containers service sends logs to IBM's Logmet service. For more information check out the documentation.
Handling REST requests without interruption
If an instance is currently serving a REST request (short), how does Auto-Scaling make sure that the request is not interrupted while being served?
Adding an instance to the group/application: no interruption
If an instance is being added to the group then there will be no interruption to existing requests because any previously existing instances are not touched or altered by the Auto-Scaling service.
Removing an instance from the group/application: possible interruption
At this time, the Auto-Scaling service does not support protecting ongoing requests from being dropped during a scale down operation. If the request is being processed by the instance that is being removed, then that request will be dropped. It is up to the application to handle such cases. One option is your application could store session data in external storage to allow the user to retry the request.
Additional Information
There are currently two different Auto-Scaling services in Bluemix:
Auto-Scaling for Cloud Foundry applications exists in all Bluemix regions and is available as a service you bind to your existing Cloud Foundry application.
Auto-Scaling for Container Groups currently is available as a beta service for the London region in the new Bluemix console.
The answers to your questions above are applicable to both services.
I hope this helps! Happy scaling!
We are transitioning from building applications on monolith application servers, to more microservices oriented applications on Spring Boot. We will publish health information with SB Actuator through HTTP or JMX.
What are the options/best practices to monitor services, that will be around 30-50 in total? Thanks for your input!
Not knowing too much detail about your architecture and services, here are some suggestions that represent (a subset of) the strategies that have been proven in systems I've worked on in production. For this I am assuming you are using one container/VM per microservice:
If your services are stateless (as they should be :-) and you have redundancy (as you should have :-) then you set up your load balancer to call your /health on each instance and if the health check fails then the load balancer should take the instance out of rotation. Depending on how tolerant your system is, you can set up various rules that define failure instead of just a single failure (e.g. 3 consecutive, etc.)
On each instance run a Nagios agent that calls your health check (/health) on the localhost. If this fails, generate an alert that specifies which instance failed.
You also want to ensure that a higher level alert is generated if none of your instances are healthy for a given service. You might be able to set this up in your load balancer or you can set up a monitor process outside the load balancer that calls your service periodically and if it does not get any response (i.e. none of the instances are responding) then it should sound all alarms. Hopefully this condition is never triggered in production because you dealt with the other alarms.
Advanced: In a cloud environment you can connect the alarms with automatic scaling features. In that way, unhealthy instances are torn down and healthy ones are brought up automatically every time an instance of a service is deemed unhealthy by the monitoring system
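The "3 consecutive failures" style of rule from the first suggestion can be sketched as a small counter. This is only an illustration of the rule itself; the class name `ConsecutiveFailureRule` is hypothetical, and real load balancers expose this as configuration rather than code.

```java
/** Hypothetical sketch: take an instance out of rotation after N consecutive failed health checks. */
class ConsecutiveFailureRule {
    private final int threshold;
    private int consecutiveFailures = 0;

    ConsecutiveFailureRule(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
    }

    /** Records the result of one /health probe and returns whether the instance stays in rotation. */
    boolean record(boolean healthy) {
        // A single success resets the streak; a failure extends it.
        consecutiveFailures = healthy ? 0 : consecutiveFailures + 1;
        return inRotation();
    }

    /** The instance is in rotation while the failure streak is below the threshold. */
    boolean inRotation() {
        return consecutiveFailures < threshold;
    }
}
```

With a threshold of 3, two failed probes followed by a success leave the instance in rotation, while three failures in a row remove it until the next successful probe.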
We have a REST web service that receives requests from external systems and makes updates to our DB accordingly. I'm looking to implement a caching/queuing solution for the requests that come in, as we've had some DB server challenges lately, and have lost some messages when the DB server went down.
Before I start putting together a simple persistent file-based queue, I want to see if there are any good alternatives to JMS, as its use is restricted in our environment.
Current platforms:
Jboss 4.3
Richfaces 3.3
Spring 3.0.5
RESTEasy
** UPDATES **
Per skaffman's question below, my requirements for clustering, transactions, etc.
Clustering: Our web and app servers are all clustered, so the queue(s) will need to be able to process items from all cluster nodes. However, our commits are essentially atomic, so ordering and synchronization issues are extremely minimal. Thread and cluster-safety is not really a factor. Separate/Independent queues on each cluster would be sufficient.
Transactions: Again, due to the atomic nature of our data, transactional needs are minimal/not required outside of each individual request.
Security: Moderate concern, but I would anticipate that to be handled by our regular security on the Web Service. I wouldn't anticipate anything reading or writing to the queue(s) other than the web-app itself. That would only be necessary in instances of high volume or when the DB is unavailable.
Thanks,
Mike
For one project we did use a queue (HornetQ), but it was integrated in the WAR and deployable on Tomcat because the customer did not want WebLogic or JBoss application servers. If your restrictive policy extends to your application architecture as well, such a solution would be forbidden.
For another project we did not use any JMS implementation; we implemented the asynchronous processing with a message database and the Service Activator of the spring-integration framework for consuming the events.
That way any message publisher just inserts a row in a DB table, and the Service Activator triggers the event and calls any other service (Spring, web service, etc.).
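To make the message-table idea concrete, here is a minimal sketch in plain Java. It does not use the spring-integration API; the class `MessageTable`, its methods, and the in-memory list standing in for the DB table are hypothetical, and a real version would insert and update rows through JDBC inside transactions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for a "messages" table polled by a consumer. */
class MessageTable {
    private static final class Row {
        final long id;
        final String payload;
        boolean processed;

        Row(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final List<Row> rows = new ArrayList<>();
    private long nextId = 1;

    /** Publisher side: the equivalent of inserting one row into the table. Returns the row id. */
    synchronized long insert(String payload) {
        rows.add(new Row(nextId, payload));
        return nextId++;
    }

    /** Consumer side: handles every unprocessed row, marking a row only after its handler succeeds. */
    synchronized int pollOnce(Consumer<String> handler) {
        int handled = 0;
        for (Row row : rows) {
            if (row.processed) {
                continue;
            }
            try {
                handler.accept(row.payload);
                row.processed = true;
                handled++;
            } catch (RuntimeException e) {
                // Leave the row unprocessed so that the next poll retries it.
            }
        }
        return handled;
    }

    /** Number of rows still waiting to be processed. */
    synchronized long pendingCount() {
        return rows.stream().filter(row -> !row.processed).count();
    }
}
```

Because a row is marked only after the handler returns, a message whose handler fails (for example while the target DB is down) stays pending and is picked up again on a later poll. The trade-off is that handlers may see a message more than once, so they should be idempotent.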