High availability on Bluemix

I've seen several status updates on Bluemix saying that applications are being restarted and there will be issues logging in, e.g.
During this time, you might experience temporary errors logging into
Bluemix or managing applications, such as starting, staging, and so
on. If this situation occurs, retry the operation later. The latest
status will be available at http://ibm.biz/bluemixstatus throughout
the upgrade process.
Existing applications will see a brief restart of instances, but near
continuous availability is expected.
Is it possible then to build a high-availability application on Bluemix?

IBM Bluemix supports deploying applications in multiple regions.
You can minimise downtime during platform issues by hosting your application in several regions simultaneously and using an external load balancer to shift traffic between the instances depending on availability.
How you replicate application data between regions depends on the individual services you're using. For example, Cloudant supports multi-master replication, allowing you to fail over without any manual intervention.
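For instance, with Cloudant you can set up continuous replication in both directions through its CouchDB-style _replicator database. Here is a minimal sketch in Python; the account names, database name, and credentials are placeholders, and in practice the source and target URLs would also carry their own credentials:

    # Minimal sketch: continuous, bidirectional replication between two
    # Cloudant accounts in different regions via the CouchDB-style
    # _replicator database. All names and credentials are placeholders.
    import requests

    US = "https://us-account.cloudant.com"
    EU = "https://eu-account.cloudant.com"
    AUTH = ("apikey", "apipassword")  # placeholder credentials
    DB = "orders"                     # placeholder database name

    def replicate(source, target):
        # One replication document per direction; continuous=True keeps
        # the replication running permanently.
        doc = {"source": f"{source}/{DB}", "target": f"{target}/{DB}",
               "continuous": True}
        requests.post(f"{source}/_replicator", json=doc, auth=AUTH).raise_for_status()

    replicate(US, EU)  # US -> EU
    replicate(EU, US)  # EU -> US; together this gives multi-master behaviour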

Related

Service Fabric Strategies for Bi-Directional Communication with External Devices

My company is interested in using a stand-alone Service Fabric cluster to manage communications with robots. In our scenario, each robot would host its own rosbridge server, and our Service Fabric application would maintain WebSocket clients to each robot. I envision a stateful service partitioned along device ids which opens connections on startup. It should monitor connection health via heartbeats, pass messages from the robots to some protocol gateway service, and listen to other services for messages to pass to the robots.
I have not seen discussion of this style of external communications in the Service Fabric documentation - I cannot tell if this is because:
There are no special considerations for managing WebSockets (or any two-way network protocol) this way from Service Fabric. I've seen no discussion of restrictions and see no reason, conceptually, why I can't do this. I originally thought replication would be problematic (duplicate messages?), but since only one replica can be primary at any time this appears to be a non-issue.
Service Fabric is not well-suited to bi-directional communication with external devices
I would appreciate some guidance on whether this architecture is feasible. If not, discussion on why it won't work will be helpful. General discussion of limitations around bi-directional communication between Service Fabric services and external devices is welcome. I would prefer if we could keep discussion to stand-alone clusters - we have no plans to use Azure services at this time.
Any particular reason you want SF to host the client and not the other way around?
Doing it the way you suggest, I think you will face big challenges making SF find these devices on your network and keep track of them (firewalls, IPs, NAT, planned maintenance, failures, connection issues), unless you plan to manage all of that by hand.
From the brief description in the rosbridge server docs you provided, I understand that you host it on a server (as you would a Service Fabric service) and your devices connect to it; in this case, your devices would have ROS installed to make this communication work.
Regarding your concerns about the communication: Service Fabric services are just executable programs like those you would normally run on your local machine. If it works there, it will likely work in an on-premises Service Fabric environment; the only extra concerns are external access to the cluster (Azure or network configuration) and service discovery.
In my view, you should use SF as the central point of communication, with each device connecting to the SF services; a minimal sketch of this pattern appears below.
The other approach would be to use Azure IoT Hub to bridge the communication between the two. There is a nice IoT Hub + Service Fabric sample that might suit your needs.
Because you want to avoid Azure, you could in that case replace IoT Hub with another messaging platform or implement the rosbridge in your service to handle the calls.
I hope I understood everything right.
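To make that inversion concrete, here is a minimal sketch of a service hosting a WebSocket endpoint that robots dial into, using Python's third-party websockets library. Service Fabric services are typically written in C#, so treat this as language-agnostic pseudocode for the pattern; the port, the identify-first handshake, and the gateway hand-off are all assumptions:

    # Minimal sketch of the inverted pattern: the service hosts a WebSocket
    # endpoint and each robot dials in. Uses the third-party "websockets"
    # library; the port, handshake, and message handling are assumptions.
    import asyncio
    import websockets

    connected = {}  # robot_id -> websocket

    async def handle_robot(websocket):
        robot_id = await websocket.recv()  # assume robots identify themselves first
        connected[robot_id] = websocket
        try:
            async for message in websocket:
                # Hand the message off to a protocol gateway service here.
                print(f"from {robot_id}: {message}")
        finally:
            del connected[robot_id]

    async def main():
        async with websockets.serve(handle_robot, "0.0.0.0", 8765):
            await asyncio.Future()  # run forever

    asyncio.run(main())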
About the obstacles:
I think the major issue here is that a bi-directional connection must be established between a service replica and the robot.
This has two major problems:
Only the primary replica has write access, i.e. only one replica is able to modify state. This could be mitigated by creating a separate partition for each robot (but remember that you can't change the partition count after the service has been created) or by creating a separate service instance for each robot (this would allow you to dynamically add or remove robots but would require additional logic around service discoverability).
A replica can be shut down (terminated), moved to another node (shutdown of the old replica and start of a new one), or even demoted (the primary replica gets demoted to secondary and another secondary replica gets promoted to primary) for various reasons, so the service code and the robot communication code should be able to handle this.
About WebSockets
This looks possible by implementing a custom ICommunicationListener and building the WebSocket handling on top of it.
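Putting the two answers together, the connection-owning code in the primary replica needs a reconnect-and-heartbeat loop so that replica moves, demotions, and network drops are all survivable. A minimal sketch with Python's third-party websockets library follows; the URI and intervals are hypothetical, and an actual Service Fabric service would run this behind a custom ICommunicationListener:

    # Minimal sketch of the connection-owning loop a primary replica might
    # run per robot: connect, heartbeat via pings, and redial after the
    # replica is moved/demoted or the link drops. Uses the third-party
    # "websockets" library; the URI and intervals are hypothetical.
    import asyncio
    import websockets

    async def robot_session(uri, heartbeat_secs=10, retry_secs=5):
        while True:  # reconnect loop: survives replica moves and network drops
            try:
                # ping_interval/ping_timeout make the library send keepalive
                # pings and close the socket if pongs stop arriving.
                async with websockets.connect(
                        uri, ping_interval=heartbeat_secs,
                        ping_timeout=heartbeat_secs) as ws:
                    async for message in ws:
                        # Forward to the protocol gateway service here.
                        print(f"robot says: {message}")
            except (OSError, websockets.ConnectionClosed):
                await asyncio.sleep(retry_secs)  # back off, then redial

    asyncio.run(robot_session("ws://robot-42.local:9090"))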

Mongo database in GCP app engine

I'm currently looking into GCP App Engine, figuring out how I would deploy a very large application with multiple services. I also want to use MongoDB. The GCP docs say that App Engine allows Dockerfiles and images. What would happen if I used the Mongo Docker image as a service on App Engine? How would it scale its instances? What would happen to consistency? I'm aware GCP has a third-party solution for Mongo, but since they allow Docker images, what stops me from using it?
App Engine routinely tears down and creates new instances. If your instance is running MongoDB, then all the data stored in that instance will be lost.
This is why Google Cloud offers other, permanent places to store state, like Datastore and Cloud SQL. You can also run MongoDB yourself on Google Compute Engine.
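If you do run MongoDB yourself on Compute Engine, the App Engine service connects to it over the network instead of storing data on its own ephemeral disk. A minimal sketch with pymongo; the MONGO_URI value and the database/collection names are placeholders:

    # Minimal sketch: an App Engine service keeping state in an external
    # MongoDB (e.g. self-hosted on Compute Engine) rather than on the
    # instance's ephemeral local disk. MONGO_URI is a placeholder.
    import os
    from pymongo import MongoClient

    client = MongoClient(os.environ.get("MONGO_URI",
                                        "mongodb://10.128.0.5:27017"))
    db = client["appdb"]  # placeholder database name

    def save_event(payload: dict) -> None:
        # Writes survive App Engine tearing down and recreating instances.
        db.events.insert_one(payload)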
What would happen if I used the Mongo Docker image as a service on App Engine?
The flexible App Engine environment allows you to use Docker images to build your own application, as mentioned in this document [1]: "App Engine flexible environment instances are Compute Engine virtual machines, which means that you can take advantage of custom libraries, use SSH for debugging, and deploy your own Docker containers."
So there is no problem using your own Docker image in the flexible App Engine environment.
How would it scale its instances?
Each active version in App Engine must have at least one instance to handle requests. There are two ways to scale instances in App Engine: automatic and manual.
As mentioned in the document [2]:
Automatic scaling creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, as well as a minimum number of instances to keep running at all times.
Manual scaling specifies the number of instances that continuously run regardless of the load level. This allows tasks such as complex initializations and applications that rely on the state of the memory over time.
You configure these features through the app.yaml file; I suggest you read this document [3].
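As an illustration, here is a sketch of what the scaling-related parts of an app.yaml for the flexible environment might look like; all values are placeholders:

    # Illustrative app.yaml for the flexible environment: a custom Docker
    # runtime with automatic-scaling bounds. All values are placeholders.
    runtime: custom
    env: flex

    automatic_scaling:
      min_num_instances: 2
      max_num_instances: 10
      cool_down_period_sec: 120
      cpu_utilization:
        target_utilization: 0.6

    # Alternatively, pin a fixed number of instances:
    # manual_scaling:
    #   instances: 3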
What will happen to consistency?
Since App Engine instances are ephemeral and each autoscaled instance gets its own local disk, running MongoDB inside App Engine would give you neither durability nor consistency: every new instance would start with its own empty, independent copy of the data. If you need consistent, durable storage, keep the database outside the autoscaled service (for example on Compute Engine, as noted in the first answer).
[1] https://cloud.google.com/appengine/docs/flexible#features
[2] https://cloud.google.com/appengine/docs/flexible/go/how-instances-are-managed#instance_scaling
[3] https://cloud.google.com/appengine/docs/flexible/go/configuring-your-app-with-app-yaml

difference between dcos-kafka-service and mesos-kafka

I'm doing a POC to deploy Kafka as an application on a Mesos cluster. I came across these two codebases on GitHub: one developed by apache-mesos (GitHub page) and another developed by Mesosphere that can run only on DC/OS (GitHub page).
Question: I would like to know whether there are any differences between DC/OS Kafka and mesos-kafka in terms of features and extended functionality.
Regarding mesos-kafka:
I don't see active participation on GitHub (and some open issues) for mesos-kafka in the past months. Can I assume the service is robust enough to use in a production environment? Any input on this would be helpful.
kafka-mesos is a package that includes a release of Kafka and a custom Mesos scheduler that was meant to work around issues with running Kafka as a stateful service on Marathon. I think this post by Confluent is useful. It also includes a RESTful API for ops tasks, and it aims to add the following features in the future (this list is pulled from the article I linked):
Integrating Kafka commands (e.g. kafka-topics, etc) into the Scheduler so it can be used through the CLI and REST API.
Auto-scaling clusters (including auto reassignment of partitions) so that the resources (CPU, RAM, etc.) that brokers are using can be used elsewhere in known valleys of traffic.
Rack-aware partition assignment for fault tolerance.
Hooks so that producers and consumers can also be launched from the Scheduler and managed with the cluster.
Automated partition reassignment based on load and traffic
I haven't used it in a production environment myself but it has the support of Confluent which is a good sign.
DC/OS Kafka, on the other hand, is a DC/OS service that will probably only be useful if you are already running, or plan to run, services through Mesosphere's DC/OS. It also includes an API and a CLI management tool but is less ambitious with additional features. Its current feature set includes:
Single-command installation for rapid provisioning
Multiple clusters for multiple tenancy with DC/OS
High availability runtime configuration and software updates
Storage volumes for enhanced data durability, known as Mesos Dynamic Reservations and Persistent Volumes
Integration with syslog-compatible logging services for diagnostics and troubleshooting
Integration with statsd-compatible metrics services for capacity and performance monitoring

How to use kafka and storm on cloudfoundry?

I want to know whether it is possible to run Kafka as a cloud-native application, and whether I can create a Kafka cluster as a service on Pivotal Web Services. I don't want only client integration; I want to run the Kafka cluster/service itself.
I can point you at a few starting points, there would be some work involved to go from those starting points to something fully functional.
One option is to deploy the Kafka cluster on Cloud Foundry (e.g. Pivotal Web Services) using Docker images. Spotify has Dockerized Kafka and kafka-proxy (including ZooKeeper). One thing to keep in mind is that PWS currently doesn't support apps with persistence (although this work is starting), so if you were to go this route right now, you would lose the data in Kafka when the application is rolled. Looking at that Spotify repo, the Docker images are generally run without any mounted volumes, so this persistence-less Kafka seems like it may be a valid use case (I don't know enough about Kafka to say).
The other option is to deploy kafka directly on some IaaS (e.g. AWS) using BOSH. BOSH can be hard if you're seeing it for the first time, but it is the ideal way to deploy any distributed software that you want running on VMs. You will also be able to have persistent volumes attached to your kafka VMs if necessary. Here is a kafka BOSH release which may work.
Once you have your cluster running, you have two ways to integrate your Cloud Foundry applications with it. The simplest is just to provide it to your applications as a "user-provided service", which lets you flow Kafka cluster access info to your apps. The alternative would be to put a service broker in front of your cluster, which is especially useful if you have many different people who will be pushing apps that need to talk to the Kafka cluster. Rather than you having to manually tell people the access info each time, they can do something simple like cf bind-service SOME_APP YOUR_KAFKA_SERVICE. Here is a Kafka service broker along with more info about service brokers in general.
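To make the user-provided-service route concrete: after something like cf create-user-provided-service my-kafka -p '{"brokers":"host1:9092,host2:9092"}' and cf bind-service, the app reads the credentials out of the VCAP_SERVICES environment variable. A minimal Python sketch using the kafka-python client; the service name "my-kafka" and the "brokers" credential key are hypothetical choices you would make when creating the service:

    # Minimal sketch: read Kafka broker addresses from a Cloud Foundry
    # user-provided service binding and create a producer. The service
    # name ("my-kafka") and the "brokers" credential key are hypothetical.
    import json
    import os
    from kafka import KafkaProducer  # kafka-python client

    vcap = json.loads(os.environ["VCAP_SERVICES"])
    creds = next(s["credentials"]
                 for s in vcap.get("user-provided", [])
                 if s["name"] == "my-kafka")

    producer = KafkaProducer(bootstrap_servers=creds["brokers"].split(","))
    producer.send("events", b"hello from cloud foundry")
    producer.flush()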
According to the 12-factor app description (https://12factor.net/processes), Kafka should not run as an application on top of Cloud Foundry:
Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database.
Kafka is often considered a "distributed commit log" and as such carries a large amount of state. Many companies use it to keep all events flowing through their distributed system of micro services for a long (sometimes unlimited) amount of time.
Therefore I would strongly recommend going with the second option in the accepted answer: Kafka topics should be bound to your applications in the form of stateful services.

Have applications in Bluemix in multiple zones; if one zone is down, redirect immediately to the other

I have my website on Bluemix, and all of yesterday the EU region was down. I want to know whether it is possible to have another instance in the US or Sydney region so that, if one is down, traffic is automatically redirected to the other.
The platform doesn't have such a feature to automatically redirect to applications in other regions on error conditions. Applications in other regions are treated as separate applications.
Optimally, to handle rare conditions like the one this weekend, you can create a load balancer with something like NGINX or HAProxy outside of Bluemix to direct traffic to the best/available geography.
For example: https://www.howtoforge.com/high-availability-load-balancer-haproxy-heartbeat-debian-etch
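As a sketch of what such a setup might look like with HAProxy: the hostnames below are placeholders, and it assumes your custom domain is mapped as a route on the app in both regions, so the client's Host header routes correctly at either side:

    # Illustrative HAProxy config running outside Bluemix: prefer the EU
    # deployment; fail over to the US deployment when EU checks fail.
    # Hostnames are placeholders.
    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        default_backend bluemix_app

    backend bluemix_app
        server eu myapp.eu-gb.mybluemix.net:80 check
        server us myapp.ng.mybluemix.net:80 check backup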
It has been necessary for IBM to restart its Bluemix servers this weekend due to an urgent security patch. The IBM recommendation is to take advantage of the capability to have multiple application instances deployed in different regions, as indicated in Ram's answer.
The maintenance phase in the EU-GB and Sydney regions is now complete. It is ongoing for the US region. For the latest updates and details on this maintenance, check http://ibm.biz/bluemixstatus.
To build on Vennam's response, you could create a load balancer in Bluemix itself using containers (or a VM) and install NGINX or HAProxy on it (of course, this workaround doesn't help if the containers themselves are down). You could also use Bluemix containers as a test environment before moving your load balancer to an outside server.