Is it possible to share data across Service Fabric partitions using Reliable Collections?
What would be the best approach to run an arbitrary number of instances of a CPU/network-bound service that needs to share a small amount of data used by a custom partitioning algorithm?
Reliable Collections themselves don't share state across partitions, no. But there are a couple ways you can share data depending on the nature of that data:
If the data you need to share is "dynamic" meaning it can change at runtime (e.g., due to user input), then you'd need to encapsulate that data in a separate service of its own, and provide an API for other services to access it. This would be accessible by any other service or application.
If the data you need to share is "static" meaning it doesn't change at runtime, then you can include it in the service as a data package or config package. These packages can be updated individually and separately from the service code without stopping or restarting the service. The same data/config package is available to all partitions of a service, but it is not directly accessible to other services or applications.
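For the "static" case, here is a minimal sketch of reading a config package and a data package from inside a service. The package name "Config" is the SDK default; the "Data" package name and the section/parameter names are placeholders you would adjust to match your manifests:

```csharp
using System.Fabric;
using System.Fabric.Description;

public static class SharedDataReader
{
    // Reads a setting that is visible to every partition of this service.
    public static string ReadPartitionMapSetting(ServiceContext context)
    {
        ConfigurationPackage config =
            context.CodePackageActivationContext.GetConfigurationPackageObject("Config");

        // Hypothetical section/parameter names, declared in Settings.xml.
        return config.Settings.Sections["PartitioningConfig"].Parameters["PartitionMap"].Value;
    }

    // Returns the on-disk folder of a data package declared in ServiceManifest.xml.
    public static string GetDataPackagePath(ServiceContext context)
    {
        DataPackage data =
            context.CodePackageActivationContext.GetDataPackageObject("Data");
        return data.Path;
    }
}
```

Because packages are versioned independently of the code package, a new partition map can be rolled out to every partition with a config/data-only upgrade, without restarting the service.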
I want to build an application. The application will have three microservices: content-service1, content-service2, and content-service3. Each microservice will have its own database. The application will also have a load balancer whose job is to distribute requests (the first request goes to the first container, the second request to the second container, ...). Here is the question: how can I provide consistency between the different databases? I looked at some topics like partitioning, eventual consistency, and saga, but I don't understand them. Are these the right solutions?
[Image: the system design I want]
It looks like you are mixing up microservices and application instances.
If content-service1, content-service2, and content-service3 are backed by the same application (the same code), you only have one application (one service).
If you want high availability, you surely need multiple instances of the same application (simply the same application run three times on different servers).
In this case you don't need a database per application instance: all instances connect to the same database, so you won't have inconsistency issues.
If you don't want a single point of failure, you also need redundancy for your database. Depending on the provider you use, you might have a master/slave or multi-master topology; data replication between database nodes will usually be handled by the database itself.
Microservices are a solution for splitting big applications into smaller ones. Each microservice stores its own data in a dedicated database. Microservices are applications like any other; they can themselves have multiple instances for high availability.
I'm writing a first Azure Service Fabric app applying partitioning to stateful services. I have a few questions:
Can I use remoting instead of HTTP to communicate from my web API to my partitions? The Azure example uses HttpCommunicationListener and I've not been able to see how to use remoting. I would expect remoting to be faster.
Can I persist my state for a given partition using a custom state persistence provider? Will that still be supported by the replication features of service fabric?
Can my stateful service partition save several hundred megabytes of state?
Examples/guidance pointers for above would be greatly appreciated.
Thanks
You can use SF remoting within the cluster to communicate between services and actors (a minimal sketch follows after the quote below). HTTP access is usually used to communicate with services from outside the cluster, but you can still use it from within.
Yes, you can do that by implementing a custom IStateProviderReplica2 and likely a custom serializer, but be aware that this is difficult. (Why would you need this?)
Stateful service storage capacity is limited by disk and memory. (calculation example behind the link)
Reliable services are typically partitioned, so the amount you can store is only limited by the number of machines you have in the cluster, and the amount of memory available on those machines.
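Regarding question 1, here is a minimal remoting sketch, assuming a hypothetical IInventoryService contract and an Int64-range-partitioned stateful service at fabric:/MyApp/InventoryService (adjust names to your application):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;
using Microsoft.ServiceFabric.Services.Remoting;
using Microsoft.ServiceFabric.Services.Remoting.Client;

// The remoting contract implemented by the stateful service.
public interface IInventoryService : IService
{
    Task<int> GetStockAsync(string itemId);
}

public static class InventoryClient
{
    public static Task<int> GetStockAsync(string itemId, long partitionKey)
    {
        // ServiceProxy resolves the primary replica of the target partition
        // and invokes it over the Fabric remoting transport instead of HTTP.
        IInventoryService proxy = ServiceProxy.Create<IInventoryService>(
            new Uri("fabric:/MyApp/InventoryService"),
            new ServicePartitionKey(partitionKey));

        return proxy.GetStockAsync(itemId);
    }
}
```

On the service side you would expose a remoting listener from CreateServiceReplicaListeners (via CreateServiceRemotingListener) instead of, or alongside, the HttpCommunicationListener.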
--- extra info concerning partitioning ---
Yes, have a look at this video; the start of it covers how to come up with a partitioning strategy.
The most important downside of 'partition per user' is that the number of partitions cannot be changed without recreating the service. It also doesn't scale well, and the distribution of data ends up unbalanced.
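An alternative to 'partition per user' is to keep a fixed partition count and map each user onto the Int64 key range with a stable hash, so data spreads evenly across however many partitions you chose up front. A minimal sketch (FNV-1a is just one illustrative choice of stable hash):

```csharp
using System.Text;

public static class PartitionKeyResolver
{
    // Maps a user id onto the Int64 range used by the UniformInt64Partition
    // scheme (LowKey = long.MinValue, HighKey = long.MaxValue).
    public static long ToPartitionKey(string userId)
    {
        // FNV-1a 64-bit: stable across processes, unlike string.GetHashCode().
        const ulong offsetBasis = 14695981039346656037UL;
        const ulong prime = 1099511628211UL;

        ulong hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(userId))
        {
            hash ^= b;
            hash = unchecked(hash * prime);
        }

        return unchecked((long)hash);
    }
}
```

The same key computed on the client side can then be passed as the ServicePartitionKey when calling into the service.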
Here is the background:
We have a cluster of 3 different services deployed in various containers (Tomcat, TomEE, JBoss, etc.). Each service does one thing. For example, one service manages a common DB and provides REST services to CRUD it. One service puts some data onto a JMS queue, and another service reads from the queue and updates the DB. There is a client app that makes a REST call to one of the services, which sets off creating a row in the DB, pushing that row onto the queue, etc.
Question: we need to implement the client app so that we know at any given point in time where the processing is. How do I implement this in RxJava 2 / Java 9?
First, you need to determine what functionality in RxJava 2 will benefit you.
Coordination between asynchronous sources. Since you have a) event-driven requests on one side and b) network queries on the others, this is a good fit so far.
Managing a stream of data, transforming and combining from one or more sources. You have given no indication that this is required.
Second, you need to determine what RxJava 2 does not provide:
Network connections. This is provided by your existing libraries.
Database management. Again, this is provided in your existing solutions.
Now you have to decide whether the items in the first list add up to something you can benefit from, given the up-front cost of learning a new library.
We're working on a backend project and we've started a move to microservices development. We already have a few services in place, one of which is a FileService which stores and fetches files (using underlying Amazon S3 storage). The FileService also provides file checksum, authentication and retry mechanism and is used to share files across services and with the clients.
We are now building a new service and part of this service's private data are files that the service stores and uses for its business logic, and we have a dilemma of whether we should use the FileService to store and fetch the files or handle the storage and fetching of the files internally in the service.
The reason to use the FileService is we're getting all the features implemented in the service for free (retry, checksum etc).
The reason not to use it is that we want the new service to be able to work autonomously, and using the FileService ties the new service to it (it must handle OAuth2 authentication to fetch/upload files, the FileService and the AuthService must be deployed whenever this service is deployed, etc.).
I wanted to know if someone has best practices for storing private files in a microservices environment, and what is the best approach to it with the pros and cons.
Converting an in-process FileService component into a microservice will definitely have advantages as well as disadvantages. You've listed several of them, but most importantly you have to build a cost/benefit analysis that applies specifically to your business and domain.
There is no "best practices" approach here.
Costs:
Is it okay for you to increase response times? Files will now be transferred twice: S3 -> FileService microservice -> client microservice.
How much more likely does losing a connection between nodes become?
How big are your files? Could an unreliable connection between microservices become a problem?
How frequently do you need to access those files? Will you lose the ability to use a local cache to speed things up?
Are you okay with implementing and supporting a separate auth microservice, or can you just whitelist this service in your firewall?
Benefits:
You don't have to redeploy all dependent components every time the file-storage or retry logic changes.
You can move to another cloud provider more easily in the future if necessary, again without redeploying everything.
It is reusable in a heterogeneous environment, where other components may be implemented on different technology stacks.
Conclusion:
There is no way to answer those questions without actually talking with business people and discussing risks around such transition.
So I am doing some research into using Service Fabric for a very large application. One thing I need to have is a service that is partitioned by name, which seems fairly trivial at the application manifest level.
However, I really would like to be able to add and remove named partitions on the fly without having to republish the application.
Each partition represents our equivalent of a tenant, and we want to have a backend management app to add new tenants.
Each partition will be a long-running application that fires up a TCP server that uses a custom protocol, and I'll need to be able to query for the address by name from the cluster.
Is this possible with Service Fabric, and if so is there any documentation on this, or something I should be looking for?
Each partition represents our equivalent of a tenant, and we want to have a backend management app to add new tenants.
You need to rethink your model. Partitioning is for distributing data so it is accessible fast, for both read and write, but within the same logical container.
If you want multitenancy in Service Fabric you can deploy an application multiple times to the cluster.
From Visual Studio it seems you can only have one instance of an application, because DefaultServices are defined in ApplicationManifest.xml. That is fine for developing on the local Service Fabric cluster. For production, consider deploying the application with PowerShell (for example Register-ServiceFabricApplicationType followed by New-ServiceFabricApplication); this opens up the possibility to deploy the same application multiple times, with settings per instance (tenant name, security, ...).
And not only applications can be deployed multiple times; stateful/stateless services can as well. So you could have one application and, for each tenant, deploy a service of a certain type. Services are discoverable via the naming service inside Service Fabric; see the FabricClient class for more on that (a sketch follows below).
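Here is a minimal sketch of provisioning a per-tenant service instance and later resolving its address by name. It assumes a hypothetical stateful service type "TenantServiceType" registered by an already-deployed "fabric:/MyApp" application; all names are placeholders:

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

public static class TenantProvisioner
{
    public static async Task CreateTenantServiceAsync(string tenantName)
    {
        using (var fabricClient = new FabricClient())
        {
            var description = new StatefulServiceDescription
            {
                ApplicationName = new Uri("fabric:/MyApp"),
                ServiceName = new Uri($"fabric:/MyApp/Tenants/{tenantName}"),
                ServiceTypeName = "TenantServiceType",
                HasPersistedState = true,
                MinReplicaSetSize = 3,
                TargetReplicaSetSize = 3,
                PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
            };

            // Registers the new named service with the naming service; clients
            // can later resolve its endpoint by this URI.
            await fabricClient.ServiceManager.CreateServiceAsync(description);
        }
    }

    public static async Task<string> ResolveTenantEndpointAsync(string tenantName)
    {
        using (var fabricClient = new FabricClient())
        {
            ResolvedServicePartition partition =
                await fabricClient.ServiceManager.ResolveServicePartitionAsync(
                    new Uri($"fabric:/MyApp/Tenants/{tenantName}"));

            // The address is whatever string the service's listener published.
            return partition.GetEndpoint().Address;
        }
    }
}
```

For a custom TCP listener, the address returned is whatever string your ICommunicationListener published when it opened (typically "host:port"), so the client can connect directly with your protocol.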
It is not possible to change the partition count for an existing application.
From https://azure.microsoft.com/en-us/documentation/articles/service-fabric-concepts-partitioning/#plan-for-partitioning (emphasis mine):
In rare cases, you may end up needing more partitions than you have initially chosen. As you cannot change the partition count after the fact, you would need to apply some advanced partition approaches, such as creating a new service instance of the same service type. You would also need to implement some client-side logic that routes the requests to the correct service instance, based on client-side knowledge that your client code must maintain.
You are encouraged to do up-front capacity planning to determine the maximum number of partitions you will need, and if you end up needing more, you'll need to implement some special client-side handling to cope.
We had the same problem and ended up creating an instance of the service for each tenant. This is pretty easy to do and will scale to any number of tenants.