How to handle private files in a microservice - service

We're working on a backend project and we've started a move to microservices development. We already have a few services in place, one of which is a FileService which stores and fetches files (using underlying Amazon S3 storage). The FileService also provides file checksum, authentication and retry mechanism and is used to share files across services and with the clients.
We are now building a new service and part of this service's private data are files that the service stores and uses for its business logic, and we have a dilemma of whether we should use the FileService to store and fetch the files or handle the storage and fetching of the files internally in the service.
The reason to use the FileService is we're getting all the features implemented in the service for free (retry, checksum etc).
The reason not to use it is we want the new service to be able to work autonomously and using the FileService ties the new service to it (it must handle OAuth2 authentication to fetch/upload files, it must deploy the FileService and the AuthService whenever this services is deployed etc).
I wanted to know if someone has best practices for storing private files in a microservices environment, and what is the best approach to it with the pros and cons.

Converting in-process FileService component to microservice will definitely have advantages as well as disadvantages. You've listed several of them, but most importantly you have to create a cost/benefit analysis matrix applicable to your business and domain specifically.
There is no "best practices" approach here.
Costs:
is it okay for you to increase response times? Because now you will have to transfer files twice: s3 -> fs microservice -> client microservice
how more likely situation of losing a connection between nodes becomes?
how big your files are? The unreliable connection between microservices may become a problem?
how frequently do you need to access those files? Maybe you will lose the ability to have local cache to speed up the process?
are you okay with implementing and supporting separate auth microservice or you can just whitelist this service in your firewall
Benefits:
you don't have to redeploy all dependent components every time the logic of storing files or doing retries changes.
you can move to another cloud provider more easily in the future if necessary, again, without redeploying everyting.
it is reusable in a heterogeneous environment, where other components may be implemented using different technological stacks
Conclusion:
There is no way to answer those questions without actually talking with business people and discussing risks around such transition.

Related

Guidance on how to make micro-services communicate effectively

We are embarking on a new project development , where we will have multiple micro-services communicating each other to provide information in cloud native system. Our application will be decomposed into multiple services like Text Cleaner , Entities Extractor, Entities Resolver , Output Converter. As you can see in diagram we have some forking where input to one service in required by other service and so forth.
Only one service is going to be exposed outside. Others would be internal. And we have to provide synchronous response to clients.
I wanted to check if some one can guide me here to best patterns:
1- Should we have one Wrapper class which has model classes for all projects as one all of details is needed in final output convertors or how should the data flow so data is sorted out in last micro-service. We want to keep systems loosely coupled and are thinking about how orchestrate this flow without having a middle layer which composes all this data?
2- How to orchestrate this flow? Service Mesh / Api Gateway?
Looks like a workflow based solution.. When so many steps are involved ; the only response you can give to consumer is that request accepted.. and in background the process starts..You cannot let consumer wait for very long because they will get connection time out.
if all these services are deployed on different servers ( which should be the case for Micro services definition for scalability); you can communicate via HTTP or using some messaging solution like JMS or if u are deployed on cloud ; they give workflow based services..

Sample REST Observable service and a remote subscriber client in Java 9/RxJava 2

Here is the background:
We have a cluster (of 3) different services deployed on various containers (like Tomcat, TomEE, JBoss) etc. Each of the services does one thing. Like one service manages a common DB and provides REST services to CRUD the db. One service puts some data into a JMS Queue, Another service reads from the Queue and updates the DB. There is a client app that makes a REST service call to one of the service that sets off creating a row in the db, pushing that row into a queue etc.
Question: We need to implement the client app so that we know at any given point in time where the processing is. How do I implement this in RcJava 2/Java 9?
First, you need to determine what functionality in RxJava 2 will benefit you.
Coordination between asynchronous sources. Since you have a) event-driven requests from one side, and b) network queries on the other sides, this is a good fit so far.
Managing a stream of data, transforming and combining from one or more sources. You have given no indication that this is required.
Second, you need to determine what RxJava 2 does not provide:
Network connections. This is provided by your existing libraries.
Database management. Again, this is provided in your existing solutions.
Now, you have to decide whether the firstlies add up to something you can benefit from, given the up-front costs of learning a new library.

Sharing Reliable Collections across partitions

Is it possible to share data across Service Fabric partitions using Reliable Collections?
What would be the best approach to run arbitrary number of instances of a CPU/network -bound service that needs to share a small amount of data to be used for custom partitioning algorithm?
Reliable Collections themselves don't share state across partitions, no. But there are a couple ways you can share data depending on the nature of that data:
If the data you need to share is "dynamic" meaning it can change at runtime (e.g., due to user input), then you'd need to encapsulate that data in a separate service of its own, and provide an API for other services to access it. This would be accessible by any other service or application.
If the data you need to share is "static" meaning it doesn't change at runtime, then you can include it in the service as a data package or config package. These packages can be updated individually and separately from the service code without stopping or restarting the service. The same data/config package is available to all partitions of a service, but it is not directly accessible to other services or applications.

Caching in a Service oriented architecture

In a distributed systems environment, we have a RESTful service that needs to provide high read throughput at low-latency. Due to limitations in the database technology and given its a read-heavy system, we decided to use MemCached. Now, in a SOA, there are atleast 2 choices for the location of the cache, basically client looks up in Cache before calling server vs client always calls server which looks up in cache. In both cases, caching itself is done in a distributed MemCached server.
Option 1: Client -> RESTful Service -> MemCached -> Database
OR
Option 2: Client -> MemCached -> RESTful Service -> Database
I have an opinion but i'd love to hear arguments for and against either option from SOA experts in the community. Please assume either option is feasible, its a architecture question. Appreciate sharing your experience.
I have seen the
Option 1: Client -> RESTful Service -> Cache Server -> Database
working very well. Pros IMHO are that you are able to operate wtih and use this layer in a way allowing you to "free" part of the load on the DB. Assuming that your end-users can have a lot of similar requests and after all the Client can decide what storage to spare for caching. Also how often to clear it.
I prefer Option 1 and I am currently using it. In this way it is easier to control the load on the DB (just as #ekostatinov mentioned). I have lots of data that are required for every user in the system, but the data is never changed (such as some system rules, types of items, etc). It really reduces the DB load. In this way you can also control the behavior of the cache (such as when to clear the items).
Option 1 is the prefered option as it makes memcache an implementation detail of the service. the other option means that if the business changes and things can't be kept in the cache (or other can etc.) the clients would have to change. Option 1 hides all that behind the service interface.
Additionally option 1 lets you evolve the service as you wish. e.g. maybe later you think you need a new technology, maybe you'd solve the performance problem with the DB etc. Again, option 1 lets you make all these changes without dragging the clients into the mess
Is the REST ful API exposed to external consumers. In that case it is up to the consumer to decide if they want to use a cache and how much stale data can they use.
As for as the REST ful service goes, the service is the container of business logic and it is the authority of data, so it decides how much to cache, cache expiry, when to flush etc. A client consuming the REST service always assumes that the service is providing it with the latest data. And hence option 1 is preferred.
Who is the client in this case?
Is it a wrapper for your REST API. Are you providing both client and the service.
I can share my experience with Enduro/X middleware implementation. For local XATMI service calls any client process connects to shared memory (LMDB) and checks the result there. If there is response saved it returns data directly from shm. If data is not there, client process goes the longest path and performs the IPC. In case of REST access, network clients still performs the HTTP invocation, but HTTP server as XATMI client returns the data from shared mem. From real life, this technique was greatly boosting the web frontend web application which used middleware via REST calls.

Is a HTTP REST request the only way to access Azure Storage?

I've started reading about Azure Storage and it seems that the only way to access it is via an HTTP REST request.
I've seen that there are a few wrappers around these requests, for example, StorageClient (by Microsoft) and cloud storage api (http://cloudstorageapi.codeplex.com/), but they all still use REST in the background (to the best of my understanding).
It seems unreasonable to me that this is actually true. If I have a machine in Azure, and I want to access data stored in Azure Storage, it would seem every inefficient to
Yes, all storage calls are normalized to the REST API. Its actually very efficient when you consider the problem. You are thinking of a machine in Azure and data in azure as stored on two servers sitting in a rack. Remember in Azure, your data, your "servers", etc may be stored in different racks, different zones, and even different datacenters. With the REST API, your apps don't have to care about any of this. They just get the data with the URL.
So while a tiny HTTP overhead may appear inefficient if these were two boxes next to each other, its actually a very elegant solution when they are on different continents. Factor in concepts such as CDN, and it becomes an even better fit.
Layered onto this base concept is the Azure load balancer and other pieces of the internal infrastructure which can further optimize every request because they are all the same (HTTP). I also wouldn't be surprised (not sure at all, I dont work for MSFT) if the LB was doing traffic management optimizations when a request is made intra-datacenter.
Throughput on the storage subsystem in Windows Azure is pretty high. I'd be very surprised if the system cannot deliver to your needs.
There are also many design patterns to increase scalability of your app, like asynch processing, batching requests, delayed processing, etc.