Do you know a simple, backend, performant process mgt lib/component for processing data by sending it through a sequence of ai models? - workflow

my team needs to decide if we go for buy or create on a production data processing workflow framework. The service needs to be performant, receives documents (like images, pdfs) and processes the document by sending it to a predefined workflow definition (orchestration should not be expensive in terms of performance). Additionally we would like to deploy it on AKS and Models are running in docker containers. We checked and used Kubeflow - but it seems not ready for production processes and with airflow we have similar doubts. All we need is a framework that allows various processes by defining the order of the calls to our models in docker containers (sequentially and parallel) in a config file. Anyone who has experience and can guide us to the right (and hopefully simple) framework / tool? Anyone who has such a component in prod live?

Related

How to build microservice applications right? (backend, frontend, database)

I am developing monolithic applications since many years and want to try microservices and containers now.
For learning microservices and containers I am planning a small calendar application (as a web application), where you can create dates and invite others to them. I identified three services that I have to implement:
Authentication service -> handles login, gives back a JWT
UserData services -> handles registration, shows user data, handles edits of user data (profile picture, name, short description)
Calendar service -> for creating, editing and deleting dates, inviting others to dates, viewing dates and so on.
This is the part I feel confident about, but if I did already something wrong, please correct me.
Where I am not sure what to do is how to implement the database and frontend part.
Database
I never used Neo4j or any graphbased database before, but they seem to work well for this use case and I want to learn something new. Maybe my database choice may not be relevant here, but my question is:
Should every microservice have its own database or should they access the same database?
Some data sets are connected, like every date has a creator and multiple participants.
And should the database run in a container or should it be hosted on the server directly? (I want to self-host the database)
For storing profile pictures I decided to use a shared container volume, so multiple instances of a service can access the same files. But I never used containers before, so if you have a better idea, I am open to hear it.
Frontend
Should I build a single (monolithic) frontend application, or is it useful to build some kind of micro frontend, which contains only a navigation and loads other parts like the calendar view or user data view from the above defined services? And if yes, should I pack the UserData frontend and the UserData service into one container?
MVP
I want the whole application to be useable by others, so they can install it on their own servers/cloud. Is it possible to pack the whole application (all services and frontend) into one package and make it installable in kubernetes with a single step, but in a way, that each service still has its own container? In my head this sounds necessary, because the calendar service won’t work without the userdata service, because every date needs a user who creates it.
Additional optional services
Imagine I want to add some additional but optional features like real time chat. What is the best way to check if these features are installed? Should I add a management service which checks which services exist and tells the frontend which navigation links it should show, or should the frontend application ping all possible services to check if they are installed?
I know, these are many questions, but I think they are tied together, because choices on one part can influence others.
I am looking forward for your answers.
I'm further along than you but far from "microservice expert". But I'll try my best:
Should every microservice have its own database or should they access the same database?
The rule of thumb is that a microservice should have its own database. This decouples them, making it so that your contract between services is just the API. Furthermore it keeps the logic simpler within a service in that there's less types for data that your service is responsible for handling.
And should the database run in a container or should it be hosted on the server directly?
I'm not a fan of running a database in a container. In general, a database:
can consume a lot of resources (depending on the query)
sustains long-lived connections
is something you vertically scale, rather than horizontally
These qualities reflect a poor case for containerization, imho.
For storing profile pictures I decided to use a shared container volume
Nothing wrong with this, but using cloud storage like Amazon S3 is a popular move here, just so you don't have to manage the state of the volume. i.e. if the volume goes away, do you still want your pictures around?
Should I build a single (monolithic) frontend application?
I would say yes: this is would be the simplest approach. The other big motivation for microservices is to allow separate development teams to work independently. If that's not the case here, I think you should avoid yourself the headache for micro-frontends.
And if yes, should I pack the UserData frontend and the UserData service into one container?
I'm not sure I perfectly understand here. I think it's fine to have the frontend and backend service as part of the same codebase, which can get "deployed" together. How you want to serve the frontend is completely up to you and how it's implemented. Some like to serve it through a CDN to minimize latency, but I've never seen the need.
Is it possible to pack the whole application (all services and frontend) into one package and make it installable in kubernetes with a single step?
This is use-case for Helm charts: A package manager for kubernetes.
Imagine I want to add some additional but optional features like real time chat. What is the best way to check if these features are installed? Should I add a management service which checks which services exist and tells the frontend which navigation links it should show, or should the frontend application ping all possible services to check if they are installed?
There's a thin line between what you're describing and monitoring. If it were me, I'd pick the service that radiates this information and just set some booleans in a config file or something. That's because this state is isn't going to change until reinstall, so there's no need to check it on an interval.
--
Those were my thoughts but, as with all things architecture, there's no universal truth. Some times exceptions are warranted depending on your actual circumstances.

Can I deploy multiple front-end apps (web/mobile) with 1 back-end on the same server?

I need some help with deciding on the architecture of my project (a web app for unlocking discounts). I am first planning on creating the website (React for the front-end & Django for the back-end, PostgreSQL database). In the future, I may create a mobile app too for Android & iOS (unsure what front-end framework yet).
So I have decided I want the front-end and back-end to be completely separated so the back-end is a REST api. This will allow me to not have to create multiple back-ends for mobile apps.
But, after researching, I have found that this could be quite expensive in terms of server costs. This is a new business and I am the only developer so funding isn't high. So I was thinking that I could deploy the front-end & back-end on the same server but as separate apps that talk via nginx?
I have 4 questions about this:
If I do this, would it still be possible to reuse the back-end as a REST api for the mobile apps or is that a no because it's linked to the web front-end?
If it is possible, would I be able to host the mobile front-end in the same server (so have everything hosted on 1 server)?
Is this a stupid idea - would I just be better off deploying everything into separate servers in the long-run (to reduce load)?
Should I just worry about this in the future? And for now just deploy the separated web front-end & back-end to the same server.
I have never really deployed anything into a real life production environment so I'm sorry if my questions seem silly. I haven't started development yet but I want to think about scalability & future extensibility before I start. Thank you.
Nowadays I'd go with a serverless approach. Instead of having servers to maintain you can focus on your app functionalities.
There are a lot of options. You can check, for example, AWS Amplify (https://aws.amazon.com/amplify/) or Netlify (https://www.netlify.com/) for a more "full-stack" approach.
In AWS, you also can keep separated projects, having your backend in lambdas and your frontend served through S3 + CloudFront. You also don't have servers to care about.
There are only examples of how you can solve your problem without servers, but answering your questions:
You can reuse your APIs regardless of the way your app is deployed. It will be more related to how you designed them;
Yes, you can host everything in a single server if you want, but I really don't recommend that;
If you don't want to pay for 24/7 servers, you can go for a serverless approach;
As I told you before, you can do what you want without worrying about servers.
Your main point of focus is to keep the cost lower and to implement a good solution also. My suggestion would be to look for AWS Lightsail. Lightsail offers fixed price VM which you can configure yourself, and it starts from $3.5 / month at the time of writing this answer.
My answers to your questions
If I do this, would it still be possible to reuse the back-end as a REST api for the mobile apps or is that a no because it's linked to the web front-end?
Yes, it's possible. Keep the frontend and backend in different repo, and you can deploy it as docker instances on the same server. You will have 1 frontend docker container and 1 backend docker container, and they can communicate with each other.
If it is possible, would I be able to host the mobile front-end in the same server (so have everything hosted on 1 server)?
For mobile, you will develop a mobile application which you can publish to playstore or deploy to smartphone. Your app can then call the backend service and get the JSON in response. So you have to design your backend in such a way that it can serve data to both requests.
Is this a stupid idea - would I just be better off deploying everything into separate servers in the long-run (to reduce load)?
For long term and design perspective, you need to consider factors like scalability, maintainability, security etc.., so its always better to have multiple server to avoid single point of failure.
Should I just worry about this in the future? And for now just deploy the separated web front-end & back-end to the same server.
My advice to you will be to think carefully now, so you don't get nightmares in the future. Invest your time now and design a stable solution which could help you in long-term. As you mentioned that its a small business, but your solution should be able to easy handle growth.
My suggestion
As suggested by the Paulo, S3 + CloudFront looks good for frontend. You can get 1 year free CDN using Lightsail.
For Backend, you should at least have 2 (I will suggest minimum 3) servers and deploy backend docker containers. You can use docker compose to automate the deployment. If you want to orchestrate then Docker Swarm Mode is best. With this you will avoid single point of failure. You can get very affordable servers from Amazon Lightsail
For database, you need to make it scalable. To ensure scalability and High Avalability we should have replicated DB. Minimum 3 DB instances will be good starting point. MongoDB is a good choice. With simple configuration you can enable DB replication. 1 Master 2 slaves instances.
1 Load-balancer in front of your servers to distribute the load. To save the cost you can configure the Load-balancer yourself but this will add learning curve and you will have to spent time and understanding the details. The better solution is to use a managed load balancer. Lightsail offers Load Balancer for $18 / month at the time writing this answer.
The above mentioned solution is cost-effective and will give you long-term benefit and also you can estimate the cost based on your solution.
Obviously, this can still be improved but I tried to cover the necessary aspects of the question asked.

Making DB Query as a Microservice API

We are currently using direct DB connection to query mongodb from our scripts and retrieve the required data.
Is it advisable / best practice to make the data retrieval from DB as a microservice.
It does until it doesn't :)
A service needs to get its data from somewhere and a database is a good start. If you have high loads you may find that you need to add a cache in the middle see this post from Instagram engineering https://instagram-engineering.com/thundering-herds-promises-82191c8af57d
edit (after comment)
generally speaking, a service should own its database and other services shouldn't access another database service directly only via its API. The idea is to keep services autonomous and enable them to evolve independently.
Depending on the size of microservice, that's now always practical since it can make the overhead of having the service be more of the utility it provide (I call this nanoservices). Also, if you have a lot of services you don't want to allow each one to talk to any other (even not via the DB) since you just get a huge mess. The way I see it there should be clear logical boundaries (services or microservices) and then within each such logical services you may find that it makes sense to have more than one "parts" (which I call aspects) e.g. they have different scaling needs or different suitable technologies etc. When you set things this way aspects can access the same database and services shouldn't (and you can still tame the chaos :) )
One last thing to think about - who said API is only a REST API, you can add views on top of the data that belongs to another service and as long as you treat that like an API (security, versioning etc.) you can have other services access that as well

How to share versioned data within a pod

We are currently serving around 140 webapps created by a bunch of different web agencies. The setup is the usual LEMP stack.
A 1.2 k8s cluster has been installed to migrate them as micro-services.
The problem we are facing is about serving static and dynamic content.
For this purpose we use, of course, two different containers (nginx and php-fpm) but we can't find an adequate solution to share data on both of them.
We hoped to be able to use versioned data containers but it is apparently not in the scope of k8s. Too bad.
gitRepo is not an option as we don't want to be dependent of a working git infra to instance pods. If it doesn't work we want to be autonomous and be able to serve traffic.
The other options (flocker, etc.) look heavy and complex in comparison to a simple data container. We also would like to be independent of data storage.
Is there an option I am not aware of? Does anyone have an advise on this?
Let me emphasise that we want to be able to version things in order to roll forward / backward easily.
Thank you for your time

OrientDB in Azure

We would like to use OrientDB Graph in an Azure environment. Does anybody has experience using it? We also would like to know if high availability from OrientDB is required under Azure cloud? Azure already offers high availability for Azure storage, Azure Drive and SQL. I understand that they have replications and load balancing built in.
This is super important because we prefer not to get into the business of replications and infrastructure management.
Thanks
So you can spin up 2 or more machines and install OrientDB on them, then configure them together as a distributed cluster. However I haven't been able to find any way that is simpler, easier to do. I am interested in this topic too.
Azure does have features such as geo-replication, which is protects your data against a major data-center incident but doesn't provide any performance benefit and will not make it highly available.
Although pretty reliable, occasionally Microsoft will reboot servers for updates, so to protect against downtime you can use affinity groups so that, of your 2 or more servers, one will always be online. This however does need to be used in conjunction with database replication and ideally load balancing.
It's also worth noting that OrientDB recommends clusters have an odd number of servers as this can prevent conflicts when synchronising data after a communication issue between the servers.
I am using it in amazon and I had to create a java project to monitor http requests inserts and queries. The queries are very fast but takes longer inserting data .
I recommend this type of graph database mode to decrease the time of the queries. Also if you have empty fields OrientDB manages very well compared to other databases .
If you need help with the java project can response to this post and I´ll help u.
I hope it helps. Good luck.