Is it recommended to use containers for databases? [closed] - postgresql

I have searched for information about it but still cannot find anything convincing.
I have multiple containerized websites with Apache and PHP; these in turn are exposed through a reverse proxy with a virtual host for each container. But I've been thinking about the database: most of the sites use MariaDB 5.5, but one website requires MariaDB 10.
I was wondering whether it is a good idea for each website's container to embed its own MariaDB instance, or whether I should create a single dedicated container for the database, but I have some doubts.
MariaDB uses its own load-balancing system; would containers interfere with that if I had to spin up multiple instances of the same database, even though they all use the same data directory? I'm wondering whether each engine would have to redo the same indexing, or whether they would conflict over the data files.
Keeping the website in a container is unproblematic because the application files don't change, and the logs and uploaded files live on persistent volumes. The database is different: I don't know whether it is a good idea for multiple engines to use the same data directory.
In a production environment where the database has a high query load, is it recommended to use a container? Or is it better to embed the database inside the website container, or to install it natively on the server?
In which cases should I choose one or the other option?

Absolutely do not have two database servers share the same data directory. Each data directory (volume) should be managed by exactly one database server.
If you need more database servers because you want high availability or are worried about load, each needs to manage its own data directory, and they need to stay in sync with each other via replication.
I'd say that using containers for database engines is a bit uncommon outside of development setups, but not unheard of, especially if you want to be able to scale fast. I don't think it's super easy to automate all of this, though.
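To make the "one server, one data directory" point concrete, here is a minimal sketch using the Docker SDK for Python: one shared MariaDB 5.5 container and a separate MariaDB 10 container, each with its own named volume. The image tags, network name, password and volume names are just placeholders.

```python
# Sketch: each MariaDB server is the only writer of its own volume.
# Assumes the Docker SDK for Python (pip install docker) and a running Docker daemon;
# image tags, the "backend" network, the password and volume names are placeholders.
import docker

client = docker.from_env()

# Shared MariaDB 5.5 instance for the sites that only need 5.5
client.containers.run(
    "mariadb:5.5",
    name="mariadb55",
    detach=True,
    environment={"MYSQL_ROOT_PASSWORD": "change-me"},
    volumes={"mariadb55_data": {"bind": "/var/lib/mysql", "mode": "rw"}},
    network="backend",          # the website containers connect over this network
)

# The site that needs MariaDB 10 gets its own container AND its own volume;
# the two servers never touch each other's data directory.
client.containers.run(
    "mariadb:10.6",
    name="mariadb10",
    detach=True,
    environment={"MYSQL_ROOT_PASSWORD": "change-me"},
    volumes={"mariadb10_data": {"bind": "/var/lib/mysql", "mode": "rw"}},
    network="backend",
)
```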

Databases are critical services.
In my opinion, you shouldn't use Docker for production databases.
But you wouldn't think twice about using Docker for a database in a local development environment.

Related

Best architecture for Kafka consumer [closed]

I'm creating an application (web application) that needs to consume data (update client transactions) from a Kafka broker, but I'm not sure what the best way to approach this is.
I can think of three different scenarios to process each update:
1. Install the Kafka consumer directly in my app; then I can just start another instance of it (I'm using Docker, so another container) and make the required updates there (I think this is the fastest one).
2. Create a separate service that consumes from Kafka and makes the required updates in the app database. It seems to be pretty much the same as option 1, but a smaller app and more maintenance (2 apps instead of 1).
3. Create a separate service that consumes from Kafka and sends the updates to a REST endpoint in my app. It seems this would be a tiny, very specific service, and the processing remains in the app; but the app will receive more requests.
So, what are the pros/cons of each solution? Are all of them valid, or are some of them a complete no-go? What drawbacks/risks should I be aware of?
I'm not looking just for a recommendation; I am more interested in understanding which solution works best for a given scenario.
Thank you.
With 3 you are splitting your application into multiple services. When you distribute your code across multiple services, you increase the level of indirection. The more indirection you have in your codebase, the harder it is for one person to work across the entire codebase: they have to keep more things in their head, working across network boundaries requires a lot more code than working across files, and it's harder to debug across a network API.
Now, this doesn't mean that it's bad to split your application into multiple services. Doing so will help you scale your application as you can scale only the pieces that need scaling. Perhaps more importantly, splitting your application into multiple services makes it easier for more people to work on the codebase at the same time, since they have to adhere to the API contracts between the services, and are less likely to be working on the same files at the same time.
So 3 is a good choice if you have scaling issues, either for load on your application, or the number of developers that will work on it.
1 is a good choice if you want to move as quickly as possible and can put off scaling concerns for some time.
2 is the worst of both worlds. Your two services will be coupled by the database schema and will share the same database instance. The separation of code means you have extra indirection, the schema coupling means you won't fully get the people-scaling benefits, and since most applications are bottlenecked by the database, sharing the DB instance will keep you from scaling the two services independently for performance.
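If you start with 1, the in-app consumer loop can be very small. A rough sketch with kafka-python; the topic, broker address and update function are placeholders, not anything from the question:

```python
# Sketch of option 1: the consumer lives inside the app process/container.
# Assumes kafka-python (pip install kafka-python); names below are placeholders.
import json
from kafka import KafkaConsumer

def update_client_transaction(tx):
    # placeholder: call into the app's existing update logic / ORM here
    print("applying update", tx)

consumer = KafkaConsumer(
    "client-transactions",                       # hypothetical topic name
    bootstrap_servers="kafka:9092",
    group_id="webapp-consumers",                 # scaling = start another container in this group
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    enable_auto_commit=True,
)

for message in consumer:
    update_client_transaction(message.value)
```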
Personal rule-of-thumb -
If you have control of the REST API code, then the first one.
If the API has specific validation before reaching the database, don't do the second one unless you plan on copying that code into the consumer. If you want to write directly to a database, Kafka Connect is the suggested framework for that rather than a plain consumer, anyway.
If you don't control the API code (it's a third-party API), then you are left with option 3.
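And if you end up with option 3, the bridge service stays tiny. A rough sketch with kafka-python plus requests; the topic name and the endpoint URL are placeholders:

```python
# Sketch of option 3: a tiny service that consumes from Kafka and forwards
# each update to the app's REST endpoint. Assumes kafka-python and requests.
import json
import requests
from kafka import KafkaConsumer

APP_ENDPOINT = "http://webapp:8000/api/transactions"   # hypothetical endpoint

consumer = KafkaConsumer(
    "client-transactions",                              # hypothetical topic name
    bootstrap_servers="kafka:9092",
    group_id="kafka-to-rest-bridge",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    resp = requests.post(APP_ENDPOINT, json=message.value, timeout=10)
    resp.raise_for_status()   # a real service would retry / dead-letter on failure
```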

What is the scope of learning kubernetes? [closed]

I came across the word "Kubernetes" recently while searching for some online courses. I understand that if I learn Kubernetes, I will learn about containers, container orchestration, and how easily microservices can be scaled. But I wanted to know: after learning Kubernetes, is there anything else I should learn to become an expert in that line?
My question is more about which stream I could pursue if I learn this, in the same way that learning Python or R helps you become a data analyst or move into another data-related stream.
I am very new to this and would really appreciate your help in understanding it.
Thanks in advance
The main prerequisite for Kubernetes is Docker. Once you learn Docker, you learn how to package environments into containers and deploy them. Once you've learnt how to build docker images, you need to 'orchestrate' them. What does that mean?
It means that if you have a bunch of microservices (in the form of containers), you can spin up multiple machines and tell Kubernetes which image/container goes where: you orchestrate your app using Docker images (packaged environments), with Kubernetes as the underlying resource provider that runs these containers and controls when they are spun up or killed.
Assuming you don't have a massive cluster on-prem (or at home) Kubernetes on a single personal computer is rather useless. You would need to learn a cloud platform (or invest in a server) to utilise Kubernetes efficiently.
Once you learn this, you would possibly need to find a way for your containers to communicate with one another. In my opinion, the two most important things any amateur programmer needs to know are:
Message brokers
REST
Message brokers: Kafka, RabbitMQ (personal fave), Google Pub/Sub, etc.
REST: Basically sending/receiving data via HTTP requests.
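As a tiny illustration of both, here is a sketch in Python; it assumes the pika and requests libraries, and the hostnames, queue name and payload are made up:

```python
# Two common ways containers talk to each other, sketched in Python.
# Assumes pika (RabbitMQ client) and requests; hostnames and names are placeholders.
import json
import pika
import requests

# 1) Message broker: publish an event to a RabbitMQ queue
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="orders")
channel.basic_publish(exchange="", routing_key="orders",
                      body=json.dumps({"order_id": 42}))
connection.close()

# 2) REST: send the same data to another service over HTTP
requests.post("http://billing-service:8000/orders", json={"order_id": 42}, timeout=5)
```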
Once all of this is done, you've learnt how to build images, orchestrate them, have them communicate with one another and use resources from other machines (utilizing the cloud or on-prem servers).
There are many other uses for Kubernetes, but in my opinion, this should be enough to entice you to learn this key-skill.
Kubernetes and Docker are the future, because they remove the need to worry about environments. If you have a Docker image, you can run that image on Mac, Linux, Windows or basically any machine with a hypervisor. This increases portability and decreases the overhead of setting up environments each time. It also allows you to spin up 1, 100, 1,000 or 10,000 containers (excellent for scalability!).
Yes, if you want to explore the field fully, security is another area you can learn; it is in demand these days, with various clients wanting security leaks checked at the level of containers, container registries and even Kubernetes itself.
You can move into DevSecOps with a couple of certifications.
As for the latter part of your question, I can't point to a specific stream, because you can deploy almost anything as containers; you could even deploy some Python code there that collects data from sensors and does some computations.
Please comment if you have a more specific question.

How to implement version control on Firebase? [closed]

I'm currently using Firebase as a prototyping tool to showcase a front end design for a documentation tool. In the process we've come to really like the real-time power of Firebase, and are exploring the potential to use it for our production instance of an open source/community version.
The first challenge is version control. Our legacy project used Hibernate/Envers in a Java stack, and we were previously looking at Gitlab as a way to move into a more "familiar" git environment.
This way?
Is there a way to timestamp and version-control the data being saved? Any thoughts on how best to recall this data without reinventing the wheel (e.g. any open source modules)?
The real-time aspect of something like Firepad is great for documentation, but we require the means to commit or otherwise distinctly timestamp the state or save of a document.
Or?
Or is it best to use Firebase only for the realtime functionality, and use GitLab to commit the instance to a non-realtime database? In other words, abstracting the version control entirely out to a more traditional database setup?
Thoughts?
Both options you offer are valid and feasible. In general, I'd suggest using Firebase only as your real-time stack (data sync) and connecting it to your own backend (GitLab or a custom DB).
I've gone down that path, and I find the best solution is to integrate your own backend DB with Firebase on top. Depend on Firebase exclusively for everything and you'll hit walls sooner or later.
The best solution is to keep full control of your data structure, security and access models, and use Firebase where needed to keep clients in sync (online and offline). The integration is simple.
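For the "distinctly timestamp the state of a document" part, one possible sketch with the Firebase Admin SDK for Python and the Realtime Database; the paths, field names and key file are assumptions, not an established pattern:

```python
# Sketch: keep the live document at its normal path, and store every explicit
# "commit" as an immutable, timestamped revision node next to it.
# Assumes firebase-admin (pip install firebase-admin), a service-account key file,
# and the Realtime Database; all paths and field names here are made up.
import time

import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate("service-account.json")        # hypothetical key file
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://your-project.firebaseio.com",     # hypothetical project URL
})

def commit_document(doc_id, content, author):
    """Snapshot the current content as a new revision and move the 'head' pointer."""
    rev = db.reference(f"docs/{doc_id}/revisions").push({
        "content": content,
        "author": author,
        "committed_at": int(time.time() * 1000),   # client timestamp; a server value would also work
    })
    db.reference(f"docs/{doc_id}/head").set(rev.key)
    return rev.key
```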

Techniques to manage updates to a postgresql database [closed]

The goal is to build a concise SQL script that alters/updates tables to reflect the schema changes made between any two points in time.
For example, I develop on one machine and on Day "A" I used the dump & restore utilities to install a database on a production machine. Then on Day "B" after making some changes on my development machine and testing them, I need to get those changes to my schema onto my production server.
Short of writing every single command I make to my schema (some of which may be experimental and undone), what is a good way to manage upgrading a schema from point A to point B (or point B to point F for that matter)?
Update:
It seems that diff-like concepts for databases are very much frowned upon with good reason. So this leaves me with new questions.
What is a simple method to distinctly manage your experimental changes from your production-worthy changes? Just keep restoring your dev database to a last known good state when you do something unfavorable?
Can PostgreSQL be configured to log all of your actions in a way that can be pulled out and used as an update script? The reason I ask is that I enjoy working with pgAdmin III, and I would rather work in it than write update scripts for building or experimenting.
Short of writing every single command I make to my schema
If you want to do it in a controlled and "professional" way, there is no way around that. You should consider using a schema management tool to help you organize and run those migration scripts:
Liquibase
Flyway
Our experience with Liquibase is very good. We use it for migrations on Oracle, DB2 and PostgreSQL.
For a Postgres-specific solution you might want to have a look at Sqitch.
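None of the code below is Liquibase or Flyway; it is just a bare-bones sketch of the idea those tools automate (numbered, append-only migration scripts plus a table that records which ones have already been applied), using psycopg2 with a made-up file layout:

```python
# Bare-bones illustration of what migration tools track for you:
# apply numbered .sql files in order and remember which ones ran.
# Assumes psycopg2 and a migrations/ directory like 001_init.sql, 002_add_index.sql ...
import os
import psycopg2

def migrate(dsn, directory="migrations"):
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS schema_version (
                       filename text PRIMARY KEY,
                       applied_at timestamptz DEFAULT now())""")
    cur.execute("SELECT filename FROM schema_version")
    applied = {row[0] for row in cur.fetchall()}

    for fname in sorted(os.listdir(directory)):
        if not fname.endswith(".sql") or fname in applied:
            continue
        with open(os.path.join(directory, fname)) as f:
            cur.execute(f.read())                        # run the migration itself
        cur.execute("INSERT INTO schema_version (filename) VALUES (%s)", (fname,))
        conn.commit()                                    # one transaction per script
    cur.close()
    conn.close()

# migrate("dbname=app user=app host=localhost")   # example DSN (placeholder)
```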

Should I use redis to store a large number of binary files? [closed]

I need to store a huge number of binary files (10 - 20 TB in total, each file ranging from 512 KB to 100 MB).
I need to know if Redis will be efficient for my system.
I need following properties in my system:
High Availability
Failover
Sharding
I intend to use a cluster of commodity hardware to keep costs as low as possible. Please suggest the pros and cons of building such a system using Redis. I am also concerned about the high RAM requirements of Redis.
I would not use Redis for such a task. Other products will be a better fit IMO.
Redis is an in-memory data store. If you want to store 10-20 TB of data, you will need 10-20 TB of RAM, which is expensive. Furthermore, the memory allocator is optimized for small objects, not big ones. You would probably have to cut your files into many small pieces; it would not be really convenient.
Redis does not provide an out-of-the-box solution for HA and failover. Master/slave replication is provided (and works quite well), but with no support for automating the failover. Clients have to be smart enough to switch to the correct server. Something on the server side (but this is unspecified) has to switch the roles between master and slave nodes in a reliable way. In other words, Redis only provides a do-it-yourself HA/failover solution.
Sharding has to be implemented on the client side (as with memcached). Some clients support it, but not all of them. The fastest client (hiredis) does not. In any case, things like rebalancing have to be implemented on top of Redis. Redis Cluster, which is supposed to provide such sharding capabilities, is not ready yet.
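To make the client-side sharding point concrete, a toy sketch with redis-py; the node addresses are placeholders, the hashing is naive, and a real setup would still need chunking, rebalancing and failure handling on top:

```python
# Toy illustration of client-side sharding: the client, not Redis, decides
# which node a key lives on. Assumes redis-py; node addresses are placeholders.
import hashlib
import redis

NODES = [
    redis.Redis(host="redis-node-1", port=6379),
    redis.Redis(host="redis-node-2", port=6379),
    redis.Redis(host="redis-node-3", port=6379),
]

def node_for(key):
    # naive modulo hashing; consistent hashing behaves better when nodes change
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put_blob(key, data: bytes):
    node_for(key).set(key, data)

def get_blob(key) -> bytes:
    return node_for(key).get(key)
```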
I would suggest using other solutions. MongoDB with GridFS is one possibility. Hadoop with HDFS is another. If you like cutting-edge projects, you may want to give the Elliptics Network a try.