Reindex from host with sphinx docker container - sphinx

Just want to ask a clarifying question before I pursue Docker further here. I'm trying to understand the life cycle of indices with Sphinx in a container.
Provided I set up a container with Sphinx from some build, so that it has some shared indices, how can I reindex from the host? Will I have to determine the container IP (say, via $CID) and then send the reindex command to the container over SSH, or is there some other, fancier mechanism?
I'm using Rails with Thinking Sphinx and have some nice Capistrano hooks to reindex from my dev box. I'm guessing I'm going to lose those by putting Sphinx in a Docker container, since Sphinx would no longer be on the host itself.

A container is similar to a virtual machine, with the added advantage that it is much lighter, so you can do the reindexing in whatever manner you like: either over SSH, or directly through the shell you get when you run the container from the provided image.
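Concretely, you don't even need SSH: docker exec runs a command inside a running container from the host. A sketch, where the container names, config path, and rake task are assumptions about your setup:

```shell
# Run the Sphinx indexer inside a running container named "sphinx"
# (the name and config path are placeholders for your setup):
docker exec sphinx indexer --all --rotate --config /etc/sphinxsearch/sphinx.conf

# Or, for a Thinking Sphinx setup, run the rake task inside the app container:
docker exec app bundle exec rake ts:index
```

A Capistrano hook could shell out to the same docker exec command instead of invoking the indexer directly on the host.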

Related

Connect Tarantool Docker image to Postgres

I'm trying to connect the Tarantool Docker image to a local PostgreSQL instance, to replicate some test data, and ran into the following problems:
- There seems to be no CLI (other than the Tarantool console) to check which files are in place (exec bin/bash fails).
- pg = require('pg') leads to an error: "init.lua:4: module 'pg.driver' not found", despite the presence of the pg module in the Docker description.
- I have doubts about how to efficiently replicate 4 tables, and the relations between them, to the container from the outside Postgres.
Does anyone know of sources to dig into for solutions to these problems? Any direction would be greatly appreciated.
docker exec -ti tnt_container sh gives you a shell inside the container, so you can check which files are in place.
As for the 'pg.driver' error, you should find an older base image or build it yourself.
The last point is a PostgreSQL-related question. You may pass batches of data to pg functions, or use an intermediate application to transfer the data via COPY. It looks like Tarantool's pg driver does not support COPY.

File ownership and permissions in Singularity containers

When I run singularity exec foo.simg whoami I get my own username from the host, unlike in Docker where I would get root or the user specified by the container.
If I look at /etc/passwd inside this Singularity container, an entry has been added to /etc/passwd for my host user ID.
How can I make a portable Singularity container if I don't know the user ID that programs will be run as?
I have converted a Docker container to a Singularity image, but it expects to run as a particular user ID it defines, and several directories have been chown'd to that user. When I run it under Singularity, my host user does not have access to those directories.
It would be a hack but I could modify the image to chmod 777 all of those directories. Is there a better way to make this image work on Singularity as any user?
(I'm running Singularity 2.5.2.)
There is actually a better approach than just chmod 777, which is to create a "vanilla" folder with your application data/conf in the image, and then copy it over to a target directory within the container, at runtime.
Since the copy will be carried out by the user actually running the container, you will not have any permission issues when working within the target directory.
You can have a look at what I have done to create a portable remote desktop service, for example: https://github.com/sarusso/Containers/blob/c30bd32/MinimalMetaDesktop/files/entrypoint.sh
This approach is compatible with both Docker and Singularity, but it depends on your use-case if it is a viable solution or not. Most notably, it requires you to run the Singularity container with --writable-tmpfs.
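A minimal sketch of that pattern (all names and paths here are illustrative, not taken from the linked repository): the image ships a pristine "vanilla" copy of the application data, and the entrypoint copies it to a writable target at runtime, so the copy ends up owned by whoever runs the container.

```shell
#!/bin/sh
# entrypoint.sh (sketch): seed a writable runtime directory from a
# read-only "vanilla" copy baked into the image at build time.
seed_runtime_dir() {
    vanilla="$1"   # e.g. /opt/app-vanilla, created during the image build
    target="$2"    # e.g. /tmp/app-runtime, writable under --writable-tmpfs
    mkdir -p "$target"
    cp -r "$vanilla/." "$target/"
}

# In the real entrypoint you would then do something like:
# seed_runtime_dir /opt/app-vanilla /tmp/app-runtime
# exec "$@"
```

Because the copy runs as the invoking user, every file under the target directory is owned by that user, whatever their uid happens to be.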
As a general comment, keep in mind that even if Singularity is very powerful, it behaves more as an environment than a container engine. You can make it work more container-like using some specific options (in particular --writable-tmpfs --containall --cleanenv --pid), but it will still have limitations (variable usernames and user ids will not go away).
First, upgrade to v3 of Singularity if at all possible (and/or bug your cluster admins to do it). v2 is no longer supported, and several versions <2.6.1 have security issues.
Singularity is actually mounting the host system's /etc/passwd into the container so that it can be run by any arbitrary user. Unfortunately, this also effectively clobbers any users that may have been created by a Dockerfile. The solution is as you thought, to chmod any files and directories to be readable by all. chmod -R o+rX /path/to/base/dir in a %post step is simplest.
Since the final image is read-only, granting write permission doesn't do anything, and it's useful to get into the mindset of only writing to files/directories that have been mounted into the container.

postgresql docker replications

I'm relatively new to Docker, but I'm wondering whether it is possible for me to create two master-slave Postgres containers. I can do it on virtual machines, but I'm a bit confused about how to do it in Docker.
If it's possible, can someone please point me in the right direction?
I have tried docker exec -it, but all the files are missing and I cannot edit the files inside.
Since you are new to Docker, and you wish to get up and running quickly, you can try using Bitnami's images, which allow you to specify a POSTGRESQL_REPLICATION_MODE environment variable, which will allow you to designate a container as a standby/slave.
Just save their docker-compose-replication.yml as docker-compose.yml in the directory of your choice, run docker-compose up -d, and it will pull the necessary image and set everything up for you quickly.
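If you'd rather see the moving parts than use the compose file, the equivalent with plain docker run looks roughly like this. The environment variable names follow Bitnami's image documentation; treat the exact names and values as assumptions to verify against the current docs:

```shell
# Shared network so the standby can reach the primary by name
docker network create pg-net

# Primary node
docker run -d --name pg-master --network pg-net \
  -e POSTGRESQL_REPLICATION_MODE=master \
  -e POSTGRESQL_REPLICATION_USER=repl_user \
  -e POSTGRESQL_REPLICATION_PASSWORD=repl_secret \
  -e POSTGRESQL_PASSWORD=admin_secret \
  bitnami/postgresql

# Standby node, pointing at the primary by container name
docker run -d --name pg-slave --network pg-net \
  -e POSTGRESQL_REPLICATION_MODE=slave \
  -e POSTGRESQL_REPLICATION_USER=repl_user \
  -e POSTGRESQL_REPLICATION_PASSWORD=repl_secret \
  -e POSTGRESQL_MASTER_HOST=pg-master \
  bitnami/postgresql
```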
However, I would highly encourage you to tinker on your own to learn how Docker works. Specifically, you could just use the community Postgres image, and then write your own entrypoint.sh file (along with any additional helper files as necessary), and customize the setup to your requirements.
Disclosure: I work for EnterpriseDB (EDB)

Persisting a single, static, large Postgres database beyond removal of the db cluster?

I have an application which, for local development, has multiple Docker containers (organized under Docker Compose). One of those containers is a Postgres 10 instance, based on the official postgres:10 image. That instance has its data directory mounted as a Docker volume, which persists data across container runs. All fine so far.
As part of testing the creation and initialization of the postgres cluster, it is frequently the case that I need to remove the Docker volume that holds the data. (The official postgres image runs cluster init if-and-only-if the data directory is found to be empty at container start.) This is also fine.
However! I now have a situation where in order to test and use a third party Postgres extension, I need to load around 6GB of (entirely static) geocoding lookup data into a database on the cluster, from Postgres backup dump files. It's certainly possible to load the data from a local mount point at container start, and the resulting (very large) tables would persist across container restarts in the volume that holds the entire cluster.
Unfortunately, they won't survive the removal of the docker volume which, again, needs to happen with some frequency. I am looking for a way to speed up or avoid the rebuilding of the single database which holds the geocoding data.
Approaches I have been or currently am considering:
Using a separate Docker volume on the same container to create persistent storage for a separate Postgres tablespace that holds only the geocoder database. This appears to be unworkable because while I can definitely set it up, the official PG docs say that tablespaces and clusters are inextricably linked such that the loss of the rest of the cluster would render the additional tablespace unusable. I would love to be wrong about this, since it seems like the simplest solution.
Creating an entirely separate container running Postgres, which mounts a volume to hold a separate cluster containing only the geocoding data. Presumably I would then need to do something kludgy with foreign data wrappers (or some more arcane postgres admin trickery that I don't know of at this point) to make the data seamlessly accessible from the application code.
So, my question: Does anyone know of a way to persist a single database from a dockerized Postgres cluster, without resorting to a dump and reload strategy?
If you want to speed things up, you could convert your database dump into a data directory: import your dump into a clean Postgres container, stop it, create a tarball of the data directory, then upload it somewhere. Now, when you need to create a new Postgres container, use an init script to stop the database, download and unpack your tarball into the data directory, and start the database again; this way you skip the whole restore process.
Note: the data tarball has to match the Postgres major version, so that the container has no problem starting from it.
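The tarball approach above might look like this end to end (container names, ports, and file names are all placeholders):

```shell
# 1) Restore the dump into a throwaway postgres:10 container whose data
#    directory lives in a host folder we can tar up afterwards
docker run -d --name seed -p 5432:5432 -e POSTGRES_PASSWORD=secret \
  -v "$PWD/pgdata:/var/lib/postgresql/data" postgres:10
createdb -h localhost -U postgres geocoder
pg_restore -h localhost -U postgres -d geocoder geocoder.dump

# 2) Stop the container and snapshot the whole data directory
docker stop seed
tar -C pgdata -czf pgdata-pg10.tar.gz .

# 3) Later: pre-seed a fresh data directory before starting postgres,
#    skipping both initdb and the slow restore
mkdir -p pgdata-fresh
tar -C pgdata-fresh -xzf pgdata-pg10.tar.gz
docker run -d -v "$PWD/pgdata-fresh:/var/lib/postgresql/data" postgres:10
```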
If you want to speed things up even more, create a custom Postgres image with the tarball and init script bundled, so that every time it starts it wipes the empty cluster and copies in your own.
You could even change the entrypoint to use a custom script that loads the database data and then calls docker-entrypoint.sh, so there is no need to delete a possibly empty cluster.
This will only work if you are OK with replacing the whole cluster every time you want to run your tests; otherwise you are stuck with importing the database dump.
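A sketch of such a custom entrypoint (the paths and tarball name are assumptions; docker-entrypoint.sh is the official image's own entrypoint):

```shell
#!/bin/sh
# custom-entrypoint.sh (sketch): if the data directory is empty, unpack a
# pre-built cluster bundled into the image, then hand off to the official
# entrypoint, which will skip initdb because the directory is populated.
set -e
PGDATA="${PGDATA:-/var/lib/postgresql/data}"
if [ ! -s "$PGDATA/PG_VERSION" ]; then
    tar -C "$PGDATA" -xzf /pgdata-pg10.tar.gz
fi
exec docker-entrypoint.sh "$@"
```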

Using Docker and MongoDB

I have been using Docker and Kubernetes for a while now, and have set up a few databases (Postgres and MySQL) and services.
Now I was looking at adding MongoDB, but it seems different when it comes to user management.
Take for example postgres:
https://hub.docker.com/_/postgres/
Immediately, I can declare users with a password at setup time and then connect using them. It seems the mongo image does not support this. Is there a way to simply declare users on startup and use them, similar to the postgres setup? That is, without having to exec into the container, modify the auth settings, and restart the mongo service.
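For reference, the postgres pattern being described is just environment variables on docker run; more recent versions of the official mongo image support an analogous mechanism (MONGO_INITDB_ROOT_USERNAME / MONGO_INITDB_ROOT_PASSWORD, plus init scripts dropped into /docker-entrypoint-initdb.d), though this may postdate the question:

```shell
# The postgres setup referred to above: a user is created at first start
docker run -d -e POSTGRES_USER=myuser -e POSTGRES_PASSWORD=secret postgres

# Analogous setup on recent official mongo images: creates a root user
# in the admin database on first start
docker run -d \
  -e MONGO_INITDB_ROOT_USERNAME=myuser \
  -e MONGO_INITDB_ROOT_PASSWORD=secret \
  mongo
```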