MLflow storing artifacts in Google Cloud Storage but not displaying them in the MLflow UI (docker-compose)

I am working in a Docker environment (docker-compose) with a Jupyter notebook image and a Postgres image for running ML models, and I am using Google Cloud Storage to store the model artifacts. Storing the models in Cloud Storage works fine, but I can't get them to show up in the MLflow UI. I have seen similar problems, but none of the solutions used Google Cloud Storage as the artifact store. The error message says the following:
Unable to list artifacts stored under <gs-location> for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory.
What could possibly be causing this problem?

I had exactly the same issue. The keywords are docker-compose, Google Cloud Storage, success in storing to GCS, but failure in listing artifacts in the UI.
In my case, it turned out that if you assign the env vars in the docker-compose file by reading from a .env file (e.g. GOOGLE_APPLICATION_CREDENTIALS), the server might start before the assignment happens. The quick fix is to assign the env var directly under the environment: key instead of using the env_file: key, as in the sketch below.
For sensitive data that you still need to keep in a .env file, you can add a wait for the server, and add depends_on: to the docker-compose file to make sure that the database container starts before the MLflow server if you are using a database-backed store.
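A minimal sketch of that fix, written as a compose override file (the service names mlflow and postgres and the key path /secrets/gcp-key.json are assumptions for illustration, not taken from the original setup):
$ cat > docker-compose.override.yml << 'EOF'
services:
  mlflow:
    environment:
      # assumed path where the service-account key is mounted inside the container
      - GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-key.json
    volumes:
      - ./gcp-key.json:/secrets/gcp-key.json:ro
    depends_on:
      - postgres
EOF
$ docker compose up -d
docker compose merges the override file into docker-compose.yml automatically, so the tracking-server container is created with the variable already set. Keep in mind that depends_on only orders container startup, so a short wait or retry before the server connects to Postgres is still worthwhile, as noted above.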

I faced the same issue when running MLflow locally. It got resolved after adding GOOGLE_APPLICATION_CREDENTIALS to the environment variables.
https://googleapis.dev/python/google-api-core/latest/auth.html
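For a local run, that just means exporting the variable before starting the tracking server; a sketch (the key path and the store URIs are placeholders, not from the original answer):
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcp-service-account.json
$ mlflow server \
    --backend-store-uri postgresql://user:password@localhost:5432/mlflow \
    --default-artifact-root gs://my-bucket/mlflow-artifacts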

Related

Firestore: Copy/Import data from local emulator to Cloud DB

I need to copy Firestore DB data from my local Firebase emulator to the cloud instance. I can move data from the Cloud DB to the local DB fine, using the EXPORT functionality in the Firebase admin console. We have been working on the local database instance for 3-4 months and now we need to move it back to the Cloud. I have tried to move the local "--export-on-exit" files back to my storage bucket and then IMPORT from there to the Cloud DB, but it fails every time.
I have seen one comment by Doug at https://stackoverflow.com/a/71819566/20390759 that this is not possible, but that the best solution is to write a program to copy from local to cloud. I've started working on that, but I can't find a way to have both projects/databases open at the same time, local and cloud, since they share the same project ID, app key, etc.
Attempted IMPORT: I copied the files created by "--export-on-exit" in the emulator to my cloud storage bucket. Then, I selected IMPORT and chose the file I copied up to the bucket. I get this error message:
Google Cloud Storage file does not exist: /xxxxxx.appspot.com/2022-12-05 from local/2022-12-05 from local.overall_export_metadata
So, I renamed the metadata file to match the directory name from the local system; the IMPORT then claims to start successfully, but fails with no error message.
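For reference, the command-line equivalent of that attempt looks roughly like this (the bucket and folder names are taken from the error message above, so treat them as placeholders):
$ gsutil -m cp -r "./2022-12-05 from local" gs://xxxxxx.appspot.com/
$ gcloud firestore import "gs://xxxxxx.appspot.com/2022-12-05 from local"
The import looks for a metadata file named after the folder (the .overall_export_metadata file), which is exactly what the error message above is complaining about.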
I've been using Firestore for several years, but we just started using the emulator this year. Overall I like it, but if I can't easily move data back to the Cloud, I can't use it. Thanks for any ideas/help.

Having a problem adding local Hasura to Google Cloud Run

Do you have any information or a tutorial on connecting a local Hasura setup to Google Cloud Run?
I already successfully set up Hasura on Google Cloud Run, but it seems I have a problem connecting it to our local database in Hasura.
I got this error:
ERROR: (gcloud.builds.submit) Unable to read file [cloudbuild.yaml]: [Errno 2] No such file or directory: 'cloudbuild.yaml'
Is there something that is not configured yet?
Best
Zaid
Your question is vague.
The error you reference is from Google Cloud Build and suggests that you're running gcloud builds submit ..., which is failing because the command can't find a cloudbuild.yaml file. It's entirely probable that you want to do the deployment using Cloud Build, but you'll need to create the cloudbuild.yaml file for this to work.
For those of us unfamiliar with "hasura", do you mean hasura.io?
This appears to require a container image that listens on port :8080 by default (which is good, as that's the default assumed by Cloud Run) and a connection to a PostgreSQL database.
If you're using Cloud SQL to run PostgreSQL, you can follow the instructions here.
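A minimal cloudbuild.yaml that builds the Hasura image and deploys it to Cloud Run could look roughly like this (the service name, region, and image path are assumptions, not taken from the question):
$ cat > cloudbuild.yaml << 'EOF'
steps:
  # build and push the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/hasura', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/hasura']
  # deploy the pushed image to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args: ['run', 'deploy', 'hasura',
           '--image', 'gcr.io/$PROJECT_ID/hasura',
           '--region', 'us-central1', '--platform', 'managed']
images:
  - 'gcr.io/$PROJECT_ID/hasura'
EOF
$ gcloud builds submit --config cloudbuild.yaml .
Note that this only addresses the gcloud builds submit error; a Hasura instance running on Cloud Run still cannot reach a database that exists only on your local machine, so the database needs to be reachable from Google Cloud (e.g. Cloud SQL, as mentioned above).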

Accessing the StreamSets web UI from a different cluster node than where it is installed: which file system does it 'look in'?

I have a cluster of machines hosting Hadoop (MapR) and have installed StreamSets on one of the nodes (say node002) following the RPM documentation. However, I am accessing the web UI for the Data Collector from another node, node001.
My question is: when I specify file paths (e.g. an origin directory), which file system is the web UI referring to? E.g. if I set an origin directory of /home/myuser/mydata, will the pipeline created in the web UI look for that directory on node001 or node002? I'm new to StreamSets, so a more detailed answer would be appreciated. Thanks.
Ultimately I am asking this because I am currently getting "FileNotFound" and "permission denied" errors while trying to follow the documentation's tutorial, and I am trying to debug the situation.
From the StreamSets community forums: it will be the path to the local file on the machine running that particular SDC instance.
The FileNotFound and permission-denied errors come from the fact that the default user for the sdc service is a user called sdc. I'm still working out a proper fix, but a workable prototype can be produced by granting read and write access on the directories in question to all users (the permissions still need tightening, but this answers the posted question).
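For the prototype described above, the change boils down to something like the following, run on the node where SDC is installed (the directory is the one from the question; the sdc group name is an assumption):
$ # quick and dirty: let any user, including sdc, read and write the origin directory
$ sudo chmod -R a+rwX /home/myuser/mydata
$ # tighter alternative: hand ownership of the directory to the sdc service user
$ sudo chown -R sdc:sdc /home/myuser/mydata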

Docker and sensitive information used at run-time

We are dockerizing an application (written in Node.js) that will need to access some sensitive data at run-time (API tokens for different services) and I can't find any recommended approach to deal with that.
Some information:
The sensitive information is not in our codebase, but it's kept on another repository in encrypted format.
On our current deployment, without Docker, we update the codebase with git, and then we manually copy the sensitive information via SSH.
The docker images will be stored in a private, self-hosted registry
I can think of some different approaches, but all of them have some drawbacks:
Include the sensitive information in the Docker images at build time. This is certainly the easiest option; however, it makes the credentials available to anyone with access to the image (I don't know if we should trust the registry that much).
Like 1, but having the credentials in a data-only image.
Create a volume in the image that links to a directory in the host system, and manually copy the credentials over SSH like we're doing right now. This is very convenient too, but then we can't spin up new servers easily (maybe we could use something like etcd to synchronize them?)
Pass the information as environment variables. However, we have 5 different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, however, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run Docker images), and this can easily create problems (e.g. credentials accidentally included in git, etc).
PS: I've done some research but couldn't find anything similar to my problem. Other questions (like this one) were about sensitive information needed at build-time; in our case, we need the information at run-time
I've used your options 3 and 4 to solve this in the past. To rephrase/elaborate:
Create a volume in the image that links to a directory in the host system, and manually copy the credentials over SSH like we're doing right now.
I use config management (Chef or Ansible) to set up the credentials on the host. If the app takes a config file needing API tokens or database credentials, I use config management to create that file from a template. Chef can read the credentials from encrypted data bag or attributes, set up the files on the host, then start the container with a volume just like you describe.
Note that in the container you may need a wrapper to run the app. The wrapper copies the config file from wherever the volume is mounted to wherever the application expects it, then starts the app.
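Such a wrapper can be tiny. A sketch, assuming the host directory managed by Chef/Ansible is mounted at /secrets and the app reads /app/config/credentials.json (both paths and the entry point are made-up examples):
#!/bin/sh
set -e
# copy the credentials from the mounted volume to the path the app reads
cp /secrets/credentials.json /app/config/credentials.json
# replace the shell with the Node.js app so it receives signals directly
exec node /app/server.js
The container would then be started with the volume attached, e.g. docker run -d -v /etc/myapp/secrets:/secrets:ro my-app-image (again, the names are illustrative).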
Pass the information as environment variables. However, we have 5 different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, however, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run Docker images), and this can easily create problems (e.g. credentials accidentally included in git, etc).
Yes, it's cumbersome to pass a bunch of env variables using -e key=value syntax, but this is how I prefer to do it. Remember the variables are still exposed to anyone with access to the Docker daemon. If your docker run command is composed programmatically it's easier.
If not, use the --env-file flag as discussed here in the Docker docs. You create a file with key=value pairs, then run a container using that file.
$ cat > myenv << END
FOO=BAR
BAR=BAZ
END
$ # any image works here; env just prints the variables to show they were passed
$ docker run --rm --env-file myenv alpine env
That myenv file can be created using chef/config management as described above.
If you're hosting on AWS you can leverage KMS here. Keep either the env file or the config file (that is passed to the container in a volume) encrypted via KMS. In the container, use a wrapper script to call out to KMS, decrypt the file, move it into place, and start the app. This way the config data is not exposed on disk.
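A sketch of such a wrapper, assuming the encrypted env file was produced with aws kms encrypt and shipped into the container at /config/app.env.encrypted (the paths and the app entry point are placeholders):
#!/bin/sh
set -e
# KMS returns the plaintext base64-encoded; decode it and write it to /tmp
# (mount a tmpfs at /tmp if you want to keep the plaintext off the container's writable layer)
aws kms decrypt \
  --ciphertext-blob fileb:///config/app.env.encrypted \
  --query Plaintext --output text | base64 -d > /tmp/app.env
# export every KEY=value pair from the decrypted file, then start the app
set -a
. /tmp/app.env
set +a
exec node /app/server.js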

Mounting Page Blob as a VHD in a batch file

This is a follow-up to my stackoverflow post: how do I mount a page blob as a VHD on a worker role instance? After the drive is mounted, I will pass it as the value of the --dbpath parameter to the mongo instance.
In a nutshell, I'm trying to start a single mongo instance with its data directory on an Azure page blob (for durability). I'm building on the HelloWorld example on Azure's site; instead of starting a Tomcat instance, I will start a mongo instance.
I suggest you follow this guide: http://www.codeproject.com/Articles/81413/Windows-Azure-Drives-Part-1-Configure-and-Mounting. This guide explains how to mount the drive, and it also shows how you can save the drive letter as an environment variable.
This is useful when you're starting the mongo instance: you can just use that environment variable together with --dbpath. Maybe it would be best to encapsulate all the mounting code in a console application so that you can simply run it before starting the mongo instance.
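For example, if the mounting code saved the drive letter in an environment variable called MONGO_DRIVE (the variable name and paths here are made up for illustration), the startup command in a batch file could be roughly:
REM MONGO_DRIVE was set by the code that mounted the Azure Drive
mongod.exe --dbpath %MONGO_DRIVE%\data --logpath %MONGO_DRIVE%\logs\mongod.log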
I’m not sure if you can mount a drive in Java. Currently this feature is not available in Windows Azure Storage Client for Java: https://github.com/WindowsAzure/azure-sdk-for-java. There’s no native (C++) API either. So you may need to use .NET to mount the drive, and then start your Java process from your .NET application. For now, you can also submit a feature request on http://www.mygreatwindowsazureidea.com/forums/34192-windows-azure-feature-voting.
Best Regards,
Ming Xu.