Can't deploy a model created with scikit-learn on ML Engine - deployment

I'm trying to deploy a model created with scikit-learn on ML Engine. I worked in a Datalab notebook and, after creating the model, I exported it to a file using joblib.dump(model, 'model.joblib'). Once I had that file, I copied it to Cloud Storage: gsutil cp ./model.joblib gs://... Finally, I created a model resource by typing: gcloud ml-engine models create model --regions=us-central1.
The problem comes when I try to create the model version, as it says that it expects to find a .pb/.pbtxt saved model file.
Does anyone know how to proceed with this issue without having to move to a TensorFlow model?
Any help would be much appreciated.

If you are running Datalab, you are running an older version that doesn't support scikit-learn. A new update to Datalab will be out any day now.
However, the workaround is simple. Just run
%%bash
gcloud components update
in a cell of your notebook

You'll need to set a few flags, in particular --framework (which defaults to TensorFlow), but also --runtime-version (must be 1.4 or higher) and (possibly) --python-version (defaults to 2.7). Try something like (reference):
gcloud beta ml-engine versions create v1 \
--model my_model \
--origin gs://path/to/model \
--runtime-version="1.8" \
--framework SCIKIT_LEARN \
--python-version="3.5"
For a list of what's contained in each runtime version (i.e. to help you choose which one), see here.
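Once the version is created successfully, a quick sanity check is to describe it and send a test prediction; instances.json here is just a placeholder for a file of newline-delimited JSON input rows:
gcloud ml-engine versions describe v1 --model my_model
gcloud ml-engine predict --model my_model --version v1 --json-instances instances.json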

Related

Skaffold and multiple Sub Charts

Lately I was experimenting with Skaffold and our Helm charts, and I am in a bit of a dilemma about whether our Helm chart / sub-chart layout is compatible with Skaffold or not.
Our Helm charts look like the following:
my-helm-charts
+-charts
|  +-project1
|  +-project2
|  +-project3
|  +-project4
|  +-infrastructure_kafka
|  |  +-charts
|  |     +-kafka
|  |     +-zookeeper
|  +-infrastructure_cassandra
|  +-infrastructure_elasticsearch
+-Charts.yaml
+-Values.yaml
The reason we chose to structure the Helm charts this way is so that, if necessary, we can spin up extra stages for our project.
Now when I want to develop project2 with Google Cloud Code / Skaffold (which I configured correctly and can start without problems in IntelliJ), I have to start the whole my-helm-charts.
That is actually OK, but the problem is that if I use Debug in Kubernetes, I have the feeling Google Cloud Code/Skaffold can't really locate project2 and no debugging occurs.
My feeling is that Google Cloud Code/Skaffold is more oriented to work with the following construct...
project2-helm
+-templates
+-Charts.yaml
+-Values.yaml
My sub-chart construct starts in Google Cloud Code/Skaffold without any exception, but I can't debug. Is it possible to achieve what I want with my structure, and if yes, how?
Or is it not possible at all...
Thx for answers...
We recently added a feature called config dependencies which might help here. It allows you to create more specific skaffold.yamls and then map them together with a "requires" field:
https://skaffold.dev/docs/design/config/#configuration-dependencies
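For illustration, here is a minimal sketch of what a per-project skaffold.yaml with a requires stanza might look like; the config names, paths, image name and chart path below are assumptions based on the chart tree in the question, not taken from the Skaffold docs:
apiVersion: skaffold/v2beta29
kind: Config
metadata:
  name: project2
requires:
  - path: ../infrastructure_kafka    # directory containing another skaffold.yaml this module depends on (assumed layout)
    configs: [infrastructure-kafka]  # name declared in that file's metadata.name (assumed)
build:
  artifacts:
    - image: project2
deploy:
  helm:
    releases:
      - name: project2
        chartPath: charts/project2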
Once you have the skaffold.yamls created and the right dependency mapping you can run skaffold with the -m flag to choose one slice of your services:
skaffold dev -m project3
Cloud Code support for modules is incoming.
Cloud Code IntelliJ and Cloud Code VS Code recently added preview level support for deploying and debugging modules of a larger application which uses Skaffold. See more here https://cloud.google.com/code/docs/intellij/skaffold-modules

Google Cloud Platform - Specifying Project Name

I am trying to follow Google's instructions on deploying a Cloud Function from the command line. I cloned their sample project, but when I used gcloud functions deploy to deploy it, it complained that it failed to find attribute [project]. I had to provide that manually.
Where in their docs do they talk about setting the project attribute? I must've missed it, and it seems pretty important ...
This answer is in addition to @Kolban's.
You can modify your gcloud settings at any time. Here are some common ones:
gcloud config set core/project my-project-id
gcloud config set compute/region us-central1
To list your projects:
gcloud projects list
To see your current settings:
gcloud config list
To see your authorization settings:
gcloud auth list
Then there are settings for individual services such as Cloud Run:
gcloud config set run/region us-central1
To get help to see the vast number of settings available:
gcloud config --help
All of this is documented. Just put a command into Google and a document link will appear. For example put this string into Google: "gcloud compute instances create". The first link takes you to the command documentation.
When you install the Google Cloud SDK (which provides the gcloud command), you have the opportunity to create one or more configurations (including the default). Think of these as "profiles" for your interaction with GCP. A configuration includes:
Your identity
Your default project
Your default region/zone
See the following article:
Initializing Cloud SDK
It sounds like you either didn't run gcloud init or didn't identify a project you wanted to use when you did run it. When you subsequently run gcloud commands and don't specify a project, then the current configuration project will be used. If you didn't set one, then that would explain the error encountered.
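As a concrete sketch, here is how you could create a named configuration and set a project and region in it (the configuration name, project ID and region are placeholders):
gcloud config configurations create my-profile
gcloud config set project my-project-id
gcloud config set compute/region us-central1
gcloud config configurations list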

Opensuse Images No Longer Visible in Default Compute Instance

All,
openSUSE images are not listed as active in the GCloud console, but they are showing up in the gcloud commands in my project.... The openSUSE project is not mine... can I link to someone else's?
[screenshot: gcloud shell output]
My apologies, I found a suitable answer:
Is it possible in google cloud web client to create an instance from image in different project?
The command syntax from the Cloud Shell is:
gcloud compute instances create production-instance-from-staging-image --image staging-image-1 --image-project staging-project
For openSUSE you have to link to their image project, which is opensuse-cloud and includes the images. The exact commands are shown below.
The original question is how to build GCloud instances on openSUSE with the console. It used to be a default option in each project.
To achieve this you have to do two things:
1) Find the openSUSE project, which is in fact documented in Google's online support but requires a gcloud command to find the specific images.
The command is:
gcloud compute images list --project opensuse-cloud --no-standard-images
2) You have to specifically provision into your project from the opensuse-cloud project:
gcloud compute instances create your-instancename --image image-name-from-step-1 --image-project opensuse-cloud
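If you would rather not hard-code a specific image name, you can use an image family instead so the latest image is picked automatically. The family name below (opensuse-leap) is an assumption on my part, so verify it against the output of the images list command above:
gcloud compute instances create your-instancename --image-family opensuse-leap --image-project opensuse-cloud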

"No project ID could be determined from the Cloud SDK configuration" when running psqworker

When I was going through the Google Cloud tutorial: https://cloud.google.com/python/getting-started/using-pub-sub#running_the_app_on_your_local_machine
I got the following error:
google.auth._default No project ID could be determined from the Cloud SDK configuration. Consider running gcloud config set project or setting the GOOGLE_CLOUD_PROJECT environment variable
I did 'gcloud config set project [my project name]' with no success.
What's the problem?
Update: I've deployed app engines previously without any problem. The problem only happens when I run the psqworker for this Pub/Sub function. I know my project ID and used it before.
The first thing I would try would be:
gcloud info
This will tell you the account and project that gcloud is currently set to.
You may also find the available projects for your account with the following gcloud command:
gcloud projects list
Locate the project ID and project number
There are two ways to identify your project: the project number and project ID.
The project number is automatically assigned when you create a project.
The project ID is a unique identifier for a project. When you first create a project, you can accept the default generated project ID or create your own. A project ID cannot be changed after the project is created, so if you are creating a new project, be sure to choose an ID that you'll be comfortable using for the lifetime of the project.
Note: You should be aware that some resource identifiers (such as project IDs) might be retained beyond the life of your project. For this reason, avoid storing sensitive information in resource identifiers.
To locate your project ID and project number:
Go to the Cloud Platform Console
From the projects list, select the name of your project.
On the left, click Dashboard. The project name and ID are displayed in the Dashboard.
TL;DR
Use virtualenv -p C:/Python27/python.exe name-of-env instead of virtualenv -p C:/Python36/python.exe name-of-env in the tutorial
I ran into a similar issue. Here are the steps I went through and why. Hope it helps!
First I tried to specify the id with the command gcloud config set project name-of-your-project
This resulted in the error
ERROR: Python 3 and later is not compatible with the Google Cloud SDK. Please use a Python 2.7.x version.
If you have a compatible Python interpreter installed, you can use it by setting
the CLOUDSDK_PYTHON environment variable to point to it.
I thought this error was weird because the tutorial tells you to use python3 but it doesn't work. So I created a virtualenv with python2.7 like so
virtualenv -p C:/Python27/python.exe name-of-env (I have Python 2 and 3, so it's easier to specify the whole path to the .exe file)
Then follow the rest of the tutorial with
name-of-env\scripts\activate
pip install -r requirements.txt
Don't know why you have to use python3 when it doesn't even work.
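Alternatively, instead of recreating the virtualenv, a possible workaround (an untested sketch) is to point the SDK at a Python 2.7 interpreter and set the project explicitly; both environment variables are the ones named in the error messages above, and the path and project ID are placeholders:
set CLOUDSDK_PYTHON=C:\Python27\python.exe
set GOOGLE_CLOUD_PROJECT=my-project-id
gcloud config set project my-project-id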

How to deploy: database, source and binary changes in 1 patch?

I'm part of a development team that works on many CMS based projects, using systems like Joomla and Drupal.
In our development process, all of our code changes are managed inside of Git. At the end of a sprint, we create a DIFF that we can apply via patch to the live site.
The problem is that most of the time, the changes include
Database Schema Changes
Database Data Changes
Source Code changes
Binary file changes (like images)
Git diff handles source code changes beautifully. Binary files are not included in the diff, except for a reference to the fact that the files have changed.
Database Schema Changes and Database Data Changes are a mess.
I was wondering if anything like a unified patch system exists that could be used to deploy all of these changes in 1 patch.
So the question is, "Is there a system that can be used to deploy all of these changes in 1 shot?"
Ideally, this system would allow a dry run, like patch does, but for all 4 of the data types.
Edit:
Thank you everyone for the feedback that you provided, it was a starting point for my research in this area.
Here is what I found so far:
It's difficult to deploy PHP-based applications using a Linux packaging system, because changes to the project happen iteratively rather than as releases.
It would be possible to use dbconfig to deploy changes to a project, but the problem is generating the MySQL db diffs (schema and data) - a rough schema-diff sketch follows this list.
What is really missing for deployment of PHP-based applications is a deployment manager that would be installed on the server and would be the interface for deploying the patches.
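On the MySQL diff point, one rough way to at least compare schemas between two environments is to dump them without data and diff the dumps; the database names are placeholders, and this does not address data diffs:
mysqldump --no-data --skip-comments live_db > live_schema.sql
mysqldump --no-data --skip-comments staging_db > staging_schema.sql
diff -u live_schema.sql staging_schema.sql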
I started a Google Wave on this topic and produced a lot of information as a result.
If anyone is interested in reading this wave, please let me know and I will add you.
For handling installation and upgrade of our application, we use the Debian packaging system (.deb packages).
Context:
We are making a J2EE + Flex application, shipped and administered through a VPN.
So not so far from you.
Fresh installs and upgrades from one version to another are made through Puppet (a system for automating system administration tasks: it installs our .deb).
In the .deb we have:
our compiled source code
the schema of the database (handled by [db-config][1])
binary stuff
how to install through apt all the other applications needed (mysql, tomcat...)
= all the stuff for a fresh install
We also add the info to go from one version to another:
the scripts for upgrading the database (one for each version)
new binaries
new stuff to launch at machine start (e.g., some weeks ago we added an ActiveMQ server)
=> Once the .deb is made correctly, we can install or upgrade seamlessly in one operation (it's done automatically, without any prompt).
There is one .deb per release; each .deb has a version number and a signature.
You can pick any of our .debs and make a fresh install, or upgrade from the current version to the version it holds.
The .deb is built by our continuous integration system (we build a .deb each hour, as if we were about to release a new version).
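To make this concrete, here is a hypothetical sketch of such a package layout and the build command; the directory names, package name and version are placeholders, not our actual package:
myapp_1.3/
  DEBIAN/control       (package name, version, dependencies such as mysql, tomcat, ...)
  DEBIAN/postinst      (maintainer script that runs the database migration scripts on upgrade)
  usr/share/myapp/     (compiled code, binaries and the per-version *-update.sql scripts)
dpkg-deb --build myapp_1.3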
What are the benefits?
Install / upgrade automatically, with confidence.
Rollback to a previous version.
Dry runs are natively supported.
In your precise case
* Database Schema Changes
* Database Data Changes
* Source Code changes
* Binary file changes (like images)
Database => you will have to write migration scripts, one for each version (e.g., 1.2-update.sql, 1.3-update.sql); a sketch follows just below.
Source code and binaries => add them, and say in which version they have to be copied/used.
Edit: I'm not sure about source code. We are doing that with compiled code...
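As a purely hypothetical illustration of the migration-script idea (a sketch, not our actual postinst): the paths, database name and the schema_version bookkeeping table below are assumptions. Scripts are applied in lexical filename order, each exactly once:
#!/bin/sh
set -e
# Apply each not-yet-applied migration script once, recording it in a bookkeeping table.
for f in /usr/share/myapp/migrations/*-update.sql; do
  version="$(basename "$f" -update.sql)"
  applied="$(mysql -N -e "SELECT COUNT(*) FROM schema_version WHERE version='$version'" myapp_db)"
  if [ "$applied" -eq 0 ]; then
    mysql myapp_db < "$f"
    mysql -e "INSERT INTO schema_version (version) VALUES ('$version')" myapp_db
  fi
done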
Some links to start:
https://wiki.ubuntu.com/PackagingGuide/Complete
http://www.debian.org/doc/manuals/maint-guide/index.fr.html#contents (in French)
[1]: http://pwet.fr/man/linux/formats/dbconfig (dbconfig)
[2]: http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html (Debian package basics)
I don't think you'll find a fail-safe mechanism.
I recommend that, when possible, you take into account compatibility with the current published source when making schema/data changes.
This way you can make a very simple tool that runs database scripts committed to a particular svn location (you don't want diffs of database changes; if you need further modifications, you need different statements).
With the above done, you can have a simple command that runs the database changes, then the binary & source code changes.
For databases there is also the option of schema & data comparison tools; these could be used to compare environments and make sure there isn't anything unexpected missing in the change scripts. They could also generate the change scripts, but as I said, you really want to make sure they won't break the current source.
You can create a tool to do the migrations painlessly -- something similar to Peoplesoft's Patch Upgrade Assistant.
It is basically a standalone executable that reads an "Upgrade Template" and carries out tasks. The upgrade template declaratively describes the upgrade tasks or "steps". The steps could be: copy (for backing up or moving precompiled objects like classes and other binaries), database (for altering schema elements), SQL scripts (for loading or transforming current data). The steps can have some predicate logic: if it is this, do this, else skip it and go to the next, etc.
The template is usually an XML file. It also provides for manual steps with instructions for manual actions. Each step also specifies whether it is recoverable or not. It would also validate whether the step has succeeded.
It may be possible to have a Open Source project around this requirement which is quite common.
You need to save the git commit objects in a local file and then import them into the other repo/branch.
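If that means packaging the commits into a file, one way to do it (my interpretation of the suggestion, with placeholder branch and file names) is git bundle:
git bundle create changes.bundle main
# then, in the other repository:
git pull changes.bundle main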