I built a classification model on my local machine and am now using Azure Machine Learning for deployment.
I have registered my model on AzureML.
While deploying and trying to expose the web service, I am running into issues with the Docker image creation.
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.image import ContainerImage
from azureml.core.webservice import Webservice

# Build the conda environment file for the image
wenv = CondaDependencies()
wenv.add_conda_package("scikit-learn")
with open("wenv.yml", "w") as f:
    f.write(wenv.serialize_to_string())
with open("wenv.yml", "r") as f:
    print(f.read())

image_config = ContainerImage.image_configuration(execution_script="scorete.py",
                                                  runtime="python",
                                                  conda_file="wenv.yml")

# Expose web service (ws, model and aciconfig are defined earlier in the notebook)
service_name = 'telecoinference'
service = Webservice.deploy_from_model(workspace=ws,
                                       name=service_name,
                                       deployment_config=aciconfig,
                                       models=[model],
                                       image_config=image_config)
service.wait_for_deployment(show_output=True)
print(service.state)
WebserviceException                       Traceback (most recent call last)
<ipython-input-50-cbddf70eccff> in <module>
      7                                        deployment_config=aciconfig,
      8                                        models=[model],
----> 9                                        image_config=image_config)
     10 service.wait_for_deployment(show_output=True)
     11 print(service.state)

~\AppData\Roaming\Python\Python36\site-packages\azureml\core\webservice\webservice.py in deploy_from_model(workspace, name, models, image_config, deployment_config, deployment_target, overwrite)
    450
    451         image = Image.create(workspace, name, models, image_config)
--> 452         image.wait_for_creation(True)
    453         if image.creation_state != 'Succeeded':
    454             raise WebserviceException('Error occurred creating image {} for service. More information can be found '

~\AppData\Roaming\Python\Python36\site-packages\azureml\core\image\image.py in wait_for_creation(self, show_output)
    452                                           'current state: {}\n'
    453                                           'Error response from server:\n'
--> 454                                           '{}'.format(self.creation_state, error_response), logger=module_logger)
    455
    456         print('Image creation operation finished for image {}, operation "{}"'.format(self.id, operation_state))

WebserviceException: WebserviceException:
	Message: Image creation polling reached non-successful terminal state, current state: Failed
Error response from server:
StatusCode: 400
Message: Docker image build failed.
	InnerException None
	ErrorResponse
{
    "error": {
        "message": "Image creation polling reached non-successful terminal state, current state: Failed\nError response from server:\nStatusCode: 400\nMessage: Docker image build failed."
    }
}
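The 400 response doesn't say what actually failed inside the Docker build. A minimal way to surface the underlying error (a sketch, assuming the same azureml-core SDK shown in the traceback and that ws is the workspace from above) is to look up the image and read its build log:

from azureml.core.image import Image

# Inspect the most recent image in the workspace; this assumes the failed
# build is the latest one. image_build_log_uri points at the full Docker
# build log, which usually names the exact package or step that failed.
image = Image.list(workspace=ws)[-1]
print(image.creation_state)
print(image.image_build_log_uri)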
I'm trying to stand up a proof of concept that ingests an RTSP video stream into Kinesis Video. The provided documentation includes a Docker image that seems to have everything I need, hosted by AWS on 546150905175.dkr.ecr.us-west-2.amazonaws.com. What I'm having trouble with, though, is getting that deployment (via an Amplify Custom category, in TypeScript CDK) to work.
I've tried different variations on
import * as iam from "@aws-cdk/aws-iam";
import * as ecs from "@aws-cdk/aws-ecs";
import * as ec2 from "@aws-cdk/aws-ec2";
const kinesisUserAccessKey = new iam.AccessKey(this, 'KinesisStreamUserAccessKey', {
user: kinesisStreamUser,
})
const servicePrincipal = new iam.ServicePrincipal('ecs-tasks.amazonaws.com');
const executionRole = new iam.Role(this, 'IngestVideoTaskDefExecutionRole', {
assumedBy: servicePrincipal,
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonECSTaskExecutionRolePolicy'),
]
});
const taskDefinition = new ecs.FargateTaskDefinition(this, 'IngestVideoTaskDef', {
cpu: 512,
memoryLimitMiB: 1024,
executionRole,
})
const image = ecs.ContainerImage.fromRegistry('546150905175.dkr.ecr.us-west-2.amazonaws.com/kinesis-video-producer-sdk-cpp-amazon-linux:latest');
taskDefinition.addContainer('IngestVideoContainer', {
command: [
'gst-launch-1.0',
'rtspsrc',
`location="${locationParam.secretValue.toString()}"`,
'short-header=TRUE',
'!',
'rtph264depay',
'!',
'video/x-h264,',
'format=avc,alignment=au',
'!',
'kvssink',
`stream-name="${cfnStream.name}"`,
'storage-size=512',
`access-key="${kinesisUserAccessKey.accessKeyId}"`,
`secret-key="${kinesisUserAccessKey.secretAccessKey.toString()}"`,
`aws-region="${REGION}"`,
// `aws-region="${cdk.Aws.REGION}"`,
],
image,
logging: new ecs.AwsLogDriver({
streamPrefix: 'IngestVideoContainer',
}),
})
const service = new ecs.FargateService(this, 'IngestVideoService', {
cluster,
taskDefinition,
desiredCount: 1,
securityGroups: [
ec2.SecurityGroup.fromSecurityGroupId(this, 'DefaultSecurityGroup', SECURITY_GROUP_ID)
],
vpcSubnets: {
subnets: SUBNET_IDS.map(subnetId => ec2.Subnet.fromSubnetId(this, subnetId, subnetId)),
}
})
But it seems like regardless of what I do, an amplify push just stays 'in progress' for about an hour until I go into the CloudFormation console and cancel the stack update. Digging through the ECS console, I eventually found an actual error message:
Resourceinitializationerror: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-west-2.amazonaws.com/": dial tcp 52.94.177.118:443: i/o timeout
It seems to be some kind of networking issue, but I'm not sure how to proceed. Any assistance you can provide would be wonderful. Cheers!
Figured it out. For those stuck with similar issues: you have to give the task an execution role with AmazonECSTaskExecutionRolePolicy (which I had already edited into the code above), and set assignPublicIp: true on the service, as shown below.
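For reference, here is the service construct with the flag added (a sketch; cluster, taskDefinition, SECURITY_GROUP_ID, and SUBNET_IDS are the same values as in the code above):

const service = new ecs.FargateService(this, 'IngestVideoService', {
  cluster,
  taskDefinition,
  desiredCount: 1,
  // Without a public IP (or a NAT gateway / ECR VPC endpoints), a task in
  // these subnets cannot reach api.ecr.us-west-2.amazonaws.com, which is
  // exactly the i/o timeout shown in the error above.
  assignPublicIp: true,
  securityGroups: [
    ec2.SecurityGroup.fromSecurityGroupId(this, 'DefaultSecurityGroup', SECURITY_GROUP_ID)
  ],
  vpcSubnets: {
    subnets: SUBNET_IDS.map(subnetId => ec2.Subnet.fromSubnetId(this, subnetId, subnetId)),
  }
})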
I have been tasked with updating KServe from 0.7 to 0.9. Our company's .mar files run fine on 0.7, and when I update to KServe 0.9 the pods come up without issue. However, when a request is sent, it returns a 500 error. The logs are given below.
Model being used: PyTorch
Deployment type: RawDeployment
Kubernetes version: 1.25
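For reference, the 500 comes back on the standard KServe v1 predict route that appears in the logs below. A minimal request sketch (the host, port, and payload are placeholders; the real body depends on the model's handler):

import requests

# Hypothetical endpoint and payload; the route and port match the
# HTTPServerRequest lines in the logs below.
resp = requests.post(
    "http://localhost:5000/v1/models/modelname:predict",
    json={"instances": [[1.0, 2.0, 3.0]]},
)
print(resp.status_code)
print(resp.text)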
Defaulted container "kserve-container" out of: kserve-container, storage-initializer (init)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-11-18T13:37:44,001 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-11-18T13:37:44,203 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /usr/local/lib/python3.8/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /usr/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /mnt/models/model-store
Model config: N/A
2022-11-18T13:37:44,208 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2022-11-18T13:37:44,288 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring
2022-11-18T13:37:44,297 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2022-11-18T13:37:44,298 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
[I 221118 13:37:46 __main__:75] Wrapper : Model names ['modelname'], inference address http//0.0.0.0:8085, management address http://0.0.0.0:8085, model store /mnt/models/model-store
[I 221118 13:37:46 TorchserveModel:54] kfmodel Predict URL set to 0.0.0.0:8085
[I 221118 13:37:46 TorchserveModel:56] kfmodel Explain URL set to 0.0.0.0:8085
[I 221118 13:37:46 TSModelRepository:30] TSModelRepo is initialized
[I 221118 13:37:46 model_server:150] Registering model: modelname
[I 221118 13:37:46 model_server:123] Listening on port 8080
[I 221118 13:37:46 model_server:125] Will fork 1 workers
[I 221118 13:37:46 model_server:128] Setting max asyncio worker threads as 12
2022-11-18T13:37:54,738 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model modelname
2022-11-18T13:37:54,738 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model modelname
[I 221118 13:40:12 TorchserveModel:78] PREDICTOR_HOST : 0.0.0.0:8085
[E 221118 13:40:12 web:1789] Uncaught exception POST /v1/models/modelname:predict (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:5000', method='POST', uri='/v1/models/modelname:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
result = await result
File "/usr/local/lib/python3.8/dist-packages/kserve/handlers/predict.py", line 70, in post
response = await model(body)
File "/usr/local/lib/python3.8/dist-packages/kserve/model.py", line 86, in __call__
response = (await self.predict(request)) if inspect.iscoroutinefunction(self.predict) \
File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 80, in predict
response = await self._http_client.fetch(
ConnectionRefusedError: [Errno 111] Connection refused
[E 221118 13:40:12 web:2239] 500 POST /v1/models/modelname:predict (127.0.0.1) 9.66ms
[I 221118 13:40:13 TorchserveModel:78] PREDICTOR_HOST : 0.0.0.0:8085
[E 221118 13:40:13 web:1789] Uncaught exception POST /v1/models/modelname:predict (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:5000', method='POST', uri='/v1/models/modelname:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
result = await result
File "/usr/local/lib/python3.8/dist-packages/kserve/handlers/predict.py", line 70, in post
response = await model(body)
File "/usr/local/lib/python3.8/dist-packages/kserve/model.py", line 86, in __call__
response = (await self.predict(request)) if inspect.iscoroutinefunction(self.predict) \
File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 80, in predict
response = await self._http_client.fetch(
ConnectionRefusedError: [Errno 111] Connection refused
[E 221118 13:40:13 web:2239] 500 POST /v1/models/modelname:predict (127.0.0.1) 3.31ms
[I 221118 13:40:14 TorchserveModel:78] PREDICTOR_HOST : 0.0.0.0:8085
[E 221118 13:40:14 web:1789] Uncaught exception POST /v1/models/modelname:predict (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:5000', method='POST', uri='/v1/models/modelname:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
result = await result
File "/usr/local/lib/python3.8/dist-packages/kserve/handlers/predict.py", line 70, in post
response = await model(body)
File "/usr/local/lib/python3.8/dist-packages/kserve/model.py", line 86, in __call__
response = (await self.predict(request)) if inspect.iscoroutinefunction(self.predict) \
File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 80, in predict
response = await self._http_client.fetch(
ConnectionRefusedError: [Errno 111] Connection refused
[E 221118 13:40:14 web:2239] 500 POST /v1/models/modelname:predict (127.0.0.1) 3.38ms
I was not able to find the tornado package (/usr/local/lib/python3.8/dist-packages/tornado/web.py) inside the .mar file, so I don't think it is being used directly by the model.
I tried deploying on both KServe 0.7 and 0.9; our .mar file works on KServe 0.7 but fails on KServe 0.9.
I also deployed the sample InferenceService (https://kserve.github.io/website/0.9/modelserving/v1beta1/torchserve/#create-the-torchserve-inferenceservice) on KServe 0.9 and it worked as expected.
I deployed it on GKE, RKE2, and Docker Desktop Kubernetes.
I already have a schema of users with an authentication key and want to authenticate against that. I tried implementing authentication via SQL, but because my schema has a different structure I was getting errors, so I implemented the external authentication method instead. The technologies and OS used in my application are:
Node.JS
Ejabberd as XMPP server
MySQL Database
React-Native (Front-End)
OS - Ubuntu 18.04
I implemented the external authentication configuration as described in https://docs.ejabberd.im/admin/configuration/#external-script and took the PHP script https://www.ejabberd.im/files/efiles/check_mysql.php.txt as an example. But I am getting the error shown below in error.log. In ejabberd.yml I have the following configuration:
...
host_config:
  "example.org.co":
    auth_method: [external]
    extauth_program: "/usr/local/etc/ejabberd/JabberAuth.class.php"
    auth_use_cache: false
...
Also, is there an external auth JavaScript script?
Here are the error.log and ejabberd.log excerpts:
error.log
2019-03-19 07:19:16.814 [error] <0.524.0>@ejabberd_auth_external:failure:103 External authentication program failed when calling 'check_password' for admin@example.org.co: disconnected

ejabberd.log
2019-03-19 07:19:16.811 [debug] <0.524.0>@ejabberd_http:init:151 S: [{[<<"api">>],mod_http_api},{[<<"admin">>],ejabberd_web_admin}]
2019-03-19 07:19:16.811 [debug] <0.524.0>@ejabberd_http:process_header:307 (#Port<0.13811>) http query: 'POST' <<"/api/register">>
2019-03-19 07:19:16.811 [debug] <0.524.0>@ejabberd_http:process:394 [<<"api">>,<<"register">>] matches [<<"api">>]
2019-03-19 07:19:16.811 [info] <0.364.0>@ejabberd_listener:accept:238 (<0.524.0>) Accepted connection ::ffff:ip -> ::ffff:ip
2019-03-19 07:19:16.814 [info] <0.524.0>@mod_http_api:log:548 API call register [{<<"user">>,<<"test">>},{<<"host">>,<<"example.org.co">>},{<<"password">>,<<"test">>}] from ::ffff:ip
2019-03-19 07:19:16.814 [error] <0.524.0>@ejabberd_auth_external:failure:103 External authentication program failed when calling 'check_password' for admin@example.org.co: disconnected
2019-03-19 07:19:16.814 [debug] <0.524.0>@mod_http_api:extract_auth:171 Invalid auth data: {error,invalid_auth}
Any help regarding this topic will be appreciated.
1) Your config for the auth_method looks good.
2) Here is a Python script I've used and upgraded to implement external authentication for ejabberd.
#!/usr/bin/python3
import sys
from struct import unpack, error


def openAuth(args):
    (user, server, password) = args
    # Implement your interactions with your service / database here.
    # Return True or False.
    return True


def openIsuser(args):
    (user, server) = args
    # Implement your interactions with your service / database here.
    # Return True or False.
    return True


def from_ejabberd():
    # ejabberd sends a 2-byte big-endian length, then "op:user:server[:password]".
    # Note: this naive split breaks if the password itself contains ':'.
    input_length = sys.stdin.buffer.read(2)
    (size,) = unpack('>h', input_length)
    return sys.stdin.buffer.read(size).decode('utf-8').split(':')


def to_ejabberd(result):
    # Reply with a 2-byte length (always 2) followed by 0x0001 (ok) or 0x0000 (fail).
    if result:
        sys.stdout.buffer.write(b'\x00\x02\x00\x01')
    else:
        sys.stdout.buffer.write(b'\x00\x02\x00\x00')
    sys.stdout.buffer.flush()


def loop():
    switcher = {
        "auth": openAuth,
        "isuser": openIsuser,
        "setpass": lambda args: True,
        "tryregister": lambda args: False,
        "removeuser": lambda args: False,
        "removeuser3": lambda args: False,
    }
    # Keep serving requests until ejabberd closes the pipe.
    while True:
        data = from_ejabberd()
        to_ejabberd(switcher.get(data[0], lambda args: False)(data[1:]))


if __name__ == "__main__":
    try:
        loop()
    except error:
        pass
I didn't write the from_ejabberd() and to_ejabberd() communication functions myself, and unfortunately I can't find the original source anymore.
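To wire the script up, point extauth_program at it instead of the PHP example (a sketch; the path is hypothetical, and the script must be executable by the ejabberd user):

...
host_config:
  "example.org.co":
    auth_method: [external]
    extauth_program: "/usr/local/etc/ejabberd/external_auth.py"
    auth_use_cache: false
...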
I get the following error on Apiary: "We are sorry, but the API call failed."
My host is defined as:
FORMAT: 1A
HOST: https://test.mynetwork.com/
The GET call is defined as:

## data urn [models/v2/files/{urn}/data{?guid}]

### GET [GET]

+ Parameters
    + urn (required, string, `ttt`) ... design urn.
    + guid (optional, string, `067e6162-3b6f-4ae2-a171-2470b63dfe02`) ... filter by guid.

+ Response 200 (application/vnd.api+json)
    + Body

            data
            {
                "version": "1.0",
            }
When I invoke this, I get the error. Any inputs?
I have edited your API Blueprint as follows:
FORMAT: 1A
HOST: https://test.mynetwork.com
# Test API
This is a test API
## data urn [/models/v2/files/{urn}/data{?guid}]
+ Parameters
+ urn: `ttt` (required, string) - design urn
+ guid: `067e6162-3b6f-4ae2-a171-2470b63dfe02` (optional, string) - filter by guid
### test [GET]
+ Response 200 (application/vnd.api+json)

        {
            "version": "1.0"
        }
You can find tutorials here: https://help.apiary.io/
Also, I'm not sure what you mean by "invoke" the API - are you calling the mock server, or are you hitting your own server through Apiary?
Happy to help further via the support option in Apiary itself or email us on support [at] apiary.io.
I have a big error with PouchDB communicating with my Cloudant database in an Angular/Ionic app.
Can you please help me figure out how to fix this?
POST https://louisromain.cloudant.com/boardline_users/_bulk_get?revs=true&attachments=true&_nonce=1446478625328 400 (Bad Request)
pouchdb.min.js:8 Database has a global failure DOMError {message: "", name: "QuotaExceededError"}
ionic.bundle.min.js:139 o {status: 500, name: "abort", message: "unknown", error: true, reason: "QuotaExceededError", result: {doc_write_failures: 1, docs_read: 1, docs_written: 0, end_time: Mon Nov 02 2015 16:37:05 GMT+0100 (CET), errors: Array[1], last_seq: "3478-g1AAAAFJeJzLYWBgYMlgTmGQT0lKzi9KdUhJMtXLSs1LLUst0kvOyS9NScwr0ctLLckBKmRKZEiy____f1YGUxIDA3N6LlCMPdXM1MzEMo1oM5IcgGRSPcKYcLAxKZYGlslpSajGmOA2Jo8FSDI0ACmgSftRXJSSamFoYWmOapQ5IaMOQIwCuooZZFQhxHPmJkCURtigLAAxFGUZ", ok: false, start_time: Mon Nov 02 2015 16:36:59 GMT+0100 (CET), status: "aborting"}}
ionic.bundle.min.js:139 Error: Failed to execute 'transaction' on 'IDBDatabase': The database connection is closing.
at Error (native)
at a.9.n.openTransactionSafely (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:8:9233)
at i.a.8.e._getLocal (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:8:2521)
at i.<anonymous> (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:7:6737)
at i.<anonymous> (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:10:28092)
at i.a.90.t.exports (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:10:28931)
at http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:9:28802
at i.<anonymous> (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:9:28722)
at i.a.90.t.exports [as get] (http://localhost:8101/lib/pouchdb/dist/pouchdb.min.js:10:28931)
at i.angular.module.constant.service.$q.qify [as get] (http://localhost:8101/lib/angular-pouchdb/angular-pouchdb.js:35:27)
The error means the device has run out of space. Unfortunately, this is an error thrown by IndexedDB itself when the device is too low on storage, so there's nothing you can do about it except use less space. PouchDB's compact() can help; there's also the transform-pouch plugin if you want to reduce the size of your documents.
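A minimal sketch of the compaction options (the local database name here is an assumption based on the remote URL in the question):

// Open the local database with auto-compaction enabled, so old revisions
// are purged automatically as documents are updated...
var db = new PouchDB('boardline_users', {auto_compaction: true});

// ...or trigger compaction manually on an existing database.
db.compact().then(function (info) {
  console.log('compaction finished', info);
}).catch(function (err) {
  console.log('compaction failed', err);
});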