I can't trigger a SageMaker training job from ECS; getting this exception: botocore.exceptions.NoCredentialsError: Unable to locate credentials - amazon-ecs

I'm trying to trigger a SageMaker training job from my Flask app, which will be hosted on ECS Fargate.
When I run the same code as a standalone Python script it works fine, but when I run it from the Flask app it raises the exception above. Below is a snippet of the code:
from sagemaker.estimator import Estimator

# img (training image URI) and parms (hyperparameters) are defined elsewhere
role = 'AmazonSageMaker-ExecutioRole-20221019T223514'
estimator = Estimator(role=role,
                      instance_count=1,
                      instance_type='ml.g4dn.xlarge',
                      image_uri=img,
                      hyperparameters=parms)
estimator.fit()
The exception is raised at estimator.fit().
Any help would be appreciated!
I have attached the relevant ECS and SageMaker policies to the role, but still no luck.
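One way to narrow this down is to check whether the container can resolve AWS credentials at all before SageMaker is involved. This is only a diagnostic sketch (it assumes boto3 is installed in the image and that the task is meant to use an ECS task role):
import boto3

# If the Fargate task role is being picked up, this prints the role's identity;
# if it raises NoCredentialsError, the problem is credential resolution in the
# container rather than anything SageMaker-specific.
sts = boto3.client('sts')
print(sts.get_caller_identity())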

Related

How to run a Databricks notebook with MLflow in an Azure Data Factory pipeline?

My colleagues and I are facing an issue when trying to run my Databricks notebook in Azure Data Factory, and the error is coming from MLflow.
The command that is failing is the following:
import json
from datetime import datetime

import mlflow

# Take the parent notebook path to use as the base path for the experiment
context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
nb_base_path = context['extraContext']['notebook_path'][:-len("00_training_and_validation")]
experiment_path = nb_base_path + 'trainings'
mlflow.set_experiment(experiment_path)
experiment = mlflow.get_experiment_by_name(experiment_path)
experiment_id = experiment.experiment_id
run = mlflow.start_run(experiment_id=experiment_id, run_name=f"run_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}")
And the error that is thrown is:
An exception was thrown from a UDF: 'mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: No experiment ID was specified. An experiment ID must be specified in Databricks Jobs and when logging to the MLflow server from outside the Databricks workspace. If using the Python fluent API, you can set an active experiment under which to create runs by calling mlflow.set_experiment("/path/to/experiment/in/workspace") at the start of your program.', from , line 32.
The pipeline just runs the notebook from ADF; it does not have any other steps, and the cluster we are using is a 7.3 ML runtime.
Could you please help us?
Thank you in advance!
I think you need to set the artifact URI and specify the experiment ID explicitly (especially if the artifact directory contains more than one experiment ID). A rough sketch is shown after the reference link.
Reference: https://www.mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded
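As a rough illustration of that suggestion (the experiment path below is a placeholder, not a value from the question), you can resolve or create the experiment yourself and pass its ID to start_run:
import mlflow
from datetime import datetime

# Sketch: look up the experiment by path and create it if it does not exist,
# then hand the resulting ID to start_run explicitly.
experiment_path = "/Shared/trainings"  # placeholder path
experiment = mlflow.get_experiment_by_name(experiment_path)
if experiment is None:
    experiment_id = mlflow.create_experiment(experiment_path)
else:
    experiment_id = experiment.experiment_id

run = mlflow.start_run(
    experiment_id=experiment_id,
    run_name=f"run_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}",
)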

Unexpected close tag in aws cdk deploy

I am trying to create a CDK construct for my Python scripts; in the stack I have added an S3 bucket and a Lambda function.
When I execute cdk deploy, it exits after 0% progress or gives the following error.
With only the S3 bucket it works fine, but as soon as I add the Lambda function it fails.
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3

file_feed_lambda = _lambda.Function(
    self, id='MyLambdaHandler001',
    runtime=_lambda.Runtime.PYTHON_3_7,
    code=_lambda.Code.asset('lambda'),
    handler='lambda_function.lambda_handler',
)
bucket = s3.Bucket(self,
                   "FeedBucket-01")
Note : cdk diff and cdk synth are working properly
It turned out the error being shown was misleading. I updated Node.js and the CDK to the latest versions.
After the update I received a meaningful error, which was a socket timeout.
After setting the proxy it worked for me.

Vault error in windows while configuring Oracle Database Plugin

I am trying Vault with an Oracle database on Windows for the first time.
I followed the first steps of the tutorial, but when I execute
vault write D:\Applications\vault_1.4.0_windows_amd64\oracle-database-plugin sha256="DEDDFSQ23EF" command=vault-plugin-database-oracle
I get this error:
Error writing data to D:\Applications\vault_1.4.0_windows_amd64\oracle-database-plugin: Error making API request.
URL: PUT http://127.0.0.1:8200/v1/D:%5CApplications%5Cvault_1.4.0_windows_amd64%5Coracle-database-plugin
Code: 404. Errors:
* no handler for route 'D:\Applications\vault_1.4.0_windows_amd64\oracle-database-plugin' here
I can't understand what the issue is.
What I did to fix this:
Inside the Vault directory, I created a plugins directory and downloaded the Oracle plugin into it. Then I registered the plugin under the sys/plugins/catalog path (rather than a filesystem path) with this command in cmd or PowerShell on Windows:
vault write sys/plugins/catalog/database/oracle-database-plugin sha256="8A8ABE17E7A2A75BD871RE43D" command=vault-plugin-database-oracle

RDS OptionGroup not working while creating it from via cloudformation for SQL Server

I am trying to create an RDS option group for RDS SQL Server independently via CloudFormation, but the creation fails with the error below. When I create it with the same parameters through the console, it works. Any pointers would be very helpful.
SqlServerOptionGroup:
  Type: AWS::RDS::OptionGroup
  Properties:
    EngineName: "sqlserver-ex"
    MajorEngineVersion: "14.0.0"
    OptionGroupDescription: rds-sql-optiongroup
    OptionConfigurations:
      - OptionName: SQLSERVER_BACKUP_RESTORE
Error:
Cannot find major version 14.0.0 for sqlserver-ex (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination
When I create the same thing via the console, it gets created successfully.
Try "14.00" for MajorEngineVersion.
I also found you need to quote the EngineName and MajorEngineVersion, which you have done.
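For reference, here is the snippet from the question with that change applied (untested sketch, otherwise unchanged):
SqlServerOptionGroup:
  Type: AWS::RDS::OptionGroup
  Properties:
    EngineName: "sqlserver-ex"
    MajorEngineVersion: "14.00"  # "14.0.0" is rejected as an invalid major version
    OptionGroupDescription: rds-sql-optiongroup
    OptionConfigurations:
      - OptionName: SQLSERVER_BACKUP_RESTORE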

Unable to log in to screwdriver.cd via GitHub

Every time I try logging in to https://cd.screwdriver.cd/login I get the error {"statusCode":403,"error":"Forbidden","message":"User github:tannupriyasingh is not allowed access"}.
I tried adding a webhook to GitHub following the steps at https://developer.github.com/webhooks/creating/ and ran into a "Tunnel 541e163b.ngrok.io not found" response.
I am expecting to login and create a deployment pipeline in screwdriver-cd.
https://cd.screwdriver.cd is our demo Screwdriver cluster used for deploying open source Screwdriver. We currently do not host any instances for public use. You can login with Guest Access to look at the UI in cd.screwdriver.cd, but you'll need to run your own instance of Screwdriver in order to create a deployment pipeline.
A couple options for running your own instance:
Helm chart: https://docs.screwdriver.cd/cluster-management/helm
Docker compose: https://docs.screwdriver.cd/cluster-management/running-locally