aws-sdk-go getting NoCredentialProviders when using kube2iam - aws-sdk-go

We are using kube2iam to pass ec2 roles inside containers. Occasionally we get:
Error: Failed to list store contents: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
but then if we restart the container - credentials are picked up.
Seems like we are hitting an issue when kube2iam takes time to pass the credentials.
I did find this commit that makes the timeout even shorter.
So question is:
Is there a way to introduce delayed retries in aws-sdk-go similar to AWS_METADATA_SERVICE_TIMEOUT/AWS_METADATA_SERVICE_NUM_ATTEMPTS?

The Go AWS SDK does not have a direct notion of AWS_METADATA_SERVICE_TIMEOUT/AWS_METADATA_SERVICE_NUM_ATTEMPTS, but it does have a similar concept.
By default, when a EC2Metadata type is created (which is used by the default credentials chain), a timeout override of 1 second is set on the http.Client. If your kube2iam does not respond within that time, the credential call will fail.
You can disable this behavior using the EC2MetadataDisableTimeoutOverride option on your aws.Config when creating the session.Session. When doing this, the timeout will match the behavior of http.DefaultClient.
Example:
cfg := aws.NewConfig().
WithEC2MetadataDisableTimeoutOverride(true).
WithCredentialsChainVerboseErrors(true)
sess, _ := session.NewSession(cfg)
db := dynamodb.New(sess) // etc.

Related

OpenSearch 1.3 > 2.3 upgrade, CloudFormation fails on domain update

I recently updated our CDK code to move our OpenSearch cluster from version 1.3 to 2.3. The cluster itself seems to have upgraded to a healthy state and is still accessible / usable by our application, but CloudFormation failed when attempting to update our domain resource with:
Resource handler returned message: "Resource handler returned message: "Invalid request provided: DP Nodes are OOS, Tags operation is not allowed"
This kicked the stack into UPDATE_ROLLBACK_FAILED, which is not allowed. The cluster cannot be downgraded back to 1.3.
I'm struggling to find any information about this error it's kicking out and not quite sure how to resolve it to unblock the CloudFormation stack.
Things I have tried:
Digging through CloudWatch logs only revealed information pertaining to queries.
Forcing the rollback to occur without Domain resource. This got me back to an UPDATE_COMPLETE state, but each subsequent deploy of this stack will cause it to fail again since the core issue is not resolved.
This was an odd presentation of a permissions issue. As I was reading through some docs, I stumbled upon this section, which discusses changes to tag-based access control.
This lead me start looking into CloudTrail a bit and stumbled upon the exact error that was firing when this deploy happened. It was a little odd because the assumed role granted admin access to CloudFormation, but the last line of this event record caught my eye:
"sourceIPAddress": "cloudformation.amazonaws.com",
"userAgent": "cloudformation.amazonaws.com",
"errorCode": "ValidationException",
"errorMessage": "DP Nodes are OOS, Tags operation is not allowed",
"eventSource": "es.amazonaws.com",
Upon adding es.amazonaws.com to the trust relationship of that role, the deploy fully re-ran successfully.
Hopefully this helps someone else.

Failure/timeout invoking Lambda locally with SAM

I'm trying to get a local env to run/debug Python Lambdas with VSCode (windows). I'm using a provided HelloWorld example to get the hang of this but I'm not being able to invoke.
Steps used to setup SAM and invoke the Lambda:
I have Docker installed and running
I have installed the SAM CLI
My AWS credentials are in place and working
I have no connectivity issues and I'm able to connect to AWS normally
I create the SAM application (HelloWorld) with all the files and resources, I didn't change anything.
I run "sam build" and it finishes sucessfully
I run "sam local invoke" and it fails with timeout. I increased the timeout to 10s, still times out. The HelloWorld Lambda code only prints and does nothing else, so I'm guessing the code isn't the problem, but something else relating to the container or the SAM env itself.
C:\xxxxxxx\lambda-python3.8>sam build Your template contains a
resource with logical ID "ServerlessRestApi", which is a reserved
logical ID in AWS SAM. It could result in unexpected behaviors and is not recommended.
Building codeuri:
C:\xxxxxxx\lambda-python3.8\hello_world runtime: python3.8 metadata:
{} architecture: x86_64 functions: ['HelloWorldFunction'] Running
PythonPipBuilder:ResolveDependencies Running
PythonPipBuilder:CopySource
Build Succeeded
Built Artifacts : .aws-sam\build Built Template :
.aws-sam\build\template.yaml
C:\xxxxxxx\lambda-python3.8>sam local invoke Invoking
app.lambda_handler (python3.8) Skip pulling image and use local one:
public.ecr.aws/sam/emulation-python3.8:rapid-1.51.0-x86_64.
Mounting C:\xxxxxxx\lambda-python3.8.aws-sam\build\HelloWorldFunction
as /var/task:ro,delegated inside runtime container Function
'HelloWorldFunction' timed out after 10 seconds
No response from invoke container for HelloWorldFunction
Any hints on what's missing here?
Thanks.
Mostly, a lambda function gets timed out because of some resource dependency. Are you using any external resource, maybe db connection or some REST API call ?
Please put more prints in lambda_handler(your function handler), before calling any resource, then you might know where exactly it is waiting. Also increase the timeout to 1 minute or more because most of the external resource call over HTTPS will have 30 secs timeouts.
The log suggests that either the container wasn't started, or SAM couldn't connect to it.
Sometimes the hostname resolution on Windows can be affected by hosts file or system settings.
Try running the invoke command as follows (this will make the container ports bind to all interfaces):
sam local invoke --container-host-interface 0.0.0.0
...additionally try setting the container-host parameter (set to localhost by default):
sam local invoke --container-host-interface 0.0.0.0 --container-host host.docker.internal
The next piece of puzzle is incorporating these settings into VSCODE. This can to be done in two places:
create samconfig.toml in the root dir of the project with the following contents. This will allow running sam local invoke from the terminal without having to add the command line argument:
version=0.1
[default.local_invoke.parameters]
container_host_interface = "0.0.0.0"
update launch configuration as follows to enable VSCode debugging:
...
"sam": {
"localArguments": ["--container-host-interface","0.0.0.0"]
}
...

Updating a CloudFormation stack with a Cognito pool claims that we're adding attributes when we're not

Starting on Nov 7, 2018 we started getting the following error when updating our CloudFormation stacks:
Updating user pool schema is not allowed from cloudformation. Use the
AddCustomAttributes API or the AWS Cognito Console to update user pool
schema.
Our CF stacks don't have any changes to the custom attributes of the Cognito pool. They only have changes to the PostConfirmation and CustomMessage triggers, as well the addition of API Gateway responses.
Does anybody know why we might be seeing this? How can we avoid this error message?
We had the same problem with deployment. For now we are deploying it without CustomMessage trigger and setting CustomMessage trigger manually after deployment.
we removed the CustomMessage changes from our template and that seemed to do the trick.
Mostly by luck, I've found an answer that allows me to get around this in an automated manner.
How our scripts used to work
First, let me explain how this used to work. I used to have the following set of cloudFormation scripts:
cognitoSetup.template --> <Serverless Framework> --> <cognitoSetup.template updated with triggers>
So we'd setup the Cognito pool, run the Serverless Framework to add the Cognito Lambda functions, and then update the cognitoSetup.template file with the ARNs for the lambdas exported when the Serverless Framework ran.
The Fix
Now, we include the ARNs for the Lambdas in the cognitoSetup.template. So now cognitoSetup.template looks like this:
"CognitoUserPool": {
"Type": "AWS::Cognito::UserPool"
...
"Properties": {
...
"LambdaConfig": {
"CustomMessage": "arn:aws:lambda:<our aws region>:<our account#>:function:main-<our stage>-onCognitoCustomMessage"
}
}
Note, we're setting this trigger before the lambda even exists. The trigger just needs an ARN, and it doesn't seem to care that it's not there yet. Then we run sls deploy which creates the actual Lambda function and everything works fine.
Now our scripts look like this:
cognitoSetup.template --> <Serverless Framework>
Why does this fix this error? I don't actually know. CloudFormation seems to be fine with this modification but not okay with modifying the same file later in our process. But it works.

spring.cloud.vault.fail-fast - does it still work?

We use Spring cloud vault as a credential store in our environment.
The fail fast option does not work when the Vault url returns 404 - no exception is thrown but the application continues to start and goes ahead with Spring default password (since no credentials have been fetched from Vault).
I was checking the logs, searched a little bit and found the following:
https://github.com/spring-projects/spring-vault/commit/5078a4c133211adb1bb3642cc867b40deed7b0f0
This takes away the VaultClient - does this mean it also takes away the fail-fast option?
https://github.com/spring-cloud/spring-cloud-vault/issues/143

Using multiple keytabs in Kerberos JGSS without using JAAS .conf file

I want to use multiple keytabs in multiple server threads. I don't want to use JAAS conf file so i implemented my own login configuration in LoginConfiguration class. The getGSSCredentials() function in KerberosLogin class is used to get the Credentials by giving keytab location as parameter.
KerberosLogin -> http://ideone.com/vaip3H
LoginConfiguration -> http://ideone.com/jDqlN0
When i ran only two server threads, first one was able to get the credentials from its keytab ( both the server threads use different service principal ) while second one failed. Somehow using parms.put("refreshKrb5Config","true"); in LoginConfiguration solved the problem.
I am not able to understand why it's not working without refreshing the configuration and for cases in which there will be several such server threads will it be safe to use. Is there any better way to use multiple keytabs ?
It was due to the way java6 handles login config, it has been fixed in java7.