Go Stackdriver debugger error loading program - Kubernetes

I am trying to set up Stackdriver debugging using Go. Following the article and this great Medium post, I came up with the solution below.
Key parts, in cloudbuild.yaml:
- name: gcr.io/cloud-builders/wget
  args: [
    "-O",
    "go-cloud-debug",
    "https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug"
  ]
...
In the Dockerfile I have:
...
COPY gopath/bin/stackdriver-demo /stackdriver-demo
ADD go-cloud-debug /
ADD source-context.json /
CMD ["/go-cloud-debug","-sourcecontext=./source-context.json", "-appmodule=go-errrep","-appversion=1.0","--","/stackdriver-demo"]
...
However, the pod keeps crashing; the container logs show this error:
Error loading program: decoding dwarf section info at offset 0x0: too short
EDIT: Using https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug may be outdated, as I haven't seen it used outside Daz's Medium post. The official docs use the package cloud.google.com/go/cmd/go-cloud-debug-agent.
I have updated the cloudbuild.yaml file to install this package:
- name: 'gcr.io/cloud-builders/go'
  args: ["get", "-u", "cloud.google.com/go/cmd/go-cloud-debug-agent"]
  env: ['PROJECT_ROOT=github.com/roberson34/stackdriver-demo', 'CGO_ENABLED=0', 'GOOS=linux']
- name: 'gcr.io/cloud-builders/go'
  args: ["install", "cloud.google.com/go/cmd/go-cloud-debug-agent"]
  env: ['PROJECT_ROOT=github.com/roberson34/stackdriver-demo', 'CGO_ENABLED=0', 'GOOS=linux']
In the Dockerfile I can then access the binary at gopath/bin/go-cloud-debug-agent.
When I execute go-cloud-debug-agent with my own program as an argument:
/go-cloud-debug-agent -sourcecontext=./source-context.json -appmodule=go-errrep -appversion=1.0 -- /stackdriver-demo
I get another opaque error:
Error loading program: AttrStmtList not present or not int64 for unit 88
So in short: the cloud-debug binary from https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug and the go-cloud-debug-agent binary from the package cloud.google.com/go/cmd/go-cloud-debug-agent both fail, just with different errors.
Would appreciate any tips on what I'm doing wrong and how to fix it.

OK :-)
Yes, you should follow the current Stackdriver documentation, e.g. go-cloud-debug-agent
Unfortunately, there are now various issues with my post including a (currently broken) gcr.io/cloud-builders/kubectl for regions.
I think your issue pertains to your use of golang:alpine. Alpine uses musl rather than the glibc found on most other Linux distros, so you really must compile on Alpine to ensure your binaries reference the correct libc.
I'm able to get your solution working primarily by switching your Dockerfile to pull the Cloud Debug Agent while on Alpine and to compile your source on Alpine:
FROM golang:alpine
# git is required by `go get`
RUN apk add --no-cache git
RUN go get -u cloud.google.com/go/cmd/go-cloud-debug-agent
ADD main.go src/
# Disable optimizations and inlining so the debug agent can map the binary back to source
RUN CGO_ENABLED=0 go build -gcflags=all='-N -l' src/main.go
ADD source-context.json /
CMD ["bin/go-cloud-debug-agent","-sourcecontext=/source-context.json", "-appmodule=stackdriver-demo","-appversion=1.0","--","main"]
I think that should get you beyond the errors that you documented and you should be able to deploy your container to Kubernetes.
I've made my version of your image publicly available (and will retain it for a few days for you):
gcr.io/dazwilkin-190402-55473323/roberson34@sha256:17cb45f1320e2fe04e0681310506f4c229896429192b0d1c2c8dc20ed54adb0d
You may wish to reference it (by that digest) in your deployment.yaml
NB For Error Reporting to be "interesting", your code needs to generate errors and, with your example, this is going to be challenging (usually a good thing). You may consider adding another errorful handler that always results in errors so that you may test the service.
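For example, a minimal sketch of such an errorful handler (the /error route and port 8080 are illustrative, not part of your demo):
package main

import (
    "errors"
    "log"
    "net/http"
)

// errorfulHandler always fails, giving Error Reporting something to report.
func errorfulHandler(w http.ResponseWriter, r *http.Request) {
    err := errors.New("intentional failure for testing Error Reporting")
    log.Printf("handler error: %v", err) // logged errors also show up in the container logs
    http.Error(w, err.Error(), http.StatusInternalServerError)
}

func main() {
    http.HandleFunc("/error", errorfulHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}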

Related

Failure/timeout invoking Lambda locally with SAM

I'm trying to get a local environment to run/debug Python Lambdas with VSCode (Windows). I'm using the provided HelloWorld example to get the hang of this, but I'm not able to invoke it.
Steps used to set up SAM and invoke the Lambda:
I have Docker installed and running
I have installed the SAM CLI
My AWS credentials are in place and working
I have no connectivity issues and I'm able to connect to AWS normally
I created the SAM application (HelloWorld) with all the files and resources; I didn't change anything.
I ran "sam build" and it finished successfully
I ran "sam local invoke" and it failed with a timeout. I increased the timeout to 10s; it still times out. The HelloWorld Lambda code only prints and does nothing else, so I'm guessing the code isn't the problem, but something else relating to the container or the SAM env itself.
C:\xxxxxxx\lambda-python3.8>sam build
Your template contains a resource with logical ID "ServerlessRestApi", which is a reserved logical ID in AWS SAM. It could result in unexpected behaviors and is not recommended.
Building codeuri: C:\xxxxxxx\lambda-python3.8\hello_world runtime: python3.8 metadata: {} architecture: x86_64 functions: ['HelloWorldFunction']
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource

Build Succeeded

Built Artifacts : .aws-sam\build
Built Template  : .aws-sam\build\template.yaml

C:\xxxxxxx\lambda-python3.8>sam local invoke
Invoking app.lambda_handler (python3.8)
Skip pulling image and use local one: public.ecr.aws/sam/emulation-python3.8:rapid-1.51.0-x86_64.
Mounting C:\xxxxxxx\lambda-python3.8\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
Function 'HelloWorldFunction' timed out after 10 seconds
No response from invoke container for HelloWorldFunction
Any hints on what's missing here?
Thanks.
Mostly, a Lambda function times out because of some resource dependency. Are you using any external resource, maybe a DB connection or a REST API call?
Put more prints in lambda_handler (your function handler) before calling any resource; then you will know where exactly it is waiting. Also increase the timeout to 1 minute or more, because most external resource calls over HTTPS have 30-second timeouts.
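If you do need a longer timeout, a minimal sketch of raising it in the SAM template.yaml (the 60-second value is illustrative):
Globals:
  Function:
    Timeout: 60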
The log suggests that either the container wasn't started, or SAM couldn't connect to it.
Sometimes hostname resolution on Windows can be affected by the hosts file or system settings.
Try running the invoke command as follows (this will make the container ports bind to all interfaces):
sam local invoke --container-host-interface 0.0.0.0
...additionally try setting the container-host parameter (set to localhost by default):
sam local invoke --container-host-interface 0.0.0.0 --container-host host.docker.internal
The next piece of the puzzle is incorporating these settings into VSCode. This can be done in two places:
Create samconfig.toml in the root dir of the project with the following contents. This will allow running sam local invoke from the terminal without having to add the command-line argument:
version=0.1
[default.local_invoke.parameters]
container_host_interface = "0.0.0.0"
Update the launch configuration as follows to enable VSCode debugging:
...
"sam": {
"localArguments": ["--container-host-interface","0.0.0.0"]
}
...
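For context, here is a minimal sketch of a complete AWS Toolkit launch configuration carrying that argument; the name, template path, and logical ID are illustrative and assume the HelloWorld layout:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "HelloWorldFunction (local)",
            "type": "aws-sam",
            "request": "direct-invoke",
            "invokeTarget": {
                "target": "template",
                "templatePath": "${workspaceFolder}/template.yaml",
                "logicalId": "HelloWorldFunction"
            },
            "sam": {
                "localArguments": ["--container-host-interface", "0.0.0.0"]
            }
        }
    ]
}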

How can I access the Open Policy Agent command line via Docker Desktop in Windows 10

I am attempting to learn the various features of something called Open Policy Agent because I think it may be a useful tool in a microservices-based application.
Here is a link to the 'Running with Docker' section of the documentation for this application: https://www.openpolicyagent.org/docs/latest/deployments/#running-with-docker
Currently, I am running Docker using Docker Desktop in a Windows 10 environment, and I already have a docker-compose file set up for my main application, which includes various Docker images. My thought was that I could simply add the latest openpolicyagent image, as well as the openpolicyagent demo-restful-api, so that I could begin learning about the service. To do this, I added the following lines to my docker-compose.yml:
opa:
  image: openpolicyagent/opa:0.34.2
  ports:
    - 8181:8181
  command:
    - "run"
    - "--server"
    - "--log-level=debug"
    - "api_authz.rego"
  volumes:
    - C:\Sites\prosaurus\policy\api_authz.rego:/api_authz.rego
api_server:
  image: openpolicyagent/demo-restful-api:latest
  ports:
    - 5000:5000
  environment:
    - OPA_ADDR=http://opa:8181
    - POLICY_PATH=/v1/data/httpapi/authz
This appears to have worked, in that I can go to localhost:8181 and see the Query and Input Data (JSON) boxes, as I presume is supposed to happen. However, I would like to test some of the command-line functions mentioned here:
https://www.openpolicyagent.org/docs/latest/#2-try-opa-eval
However, I cannot seem to access the command line of the Docker container running the OPA agent. I have attempted this via the Docker Desktop application GUI in Windows, where I can see all of the running Docker instances, each with an option to open the CLI (you click the button and the CLI opens). They all work except for the OPA one: when I click on it, a cmd window opens for a split second, displays something too fast for me to read, and then closes.
What have I done wrong?
OPA can be run in a few different ways, and opa eval is distinctly different from running OPA as a server, i.e. opa run --server.
When you run OPA as a server - which is how you'd normally run OPA in production - you query OPA for policy decisions through OPA's REST API.
opa eval, on the other hand, is more like a Swiss Army knife for OPA, allowing you to quickly evaluate a rule or expression against some provided policy and data.
You can think of them as two entirely different tools.
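As for the CLI button itself: the official OPA image contains little more than the opa binary (its entrypoint) and no shell, which is most likely why Docker Desktop's CLI window closes immediately. You can still try both tools from your host; a minimal sketch, assuming the demo's httpapi.authz package and an illustrative input document:
docker run --rm -v C:\Sites\prosaurus\policy:/policy openpolicyagent/opa:0.34.2 eval --data /policy/api_authz.rego "data.httpapi.authz"

curl -X POST localhost:8181/v1/data/httpapi/authz -d "{\"input\": {\"user\": \"alice\"}}"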

Cloud SQL API [sql-component.googleapis.com] not enabled on project

I am running a Cloud Build trigger on a cloudbuild.yaml file in which I build a Docker container and then deploy it to Cloud Run. The error is as follows:
API [sql-component.googleapis.com] not enabled on project
The problem is that I have enabled both SQL and SQL Admin APIs in both projects (one for the cloud build and one for the database), which was confirmed in the console and in gcloud.
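For reference, enabling and verifying the APIs from gcloud looks like this (the project ID is a placeholder):
gcloud services enable sql-component.googleapis.com sqladmin.googleapis.com --project MY_PROJECT
gcloud services list --enabled --project MY_PROJECT | grep sql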
Here is the yaml code for the step I am referring to:
- name: 'gcr.io/cloud-builders/gcloud'
  args: [
    'beta',
    'run',
    'deploy',
    'MY_NAME',
    '--image', 'gcr.io/MY_PROJECT/MY_IMAGE',
    '--region', 'MY_REGION',
    '--platform', 'managed',
    '--set-cloudsql-instances', 'MY_CONNECTION_NAME',
    '--set-env-vars', 'NODE_ENV=production,INSTANCE_CONNECTION_NAME=MY_CONNECTION_NAME,SQL_USER=MY_USER,SQL_PASSWORD=MY_PASSWORD,SQL_NAME=MY_SCHEMA,TOPIC_NAME=MY_TOPIC'
  ]
Any suggestions?
Thanks.
P.S.: As per Eespinola's suggestion, I checked and confirmed I am running Google Cloud SDK 254.0.0.
P.S. 2: I have also tried to create a project from scratch but ended up with the same results.
OK, so as per the same thread eespinola posted (see above), the Cloud Build gcloud step will be updated to match the Cloud SDK 254.0.0 release in the near future (the actual date may or may not be posted in that thread). Until then, the alternative is to use the YAML file without the --add-cloudsql-instances flag and add it manually in the UI (I still have not tried this, but it should work as per Google's development team).

Problems when using Chapel 1.19 along with GASNet PSM (OmniPath) substrate

After changing to version 1.19, while still using the OmniPath implementation, I randomly receive the following error: ERROR calling: gasnet_barrier_try(id, 0).
I know that the Omnipath implementation of GASNet is no longer supported by the current version of Chapel. However, I would like to use some features available only in version 1.19, and the cluster I use runs over an Omnipath network.
In order to use the PSM substrate (OmniPath), I proceed as suggested by Chapel's Gitter community:
export CHPL_GASNET_ALLOW_BAD_SUBSTRATE=true
wget https://gasnet.lbl.gov/download/GASNet-1.32.0.tar.gz
tar xzf GASNet-1.32.0.tar.gz
rm -rf $CHPL_HOME/third-party/gasnet/gasnet-src
mv GASNet-1.32.0 $CHPL_HOME/third-party/gasnet/gasnet-src
Then, I setup other variables:
export CHPL_COMM='gasnet'
export CHPL_LAUNCHER='gasnetrun_psm'
export CHPL_COMM_SUBSTRATE='psm'
export CHPL_GASNET_SEGMENT='everything'
export CHPL_TARGET_CPU='native'
export GASNET_PSM_SPAWNER='ssh'
export HFI_NO_CPUAFFINITY=1
Next, I build the runtime, etc.
However, when I run experiments, I randomly receive the following error:
ERROR calling: gasnet_barrier_try(id, 0)
at: comm-gasnet.c:1020
error: GASNET_ERR_BARRIER_MISMATCH (Barrier id's mismatched)
This error terminates the execution of the program.
I cannot find the reason for this error in the GASNet documentation; I could only find a bit of information in GASNet's source code.
Do you know the cause of this problem?
Thank you all.
I realize this is an old question, but for the record the current version of Chapel (1.28.0) now embeds a version of GASNet (GASNet-EX 2022.3.0 as of this writing) whose ofi-conduit (CHPL_COMM=gasnet CHPL_COMM_SUBSTRATE=ofi) provides high-quality support for Intel Omni-Path.
In particular, there should no longer be any reason to clobber Chapel's embedded version of GASNet-EX with an ancient/outdated GASNet-1 to get Omni-Path support, as suggested in the original question.
For more details see Chapel's detailed Omni-Path instructions.
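In other words, the old GASNet-1/psm recipe from the question collapses to something like this sketch (plus the usual runtime rebuild; see the linked instructions for launcher and provider details):
export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ofi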

I cannot just deploy a function with Serverless-framework 1.20.2

I wanted to follow these tips
and just redeploy my function, as the serverless.yml had not been changed.
However, it just hangs on the Serverless: Uploading function stage. Forever, apparently.
The whole deploy (with sls deploy) works, though slowly.
How can I debug this, as there is apparently no error message?
EDIT
When I use sls deploy my project takes about 4 min and 15s to deploy.
It seems rather long to me, so I thought I would use sls deploy function -f myFunction instead, which is supposed to be much faster.
However, when I try sls deploy function -f myFunction, it seems to just hang forever on Serverless: Uploading function: myFunction.
I have no idea how to debug that.
It seems using --verbose makes no difference: the messages returned are the same.
I will try to wait and see if, eventually, the function deploy completes...
Well, I waited, and it doesn't: after about 8 min 30s I get the following error message:
Serverless Error ---------------------------------------

  Connection timed out after 120000ms

Get Support --------------------------------------------
  Docs:   docs.serverless.com
  Bugs:   github.com/serverless/serverless/issues
  Forums: forum.serverless.com
  Chat:   gitter.im/serverless/serverless

Your Environment Information -----------------------------
  OS:                 linux
  Node Version:       7.10.0
  Serverless Version: 1.20.2
Another oddity: when hanging, it reads:
Serverless: Uploading function: myFunction (12.05 MB)...
But the function itself is just 3.2 kB, and does not include any packages.
When I use sls deploy, the size displayed is the same:
Serverless: Uploading service .zip file to S3 (12.05 MB)...
What could be wrong with my function deploy?
EDIT 2
As @dashmug hinted, there is a config issue in serverless.yml.
In the functions dir of my serverless project, I would like to have a common package.json and node_modules. Then each function could import modules as needed.
I tried to follow the official guide.
My serverless.yml is like so:
functions:
  myFunction:
    package:
      exclude:
        - 'functions/node_modules/**'
        - '!functions/node_modules/module1_I_want_to_include/**'
        - '!functions/node_modules/module2_I_want_to_include/**'
Now I get, with sls deploy:
Serverless: Uploading service .zip file to S3 (31.02 MB)...
and the function works :)
However, with sls deploy function -f myFunction, I get:
Serverless: Uploading function: dispatch (1.65 MB)...
It does upload in a reasonable time, but the function now gives the following error:
Unable to import module 'functions/myFunction': Error
Things I would look at:
Try comparing what happens between the two:
$ SLS_DEBUG=true sls deploy --verbose
and
$ SLS_DEBUG=true sls deploy function -f myFunction --verbose
Check your serverless config (packaging, etc.) against your project structure. One red flag is that the function deploy is as big as the service deploy. This could be a misconfiguration problem.
Use serverless package to see how the package(s) are zipped (see the sketch after this list). It can provide some clues.
Are you using any plugins which may have altered the way your package is created?
How many node_modules directories do you have? Do you have only one for the entire service or one for each function?
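For that serverless package check, a minimal sketch (the output directory name is illustrative):
$ sls package --package .serverless-artifacts
$ unzip -l .serverless-artifacts/*.zip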
You can make the deploy process more verbose by passing the --verbose argument to the deploy command.
Either sls deploy --verbose or sls deploy -v will do the trick.
I wasn't able to figure out why function deployment (as opposed to service deployment) would hang. I may have misconfigured my serverless.yml file.
But no big deal: I can do without sls deploy function -f myFunction, because my expectations were wrong. I thought deploying a function would be much faster than deploying a service, by somehow not redeploying the node_modules directory.
But there is no partial function deployment in AWS: when a function is deployed, all necessary node modules must be deployed as well for the function to work.
As explained in serverless doc:
The Framework packages up the targeted AWS Lambda Function into a zip file.
The Framework fetches the hash of the already uploaded function .zip file and compares it to the local .zip file hash.
The Framework terminates if both hashes are the same.
That zip file is uploaded to your S3 bucket using the same name as the previous function, which the CloudFormation stack is pointing to.
I had (naively) hoped that only the updated handler would be uploaded to S3.
But as the function is packaged before deployment, it does need all of its modules and dependencies.
So the way I see it, function deployment would save time (as opposed to service deployment) only if the service has multiple functions, and the service functions do not use many common nodejs modules. And if sls deploy function -f myFunction does not hang, that is :)
So to increase development speed, the trick is to use offline emulation with a tool like serverless-offline.
serverless-offline provides a local server, and the Lambda function myFunction becomes accessible locally by calling http://localhost:3000/myFunction in Postman or the browser.
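A minimal sketch of wiring that up, assuming the serverless-offline plugin (after listing it under plugins: in serverless.yml; port 3000 is its default):
$ npm install --save-dev serverless-offline
$ sls offline start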
In most cases, sls deploy can be called only once, after the handler has been thoroughly tested offline.