Failed to enroll admin, error:%o message=Calling enroll endpoint failed, CONNECTION Timeout - kubernetes

I am running my Fabric network on Kubernetes and I have set up CA servers for all the organisations. I am able to register and enroll a user from the CLI, but when I use the fabric-ca-client library with Node.js to register and enroll users, I get a CONNECTION Timeout. At the same time, the logs of my CA server show that it is able to process the request.
Edit 1: I am using the same code provided in fabric-samples to register and enroll the users.
All the pods are communicating with each other using these services in Kubernetes.
This is how my connection profile looks:
"certificateAuthorities": {
"ca-org2": {
"url": "https://ca-org2:8054",
"caName": "ca-org2",
"tlsCACerts": {
"pem": ["-----BEGIN CERTIFICATE-----\nMIICBjCCAa2gAwIBAgIUHwBYatG6KhezYWHxdGgYGqs77PIwCgYIKoZIzj0EAwIw\nYDELMAkGA1UEBhMCVUsxEjAQBgNVBAgTCUhhbXBzaGlyZTEQMA4GA1UEBxMHSHVy\nc2xleTEZMBcGA1UEChMQb3JnMi5leGFtcGxlLmNvbTEQMA4GA1UEAxMHY2Etb3Jn\nMjAeFw0yMTAzMjAxMDI4MDBaFw0zNjAzMTYxMDI4MDBaMGAxCzAJBgNVBAYTAlVL\nMRIwEAYDVQQIEwlIYW1wc2hpcmUxEDAOBgNVBAcTB0h1cnNsZXkxGTAXBgNVBAoT\nEG9yZzIuZXhhbXBsZS5jb20xEDAOBgNVBAMTB2NhLW9yZzIwWTATBgcqhkjOPQIB\nBggqhkjOPQMBBwNCAAQUIABkRhfPdwoy2QrCY3oh8ZuzP5OprZJawVXO2ojid3j4\nC9W4l46QXR5J7iG5MLczguPZWB9dZWygRQdUQeoAo0UwQzAOBgNVHQ8BAf8EBAMC\nAQYwEgYDVR0TAQH/BAgwBgEB/wIBATAdBgNVHQ4EFgQURx/h3nkH0fq+3TlRPnQW\nWTHbR7YwCgYIKoZIzj0EAwIDRwAwRAIgCF+vcLFERb+VHa6Att0rh5yhpMd0bHEn\nmkNo0YfKuX4CICodtpp6AKtNWXreskaN+kRMH8eDmwvxkhvTK68ejv8U\n-----END CERTIFICATE-----\n"]
},
"httpOptions": {
"verify": false
}
}
}

I found the solution to this issue. The problem was indeed the connection timeout: my CA server was receiving the requests and was able to process them, but the client cancelled each request because of the short timeout. The solution was to increase the connection-timeout and request-timeout. The default value of both timeouts is 3s; I increased them to 30s and it started working. The default configuration can be found here:
{
  "request-timeout": 3000,
  "tcert-batch-size": 10,
  "crypto-hash-algo": "SHA2",
  "crypto-keysize": 256,
  "crypto-hsm": false,
  "connection-timeout": 3000
}
We can update the timeout values in the source code of the fabric-ca-client library, or simply use the methods of the fabric-common library to override these configuration values, like this:
const { Utils: utils } = require('fabric-common');
const path = require('path');

// Get the global config and layer our config.json overrides on top of the defaults
const config = utils.getConfig();
config.file(path.resolve(__dirname, 'config.json'));
And here is our modified configuration file, config.json:
{
  "request-timeout": 30000,
  "tcert-batch-size": 10,
  "crypto-hash-algo": "SHA2",
  "crypto-keysize": 256,
  "crypto-hsm": false,
  "connection-timeout": 30000
}
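For context, a minimal sketch of how this override might fit into the fabric-samples enrollment flow. The connection-profile filename (connection-org2.json) and the way it is loaded are assumptions; the CA name matches the profile shown above, and the override from the previous snippet is repeated so the sketch is self-contained:

// Apply the timeout overrides *before* constructing the CA client,
// so fabric-ca-client picks them up when it opens its connection.
const FabricCAServices = require('fabric-ca-client');
const { Utils: utils } = require('fabric-common');
const fs = require('fs');
const path = require('path');

const config = utils.getConfig();
config.file(path.resolve(__dirname, 'config.json'));

// Hypothetical path; use whatever file holds the connection profile shown above.
const ccp = JSON.parse(fs.readFileSync(path.resolve(__dirname, 'connection-org2.json'), 'utf8'));
const caInfo = ccp.certificateAuthorities['ca-org2'];
const ca = new FabricCAServices(
  caInfo.url,
  { trustedRoots: caInfo.tlsCACerts.pem, verify: false },
  caInfo.caName
);
// ca.enroll(...) and ca.register(...) calls now run with the 30s timeouts.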

Related

Linkerd inbound port annotation leads to "Failed to bind inbound listener"

We are using Linkerd 2.11.1 on Azure AKS Kubernetes. Amongst others, there is a Deployment using an Alpine Linux image containing Apache/mod_php/PHP 8 serving an API. HTTPS is terminated by Traefik v2 with cert-manager, so incoming traffic to the API is on port 80. The Linkerd proxy container is injected as a sidecar.
Recently I saw that the API containers return 504 errors for a short period of time during a rolling deployment. In the sidecar's log, I found the following:
[ 0.000590s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.001062s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.001078s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.001081s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.001083s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.001085s] INFO ThreadId(01) linkerd2_proxy: Local identity is default.my-api.serviceaccount.identity.linkerd.cluster.local
[ 0.001088s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.001090s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.014676s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity: default.my-api.serviceaccount.identity.linkerd.cluster.local
[ 3674.769855s] INFO ThreadId(01) inbound:server{port=80}: linkerd_app_inbound::detect: Handling connection as opaque timeout=linkerd_proxy_http::version::Version protocol detection timed out after 10s
My guess is that this protocol detection leads to the 504 errors somehow. However, if I add the Linkerd inbound port annotation to the pod template (Terraform syntax):
resource "kubernetes_deployment" "my_api" {
metadata {
name = "my-api"
namespace = "my-api"
labels = {
app = "my-api"
}
}
spec {
replicas = 20
selector {
match_labels = {
app = "my-api"
}
}
template {
metadata {
labels = {
app = "my-api"
}
annotations = {
"config.linkerd.io/inbound-port" = "80"
}
}
I get the following:
time="2022-03-01T14:56:44Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2022-03-01T14:56:44Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[ 0.000547s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
thread 'main' panicked at 'Failed to bind inbound listener: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', /github/workspace/linkerd/app/src/lib.rs:195:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Can somebody tell me why it fails to bind the inbound listener?
Any help is much appreciated,
thanks,
Pascal
Found it: Kubernetes sends the request to shut a pod down and the request to stop routing traffic to it asynchronously. If the pod shuts down faster than it is removed from the endpoint lists, it can still receive requests while it is already dead.
To fix this, I added a preStop lifecycle hook to the application container:
lifecycle {
  pre_stop {
    exec {
      command = ["/bin/sh", "-c", "sleep 5"]
    }
  }
}
and the following annotation to the pod template:
annotations = {
  "config.alpha.linkerd.io/proxy-wait-before-exit-seconds" = "10"
}
Documented here: https://linkerd.io/2.11/tasks/graceful-shutdown/
and here: https://blog.gruntwork.io/delaying-shutdown-to-wait-for-pod-deletion-propagation-445f779a8304
annotations = {
  "config.linkerd.io/inbound-port" = "80"
}
I don't think you want this setting. Linkerd will transparently proxy connections without you setting anything.
This setting configures Linkerd's proxy to try to listen on port 80. This would likely conflict with your web server's port configuration; but the specific error you're hitting is that the Linkerd proxy does not run as root and so it does not have permission to bind port 80.
I'd expect it all to work if you removed that annotation :)

NestJS Mongoose connection dies on load testing

When multiple devs use my API, multiple concurrent requests are sent to Mongoose.
When concurrency is high, the connection just "dies" and refuses to fulfil any new request, no matter how long I wait (hours!).
To be clear: everything works fine under regular use; only heavy use makes the connection crash.
My MongooseModule initialization:
MongooseModule.forRoot(DatabasesService.MONGO_FULL_URL, {
  useNewUrlParser: true,
  useUnifiedTopology: true,
  useFindAndModify: false,
  autoEncryption: {
    keyVaultNamespace: DatabasesService.keyVaultNamespace,
    kmsProviders: DatabasesService.kmsProviders,
    extraOptions: {
      mongocryptdSpawnArgs: ['--pidfilepath', '/tmp/mongocryptd.pid']
    }
  } as any
})
Module that imports the feature:
@Module({
  imports: [MongooseModule.forFeature([{ name: 'modelName', schema: ModelNameSchema }])],
  providers: [ModelNameService],
  controllers: [...],
  exports: [...]
})
Service:
@Injectable()
export class ModelNameService {
  constructor(
    @InjectModel('modelName') private modelName: Model<IModelName>
  ) {}

  async findAll(): Promise<IModelName[]> {
    const result: IModelName[] = await this.modelName.find().exec();
    if (!result) throw new BadRequestException(`No result was found.`);
    return result;
  }
}
I've tried load testing using different utils; the easiest was:
ab -c 200 -n 300 -H "Authorization: Bearer $TOKEN" -m GET -b 0 https://example.com/getModelName
Any new request after the connection hangs gets stuck at the first line of ModelNameService.findAll() (the request to Mongo).
In the MongoDB logs with verbosity -vvvvv I can see a few suspicious lines:
User Assertion: Unauthorized: command endSessions requires authentication src/mongo/db/commands.cpp
Cancelling outstanding I/O operations on connection to 127.0.0.1:33134
I've also found that it never exceeds 12 open connections at the same time; it always waits for one to close before opening a new one.
Other key points:
Mongoose doesn't return any value or raise any error. It just hangs without reporting anything.
The Terminus health check is able to ping the DB and returns a healthy status.
The NestJS API itself still works - I'm able to send new requests and receive responses. Only requests that touch the faulty connection hang.
When I inject the connection and check its readyState, it returns connected.
Restarting the API fixes it immediately.
MongoDB itself keeps working as normal.
Increasing the Mongoose poolSize handles more concurrent requests, but the connection still crashes under a larger number of requests.
My main question here is: how do I handle this case? Currently I've added another health check that sends a query through the problematic connection every half a minute, and k8s restarts the pod if it determines a failure. This works, but it is not optimal.
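Since the symptoms point at pool exhaustion, here is a minimal sketch of raising the pool size and making the driver fail fast instead of hanging silently. The option names follow the MongoDB Node.js driver 4.x (in 3.x-based setups the pool option is poolSize), and the values are placeholders to tune, not recommendations:

MongooseModule.forRoot(DatabasesService.MONGO_FULL_URL, {
  // Allow more concurrent operations before requests queue behind the pool
  maxPoolSize: 100,
  // Surface an error instead of hanging forever when no server responds
  serverSelectionTimeoutMS: 30000,
  // Drop sockets that have gone quiet instead of keeping dead connections open
  socketTimeoutMS: 45000,
})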

How to create/start a cluster from a Databricks web activity by invoking the Databricks REST API

I have 2 requirements:
1: I have a cluster ID. I need to start the cluster from a Web Activity in ADF. The activity parameters look like this:
url: https://XXXX..azuredatabricks.net/api/2.0/clusters/start
body: {"cluster_id":"0311-004310-cars577"}
Authentication: Azure Key Vault Client Certificate
Upon running this activity, I encounter the below error:
"errorCode": "2108",
"message": "Error calling the endpoint
'https://xxxxx.azuredatabricks.net/api/2.0/clusters/start'. Response status code: ''. More
details:Exception message: 'Cannot find the requested object.\r\n'.\r\nNo response from the
endpoint. Possible causes: network connectivity, DNS failure, server certificate validation or
timeout.",
"failureType": "UserError",
"target": "GetADBToken",
"GetADBToken" is my activity name.
The above security mechanism works for other Databricks-related activities, such as running a jar which is already installed on my Databricks cluster.
2: I want to create a new cluster with the below settings:
url: https://XXXX..azuredatabricks.net/api/2.0/clusters/create
body:
{
  "cluster_name": "my-cluster",
  "spark_version": "5.3.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "spark_conf": {
    "spark.speculation": true
  },
  "num_workers": 2
}
Upon calling this API, if the cluster creation is successful I would like to capture the cluster ID in the next activity.
So what would be the output of the above activity, and how can I access it in an immediately following ADF activity?
For #2: can you please check whether changing the version
"spark_version": "5.3.x-scala2.11"
to
"spark_version": "6.4.x-scala2.11"
helps?
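On capturing the output: a successful call to /api/2.0/clusters/create returns a JSON body containing the new cluster_id, and a downstream ADF activity should be able to read it with an expression along the lines of @activity('<web activity name>').output.cluster_id. As a sanity check of the endpoint outside ADF, here is a minimal Node.js sketch of the same call; the workspace URL and token are placeholders, and it assumes Node 18+ for the built-in fetch:

// Hypothetical standalone check of the clusters/create call.
const host = 'https://XXXX.azuredatabricks.net'; // placeholder workspace URL
const token = process.env.DATABRICKS_TOKEN;      // placeholder personal access token

async function createCluster() {
  const res = await fetch(`${host}/api/2.0/clusters/create`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      cluster_name: 'my-cluster',
      spark_version: '6.4.x-scala2.11',
      node_type_id: 'i3.xlarge',
      spark_conf: { 'spark.speculation': true },
      num_workers: 2
    })
  });
  const body = await res.json();
  console.log(body.cluster_id); // the id a follow-up activity would consume
}

createCluster().catch(console.error);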

IBM BLUEMIX BLOCKCHAIN SDK-DEMO failing

I have been working with the HFC SDK for Node.js and it used to work, but since last night I have been having some problems.
When running helloblockchain.js, it only works a few times; most of the time I get this error when it tries to enroll a new user:
E0113 11:56:05.983919636 5288 handshake.c:128] Security handshake failed: {"created":"#1484304965.983872199","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484304965.983866102","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
Error: Failed to register and enroll JohnDoe: Error
Other times, the enrollment works and the failure appears while deploying the chaincode:
Enrolled and registered JohnDoe successfully
Deploying chaincode ...
E0113 12:14:27.341527043 5455 handshake.c:128] Security handshake failed: {"created":"#1484306067.341430168","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306067.341421859","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
Failed to deploy chaincode: request={"fcn":"init","args":["a","100","b","200"],"chaincodePath":"chaincode","certificatePath":"/certs/peer/cert.pem"}, error={"error":{"code":14,"metadata":{"_internal_repr":{}}},"msg":"Error"}
Or:
Enrolled and registered JohnDoe successfully
Deploying chaincode ...
E0113 12:15:27.448867739 5483 handshake.c:128] Security handshake failed: {"created":"#1484306127.448692244","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306127.448668047","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
events.js:160
throw er; // Unhandled 'error' event
^
Error
at ClientDuplexStream._emitStatusIfDone (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:189:19)
at ClientDuplexStream._readsDone (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:158:8)
at readCallback (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:217:12)
E0113 12:15:27.563487641 5483 handshake.c:128] Security handshake failed: {"created":"#1484306127.563437122","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306127.563429661","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
This code worked yesterday, so I don't know what could be happening.
Does anybody know how I can fix it?
Thanks,
Javier.
These types of intermittent issues are usually related to GRPC. An initial suggestion is to ensure that you are using at least GRPC version 1.0.0.
If you are using a Mac, then the maximum number of open file descriptors should be checked (using ulimit -n). Sometimes this is initially set to a low value such as 256, so increasing the value could help.
There are a couple of GRPC issues with similar symptoms.
https://github.com/grpc/grpc/issues/8732
https://github.com/grpc/grpc/issues/8839
https://github.com/grpc/grpc/issues/8382
There is a grpc.initial_reconnect_backoff_ms property that is mentioned in some of these issues. Increasing the value past the 1000 ms level might help reduce the frequency of issues. Below are instructions for how the helloblockchain.js file can be modified to set this property to a higher value.
Open the helloblockchain.js file in the Hyperledger Fabric Client example and find the enrollAndRegisterUsers function.
Add "grpc.initial_reconnect_backoff_ms": 5000 to the setMemberServicesUrl call:
chain.setMemberServicesUrl(ca_url, {
  pem: cert,
  "grpc.initial_reconnect_backoff_ms": 5000
});
Add "grpc.initial_reconnect_backoff_ms": 5000 to the addPeer call:
chain.addPeer("grpcs://" + peers[i].discovery_host + ":" + peers[i].discovery_port, {
  pem: cert,
  "grpc.initial_reconnect_backoff_ms": 5000
});
Note that setting the grpc.initial_reconnect_backoff_ms property may reduce the frequency of issues, but it will not necessarily eliminate all issues.
The connection to the eventhub that is made in the helloblockchain.js file can also be a factor. There is an earlier version of the Hyperledger Fabric Client that does not utilize the eventhub, and this earlier version could be tried to determine whether it makes a difference. After running git clone https://github.com/IBM-Blockchain/SDK-Demo.git, run git checkout b7d5195 to switch to this prior level. Before running node helloblockchain.js from a Node.js command window, the git status command can be used to confirm which code level is being used.

How to debug the Akka association process?

Here is the scenario:
I have packaged a Scala project with Spray into a jar file.
I launch the jar file on RedHat 6.5 on VirtualBox (IP 192.168.1.38).
I launch the jar file on RedHat 6.5 on VirtualBox (IP 192.168.1.41).
Everything works locally - I can send a REST request to each virtual machine and get a response.
Problem
The Akka systems cannot form a cluster. I run 192.168.1.38 with default settings, but 192.168.1.41 has an additional property, akka.cluster.seed-nodes, set to akka.tcp://mySystem@192.168.1.38:2551. So I get:
[WARN] [12/09/2014 17:10:24.043] [mySystem-akka.remote.default-remote-dispatcher-8] [akka.tcp://mySystem@192.168.1.41:2551/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FmySystem%40192.168.1.38%3A2551-0] Association with remote system [akka.tcp://mySystem@192.168.1.38:2551] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://mySystem@192.168.1.38:2551]].
No other errors or warnings appear. How can I test the Akka association or print/debug the association settings?
Also, can Linux settings influence the Akka association?
Most probably iptables is blocking the port in question; if this is just your test configuration, disable iptables:
service iptables save
service iptables stop
chkconfig iptables off
service ip6tables save
service ip6tables stop
chkconfig ip6tables off
If that does not help, check your SELinux configuration using the getenforce command; likewise, for test purposes you can completely disable it. See the SELinux manual.
For your application.conf, try using the following configuration on each node:
akka {
  log-dead-letters = on
  loglevel = "debug"
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  extensions = ["akka.contrib.pattern.ClusterReceptionistExtension"]
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      port = 6001
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://ActorSystem@192.168.1.38:6001",
      "akka.tcp://ActorSystem@192.168.1.41:6001"
    ]
    auto-down-unreachable-after = 10s
  }
}
All the logs related to the cluster nodes are logged at info level, but having the debug log level in a test environment is in general a good idea.
When the second node joins the cluster, you should notice the following log:
INFO [ActorSystem-akka.actor.default-dispatcher-4] [Cluster(akka://ActorSystem)] - Cluster Node [akka.tcp://ActorSystem@10.0.1.41:6001] - Marking node(s) as REACHABLE [Member(address = akka.tcp://ActorSystem@10.0.1.41:6001, status = Up)]
Cluster state can also be monitored using the JMX akka.Cluster MXBean:
{
  "self-address": "akka.tcp://ActorSystem@10.0.1.82:6001",
  "members": [
    { "address": "akka.tcp://ActorSystem@10.0.1.82:6001", "status": "Up" }
  ],
  "unreachable": []
}