The business network didn't upgrade on IBM Cloud

I need to upgrade my Hyperledger Composer business network on IBM Cloud. I had already started one business network there, but when I upgrade my .bna on IBM Cloud it throws this error:
Response from attempted peer comms was an error: Error: 8 RESOURCE_EXHAUSTED: received trailing metadata size exceeds limit

It could be that you have run out of resources to start a new chaincode container. One thing you can try is to stop all your peers and then restart them. This should kill all the chaincode containers, including ones still hanging around from previous upgrades that are no longer used. After the restart, only the latest chaincodes registered to channels are started, and only once you make your first request, which can be your upgrade request.

Node Creation Takes too long and failed

I installed Blockchain Platform v2 beta, then tried to configure it and add nodes.
My question is:
Has anyone faced a long delay in node creation, for example for a CA node?
I ran into this problem and cannot find where to check the logs.
Note:
the node still has not been created after 2 days.
Here is the link to the official IBP documentation, which explains how to retrieve and view logs:
IBM Blockchain Platform - Viewing your node logs
I also suggest you check whether there is any issue in the Kubernetes cluster where IBP is running.
As per the IBM Cloud documentation,
If you use Enterprise Plan networks, you can view component logs in a
text file format. If you use Starter Plan networks, component logs are
gathered by the IBM Cloud Log Analysis service and
you can view the logs in Kibana.
Each component generates logs from different activities. This is
because each component plays different roles within the Hyperledger
Fabric network architecture and transaction flows.
Certificate Authority logs: The Certificate Authority manages the
identity of participants within the network. In Certificate Authority
logs, you can find logs from when participants generate public and
private keys to communicate with the network (enroll), or when new
members, peers, or applications register with the Certificate
Authority. You can also use the CA logs to debug if there are any
problems with certificate verification.
So, you should be able to see the logs in the IBM Cloud Log Analysis service. By default, your logs are collected by the Lite Plan of the Log Analysis service. This plan is free and stores your logs for three days before discarding them. It also allows you to search only the first 500 MB of your logs per day. If your network logs exceed 500 MB, you cannot view new logs in Kibana. If your network generates more than 500 MB of logs, or you would like to retain your logs for more than three days, you can upgrade to a paid version of the Log Analysis Service.
For more information, refer to the IBM Cloud docs.
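As a rough illustration of the Lite plan limits quoted above (500 MB searchable per day, three days of retention), the following sketch checks whether a given log volume fits within them. The function name and the example numbers are hypothetical; only the two limits come from the answer.

```python
# Limits as stated in the answer above for the Lite plan of the
# IBM Cloud Log Analysis service.
LITE_DAILY_SEARCH_LIMIT_MB = 500
LITE_RETENTION_DAYS = 3


def lite_plan_is_enough(daily_log_mb: float, needed_retention_days: int) -> bool:
    """Return True if both the daily volume and the retention need fit the Lite plan.

    Hypothetical helper for illustration; it only compares against the
    two limits quoted above.
    """
    return (
        daily_log_mb <= LITE_DAILY_SEARCH_LIMIT_MB
        and needed_retention_days <= LITE_RETENTION_DAYS
    )


# Example with made-up volumes: a quiet network versus a chatty one.
print(lite_plan_is_enough(120, 2))
print(lite_plan_is_enough(800, 2))
```

If the check fails for your network, that is the case where the answer suggests upgrading to a paid plan.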

Service fabric fails to roll back application when deployment fails

I have a 3-node Service Fabric cluster where the deployment has been stuck for 10 hours on the third node. Looking at SF Explorer, we saw that wrong SQL credentials are being passed, which is why the deployment is stuck.
1) Why is SF recognizing it as a "warning" rather than an "Error"?
2) Why is it stuck and not doing a rollback?
3) Is there an extra setting I need so that it auto-rolls back sooner?
Generally, it rolls back when the deployment fails, but this depends on the parameters you pass for the upgrade, such as FailureAction, UpgradeMode, and the timeouts.
UpgradeMode values can be:
Monitored: Indicates that the upgrade mode is monitored. After the cmdlet finishes an upgrade for an upgrade domain, if the health of the upgrade domain and the cluster meet the health policies that you define, Service Fabric upgrades the next upgrade domain. If the upgrade domain or cluster fails to meet health policies, the upgrade fails and Service Fabric rolls back the upgrade for the upgrade domain or reverts to manual mode per the policy specified on FailureAction. This is the recommended mode for application upgrades in a production environment.
UnmonitoredAuto: Indicates that the upgrade mode is unmonitored automatic. After Service Fabric upgrades an upgrade domain, Service Fabric upgrades the next upgrade domain irrespective of the application health state. This mode is not recommended for production, and is only useful during development of an application.
UnmonitoredManual: Indicates that the upgrade mode is unmonitored manual. After Service Fabric upgrades an upgrade domain, it waits for you to upgrade the next upgrade domain by using the Resume-ServiceFabricApplicationUpgrade cmdlet.
FailureAction is the compensating action to perform when a Monitored upgrade encounters monitoring policy or health policy violations. The values can be:
Rollback specifies that the upgrade will automatically roll back to the pre-upgrade version.
Manual indicates that the upgrade will switch to the UnmonitoredManual upgrade mode.
Invalid indicates that the failure action is invalid and does nothing.
Given that, if the upgrade is not set to Monitored for UpgradeMode and Rollback for FailureAction, it will not roll back on its own and instead relies on a manual action from the operator (user).
If the upgrade is already set to these values, the problem can be either:
The health checks and retries take too long, preventing the upgrade from failing quickly, for example when your HealthCheckDuration is too long or there is too much delay between checks.
The old version is also failing.
The following docs give all details: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-parameters
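The combinations described above can be summarised as a small decision table. This is only a sketch of the behaviour as stated in this answer, not Service Fabric code; the enum and function names are hypothetical, and only the value names (Monitored, UnmonitoredAuto, UnmonitoredManual, Rollback, Manual, Invalid) come from the text.

```python
from enum import Enum


class UpgradeMode(Enum):
    MONITORED = "Monitored"
    UNMONITORED_AUTO = "UnmonitoredAuto"
    UNMONITORED_MANUAL = "UnmonitoredManual"


class FailureAction(Enum):
    ROLLBACK = "Rollback"
    MANUAL = "Manual"
    INVALID = "Invalid"


def on_unhealthy_upgrade(mode: UpgradeMode, action: FailureAction) -> str:
    """Describe what happens when an upgrade domain is unhealthy.

    Hypothetical helper that mirrors the descriptions above.
    FailureAction only matters for Monitored upgrades.
    """
    if mode is UpgradeMode.MONITORED:
        if action is FailureAction.ROLLBACK:
            return "automatic rollback to the pre-upgrade version"
        if action is FailureAction.MANUAL:
            return "switches to UnmonitoredManual and waits for the operator"
        return "no compensating action"
    if mode is UpgradeMode.UNMONITORED_AUTO:
        return "continues to the next upgrade domain regardless of health"
    return "waits for the operator to resume each upgrade domain"


print(on_unhealthy_upgrade(UpgradeMode.MONITORED, FailureAction.ROLLBACK))
```

Only the Monitored plus Rollback combination leads to an automatic rollback, and even then it happens only after the health checks have actually reported a failure.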

Why would running a container on GCE get stuck on "Metadata request unsuccessful: Forbidden (403)"?

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process so I need a large machine but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same google container registry by the same computer using the same Google login. The older one works fine but the newer one fails by getting stuck in an endless list of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone please explain why one of my images doesn't have this problem (well it gives a few of these messages but gets past them) and the other does have this problem (thousands of this message and taking over 24 hours before I killed it).
If I ssh in to a GCE instance then both versions of the container pull and run just fine. I'm suspecting the INTEGRITY_RULE checking from the logs but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple Centos:7 container that says "hello world" deployed from the console triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script as the instance will be destroyed when the monitor realises that the process has finished
I suggest you try creating a 3rd container that's focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the 2 containers that's not being overcome.
Make sure you can ‘curl’ the metadata service from the VM and that the request to the metadata service is using the VM's service account.
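The same check as the curl suggestion can be sketched in Python, for example from inside the container. This is a minimal sketch: the helper names are hypothetical, and it assumes the standard metadata endpoint and the `Metadata-Flavor: Google` header that the metadata server expects.

```python
import urllib.request

# Standard GCE metadata server root; reachable only from a GCE VM
# or a container running on one.
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"


def build_metadata_request(path: str) -> urllib.request.Request:
    """Build a GET request for a metadata path (hypothetical helper).

    The Metadata-Flavor header is required by the metadata server.
    """
    return urllib.request.Request(
        METADATA_ROOT + path,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_metadata(path: str, timeout: float = 5.0) -> str:
    """Fetch a metadata value as text; raises on HTTP errors such as 403."""
    request = build_metadata_request(path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

On the VM, calling `fetch_metadata("instance/service-accounts/default/email")` should return the service account the instance is running as. An `HTTPError` with code 403 here would point at the same problem the log line reports.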

Force delete of IBM Bluemix Kubernetes Cluster

I tried to set up a new Kubernetes cluster on IBM Bluemix and after a while I received the message that the deploy failed. To start over, I have tried to delete the cluster from the Bluemix interface, with no success. The error messages are not consistent, and range from elaborate error messages to the most common: 500: internal server error.
The command line does not help either. I expected this to work
bx cs cluster-rm k8s_demo
But most of the time it leads to an EOF error. Somehow internal connections are an issue because
bx cs clusters
leads to the error
FAILED
unable to connect to https://us-south.containers.bluemix.net/v1/clusters, please check your Internet Connection
most of the time. Every so often a list including the k8s_demo cluster appears, but being just as persistent with the cluster-rm command has not gotten the cluster deleted.
Is there any other way I can try to start over again? Apart from setting up another Bluemix account of course, something I would prefer to avoid.
If this problem is continuing, I would suggest contacting IBM Support. They'll have the tools to troubleshoot what has happened with the cluster provisioning and/or deleting.

Debug Service Fabric DNX/asp.net 5 Stateless Service in Azure Cluster

I have published my dnx/Web Service Fabric stateless service locally and it works. When I publish to the cloud (carefully setting up the correct ports), it does not start correctly. The error is the usual "partition is below replica count".
My suspicion is that dnx is not installed by default on the cluster VMs. Any way to get around that? I don't appear to get a login to those VMs so I can install asp.net 5 manually.
Found the issue - it was not DNX.
I set up a new cluster and was able to log in. There are 22304 error messages saying that my second non-dnx stateless service which is in the same application package is causing this event:
.NET Runtime version : 4.0.30319.34014 - This application could not be started.This application requires one of the following versions of the .NET Framework:
.NETFramework,Version=v4.5.2
Do you want to install this .NET Framework version now?
I'll figure out how to target correctly.