I'm trying to run the deploy pipeline created by the Simple Container Toolchain example. The output of the deployment log is:
...
2017-07-03 15:49:43 UTC : creating group: /tmp/extension_content/cf ic group create --name hello-containers-XXXX_2 --publish 80 --desired 2 --min 1 --max 6 registry.ng.bluemix.net/chstest/hello-containers-XXXX:1
OK
The creation of the container group was requested.
The container group "hello-containers-XXXX_2" (ID: YYYY) was created.
Minimum container instances: 1
Maximum container instances: 6
Desired container instances: 2
2017-07-03 15:49:49 UTC : hello-containers-XXXX_2 is 'CREATE_IN_PROGRESS'
2017-07-03 15:49:53 UTC : hello-containers-XXXX_2 is 'CREATE_IN_PROGRESS'
...
... CREATE_IN_PROGRESS message repeated about 150 times
...
2017-07-03 16:02:51 UTC : hello-containers-XXXX_2 is 'CREATE_IN_PROGRESS'
2017-07-03 16:02:55 UTC : hello-containers-XXXX_2 is 'CREATE_IN_PROGRESS'
2017-07-03 16:02:58 UTC : Create group is not completed and stays in status 'CREATE_IN_PROGRESS'
2017-07-03 16:02:58 UTC : Failed to deploy group
To send notifications, set SLACK_WEBHOOK_PATH or HIP_CHAT_TOKEN in the environment
Finished: FAILED
If I navigate to the container dashboard in Bluemix, I see the following error log:
Group failed
Resource CREATE failed: ResourceInError: resources.asg.resources.i2jghszuv3br.resources.server: Went to status ERROR due to "Message: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance ZZZZ. Last exception: [u'Traceback (most recent call last): \n', u' File "/opt/bbc/openstack-12.1.90/nova/local/lib/python2.7/site-packages, Code: 500"
How can I debug this further?
Please open a Bluemix support ticket against the Containers SRE team. Do this by logging into your Bluemix account; in the upper right-hand corner there is an option for "Support" > "Add Ticket".
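Before (or while) opening the ticket, it may also help to capture the group's current state to attach to it. A minimal sketch using the same IBM Containers CLI as in the deploy log, assuming your version of the cf ic plugin provides the group list/inspect/instances subcommands (hello-containers-XXXX_2 is the group name from your output):
$ cf ic group list                              # confirm the group is still stuck in CREATE_IN_PROGRESS
$ cf ic group inspect hello-containers-XXXX_2   # full state of the group, useful to attach to the ticket
$ cf ic group instances hello-containers-XXXX_2 # per-instance status and errors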
After starting the image, orderer cannot start normally
2021-02-02 11:17:13.897 UTC [orderer.common.server] Main -> PANI 005 Failed validating bootstrap block: initializing channelconfig failed: could not create channel Consortiums sub-group config: Attempted to define two different versions of MSP: OrgcppMSP
panic: Failed validating bootstrap block: initializing channelconfig failed: could not create channel Consortiums sub-group config: Attempted to define two different versions of MSP: OrgcppMSP
Hello everyone, I am working on setting up the default Fabric first-network in Kubernetes, but when I instantiate the chaincode it gives me an error. My peer logs are below.
2019-07-22 07:25:02.134 UTC [endorser] SimulateProposal -> ERRO 066 [mychannel][c4b4e2ae] failed to invoke chaincode name:"lscc" , error: container exited with 0
github.com/hyperledger/fabric/core/chaincode.(*RuntimeLauncher).Launch.func1
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/runtime_launcher.go:63
runtime.goexit
/opt/go/src/runtime/asm_amd64.s:1333
chaincode registration failed
I am getting this error on the CLI:
2019-07-22 07:24:58.263 UTC [chaincodeCmd] checkChaincodeCmdParams -> INFO 001 Using default escc
2019-07-22 07:24:58.264 UTC [chaincodeCmd] checkChaincodeCmdParams -> INFO 002 Using default vscc
Error: could not assemble transaction, err proposal response was not successful, error code 500, msg chaincode registration failed: container exited with 0
First, check that all your Docker containers are up and running. If you are simply running the sample network without any changes to the smart contract or the Docker files, you can stop the network and start it fresh (that worked in my case).
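If you are running the stock fabric-samples first-network, the restart described above would look roughly like this (byfn.sh is the helper script shipped with fabric-samples; if your Kubernetes setup wraps the network differently, adapt accordingly):
$ cd fabric-samples/first-network
$ ./byfn.sh down              # tear down containers, volumes and generated artifacts
$ ./byfn.sh up -c mychannel   # bring the network back up and recreate the channel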
In my case, I checked my configuration files and it was due to a wrong CORE_PEER_CHAINCODELISTENADDRESS environment variable value for the peer.
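For reference, a hypothetical excerpt of the relevant peer environment variables, assuming the peer is reachable as peer0.org1.example.com and chaincode connections use port 7052 (the exact hostnames and ports depend on your Kubernetes service definitions):
CORE_PEER_ADDRESS=peer0.org1.example.com:7051
CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:7052            # where the peer listens for chaincode connections
CORE_PEER_CHAINCODEADDRESS=peer0.org1.example.com:7052   # address the chaincode container dials back to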
Let's say my job was running for some time, went into a suspended state because the machine was overloaded, started running again after some time, and eventually completed.
The states this job went through were RUNNING -> SUSPEND -> RUNNING.
How can I get all the states a given job went through?
Use bjobs -l if the job hasn't been cleaned from the system yet, and bhist -l otherwise. You might also need -n, depending on how old the job is.
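For example, with a job ID of 1168:
$ bjobs -l 1168       # job is still in the system
$ bhist -l 1168       # job has already been cleaned from the system
$ bhist -n 0 -l 1168  # also search the archived event log files for older jobs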
Here's an example of bhist -l output when a job was suspended and later resumed because the system load temporarily exceeded the configured threshold.
$ bhist -l 1168
Job <1168>, User <mclosson>, Project <default>, Command <sleep 10000>
Fri Jan 20 15:08:40: Submitted from host <hostA>, to
Queue <normal>, CWD <$HOME>, Specified Hosts <hostA>;
Fri Jan 20 15:08:41: Dispatched 1 Task(s) on Host(s) <hostA>, Allocated 1 Slot(
s) on Host(s) <hostA>, Effective RES_REQ <select[type == any] or
der[r15s:pg] >;
Fri Jan 20 15:08:41: Starting (Pid 30234);
Fri Jan 20 15:08:41: Running with execution home </home/mclosson>, Execution CW
D </home/mclosson>, Execution Pid <30234>;
Fri Jan 20 16:19:22: Suspended: Host load exceeded threshold: 1-minute CPU ru
n queue length (r1m)
Fri Jan 20 16:21:43: Running;
Summary of time in seconds spent in various states by Fri Jan 20 16:22:09
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
1 0 4267 0 141 0 4409
At 16:19:22 the job was suspended because r1m exceeded the threshold. Later, at 16:21:43, the job resumed.
After I restarted my sharded cluster I noticed the balancer was not migrating any data anymore but the command sh.isBalancerRunning() always returned true.
I tried to run the command sh.stopBalancer() and it got stuck forever on:
sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Checking the locks collection on the config server, here is the data:
configsvr> db.locks.find({_id: "balancer"})
{ "_id" : "balancer", "process" : "myserver.mongodb.com:27017:1452776409:1804289383",
"state" : 2, "ts" : ObjectId("56cb817f2c4edd1226d6ae07"), "when" : ISODate("2016-02-22T21:45:35.360Z"), "who" : "myserver.mongodb.com:27017:1452776409:1804289383:Balancer:846930886",
"why" : "doing balance round" }
Also, if I try to run sh.startBalancer() it times out:
mongos> sh.startBalancer()
2016-02-23T22:51:11.204-0500 E QUERY [thread1] Error: assert.soon failed, msg:Waited too long for lock balancer to change to state undefined :
doassert#src/mongo/shell/assert.js:15:14
assert.soon#src/mongo/shell/assert.js:200:13
sh.waitForDLock#src/mongo/shell/utils_sh.js:171:1
sh.waitForBalancer#src/mongo/shell/utils_sh.js:264:9
sh.startBalancer#src/mongo/shell/utils_sh.js:146:5
#(shell):1:1
in the sh.status():
balancer:
Currently enabled: yes
Currently running: yes
Balancer lock taken at Mon Feb 22 2016 16:45:35 GMT-0500 (EST) by myserver.mongodb.com:27017:1452776409:1804289383:Balancer:846930886
Balancer active window is set between 8:00 and 6:00 server local time
Failed balancer rounds in last 5 attempts: 5
Last reported error: Connection refused
Time of Reported error: Tue Feb 23 2016 17:27:26 GMT-0500 (EST)
Migration Results for the last 24 hours:
No recent migrations
I have tried restarting the servers, stepping down the primaries, changing the balancer lock state to 0 and running sh.startBalancer(), and removing the balancer document from the locks collection and then running sh.startBalancer() again, all with no results.
In the end it was an issue with the server clocks being out of sync; for some reason the logs about this issue didn't appear until the next day.
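If you hit the same symptom, it may be worth comparing the clocks on every config server, mongos and shard host before digging deeper. A rough sketch, assuming ntpstat/ntpdate are available on your distribution:
$ date -u                       # run on each host and compare the output
$ ntpstat                       # is this host synchronised to an NTP server?
$ sudo ntpdate -q pool.ntp.org  # query the clock offset without stepping the clock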
Hope this helps someone with a similar issue :)
I have a setup with 3 Mesos masters and 3 Mesos slaves. After making all the required configuration changes I can see the 3 Mesos masters are part of a cluster maintained by ZooKeeper.
Now I have set up the 3 Mesos slaves, and when I start the mesos-slave service I expect the slaves to become visible on the Mesos masters' web UI page. But I cannot see any of them in the Slaves tab.
SELinux, the firewall, and iptables are all disabled, and SSH works between the nodes.
[cloud-user@slave1 ~]$ sudo systemctl status mesos-slave -l
mesos-slave.service - Mesos Slave
Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled)
Active: active (running) since Sat 2016-01-16 16:11:55 UTC; 3s ago
Main PID: 2483 (mesos-slave)
CGroup: /system.slice/mesos-slave.service
├─2483 /usr/sbin/mesos-slave --master=zk://10.0.0.2:2181,10.0.0.6:2181,10.0.0.7:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins
├─2493 logger -p user.info -t mesos-slave[2483]
└─2494 logger -p user.err -t mesos-slave[2483]
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628670 2497 detector.cpp:482] A new leading master (UPID=master@127.0.0.1:5050) is detected
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628732 2497 slave.cpp:729] New master detected at master@127.0.0.1:5050
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628825 2497 slave.cpp:754] No credentials provided. Attempting to register without authentication
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628844 2497 slave.cpp:765] Detecting new master
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628872 2497 status_update_manager.cpp:176] Pausing sending status updates
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: E0116 16:11:55.628922 2503 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.629093 2502 slave.cpp:3215] master@127.0.0.1:5050 exited
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: W0116 16:11:55.629107 2502 slave.cpp:3218] Master disconnected! Waiting for a new master to be elected
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: E0116 16:11:55.983531 2503 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
Jan 16 16:11:57 slave1.novalocal mesos-slave[2494]: E0116 16:11:57.465049 2503 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
So the problematic line is:
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.629093 2502 slave.cpp:3215] master@127.0.0.1:5050 exited
Specifically, note that it's detecting the master as having the IP address 127.0.0.1. The Mesos agent[1] sees that IP address and tries to connect, which fails (the master isn't running on the same machine as the agent).
This happens because the master announces what it thinks its IP address is into ZooKeeper. In your case, the master thinks its IP is 127.0.0.1 and stores that into ZooKeeper. Mesos has several configuration flags to control this behavior, mainly --hostname, --no-hostname_lookup, --ip, --ip_discovery_command, and the environment variable LIBPROCESS_IP. See http://mesos.apache.org/documentation/latest/configuration/ for details about them and what they do.
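As a rough sketch of what that can look like, assuming the master's routable address is 10.0.0.2 and the same ZooKeeper ensemble as in your mesos-slave command line (the hostname and quorum values here are illustrative, not taken from your setup):
# make libprocess bind to and announce the routable address instead of 127.0.0.1
export LIBPROCESS_IP=10.0.0.2
/usr/sbin/mesos-master \
  --zk=zk://10.0.0.2:2181,10.0.0.6:2181,10.0.0.7:2181/mesos \
  --quorum=2 \
  --work_dir=/var/lib/mesos \
  --ip=10.0.0.2 \
  --hostname=master1.example.com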
The best thing you can do to make sure things work out of the box is to make sure the machines have resolvable hostnames. Mesos does a reverse-DNS lookup of the box's hostname in order to figure out which IP others will contact it on.
If you can't get the hostnames set up properly, I would recommend setting --hostname and --ip manually, which should cause Mesos to announce exactly what you want.
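A quick way to check whether hostname resolution is the culprit (run on each master); the name should resolve to a routable address, not to 127.0.0.1 via /etc/hosts:
$ hostname -f                    # the name the master will announce
$ getent hosts "$(hostname -f)"  # should print a routable IP, not 127.0.0.1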
[1] The Mesos slave has been renamed to agent, see: https://issues.apache.org/jira/browse/MESOS-1478