Ceph Octopus: ceph command hangs, and no data for Prometheus either

I'm facing an issue with ceph. I cannot run any ceph command. It literally hangs. I need to hit CTRL-C to get this:
^CCluster connection interrupted or timed out
This is on Ubuntu 16.04. I also use Grafana with Prometheus to get information from the cluster, but now there is no data to graph. Any clue?
cephadm version
INFO:cephadm:Using recent ceph image ceph/ceph:v15
ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
cephadm ls
[
{
"style": "cephadm:v1",
"name": "mon.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T19:03:16.759730",
"created": "2020-09-04T23:30:30.250336",
"deployed": "2020-09-04T23:48:20.956277",
"configured": "2020-09-04T23:48:22.100283"
},
{
"style": "cephadm:v1",
"name": "mgr.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T20:43:38.329529",
"created": "2020-09-04T23:30:31.110341",
"deployed": "2020-09-04T23:47:41.604057",
"configured": "2020-09-05T00:00:21.064246"
}
]
Thank you in advance.

This was solved by rebooting the MON machines.
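Before rebooting, it can help to see at a glance what cephadm reports for each daemon. The following is a minimal Python sketch, assuming the JSON printed by cephadm ls has the shape shown in the question; the helper name summarize_daemons is hypothetical and the sample data is trimmed from the output above.

```python
import json

def summarize_daemons(cephadm_ls_json: str) -> list:
    """Return (name, state, version) for each daemon in `cephadm ls` JSON output."""
    daemons = json.loads(cephadm_ls_json)
    return [
        (d.get("name", "?"), d.get("state", "?"), d.get("version", "?"))
        for d in daemons
    ]

# Trimmed copy of the output posted in the question.
sample = """[
  {"name": "mon.osswrkprbe001", "state": "running", "version": "15.2.1"},
  {"name": "mgr.osswrkprbe001", "state": "running", "version": "15.2.1"}
]"""

for name, state, version in summarize_daemons(sample):
    print(f"{name}: {state} (version {version})")
```

Note that in the question both daemons are reported as "running" while the ceph command still hangs, so a running container does not by itself show that the monitor is responding.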


My pulumi stack was created with an older plugin version that I can't install on M1 mac

I have a Pulumi project which I haven't touched (deployed etc) for a while.
Now I need to make some changes, but I get the "403 HTTP error fetching plugin" problem described in the Pulumi docs.
The description in the docs makes sense: I have bought a new Apple M1 laptop since last time I worked on it, my stack was built with the digitalocean provider v3.1.1 but I can't install that version on my new laptop.
The docs say that if you have access to an Intel system, set up your project there, update the providers and run pulumi up. The implication is that you can install both old and latest versions of the provider plugin, pulumi up will update the stack using the latest version. After that I should be able to manage the stack from my new laptop using the latest provider version.
I asked on Pulumi Slack about this and confirmed the above.
But it doesn't seem to work for me.
First I started a Docker container with --platform=linux/amd64 to emulate Intel. Inside that I checked out my project, installed pulumi and the old and new provider versions.
In my docker container pulumi plugin ls shows:
NAME          KIND      VERSION  SIZE   INSTALLED  LAST USED
cloudflare    resource  4.7.0    38 MB  n/a        1 day ago
cloudflare    resource  2.8.0    46 MB  n/a        1 day ago
digitalocean  resource  4.14.0   42 MB  n/a        1 day ago
digitalocean  resource  3.1.1    45 MB  n/a        1 day ago
From there I successfully ran pulumi up.
However if I go back to my local shell and try pulumi preview I get:
error: could not load plugin for digitalocean provider 'urn:pulumi:staging::myproject::pulumi:providers:digitalocean::default': no resource plugin 'pulumi-resource-digitalocean' found in the workspace at version v3.1.1 or on your $PATH, install the plugin using pulumi plugin install resource digitalocean v3.1.1
So despite updating the stack Pulumi is still trying to use the old provider version that I can't install.
How do I get around this?
You also need to update your pulumi program's dependencies.
Simply installing the new plugin isn't enough.
If you do pulumi stack export you'll see a JSON file with all your resources. Those resources have a provider attached to them with a specific version of the plugin. As an example:
{
"version": 3,
"deployment": {
"manifest": {
"time": "2022-06-23T12:03:30.071863-07:00",
"magic": "eccb7d9cc1cab43d7465783c52b0648063d5e7228dd3bb2fc7600583a8bca5d5",
"version": "v3.34.1"
},
"secrets_providers": {
"type": "service",
"state": {
"url": "https://api.pulumi.com",
"owner": "jaxxstorm",
"project": "s3_event_bridge",
"stack": "dev"
}
},
"resources": [
{
"urn": "urn:pulumi:dev::s3_event_bridge::pulumi:pulumi:Stack::s3_event_bridge-dev",
"custom": false,
"type": "pulumi:pulumi:Stack",
"outputs": {
"bucketName": "test001-c86ab36"
},
"sequenceNumber": 1
},
{
"urn": "urn:pulumi:dev::s3_event_bridge::pulumi:providers:aws::default_5_9_1",
"custom": true,
"id": "484b75ed-d5cd-4ee1-96d1-b3f641236ab6",
"type": "pulumi:providers:aws",
"inputs": {
"region": "us-west-2",
"version": "5.9.1"
},
"outputs": {
"region": "us-west-2",
"version": "5.9.1"
},
"sequenceNumber": 1
},
{
"urn": "urn:pulumi:dev::s3_event_bridge::aws:s3/bucket:Bucket::test001",
"custom": true,
"id": "test001-c86ab36",
"type": "aws:s3/bucket:Bucket",
"inputs": {
"__defaults": [
"bucket",
"forceDestroy"
],
"acl": "private",
"bucket": "test001-c86ab36",
"forceDestroy": false,
"tags": {
"Environment": "Dev",
"Name": "My bucket",
"__defaults": []
}
},
"outputs": {
"accelerationStatus": "",
"acl": "private",
"arn": "arn:aws:s3:::test001-c86ab36",
"bucket": "test001-c86ab36",
"bucketDomainName": "test001-c86ab36.s3.amazonaws.com",
"bucketRegionalDomainName": "test001-c86ab36.s3.us-west-2.amazonaws.com",
"corsRules": [],
"forceDestroy": false,
"grants": [],
"hostedZoneId": "Z3BJ6K6RIION7M",
"id": "test001-c86ab36",
"lifecycleRules": [],
"loggings": [],
"objectLockConfiguration": null,
"region": "us-west-2",
"replicationConfiguration": null,
"requestPayer": "BucketOwner",
"serverSideEncryptionConfiguration": null,
"tags": {
"Environment": "Dev",
"Name": "My bucket"
},
"tagsAll": {
"Environment": "Dev",
"Name": "My bucket"
},
"versioning": {
"enabled": false,
"mfaDelete": false
},
"website": null
},
"parent": "urn:pulumi:dev::s3_event_bridge::pulumi:pulumi:Stack::s3_event_bridge-dev",
"provider": "urn:pulumi:dev::s3_event_bridge::pulumi:providers:aws::default_5_9_1::484b75ed-d5cd-4ee1-96d1-b3f641236ab6",
"propertyDependencies": {
"acl": null,
"tags": null
},
"sequenceNumber": 1
},
{
"urn": "urn:pulumi:dev::s3_event_bridge::aws:s3/bucketNotification:BucketNotification::bucketNotification",
"custom": true,
"id": "test001-c86ab36",
"type": "aws:s3/bucketNotification:BucketNotification",
"inputs": {
"__defaults": [],
"bucket": "test001-c86ab36",
"eventbridge": false
},
"outputs": {
"bucket": "test001-c86ab36",
"eventbridge": false,
"id": "test001-c86ab36",
"lambdaFunctions": [],
"queues": [],
"topics": []
},
"parent": "urn:pulumi:dev::s3_event_bridge::pulumi:pulumi:Stack::s3_event_bridge-dev",
"dependencies": [
"urn:pulumi:dev::s3_event_bridge::aws:s3/bucket:Bucket::test001"
],
"provider": "urn:pulumi:dev::s3_event_bridge::pulumi:providers:aws::default_5_9_1::484b75ed-d5cd-4ee1-96d1-b3f641236ab6",
"propertyDependencies": {
"bucket": [
"urn:pulumi:dev::s3_event_bridge::aws:s3/bucket:Bucket::test001"
],
"eventbridge": null
},
"sequenceNumber": 1
}
]
}
}
If you look at my BucketNotification resource, you can see a provider field which has a version in it for the AWS provider I've used:
"provider": "urn:pulumi:dev::s3_event_bridge::pulumi:providers:aws::default_5_9_1::484b75ed-d5cd-4ee1-96d1-b3f641236ab6"
Which in this case is 5.9.1
So, to fix this problem, you need to update your resources to use a newer version of the provider.
How you do this depends on the language you're using with Pulumi.
If you're using TypeScript or JavaScript, update the @pulumi/digitalocean dependency in your package.json.
If you're using Python, update pulumi_digitalocean in your requirements.txt.
Make sure you then apply the update with your package manager, for example npm update or pip3 install --upgrade -r requirements.txt.
The same applies if you're using .NET, Go or Java.
Then you need to run a successful pulumi up. Pulumi will update the provider version associated with each resource, as you saw above; you can verify this by running pulumi stack export again.
From here, you should be able to use your M1 Mac without the legacy plugins.
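To check which provider versions a stack still references, the export can be scanned for provider resources. This is a minimal Python sketch, assuming the export has the shape shown above (provider resources have a type starting with pulumi:providers: and may carry a version in their inputs). The function name provider_versions is hypothetical, and the sample is a trimmed copy of the export above.

```python
import json

def provider_versions(stack_export: dict) -> dict:
    """Map each provider resource URN to the version in its inputs (None if absent)."""
    versions = {}
    for res in stack_export.get("deployment", {}).get("resources", []):
        if res.get("type", "").startswith("pulumi:providers:"):
            versions[res["urn"]] = res.get("inputs", {}).get("version")
    return versions

# Trimmed from the example export; a real one could be loaded with json.load()
# from a file produced by redirecting `pulumi stack export` output.
export = {
    "deployment": {
        "resources": [
            {
                "urn": "urn:pulumi:dev::s3_event_bridge::pulumi:pulumi:Stack::s3_event_bridge-dev",
                "type": "pulumi:pulumi:Stack",
            },
            {
                "urn": "urn:pulumi:dev::s3_event_bridge::pulumi:providers:aws::default_5_9_1",
                "type": "pulumi:providers:aws",
                "inputs": {"region": "us-west-2", "version": "5.9.1"},
            },
        ]
    }
}

for urn, version in provider_versions(export).items():
    print(f"{urn} -> {version}")
```

A provider entry may have no version recorded at all (the URN in the question's error message ends in plain default), which is why the sketch returns None in that case instead of failing.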

Pod is being terminated and created again due to scale-up, and it's running twice

I have an application that runs some code and, at the end, sends an email with a report of the data. When I deploy pods on GKE, certain pods get terminated and a new pod is created due to autoscaling, but the problem is that the termination happens after my code has finished, so the email is sent twice for the same data.
Here is the JSON file of the deploy API:
{
"apiVersion": "batch/v1",
"kind": "Job",
"metadata": {
"name": "$name",
"namespace": "$namespace"
},
"spec": {
"template": {
"metadata": {
"name": "********"
},
"spec": {
"priorityClassName": "high-priority",
"containers": [
{
"name": "******",
"image": "$dockerScancatalogueImageRepo",
"imagePullPolicy": "IfNotPresent",
"env": $env,
"resources": {
"requests": {
"memory": "2000Mi",
"cpu": "2000m"
},
"limits":{
"memory":"2650Mi",
"cpu":"2650m"
}
}
}
],
"imagePullSecrets": [
{
"name": "docker-secret"
}
],
"restartPolicy": "Never"
}
}
}
}
and here is a screen-shot of the pod events:
Any idea how to fix that?
Thank you in advance.
Perhaps you are affected by this: "Note that even if you specify .spec.parallelism = 1 and .spec.completions = 1 and .spec.template.spec.restartPolicy = "Never", the same program may sometimes be started twice." (from the docs). What happens if you increase terminationGracePeriodSeconds in your YAML file?
@danyL
My problem was that I had other jobs deploying pods with higher priority on my nodes, so it was trying to terminate my running pods even though the job was already done and the email had already been sent. I fixed the problem by correcting the resource requests and limits in all my JSON files. I don't know if it's the perfect solution, but for now it has solved my problem.
Thank you all for your help.
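Since the documentation quoted in the comment warns that the same program may be started twice, another option is to make the email step itself safe to repeat. The following is only a Python sketch of that idea, not something from the thread: it claims a marker file atomically before sending. The names claim_once and send_report_once, the report id, and the marker directory are all hypothetical, and in a real cluster the marker would have to live on storage shared by every pod (a database row or object-store key could play the same role).

```python
import os
import tempfile

def claim_once(marker_path: str) -> bool:
    """Atomically create marker_path; return True only for the first caller."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True

def send_report_once(report_id: str, marker_dir: str, send) -> bool:
    """Call send() only if no earlier run has claimed report_id."""
    marker = os.path.join(marker_dir, f"{report_id}.sent")
    if not claim_once(marker):
        return False
    send()
    return True

# Demonstration with a stand-in for the real email function.
sent = []
with tempfile.TemporaryDirectory() as marker_dir:
    first = send_report_once("report-1", marker_dir, lambda: sent.append("email"))
    second = send_report_once("report-1", marker_dir, lambda: sent.append("email"))

print(first, second, sent)
```

Because the marker is claimed before sending, a crash between the claim and the send means the email is never sent; that trade-off (at most once rather than at least once) is a design choice of this sketch.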

Hyperledger Explorer Error 12 UNIMPLEMENTED: service protos.Endorser

I am trying to run Hyperledger Explorer for my blockchain network. I have followed the Hyperledger Explorer instructions almost word for word.
But whenever I make the final call, ./start.sh, I get a litany of errors:
error: [client-utils.js]: sendPeersProposal - Promise is rejected: Error: 12 UNIMPLEMENTED: unknown service protos.Endorser
at new createStatusError (/home/ubuntu/bludev/blockchain-explorer/node_modules/grpc/src/client.js:64:15)
at /home/ubuntu/bludev/blockchain-explorer/node_modules/grpc/src/client.js:583:15
error: [Client.js]: Failed Installed Chaincodes Query. Error: Error: 12 UNIMPLEMENTED: unknown service protos.Endorser
at new createStatusError (/home/ubuntu/bludev/blockchain-explorer/node_modules/grpc/src/client.js:64:15)
at /home/ubuntu/bludev/blockchain-explorer/node_modules/grpc/src/client.js:583:15
...
And so on. For reference, I am using Node.js 6.9 and PostgreSQL 9.5. This is what my config.json file looks like:
{
"network-config": {
"org1": {
"name": "peerOrg1",
"mspid": "Org1MSP",
"peer1": {
"requests": "grpc://127.0.0.1:7051",
"events": "grpc://127.0.0.1:7053",
"server-hostname": "peer0.org1.example.com",
"tls_cacerts": "/home/ubuntu/bludev/fabric-samples/first-network/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/ca.crt"
},
"admin": {
"key": "/home/ubuntu/bludev/hlcomposer/fabric-dev-servers/fabric-scripts/hlfv1/composer/crypto-config/peerOrganizations/org1.example.com/users/Admin@org1.example.com/msp/keystore",
"cert": "/home/ubuntu/bludev/hlcomposer/fabric-dev-servers/fabric-scripts/hlfv1/composer/crypto-config/peerOrganizations/org1.example.com/users/Admin@org1.example.com/msp/signcerts"
}
}
},
"host": "localhost",
"port": "3000",
"channel": "mychannel",
"keyValueStore": "/tmp/fabric-client-kvs",
"eventWaitTime": "30000",
"pg": {
"host": "12.109.99.233",
"port": "3000",
"database": "fabricexplorer",
"username": "postgres",
"passwd": "password1"
}}
The problem is that your Hyperledger network does not have any endorser.
Try the first-network sample from the official fabric-samples folder, rebuild the explorer, and then try again.

How could a spring-boot application determine if it is running on cloud foundry?

I'm writing a microservice with Spring Boot. The database is MongoDB. The service works perfectly in my local environment, but after I deployed it to Cloud Foundry it doesn't work. The reason is that the connection to MongoDB times out.
I think the root cause is that the application doesn't know it is running in the cloud, because it is still connecting to 127.0.0.1:27017 rather than the redirected port.
How can it know it is running in the cloud? Thank you!
EDIT:
There is a mongodb instance bound to the service. And when I checked the environment information, I got following info:
{
"VCAP_SERVICES": {
"mongodb": [
{
"credentials": {
"hostname": "10.11.241.1",
"ports": {
"27017/tcp": "43417",
"28017/tcp": "43135"
},
"port": "43417",
"username": "xxxxxxxxxx",
"password": "xxxxxxxxxx",
"dbname": "gwkp7glhw9tq9cwp",
"uri": "xxxxxxxxxx"
},
"syslog_drain_url": null,
"volume_mounts": [],
"label": "mongodb",
"provider": null,
"plan": "v3.0-container",
"name": "mongodb-business-configuration",
"tags": [
"mongodb",
"document"
]
}
]
}
}
{
"VCAP_APPLICATION": {
"cf_api": "xxxxxxxxxx",
"limits": {
"fds": 16384,
"mem": 1024,
"disk": 1024
},
"application_name": "mock-service",
"application_uris": [
"xxxxxxxxxx"
],
"name": "mock-service",
"space_name": "xxxxxxxxxx",
"space_id": "xxxxxxxxxx",
"uris": [
"xxxxxxxxxx"
],
"users": null,
"application_id": "xxxxxxxxxx",
"version": "c7569d23-f3ee-49d0-9875-8e595ee76522",
"application_version": "c7569d23-f3ee-49d0-9875-8e595ee76522"
}
}
From my understanding, my Spring Boot service should try to connect to port 43417 rather than 27017, right? Thank you!
Finally, I found that the reason was that I hadn't specified the profile. After adding the following to my manifest.yml, it works:
env:
  SPRING_PROFILES_ACTIVE: cloud
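To illustrate what the bound service information contains, here is a small Python sketch (not what Spring Boot does internally) that treats the presence of VCAP_APPLICATION or VCAP_SERVICES as a sign of running on Cloud Foundry and reads the MongoDB host and port from the credentials shown above. The function names and the fallback to 127.0.0.1:27017 are assumptions for illustration.

```python
import json

def running_on_cloud_foundry(env: dict) -> bool:
    """Cloud Foundry sets these variables for every application instance."""
    return "VCAP_APPLICATION" in env or "VCAP_SERVICES" in env

def mongo_endpoint(env: dict, default=("127.0.0.1", 27017)):
    """Return (host, port) of the first bound mongodb service, else the default."""
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    for instance in services.get("mongodb", []):
        creds = instance.get("credentials", {})
        if "hostname" in creds and "port" in creds:
            return creds["hostname"], int(creds["port"])
    return default

# Environment trimmed from the question; in a real app pass os.environ instead.
sample_env = {
    "VCAP_SERVICES": json.dumps(
        {"mongodb": [{"credentials": {"hostname": "10.11.241.1", "port": "43417"}}]}
    )
}

print(running_on_cloud_foundry(sample_env), mongo_endpoint(sample_env))
print(running_on_cloud_foundry({}), mongo_endpoint({}))
```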

Presto: cannot connect to discovery server

Recently I built Presto in cluster mode with 1 coordinator and 1 worker, and it worked.
Then I repackaged "presto-main-0.148.jar" without any changes and used it to replace the one in the production environment, and now it doesn't work! I always get the response "No worker nodes available".
I searched Server.log and found the messages below:
ERROR Discovery-0 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (collector/general): Lookup of collector failed for ht*p://10.3.2.33:18080/v1/service/collector/general
ERROR Discovery-0 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for ht*p://10.3.2.33:18080/v1/service/presto/general
INFO Discovery-1 io.airlift.discovery.client.CachingServiceSelector Discovery server connect succeeded for refresh (collector/general)
INFO Discovery-2 io.airlift.discovery.client.CachingServiceSelector Discovery server connect succeeded for refresh (presto/general)
So I guessed that the discovery server had not started. But when I run curl "h*tp://10.3.2.33:18080/v1/service/collector/general",
I get the response below, and the coordinator status is also 'ACTIVE':
{
"environment": "presto_**_flt",
"services": [
{
"id": "954e886d-7506-4f00-b954-eeab49209835",
"nodeId": "4c0f2596-7e6e-11e6-ae22-56b6b6499611",
"type": "presto",
"pool": "general",
"location": "/4c0f2596-7e6e-11e6-ae22-56b6b6499611",
"properties": {
"node_version": "a0e36ae",
"coordinator": "false",
"http": "h*tp://10.3.2.24:18080",
"http-external": "h*tp://10.3.2.24:18080",
"datasources": "hive,system"
}
},
{
"id": "6790b522-cd17-48ef-b077-e4e8fa97e310",
"nodeId": "4c0f2366-7e6e-11e6-ae22-56b6b6499611",
"type": "presto",
"pool": "general",
"location": "/4c0f2366-7e6e-11e6-ae22-56b6b6499611",
"properties": {
"node_version": "c34bef3-dirty",
"coordinator": "true",
"http": "h*tp://10.3.2.33:18080",
"http-external": "h*tp://10.3.2.33:18080",
"datasources": ""
}
}
]
}
I think this is because you have two different node_version values in these two services.
If you are repackaging presto-main or any other component, make sure you are using the same binaries on all the nodes.
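A quick way to spot such a mismatch is to group the announced services by node_version. The following is a minimal Python sketch, assuming the discovery response has the shape shown in the question; the function name versions_by_node is hypothetical and the sample is trimmed from that response.

```python
from collections import defaultdict

def versions_by_node(discovery_response: dict) -> dict:
    """Group announced node ids by the node_version each service reports."""
    groups = defaultdict(list)
    for svc in discovery_response.get("services", []):
        version = svc.get("properties", {}).get("node_version", "unknown")
        groups[version].append(svc.get("nodeId", "?"))
    return dict(groups)

# Trimmed from the response posted in the question.
response = {
    "services": [
        {
            "nodeId": "4c0f2596-7e6e-11e6-ae22-56b6b6499611",
            "properties": {"node_version": "a0e36ae", "coordinator": "false"},
        },
        {
            "nodeId": "4c0f2366-7e6e-11e6-ae22-56b6b6499611",
            "properties": {"node_version": "c34bef3-dirty", "coordinator": "true"},
        },
    ]
}

groups = versions_by_node(response)
for version, nodes in groups.items():
    print(version, nodes)
if len(groups) > 1:
    print("Nodes report different versions; check that all nodes run the same binaries.")
```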