Concourse error - discovering any new version loop - concourse

Has anyone encountered an infinite loopdiscovering any new versions when building? I suspect there's an issue with one of the workers, does this worker config look ok? Installing the binary - my worker config is shown below.Fly workers list returns nothing.
concourse:
worker:
config:
name: ci_worker01
bind-ip: 0.0.0.0
bind-port: 7777
tsa-host: 127.0.0.1
tsa-port: 2222
tsa-public-key: /opt/concourse/.ssh/id_web_rsa.pub
tsa-worker-private-key: /opt/concourse/.ssh/id_worker_rsa
service: True

Related

Docker Compose Config Server unreachable by Spring Cloud Data Flow Microservices

The config server is reachable from localhost:8888 but when I deploy my applications on SCDF the following error occurs:
Fetching config from server at : http://localhost:8888
2021-07-30 14:58:53.535 INFO 143 --- [ main] o.s.b.context.config.ConfigDataLoader : Connect Timeout Exception on Url - http://localhost:8888. Will be trying the next url if available
2021-07-30 14:58:53.535 WARN 143 --- [ main] o.s.b.context.config.ConfigDataLoader : Could not locate PropertySource ([ConfigServerConfigDataResource#3de88f64 uris = array<String>['http://localhost:8888'], optional = true, profiles = list['default']]): I/O error on GET request for "http://localhost:8888/backend-service/default": Connection refused (Connection refused); nested exception is java.net.ConnectException: Connection refused (Connection refused)
The application(s) deploy successfully on SCDF apart from the config server connection. The only property I specify in SCDF is the docker network. I'm using spring.config.import and am not using any bootstraps. This all works correctly when deployed locally but the microservices can't connect to the config server when deployed on SCDF.
Spring Boot Version: 2.5.1
app properties
spring.application.name=backend-service
spring.cloud.config.fail-fast=true
spring.cloud.config.retry.max-attempts=6
spring.cloud.config.retry.max-interval=11000
spring.config.import=optional:configserver:http://localhost:8888
config server properties
spring.cloud.config.server.git.uri=...
management.endpoints.web.exposure.include=*
spring.cloud.config.fail-fast=true
spring.cloud.config.retry.max-attempts=6
spring.cloud.config.retry.max-interval=11000
spring.cloud.bus.id=my-config-server
spring.cloud.stream.rabbit.bindings.springCloudBus.consumer.declareExchange=false
spring.rabbitmq.host=127.0.0.1
spring.rabbitmq.port=5672
spring.rabbitmq.username=guest
spring.rabbitmq.password=guest
spring.cloud.bus.enabled=true
spring.cloud.bus.refresh.enabled: true
spring.cloud.bus.env.enabled: true
server.port=8888
docker-compose.yml
version: '3.1'
services:
h2:
...
rabbitmq-container:
image: rabbitmq:3.7.14-management
hostname: dataflow-rabbitmq
expose:
- '5672'
ports:
- "5672:5672"
- "15672:15672"
networks:
- scdfnet
dataflow-server:
...
networks:
- scdfnet
app-import:
...
networks:
- scdfnet
skipper-server:
...
networks:
- scdfnet
configserver-container:
image: ...
ports:
- "8888:8888"
expose:
- '8888'
environment:
- spring_rabbitmq_host=rabbitmq-container
- spring_rabbitmq_port=5672
- spring_rabbitmq_username=guest
- spring_rabbitmq_password=guest
depends_on:
- rabbitmq-container
networks:
- scdfnet
networks:
scdfnet:
external:
name: scdfnet
volumes:
h2-data:
For anyone else having this problem, I have found two ways of solving it. The problem is that once the Spring Boot application is containerized, the localhost referred to in the properties file will cause the program to fetch the localhost of the application container's virtual network and not that of your local machine.
There are numerous Stack Overflow answers for this same error but all center around corrections to bootstrap properties. However, bootstrap context initialization is deprecated since Spring Boot 2.4.
The first solution is to use your IPv4 address instead of localhost.
spring.config.import=configserver:http://<insert IPv4 address>:8888
For Example:
spring.config.import=configserver:http://10.6.39.148:8888
A much better solution than hardwiring addresses is to reference the config server container running in docker compose:
spring.config.import=optional:configserver:http://configserver-container:8888
Make sure that all of the Docker Compose services are running on the same network (scdf_network in my case) and note that this address will only work when running on docker-compose so if you are building the maven file on Eclipse, you may need to remove or disable your tests to build successfully. That might be unnecessary; it could just be that there is some property that I failed to copy to my local application.properties file which is causing the context tests to fail. According to the documentation, the optional label should allow the config client to run even if contact cannot be established with the config server.

Need help setting BROKER_URL in Airflow's config and Celery Executor

Summary
I'm using Apache-Airflow for the first time. I've gotten the webserver, SequentialExecutor and LocalExecutor to work, but I'm running into issues when using the CeleryExecutor with rabbitmq-server. I currently have two AWS EC2 instances.
Error
To summarize: My worker cannot connect to the rabbitmq-server on my scheduler node. Whenever I run airflow worker on the worker instance, it gives:
- ** ---------- [config]
- ** ---------- .> app: airflow.executors.celery_executor:0x7f53a8dce400
- ** ---------- .> transport: amqp://guest:**#localhost:5672//
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> default exchange=default(direct) key=default
[2019-02-15 02:26:23,742: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**#127.0.0.1:5672//: [Errno 111] Connection refused.
Configuration
I followed all of the directions I could find online. Both instances have the same airflow.cfg file, with
[core]
executor = CeleryExecutor
[celery]
broker_url = pyamqp://username:password#hostname:port/virtual_host
and result_backend pointing at the same MySQL database on RDS that airflow is working off of.
From what I could tell, no matter what, the worker node always tried connecting to a local rabbitmq-server and completely ignored that broker_url in my airflow.cfg file.
What I've Tried
I went spelunking in the source code, and noticed in celery/app/base.py, if I error log out the configurations it gets in _get_config() when it goes to create a connection, there are actually TWO values in the dictionary returned.
BROKER_URL = None
broker_url = pyamqp://username:password#hostname:port/virtual_host
and all of the connection logic seems to point at the BROKER_URL key.
I tried setting BROKER_URL and CELERY_BROKER_URL in airflow.cfg, but it seems to be case insensitive, and ignores the latter. Just to see if it would work, I modified the _get_config() method and hacked in:
s['BROKER_URL'] = s['broker_url']
return s
And, like I expected, everything started working.
Am I doing something wrong? I'd really rather not use this hack, but I can't understand why it's ignoring the configuration values.
Thanks!
From the error message, it seems like the hostname being passed in the URI is wrong:
If rabbitmq-server and worker are in different machines: instead of localhost/127.0.0.1, the hostname should be the IP address of the rabbitmq machine
If rabbitmq-server and worker are in the same machine as part of a Docker Compose application (e.g. if you took inspiration from here): the hostname should be the service name associated to the RabbitMQ image in docker-compose.yml, e.g. amqp://guest:guest#rabbitmq:5672/

Received AliveMessage from a peer with the same PKI-ID as myself

I am attempting to port the Hyperledger Fabric Getting Started to Kubernetes. But am struggling to get peer1's to deploy. If I enable CORE_PEER_GOSSIP_BOOTSTRAP, I receive errors "Received AliveMessage from a peer with the same PKI-ID as myself".
How can I debug a peer reportedly having the same PKI-ID as another?
Using this as a starting point:
https://hyperledger-fabric.readthedocs.io/en/latest/getting_started.html
I am able to create:
orderer and cli pods in default namespace
peer0's one in each org1|org2 namespace.
peer1's but only if I disable (comment out) CORE_PEER_GOSSIP_BOOTSTRAP
If I enable CORE_PEER_GOSSIP_BOOTSTRAP for the peer1's, I receive the following warning and error:
[gossip/gossip#10.0.0.10:7051] NewGossipService -> WARN 01c External endpoint is empty, peer will not be accessible outside of its organization
...
[gossip/discovery#10.0.0.10:7051] handleAliveMessage -> ERRO 02a Bad configuration detected: Received AliveMessage from a peer with the same PKI-ID as myself: tag:EMPTY alive_msg:<membership:<pki_id:"[[REDACTED]]" > timestamp:<inc_number:1495468533769417608 seq_num:416 > >
In order to better map the Orderer, Peers to DNS names, I'm using Kubernetes Namespaces and this configuration:
OrdererOrgs:
- Name: Orderer
Domain: default.svc.cluster.local
Specs:
- Hostname: orderer
PeerOrgs:
- Name: Org1
Domain: org1.svc.cluster.local
Template:
Count: 2
Users:
Count: 2
- Name: Org2
Domain: org2.svc.cluster.local
Template:
Count: 2
Users:
Count: 2
In order to expose the peer0's to the other peers in the org and to expose the orderer, I have ClusterIP services for the peer0's (selecting only the peer0's) and orderer. It's inelegant but I'm trying to get it to work before I get it working more beautifully.
I am able to resolve orderer.default.svc.cluster.local, peer0.org1.svc.cluster.local, `peer0.org2.svc.cluster.local' using nslookup from within a pod deployed to default on the cluster.
Absent a curl-like tool for gPRC, I am able to open sockets against these endpoints on 7051 and 7053.
First, make sure you are using the right certificates.
Second, verify that your environment/configuration for gossip is set correctly
environment:
- CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org1.example.com:8051
- CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org1.example.com:7051
- CORE_PEER_GOSSIP_ENDPOINT=peer0.org1.example.com:7051
OR in core.yaml
peer:
gossip:
bootstrap: peer0.org1.example.com:7051
externalEndpoint: peer1.org1.example.com:8051
endpoint: peer0.org1.example.com:7051
Edited: Also make sure that you have properly setup your CA
Hope this helps, it worked for me. And I was successfully able to connect peers.
If the peers are started from the same node, its possible that you are mounting the same crypto-material (path to mspconfig directory) for both the peers. If that is the case, separate the directory structures for both the peers and keep their respective certificates in them, update the respective paths for msp in docker-compose file and try to run.

mesos-master crash with zookeeper cluster

I am deploying a zookeeper cluster which has 3 nodes. I use it to keep my mesos master high availability. I download the zookeeper-3.4.6.tar.gz tarball and uncompress it to /opt, rename it to /opt/zookeeper, enter the directory, edit the conf/zoo.cfg(pasted below), create a myid file in dataDir(which is set to /var/lib/zookeeper in zoo.cfg), and start zookeeper using ./bin/zkServer.sh start, and it goes well. I start all the 3 nodes one by one and they all seems well. I use ./bin/zkCli.sh to connect the server , no problem.
But when I start mesos (3 masters and 3 slaves, each node runs a master and a slave), then the masters soon crashed, one by one, and in the webpage http://mesos_master:5050, slave tab, no slaves are displayed. But when I run only one zookeeper, these are all fine. So I think it's the zookeeper cluster's problem.
I got 3 PV host in my ubuntu server. they are all running ubuntu 14.04 LTS:
node-01, node-02, node-03,
I have /etc/hosts in all three nodes like this:
172.16.2.70 node-01
172.16.2.81 node-02
172.16.2.80 node-03
I installed zookeeper, mesos on all the three nodes. Zookeeper configure file is like this (all three nodes) :
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node-01:2888:3888
server.2=node-02:2888:3888
server.3=node-03:2888:3888
they can be started normally and run well. And then I start the mesos-master service, using the command line ./bin/mesos-master.sh --zk=zk://172.16.2.70:2181,172.16.2.81:2181,172.16.2.80:2181/mesos --work_dir=/var/lib/mesos --quorum=2, and after a few seconds, it gives me errors like this:
F0817 15:09:19.995256 2250 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
# 0x7fa2b8be71a2 google::LogMessage::Fail()
# 0x7fa2b8be70ee google::LogMessage::SendToLog()
# 0x7fa2b8be6af0 google::LogMessage::Flush()
# 0x7fa2b8be9a04 google::LogMessageFatal::~LogMessageFatal()
▽
# 0x7fa2b81a899a mesos::internal::master::fail()
▽
# 0x7fa2b8262f8f _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
▽
# 0x7fa2b823fba7 _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
# 0x7fa2b820f9f3 _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
# 0x7fa2b826305c _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x499480 process::Future<>::fail()
# 0x7fa2b806b4b4 process::Promise<>::fail()
# 0x7fa2b826011b process::internal::thenf<>()
# 0x7fa2b82a0757 _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b82962d9 std::_Bind<>::operator()<>()
# 0x7fa2b827ee89 std::_Function_handler<>::_M_invoke()
I0817 15:09:20.098639 2248 http.cpp:283] HTTP GET for /master/state.json from 172.16.2.84:54542 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36'
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b827efaf _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
# 0x7fa2b82a07fe _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b82e4419 process::internal::run<>()
# 0x7fa2b82da22a process::Future<>::fail()
# 0x7fa2b83136b5 std::_Mem_fn<>::operator()<>()
# 0x7fa2b830efdf _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b8307d7f _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
# 0x7fa2b82fe431 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
# 0x7fa2b830f065 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x7fa2b82da202 process::Future<>::fail()
# 0x7fa2b82d2d82 process::Promise<>::fail()
Aborted
sometimes the warning is like this, and then crashed with the same output above:
0817 15:09:49.745750 2104 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
I want to know whether zookeeper is deployed and run well in my case, and How can I locate where the problem is. Any answers and suggests are welcomed. thanks.
Actually, in my case, It's because I didn't open firewall port 5050 to allow three servers to communicate with each others. After updating firewall rule, it starts to work as expected.
I fall into same issue, I tried different ways and different options and finally --ip option worked for me. Initially I used --hostname option
mesos-master --ip=192.168.0.13 --quorum=2 --zk=zk://m1:2181,m2:2181,m3:2181/mesos --work_dir=/opt/mm1 --log_dir=/opt/mm1/logs
You need to check that all mesos/zookeeper master nodes can communicate correctly. For that, you need:
Zookeeper ports open: TCP 2181, 2888, 3888
Mesos port open: TCP 5050
ping available (ICMP message 0 and 8)
If you use FQDN instead of IP in your config, check that the DNS resolution is working correctly as well.
Split your mesos masters' work_dir to different dir, do not use a share work_dir for all masters, because of zk

Hector test example not working on Cassandra 0.7.4

I have set up my single node Cassandra 0.7.4 and started the service with
bin/cassandra -f. Now I am trying to use the Hector API (v. 0.7.0) to manage the
DB.
The Cassandra CLI works fine and I can create keyspaces and so on.
I tried to run the test example and create a single keyspace:
Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
new CassandraHostConfigurator("localhost:9160"));
Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);
But all I get is this:
2011-04-14 22:20:27,469 [main ] INFO
me.prettyprint.cassandra.connection.CassandraHostRetryService
- Downed Host
Retry service started with queue size -1 and retry delay 10s
2011-04-14 22:20:27,492 [main ] DEBUG
me.prettyprint.cassandra.connection.HThriftClient -
Transport open status false
for client CassandraClient<localhost:9160-1>
....this again about 20 times
me.prettyprint.cassandra.service.JmxMonitor - Registering JMX
me.prettyprint.cassandra.service_TestCluster:ServiceType=hector,
MonitorType=hector
2011-04-14 22:20:27,636 [Thread-0 ] INFO
me.prettyprint.cassandra.connection.CassandraHostRetryService -
Downed Host
retry shutdown hook called
2011-04-14 22:20:27,646 [Thread-0 ] INFO
me.prettyprint.cassandra.connection.CassandraHostRetryService -
Downed Host
retry shutdown complete
Can you please tell me what I'm doing wrong?
Thanks
When you connect via the CLI, do you specify "-h localhost -p 9160"?
Can you actually do stuff on the command line with the above?
The error from HThriftClient indicates it could not connect to the Cassandra Daemon.
FTR, you would get responses much faster via hector-users#googlegroups.com
If you are on a linux machine, try starting up your cassandra server by this command:
/bin$ ./cassandra start -f
Then for the cli, use this command:
./cassandra-cli -h {hostname}/9160.
Then make sure that the configures are ok.