AWS CDK CloudFormationInit timeout when installing yum package - aws-cloudformation

I am trying to deploy the CDK stack below:
class MyCdkStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id=EXISTING_VPC_ID)

        amzn_linux = ec2.MachineImage.latest_amazon_linux(
            generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2
        )

        role = iam.Role(
            self, "Role", assumed_by=iam.ServicePrincipal("ec2.amazonaws.com")
        )
        role.add_managed_policy(
            iam.ManagedPolicy.from_aws_managed_policy_name(
                "AmazonSSMManagedInstanceCore"
            )
        )

        instance = ec2.Instance(
            self,
            "Instance",
            instance_type=ec2.InstanceType("t3.micro"),
            machine_image=amzn_linux,
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
            role=role,
            init=ec2.CloudFormationInit.from_elements(
                ec2.InitPackage.yum("docker"),
            ),
            init_options=ec2.ApplyCloudFormationInitOptions(
                timeout=Duration.minutes(5),
                ignore_failures=True,
            ),
        )

        # Allow ssh connections from anywhere
        instance.connections.allow_from_any_ipv4(ec2.Port.tcp(22))

        # Elastic IP
        eip = ec2.CfnEIP(self, "EIP", instance_id=instance.instance_id)

        # Outputs
        CfnOutput(self, "EIP Address", value=eip.ref)
The deployment fails after 5 minutes and rolls back with the following error message:
Failed to receive 1 resource signal(s) within the specified duration
Here are the possible problems I have considered:
The server might not have outbound internet access (but I have put it on a public subnet).
I've also tried using an Amazon Linux 2022 AMI instead.
The 5-minute timeout might not be sufficient (but I have tried increasing it to 15 minutes, to no avail).
There is something else wrong with my setup (but without the CloudFormationInit stuff the server is created as expected).
Yum installing docker might be impossible (but if I create the server without the CloudFormationInit stuff, I can SSH into the instance and then sudo yum install docker works).
The server is not allowed to send cfn signals (but the raw CloudFormation template created by CDK seems to include the relevant auto-generated user data and permissions, see below):
// Excerpts from autogenerated CDK template json
"UserData": {
  "Fn::Base64": {
    "Fn::Join": [
      "",
      [
        "#!/bin/bash\n# fingerprint: 7d8f48713aedxxxx\n(\n set +e\n /opt/aws/bin/cfn-init -v --region ",
        {
          "Ref": "AWS::Region"
        },
        " --stack ",
        {
          "Ref": "AWS::StackName"
        },
        " --resource Instance5FFEF8E4e0ce835dd5aaxxxx -c default\n /opt/aws/bin/cfn-signal -e 0 --region ",
        {
          "Ref": "AWS::Region"
        },
        " --stack ",
        {
          "Ref": "AWS::StackName"
        },
        " --resource Instance5FFEF8E4e0ce835dd5aaxxxx\n cat /var/log/cfn-init.log >&2\n)"
      ]
    ]
  }
}
// -----
"RoleDefaultPolicy5FFBxxx": {
  "Type": "AWS::IAM::Policy",
  "Properties": {
    "PolicyDocument": {
      "Statement": [
        {
          "Action": [
            "cloudformation:DescribeStackResource",
            "cloudformation:SignalResource"
          ],
          "Effect": "Allow",
          "Resource": {
            "Ref": "AWS::StackId"
          }
        }
      ],
      "Version": "2012-10-17"
    },
    "PolicyName": "RoleDefaultPolicy5FFB7xxx",
    "Roles": [
      {
        "Ref": "Role1ABCxxxx"
      }
    ]
  },
  "Metadata": {
    "aws:cdk:path": "xxx/Role/DefaultPolicy/Resource"
  }
},
Wondering what else there is left for me to try! Any help would be greatly appreciated. I have that sinking feeling that I've overlooked something obvious...
Edit:
In response to Paolo's comment, here is the full output from cdk synth with identifiers obfuscated.
Resources:
  Role1ABCXXXX:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - :iam::aws:policy/AmazonSSMManagedInstanceCore
    Metadata:
      aws:cdk:path: MyCDK/Role/Resource
  RoleDefaultPolicy5FFBXXXX:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - cloudformation:DescribeStackResource
              - cloudformation:SignalResource
            Effect: Allow
            Resource:
              Ref: AWS::StackId
        Version: "2012-10-17"
      PolicyName: RoleDefaultPolicy5FFBXXXX
      Roles:
        - Ref: Role1ABCXXXX
    Metadata:
      aws:cdk:path: MyCDK/Role/DefaultPolicy/Resource
  InstanceInstanceSecurityGroup698618EC:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: MyCDK/Instance/InstanceSecurityGroup
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          Description: Allow all outbound traffic by default
          IpProtocol: "-1"
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          Description: from 0.0.0.0/0:22
          FromPort: 22
          IpProtocol: tcp
          ToPort: 22
      VpcId: vpc-07848d9441fddea14
    Metadata:
      aws:cdk:path: MyCDK/Instance/InstanceSecurityGroup/Resource
  InstanceInstanceProfile01ECXXXX:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - Ref: Role1ABCXXXX
    Metadata:
      aws:cdk:path: MyCDK/Instance/InstanceProfile
  Instance5FFEF8E47f468d710e75XXXX:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: eu-central-1a
      IamInstanceProfile:
        Ref: InstanceInstanceProfile01ECXXXX
      ImageId:
        Ref: SsmParameterValueawsserviceamiamazonlinuxlatestamzn2amihvmx8664gp2C96584B6F00A464EAD1953AFF4B05118Parameter
      InstanceType: t3.micro
      SecurityGroupIds:
        - Fn::GetAtt:
            - InstanceInstanceSecurityGroup698618EC
            - GroupId
      SubnetId: subnet-079be82ff7754XXXX
      UserData:
        Fn::Base64:
          Fn::Join:
            - ""
            - - |-
                #!/bin/bash
                # fingerprint: 5af534616771e4af
                (
                 set +e
                 /opt/aws/bin/cfn-init -v --region 
              - Ref: AWS::Region
              - " --stack "
              - Ref: AWS::StackName
              - |-2
                 --resource Instance5FFEF8E47f468d710e75XXXX -c default
                 /opt/aws/bin/cfn-signal -e 0 --region 
              - Ref: AWS::Region
              - " --stack "
              - Ref: AWS::StackName
              - |-2
                 --resource Instance5FFEF8E47f468d710e75XXXX
                 cat /var/log/cfn-init.log >&2
                )
    DependsOn:
      - RoleDefaultPolicy5FFBXXXX
      - Role1ABCXXXX
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT5M
    Metadata:
      aws:cdk:path: MyCDK/Instance/Resource
      AWS::CloudFormation::Init:
        configSets:
          default:
            - config
        config:
          packages:
            yum:
              docker: []
  EIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId:
        Ref: Instance5FFEF8E47f468d710e75XXXX
    Metadata:
      aws:cdk:path: MyCDK/EIP
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/2VOyQ6CMBD9Fu5lFDwYz8YYTjbwAabWIY6UlnSJIU3/XcDt4OmteXklFFtYZ+LhcnntckUXiI0XsmM1OhOsRDZl50iih1gbhWzf6gW5USTHWf5YpZ0XWiK3piWFiaEsIX5c1qAMlvx4tXXXX//P+FYnfqh4Ssu+sKJHj3YWp+CH4JcX74OJ8dHfjF5tYAdFmd0dUW6D9tQj1C98AstX0JrnXXXX
    Metadata:
      aws:cdk:path: MyCDK/CDKMetadata/Default
Parameters:
  SsmParameterValueawsserviceamiamazonlinuxlatestamzn2amihvmx8664gp2C96584B6F00A464EAD1953AFF4B05118Parameter:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
  BootstrapVersion:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /cdk-bootstrap/hnb659fds/version
    Description: Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]
Outputs:
  EIPAddress:
    Value:
      Ref: EIP
Rules:
  CheckBootstrapVersion:
    Assertions:
      - Assert:
          Fn::Not:
            - Fn::Contains:
                - - "1"
                  - "2"
                  - "3"
                  - "4"
                  - "5"
                - Ref: BootstrapVersion
        AssertDescription: CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI.
Edit 2: Here is the cloud-init-output.log.
Cloud-init v. 19.3-45.amzn2 running 'init-local' at Mon, 30 May 2022 10:42:35 +0000. Up 6.48 seconds.
Cloud-init v. 19.3-45.amzn2 running 'init' at Mon, 30 May 2022 10:42:37 +0000. Up 7.60 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: | eth0 | True | 10.0.0.156 | 255.255.255.0 | global | 02:6c:e8:e3:39:84 |
ci-info: | eth0 | True | fe80::6c:e8ff:fee3:3984/64 | . | link | 02:6c:e8:e3:39:84 |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
ci-info: | lo | True | ::1/128 | . | host | . |
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: ++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.0.0.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.0.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: | 2 | 169.254.169.254 | 0.0.0.0 | 255.255.255.255 | eth0 | UH |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | 9 | fe80::/64 | :: | eth0 | U |
ci-info: | 11 | local | :: | eth0 | U |
ci-info: | 12 | ff00::/8 | :: | eth0 | U |
ci-info: +-------+-------------+---------+-----------+-------+
Cloud-init v. 19.3-45.amzn2 running 'modules:config' at Mon, 30 May 2022 10:42:38 +0000. Up 9.21 seconds.
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=<repoid> ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable <repoid>
or
subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
Could not retrieve mirrorlist https://amazonlinux-2-repos-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com/2/core/latest/x86_64/mirror.list error was
12: Timeout on https://amazonlinux-2-repos-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com/2/core/latest/x86_64/mirror.list: (28, 'Failed to connect to amazonlinux-2-repos-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com port 443 after 2702 ms: Connection timed out')
May 30 10:42:58 cloud-init[2199]: util.py[WARNING]: Package upgrade failed
May 30 10:42:58 cloud-init[2199]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
May 30 10:42:58 cloud-init[2199]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
Cloud-init v. 19.3-45.amzn2 running 'modules:final' at Mon, 30 May 2022 10:42:59 +0000. Up 29.98 seconds.
Unknown error retrieving Instance5FFEF8E4e0ce835dd5aaXXXX
ValidationError: Stack arn:aws:cloudformation:eu-central-1:ACCOUNT_ID:stack/MyCDK/d1772460-e004-11ec-b341-29280531XXXX is in CREATE_FAILED state and cannot be signaled
2022-05-30 10:43:00,475 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.eu-central-1.amazonaws.com
2022-05-30 10:43:00,476 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:44:00,476 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:44:00,476 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:44:00,478 [DEBUG] Sleeping for 0.648091 seconds before retrying
2022-05-30 10:44:01,128 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:45:01,128 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:45:01,128 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:45:01,129 [DEBUG] Sleeping for 2.585657 seconds before retrying
2022-05-30 10:45:03,717 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:46:03,717 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:46:03,718 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:46:03,718 [DEBUG] Sleeping for 4.082728 seconds before retrying
2022-05-30 10:46:07,805 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:47:07,805 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:47:07,806 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:47:07,806 [DEBUG] Sleeping for 11.379097 seconds before retrying
2022-05-30 10:47:19,197 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:48:19,197 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:48:19,197 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:48:19,521 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.eu-central-1.amazonaws.com
2022-05-30 10:48:19,523 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:49:19,524 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:49:19,524 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:49:19,525 [DEBUG] Sleeping for 0.292454 seconds before retrying
2022-05-30 10:49:19,818 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:50:19,818 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:50:19,818 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:50:19,819 [DEBUG] Sleeping for 1.337550 seconds before retrying
2022-05-30 10:50:21,158 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:51:21,158 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:51:21,158 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:51:21,159 [DEBUG] Sleeping for 6.997329 seconds before retrying
2022-05-30 10:51:28,163 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:52:28,164 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:52:28,164 [ERROR] Client-side timeout
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
return f(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
"Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:52:28,164 [DEBUG] Sleeping for 5.279977 seconds before retrying
2022-05-30 10:52:33,450 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
ci-info: no authorized ssh keys fingerprints found for user ec2-user.
Cloud-init v. 19.3-45.amzn2 finished at Mon, 30 May 2022 10:52:33 +0000. Datasource DataSourceEc2. Up 604.40 seconds

The problem was that the instance didn't have internet access (despite being on a public subnet).
The reason was that the VPC is not our default VPC, so the public subnet we had created did not have "Auto-assign public IPv4 address" enabled. Without a public IPv4 address the instance could not reach the yum mirrors or the CloudFormation endpoint, which is exactly what the cloud-init log above shows. Enabling this setting fixed the problem.
Phew!
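In case it helps anyone else: the quick fix is to enable "Auto-assign public IPv4 address" on the subnet, either in the console or with aws ec2 modify-subnet-attribute --map-public-ip-on-launch. If the subnet had been defined in CDK rather than created by hand, the equivalent would look roughly like the sketch below; the construct ID, CIDR and availability zone are made up for illustration, and a public subnet still needs a route to an internet gateway (omitted here).

from aws_cdk import aws_ec2 as ec2

# Sketch only -- would sit inside the stack's __init__, next to the existing
# Vpc.from_lookup() call. "PublicSubnet", the AZ and the CIDR are placeholders.
public_subnet = ec2.Subnet(
    self,
    "PublicSubnet",
    vpc_id=vpc.vpc_id,
    availability_zone="eu-central-1a",
    cidr_block="10.0.1.0/24",
    # The setting that was missing: without it, instances launched in this
    # subnet do not get a public IPv4 address.
    map_public_ip_on_launch=True,
)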

Related

Galera connection issues over haproxy

In our K8s cluster, we use an HAProxy app to connect to the Galera cluster.
Our haproxy.cfg file looks like this:
global
    maxconn 2048
    external-check
    stats socket /var/run/haproxy.sock mode 600 expose-fd listeners level user
    user haproxy
    group haproxy

defaults
    log global
    mode tcp
    retries 10
    timeout client 30000
    timeout connect 100500
    timeout server 30000

frontend mysql-router-service
    bind *:6446
    mode tcp
    option tcplog
    default_backend galera_cluster_backend

# MySQL Cluster BE configuration
backend galera_cluster_backend
    mode tcp
    option tcpka
    option mysql-check user haproxy
    balance source
    server pitipana-opsdb1 192.168.144.82:3306 check weight 1
    server pitipana-opsdb2 192.168.144.83:3306 check weight 1
    server pitipana-opsdb3 192.168.144.84:3306 check weight 1
Dockerfile for creating haproxy image
FROM haproxy:2.3
COPY haproxy.cfg /usr/local/etc/haproxy/haproxy.cfg
On my Galera nodes, I get constant warnings in /var/log/mysql/error.log:
2021-12-20 21:16:47 5942 [Warning] Aborted connection 5942 to db: 'ourdb' user: 'ouruser' host: '192.168.1.2' (Got an error reading communication packets)
2021-12-20 21:16:47 5943 [Warning] Aborted connection 5943 to db: 'ourdb' user: 'ouruser' host: '192.168.1.2' (Got an error reading communication packets)
2021-12-20 21:16:47 5944 [Warning] Aborted connection 5944 to db: 'ourdb' user: 'ouruser' host: '192.168.1.2' (Got an error reading communication packets)
I have increased max_packet_size to 64MB and max_connections to 1000.
When I take a tcpdump on a Galera node:
Frame 16: 106 bytes on wire (848 bits), 106 bytes captured (848 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.1.2, Dst: 192.168.10.3
Transmission Control Protocol, Src Port: 62495, Dst Port: 3306, Seq: 1, Ack: 1, Len: 50
Source Port: 62495
Destination Port: 3306
[Stream index: 2]
[TCP Segment Len: 50]
Sequence number: 1 (relative sequence number)
[Next sequence number: 51 (relative sequence number)]
Acknowledgment number: 1 (relative ack number)
0101 .... = Header Length: 20 bytes (5)
Flags: 0x018 (PSH, ACK)
000. .... .... = Reserved: Not set
...0 .... .... = Nonce: Not set
.... 0... .... = Congestion Window Reduced (CWR): Not set
.... .0.. .... = ECN-Echo: Not set
.... ..0. .... = Urgent: Not set
.... ...1 .... = Acknowledgment: Set
.... .... 1... = Push: Set
.... .... .0.. = Reset: Not set
.... .... ..0. = Syn: Not set
.... .... ...0 = Fin: Not set
[TCP Flags: ·······AP···]
Window size value: 507
[Calculated window size: 64896]
[Window size scaling factor: 128]
Checksum: 0x3cec [unverified]
[Checksum Status: Unverified]
Urgent pointer: 0
[SEQ/ACK analysis]
[Timestamps]
TCP payload (50 bytes)
[PDU Size: 45]
[PDU Size: 5]
MySQL Protocol
Packet Length: 41
Packet Number: 1
Request Command SLEEP
Command: SLEEP (0)
Payload: 820000008000012100000000000000000000000000000000...
[Expert Info (Warning/Protocol): Unknown/invalid command code]
[Unknown/invalid command code]
[Severity level: Warning]
[Group: Protocol]
MySQL Protocol
Packet Length: 1
Packet Number: 0
Request Command Quit
Command: Quit (1)
Here 192.168.1.2 is a K8s worker node and 192.168.10.3 is the Galera node.
When I connect to our applications in K8s, we can access them, but when we try to edit anything, we get stuck.
Any suggestions on how to fix this?

Not able to run self-hosted sentry on CentOS

I am trying to install self-hosted sentry on CentOS (CentOS Linux release 7.9.2009 (Core)) according to this documentation - https://develop.sentry.dev/self-hosted/.
But when I run sudo ./install.sh I get this error:
Failed to connect to clickhouse:9000
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 260, in connect
return self._init_connection(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
self.socket = self._create_socket(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/local/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Connection to Clickhouse cluster clickhouse:9000 failed (attempt 9)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/clickhouse/native.py", line 81, in execute
result: Sequence[Any] = conn.execute(
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/client.py", line 205, in execute
self.connection.force_connect()
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 180, in force_connect
self.connect()
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 279, in connect
raise err
clickhouse_driver.errors.NetworkError: Code: 210. Name or service not known (clickhouse:9000)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/snuba/snuba/migrations/connect.py", line 30, in check_clickhouse_connections
check_clickhouse(clickhouse)
File "/usr/src/snuba/snuba/migrations/connect.py", line 49, in check_clickhouse
ver = clickhouse.execute("SELECT version()")[0][0]
File "/usr/src/snuba/snuba/clickhouse/native.py", line 96, in execute
raise ClickhouseError(e.code, e.message) from e
snuba.clickhouse.errors.ClickhouseError: [210] Name or service not known (clickhouse:9000)
Failed to connect to clickhouse:9000
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 260, in connect
return self._init_connection(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
self.socket = self._create_socket(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/local/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
The error from docker-compose logs clickhouse:
clickhouse_1 | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/sentry.xml'.
clickhouse_1 | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Failed to merge config with '/etc/clickhouse-server/config.d/sentry.xml': Access to file denied: /etc/clickhouse-server/config.d/sentry.xml, Stack trace (when copying this message, always include the lines below):
clickhouse_1 |
clickhouse_1 | 0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0x105351b0 in /usr/bin/clickhouse
clickhouse_1 | 1. ? # 0xdefbd83 in /usr/bin/clickhouse
clickhouse_1 | 2. DB::ConfigProcessor::loadConfig(bool) # 0xdef9e97 in /usr/bin/clickhouse
clickhouse_1 | 3. BaseDaemon::reloadConfiguration() # 0x9157010 in /usr/bin/clickhouse
clickhouse_1 | 4. BaseDaemon::initialize(Poco::Util::Application&) # 0x91597d2 in /usr/bin/clickhouse
clickhouse_1 | 5. DB::Server::initialize(Poco::Util::Application&) # 0x8f96458 in /usr/bin/clickhouse
clickhouse_1 | 6. Poco::Util::Application::run() # 0x10457659 in /usr/bin/clickhouse
clickhouse_1 | 7. DB::Server::run() # 0x8f96045 in /usr/bin/clickhouse
clickhouse_1 | 8. mainEntryClickHouseServer(int, char**) # 0x8f8ce23 in /usr/bin/clickhouse
clickhouse_1 | 9. main # 0x8ee8799 in /usr/bin/clickhouse
clickhouse_1 | 10. __libc_start_main # 0x21b97 in /lib/x86_64-linux-gnu/libc-2.27.so
clickhouse_1 | 11. _start # 0x8ee802e in /usr/bin/clickhouse
clickhouse_1 | (version 20.3.9.70 (official build))
clickhouse_1 | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/sentry.xml'.
clickhouse_1 | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Failed to merge config with '/etc/clickhouse-server/config.d/sentry.xml': Access to file denied: /etc/clickhouse-server/config.d/sentry.xml, Stack trace (when copying this message, always include the lines below):
There is an error - Access to file denied: /etc/clickhouse-server/config.d/sentry.xml - in the ClickHouse logs (similar issue - https://forum.sentry.io/t/click-house-giving-permission-errors/12418). How can I set the correct permissions for this file?
Fixed it by giving 777 permissions to some folders of the Sentry installation. I know this may not be desirable for everyone, but it is an easy and fast solution.
sudo chmod -R 777 ./clickhouse
sudo chmod -R 777 ./relay
sudo chmod -R 777 ./sentry
sudo chmod -R 777 ./postgress

Openstack Magnum kube_master: Went to status ERROR due to "Message: Exceeded maximum number of retries

I tried to deploy a Kubernetes cluster with one master node and one worker node in OpenStack Magnum, but it returned an error:
ResourceInError:
resources.kube_masters.resources[0].resources.kube-master: Went to
status ERROR due to "Message: Exceeded maximum number of retries.
Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500"
Additional info:
Image used: Fedora-CoreOS 32
Flavor: 4VCPUs, 4GB RAM, 25GB disk
docker storage driver: devicemapper
docker volume size: 10GB
Nova logs:
2021-03-05 02:58:29.180 22 ERROR nova.scheduler.utils
[req-aa6a7c8b-3d9a-4c8d-9d56-dfcb104ae828
53dd56f7fefc4bd38e2e4b1e8dde2b51 a67073edfee14079a6dda119969895c9 -
default default] [instance: 8fd65139-c43e-40bd-836e-0ec57ea78960]
Error from last host: kolla-ceph-compute17 (node
kolla-ceph-compute17): ['Traceback (most recent call last):\n', '
File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2437, in _build_and_run_instance\n
block_device_info=block_device_info)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 3550, in spawn\n mdevs=mdevs)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 6158, in _get_guest_xml\n xml = conf.to_xml()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 79, in to_xml\n root = self.format_dom()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2726, in format_dom\n self._format_devices(root)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2680, in _format_devices\n
devices.append(dev.format_dom())\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 1037, in format_dom\n auth.set("username",
self.auth_username)\n', ' File "src/lxml/etree.pyx", line 815, in
lxml.etree._Element.set\n', ' File "src/lxml/apihelpers.pxi", line
593, in lxml.etree._setAttributeValue\n', ' File
"src/lxml/apihelpers.pxi", line 1525, in lxml.etree._utf8\n',
"TypeError: Argument must be bytes or unicode, got 'NoneType'\n",
'\nDuring handling of the above exception, another exception
occurred:\n\n', 'Traceback (most recent call last):\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2161, in _do_build_and_run_instance\n filter_properties,
request_spec)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2537, in _build_and_run_instance\n
instance_uuid=instance.uuid, reason=six.text_type(e))\n',
"nova.exception.RescheduledException: Build of instance
8fd65139-c43e-40bd-836e-0ec57ea78960 was re-scheduled: Argument must
be bytes or unicode, got 'NoneType'\n"]
2021-03-05 02:59:08.970 20 ERROR nova.scheduler.utils
[req-aa6a7c8b-3d9a-4c8d-9d56-dfcb104ae828
53dd56f7fefc4bd38e2e4b1e8dde2b51 a67073edfee14079a6dda119969895c9 -
default default] [instance: 8fd65139-c43e-40bd-836e-0ec57ea78960]
Error from last host: kolla-ceph-compute18 (node
kolla-ceph-compute18): ['Traceback (most recent call last):\n', '
File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2437, in _build_and_run_instance\n
block_device_info=block_device_info)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 3550, in spawn\n mdevs=mdevs)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 6158, in _get_guest_xml\n xml = conf.to_xml()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 79, in to_xml\n root = self.format_dom()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2726, in format_dom\n self._format_devices(root)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2680, in _format_devices\n
devices.append(dev.format_dom())\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 1037, in format_dom\n auth.set("username",
self.auth_username)\n', ' File "src/lxml/etree.pyx", line 815, in
lxml.etree._Element.set\n', ' File "src/lxml/apihelpers.pxi", line
593, in lxml.etree._setAttributeValue\n', ' File
"src/lxml/apihelpers.pxi", line 1525, in lxml.etree._utf8\n',
"TypeError: Argument must be bytes or unicode, got 'NoneType'\n",
'\nDuring handling of the above exception, another exception
occurred:\n\n', 'Traceback (most recent call last):\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2161, in _do_build_and_run_instance\n filter_properties,
request_spec)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2537, in _build_and_run_instance\n
instance_uuid=instance.uuid, reason=six.text_type(e))\n',
"nova.exception.RescheduledException: Build of instance
8fd65139-c43e-40bd-836e-0ec57ea78960 was re-scheduled: Argument must
be bytes or unicode, got 'NoneType'\n"] root#kolla-infra2:~# cat
/var/log/kolla/magnum/*log | egrep -i "(2021-03-05).*error"
Magnum logs:
2021-03-05 02:18:26.681 19 ERROR
magnum.drivers.heat.k8s_fedora_template_def
[req-bc1f4cad-1636-42b4-ab80-f7e7c3459c5d - - - - -] Failed to load
default keystone auth policy: FileNotFoundError: [Errno 2] No such
file or directory: '/etc/magnum/keystone_auth_default_policy.json'
2021-03-05 02:20:25.721 32 ERROR
magnum.drivers.heat.k8s_fedora_template_def
[req-4e8ac06c-debd-4980-9cbf-7a48684e75e1 - - - - -] Failed to load
default keystone auth policy: FileNotFoundError: [Errno 2] No such
file or directory: '/etc/magnum/keystone_auth_default_policy.json'
2021-03-05 02:45:17.069 27 ERROR
magnum.drivers.heat.k8s_fedora_template_def
[req-205dd35a-5546-4919-875a-cfd15feeeadf - - - - -] Failed to load
default keystone auth policy: FileNotFoundError: [Errno 2] No such
file or directory: '/etc/magnum/keystone_auth_default_policy.json'
2021-03-05 03:00:00.418 6 ERROR magnum.drivers.heat.driver
[req-17a7777b-0e70-4b8f-9199-026cd3f3d2ae - - - - -] Nodegroup error,
stack status: CREATE_FAILED, stack_id:
1671e517-8227-47c3-9c7a-4642ececde33, reason: Resource CREATE failed:
ResourceInError:
resources.kube_masters.resources[0].resources.kube-master: Went to
status ERROR due to "Message: Exceeded maximum number of retries.
Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500" 2021-03-05 03:00:00.621 6
ERROR magnum.drivers.heat.driver
[req-17a7777b-0e70-4b8f-9199-026cd3f3d2ae - - - - -] Nodegroup error,
stack status: CREATE_FAILED, stack_id:
1671e517-8227-47c3-9c7a-4642ececde33, reason: Resource CREATE failed:
ResourceInError:
resources.kube_masters.resources[0].resources.kube-master: Went to
status ERROR due to "Message: Exceeded maximum number of retries.
Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500"
Heat logs:
2021-03-05 02:59:49.270 21 ERROR heat.engine.resource Traceback (most
recent call last): 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resource.py",
line 920, in _action_recorder 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource yield 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resource.py",
line 1033, in _do_action 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource yield self.action_handler_task(action,
args=handler_args) 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/scheduler.py",
line 346, in wrapper 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource step = next(subtask) 2021-03-05 02:59:49.270
21 ERROR heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resource.py",
line 982, in action_handler_task 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource done = check(handler_data) 2021-03-05
02:59:49.270 21 ERROR heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resources/stack_resource.py",
line 409, in check_create_complete 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource return
self._check_status_complete(self.CREATE) 2021-03-05 02:59:49.270 21
ERROR heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resources/stack_resource.py",
line 463, in _check_status_complete 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource action=action) 2021-03-05 02:59:49.270 21
ERROR heat.engine.resource heat.common.exception.ResourceFailure:
ResourceInError: resources[0].resources.kube-master: Went to status
ERROR due to "Message: Exceeded maximum number of retries. Exceeded
max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500" 2021-03-05 02:59:49.270
21 ERROR heat.engine.resource 2021-03-05 02:59:49.291 21 INFO
heat.engine.stack [req-9b34fa4a-da80-45aa-9596-6880e4a5d1ce - admin -
default default] Stack CREATE FAILED
(k8s-test-cluster-nyrxxu2pqvnj-kube_masters-5w5xewm2u4c3): Resource
CREATE failed: ResourceInError: resources[0].resources.kube-master:
Went to status ERROR due to "Message: Exceeded maximum number of
retries. Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500"
Thank you.

RabbitMQ not running on Linux, cannot find the file asn1.app

I have installed this successfully on CentOS before. However, on another CentOS machine, RabbitMQ fails to start.
My Erlang comes from this repo:
[rabbitmq-erlang]
name=rabbitmq-erlang
baseurl=https://dl.bintray.com/rabbitmq/rpm/erlang/20/el/7
gpgcheck=1
gpgkey=https://dl.bintray.com/rabbitmq/Keys/rabbitmq-release-signing-key.asc
repo_gpgcheck=0
enabled=1
This is my erl_crash.dump:
erl_crash_dump:0.5
Sat Jun 23 09:17:30 2018
Slogan: init terminating in do_boot ({error,{no such file or directory,asn1.app}})
System version: Erlang/OTP 20 [erts-9.3.3] [source] [64-bit] [smp:24:24] [ds:24:24:10] [async-threads:384] [hipe] [kernel-poll:true]
Compiled: Tue Jun 19 22:25:03 2018
Taints: erl_tracer,zlib
Atoms: 14794
Calling Thread: scheduler:2
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.0.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007fbd81fa59c0 (init:boot_loop/2 + 64)
Current Process CP: 0x0000000000000000 (invalid)
How can I identify this problem? Thank you.

celery failing on dotcloud deployment with IO Error

Celery is failing on one of my dotcloud deployments, and I'm not sure how to fix it. The deployment is almost identical to an existing dotcloud deployment (verified by doing a file diff) which seems to be working OK.
The error I get in the djcelery log:
dotcloud#hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: Depre
cationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL i
nstead.
"use STATIC_URL instead.", DeprecationWarning)
/home/dotcloud/env/lib/python2.6/site-packages/djcelery/loaders.py:108: UserWarn
ing: Using settings.DEBUG leads to a memory leak, never use this setting in prod
uction environments!
warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2012-06-04 03:27:32,139: WARNING/MainProcess] -------------- celery#hack-defaul
t-www-0 v2.5.3
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: amqp://root#hack-OQVADQ2K.dotcloud.com:29210//
- ** ---------- . loader: djcelery.loaders.DjangoLoader
- ** ---------- . logfile: [stderr]#INFO
- ** ---------- . concurrency: 2
- ** ---------- . events: ON
- *** --- * --- . beat: OFF
-- ******* ----
--- ***** ----- [Queues]
-------------- . celery: exchange:celery (direct) binding:celery
[Tasks]
. experiments.tasks.pushMessageToIphone
. experiments.tasks.sendTestMessage
[2012-06-04 03:27:32,172: INFO/PoolWorker-1] child process calling self.run()
[2012-06-04 03:27:32,185: INFO/PoolWorker-2] child process calling self.run()
[2012-06-04 03:27:32,188: WARNING/MainProcess] celery#hack-default-www-0 has sta
rted.
[2012-06-04 03:27:35,315: ERROR/MainProcess] Consumer: Connection Error: Socket
closed. Trying again in 2 seconds...
[2012-06-04 03:27:40,374: ERROR/MainProcess] Consumer: Connection Error: Socket
closed. Trying again in 4 seconds...
[2012-06-04 03:27:47,479: ERROR/MainProcess] Consumer: Connection Error: Socket
closed. Trying again in 6 seconds...
[2012-06-04 03:27:56,509: ERROR/MainProcess] Consumer: Connection Error: Socket
Interestingly, the error log of celerycam shows something a bit different. I'm not sure if this is a red herring.
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: Depre
cationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL i
nstead.
"use STATIC_URL instead.", DeprecationWarning)
[2012-06-04 03:27:31,373: INFO/MainProcess] -> evcam: Taking snapshots with djce
lery.snapshot.Camera (every 1.0 secs.)
Traceback (most recent call last):
File "hack/manage.py", line 14, in
execute_manager(settings)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__
init__.py", line 459, in execute_manager
utility.execute()
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__
init__.py", line 382, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.
py", line 74, in run_from_argv
return super(CeleryCommand, self).run_from_argv(argv)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/ba
se.py", line 196, in run_from_argv
self.execute(*args, **options.__dict__)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.
py", line 67, in execute
super(CeleryCommand, self).execute(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/ba
se.py", line 232, in execute
output = self.handle(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/comma
nds/celerycam.py", line 26, in handle
ev.run(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py",
line 38, in run
detach=detach)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py",
line 70, in run_evcam
return cam()
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/snapshot.py
", line 116, in evcam
recv.capture(limit=None)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py
", line 204, in capture
list(self.itercapture(limit=limit, timeout=timeout, wakeup=wakeup))
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py
", line 193, in itercapture
with self.consumer(wakeup=wakeup) as consumer:
File "/usr/lib/python2.6/contextlib.py", line 16, in __enter__
return self.gen.next()
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py
", line 185, in consumer
queues=[self.queue], no_ack=True)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line
279, in __init__
self.revive(self.channel)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line
286, in revive
channel = channel.default_channel
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", lin
e 581, in default_channel
self.connection
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", lin
e 574, in connection
self._connection = self._establish_connection()
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", lin
e 533, in _establish_connection
conn = self.transport.establish_connection()
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.p
y", line 279, in establish_connection
connect_timeout=conninfo.connect_timeout)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.p
y", line 89, in __init__
super(Connection, self).__init__(*args, **kwargs)
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connec
tion.py", line 144, in __init__
(10, 30), # tune
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/abstra
ct_channel.py", line 95, in wait
self.channel_id, allowed_methods)
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connec
tion.py", line 202, in _wait_method
self.method_reader.read_method()
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/method
_framing.py", line 221, in read_method
raise m
IOError: Socket closed
My supervisord file:
[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
As mentioned, I have nearly identical code deployed under a different dotcloud account that is working fine.
Status of the rabbitmq broker:
$ ./dotcloud info hack.broker
aliases:
- hackxxxx.dotcloud.com
config:
password: xxxx
rabbitmq_management: true
user: root
created_at: 1338702527.075196
datacenter: Amazon-us-east-1c
image_version: 924a079b622a (latest)
memory: 49M/512M (9%)
ports:
- name: ssh
url: ssh://dotcloud#hackxxx.dotcloud.com:29209
- name: amqp
url: amqp://root:xxxx#hackxxxx.dotcloud.com:29210
- name: http
url: http://root:xxx#hack1-xxxx.dotcloud.com/
state: running
type: rabbitmq
It looks like it is having an issue connecting to your broker. Have you confirmed that the broker is up and running, and that you can connect to it?
What are you using for a broker?
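One quick way to confirm connectivity from the same environment is a small Python probe using kombu (it is already installed as a Celery dependency). This is only a sketch: the URL is a placeholder for the amqp URL shown by dotcloud info, and on older kombu releases the class is exposed as BrokerConnection rather than Connection.

from kombu import Connection  # older kombu versions: from kombu import BrokerConnection

# Placeholder URL -- use the amqp URL reported by `dotcloud info hack.broker`
BROKER_URL = "amqp://root:xxxx@hackxxxx.dotcloud.com:29210//"

try:
    # connect_timeout keeps the probe from hanging if the port is unreachable
    with Connection(BROKER_URL, connect_timeout=5) as conn:
        conn.connect()
        print("Broker reachable:", conn.as_uri())
except Exception as exc:
    print("Broker connection failed:", exc)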