Mesos DCOS doesn't install Kafka - apache-kafka

I'm trying to install Kafka on Mesos. Installation seems to have succeeded.
vagrant#DevNode:/dcos$ dcos package install kafka
This will install Apache Kafka DCOS Service.
Continue installing? [yes/no] yes
Installing Marathon app for package [kafka] version [0.9.4.0]
Installing CLI subcommand for package [kafka] version [0.9.4.0]
New command available: dcos kafka
The Apache Kafka DCOS Service is installed:
docs - https://github.com/mesos/kafka
issues - https://github.com/mesos/kafka/issues
vagrant#DevNode:/dcos$ dcos package list
NAME VERSION APP COMMAND DESCRIPTION
kafka 0.9.4.0 /kafka kafka Apache Kafka running on top of Apache Mesos
But kafka task is not started.
vagrant#DevNode:/dcos$ dcos kafka
Error: Kafka is not running
vagrant#DevNode:/dcos$
Marathon UI says service is waiting. Looks like it is not accepting resource that was allocated to it. More logs here.
Mar 23 03:52:59 ip-10-0-4-194.ec2.internal java[1425]: [2016-03-23 03:52:59,335] INFO Offer ID: [54f71504-b37a-4954-b082-e1f2a04b7fa4-O77]. Considered resources with roles: [*]. Not all basic resources satisfied: cpu not in offer, disk SATISFIED (0.0 <= 0.0), mem not in offer (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-11)
Mar 23 03:52:59 ip-10-0-4-194.ec2.internal java[1425]: [2016-03-23 03:52:59,370] INFO Offer [54f71504-b37a-4954-b082-e1f2a04b7fa4-O77]. Insufficient resources for [/kafka] (need cpus=0.5, mem=307.0, disk=0.0, ports=(1 dynamic), available in offer: [id { value: "54f71504-b37a-4954-b082-e1f2a04b7fa4-O77" } framework_id { value: "54f71504-b37a-4954-b082-e1f2a04b7fa4-0000" } slave_id { value: "54f71504-b37a-4954-b082-e1f2a04b7fa4-S1" } hostname: "10.0.4.190" resources { name: "ports" type: RANGES ranges { range { begin: 1 end: 21 } range { begin: 23 end: 5050 } range { begin: 5052 end: 32000 } } role: "slave_public" } resources { name: "cpus" type: SCALAR scalar { value: 4.0 } role: "slave_public" } resources { name: "mem" type: SCALAR scalar { value: 14019.0 } role: "slave_public" } resources { name: "disk" type: SCALAR scalar { value: 32541.0 } role: "slave_public" } attributes { name: "public_ip" type: TEXT text { value: "true" } } url { scheme: "http" address { hostname: "10.0.4.190" ip: "10.0.4.190" port: 5051 } path: "/slave(1)" }] (mesosphere.mesos.TaskBuilder:marathon-akka.actor.default-dispatcher-11)
Mesos master logs..
Mar 23 15:38:22 ip-10-0-4-194.ec2.internal mesos-master[1371]: I0323 15:38:22.339759 1376 master.cpp:5350] Sending 2 offers to framework 54f71504-b37a-4954-b082-e1f2a04b7fa4-0000 (marathon) at scheduler-f86b567c-de59-4891-916b-fb00c7959a09#10.0.4.194:60450
Mar 23 15:38:22 ip-10-0-4-194.ec2.internal mesos-master[1371]: I0323 15:38:22.341790 1381 master.cpp:3673] Processing DECLINE call for offers: [ 54f71504-b37a-4954-b082-e1f2a04b7fa4-O373 ] for framework 54f71504-b37a-4954-b082-e1f2a04b7fa4-0000 (marathon) at scheduler-f86b567c-de59-4891-916b-fb00c7959a09#10.0.4.194:60450
Mar 23 15:38:22 ip-10-0-4-194.ec2.internal mesos-master[1371]: I0323 15:38:22.342041 1381 master.cpp:3673] Processing DECLINE call for offers: [ 54f71504-b37a-4954-b082-e1f2a04b7fa4-O374 ] for framework 54f71504-b37a-4954-b082-e1f2a04b7fa4-0000 (marathon) at scheduler-f86b567c-de59-4891-916b-fb00c7959a09#10.0.4.194:60450
No sure why marathon didn't like that offer. I'm fairly sure there is enough resource.

Marathon is waiting for an Offer with enough resources for the Kafka scheduler. The offers it is rejecting appear not to have any cpu or memory. The Offer you see which does have sufficient resources is already statically reserved for the Role "slave_public".
The Kafka scheduler will be running with the Role *. Your cluster lacks sufficient resources in the default Role *. This is the role associated with private agents
You should go look at mesos/state and look at the available Resources on agents with the "*" role.

Related

Confluent for Kubernetes can't access kafka externally

I'm trying to create an external access about confluent-kafka from an AKS cluster. I've been able to connect with control center with an Ingress but i can't create the external access from kafka.
I added this to spec of kafka:
listeners:
external:
externalAccess:
type: loadBalancer
loadBalancer:
domain: lb.example.it
advertisedPort: 39093
and it creates two load-balancer services. Then i added the external IPs on my etc/hosts file:
20.31.10.27 kafka.lb.example.it
20.31.9.167 b0.lb.example.it
But when i ceate a nodejs producer i don't know what to put into broker:
const { Kafka } = require('kafkajs')
const kafka = new Kafka({
clientId: 'my-app',
brokers: ['kafka.lb.example.it:39093']
})
const producer = kafka.producer()
console.log("produced")
const asyncOperation = async () => {
console.log("connecting")
await producer.connect()
console.log("connected")
let i = 0
try{
while(true){
await producer.send({
topic: 'topic_prova3',
messages: [
{
key: JSON.stringify("hello"),
value: JSON.stringify({"NUM":i}),
},
]
})
await producer.send({
topic: 'topic_prova3',
messages: [
{
value: JSON.stringify({"NUMERO":i.toString()}),
},
]
})
console.log("sended")
i++
await new Promise(resolve => setTimeout(resolve, 5000));
}
}
catch(err){
console.error("error: " + err)
}
await producer.disconnect()
}
asyncOperation();
This is the log of the error:
{"level":"ERROR","timestamp":"2023-01-22T08:43:20.761Z","logger":"kafkajs","message":"[Connection] Connection error: connect ECONNREFUSED 127.0.0.2:9092","broker":"kafka-0.kafka.ckafka.svc.cluster.local:9092","clientId":"my-app","stack":"Error: connect ECONNREFUSED 127.0.0.2:9092\n at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)"}
The broker should be kafka.ckafka.svc.cluster.local:9092 , instead of kafka-0.kafka.ckafka.svc.cluster.local:9092 but it does it automatically
The load balancer appears to be working (it's not throwing unknown host exception), and it returned a cluster local broker address. See above comment that explains why this happens within Kubernetes with Kafka.
You need to modify, or use, the appropriate advertised.listeners port that corresponds to the external LoadBalancer.
from an AKS cluster.
You should use Azure services to create a public DNS route rather than specifically target one IP of your cluster for any given service in /etc/hosts; especially when you'd deploy multiple replicas of the Kafka pod.

In CDK, can I wait until a Helm-installed operator is running before applying a manifest?

I'm installing the External Secrets Operator alongside Apache Pinot into an EKS cluster using CDK. I'm running into an issue that I think is being caused by CDK attempting to create a resource defined by the ESO before the ESO has actually gotten up and running. Here's the relevant code:
// install Pinot
const pinot = cluster.addHelmChart('Pinot', {
chartAsset: new Asset(this, 'PinotChartAsset', { path: path.join(__dirname, '../pinot') }),
release: 'pinot',
namespace: 'pinot',
createNamespace: true
});
// install the External Secrets Operator
const externalSecretsOperator = cluster.addHelmChart('ExternalSecretsOperator', {
chart: 'external-secrets',
release: 'external-secrets',
repository: 'https://charts.external-secrets.io',
namespace: 'external-secrets',
createNamespace: true,
values: {
installCRDs: true,
webhook: {
port: 9443
}
}
});
// create a Fargate Profile
const fargateProfile = cluster.addFargateProfile('FargateProfile', {
fargateProfileName: 'externalsecrets',
selectors: [{ 'namespace': 'external-secrets' }]
});
// create the Service Account used by the Secret Store
const serviceAccount = cluster.addServiceAccount('ServiceAccount', {
name: 'eso-service-account',
namespace: 'external-secrets'
});
serviceAccount.addToPrincipalPolicy(new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: [
'secretsmanager:GetSecretValue',
'secretsmanager:DescribeSecret'
],
resources: [
'arn:aws:secretsmanager:us-east-1:<MY-ACCOUNT-ID>:secret:*'
]
}))
serviceAccount.node.addDependency(externalSecretsOperator);
// create the Secret Store, an ESO Resource
const secretStoreManifest = getSecretStoreManifest(serviceAccount);
const secretStore = cluster.addManifest('SecretStore', secretStoreManifest);
secretStore.node.addDependency(serviceAccount);
secretStore.node.addDependency(fargateProfile);
// create the External Secret, another ESO resource
const externalSecretManifest = getExternalSecretManifest(secretStoreManifest)
const externalSecret = cluster.addManifest('ExternalSecret', externalSecretManifest)
externalSecret.node.addDependency(secretStore);
externalSecret.node.addDependency(pinot);
Even though I've set the ESO as a dependency to the Secret Store, when I try to deploy this I get the following error:
Received response status [FAILED] from custom resource. Message returned: Error: b'Error from server (InternalError): error when creating "/tmp/manifest.yaml": Internal error occurred:
failed calling webhook "validate.clustersecretstore.external-secrets.io": Post "https://external-secrets-webhook.external-secrets.svc:443/validate-external-secrets-io-v1beta1-clusterse
cretstore?timeout=5s": no endpoints available for service "external-secrets-webhook"\n'
If I understand correctly, this is the error you'd get if you try to add a Secret Store before the ESO is fully installed. I'm guessing that CDK does not wait until the ESO's pods are running before attempting to apply the manifest. Furthermore, if I comment out the lines the create the Secret Store and External Secret, do a cdk deploy, uncomment those lines and then deploy again, everything works fine.
Is there any way around this? Some way I can retry applying the manifest, or to wait a period of time before attempting the apply?
The addHelmChart method has a property wait that is set to false by default - setting it to true lets CDK know to not mark the installation as complete until of its its K8s resources are in a ready state.

How to pass extra configuration to RabbitMQ with Helm?

I'm using this chart: https://github.com/helm/charts/tree/master/stable/rabbitmq to deploy a cluster of 3 RabbitMQ nodes on Kubernetes. My intention is to have all the queues mirrored within 2 nodes in the cluster.
Here's the command I use to run Helm: helm install --name rabbitmq-local -f rabbitmq-values.yaml stable/rabbitmq
And here's the content of rabbitmq-values.yaml:
persistence:
enabled: true
resources:
requests:
memory: 256Mi
cpu: 100m
replicas: 3
rabbitmq:
extraConfiguration: |-
{
"policies": [
{
"name": "queue-mirroring-exactly-two",
"pattern": "^ha\.",
"vhost": "/",
"definition": {
"ha-mode": "exactly",
"ha-params": 2
}
}
]
}
However, the nodes fail to start due to some parsing errors, and they stay in crash loop. Here's the output of kubectl logs rabbitmq-local-0:
BOOT FAILED
===========
Config file generation failed:
=CRASH REPORT==== 23-Jul-2019::15:32:52.880991 ===
crasher:
initial call: lager_handler_watcher:init/1
pid: <0.95.0>
registered_name: []
exception exit: noproc
in function gen:do_for_proc/2 (gen.erl, line 228)
in call from gen_event:rpc/2 (gen_event.erl, line 239)
in call from lager_handler_watcher:install_handler2/3 (src/lager_handler_watcher.erl, line 117)
in call from lager_handler_watcher:init/1 (src/lager_handler_watcher.erl, line 51)
in call from gen_server:init_it/2 (gen_server.erl, line 374)
in call from gen_server:init_it/6 (gen_server.erl, line 342)
ancestors: [lager_handler_watcher_sup,lager_sup,<0.87.0>]
message_queue_len: 0
messages: []
links: [<0.90.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 228
neighbours:
15:32:53.679 [error] Syntax error in /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf after line 14 column 1, parsing incomplete
=SUPERVISOR REPORT==== 23-Jul-2019::15:32:53.681369 ===
supervisor: {local,gr_counter_sup}
errorContext: child_terminated
reason: killed
offender: [{pid,<0.97.0>},
{id,gr_lager_default_tracer_counters},
{mfargs,{gr_counter,start_link,
[gr_lager_default_tracer_counters]}},
{restart_type,transient},
{shutdown,brutal_kill},
{child_type,worker}]
=SUPERVISOR REPORT==== 23-Jul-2019::15:32:53.681514 ===
supervisor: {local,gr_param_sup}
errorContext: child_terminated
reason: killed
offender: [{pid,<0.96.0>},
{id,gr_lager_default_tracer_params},
{mfargs,{gr_param,start_link,[gr_lager_default_tracer_params]}},
{restart_type,transient},
{shutdown,brutal_kill},
{child_type,worker}]
If I remove the rabbitmq.extraConfiguration part, the nodes start properly, so it must be something wrong with the way I'm typing in the policy. Any idea what I'm doing wrong?
Thank you.
According to https://github.com/helm/charts/tree/master/stable/rabbitmq#load-definitions, it is possible to link a JSON configuration as extraConfiguration. So we ended up with this setup that works:
rabbitmq-values.yaml:
rabbitmq:
loadDefinition:
enabled: true
secretName: rabbitmq-load-definition
extraConfiguration:
management.load_definitions = /app/load_definition.json
rabbitmq-secret.yaml:
apiVersion: v1
kind: Secret
metadata:
name: rabbitmq-load-definition
type: Opaque
stringData:
load_definition.json: |-
{
"vhosts": [
{
"name": "/"
}
],
"policies": [
{
"name": "queue-mirroring-exactly-two",
"pattern": "^ha\.",
"vhost": "/",
"definition": {
"ha-mode": "exactly",
"ha-params": 2
}
}
]
}
The secret must be loaded into Kubernetes before the Helm chart is played, which goes something like this: kubectl apply -f ./rabbitmq-secret.yaml.
You can use config default of HelmChart
If needed, you can use extraSecrets to let the chart create the secret for you. This way, you don't need to manually create it before deploying a release. For example :
extraSecrets:
load-definition:
load_definition.json: |
{
"vhosts": [
{
"name": "/"
}
]
}
rabbitmq:
loadDefinition:
enabled: true
secretName: load-definition
extraConfiguration: |
management.load_definitions = /app/load_definition.json
https://github.com/helm/charts/tree/master/stable/rabbitmq
Instead of using extraConfiguration, use advancedConfiguration, you should put all these info in this section as it is for classic config format (erlang)

Google container engine cluster showing large number of dns errors in logs

I am using google container engine and getting tons of dns errors in the logs.
Like:
10:33:11.000 I0720 17:33:11.547023 1 dns.go:439] Received DNS Request:kubernetes.default.svc.cluster.local., exact:false
And:
10:46:11.000 I0720 17:46:11.546237 1 dns.go:539] records:[0xc8203153b0], retval:[{10.71.240.1 0 10 10 false 30 0 /skydns/local/cluster/svc/default/kubernetes/3465623435313164}], path:[local cluster svc default kubernetes]
This is the payload.
{
metadata: {
severity: "ERROR"
serviceName: "container.googleapis.com"
zone: "us-central1-f"
labels: {
container.googleapis.com/cluster_name: "some-name"
compute.googleapis.com/resource_type: "instance"
compute.googleapis.com/resource_name: "fluentd-cloud-logging-gke-master-cluster-default-pool-f5547509-"
container.googleapis.com/instance_id: "instanceid"
container.googleapis.com/pod_name: "fdsa"
compute.googleapis.com/resource_id: "someid"
container.googleapis.com/stream: "stderr"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/container_name: "kubedns"
}
timestamp: "2016-07-20T17:33:11.000Z"
projectNumber: ""
}
textPayload: "I0720 17:33:11.547023 1 dns.go:439] Received DNS Request:kubernetes.default.svc.cluster.local., exact:false"
log: "kubedns"
}
Everything is working just the logs are polluted with errors. Any ideas on why this is happening or if I should be concerned?
Thanks for the question, Aaron. Those error messages are actually just tracing/debugging output from the container and don't indicate that anything is wrong. The fact that they get written out as error messages has been fixed in Kubernetes at head and will be better in the next release of Kubernetes.

Fiware Orion - pepProxy

i'm part of a team that is developing an application that uses the Fiware GE's has part of the Smart-AgriFood accelerator.
We are using the Orion Context Broker for gathering the data provided by the sensor network, and we intend to use the Pep-Proxy to authenticate the sensor node for access the Orion instance. We have tried the following pepProxy's:
https://github.com/telefonicaid/fiware-orion-pep
https://github.com/ging/fi-ware-pep-proxy
We only have success implementing the second (fi-ware-pep-proxy) implementation of the proxy. With the fiware-orion-pep we haven't been able to connect to the Keystone Global instance (account.lab.fi-ware.org), we have tried the account.lab... and the cloud.lab..., my question are:
1) is the keystone (IDM) instance for authentication the account.lab or the cloud.lab?? and what port's to use or address's?
2) is the fiware-orion-pep prepared for authenticate at the account.lab.fi-ware.org?? here is way i ask this:
This one works with the curl command at >> cloud.lab.fiware.org:4730/v2.0/tokens
{
"auth": {
"passwordCredentials": {
"username": "<my_user>",
"password": "<my_password>"
}
}
}'
This one does't work with the curl comand at >> account.lab.fi-ware.org:5000/v3/auth/tokens
{
"auth": {
"identity": {
"methods": [
"password"
],
"password": {
"user": {
"domain": {
"name": "<my_domain>"
},
"name": "<my_user>",
"password": "<my_password>"
}
}
}
} }'
3) what is the implementation that i should be using for authenticate the devices or other calls to the Orion instance???
Here are the configuration that i used:
fiware-orion-pep
config.authentication = {
checkHeaders: true,
module: 'keystone',
user: '<my_user>',
password: '<my_password>',
domainName: '<my_domain>',
retries: 3,
cacheTTLs: {
users: 1000,
projectIds: 1000,
roles: 60
},
options: {
protocol: 'http',
host: 'account.lab.fiware.org',
port: 5000,
path: '/v3/role_assignments',
authPath: '/v3/auth/tokens'
}
};
fi-ware-pep-proxy (this one works), i have set the listing port to 1026 at the source code
var config = {};
config.account_host = 'https://account.lab.fiware.org';
config.keystone_host = 'cloud.lab.fiware.org';
config.keystone_port = 4731;
config.app_host = 'localhost';
config.app_port = '10026';
config.username = 'pepProxy';
config.password = 'pepProxy';
// in seconds
config.chache_time = 300;
config.check_permissions = false;
config.magic_key = undefined;
module.exports = config;
Thanks in advance for the time ... :)
The are currently some differences in how both PEP Proxies authenticate and validate against the global instances, so they do not behave in exactly the same way.
The one in telefonicaid/fiware-orion-pep was developed to fulfill the PEP Proxy requirements (authentication and validation against a Keystone and Access Control) in individual projects with their own Keystone and Keypass (a flavour of Access Control) installations, and so it evolved faster than the one in ging/fi-ware-pep-proxy and in a slightly different direction. As an example, the former supports multitenancy using the fiware-service and fiware-servicepath headers, while the latter is transparent to those mechanisms. This development direction meant also that the functionality slightly differs from time to time from the one in the global instance.
That being said, the concrete answer:
- Both PEP Proxies should be able to contact the global instance. If one doesn't, please, fill a bug in the issues of the Github repository and we will fix it as soon as possible.
- The ging/fi-ware-pep-proxy was specifically designed for accessing the global instance, so you should be able to use it as expected.
Please, if you try to proceed with the telefonicaid/fiware-orion-pep take note also that:
- the configuration flag authentication.checkHeaders should be false, as the global instance does not currently support multitenancy.
- current stable release (0.5.0) is about to change to next version (probably today) so maybe some of the problems will solve with the update.
Hope this clarify some of your doubts.
[EDIT]
1) I have already install the telefonicaid/fiware-orion-pep (v 0.6.0) from sources and from the rpm package created following the tutorial available in the github. When creating the rpm package, this is created with the following name pep-proxy-0.4.0_next-0.noarch.rpm.
2) Here is the configuration that i used:
/opt/fiware-orion-pep/config.js
var config = {};
config.resource = {
original: {
host: 'localhost',
port: 10026
},
proxy: {
port: 1026,
adminPort: 11211
} };
config.authentication = {
checkHeaders: false,
module: 'keystone',
user: '<##################>',
password: '<###################>',
domainName: 'admin_domain',
retries: 3,
cacheTTLs: {
users: 1000,
projectIds: 1000,
roles: 60
},
options: { protocol: 'http',
host: 'cloud.lab.fiware.org',
port: 4730,
path: '/v3/role_assignments',
authPath: '/v3/auth/tokens'
} };
config.ssl = {
active: false,
keyFile: '',
certFile: '' }
config.logLevel = 'DEBUG'; // List of component
config.middlewares = {
require: 'lib/plugins/orionPlugin',
functions: [
'extractCBAction'
] };
config.componentName = 'orion';
config.resourceNamePrefix = 'fiware:';
config.bypass = false;
config.bypassRoleId = '';
module.exports = config;
/etc/sysconfig/pepProxy
# General Configuration
############################################################################
# Port where the proxy will listen for requests
PROXY_PORT=1026
# User to execute the PEP Proxy with
PROXY_USER=pepproxy
# Host where the target Context Broker is located
# TARGET_HOST=localhost
# Port where the target Context Broker is listening
# TARGET_PORT=10026
# Maximum level of logs to show (FATAL, ERROR, WARNING, INFO, DEBUG)
LOG_LEVEL=DEBUG
# Indicates what component plugin should be loaded with this PEP: orion, keypass, perseo
COMPONENT_PLUGIN=orion
#
# Access Control Configuration
############################################################################
# Host where the Access Control (the component who knows the policies for the incoming requests) is located
# ACCESS_HOST=
# Port where the Access Control is listening
# ACCESS_PORT=
# Host where the authentication authority for the Access Control is located
# AUTHENTICATION_HOST=
# Port where the authentication authority is listening
# AUTHENTICATION_PORT=
# User name of the PEP Proxy in the authentication authority
PROXY_USERNAME=XXXXXXXXXXXXX
# Password of the PEP Proxy in the Authentication authority
PROXY_PASSWORD=XXXXXXXXXXXXX
In the files above i have tried the following parameters:
Keystone instance: account.lab.fiware.org or cloud.lab.fiware.org
User: pep or pepProxy or "user from fiware account"
Pass: pep or pepProxy or "user password from account"
Port: 4730, 4731, 5000
The result it's the same as before... the telefonicaid/fiware-orion-pep is unable to authenticate:
log file at /var/log/pepProxy/pepProxy
time=2015-04-13T14:49:24.718Z | lvl=ERROR | corr=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | trans=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | op=/v1/updateContext | msg=VALIDATION-GEN-003] Error connecting to Keystone authentication: KEYSTONE_AUTHENTICATION_ERROR: There was a connection error while authenticating to Keystone: 500
time=2015-04-13T14:49:24.721Z | lvl=DEBUG | corr=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | trans=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | op=/v1/updateContext | msg=response-time: 50745 statusCode: 500
result from the client console
{
"message": "There was a connection error while authenticating to Keystone: 500",
"name": "KEYSTONE_AUTHENTICATION_ERROR"
}
I'm doing something wrong here??