Log spam with "unable to find container named fluentd-gcp" - kubernetes

Last night my Kubernetes cluster on GKE was upgraded to 1.16.8-gke.9. Since then, the logs show "error: unable to find container named fluentd-gcp" every minute. Logging from my applications still works, but I'd like to know what causes this error and how to get rid of it.
Expanding the error yields slightly more details:
{
"textPayload": "error: unable to find container named fluentd-gcp\n",
"insertId": "v1b2u2ldrnswujhz2",
"resource": {
"type": "k8s_container",
"labels": {
"project_id": "foo",
"pod_name": "fluentd-gke-scaler-cd4d654d7-tgg27",
"cluster_name": "foo-cluster",
"container_name": "fluentd-gke-scaler",
"namespace_name": "kube-system",
"location": "us-east1-d"
}
},
"timestamp": "2020-04-24T16:15:40.224944500Z",
"severity": "ERROR",
"labels": {
"gke.googleapis.com/log_type": "system",
"k8s-pod/k8s-app": "fluentd-gke-scaler",
"k8s-pod/pod-template-hash": "cd4d654d7"
},
"logName": "projects/foo/logs/stderr",
"receiveTimestamp": "2020-04-24T16:15:45.923960735Z"
}
kubectl get all --all-namespaces shows fluentd-gke pods with a fluentd-gke container, not fluentd-gcp.
Any advice would be appreciated and I'm happy to post more details, if you tell me where to look for them.
Edit: More details and related problems on the GKE issue tracker: https://issuetracker.google.com/issues/156965162

This will be fixed in GKE 1.16.9-gke.6 according to the issue tracker: https://issuetracker.google.com/issues/156965162
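To check whether a given cluster version already includes the fix, the version strings can be compared numerically. The following is only a minimal sketch: the helper names are made up for illustration, and the comparison is only meaningful within the 1.16 release line that the issue tracker mentions.

```python
import re

# Version named in the issue tracker as containing the fix.
FIXED_IN = "1.16.9-gke.6"


def parse_gke_version(version: str) -> tuple:
    """Split a GKE version such as '1.16.8-gke.9' into comparable integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version: str) -> bool:
    """True if `version` is at or after the fixed release (hypothetical helper)."""
    return parse_gke_version(version) >= parse_gke_version(FIXED_IN)


print(has_fix("1.16.8-gke.9"))  # False: the version from the question
print(has_fix("1.16.9-gke.6"))  # True: the version named in the tracker
```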

1.16.8-gke.9 is currently offered through the rapid channel. Keep in mind that this channel is offered on an early-access basis so that people can test new releases; the versions it offers may have unresolved issues with no known workaround. That said, a possible fix is to drain the node and migrate your workloads to another one. If the issue persists, create an issue here.

Related

Deploying azure storage fileServices/shares - error: The value for one of the HTTP headers is not in the correct format

As part of a durable function app deployment, I am deploying azure storage.
On deploying the fileServices/shares, I am getting the following error:
"error": {
"code": "InvalidHeaderValue",
"message": "The value for one of the HTTP headers is not in the correct format.\nRequestId:6c0b3fb0-701a-0058-0509-a8af5d000000\nTime:2022-08-04T13:49:24.6378224Z"
}
I would appreciate any advice as this is eating up a lot of time and I am no closer to resolving it.
Section of arm template for the share deployment is below:
{
"type": "Microsoft.Storage/storageAccounts/fileServices/shares",
"apiVersion": "2021-09-01",
"name": "[concat(parameters('storageAccount1_name'), '/default/FuncAppName')]",
"dependsOn": [
"[resourceId('Microsoft.Storage/storageAccounts/fileServices', parameters('storageAccount1_name'), 'default')]",
"[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccount1_name'))]"
],
"properties": {
"accessTier": "TransactionOptimized",
"shareQuota": 5120,
"enabledProtocols": "SMB"
}
}
Answer to this: removing the property "accessTier": "TransactionOptimized" resolves the issue. TransactionOptimized is the default value for this property anyway.
Although the template exported from the Azure portal includes this property, the deployment fails if it is present.
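If the template is exported regularly, the property can be stripped before deployment. Here is a minimal sketch in Python, assuming the template has been loaded into a dict and that the shares are top-level entries in "resources"; the function name is hypothetical and nested resources are not handled.

```python
import copy
import json

SHARE_TYPE = "Microsoft.Storage/storageAccounts/fileServices/shares"


def drop_share_access_tier(template: dict) -> dict:
    """Return a copy of the template with accessTier removed from file shares."""
    cleaned = copy.deepcopy(template)
    for resource in cleaned.get("resources", []):
        if resource.get("type") == SHARE_TYPE:
            # pop() with a default does nothing if the property is absent.
            resource.get("properties", {}).pop("accessTier", None)
    return cleaned


# The share resource from the question (dependsOn omitted for brevity).
template = {
    "resources": [
        {
            "type": SHARE_TYPE,
            "apiVersion": "2021-09-01",
            "name": "[concat(parameters('storageAccount1_name'), '/default/FuncAppName')]",
            "properties": {
                "accessTier": "TransactionOptimized",
                "shareQuota": 5120,
                "enabledProtocols": "SMB",
            },
        }
    ]
}

cleaned = drop_share_access_tier(template)
print(json.dumps(cleaned["resources"][0]["properties"], indent=2))
```

The original dict is left untouched, so the exported template can still be compared against the cleaned one.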

How to pass a flag to klog for structured logging

As part of Kubernetes 1.19, structured logging has been implemented.
I've read that Kubernetes' logging engine is klog and that structured logs follow this format:
<klog header> "<message>" <key1>="<value1>" <key2>="<value2>" ...
Cool! But even better, you can apparently pass a --logging-format=json flag to klog so that logs are generated in JSON directly:
{
"ts": 1580306777.04728,
"v": 4,
"msg": "Pod status updated",
"pod":{
"name": "nginx-1",
"namespace": "default"
},
"status": "ready"
}
Unfortunately, I haven't been able to find out how and where I should specify that --logging-format=json flag.
Is it a kubectl command? I'm using Azure's AKS.
--logging-format=json is a flag that needs to be set on all Kubernetes system components (kubelet, API server, controller manager and scheduler). You can check all flags here.
Unfortunately, you can't do that right now with AKS, because the control plane is managed by Microsoft.
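On a cluster where you do control the components and can set the flag, each log line becomes a JSON object that is easy to process. The following minimal sketch parses the sample entry from the question; the summarize helper is hypothetical and the field names are simply those shown in that sample.

```python
import json

# The sample JSON log entry from the question, on a single line.
line = (
    '{"ts": 1580306777.04728, "v": 4, "msg": "Pod status updated", '
    '"pod": {"name": "nginx-1", "namespace": "default"}, "status": "ready"}'
)


def summarize(log_line: str) -> str:
    """Turn one JSON-formatted log line into a short human-readable summary."""
    entry = json.loads(log_line)
    pod = entry.get("pod", {})
    return f'{entry["msg"]}: {pod.get("namespace")}/{pod.get("name")} is {entry.get("status")}'


print(summarize(line))  # Pod status updated: default/nginx-1 is ready
```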

Cannot create Topic with ARM to Service Bus Namespaces with Geo-Redundant Disaster Recovery

I have created "Service Bus Namespaces with Geo-Redundant Disaster Recovery", which creates 2 premium namespaces with 1 unit each, as it should. https://github.com/Azure/azure-quickstart-templates/tree/master/101-servicebus-create-namespace-geo-recoveryconfiguration
Then I try to create a Topic, but it fails. I would like to create it with my own ARM template so that I can add new Topics at any time. I would like to create several topics here.
This ARM template seems to try to create a new namespace, whereas I would like to use the existing namespace created earlier.
https://github.com/Azure/azure-quickstart-templates/tree/master/101-servicebus-topic
New-AzResourceGroupDeployment : 11.05.49 - Resource Microsoft.ServiceBus/namespaces 'sb-namepace-a' failed with message '{
"error": {
"message": "SKU change invalid for ServiceBus namespace. Cannot downgrade premium namespace. CorrelationId: 1111f842-1ddf-417a-a302-829b6445e30c",
"code": "BadRequest"
}
}'
The error says it pretty clearly: you are trying to change the SKU. Add the SKU part back and it should work:
"sku": {
"name": "Premium",
"tier": "Premium",
"capacity": 4
},
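A quick pre-deployment check can catch a template that redeclares the namespace without its SKU. This is only a sketch under stated assumptions: the template is loaded as a dict, namespaces are top-level entries in "resources", and the function name is hypothetical.

```python
NAMESPACE_TYPE = "Microsoft.ServiceBus/namespaces"


def namespaces_missing_sku(template: dict, expected: str = "Premium") -> list:
    """Return the names of namespace resources whose sku.name is not `expected`."""
    missing = []
    for resource in template.get("resources", []):
        if resource.get("type") != NAMESPACE_TYPE:
            continue
        if resource.get("sku", {}).get("name") != expected:
            missing.append(resource.get("name"))
    return missing


# Illustrative template: the namespace is redeclared without a "sku" block,
# which is the situation the error message in the question points to.
template = {
    "resources": [
        {"type": NAMESPACE_TYPE, "name": "sb-namepace-a", "properties": {}},
    ]
}

print(namespaces_missing_sku(template))  # ['sb-namepace-a']
```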

Can not restore backup to target instance - replicated setup, target instance non replicated setup

When trying to restore a backup to a new cloud sql instance I get the following message when using curl:
{
"error": {
"errors": [
{
"domain": "global",
"reason": "invalidOperation",
"message": "This operation isn\"t valid for this instance."
}
],
"code": 400,
"message": "This operation isn\"t valid for this instance."
}
}
When trying via google cloud console, after clicking 'ok' in the 'restore instance from backup' menu nothing happens.
I'll answer even though this is a very old question; it may be useful for someone else (it would have been for me).
I just had the exact same error. My problem was that the storage capacity of the target instance was different from that of the source instance. My source instance had been accidentally deleted, so this was a bit troublesome to figure out. This checklist helped me: https://cloud.google.com/sql/docs/postgres/backup-recovery/restore#tips-restore-different-instance

pod is not showing in ready state

I am trying to configure the PHP Phabricator example from Kubernetes, but after creating the replication controller the pod never reaches the ready state. It shows the state below:
NAME                           READY   STATUS             RESTARTS   AGE
phabricator-controller-z0nk3   0/1     CrashLoopBackOff   5          2m
Below is the controller definition (JSON):
{
"kind": "ReplicationController",
"apiVersion": "v1",
"metadata": {
"name": "phabricator-controller",
"labels": {
"name": "phabricator"
}
},
"spec": {
"replicas": 1,
"selector": {
"name": "phabricator"
},
"template": {
"metadata": {
"labels": {
"name": "phabricator"
}
},
"spec": {
"containers": [
{
"name": "phabricator",
"image": "fgrzadkowski/example-php-phabricator",
"ports": [
{
"name": "http-server",
"containerPort": 80
}
]
}
]
}
}
}
}
Can someone please suggest how to fix this?
This Pod is crash-looping. You can tell because the number of restarts is greater than zero.
kubectl describe pods <pod-name>
Should give further details to help debug. As will
kubectl logs <pod-name>
Tracking issues with kubectl describe pods <pod-name> and kubectl logs <pod-name> is indeed the default approach; unfortunately, in my case it wasn't helpful (at first). All the logs looked fine, or at least gave no error or clue that something was going wrong.
The readiness and liveness probes, however, showed that the app was not passing them.
So where was the devil hiding? In my case, increasing the values of "initialDelaySeconds" and/or "timeoutSeconds" for the readiness and liveness probes did the trick.
My first assumption was that the app did not have enough time to reach the "Ready" status. However, the app was still not ready and was in fact failing. But extending those values lengthened each deployment attempt, so I was able to collect more logs. And what did I get? "Database connection failed attempt due to the timeout". So there was no connection to the database, and the app was in fact dying. The tricky part is that timeouts do not show up quickly and you need to wait a bit longer; at least, the default values for "initialDelaySeconds" and/or "timeoutSeconds" did not give me enough time to see the database connectivity timeout.
Once a firewall rule was set to allow the app to talk to the database, the issue was gone!
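For reference, here is a minimal sketch of raising those probe timings on a container spec before submitting it. The container, image name, probe path and the values 120 and 10 are all illustrative assumptions, not taken from the setup described above; only the field names initialDelaySeconds and timeoutSeconds come from the Kubernetes probe spec.

```python
import copy


def with_probe_timings(container: dict, initial_delay: int, timeout: int) -> dict:
    """Return a copy of a container spec whose existing probes use the given timings."""
    updated = copy.deepcopy(container)
    for probe_name in ("readinessProbe", "livenessProbe"):
        probe = updated.get(probe_name)
        if probe is not None:  # only touch probes that are already defined
            probe["initialDelaySeconds"] = initial_delay
            probe["timeoutSeconds"] = timeout
    return updated


# Hypothetical container spec with short probe timings.
container = {
    "name": "app",
    "image": "example/app:latest",
    "readinessProbe": {
        "httpGet": {"path": "/", "port": 80},
        "initialDelaySeconds": 5,
        "timeoutSeconds": 1,
    },
    "livenessProbe": {
        "httpGet": {"path": "/", "port": 80},
        "initialDelaySeconds": 5,
        "timeoutSeconds": 1,
    },
}

patched = with_probe_timings(container, initial_delay=120, timeout=10)
print(patched["readinessProbe"])
```

Longer timings do not fix a failing app; as described above, they only keep the container alive long enough for the real error to reach the logs.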