My container executes successfully, but provision status is "Failed" and system logs say ContainerCrashing; why? - azure-container-apps

I have an Azure Container App whose provisioning continuously fails, and the system logs repeatedly report "ContainerCrashing". From my end, however, the container appears to execute and terminate successfully.
The only difference between this app and others is that this one has a cron scale rule like so:
"scale": {
"minReplicas": 0,
"maxReplicas": 1,
"rules": [
{
"name": "cron",
"custom": {
"type": "cron",
"metadata": {
"timezone": "America/Los_Angeles",
"start": "0-59/5 * * * *",
"end": "1-59/5 * * * *",
"desiredReplicas": "1"
}
}
}
]
}
I'm guessing this is the culprit. My intent is to have an app that runs one instance to completion every five minutes. Since I can't express exactly that, I've set the rule to scale the app from 0 to 1 replica every five minutes, then back down to 0 after the 1 minute of expected run time.
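To spell out the schedule, here is how I read the two cron expressions above (annotations mine):
"start": "0-59/5 * * * *",  // minutes 0, 5, 10, ..., 55: scale out to 1 replica
"end":   "1-59/5 * * * *",  // minutes 1, 6, 11, ..., 56: the one-minute window ends, scale back to 0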
It was my understanding that a container still running past the scale-in window would be allowed to run to completion, and that if my container terminates successfully it would simply scale back to 0 or restart, no problem. But given this ContainerCrashing status, I'm imagining the possibility that my app somehow isn't behaving in a way that's compatible with the cron rules.
FYI, the troubleshooting center shows no issues with KEDA scaling and my console logs show no issues with my app.
Can anyone help me understand what might be happening?

Related

Is it possible, and how, to limit a Kubernetes Job to create a maximum number of pods if they always fail?

As a QA in our company I am a daily user of Kubernetes, and we use Kubernetes Jobs to create performance-test pods. One advantage of a Job, according to the docs, is
to create one Job object in order to reliably run one Pod to completion
But in our tests this feature will create new pods indefinitely if the previous ones fail, which occupies the resources of our team's shared cluster, and deleting such pods takes a lot of time.
Currently the job manifest is like this:
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "upgradeperf",
    "namespace": "ntg6-grpc26-tts"
  },
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "upgradeperfjob",
            "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
            "command": [
              "python",
              "/jmeterwork/jmeter.py",
              "-gu",
              "git#gitlab-pri-eastus2.dev.mycompany.net:mobility-ncs-tools/tts-cdqa-tool.git",
              "-gb",
              "upgradeperf",
              "-t",
              "JMeter/testcases/ttssvc/JMeterTestPlan_ttssvc_cmpsize.jmx",
              "-JtestDataFile",
              "JMeter/testcases/ttssvc/testData/avaml_opus.csv",
              "-JthreadNum",
              "3",
              "-JthreadLoopCount",
              "1500",
              "-JresultsFile",
              "results_upgradeperf_cavaml_opus_t3_l1500.csv",
              "-Jhost",
              "mtl-blade32-03.mycompany.com",
              "-Jport",
              "28416"
            ]
          }
        ],
        "restartPolicy": "Never",
        "imagePullSecrets": [
          {
            "name": "docker-registry-secret"
          }
        ]
      }
    }
  }
}
In some cases, such as a misconfigured IP or port, 'reliably run one Pod to completion' is impossible, and recreating pods is a waste of time and resources.
So is it possible, and how, to limit a Kubernetes Job to create a maximum number (say 3) of pods if they always fail?
Depending on your Kubernetes version, you can resolve this problem with one of these methods:
Set restartPolicy: OnFailure. The failed container will then be restarted within the same Pod, so instead of lots of failed Pods you get a single Pod with lots of restarts.
From Kubernetes 1.8 on, there is a backoffLimit parameter that controls retries for a failed Job. It defines how many times the Job's pods are retried before the Job itself is treated as failed; the default is 6. For this parameter to work you must set restartPolicy: Never, as in the sketch below.
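For example, a minimal sketch of the manifest above with retries capped at 3 (the container details are trimmed to a placeholder command for brevity):
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "upgradeperf",
    "namespace": "ntg6-grpc26-tts"
  },
  "spec": {
    "backoffLimit": 3,
    "template": {
      "spec": {
        "containers": [
          {
            "name": "upgradeperfjob",
            "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
            "command": ["python", "/jmeterwork/jmeter.py"]
          }
        ],
        "restartPolicy": "Never"
      }
    }
  }
}
With this in place, Kubernetes stops creating new pods after roughly 3 failed attempts and marks the Job as Failed instead of retrying forever.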
You probably didn't set restartPolicy: Never in your pod spec; add that, and I would expect it to match your expected behavior better.

Azure Service Fabric gets terminated in local cluster when running tests

So we are using Azure Service Fabric and get weird behavior when trying to run API tests against my local development cluster.
Every time I start the tests, the app gets terminated; sometimes it gets restarted again, but most often it just stays terminated (and is even deleted from the cluster).
I guess it's somehow connected to the fact that when I run the API tests they run and build stuff that Service Fabric is using, but since the outcome differs depending on something (maybe the sun?), it feels like I'm either missing something or experiencing a bug in Service Fabric.
Does anyone have any idea? Consider me a noob and assume that I have done something wrong myself (I am assuming that, at least).
UPDATE
There was a question about how we run our tests:
Start 2 instances of Visual Studio
Open the same .sln in both of them
Start the Service Fabric project.
Wait until the cluster reports OK.
Run the API tests through a unit test (both Service Bus tests and REST tests) with the ReSharper test runner
Now we get the messages that are attached in diagnostics.
Diagnostics:
Event #1
{
  "Timestamp": "2018-10-16T08:14:03.0590414+02:00",
  "ProviderName": "Microsoft-ServiceFabric",
  "Id": 23083,
  "Message": "ApplicationHostTerminated: ApplicationId=fabric:/<MyService>, ServiceName=fabric:/<MyService>, ServicePackageName=<MyPackage>, ServicePackageActivationId=8f36ac97-9271-4a49-94ce-dd296aebffa5, IsExclusive=True, CodePackageName=Code, EntryPointType=Exe, ExeName=MyExe, ProcessId=24568, HostId=d2a820b5-5b4d-42af-ae87-350028a3fa72, ExitCode=3221225786, UnexpectedTermination=False, StartTime=10/16/2018 08:12:14. ",
  "ProcessId": 22660,
  "Level": "Informational",
  "Keywords": "0x4000000000000001",
  "EventName": "Hosting",
  "ActivityID": null,
  "RelatedActivityID": null,
  "Payload": {
    "eventInstanceId": "\"07f15452-2f75-49e3-ad5d-d16ea49bdc8f\"",
    "applicationName": "MyAppName",
    "ServiceName": "fabric:/MyServiceName",
    "ServicePackageName": "MyPackageName",
    "ServicePackageActivationId": "8f36ac97-9271-4a49-94ce-dd296aebffa5",
    "IsExclusive": true,
    "CodePackageName": "Code",
    "EntryPointType": 1,
    "ExeName": "MyExe",
    "ProcessId": 24568,
    "HostId": "d2a820b5-5b4d-42af-ae87-350028a3fa72",
    "ExitCode": 3221225786,
    "UnexpectedTermination": false,
    "StartTime": "\"\/Date(1539670334917)\/\""
  }
}
Event #2
{
  "Timestamp": "2018-10-16T08:14:02.3557708+02:00",
  "ProviderName": "Microsoft-ServiceFabric",
  "Id": 29625,
  "Message": "Application deleted: Application = fabric:/MyApp, Application Type = MyServiceType ",
  "ProcessId": 22660,
  "Level": "Informational",
  "Keywords": "0x4000000000000001",
  "EventName": "CM",
  "ActivityID": null,
  "RelatedActivityID": null,
  "Payload": {
    "eventInstanceId": "\"ca608cec-8d55-4606-a331-8ebfcfff8fa6\"",
    "applicationName": "fabric:/MyAppName",
    "applicationTypeName": "MyAppTypeName",
    "applicationTypeVersion": "1.0.0"
  }
}
I think you are experiencing a side effect of the Application Debug Mode set for your .sfproj.
By default, Application Debug Mode is set to Refresh Application (which, if you are using a 5-node cluster, is automatically changed to Remove Application) or to Remove Application. This instructs Visual Studio to recreate the application for each debugging session and remove it when the session ends. Incidentally, the ExitCode 3221225786 in your diagnostics is 0xC000013A (STATUS_CONTROL_C_EXIT), which is consistent with a deliberate shutdown rather than a crash.
Changing it to Keep Application should prevent Visual Studio from recreating the application during debugging sessions.

write log to Application insights from local service fabric

I am trying to integrate the Azure App Insights service into a Service Fabric app for logging and instrumentation. I am running the Fabric code on my local VM, and I followed the document here exactly [scenario 2]. Other resources on learn.microsoft.com also seem to indicate the same steps (e.g. https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-event-aggregation-eventflow).
For some reason, I don't see any event entries in App Insights, and there are no errors in the code when I do this:
ServiceEventSource.Current.ProcessedCountMetric("synced",sw.ElapsedMilliseconds, crc.DataTable.Rows.Count);
eventflowconfig.json contents
{
  "inputs": [
    {
      "type": "EventSource",
      "sources": [
        { "providerName": "Microsoft-ServiceFabric-Services" },
        { "providerName": "Microsoft-ServiceFabric-Actors" },
        { "providerName": "mystatefulservice" }
      ]
    }
  ],
  "filters": [
    {
      "type": "drop",
      "include": "Level == Verbose"
    }
  ],
  "outputs": [
    {
      "type": "ApplicationInsights",
      // (replace the following value with your AI resource's instrumentation key)
      "instrumentationKey": "XXXXXXXXXXXXXXXXXXXXXX",
      "filters": [
        {
          "type": "metadata",
          "metadata": "metric",
          "include": "ProviderName == mystatefulservice && EventName == ProcessedCountMetric",
          "operationProperty": "operation",
          "elapsedMilliSecondsProperty": "elapsedMilliSeconds",
          "recordCountProperty": "recordCount"
        }
      ]
    }
  ],
  "schemaVersion": "2016-08-11"
}
In ServiceEventSource.cs
[Event(ProcessedCountMetricEventId, Level = EventLevel.Informational)]
public void ProcessedCountMetric(string operation, long elapsedMilliSeconds, int recordCount)
{
    if (IsEnabled())
        WriteEvent(ProcessedCountMetricEventId, operation, elapsedMilliSeconds, recordCount);
}
EDIT
Adding the diagnostics pipeline code from Program.cs in the Fabric stateful service:
using (var diagnosticsPipeline =
    ServiceFabricDiagnosticPipelineFactory.CreatePipeline(
        $"{ServiceFabricGlobalConstants.AppName}-mystatefulservice-DiagnosticsPipeline"))
{
    ServiceRuntime.RegisterServiceAsync("mystatefulserviceType",
        context => new mystatefulservice(context)).GetAwaiter().GetResult();

    ServiceEventSource.Current.ServiceTypeRegistered(Process.GetCurrentProcess().Id,
        typeof(mystatefulservice).Name);

    // Prevents this host process from terminating so services keep running.
    Thread.Sleep(Timeout.Infinite);
}
EventSource is a tricky technology; I have been working with it for a while and always have problems. The configuration looks good, and it is very hard to investigate without access to the environment, so I will make some suggestions.
There are a few catches you must be aware of:
If you are listening to ETW events from a different process, your process must be running as a user in the 'Performance Log Users' group, which has permission to create event sessions to listen for these events. Check which identity your service is running under and whether it is part of that group.
Ensure the events are being emitted correctly and that you can see them in the Diagnostic Events window; if they don't show up there, the problem is in the provider.
For testing purposes, comment out the if (IsEnabled()) line. It is an internal check to validate whether your events should be emitted. I have had situations where it was always false and skipped emitting the events; it probably caches the result for a while, and the docs are not clear on how it should work.
Whenever possible, use the EventSource from the NuGet package instead of the framework one; the framework version is full of bugs and lacks fixes found in the NuGet version.
Application Insights is not real-time; sometimes it takes a few minutes to process your events. I would recommend outputting the events to a console or file first to confirm the pipeline is listening correctly, and only then enabling the AppInsights output; see the sketch below.
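As a minimal sketch of that last suggestion: EventFlow has a StdOutput output (from the Microsoft.Diagnostics.EventFlow.Outputs.StdOutput NuGet package) that writes events to the console, so you can temporarily add it next to the ApplicationInsights output in eventflowconfig.json to verify events are flowing through the pipeline:
"outputs": [
  { "type": "StdOutput" },
  {
    "type": "ApplicationInsights",
    "instrumentationKey": "XXXXXXXXXXXXXXXXXXXXXX"
  }
]
Once events show up on the console, you know the input and filter stages are fine and any remaining problem is on the Application Insights side.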
The link you provided is quite outdated, and there's actually a much better way to log application error and exception info to Application Insights. For example, the above won't help with tracking the call hierarchy of an incoming request between multiple services.
Have a look at the Microsoft App Insights Service Fabric NuGet packages. They work great for:
Sending error and exception info
Populating the application map with all your services and their dependencies (including databases)
Reporting on app performance metrics
Tracing service call dependencies end-to-end
Integrating with native as well as non-native SF applications

Automatically schedule future deployment in Octopus

Update: I found that executing a script on the Octopus Server is available as of version 3.3. I haven't updated my Octopus yet, but I'll take it that it works as designed. I'm still wondering if there is a better way to do this without octo.exe.
The task I'm trying to accomplish is: after each successful production deployment, automatically schedule a DR deployment to happen within the next 24 hours.
My desired approach is to have Octopus do it.
I added a new Octopus step at the end of the deployment that only runs upon success of the previous step. In it I attempted to use octo deploy-release --deployAt (which can be found here), roughly as sketched below.
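For reference, a hypothetical invocation of that command (the server URL, API key, project, and version are placeholders, not values from my setup):
octo deploy-release --server=https://my-octopus-server --apiKey=API-XXXXXXXX --project="MyProject" --deployto="DR" --version="1.2.3" --deployAt="2016-04-21 14:00"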
My challenge is that a script step requires me to pick a target role, which means it will be executed on a Tentacle; the presence of Octo.exe is also required.
I tried to create my own Octopus step template, but a deployment target role is still required in my customized step.
{
  "Id": "ActionTemplates-2",
  "Name": "Octopus - Schedule Deployment",
  "Description": "Schedule a future octopus deployment",
  "ActionType": "Octopus.Script",
  "Version": 3,
  "Properties": {
    "Octopus.Action.Script.Syntax": "PowerShell",
    "Octopus.Action.Script.ScriptBody": "--hide--"
  },
  "SensitiveProperties": {},
  "Parameters": [
    {
      "Name": "OctoPath",
      "Label": "Path for Octo.exe",
      "HelpText": "Location for octo.exe",
      "DefaultValue": null,
      "DisplaySettings": {
        "Octopus.ControlType": "SingleLineText"
      }
    },
    {
      "Name": "projName",
      "Label": "Project Name",
      "HelpText": "The name of the project should be deployed",
      "DefaultValue": null,
      "DisplaySettings": {
        "Octopus.ControlType": "SingleLineText"
      }
    },
    {
      "Name": "days",
      "Label": "Days",
      "HelpText": "The days in future this deployment would happen",
      "DefaultValue": null,
      "DisplaySettings": {
        "Octopus.ControlType": "SingleLineText"
      }
    },
    {
      "Name": "hours",
      "Label": "Hours",
      "HelpText": "The hours in future this deployment would happen",
      "DefaultValue": null,
      "DisplaySettings": {
        "Octopus.ControlType": "SingleLineText"
      }
    },
    {
      "Name": "env",
      "Label": "Environment to deploy",
      "HelpText": "The environment next deployment should happen",
      "DefaultValue": null,
      "DisplaySettings": {
        "Octopus.ControlType": "SingleLineText"
      }
    }
  ],
  "$Meta": {
    "ExportedAt": "2016-04-20T13:58:54.263Z",
    "OctopusVersion": "3.2.0",
    "Type": "ActionTemplate"
  }
}
Is there a way to alter the template to get rid of the role selection and have the Octopus Server execute it directly, as it does for an Azure script step?
Is there any other way to have the Octopus Server schedule the deployment automatically without external help? I guess this goes back to the first problem: I may still need Octopus to run something on the server side.
Note: we kick off production deployments manually, so I don't have another tool waiting for the deployment's response. It would be possible to have a process regularly poll the last deployment, do some analysis, and schedule a new deployment accordingly, but that is not as clean as having Octopus do it directly. Injecting octo.exe onto a random production machine is not desired at all.
You could create a new WebAPI project in C#, pull in the Octopus.Deploy NuGet package, and write code that accepts HTTP requests and deals with the scheduling logic.
Host that project on the same server as the Octopus Server itself; it should be a 20-30 minute job to set the website up in IIS.
In your deployment process, add a step that makes the HTTP request, and you're done. You could go even one step further and have the site/service listen for every successful deployment and make decisions based on that, so that other projects don't have to add extra steps to their Octopus deployment process.
As you said, polling is also a viable option.
Alternatively, if you're on Octopus Deploy 3.0, it already exposes a REST API. I am not sure whether it's powerful enough to let you create a scheduled deployment, but you could explore that: https://github.com/OctopusDeploy/OctopusDeploy-Api/wiki/Releases
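For what it's worth, the API's deployment resource accepts a queue time, so scheduling may be as simple as POSTing a payload like the following to /api/deployments. This is a hedged sketch: the IDs and timestamp are placeholders, and the property names are from memory, so verify them against your server's API documentation:
{
  "ReleaseId": "Releases-123",
  "EnvironmentId": "Environments-45",
  "QueueTime": "2016-04-21T14:00:00+00:00"
}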
I agree that floating octo.exe around production servers is a bad idea. It might get out of sync, and your production servers shouldn't have to deal with it.

pod is not showing in ready state

I am trying to configure the PHP Phabricator example from Kubernetes, but after creating the replication controller the pod never reaches the ready state. It shows the state below:
NAME                           READY   STATUS             RESTARTS   AGE
phabricator-controller-z0nk3   0/1     CrashLoopBackOff   5          2m
Below is the controller yaml:
{
  "kind": "ReplicationController",
  "apiVersion": "v1",
  "metadata": {
    "name": "phabricator-controller",
    "labels": {
      "name": "phabricator"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "name": "phabricator"
    },
    "template": {
      "metadata": {
        "labels": {
          "name": "phabricator"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "phabricator",
            "image": "fgrzadkowski/example-php-phabricator",
            "ports": [
              {
                "name": "http-server",
                "containerPort": 80
              }
            ]
          }
        ]
      }
    }
  }
}
Can someone please suggest how to fix this?
This Pod is crash-looping. You can tell because the number of restarts is greater than zero.
kubectl describe pods <pod-name>
Should give further details to help debug. As will
kubectl logs <pod-name>
Tracking issues with kubectl describe pods <pod-name> and kubectl logs <pod-name> is indeed the default way to investigate; unfortunately, in my case it WASN'T helpful (at first). All the logs looked fine, or at least gave no error or clue that something was going wrong.
The readiness and liveness probes, however, showed that the app was not passing them.
So where was the devil hiding? In my case, increasing the values of initialDelaySeconds and/or timeoutSeconds for the readiness and liveness probes did the trick.
My first assumption was that the app didn't have enough time to reach the Ready state. In fact the app was still not ready and was genuinely failing, BUT extending those values lengthened each deployment attempt, which let me capture more logs. And what did I get? "Database connection failed attempt due to the timeout". So there was no connection to the database, and the app was effectively dead. The tricky part is that the timeouts do not appear quickly, so you need to wait a bit longer; at least the default values of initialDelaySeconds and/or timeoutSeconds didn't give me enough time to see the database connectivity timeout.
Once a firewall rule was set to allow the app to talk to the database, the issue was gone!
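For reference, a minimal sketch of where those probe settings live on the container spec (the endpoint and timing values here are illustrative, not taken from the original manifest):
"containers": [
  {
    "name": "phabricator",
    "image": "fgrzadkowski/example-php-phabricator",
    "readinessProbe": {
      "httpGet": { "path": "/", "port": 80 },
      "initialDelaySeconds": 60,
      "timeoutSeconds": 10
    },
    "livenessProbe": {
      "httpGet": { "path": "/", "port": 80 },
      "initialDelaySeconds": 120,
      "timeoutSeconds": 10
    }
  }
]
Generous values like these give a slow-starting app time to come up, and, as described above, keep the pod alive long enough for the real error to surface in the logs.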