DAGs in Airflow UI do not show on/off toggles and keep loading - Kubernetes

I installed Airflow on Kubernetes and have logged in to the Airflow UI. It lists all DAGs, but they are not displayed correctly.
1/ There are no on/off toggles to the left of the DAG names; each just shows an empty checkbox.
2/ The "Recent Tasks" and "DAG Runs" columns look like they are stuck trying to load something;
3/ If I click through to any DAG, that page also looks like it is stuck trying to load something;
I tried both Airflow 2.0.0 and 1.10.11 and they behave the same, so it is not a version issue.
What is the problem and how do I fix it?
------- Here is more information, following Ofek Hod's suggestion:
1/ Running "kubectl logs <pod_id> webserver" after I log in to the Airflow web UI, I see many HTTP 404 responses, e.g.
after I click any DAG in the Airflow web UI, I get some other 404 responses

Airflow first parses all the .py files in your dags folder; I guess something goes wrong there.
As a rule of thumb, first check the webserver and scheduler logs (on Kubernetes, with kubectl logs); maybe you can find a hint there.
If not, first try to bring up a "clean" Airflow instance without any of your DAG code or related .py files: point the dags folder to an empty directory and see what happens (better if you turn on the example DAGs configuration; see the sketch after this answer).
If that works, add the .py files from your original dags folder back incrementally until you find the problematic code.
If it's still not working, the scheduler or webserver themselves are probably messed up; please check the logs again more carefully.
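A minimal sketch of that "clean instance" test using Airflow's environment-variable config overrides (the empty directory path here is just an example):

# point Airflow at an empty dags folder and turn on the bundled example DAGs
mkdir -p /tmp/empty-dags
export AIRFLOW__CORE__DAGS_FOLDER=/tmp/empty-dags
export AIRFLOW__CORE__LOAD_EXAMPLES=True
# then restart the webserver and scheduler so they pick up the new settings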

I found the answer myself:
The package I use to set up Airflow on Kubernetes has a step that runs ./airflow/www/compile_assets.sh using npm, but the package missed the step that installs npm. So I added "apt install -y npm" to the step, and now the Airflow pages display correctly.
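For anyone hitting the same symptoms, a minimal sketch of the fix in the image build step (assuming a Debian-based image; package names differ on other distros):

# install npm before compiling the webserver's static assets;
# without it the assets are missing and the UI 404s on its own JS/CSS
apt-get update && apt-get install -y npm
./airflow/www/compile_assets.sh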

`gcloud run deploy` raises "Revision <revision_name> is not ready and cannot serve traffic."

Command
gcloud run deploy api --region=$REGION --image=$IMAGE
Logs
Deploying container to Cloud Run service [api] in project [[MASKED]] region [[MASKED]]
Deploying...
Creating Revision...........interrupted
Deployment failed
ERROR: (gcloud.run.deploy) Revision [[MASKED]] is not ready and cannot serve traffic.
I've searched the Google Cloud documentation, but it does not mention this problem.
How do I solve "Revision is not ready and cannot serve traffic."?
Try waiting a few minutes and then just re-run the deployment. The good old "retry without changing anything" worked for me! :)
EDIT: I talked with a Cloud Architect who works with me, and he told me that this is the actual solution: if you retry the deploy too quickly, GCP may still have pending operations from the previous one!
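If you want to check whether the previous operation has settled before retrying, a sketch using gcloud (the service name api is taken from the question):

# list the service's revisions and their readiness
gcloud run revisions list --service=api --region=$REGION
# inspect the status conditions of the failing revision
gcloud run revisions describe REVISION_NAME --region=$REGION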
I faced the same error in Cloud Run after getting the container working correctly locally. In my case the revisions weren't showing as failing; they had a grey checkmark,
and when hovering over it I got the message
The revision is healthy but not currently serving traffic.
I just needed to click Manage Traffic and set 100% of the traffic to the new revision.
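The same can be done from the command line; a sketch, reusing the service name api from the question:

# route 100% of traffic to the most recently deployed revision
gcloud run services update-traffic api --to-latest --region=$REGION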
I faced this problem as well. In my case I checked the "Cloud Run" section from the hamburger menu of the Google Cloud console; the "Logs" section there should give you a better idea of what went wrong. I was missing a Python library, and adding the correct dependency to my requirements.txt solved the issue for me. Somehow my local testing had gone fine without hitting this. I hope this helps. :)
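Those same logs can also be pulled from a terminal; a sketch, again assuming the service name api from the question:

# read recent logs emitted by the service's revisions
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="api"' --limit=50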
I faced this problem too. In my case the Docker image was missing a required dependency at the build stage: my Dockerfile was missing some steps that copy the files needed to install the package.
To find your problem when the Cloud Build logs don't make sense to you, I think you should (see the sketch after these steps):
From the Google Cloud console, go to the "Container Registry" service > Images
Select your repository name
For the image version you want to check (probably latest), choose more actions > show pull command, then copy that command, e.g. docker pull gcr.io/..
From the Google Cloud console header, select Activate Cloud Shell
In the Cloud Shell terminal, pull the Docker image of your latest build by running the "pull command" you copied before.
Start a container from this image to see what exactly happens in your Cloud Run revision
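A sketch of those last two steps (the image path is hypothetical; use the pull command you actually copied):

# pull the exact image Cloud Run tried to deploy
docker pull gcr.io/PROJECT_ID/api:latest
# run it the way Cloud Run would: requests go to the port named by $PORT (8080 by default)
docker run --rm -e PORT=8080 -p 8080:8080 gcr.io/PROJECT_ID/api:latest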

How to test the validity of alertmanager.yaml

Is there a way to find out why my Alertmanager configuration is not being applied?
From the docs, the reason would be that the file is not valid:
Alertmanager can reload its configuration at runtime. If the new configuration is not well-formed, the changes will not be applied and an error is logged. A configuration reload is triggered by sending a SIGHUP to the process or sending an HTTP POST request to the /-/reload endpoint.
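For reference, triggering such a reload looks like this (a sketch; 9093 is Alertmanager's default port):

# option 1: signal the running process
kill -HUP $(pidof alertmanager)
# option 2: POST to the reload endpoint
curl -X POST http://localhost:9093/-/reload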
I am trying to find a way to test the validity of my alertmanager.yaml.
I came across the amtool git repo, which took me through installing Alertmanager itself; amtool is included inside.
OK, I did that and got amtool.exe inside the downloaded Alertmanager package.
I put my file in the config folder that the tool is supposed to scan, but I got no answer from it; its console window keeps closing without showing any log.
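For what it's worth, amtool is a command-line tool, so it has to be run from a terminal (cmd/PowerShell on Windows) rather than double-clicked; a sketch of the config check:

# validate the configuration file; prints the parsed routes/receivers on success
amtool check-config alertmanager.yaml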
Second,
I have installed the Prometheus stack Helm chart on my k8s cluster from the prometheus-community git repo; how do I find amtool inside it?
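One way is to run amtool inside the Alertmanager pod, where it ships alongside the alertmanager binary; a sketch, where the pod name, namespace, and config path are assumptions from a default kube-prometheus-stack install (adjust to your release):

kubectl exec -n monitoring -it alertmanager-kube-prometheus-stack-alertmanager-0 -- \
  amtool check-config /etc/alertmanager/config/alertmanager.yaml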
No code sample; it is not a code issue.
Thanks everyone.

Has anyone tried the HLF 2.0 feature "External Builders and Launchers" and wants to get in touch?

I'm working my way through the HLF 2.0 docs and would love to discuss and try out the new features "External Builders and Launchers" and "Chaincode as an external service".
My goal is to run HLF 2.0 on a K8s cluster (OpenShift). Does anyone want to get in touch, or has anyone already figured their way through?
Cheers from Germany
Also trying to use the ExternalBuilder. I set up core.yaml and rebuilt the containers to use it. On "peer lifecycle chaincode install .tgz...", I get an error that the path to the scripts configured in core.yaml cannot be found.
I've added volume bind commands in peer-base.yaml and in docker-compose-cli.yaml, and am using the first-network setup. I dropped the part of byfn.sh that would connect to the cli container so that I can do that part manually; the create, join, and update-anchors steps succeed, and then the install fails. Specifically, it fails on /bin/detect, because the peer can't find that file to fork/exec it. To get that far, the peer was able to read my external configuration and the core.yaml file. At the moment I'm trying "mode: dev" in core.yaml, which seems to indicate that the scripts and the chaincode will be run "locally", which I think means in the cli container. Otherwise, I've tried to walk the code to see how the Docker containers are created dynamically, and from what image, but I haven't been able to nail that down yet.
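For reference, the external-builder section of core.yaml looks roughly like this (a sketch based on the HLF 2.0 docs; the path and name are placeholders, and the path must exist inside the peer container, e.g. via a volume mount, or the peer fails exactly as above when it tries to exec bin/detect):

chaincode:
  externalBuilders:
    - path: /opt/external-builder   # must contain bin/detect, bin/build, bin/release
      name: my-external-builder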

Running mapbox-gl-js locally (unable to serve debug page)

Edit:
Summary: I tried to follow only the steps listed in the two links below, as they apply to Windows:
https://github.com/mapbox/mapbox-gl-js/blob/master/CONTRIBUTING.md
https://github.com/stackgl/headless-gl#windows
Here I have reattached the screenshot of the commands that I had problems with:
https://imgur.com/RCQCNU5
One more step I took that I should mention: headless-gl was not present when I downloaded the repository. When the headless-gl install command did not work, I manually copied the file into my local copy under the node_modules directory, thinking it would work, but it didn't solve anything. I do think this is related to access issues, but I don't know what else I should try to get it working.
First, let's clarify your problem: you want a build of mapbox-gl.js that includes a recent bug fix.
Your best option is to just wait a couple of weeks for a release.
Failing that, you should build your own from master. You don't need to set up a debug server for that; you can skip straight to the "Creating a Standalone Build" section (roughly sketched below).
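For orientation, the standalone build boils down to something like this (a sketch; the script names are my recollection of the repo's CONTRIBUTING.md of that era and may differ by version, so check package.json if they fail):

git clone https://github.com/mapbox/mapbox-gl-js.git
cd mapbox-gl-js
npm install
npm run build-prod-min   # emits the standalone JS into dist/
npm run build-css        # emits the stylesheet into dist/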
If the steps for building on Windows don't work for some reason, you could set up a local virtual machine running Ubuntu and use that.
But honestly, just wait a couple of weeks. :)
Just in case someone else needs to run this on a local server:
After cloning,
run npm install
npm run start-debug
It will start listening on port 9966.
Test the debug HTML files by browsing to
localhost:9966/debug/FILE_NAME_TO_TEST.html

Jenkins - Plugins not installing and jobs and features are missing after upgrade.

I'm using the latest Jenkins (v 1.590) LOL, but the official Jenkins site says 1.588. I'm 200% sure that I saw 1.589 and 1.590 a few days back on the official Jenkins download site (when I wanted to upgrade Jenkins to a newer version).
This is what I see at the bottom of my Jenkins instance page.
Page generated: Nov 19, 2014 12:07:51 PM | REST API | Jenkins ver. 1.590
Now, the issue I'm facing: since I upgraded a few of the plugins and Jenkins itself recently, some of the jobs are missing (I see this can happen during upgrades, but upgrading to the latest Jenkins should fix it, and I'm two steps ahead of what Jenkins has on their official site, right?):
I go to Manage Jenkins > Manage Plugins > Available tab, check a bunch of plugins to install (Artifactory, Maven Project Plugin, etc.) and restart Jenkins through the GUI (which happens automatically once the plugins are downloaded/installed). After the restart, I check whether the plugins now show under the "Installed" tab, but to my luck they are still listed under the "Available" tab and NOT under "Installed". If I open an existing job's configuration or create a new job, the features provided by those plugins are NOT visible, i.e. after installing the Maven Project Plugin, I don't see an option to create a Maven-style (2/3) project job while creating a new job.
I see valid .jpi files for the respective plugins in the plugins folder under JENKINS_HOME, and there are some .pinned files as well. I have tried this a couple of times, but the plugins are not visible once installed. The installation doesn't give any error during the whole operation.
Jenkins system log file (upon Jenkins restart) is attached (NOTE: Use slow download button to see/download this log file).
Download at SpeedyShare
or
http://speedy.sh/x6vd8/Jenkins.System.Log
The issue was with the plugin permissions and the expanded folders.
If you look under the plugins folder, you'll see .jpi or .hpi files (Jenkins .jpi, Hudson .hpi).
If I have awesomeplugin.jpi, then there will be a folder called awesomeplugin.
Using Slav's hint, I ran a bunch of checks and found that, out of the 70+ plugins I had installed, a few had somehow ended up with "root" and "root" as the owner and group of their .jpi files and corresponding folders.
Now, the best solution one can try (the safest approach) is to run chown -R yourvalidjenkinsuser:yourvalidgroup * and chmod -R 755 * as root. Before doing this, stop/shut down Jenkins. (A sketch of the whole procedure follows the next paragraph.)
I went a step further: I first took a backup of the config files / the whole JENKINS_HOME folder. Then I went to the plugins folder and removed all the folders corresponding to the .jpi files, using the root account or as the owner of those folders (NOTE: I did not delete the .jpi files themselves). Then I ran the above two commands (chown/chmod) and started Jenkins.
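A sketch of that procedure (the JENKINS_HOME path and the jenkins user/group are assumptions; adjust to your install):

sudo service jenkins stop                      # stop Jenkins first
cp -a /var/lib/jenkins /var/lib/jenkins.bak    # back up JENKINS_HOME
cd /var/lib/jenkins/plugins
sudo chown -R jenkins:jenkins .                # fix owner/group on .jpi files and expanded folders
sudo chmod -R 755 .
sudo service jenkins start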
OUTCOME:
When I go to Jenkins > New Item (to create a new job), Shenzi, all the different job-type options are showing up, including the Maven 2/3 job type that I had found missing, and a few others like "Multi-configuration project" and the Multijob Project job type. All of these were missing and now they are showing up.
OK, I also checked one of the old jobs, went to its configuration, and Shenzi!! I now see all the features there, e.g. the Promoted Job plugin's "Promote builds when..." checkbox. This feature, which I had configured some time back, had gone missing, but now it's showing up again.
A few of the Maven jobs that I created in the past for Maven Release Plugin and Release Plugin POC work had a bunch of steps in them. I found there was nothing in the Build step (after this whole mess), but after the above solution everything is back: I can see the configurations and build steps populated as they were set.
I hope this can help someone facing a similar issue.
Still, I don't know why my Jenkins version is 1.590 (which Jenkins recently updated to automatically) while the Jenkins site today says their latest artifact is version 1.588 (seems like a mystery).
When you say "valid .hpi files", did you actually test that they are valid? You should be able to rename them to .zip and extract them as a valid archive. An issue I face a lot is the network-layer filtering system we have in the office: it sometimes intercepts Jenkins's download calls with the filtering system's login page instead of whatever internet resource was being requested.
If your .hpi files are not valid zip archives, open them in a text editor and see whether they are actually an HTML page/response of some kind.
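A quick way to check from a shell (a sketch; the plugin filename is just the example from above):

unzip -t awesomeplugin.jpi    # a valid plugin is a zip archive and passes this integrity test
file awesomeplugin.jpi        # should report "Zip archive data", not HTML or ASCII text
head -c 200 awesomeplugin.jpi # an intercepted HTML login page would show up here as text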