Error when attempting to add maintenance exclusion to GKE cluster - kubernetes

When attempting to add a maintenance exclusion to my GKE cluster to prevent minor upgrades to the control plane and data plane between 1/25/23 and 4/30/23, I receive the following error:
gcloud container clusters update <my-cluster-name> \
--add-maintenance-exclusion-name suspend_upgrades_past_eol \
--add-maintenance-exclusion-start 2023-01-25T00:00:00-05:00 \
--add-maintenance-exclusion-end 2023-04-30T23:59:59-05:00 \
--add-maintenance-exclusion-scope no_minor_or_node_upgrades
ERROR: (gcloud.container.clusters.update) ResponseError: code=400, message=MaintenancePolicy.maintenanceExclusions["suspend_upgrades_past_eol"].endTime needs to be before minor version 1.21 end of life: (2023-1). See release schedule at https://cloud.google.com/kubernetes-engine/docs/release-schedule.
According to an email I received from GCP, GKE clusters running 1.21 should be able to create maintenance exclusions extending up to April 30th, 2023. I believe my command should have been valid, especially considering I got it directly from that GCP email. I've also tried reducing the time range to end on 4/28/23, to no avail.
I'm running the latest version of gcloud:
Google Cloud SDK 415.0.0
alpha 2023.01.20
beta 2023.01.20
bq 2.0.84
core 2023.01.20
gsutil 5.18
Any clue on what I'm doing wrong or ideas on how to get around this are appreciated.

I believe you can do this in 2 parts:
Part 1: Set a maintenance exclusion window for no upgrades until Feb 28th.
You can set a 30-day maintenance exclusion window, which will let you push the upgrade off until Feb 28th.
Note there are 3 types of maintenance exclusion windows. 2 of them will still complain that you can't go past the EoL date, but the 3rd will work. (The 2 that fail are the ones titled "No minor or node upgrades" and "No minor upgrades"; they're the ones that can go up to 90 / up to 180 days in cases where EoL (end of life) isn't a factor.) The one that will work is the up-to-30-days no_upgrades option.
^-- You may need to temporarily change your release channel to No Channel / Static version in order to set that option. (It's a reversible change)
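For concreteness, here's roughly what Part 1 could look like as a gcloud command (just a sketch reusing the flags from the question; the exclusion name and dates are placeholders you'd adjust, and no_upgrades is the 30-day scope mentioned above):
# Placeholder name/dates; scope no_upgrades is the "no upgrades" (up to 30 days) option.
gcloud container clusters update <my-cluster-name> \
--add-maintenance-exclusion-name pause_all_upgrades \
--add-maintenance-exclusion-start 2023-01-29T00:00:00-05:00 \
--add-maintenance-exclusion-end 2023-02-28T23:59:59-05:00 \
--add-maintenance-exclusion-scope no_upgrades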
Part 2: That longer-than-30-day delay option that isn't working today: try it again 1-27 days from now and it "might" work. You'll be able to wait those 1-27 days thanks to Part 1.
I've heard an unconfirmed rumor of a not-yet-released change, possibly landing as soon as Feb 1st, 2023 (and if not then, likely sometime before Feb 28th, with early Feb being most probable), that would allow one of those 90-day no-upgrade exclusion windows to be put in place to extend the deadline for the forced update to 1.22 as far as April 30th, 2023, but that's the absolute deadline. Either way, don't delay; try to update ASAP. (I'd also recommend you don't depend on an extension to April 30th and instead try to update by Feb 28th, as I could be incorrect; I don't work at Google.)
^-- Oh right, you'll have to temporarily switch your release channel from No Channel / Static version back to the Stable release channel in order to get the other 2 options.
(Side note: it's my understanding that the whole reason for the potential, not-yet-released change I'm referring to is that this is being done as an exception. Normally forcing an auto-upgrade would be a non-issue, but 1.21 -> 1.22 had some API deprecations that can cause breakage if you're not ready, which explains why they're making an exception to slightly extend past the end-of-life deadline of Jan 31st, 2023.)
Update - The change came through:
Here's a working example:
Note the linter/policy enforcement is extremely finicky + fails without proper error messages, but you can get it to work if you thread the needle just right:
No minor or node upgrades from 1/29/23 - 4/29/23 will fail (start date too early)
No minor or node upgrades from 1/31/23 - 4/30/23 will fail (end date too late)
No minor or node upgrades from 1/31/23 - 4/29/23 will work (Goldilocks; full command below)
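So, adapting the command from the question, something like this should go through (a sketch; only the dates differ from the original command, and the cluster name is still a placeholder):
# Same flags as in the question; only the start/end dates are changed to 1/31/23 - 4/29/23.
gcloud container clusters update <my-cluster-name> \
--add-maintenance-exclusion-name suspend_upgrades_past_eol \
--add-maintenance-exclusion-start 2023-01-31T00:00:00-05:00 \
--add-maintenance-exclusion-end 2023-04-29T23:59:59-05:00 \
--add-maintenance-exclusion-scope no_minor_or_node_upgrades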

Related

difference between next_execution_date and data_interval_end on Airflow?

We recently migrated to Airflow 2.3.3.
We get some warnings followed by an exception saying next_execution_date is deprecated, so use data_interval_end instead.
But when we changed it, we got some failures regarding the time difference between these 2 macros.
Also, I checked the code; both are using the UTC timezone.
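One way to see both values side by side for a specific run is to render a task's templated fields with the Airflow CLI (a sketch; my_dag and my_task are hypothetical names, and the task needs to reference both macros in a templated field such as bash_command):
# Render templated fields for the given logical date (DAG/task names are placeholders).
airflow tasks render my_dag my_task 2022-08-01T00:00:00+00:00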

What would cause Error: multiple IAM policies found matching criteria

Our Terraform plan is suddenly reporting errors such as the following while it is 'refreshing state':
Error: multiple IAM policies found matching criteria (ARN:arn:aws:iam::aws:policy/ReadOnlyAccess); try different search;
on ../../modules/xxxx/policies.tf line 9, in data "aws_iam_policy" "read_only_access":
9: data "aws_iam_policy" "read_only_access" {
and
Error: no IAM policy found matching criteria (ARN: arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy); try different search
on ../../modules/xxxx/iam.tf line 97, in data "aws_iam_policy" "aws_eks_worker_node":
97: data "aws_iam_policy" "aws_eks_worker_node" {
We recently updated our dev EKS cluster from 1.20 to 1.21. Stage and Live environments are still on 1.20, but they are built from the same module. We didn't see these errors until a day after the upgrade, and there were no changes to the reported Terraform files. The errors also appear to be somewhat intermittent and random: one plan run will be successful, while the next will include some of these policies that we have defined.
I know this is a shot in the dark with limited information, so please ask questions if you have them. I'm just really looking for someone who knows what this error means, because Google isn't returning anything useful.
We are also running Terraform version 0.14.
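Not a definitive answer, but one sanity check would be to look the same managed policies up directly with the AWS CLI, outside Terraform (a sketch; assumes the CLI is configured with the same account and credentials Terraform uses), to confirm each ARN resolves cleanly to a single policy:
# Each call should return exactly one policy for the AWS-managed policy ARN.
aws iam get-policy --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
aws iam get-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy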

Bitbucket Pipeline schedule trigger

I can't see anyone talking about what I'm looking to do. I'm currently running a pipeline on a branch merge within Bitbucket.
branches:
  staging:
    - step:
        name: Clone
        script:
          - echo "Clone all the things!"
What I want to do is when a branch gets merged into master, trigger an event that will enable the schedule to run for the next day.
If there are no changes I don't want anything to run, however, if there are I want the schedule to kick in and work.
I've read through the Pipeline triggers:
https://support.atlassian.com/bitbucket-cloud/docs/pipeline-triggers/
But I can't see anywhere that would allow me to do it. Has anyone done this sort of thing? Is it possible, or am I limited by bitbucket itself?
Never done this, but there's an API for creating schedules. I think you would need to determine the date and specify the single cron task, e.g. March 30, 2022 at midnight:
0 0 30 3 * 2022
However, the year is an extension, not a standard cron field; "at" is an alternative that may be available (but is also not standard). It all depends on what Bitbucket allows for cron schedules, so I don't think this is a conclusive answer (it still needs info on how to set up the schedule).
Here are the docs:
https://developer.atlassian.com/bitbucket/api/2/reference/resource/repositories/%7Bworkspace%7D/%7Brepo_slug%7D/pipelines_config/schedules/
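For what it's worth, here's a rough sketch of what creating such a schedule via that API could look like (the endpoint comes from the docs above, but the payload field names are my best guess and should be verified against those docs; WORKSPACE, REPO_SLUG and the credentials are placeholders):
# Hypothetical request; double-check the JSON schema against the schedules API docs linked above.
curl -X POST -u "username:app_password" \
-H "Content-Type: application/json" \
"https://api.bitbucket.org/2.0/repositories/WORKSPACE/REPO_SLUG/pipelines_config/schedules/" \
-d '{"type": "pipeline_schedule", "enabled": true, "cron_pattern": "0 0 30 3 * 2022", "target": {"type": "pipeline_ref_target", "ref_type": "branch", "ref_name": "master"}}'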

Deployment error -

I am getting an error when deploying ADF pipelines. I don't understand how to resolve this error message:
Pipeline Populate SDW_dbo_UserProfiles from SDW_dbo_CTAS_ApptraxDevices is in Failed state. Cannot set active period Start=05/30/2017 00:00:00, End=05/29/2018 23:59:59 for pipeline 'Populate SDW_dbo_UserProfiles from SDW_dbo_CTAS_ApptraxDevices' due to conflicts on Output: SDW_dbo_UserProfiles with Pipeline: Populate SDW_dbo_UserProfiles from SDW_dbo_Manifest, Activity StoredProcedureActivityTemplate, Period: Start=05/30/2017 00:00:00, End=05/30/2018 00:00:00.
Try changing the active period or using autoResolve option when setting the active period.
I am authoring and deploying from within Visual Studio 2015. All of my pipelines have the same values for Start and End.
"start": "2017-05-30T00:00:00Z",
"end": "2018-05-29T23:59:59Z"
How do I resolve this issue?
Visual Studio can be fun sometimes when it comes to validating your JSON, because not only does it check everything in your solution, it also validates against what you already have deployed in Azure!
I suspect this error will be because there is a pipeline that you have already deployed that now differs from Visual Studio. If you delete the affected pipeline from ADF in Azure manually and then redeploy you should be fine.
Sadly the tooling isn't yet clever enough to understand which values should take precedence and be overwritten at deployment time. So for now it simply errors because of a mismatch, any mismatch!
You will also encounter similar issues if you remove datasets from your solution. They will still be used for validation at deployment time because the wizard first deploys all new things before trying to delete the old. I've fed this back to Microsoft already as an issue that needs attention for complex solutions with changing schedules.
Hope this helps.

Google cloud datalab deployment unsuccessful - sort of

This is a different scenario from other questions on this topic. My deployment almost succeeded, and I can see the following lines at the end of my log:
[datalab].../#015Updating module [datalab]...done.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deployed module [datalab] to [https://main-dot-datalab-dot-.appspot.com]
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Step deploy datalab module succeeded.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deleting VM instance...
The landing page keeps showing a wait bar indicating the deployment is still in progress. I have tried deploying several times in last couple of days.
About the additions described on the landing page:
An App Engine "datalab" module is added. - When I click on the pop-out URL "https://datalab-dot-.appspot.com/", it throws an error page with "404 page not found".
A "datalab" Compute Engine network is added. - Under "Compute Engine > Operations" I can see a create-instance operation for the datalab deployment with my ID, and a delete-instance operation with the *******-compute@developer.gserviceaccount.com ID. Not sure what it means.
A Datalab branch is added to the git repo. - Yes, and with all the components.
I think the deployment is partially successful. When I visit the landing page again, the only option I see is to deploy the datalab again and not to start it. Can someone spot the problem? Appreciate the help.
I read the other posts on this topic and tried to verify my deployment using "https://console.developers.google.com/apis/api/source/overview?project=". I get the following message:
The API doesn't exist or you don't have permission to access it
You can try looking at the App Engine dashboard in the Cloud Console, to verify that there is a "datalab" service deployed.
If that is missing, then you need to redeploy again (or switch to the new locally-run version).
If that is present, then you should also be able to see a "datalab" network and a VM instance named something like "gae-datalab-main-..." in the Compute Engine section of the console. If either of those is missing, then try going back to the App Engine console, deleting the "datalab" service, and redeploying.
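If the console pages are hard to navigate, the same checks can be done from the command line with a current gcloud SDK (a sketch; assumes gcloud is authenticated against the affected project):
# Look for a service named "datalab".
gcloud app services list
# Look for a network named "datalab".
gcloud compute networks list
# Look for an instance named something like "gae-datalab-main-...".
gcloud compute instances list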