I'm testing a workflow runbook that utilizes Add-Type to add some custom C# code.
All of a sudden I started getting 'type already exists' errors on subsequent test jobs, as if a new PSSession is not being created.
In other words, it looks like new jobs are sharing the same execution context. I only get this locally if I try to run the same command twice per PS instance.
The type in question is a static class with some Extension methods. Since it also happens to be the first type declared in the source block, I don't doubt other non-static types would throw errors as well.
I've executed this handfuls of times already, so I fully expect that 'eventually' this will stop happening, but I can't seem to force it, and I have no idea what I could've done to trip it into this situation, either.
Seeing evidence of shared execution contexts across jobs like this - even (especially?) if only temporal - makes me wonder if some or all of the general execution inconsistencies we've seen in the past when making & deploying changes & performing subsequent tests soon-after are related to this.
I'm tempted to think that this is simply a part of the difference between a Test Job and a 'real' one, but that raises questions about the validity of the Test jobs themselves WRT mimicking Published Jobs.
Are all Azure Automation Jobs supposed to execute in Isolation? Can this be controlled/exploited by a developer?
Each automation account has its own isolated sandboxes where its jobs run. Those sandboxes are distributed among a number of worker machines. For test jobs, to try to improve job start time since [make code change, retest] over and over is very common, Automation reuses the same sandbox as used for previous test jobs of this runbook, if the sandbox has not been cleaned up yet, so that sandboxes do not have to be spun up for each unique test job (sandbox creation is one reason for a longer job start time than desired). Due to this behavior, if you execute test jobs of the same runbook within a short amount of time, you will get the behavior you're seeing above.
However, even for production jobs, jobs of the same automation account (across runbooks) can share the same sandboxes. We randomly distribute jobs across our worker machines, so its possible job A is queued for execution and is placed on worker W, then 5 minutes later, job B is queued for execution and is placed on worker W as well. If job A and job B are of the same automation account and have the same "demands" in terms of modules / module versions, they will be placed in the same sandbox, if job A's sandbox is still around. "Module / module version demands" does not mean the modules used by the runbook, but the modules / latest module versions that existed in the automation account at the time when the job was started / runbook was scheduled (for jobs started via schedule) / runbook was assigned to a webhook (for jobs started via webhook)
In terms of resolving your specific problem, you could surround Add-Type with a try, catch statement, or maybe use Add-Type -IgnoreWarnings
Related
We have a cloud full of self-hosted Azure Agents running on custom AMIs. In some cases, I have some cleanup operations which I'd really like to do either before or after a job runs on the machine, but I don't want the developer waiting for the job to wait either at the beginning or the end of the job (which holds up other stages).
What I'd really like is to have the Azure Agent itself say "after this job finishes, I will run a set of custom scripts that will prepare for the next job, and I won't accept more work until that set of scripts is done".
In a pinch, maybe just a "cooldown" setting would work -- wait 30 seconds before accepting another job. (Then at least a job could trigger some background work before finishing.)
Has anyone had experience with this, or knows of a workable solution?
I suggest three solutions
Create another pipeline to run the clean up tasks on agents - you can also add demand for specific agents (See here https://learn.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml) by Agent.Name -equals [Your Agent Name]. You can set frequency to minutes, hours, as you like using cron pattern. As while this pipeline will be running and taking up the agent, the agent being cleaned will not be available for other jobs. Do note that you can trigger this pipeline from another pipeline, but if both are using the same agents - they can just get deadlocked.
Create a template containing scripts tasks having all clean up logic and use it at the end of every job (which you have discounted).
Rather than using static VM's for agent hosting, use Azure scaleset for Self hosted agents - everytime agents are scaled down they are gone and when scaled up they start fresh. This saves a lot of money in terms of sitting agents not doing anything when no one is working. We use this option and went away from static agents. We have also used packer to create the VM image/vhd overnight to update it with patches, softwares required, and docker images cached.
ref: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops
For those discovering this question, there's a much better way: run your self-hosted agent with the --once flag, documented here.
You'll need to wrap it in your own bash script, but something like this works:
while :
do
echo "Performing pre-job setup..."
echo "Waiting for job..."
./run.sh --once
echo "Cleaning up..."
sleep 2
done
Another option would be to use a ScaleSet VM setup which preps a new agent for each run and discards the VM when the run is done. It can prepare new VMs in the background while the job is running.
And I suspect you could implement your own IMaintenanceProvirer..
https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Agent.Worker/Maintenance/MaintenanceJobExtension.cs#L53
I have a reporting application that uses Celery to process thousands of jobs per day. There is a python module per each report type that encapsulates all job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on-demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. It means, if the customer creates a new report, I will need to have a way to trigger a DAG with the customer-specific configuration. And with a scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding this should be possible since Airflow supports DAGs created dynamically, however, I am not sure if this is an efficient and correct way to use Airflow.
I wonder if anyway considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same, and you can create and run them at any point of time. Also, Celery has a pretty good scheduler (I have never seen it failing in 5 years of using Celery) - Celery Beat.
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically, I am not sure if this will work with a scale of 1000 of DAGs though. There are some good examples on astronomer.io on Dynamically Generating DAGs in Airflow.
I have some DAGs and task that are dynamically generated by a yaml configuration with different schedules and configurations. It all works without any issue.
Only thing that might be challenging is the "jobs are triggered by customers on-demand" - I guess you could trigger any DAG with Airflow's REST API, but it's still in a experimental state.
I'm comparing salt-cloud and terraform as tools to manage our infrastructure at GCE. We use salt stack to manage VM configurations, so I would naturally prefer to use salt-cloud as an integral part of the stack and phase out terraform as a legacy thing.
However my use case is critical on VM deployment time because we offer PaaS solution with VMs deployed on customer request, so need to deliver ready VMs on a click of a button within seconds.
And what puzzles me is why salt-cloud takes so long to deploy basic machines.
I have created neck-to-neck simple test with deploying three VMs based on default CentOS7 image using both terraform and salt-cloud (both in parallel mode). And the time difference is stunning - where terraform needs around 30 seconds to deploy requested machines (which is similar to time needed to deploy through GCE GUI), salt-cloud takes around 220 seconds to deploy exactly same machines under same account in the same zone. Especially strange is that first 130 seconds salt-cloud does not start deploying and does seemingly nothing at all, and only after around 130 seconds pass it shows message deploying VMs and those VMs appear in GUI as in deployment.
Is there something obvious that I'm missing about salt-cloud that makes it so slow? Can it be sped up somehow?
I would prefer to user full salt stack, but with current speed issues it has I cannot really afford that.
Note that this answer is a speculation based on what I understood about terraform and salt-cloud, I haven't verified with an experiment!
I think the reason is that Terraform keeps state of the previous run (either locally or remotely), while salt-cloud doesn't keep state and so queries the cloud before actually provisioning anything.
These two approaches (keeping state or querying before doing something) are needed, since both tools are idempotent (you can run them multiple times safely).
For example, I think that if you remove the state file of Terraform and re-run it, it will assume there is nothing in the cloud and will actually instantiate a duplicate. This is not to imply that terraform does it wrong, it is to show that state is important and Terraform docs say clearly that when operating in a team the state should be saved remotely, exactly to avoid this kind of problem.
Following my line of though, this should also mean that if you either run salt-cloud in verbose debug mode or look at the network traffic generated by it, in the first 130 secs you mention (before it says "deploying VMs"), you should see queries from salt-cloud to the cloud provider to dynamically construct the state.
Last point, the fact that salt-cloud doesn't store the state of a previous run doesn't mean that it is automatically safe to use in a team environment. It is safe to use as long as no two team members run it at the same time. On the other hand, terraform with remote state on Consul allows for example to lock, so that team concurrent usage will always be safe.
I'm using Jenkins for my builds, and I wrote some test scripts that I need to run after the compilation of the build.
I want to save some time, so I have to run the test scripts parallel. How can I do that?
EDIT: ok, I understand know that I need a separate Job for each test (for 4 tests I need 4 jobs, right?)
So, I did that, and via the parent job I ran this jobs. (using "build other projects" plugin).
But I didn't managed to aggregate the results (using aggregate downstream test results). The parent job exits before the downstream jobs were finished.
What shall I do?
Thanks.
You can use multi-job plugin. This would allow you to run multiple jobs in parallel and the parent job would wait for the sub jobs to be completed. The parent jobs status can be determined by the sub jobs status.
https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin
Jenkins doesn't really allow you to run things in parallel. You can however split your build into different jobs to achieve this. It would look like this.
Job to compile the source runs.
Jobs that run the tests are triggered by the completion of the compilation and start running. They copy compilation results from the previous job into their workspaces.
This is a big kludgy though. The better alternative would be to parallelise within the scripts that run the tests, i.e. you run a single script and this then runs the tests in parallel. If this is not possible, you'll have to split into different jobs though.
Have you looked at the Jenkins JOIN Plugin? I have not used it but I believe it is what you are attempting to accomplish.
- Mike
Actually you can but you will need some coding behind.
In my case, I have parallel test execution on jenkins.
1) Create a small job with parameters that is supposed to do a test run with a small suite
2) Edit this job to run on a list of slaves (where you have the proper environment)
3) Edit this build to allow concurrent builds
And now the hard part.
4) Create a small java program for computing the list of parameters for each job to run.
5) Iterate trough the list and launch a new Jenkins job on a new thread.
Put a Thread.sleep(5000) between runs in order to avoid communication errors
6) Join the threads
At the end of each job, I send the results to a shared location in order to perform some reporting at the end of all tests.
For starting a jenkins job with parameters use CLI
I intend to make my code as generic as possible and publish it if anyone else will need it.
You can use https://wiki.jenkins-ci.org/display/JENKINS/Build+Flow+Plugin with code like this
parallel (
// job 1, 2 and 3 will be scheduled in parallel.
{ build("job1") },
{ build("job2") },
{ build("job3") }
)
You can use any one of the followings:
Multijob plugin
Build Flow plugin
I have a few jobs in Jenkins that use Selenium to modify a database through a website's front end. If some of these jobs run at the same time, errors due to dirty reads can result. Is there a way to force certain jobs in Jenkins to be unable to run at the same time? I would prefer not to have to place or pick up a lock on the database, which could be read or modified by any number of users who are also testing.
You want the Throttle Concurrent Builds plugin which lets you define global and per-node semaphores.
Locks and latches is being deprecated in favor of Throttle Concurrent builds.
I've tried both the locks & latches plugin and the port allocator plugin as ways to achieve what you're trying to do. Neither worked reliably for me. Locks & latches worked some of the time, but I'd occasionally get hung jobs. Using port allocator as a hack will work unless you have multiple jenkins nodes, but the config overhead is kind of high. What I've ultimately settled upon is another hack, but it works reliably and uses core Jenkins stuff (no plugins):
create a slave node running on the same box as the master (or not, if you have lots of boxes)
give this slave a single executor (key)
tie the 2 (or n) jobs that must not run together to this new slave node
optionally set the slave's usage to 'tied jobs only' if it'll screw up your other jobs if they happen to run on the new slave
Since the slave has only one executor, the jobs tied to it can never run together.
If you regard the database as a shared resource that can only be used exclusively then this fits the usecase of the Lockable resources plugin.
It is being actively developed and improved and is very versatile.