I am working through some questions about pipelines, and this is one I need help with.
Why can there be a pipeline stall after a load instruction but not after
an add instruction?
I know that an unused slot in the pipeline is called a pipeline stall. My guess is that a stall can occur after a load instruction because we need to wait for a register that might be updated. But I cannot come up with an answer for why an add instruction cannot create a pipeline stall. Maybe it is because at that stage we have already read from the register?
A pipeline stall is used to resolve hazards usually caused by data dependencies. An add can actually produce a pipeline stall, but let's first consider an example in which it does not.
SUB r2, r3
ADD r1, r2
Even though the add instruction uses the result of the subtract, there is no stall. This is because the EX stage has access to the data from the previous EX stage.
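Assuming the classic five-stage IF/ID/EX/MEM/WB pipeline with forwarding between stages, the timing would look roughly like this:
cycle          1    2    3    4    5    6
SUB r2, r3     IF   ID   EX   MEM  WB
ADD r1, r2          IF   ID   EX   MEM  WB
The SUB's result is ready at the end of its EX stage (cycle 3) and can be forwarded straight into the ADD's EX stage in cycle 4, so no cycle is wasted.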
Now let's consider an example in which an add can produce a stall.
LOAD r2, RAM[a]
ADD r1, r2
Here, data produced by the MEM stage of the load instruction is required as input by the EX stage of the ADD instruction. The EX stage only has access to the data from the previous EX stage, so the pipeline stalls due to a read-after-write (load-use) hazard, as sketched below.
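Assuming the same five-stage pipeline, the timing would look roughly like this:
cycle             1    2    3    4    5    6    7
LOAD r2, RAM[a]   IF   ID   EX   MEM  WB
ADD r1, r2             IF   ID   --   EX   MEM  WB
The loaded value is only available at the end of the LOAD's MEM stage (cycle 4), so the ADD's EX stage cannot run in cycle 4 and is delayed by one cycle.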
This is resolved by introducing a bubble into the pipeline (like a NOP, shown as -- above), which satisfies the data dependency without needing to propagate data backwards in time (which is impossible).
You can learn more about this in much greater detail by reading up on hazards, bubbles, and forwarding.
I have several Azure DevOps release pipelines. I have a step which copies all my deployment scripts to a unique and one-time Azure Storage blob container, named for the release variable System.JobId. This is so that I can guarantee that whenever I run a release, I am always using up-to-date scripts and it shouldn't be possible to be referring to old copies, regardless of any unpredictable error conditions. The container is deleted at the end of a release.
The documentation for System.JobId states:
A unique identifier for a single attempt of a single job.
However, intermittently and at random, sometimes my pipeline fails on the blob copy step with:
2020-03-30T19:28:55.6860345Z [2020/03/30 19:28:55][ERROR] [path-to-my-file]: No input is received when user needed to make a choice among several given options.
I can't directly see that this is because the blob already exists, but when I navigate to the storage account, I can see that a container exists and it contains all the files I would expect it to. Most other times when the pipeline runs successfully, the container is gone at this point.
Why does it seem that sometimes the JobId is reused, when the documentation says it is unique? I am not redeploying the same release stage on multiple attempts to trigger this problem. Each occurrence is a new release.
Why does it seem that sometimes the JobId is reused, when the documentation says it is unique?
Sorry for the misleading wording in our documentation.
The ID is only unique relative to the current pipeline, not across the complete system.
Take a simple example: there are 2 pipelines, p1 and p2, and each pipeline has 2 jobs: job1 and job2. In pipeline p1, the job ID of job1 is unique and the other jobs will not have the same job ID value.
BUT at the same time, the corresponding job ID value in pipeline p2 is the same as the one in pipeline p1. It will also have the same value even after re-running pipeline p1.
To use your words, this job ID value will be reused in a new pipeline execution, including a re-run of the same pipeline.
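If you need a name that is truly unique per release attempt, one option (a rough sketch, not an official recommendation) is to combine the release ID with the job ID. Assuming the agent exposes the predefined variables as environment variables (System.JobId -> SYSTEM_JOBID, Release.ReleaseId -> RELEASE_RELEASEID), a small Python script step could build the container name like this:

import os
import uuid

# Hypothetical sketch: fall back to a random value when run outside the agent.
job_id = os.environ.get("SYSTEM_JOBID", uuid.uuid4().hex)
release_id = os.environ.get("RELEASE_RELEASEID", "local")

# The job ID alone repeats across releases, so prefix it with the release ID,
# which differs for every release, to get a container name that should not collide.
container_name = f"scripts-{release_id}-{job_id}".lower()
print(container_name)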
I am configuring a release pipeline in Azure DevOps and I want the variables that get generated along the tasks to persist across re-execution of the same release. I wanted to know if that is possible.
The main goal is to create a pipeline that I can redeploy in case of a failure. If, for example, I have a release pipeline with 30 tasks, I would want to handle skipping the tasks that already completed, but once I reach the relevant task, I need the persisted variable values.
I have looked online and I see it isn't possible to persist variables across phases, but does that also mean it cannot be persisted in the same release pipeline if I redeploy it?
From searching Stack Exchange and Google I got to the following GitHub issue on the subject; I just wasn't sure if it also affects my own situation in the same way.
https://github.com/Microsoft/azure-pipelines-tasks/issues/4743
You have that by default, unless I am interpreting you wrong. When redeploying the same release, the variable values you define (in the pipeline) do not change.
Calculated values, however, are not persisted.
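To make the distinction concrete: a value computed during a run is typically set with the task.setvariable logging command, and (as far as I understand it) such a value lives only for the remainder of that deployment attempt; it is not written back into the release, so a later redeploy starts without it. A minimal Python script step illustrating the mechanism (the variable name is just an example):

# Hypothetical example value; any script step can emit this logging command.
deployed_version = "1.2.3"

# Makes 'deployedVersion' available to subsequent tasks in this job only; the value
# is not persisted into the release, so a redeploy of the same release won't see it.
print(f"##vso[task.setvariable variable=deployedVersion]{deployed_version}")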
I am running a pipeline where I am looping through all the tables in INFORMATION_SCHEMA.TABLES and copying them onto Azure Data Lake Store. My question is: how do I run this pipeline for the failed tables only, if any of the tables fail to copy?
Best approach I’ve found is to code your process to:
0. First, root-cause the failure and identify whether it is something wrong with the pipeline or a “feature” of your dependency that you have to code around.
1. Be idempotent. If your process ensures a clean state as the very first step, similar to Command Design pattern’s undo (but more naive), then your process can re-execute.
* with #1, you can safely use “retry” in your pipeline activities, along with sufficient time between retries.
* this is an ADFv1 or v2 compatible approach
2. If ADFv2, then you have more options and can have more complex logic to handle errors:
* for the activity that is failing, wrap this in an until-success loop, and be sure to include a bound on execution.
* you can add more activities in the loop to handle failure and log, notify, or resolve known failure conditions due to externalities out of your control.
3. You can also communicate asynchronously with future process executions by saving success to a central store. Later executions then check “if I was already successful” and stop processing before the activity (see the sketch after this list).
* this is powerful for more generalized pipelines, since you can choose where to begin
4. The last resort I know of (and I would love to learn new ways to handle this) is manual re-execution of failed activities.
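As a rough illustration of point 3, here is a minimal Python sketch of the “save success to a central store, skip what already succeeded” idea; the table list, the copy_table function, and the JSON status file are hypothetical stand-ins for your own copy activity and store (in ADF itself this logic would live in whatever drives the loop, e.g. a custom activity or an external script):

import json
from pathlib import Path

STATUS_FILE = Path("copy_status.json")  # hypothetical central store of completed tables
tables = ["dbo.Customers", "dbo.Orders", "dbo.Invoices"]  # would come from INFORMATION_SCHEMA.TABLES

def copy_table(table):
    # Placeholder for the real copy-to-Data-Lake activity.
    print(f"copying {table} ...")

completed = set(json.loads(STATUS_FILE.read_text())) if STATUS_FILE.exists() else set()

for table in tables:
    if table in completed:
        continue  # already succeeded in an earlier run, so skip it
    try:
        copy_table(table)
    except Exception as err:  # a real pipeline would log/notify here instead
        print(f"{table} failed: {err}")
        continue
    completed.add(table)
    STATUS_FILE.write_text(json.dumps(sorted(completed)))  # persist after every success

On the next run, only the tables that are not yet in the status file are attempted, which is exactly the “re-run failed tables only” behaviour.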
Hope this helps,
J
I would like to check if I'm missing any important points here.
My pipeline is only for featurization. I understand that once a pipeline that includes an Estimator is fitted, saving the pipeline will persist the params the Estimator has learned. So loading a saved pipeline in this case means not having to re-train the Estimator, which is a huge point.
However, for the case of a pipeline which only consists of a number of Transformer stages, would I always get the same result on feature extraction from an input dataset using either of the two approaches below?
1)
Creating a pipeline with a certain set of stages; and configuration per stage.
Saving and reloading the pipeline.
Transforming an input dataset
versus
2)
Each time just instantiating a new pipeline (of course with the exact same set of stages; and configuration per stage)
Transforming the input dataset
So, an alternative phrasing would be: as long as the exact set of stages, and the configuration per stage, is known, a featurization pipeline can be efficiently recreated (because there is no 'training an estimator' phase) without using save or load?
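For concreteness, here is a minimal PySpark sketch of the two approaches I mean (Tokenizer and HashingTF are just example stages, and the save path is made up):

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import Tokenizer, HashingTF

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(0, "spark ml pipelines"), (1, "transform only stages")], ["id", "text"])

def make_stages():
    # Same stages, same configuration, every time.
    return [Tokenizer(inputCol="text", outputCol="words"),
            HashingTF(inputCol="words", outputCol="features", numFeatures=1024)]

# Approach 1: fit (a pass-through when there are no Estimators), save, reload, transform.
model = Pipeline(stages=make_stages()).fit(df)
model.write().overwrite().save("/tmp/featurization_pipeline")
out1 = PipelineModel.load("/tmp/featurization_pipeline").transform(df)

# Approach 2: just rebuild the identical pipeline and transform directly.
out2 = Pipeline(stages=make_stages()).fit(df).transform(df)

My understanding is that out1 and out2 should be identical, since nothing was learned from the data.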
Thanks,
Brent
I would like to update some of the parameters of an ADF pipeline (e.g. the concurrency level) for lots of mappings. I am not able to find any cmdlet to do this through PowerShell. I know I can drop the existing pipeline and create a new one, but that will start reprocessing all the Ready slices for that pipeline's active period, which I don't want, because it would involve working out up to what point the existing pipeline has already processed slices. And this change is only temporary; at some stage I am going to revert the settings back. I just want the pipeline to change one of its properties. Doing this manually through the UI is slow and tedious. I am guessing there is no way around this, but let me know if you know of one.
You can still use "New-AzureRmDataFactoryPipeline" for this Update scenario:
https://msdn.microsoft.com/en-us/library/mt619358.aspx
Use with the -Force parameter to force it to proceed even if the message reads "... may overwrite the existing resource".
Under the hood, it's the same HTTP PUT API call used by the Azure portal UX. You can verify that with Fiddler.
The already executed slices won't be re-run unless you set their status back to PendingExecution.
This rule applies to LinkedService and Dataset as well, but NOT to the top-level DataFactory resource. New-AzureRmDataFactory will cause the service to delete the existing data factory along with all its sub-resources and create a brand new one, so be careful there.