Run Databricks notebook jobs via API in a shared context

According to the REST documentation for Databricks, you can submit a notebook task as a job to a cluster using the 2.0 API, or you can submit a command or Python script using the 1.2 API.
The 1.2 API allows you to create a context, and all subsequent commands or scripts can then be submitted against this context. This lets you maintain state (dataframes, variables, etc.), which is much more akin to running notebooks interactively in the browser.
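For reference, this is roughly how I understand the 1.2 flow (a minimal sketch, not tested; the host, token and cluster ID are placeholders, and the endpoint paths and field names are as I read them in the 1.2 docs):
import requests

HOST = "https://<databricks-instance>"   # placeholder
CLUSTER_ID = "<cluster-id>"              # placeholder
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

# 1) create an execution context once
ctx = requests.post(f"{HOST}/api/1.2/contexts/create", headers=HEADERS,
                    data={"clusterId": CLUSTER_ID, "language": "python"}).json()
context_id = ctx["id"]

# 2) run several commands against the SAME context, so `df` survives between calls
for cmd in ["df = spark.range(100)", "print(df.count())"]:
    run = requests.post(f"{HOST}/api/1.2/commands/execute", headers=HEADERS,
                        data={"clusterId": CLUSTER_ID, "contextId": context_id,
                              "language": "python", "command": cmd}).json()
    # poll /api/1.2/commands/status with commandId=run["id"] until the command
    # finishes before sending the next one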
What I want is to be able to submit my notebooks into the same context and get the same behaviour as with the 1.2 API, but this does not seem possible. Is there a reason for that? Or am I missing something if it can be done?
My use case is that I want to be able to re-run a notebook from the API and have it remember its last state (in the most basic example, just knowing it has already loaded a dataframe), and more generally to have subsequent jobs only run what changed since the last run.
As far as I can tell, failing the ability to do this via the 2.0 API, I have two options:
Convert my notebook to a Python script and have a bootstrap script on the client side that invokes an entry point using the 1.2 API within the same context
Create temp tables at checkpoints in my notebook and possibly maintain a special dataframe of state variables (a rough sketch of this is below)
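Something like this is what I have in mind for the second option (names and paths are made up; a global temp view lives for the lifetime of the cluster, whereas a plain temp view is session-scoped):
checkpoint = "loaded_sales"                        # hypothetical checkpoint name
try:
    df = spark.table("global_temp." + checkpoint)  # reuse state from a previous run
except Exception:                                  # AnalysisException if the view doesn't exist yet
    df = spark.read.parquet("/mnt/raw/sales")      # hypothetical expensive load
    df.createGlobalTempView(checkpoint)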
Both of these seem unnecessarily complex; any other ideas?

Related

How can I send values to a Jenkins script?

I have a Jenkins job, and the first thing that runs in that job is a PowerShell script. I want to capture user input values and set them as global variables that are used throughout the Jenkins job.
Now I want the user to be able to enter these values from their machine and then run the job with those values.
How can I do this?
EDIT: In case anybody else finds this answer, please see the comments below. This should not be used for credentials! Even though the communication can be secured by TLS, the credentials will still be visible in build logs, etc.
You need to check the This project is parameterized checkbox in the settings of your job in Jenkins. Then define the name, type, etc.
The given name is then accessible via standard syntax:
In a shell script: ${nameOfParam} or %nameOfParam% (depending on your shell/OS).
In pipelines they are also accessible via params.nameOfParam.
You can set these variables via the GUI using Build with parameters or via an API call: http://<JENKINS_URL>/job/<JOB_NAME>/buildWithParameters?nameOfParam=foo
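If you want to trigger it from the user's machine, a minimal sketch of that API call in Python might look like this (the URL, job name, parameter and credentials are placeholders; it assumes a Jenkins user authenticating with an API token):
import requests

JENKINS = "https://jenkins.example.com"   # placeholder Jenkins URL
JOB = "my-job"                            # placeholder job name
AUTH = ("build-user", "api-token")        # placeholder user + API token

# POST to buildWithParameters with the parameters as query values
resp = requests.post(f"{JENKINS}/job/{JOB}/buildWithParameters",
                     auth=AUTH,
                     params={"nameOfParam": "foo"})
resp.raise_for_status()   # Jenkins queues the build and answers with a 201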
See also: https://www.baeldung.com/ops/jenkins-parameterized-builds
The only thing I don't quite get from your question is what exactly you want to do with the PowerShell script. A pipeline script in Jenkins is executed on a node, so once the job starts it should run without any user interaction. To set values from user input as global variables in a PowerShell script, you would already need to have them available on the Jenkins node, so it makes little sense to set them in the PowerShell script because they are already available.

Azure DevOps - Manage, Run and Track one-time SQL Scripts

We have a database project that uses a dacpac to deploy schema changes and also allows a pre-deployment and post-deployment script.
However, we frequently have to run one-off scripts, and security would prefer that developers not have write access in prod (we do not have a DBA role at this time). I'm trying to find a solution that works with Azure DevOps to store one-time-run scripts in git, run a script if it has not been run before, and not run it the next time the pipeline runs. We'd like this done through DevOps so that the SP has access to run the queries and not the dev, anything flowing through the pipe has been through our peer review process, and we have a record of what was executed.
I'm looking for suggestions from anyone who has done this or is aware of any product which can do this.
Use Liquibase. Though I would have it as part of my code base, you can also use it from the CLI and run your scripts with that tool.
Liquibase keeps track of which SQL files you have published across deployments, so you can have multiple stages (say DIT, UAT, STAGING, PROD) and it can apply the remaining one-off SQL changes over time.
Generally, unless you really need support, I doubt you'd need the commercial version. The open-source version is more than sufficient for my needs, and I already have a relatively complex system.
The main reason I like Liquibase over other technologies is that it allows for SQL-based changesets, so the learning curve is a lot lower.
Two tips:
Don't rely on the automatic computation of the logicalFilePath; set it explicitly, even if it means repeating yourself. This lets you refactor your scripts later, so instead of lumping everything into a single folder you can group them.
Name your scripts with the date first (e.g. 2021-07-15_add_customer_index.sql); that way you can leverage the natural sort order.
I've faced a similar problem in the past:
Option 1
If you can afford to have an additional table in your database to keep track of what was executed or not, your problem can be easily solved; there is a tool that helps with this: https://github.com/DbUp/DbUp
Then you would have a new repository, let's call it OneOffSqlScriptsRepository, and your pipeline would consume this repository:
resources:
  repositories:
    - repository: OneOffSqlScriptsRepository
      endpoint: OneOffSqlScriptsEndpoint
      type: git
Thus you'd create a pipeline to run this DbUp application, consuming the scripts from the OneOffSqlScripts repository; DbUp would take care of executing each script only once (it's configurable).
The username/password for the database can be stored safely in the Library combined with Azure Key Vault, so only people with the right access rights can read them (apart from the pipeline).
Option 2
This option assumes that you want to do everything using only the native resources that Azure Pipelines provides.
Create a OneOffSqlScripts repository as in option 1
Create a ScriptsRunner repository
In the ScriptsRunner repository, you'd create a folder containing a .json file with the name of each script and the number of times it has been run (or a boolean flag such as hasRun).
e.g.:
[{
  "id": 1,
  "scriptName": "myscript1.sql",
  "runs": 0
}]
Then write a Python script that reads the JSON file and writes it back with the run counts updated; you'd then need to update your repository after each pipeline run. This means your pipeline performs a git commit/push after each run in which there were new scripts to execute.
The algorithm is roughly as follows; the implementation can be tuned (a minimal sketch is below).
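A minimal sketch of that runner script, assuming the file locations shown here and leaving the actual SQL execution as a placeholder:
import json
import subprocess   # only needed if you uncomment the placeholder below
from pathlib import Path

TRACKER = Path("scripts/one_off_scripts.json")   # assumed location of the .json file
SCRIPTS_DIR = Path("scripts")                    # assumed location of the .sql files

entries = json.loads(TRACKER.read_text())
for entry in entries:
    if entry["runs"] == 0:                       # or: if not entry["hasRun"]
        sql_file = SCRIPTS_DIR / entry["scriptName"]
        # placeholder: run the script with your DB client of choice, e.g.
        # subprocess.run(["sqlcmd", "-i", str(sql_file)], check=True)
        entry["runs"] += 1

TRACKER.write_text(json.dumps(entries, indent=2))
# the pipeline then commits/pushes the updated file so the next run sees it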

Can I re-run a Power Automate flow instance from history?

Is there any way to find and re-run an earlier instance of a Power Automate workflow programmatically?
I can do this manually: download the .csv file containing the instances, search the Trigger output column for the one I want, get the id, copy-paste the run URL, and click resubmit.
I tried with Power Automate itself:
The built-in Flow Management connector only supports finding a specific flow by name, and does not even get to the run history.
PowerShell:
Having installed the PowerApps module, I can list the instances with:
Get-FlowRun -FlowName {flow name}
But I don't see the same properties as in the exported .csv file, and there's also no Run-Flow command that would let me run it.
So, I am a little stuck here; could someone please help me out?
We cannot yet programmatically resubmit a Flow run from the history with PowerShell or any other API method.
But we can avoid some manual work by using the workflow() function in a Flow Compose step to automate composing the Flow run history URL, for example:
https://xxx.flow.microsoft.com/manage/environments/07aa1562-fea6-4583-8d76-9a8e67cbf298/flows/141e89fb-af2d-47ac-be25-f9176e64e9a0/runs/08586722084717816659969428791CU12?backUrl=%2Fflows%2F141e89fb-af2d-47ac-be25-f9176e64e9a0%2Fdetails&runStatus=Failed
There are three IDs that I need to find so that I can build up the flow history URL.
The first guid is my environmentName (07aa1562-fea6-4583-8d76-9a8e67cbf298), then I’ve got the flow name ( 141e89fb-af2d-47ac-be25-f9176e64e9a0) and finally the run (08586722084717816659969428791CU12).
There is a command in the Microsoft 365 CLI to resubmit a flow run:
m365 flow run resubmit --environment flowEnvironmentID --flow flowGUID --name flowRunID --confirm
You can also resubmit a flow run using the Power Automate REST API:
https://api.flow.microsoft.com/providers/Microsoft.ProcessSimple/environments/{FlowEnvironment}/flows/{FlowGUID}/triggers/manual/histories/{FlowRunID}/resubmit?api-version=2016-11-01
For the Power Automate REST API, you will have to pass an authorization token.
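A rough sketch of that REST call in Python, using the IDs from the example URL above (how you acquire the bearer token, e.g. via MSAL or az cli, is out of scope here and is just a placeholder):
import requests

TOKEN = "<bearer-token>"                           # placeholder access token
ENV = "07aa1562-fea6-4583-8d76-9a8e67cbf298"       # environment ID from the example
FLOW = "141e89fb-af2d-47ac-be25-f9176e64e9a0"      # flow ID from the example
RUN = "08586722084717816659969428791CU12"          # run ID from the example

url = ("https://api.flow.microsoft.com/providers/Microsoft.ProcessSimple"
       f"/environments/{ENV}/flows/{FLOW}/triggers/manual/histories/{RUN}"
       "/resubmit?api-version=2016-11-01")
resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()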
For more information, go through the following post
https://ashiqf.com/2021/05/09/resubmit-your-failed-power-automate-flow-runs-automatically-using-m365-cli-and-rest-api/

Creating custom node operation to execute via JBoss CLI

We'd like to perform occasional configuration refreshes on some of our servers, but we'd prefer that the operation not be exposed via a web service; rather, the infrastructure guys could call a command via the JBoss CLI to do the refresh.
Is it possible to create new commands / node operations that are exposed via the CLI? Googling seems to turn up only various methods of accessing the existing operations, and this page, which is a bit light on details of how it's done.

Talend TAC export of list of tasks

I would like to take an export of the list of jobs that are created as tasks under Job Conductor in TAC, along with their configurations. Is this possible? If yes, please advise. I am using the Enterprise edition of Talend v5.6.
There is an API to query the TAC, called through MetaServletCaller. Using this API, you can send a command to get the list of tasks deployed in Job Conductor.
The API can be called from a URL in a browser, or by calling the MetaServletCaller.sh (or .bat) script from your Talend installation.
The command to get this list is listTasks.
Here is a tutorial on how to do that: http://edwardost.github.io/talend/di/2015/05/28/Using-the-TAC-API/
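As a rough sketch of the direct URL form (based on that tutorial, so the servlet path and parameter names may vary with your TAC version; the host and credentials are placeholders), the command is sent as base64-encoded JSON:
import base64
import json
import requests

TAC = "http://tac-host:8080/org.talend.administrator"   # placeholder TAC URL
cmd = {"actionName": "listTasks",
       "authUser": "admin@company.com",                  # placeholder credentials
       "authPass": "admin"}
payload = base64.b64encode(json.dumps(cmd).encode()).decode()
resp = requests.get(f"{TAC}/metaServlet?{payload}")
print(resp.json())   # the response lists the tasks deployed in Job Conductor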
All TAC configuration is linked to DB tables. Just connect to this database (you can get its name through TAC, in the Configuration > Database menu) and have a look at the "executiontask" table; it contains all jobs deployed in Job Conductor, with context/job version, etc.