How to send a list as parameter in databricks notebook task? - rest

I am using the Databricks REST API to create a job with a notebook_task in an existing cluster and getting the job_id in return.
Then I am calling the run-now api to trigger the job.
In this step, I want to send a list as an argument via notebook_params, which throws an error saying "Expected non-array for field value".
Is there any way I can send a list as an argument to the job?
I have tried sending the list argument in base_params as well, with the same error.
user_json = {
    "name": job_name,
    "existing_cluster_id": cluster_id,
    "notebook_task": {
        "notebook_path": notebook_path
    },
    "email_notifications": {
        "on_failure": [email_id]
    },
    "max_retries": 0,
    "timeout_seconds": 3600
}
response = requests.post('https://<databricks_uri>/2.0/jobs/create', headers=head, json=user_json, timeout=5, verify=False)
job_id = response.json()['job_id']

json_job = {"job_id": job_id, "notebook_params": {"name": "john doe", "my_list": my_list}}
response = requests.post('https://<databricks_uri>/2.0/jobs/run-now', headers=head, json=json_job, timeout=200, verify=False)

I have not found any native solution yet, but my workaround was to pass the list as a string and parse it back out on the other side:
json_job={"job_id":job_id,
"notebook_params":{
"name":"john doe",
"my_list":"spam,eggs"
}
}
Then in databricks:
my_list = dbutils.widgets.get("my_list")
my_list = my_list.split(",")
Take appropriate care around special characters in the values, or around converting them back to numeric types.
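If the list values may themselves contain commas (or need to come back as numbers), a variant of the same workaround is to serialize the list with Python's json module instead of comma-joining. A minimal sketch, reusing head, job_id and my_list from the question:
import json
import requests

# notebook_params values must be plain strings, so serialize the list first
json_job = {
    "job_id": job_id,
    "notebook_params": {
        "name": "john doe",
        "my_list": json.dumps(my_list)  # e.g. '["spam", "eggs", "with, comma"]'
    }
}
response = requests.post('https://<databricks_uri>/2.0/jobs/run-now', headers=head, json=json_job, timeout=200, verify=False)
On the notebook side, json.loads on the widget value then recovers the original list, commas and numeric types included.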
If the objects in the list are more substantial, then sending them as a file to dbfs using the CLI or API before running the job may be another option to explore.

I may be a bit late, but I found a better solution.
Step 1:
Use JSON.stringify() in the console of any browser to convert your value (object, array, JSON, etc.) into a string.
Ex: JSON.stringify(["spam", "eggs"]) yields the string '["spam","eggs"]'.
Now use this string value in the notebook_params of the request body.
In the Databricks notebook, convert the string back to JSON using the Python json module.
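For example, the notebook side of this approach is a one-liner (a sketch, assuming the stringified value was passed as a notebook parameter named my_list):
import json

# the widget arrives as a plain string such as '["spam", "eggs"]'
my_list = json.loads(dbutils.widgets.get("my_list"))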
Hope this helps

Related

azure data factory - convert single value output from query into json

In azure data factory, I am getting a single record back from a database.
I need to take one column from this and pass it to a web call body.
The body takes data in this format:
["cdd-lm-54"]
I have tried multiple expressions but none of them work. I would appreciate any advice on how to perform this.
The data returned from the database looks like this:
"value": [
{
"RowNumber": 1,
"Tag": "cdd-lm-54",
"Val1": "val 1",
"Val2": "val b",
"LastSyncDateTime": "2022-07-26T13:14:28Z",
"LastTimeModified": "2021-07-28T10:33:47.7Z"
}
]
The below expressions are the closest I have gotten; they output the data as I expect it to be, but the web call still rejects it:
@concat('[','"',pipeline().parameters.DeviceRecord[0]['Tag'], '"',']')
@concat('[','''',pipeline().parameters.DeviceRecord[0]['Tag'], '''',']')
The odd thing is, if I paste the exact value from ADF into Postman, it works.
Is ADF doing something odd to the body?
Odd thing is if I paste the exact value from ADF into Postman, it works.
The reason behind this is that when you paste the value into Postman as ["cdd-lm-54"], it is treated as an actual array containing the required string.
Using @concat() to build ["cdd-lm-54"] returns just a string, which is not the data format that the body accepts.
Instead, use the following dynamic content:
@array(pipeline().parameters.DeviceRecord[0]['Tag'])
The above returns an array containing the required value.
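To see the difference outside of ADF, here is a small Python sketch contrasting the two payloads (the variable names are illustrative):
import json

tag = "cdd-lm-54"

as_string = '["' + tag + '"]'  # what the concat() expression builds: one plain string
as_array = [tag]               # what array() builds: a real array

print(json.dumps(as_string))   # "[\"cdd-lm-54\"]"  -> the body is a quoted string, rejected
print(json.dumps(as_array))    # ["cdd-lm-54"]      -> the body is an array, accepted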

What is the appropriate way to build JSON within a Data Factory Pipeline

In my earlier post, SQL Server complains about invalid json, I was advised to use an 'appropriate method' for building a JSON string, which is to be inserted into a SQL Server table for logging purposes. In that earlier post, I was using string concatenation to build the JSON string.
What are the appropriate tools/functions to build JSON within a Data Factory pipeline? I've looked into the json() and string() functions, but they would still rely on concatenation.
Clarification: I'm trying to generate a logging message that looks like the example below. Right now I'm using string concatenation to generate the logging JSON. Is there a better, more elegant (but lightweight) way to generate the JSON data?
{ "EventType": "DataFactoryPipelineRunActivity",
"DataFactoryName":"fa603ea7-f1bd-48c0-a690-73b92d12176c",
"DataFactoryPipelineName":"Import Blob Storage Account Key CSV file into generic SQL table using Data Flow Activity Logging to Target SQL Server",
"DataFactoryPipelineActivityName":"Copy Generic CSV Source to Generic SQL Sink",
"DataFactoryPipelineActivityOutput":"{runStatus:{computeAcquisitionDuration:316446,dsl: source() ~> ReadFromCSVInBlobStorage ReadFromCSVInBlobStorage derive() ~> EnrichWithDataFactoryMetadata EnrichWithDataFactoryMetadata sink() ~> WriteToTargetSqlTable,profile:{ReadFromCSVInBlobStorage:{computed:[],lineage:{},dropped:0,drifted:1,newer:1,total:1,updated:0},EnrichWithDataFactoryMetadata:{computed:[],lineage:{},dropped:0,drifted:1,newer:6,total:7,updated:0},WriteToTargetSqlTable:{computed:[],lineage:{__DataFactoryPipelineName:{mapped:false,from:[{source:EnrichWithDataFactoryMetadata,columns:[__DataFactoryPipelineName]}]},__DataFactoryPipelineRunId:{mapped:false,from:[{source:EnrichWithDataFactoryMetadata,columns:[__DataFactoryPipelineRunId]}]},id:{mapped:true,from:[{source:ReadFromCSVInBlobStorage,columns:[id]}]},__InsertDateTimeUTC:{mapped:false,from:[{source:EnrichWithDataFactoryMetadata,columns:[__InsertDateTimeUTC]}]},__DataFactoryName:{mapped:false,from:[{source:EnrichWithDataFactoryMetadata,columns:[__DataFactoryName]}]},__FileName:{mapped:false,from:[{source:EnrichWithDataFactoryMetadata,columns:[__FileName]}]},__StorageAccountName:{mapped:false,from:[{source:EnrichWithDataFactoryMetadata,columns:[__StorageAccountName]}]}},dropped:0,drifted:1,newer:0,total:7,updated:7}},metrics:{WriteToTargetSqlTable:{rowsWritten:4,sinkProcessingTime:1436,sources:{ReadFromCSVInBlobStorage:{rowsRead:4}},stages:[{stage:3,partitionTimes:[621],bytesWritten:0,bytesRead:24,streams:{WriteToTargetSqlTable:{type:sink,count:4,partitionCounts:[4],cached:false},EnrichWithDataFactoryMetadata:{type:derive,count:4,partitionCounts:[4],cached:false},ReadFromCSVInBlobStorage:{type:source,count:4,partitionCounts:[4],cached:false}},target:WriteToTargetSqlTable,time:811}]}}},effectiveIntegrationRuntime:DefaultIntegrationRuntime (East US)}",
"DataFactoryPipelineRunID":"63759585-4acb-48af-8536-ae953efdbbb0",
"DataFactoryPipelineTriggerName":"Manual",
"DataFactoryPipelineTriggerType":"Manual",
"DataFactoryPipelineTriggerTime":"2019-11-05T15:27:44.1568581Z",
"Parameters":{
"StorageAccountName":"fa603ea7",
"FileName":"0030_SourceData1.csv",
"TargetSQLServerName":"5a128a64-659d-4481-9440-4f377e30358c.database.windows.net",
"TargetSQLDatabaseName":"TargetDatabase",
"TargetSQLUsername":"demoadmin"
},
"InterimValues":{
"SchemaName":"utils",
"TableName":"vw_0030_SourceData1.csv-2019-11-05T15:27:57.643"
}
}
You can use Data Flow; it helps you build the JSON string within a pipeline in Data Factory.
Here's the Data Flow tutorial: Mapping data flow JSON handling.
It can help you with:
Creating JSON structures in Derived Column
Source format options
Hope this helps.

How can I pass context params using talend api?

I'm trying to automate Talend job executions using the Talend API, but I'm getting an error when I try to pass the context params through the API.
The json I'm encoding to 64 is the following:
JSON='{ "actionName":"runTask", "authPass": "TalendPass", "authUser": "name#example.com", "jvmParams": [ "-Xmx256m" , "-Xms64m" ], "contextParams": ["host_mysql_db01": "failed", "database_analytics": "testing.it"],"mode": "synchronous", "taskId": 43}'
Error message:
{"error":"Expected a ',' or ']' at character 172","returnCode":2}
I found another Stack Overflow question, Add context parameters to Talend job in Tac via API without actually running it, but it doesn't say how the parameters were passed, and I cannot reply with a comment to ask how it was done.
The real Talend API call is:
wget -O file http://localhost:8080/org.talend.administrator/metaServlet?$JSON_ENCODED
Can I get some help?
Actually, the JSON you are passing to the metaServlet is not valid JSON. You can check it with an online validator like http://jsonlint.com.
You are specifying the contextParams attribute as an array, but that syntax is not valid in JSON: an array can contain either a list of values (like jvmParams) or objects (which can themselves contain arrays), not bare key/value pairs.
Moreover, according to the Talend reference, the attribute should be called "context" and must be an object instead of an array, like so:
"context":{"varname1": "varvalue", "varname2": "varvalue2"}

Creating Azure Stream Analytics Job through Powershell

I would like to create a Stream Analytics job using only PowerShell. I know that the command to do this is New-AzureRMStreamAnalyticsInput; however, it requires a JSON file with the job details. I found documentation provided by Microsoft with a small template of such a JSON file (see the Create paragraph), but it's not enough for me.
I want to create an input from blob storage, hence my JSON looks like this:
{
    "properties": {
        "type": "stream",
        "datasource": {
            "type": "Microsoft.Storage/Blob",
            "properties": {
                "accountName": "abc",
                "accountKey": "###",
                "container": "appinsights",
                "pathPattern": "test-blob_2324jklj/PageViews/{date}/{time}",
                "dateFormat": "YYYY-MM-DD",
                "timeFormat": "HH"
            }
        }
    }
}
After saving it and passing it as an argument to New-AzureRMStreamAnalyticsInput, I receive the following error: New-AzureRMStreamAnalyticsInput : Stream analytics input name cannot be null. I think my JSON file is not correct.
Do you have any templates of JSON files containing Stream Analytics job details, or can you tell me how to correctly set up a job through PowerShell?
A simple way of getting your template right is to manually create an input from the Portal and then run the PowerShell command Get-AzureRmStreamAnalyticsInput to get the JSON payload.
From your example, it seems you are missing the input name. Try something like the below:
{
    "Name": "BlobInput1",
    "Properties": {
        ... ...
    }
}
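Merging that with the properties from the question, the full JSON file would look roughly like this (the input name BlobInput1 is just a placeholder):
{
    "Name": "BlobInput1",
    "Properties": {
        "type": "stream",
        "datasource": {
            "type": "Microsoft.Storage/Blob",
            "properties": {
                "accountName": "abc",
                "accountKey": "###",
                "container": "appinsights",
                "pathPattern": "test-blob_2324jklj/PageViews/{date}/{time}",
                "dateFormat": "YYYY-MM-DD",
                "timeFormat": "HH"
            }
        }
    }
}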

How to get buildbot build properties in the email subject while using MailNotifier?

I am trying to send a custom email status notification on our Buildbot system. I could not find a way to get build properties into the email subject while using MailNotifier.
I found the build object in the messageFormatter callback function parameters, but it can be used only in the body and not in the subject.
I also tried using the JSON API by calling it from my master.cfg itself, but it does not work and the Buildbot server goes into some kind of infinite loop. The JSON API, if called separately, works fine for querying build-specific data.
I am using buildbot 0.8.12 and I am new to this framework. Thanks for your help.
Per MailNotifier's docstring:
@param messageFormatter: function taking (mode, name, build, result,
    master_status) and returning a dictionary containing two required
    keys "body" and "type", with a third optional key, "subject". The
    "body" key gives a string that contains the complete text of the
    message. The "type" key is the message type ('plain' or 'html').
    The 'html' type should be used when generating an HTML message.
    The optional "subject" key gives the subject for the email.
So you can just add one more item to the result dictionary and you get what you want. E.g.
...
return {..., 'subject': 'Abracadabra %s' % build.getProperty('my-favourite-build-property')}
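For example, a minimal messageFormatter sketch along those lines (the property name, addresses, and subject text are placeholders):
from buildbot.status.mail import MailNotifier

def message_formatter(mode, name, build, result, master_status):
    # build the body however you like; here just a one-line summary
    body = "Build %s finished with result %s" % (name, result)
    return {
        'body': body,
        'type': 'plain',
        # the optional 'subject' key overrides the default email subject
        'subject': 'Abracadabra %s' % build.getProperty('my-favourite-build-property'),
    }

mn = MailNotifier(fromaddr="buildbot@example.com",
                  sendToInterestedUsers=False,
                  extraRecipients=["devs@example.com"],
                  messageFormatter=message_formatter)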