U-SQL execution and referring to another script - azure-data-factory

Could you help with the following:
How can we execute a U-SQL script stored in ADL Store using ADF? What is the standard practice for storing scripts?
Currently I don't see a way to reference one script from another. That would make script execution simpler, because I could build a deep chain where ScriptA refers to ScriptB and so on, and submitting only the top-level script would be sufficient since it would automatically invoke the dependent scripts.
Please also point me to documentation with recommendations on better partitioning/indexing schemes and performance-improvement tips and tricks.

This was just asked and answered here: Execute U-SQL script in ADL storage from Data Factory in Azure
U-SQL offers a metadata service with procedures and functions, so instead of chaining files you can register your reusable script components as procedures and functions.
Take a look at the performance tuning slides on http://www.slideshare.net/MichaelRys. If you have access to SQLPASS or TechReady presentation recordings, there are videos of that presentation available as well.
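To make the procedure approach concrete, here is a rough PowerShell sketch that submits a one-line U-SQL script calling a registered procedure, using the AzureRM Data Lake Analytics cmdlets; the account, database, and procedure names are placeholders, not taken from the question (ADF's U-SQL activity runs a script file from ADLS in the same spirit, with the reusable logic living in the registered procedure):

    # Sketch: invoke reusable U-SQL through a registered procedure instead of chaining script files.
    # The ADLA account, database and procedure names below are placeholders.
    $usql = 'MyDb.dbo.ProcessLogs();'   # the reusable logic lives in the registered procedure

    $job = Submit-AzureRmDataLakeAnalyticsJob -Account "myadla" -Name "CallProcessLogs" `
                                              -Script $usql -DegreeOfParallelism 1

    # Optionally block until the job completes.
    Wait-AzureRmDataLakeAnalyticsJob -Account "myadla" -JobId $job.JobId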

Related

How to execute COSMOS DB stored procedure with parameters via powershell

Looking for a PowerShell script or REST API call to execute a Cosmos DB stored procedure with a partition key value.
You can use the REST API to execute Stored Procedures.
https://{databaseaccount}.documents.azure.com/dbs/{db-id}/colls/{coll-id}/sprocs/{sproc-name}
There is no native means to interact with Cosmos DB's data plane via PowerShell. There are three options you can explore. One of them is calling REST directly from PowerShell, as indicated in the other answer. Your other options...
You can use the PowerShell REST API sample from the .NET SDK GitHub repo. However, this requires authenticating via the REST API mentioned in the other answer, which can be a bit cumbersome.
You can create your own custom PowerShell cmdlet in C#/.NET and then call that from your PS script. This may take longer than the example above but is easier to write and maintain. It also lets you do whatever you were looking to do in a stored procedure and simply implement it in C# using the .NET SDK, which can also yield benefits in maintainability.
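As a rough sketch of the first option (calling REST directly), the following PowerShell builds the master-key authorization header and posts to the stored-procedure endpoint; the account, key, database, collection, procedure names and the partition-key value are placeholders, and the x-ms-version value is just one commonly used API version:

    # Sketch: execute a Cosmos DB stored procedure over REST with a partition key.
    # All names and keys below are placeholders.
    $account   = "<databaseaccount>"
    $masterKey = "<primary-master-key>"
    $db        = "<db-id>"
    $coll      = "<coll-id>"
    $sproc     = "<sproc-name>"

    $resourceLink = "dbs/$db/colls/$coll/sprocs/$sproc"
    $date = [DateTime]::UtcNow.ToString("r").ToLowerInvariant()

    # Master-key signature: lowercase verb, resource type, resource link, date.
    $stringToSign = "post`nsprocs`n$resourceLink`n$date`n`n"
    $hmac = New-Object System.Security.Cryptography.HMACSHA256
    $hmac.Key = [Convert]::FromBase64String($masterKey)
    $sig = [Convert]::ToBase64String($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($stringToSign)))
    $auth = [Uri]::EscapeDataString("type=master&ver=1.0&sig=$sig")

    $headers = @{
        "Authorization"                = $auth
        "x-ms-date"                    = $date
        "x-ms-version"                 = "2018-12-31"
        "x-ms-documentdb-partitionkey" = '["<partition-key-value>"]'
    }

    # The body is a JSON array of the stored procedure's input parameters.
    Invoke-RestMethod -Method Post `
        -Uri "https://$account.documents.azure.com/$resourceLink" `
        -Headers $headers -Body '["<sproc-arg>"]' -ContentType "application/json"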

For Azure Data Factories, is there a way to 'Validate all' using PowerShell rather than the GUI?

A working Azure Data Factory (ADF) exists that contains pipelines with activities that are dependent on database tables
The definition of a database table changes
The next time the pipeline runs it fails
Of course we can set up something so it fails gracefully but ...
I need to proactively execute a scheduled PowerShell script that iterates through all ADFs (iterating is easy) to do the equivalent of the 'Validate all' functionality that the GUI provides (is validating even possible?).
I do realise that the Utopian CI/CD DevOps environment I dream about will one day in the next year or so achieve this in other ways.
I need the automated validation method today - not in a year!
I've looked at what I think are all of the PowerShell cmdlets available, and short of somehow deleting and redeploying each ADF (fraught with danger) I can't find a simple method to validate an Azure Data Factory via PowerShell.
Thanks in advance
In the ".Net" SDK, each of the models has a "Validate()" method. I have not yet found anything similar in the Powershell commands.
In my experience, the (GUI) validation is not foolproof. Some things are only tested at runtime.
I know it has been a while, and you said you couldn't wait a year for this - but after a couple of years we finally have both the 'Validate all' and 'Export ARM template' features from the Data Factory user experience exposed via a publicly available npm package, @microsoft/azure-data-factory-utilities. The full guidance can be found in this documentation.
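For reference, the flow with that package looks roughly like the following from PowerShell; folder paths, the factory resource ID and the package.json wiring are placeholders taken from the package's documented usage, so check the linked documentation for the exact setup:

    # Sketch: run the "validate all" check from a build folder whose package.json depends on
    # @microsoft/azure-data-factory-utilities and maps a "build" script to the package's entry point.
    # Paths and IDs below are placeholders.
    Set-Location "C:\repos\my-adf-repo\build"
    npm install

    # Validate every resource in the repository root against the target factory.
    npm run build validate "C:\repos\my-adf-repo" `
        "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.DataFactory/factories/<factoryName>"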

Use PowerShell to write Data direct to OMS

I'd like to write my custom data direct to our Microsoft Azure OMS using powershell. Does anyone know how this is possible?
There's a lot of information about configuring instances, adding new data sources, or querying the data using AzureRM.OperationalInsights:
https://blogs.technet.microsoft.com/privatecloud/2016/04/05/using-the-oms-search-api-with-native-powershell-cmdlets/
https://learn.microsoft.com/de-ch/powershell/module/azurerm.operationalinsights/?view=azurermps-4.0.0
Has anyone found a way to write data directly to OMS instead of storing it in a log file and then using the custom-log import? I know this is possible (I saw it in a keynote), but I cannot find any information about it.
Thank you for your input!
This should do the trick - writing to Log Analytics:
https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-data-collector-api#sample-requests
Really not much to add; it's a copy/paste process. Entries will show up under the type you define in $LogType with '_CL' appended, because you are essentially creating a custom log entry.
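For convenience, here is a condensed PowerShell sketch of that sample against the HTTP Data Collector API; the workspace ID, shared key and record fields are placeholders:

    # Sketch: post a custom record to Log Analytics (OMS) via the HTTP Data Collector API.
    $workspaceId = "<workspace-id>"
    $sharedKey   = "<primary-key>"
    $logType     = "MyCustomData"        # appears in Log Analytics as MyCustomData_CL

    $body = @{ Computer = $env:COMPUTERNAME; Status = "OK"; Value = 42 } | ConvertTo-Json

    # Build the SharedKey signature over the request.
    $date   = [DateTime]::UtcNow.ToString("r")
    $length = [Text.Encoding]::UTF8.GetBytes($body).Length
    $stringToSign = "POST`n$length`napplication/json`nx-ms-date:$date`n/api/logs"

    $hmac = New-Object System.Security.Cryptography.HMACSHA256
    $hmac.Key = [Convert]::FromBase64String($sharedKey)
    $hash = [Convert]::ToBase64String($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($stringToSign)))

    $headers = @{
        "Authorization" = "SharedKey ${workspaceId}:$hash"
        "Log-Type"      = $logType
        "x-ms-date"     = $date
    }

    Invoke-RestMethod -Method Post `
        -Uri "https://$workspaceId.ods.opinsights.azure.com/api/logs?api-version=2016-04-01" `
        -Headers $headers -Body $body -ContentType "application/json"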

Celery Webhook Script in PhantomJS

I am trying to use Celery to create a task management system where the tasks are written in PhantomJS. Basically, each task crawls a URL and returns a JSON object (which I'll want to use once the task is complete). There will be a list of URLs, each with its own crawl task.
I looked at http://docs.celeryproject.org/en/latest/userguide/remote-tasks.html#calling-webhook-tasks to implement tasks in another language. However, Celery provides very little documentation on how the external webhook script should be written, what dependencies are required, and how it is run.
Should the task simply be stored on a server, which I will call with the HTTPCallback? How is it run in the PhantomJS framework? How are results stored or returned from there, and how do I keep track of them? I have read a lot of docs and I still cannot find the exact interface or API for the other side of the webhook API.
I realize my question may not be too clear, so let me know; otherwise, some good examples of this HTTPCallback API workflow would be helpful for a basic understanding, along with how the webhook script in a different language is structured.
Any insight would be greatly appreciated. Thanks a lot!

automating downloads of facebook insight data

I'm looking for a tool or process for exporting Facebook Insights data for a Facebook page and a Facebook app. Currently I am just manually downloading CSV files from the Insights interface, but ideally I want to automate this process and load the data into Pentaho Kettle so I can perform some operations on it.
Is there some way to automate the downloading and input of the CSV files? Or will I have to use the Facebook Graph API Explorer? I am currently looking at a set-up where I use NetBeans and RestFB to pull the data I want, and then access that data using Pentaho Kettle. I am not sure if this will work, or if it is the best approach.
As Codek says, a Kettle plugin is a very good idea, and would be very helpful to the Kettle project. However, it's also a serious effort.
If you don't want to put in that kind of effort, you can certainly download files with a Kettle Job as long as the files are available through a standard transfer method (FTP, SFTP, SSH, etc). I've never used RestFB, so I don't know what's available. You might be able to get directly from a web service with the REST Client transform step.
After downloading the files, you can send them to a transform to be loaded. You can do this with either the Execute for every input row? option on the Transformation job step, or you can get the filenames from the job's result set in the transform with Get files from result.
Then you can archive the files after loading with Copy or Move result filenames. In one job, I find only the files that are not yet in my archive using a Get File Names step and a Merge Join, and then a Set files in result step in a transform, so that can be done too if needed.
To automate it, you can run your job from a scheduler using Kitchen.bat/Kitchen.sh. Since I use PostgreSQL a lot, I use PGAgent as my scheduler, but the Windows scheduler or cron work too.
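As a small illustration of the scheduling step on Windows (the install path, job file, task name and time are placeholders; on Linux a cron line calling kitchen.sh does the same job):

    # Sketch: register a nightly Windows scheduled task that runs the Kettle job with Kitchen.
    $action  = New-ScheduledTaskAction -Execute "C:\pentaho\data-integration\Kitchen.bat" `
                                       -Argument '/file:"C:\jobs\facebook_insights.kjb" /level:Basic'
    $trigger = New-ScheduledTaskTrigger -Daily -At 3am
    Register-ScheduledTask -TaskName "FacebookInsightsDownload" -Action $action -Trigger $trigger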
Hope that helps.