We can create a stored procedure (SP) by using the collection's self link, as in the statement below:
$procedure = $client.CreateStoredProcedureAsync($coll_list.SelfLink,$proc)
Can we create a stored procedure using a URI instead, as in the statement below?
$procedure = $docClient.CreateStoredProcedureAsync($dbUri,$proc).Result
Also, how do I take care of the default indexing policy that gets created when using PowerShell? Right now it creates a custom indexing policy, but I want to create a default indexing policy.
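Something like the following is what I am after (a sketch, assuming the .NET SDK's UriFactory helper is available; $dbName and $collName are placeholders for my database and collection names):
# Build the collection URI instead of looking up the collection's SelfLink
$collUri = [Microsoft.Azure.Documents.Client.UriFactory]::CreateDocumentCollectionUri($dbName, $collName)
$procedure = $docClient.CreateStoredProcedureAsync($collUri, $proc).Result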
I need a data factory that will:
check an Azure blob container for CSV files
for each CSV file
insert a row into an Azure SQL table, giving the filename as a column value
There's just a single CSV file in the blob container, and this file contains five rows.
So far I have the following actions:
Within the ForEach activity I have a Copy activity. I gave it a source of a dynamic dataset with the filename set as a parameter from @item().name. However, as a result five rows were inserted into the target table, whereas I was expecting just one.
The ForEach loop executes just once, but I don't know how to use a data source that is a set of variables holding the filename and timestamp.
You are headed in the right direction, but within the ForEach you just need a Stored Procedure activity that will insert the FileName (and whatever other metadata you have available) into the Azure SQL table.
Like this:
Here is an example of the stored procedure in the DB:
CREATE PROCEDURE Log.PopulateFileLog (@FileName varchar(100))
AS
INSERT INTO Log.CvsRxFileLog
SELECT
    @FileName AS FileName,
    GETDATE() AS ETL_Timestamp
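In the Stored Procedure activity inside the ForEach, the proc's parameter is then mapped to the current file with a dynamic expression, e.g. (assuming the ForEach iterates over the Get Metadata child items):
FileName: @item().name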
EDIT:
You could also execute the insert directly with a Lookup activity within the ForEach, like so:
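The Lookup's query would be a dynamic expression along these lines (a sketch reusing the Log.CvsRxFileLog table from above; the trailing SELECT is there only because a Lookup must return a result set):
@concat('INSERT INTO Log.CvsRxFileLog SELECT ''', item().name, ''' AS FileName, GETDATE() AS ETL_Timestamp; SELECT 1 AS Done;')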
EDIT 2:
This will show how to do it without a ForEach.
NOTE: This is the most cost-effective method, especially when dealing with hundreds or thousands of files on a recurring basis.
1st, copy the output JSON array from your Lookup/Get Metadata activity using a Copy Data activity with a source of Azure SQL DB and a sink of a Blob Storage CSV file
-------SOURCE:
-------SINK:
2nd, create another Copy Data activity with a source of the Blob Storage JSON file, and a sink of Azure SQL DB
---------SOURCE:
---------SINK:
---------MAPPING:
In essence, you save the entire JSON output to a file in Blob storage, then copy that file to Azure SQL DB using a JSON file type. This way you only have three activities to run, even if you are trying to insert from a dataset that has 500 items in it.
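For the first Copy Data activity, the dynamic source query might look something like this (a sketch; 'Get Metadata1' is a placeholder activity name, and it assumes the serialized output contains no single quotes, which would otherwise need escaping with replace()):
@concat('SELECT ''', string(activity('Get Metadata1').output), ''' AS jsonPayload')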
Of course there is always more than one way to do things, but I don't think you need a For Each activity for this task. Activities like Lookup, Get Metadata and Filter output their results as JSON which can be passed around. This JSON can contain one or many items and can be passed to a Stored Procedure. An example pattern:
This is the sort of ELT pattern that was common with early ADF gen 2 (prior to Mapping Data Flows) and that makes use of resources already in use in your architecture. Remember that you are charged per activity execution in ADF (e.g. multiple iterations in an unnecessary ForEach loop), and that in general compute in Azure is expensive while storage is cheap, so think about this when implementing patterns in ADF. If you build the pattern above you have two types of compute: the compute behind your Azure SQL DB and the Azure Integration Runtime. If you add a Data Flow to that, you will have a third type of compute operating concurrently with the other two, so personally I only add these under certain conditions.
An example implementation of the above pattern:
Note the expression I am passing into my example logging proc:
@string(activity('Filter1').output.Value)
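For reference, a minimal sketch of what such a logging proc could look like, assuming the dbo.myLog table referenced below (the proc name is a placeholder):
CREATE PROCEDURE dbo.LogPipelineOutput (@logRecord nvarchar(max))
AS
INSERT INTO dbo.myLog ( logRecord )
SELECT @logRecord;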
Data Flows are perfectly fine if you want a low-code approach and do not have compute resources already available to do this processing. In your case you already have an Azure SQL DB, which is quite capable of JSON processing, e.g. via the OPENJSON, JSON_VALUE and JSON_QUERY functions.
You mention not wanting to deploy additional code, which I understand, but then where did your original SQL table come from? If you are absolutely against deploying additional code, you could simply call the sp_executesql stored proc via the Stored Procedure activity and use a dynamic SQL statement which inserts your record, something like this:
@concat( 'INSERT INTO dbo.myLog ( logRecord ) SELECT ''', string(activity('Filter1').output), ''' ')
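Under the covers this just hands sp_executesql a dynamic statement; the T-SQL equivalent would be roughly the following (the JSON literal is abbreviated here):
EXEC sp_executesql N'INSERT INTO dbo.myLog ( logRecord ) SELECT ''{ ...filter output JSON... }''';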
Shred the JSON either in your stored proc or later, e.g.:
SELECT y.[key] AS name, y.[value] AS [fileName]
FROM dbo.myLog
CROSS APPLY OPENJSON( logRecord ) x
CROSS APPLY OPENJSON( x.[value] ) y
WHERE logId = 16
AND y.[key] = 'name';
I have an Azure pipeline that moves data from one point to another in Parquet files. I need to join in some data from a PostgreSQL database that is in an AWS tenancy, by a unique ID. I am using a data flow to create the unique ID I need from two separate columns using a concatenation. I am trying to create a WHERE clause, e.g.
select * from tablename where unique_id in ('id1','id2','id3'...)
I can do a lookup query against the database, but I can't figure out how to build the list of IDs into a parameter that I can use in the select statement out of the data flow output. I tried using a Set Variable activity and was going to put that into a ForEach, but Set Variable doesn't like the output of the data flow (object instead of array): "The variable 'xxx' of type 'Array' cannot be initialized or updated with value of type 'Object'. The variable 'xxx' only supports values of types 'Array'." I've used a flatten to try to transform it to an array, but I think the sink operation is putting it back into JSON.
What is a workable approach to getting the IDs into a string that I can put into a lookup query?
Some notes:
The Parquet file has a small number of unique IDs compared to the total unique IDs in the database.
If this were an Azure PostgreSQL database I could just do the join in the data flow, but the generic PostgreSQL driver isn't available in data flows. I can't copy the entire database over to Azure just to do the join, and I need the data flow in Azure for non-technical reasons.
Edit:
For clarity's sake, I am trying to replace local Python code that does the following:
query = "select * from mytable where id_number in "
df = pd.read_parquet("input_file.parquet")
df['id_number'] = df.country_code + df.id
df_other_data = pd.read_sql(conn, query + str(tuple(df.id_number))
I'd like to replace this locally executed code with ADF. In the ADF process, I have to replace the transformation of the IDs, which seems easy enough in a couple of different ways. But once I have the IDs in the proper format in a column in a dataset, I can't figure out how to query a database that isn't supported by Data Flow and restrict the query to only the IDs I need, so I don't bring down the entire database.
ADF variables can only store simple types. However, we can define an Array-type parameter in ADF and set a default value, since ADF parameters support any type of element, including complex JSON structures.
For example:
Define a JSON array:
[{"name": "Steve","id": "001","tt_1": 0,"tt_2": 4,"tt3_": 1},{"name": "Tom","id": "002","tt_1": 10,"tt_2": 8,"tt3_": 1}]
Define an Array-type parameter and set its default value:
This way we will not get any error.
I am trying to automate SSAS tabular model refreshes. The requirement is that, depending on the tables chosen, the model will be refreshed only for those tables. I am looking for a way to dynamically build the script to process only the selected tables in the first step of a SQL Agent job and pass that dynamically built script to the next step, which will be a SQL Server Analysis Services Command step. Or maybe execute the script built in step 1 itself. I am not sure how this could be achieved; please let me know the possible ways.
Have you considered doing this through SSIS and executing the package from SQL Agent? You can use an Analysis Services Processing Task and select the tables that you want to process. If you want to do this in a more dynamic manner, the following items outline how this can be done.
The table names that you want to process will be stored in an object variable. One option is to query an SSAS DMV from an Execute SQL Task for the names of the tables that will be processed and output these names into the object variable. You'll need to set the Result Set to use a full result set and map the object variable in the Result Set pane. The following command will return the unique table names (the table_type filter is used to remove results prefixed with $):
select table_name from $SYSTEM.DBSCHEMA_TABLES
where table_catalog = 'YourTabularModel'
and table_schema = 'Model'
and table_type = 'SYSTEM TABLE'
If you will be using SSAS DMVs then create an OLE DB connection manager using Microsoft OLE DB Provider for Analysis Services 13.0 as the provider. Make sure to set the initial catalog to the SSAS model with the tables that will be processed.
Add a Foreach ADO Enumerator Loop that will use the object variable as the source variable in the Collection pane. In the Variable Mappings pane, add a variable to store the table name.
Inside the Foreach Loop, add an Analysis Services Execute DDL Task.
Create a string variable with an expression that is the SSAS process command for the table. In the expression replace the table field (assuming you're using TMSL) with the variable holding the table name.
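For illustration, a TMSL command to process a single table might look like the following (a sketch; YourTabularModel is taken from the DMV query above and TableToProcess is a placeholder that the variable would fill in):
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "YourTabularModel",
        "table": "TableToProcess"
      }
    ]
  }
}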
I have a pre-trigger on the replace action. I realized that, unlike SQL Server triggers, DocumentDB triggers won't fire when you update documents in the Azure portal. Am I missing a setting in the portal, or is this just how DocumentDB triggers work, i.e. they can only be triggered from application code?
Thanks!
Your understanding is correct. There is no technical blocker to allowing this feature from the portal; it is simply missing at the moment.
It's certainly a valid request to be placed at https://feedback.azure.com/forums/263030-documentdb
DocumentDB database triggers are not raised automatically by DML operations such as creates and deletes, as is common in other databases.
That is, triggers must be specified for each database operation you make in application code. Also, the trigger must be of the matching type; that is, an insert operation can only take a trigger of the create type, not the replace type.
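For example, with the Node.js documentdb SDK a trigger has to be attached to each individual operation via the request options (a sketch; the trigger name, collection link, and document are placeholders):
// run a pre-trigger as part of a single create operation
client.createDocument(collLink, { id: 'doc1' },
    { preTriggerInclude: ['myPreTrigger'] },
    function (err, created) {
        if (err) { console.error(err); }
    });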
Since I use the Azure Functions DocumentDB output bindings, I don't perform the DML operations myself. After wasting a lot of time debugging, I moved on to creating a stored procedure under the database collection and then calling it from the Azure Function code, using code like the below.
This works perfectly:
// call a stored procedure from an Azure Function (Node.js);
// context is the Azure Functions context object
'use strict';
var DocumentClient = require('documentdb').DocumentClient;
var client = new DocumentClient(process.env.DB_HOST, { masterKey: process.env.DB_M_KEY });

// link of the stored procedure: dbs/<db>/colls/<collection>/sprocs/<sproc>
var sprocLink =
    'dbs/' + sprocDbName1 +
    '/colls/' + sprocCollName1 +
    '/sprocs/' + sprocName1;

// single object argument handed to the sproc
var sprocParams = {
    key1: "val1",
    key2: "val2"
};

client
    .executeStoredProcedure(
        sprocLink,
        [sprocParams], // the SDK expects the sproc arguments as an array
        function (err, results) {
            if (err) {
                context.log.error('err');
                context.log.error(err);
                return;
            }
            context.log.verbose('results');
            context.log.verbose(results);
        });
Note: Give values for DB_HOST (URL ending with :443/), DB_M_KEY, sprocDbName1 (your db name), sprocCollName1 (your collection name) and sprocName1 (your stored proc name).
Before doing the above, the stored procedure (sproc) should already have been created inside the DocumentDB database collection.
Hope that helps.
I have a stored procedure in my project under sql/my_prod.sql
There I have my function delete_entity.
In my entity:
@NamedNativeQuery(name = "delete_entity_prod",
query = "{call /sql/delete_entity(:lineId)}",
and I call it:
Query query = entityManager.createNamedQuery("delete_entity_prod")
    .setParameter("lineId", lineId);
I followed this example: http://objectopia.com/2009/06/26/calling-stored-procedures-in-jpa/
but it does not execute the delete, and it does not report any error.
I haven't found clear information about this. Am I missing something? Maybe I need to load my_prod.sql first? But how?
JPA 2.1 standardized stored procedure support, if you are able to use it; examples are at http://en.wikibooks.org/wiki/Java_Persistence/Advanced_Topics#Stored_Procedures
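A minimal sketch of the standardized API, assuming the procedure exists in the database under the name delete_entity and takes a single IN parameter:
// uses javax.persistence.StoredProcedureQuery and javax.persistence.ParameterMode
StoredProcedureQuery query =
    entityManager.createStoredProcedureQuery("delete_entity");
query.registerStoredProcedureParameter("lineId", Long.class, ParameterMode.IN);
query.setParameter("lineId", lineId);
query.execute();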
This is actually just the way you create the query:
Query query = entityManager.createNamedQuery("delete_entity_prod")
    .setParameter("lineId", lineId);
To call it you must execute:
query.executeUpdate();
Of course, the DB must already contain the procedure. So if you have it defined in your SQL file, have a look at Executing SQL Statements from a Text File (this is for MySQL, but other database systems use a similar approach to execute scripts).
There is no error shown because the query is never executed; only an instance of Query is created. The query can be executed by calling executeUpdate:
query.executeUpdate();
Then the next problem will arise: writing stored procedures to a file is not enough, because procedures live in the database, not in files. So the next thing to do is to check that you have a correct script for creating the stored procedure at hand (maybe that is currently the content of sql/my_prod.sql) and then use it to create the procedure via a database client.
Not all JPA implementations support calling stored procedures, but I assume Hibernate is used under the hood, because that is also what the linked tutorial uses.
It can be the case that the current
{call /sql/delete_entity(:lineId)}
is the right syntax for calling a stored procedure in your database, but it looks rather suspicious because of the /sql/ part. If it turns out that this is incorrect syntax, then:
Consult the manual for the correct syntax.
Test it via a database client.
Use it as the value of the query attribute in the NamedNativeQuery annotation.
All of that, for the MySQL + Hibernate combination, is explained for example here.
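For instance, if the procedure actually exists in the database under the name delete_entity, the annotation would typically drop the path and look something like this (a sketch, assuming JDBC-style call syntax):
@NamedNativeQuery(name = "delete_entity_prod",
    query = "{call delete_entity(:lineId)}")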