Calling Snowflake Stored Procedure from Tableau - tableau-api

I have a Snowflake stored procedure which exports data to S3 based on dynamic input parameters. I am trying to set this up via Tableau, so that I can use Tableau parameters and call the Snowflake stored procedure from Tableau. Is this possible in any way?

While there's no straightforward solution, you could accomplish this task with a series of Snowflake facilities:
Create a task that monitors information_schema.query_history() every X minutes.
Have this task check for queries executed under a Tableau session.
If any of these queries have a parameter set by your Tableau dashboard that indicates the user wants to export these results, then do so.
You can check that a session was initiated by Tableau by searching the query history for ALTER SESSION SET QUERY_TAG = '{ "tableau-query-origins": { "query-category": "Data" } }'.
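A rough sketch of that approach in Snowflake SQL (the warehouse name, schedule, and wrapper procedure are placeholders I'm assuming, not objects from the question):
-- Task that periodically scans recent query history for Tableau-tagged export requests
CREATE OR REPLACE TASK monitor_tableau_exports
  WAREHOUSE = my_wh            -- assumed warehouse
  SCHEDULE = '5 MINUTE'
AS
  CALL check_tableau_export_requests();  -- hypothetical procedure doing the scan and the export

-- Inside that procedure, the scan over the last few minutes could look something like this:
SELECT query_id, session_id, query_text
FROM TABLE(information_schema.query_history(
       end_time_range_start => DATEADD('minute', -5, CURRENT_TIMESTAMP())))
WHERE query_text ILIKE '%tableau-query-origins%';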

Related

Azure Data Factory - Insert Sql Row for Each File Found

I need a data factory that will:
check an Azure blob container for csv files
for each csv file
insert a row into an Azure SQL table, giving the filename as a column value
There's just a single csv file in the blob container and this file contains five rows.
So far I have the following actions:
Within the for-each action I have a copy action. I gave this a source of a dynamic dataset which had the filename set as a parameter from @item().name. However, as a result 5 rows were inserted into the target table, whereas I was expecting just one.
The for-each loop executes just once, but I don't know how to use a data source that is a variable (or variables) holding the filename and timestamp.
You are headed in the right direction, but within the For Each you just need a Stored Procedure Activity that will insert the FileName (and whatever other metadata you have available) into the Azure DB table.
Like this:
Here is an example of the stored procedure in the DB:
CREATE PROCEDURE Log.PopulateFileLog (@FileName varchar(100))
AS
INSERT INTO Log.CvsRxFileLog
SELECT
    @FileName as FileName,
    getdate() as ETL_Timestamp
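For a quick sanity check of the proc outside ADF (the file name below is just a made-up value; in the pipeline the Stored Procedure Activity would pass @item().name from the For Each iteration instead):
-- Illustrative manual call with a hypothetical file name
EXEC Log.PopulateFileLog @FileName = 'somefile.csv';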
EDIT:
You could also execute the insert directly with a Lookup Activity within the For Each like so:
EDIT 2
This will show how to do it without a For Each.
NOTE: This is the most cost-effective method, especially when dealing with hundreds or thousands of files on a recurring basis!
1st, copy the output JSON array from your Lookup/Get Metadata activity using a Copy Data activity with a Source of Azure SQL DB and a Sink of a Blob Storage CSV file.
-------SOURCE:
-------SINK:
2nd, create another Copy Data activity with a Source of the Blob Storage JSON file, and a Sink of Azure SQL DB.
---------SOURCE:
---------SINK:
---------MAPPING:
In essence, you save the entire JSON output to a file in Blob, then copy that file using a JSON file type into Azure SQL DB. This way you only have 3 activities to run, even if you are trying to insert from a dataset that has 500 items in it.
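As a point of reference, a minimal sketch of what the landing table on the Azure SQL DB side might look like (table and column names here are hypothetical, not taken from the post):
-- Hypothetical landing table for the file metadata copied in from the JSON file
CREATE TABLE dbo.FileLoadLog (
    FileName    varchar(200) NOT NULL,
    LoadedAtUtc datetime2    NOT NULL DEFAULT (sysutcdatetime())
);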
Of course there is always more than one way to do things, but I don't think you need a For Each activity for this task. Activities like Lookup, Get Metadata and Filter output their results as JSON which can be passed around. This JSON can contain one or many items and can be passed to a Stored Procedure. An example pattern:
This is the sort of ELT pattern that was common with early ADF gen 2 (prior to Mapping Data Flows) and which makes use of resources already in your architecture. Remember that you are charged per activity execution in ADF (e.g. multiple iterations in an unnecessary For Each loop), and that in general compute in Azure is expensive while storage is cheap, so bear this in mind when implementing patterns in ADF. If you build the pattern above you have two types of compute: the compute behind your Azure SQL DB and the Azure Integration Runtime. If you add a Data Flow to that, you will have a third type of compute operating concurrently with the other two, so personally I only add one under certain conditions.
An example implementation of the above pattern:
Note the expression I am passing into my example logging proc:
@string(activity('Filter1').output.Value)
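The logging proc itself isn't shown in the post, but a minimal sketch of what it might look like, assuming the dbo.myLog table with a logRecord column that is referenced further down:
-- Hypothetical logging proc: stores the activity output JSON string as a single row
CREATE PROCEDURE dbo.LogPipelineOutput ( @logRecord nvarchar(max) )
AS
INSERT INTO dbo.myLog ( logRecord )
SELECT @logRecord;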
Data Flows are perfectly fine if you want a low-code approach and do not have compute resources already available to do this processing. In your case you already have an Azure SQL DB, which is quite capable of JSON processing, e.g. via the OPENJSON, JSON_VALUE and JSON_QUERY functions.
You mention not wanting to deploy additional code, which I understand, but then where did your original SQL table come from? If you are absolutely against deploying additional code, you could simply call the sp_executesql stored proc via the Stored Proc activity and use a dynamic SQL statement which inserts your record, something like this:
@concat( 'INSERT INTO dbo.myLog ( logRecord ) SELECT ''', activity('Filter1').output, ''' ')
Shred the JSON either in your stored proc or later, e.g.:
SELECT y.[key] AS name, y.[value] AS [fileName]
FROM dbo.myLog
CROSS APPLY OPENJSON( logRecord ) x
CROSS APPLY OPENJSON( x.[value] ) y
WHERE logId = 16
AND y.[key] = 'name';

Unable to specify parameters to parameterized source data set in ADF data flow

I have a data flow that has a parameter: TableName. The dataset that is used as a source within the flow is parameterized with a TableName parameter (SQL Server dataset). When selecting this dataset in the source settings within the ADF data flow, it does not allow me to set the TableName parameter as it does when setting the source within a standard Copy Activity.
So how does one use a parameterized dataset in a data flow if it never allows you to set the parameters?
UPDATE: The settings are actually on the Data Flow activity itself.
As I understand it, you mean that you can set the TableName in the Copy Activity but can't in the Data Flow.
In the Copy Activity, we can set the parameter like this:
But in Data Flow, the UI looks like this:
A workaround is to choose the table with a query in the Source options:
'select * from ' + $TableName
Pipeline parameter:
Data Flow parameter:
It works well.
In data flow, you will set the dataset parameter in Debug Settings when designing/debugging your data flow. You can then set the parameter at runtime in the data flow activity settings in the pipeline.

Is there a way to SkipLinesAtEnd in a TextFormat Azure Data Factory

We receive text files from an external partner.
They claim to be CSV but have some awkward pre-header and footer lines.
In an ADF TextFormat I can use "skipLineCount": 6, but at the end I'm running into trouble.
Any suggestions?
I can't find anything like skipLinesAtEnd.
This is the Sample
TITLE : Liste de NID_C_BG_NPIG configuré.
FILE NAME : Ines_bcn_npig_net_f.csv
CREATION DATE : 09/10/2019 23:18:43
ENVIRONMENT : Production 12c
<Begin of file>
"NID_C";"NID_BG";"N_PIG"
"253";"0";"0"
"253";"0";"1"
"253";"1";"0"
"253";"1";"1"
"253";"2";"0"
"253";"2";"1"
"253";"3";"0"
<End of file>
It seems that you are using the skipLineCount setting in Data Flow. There is no feature like skipLinesAtEnd in ADF.
You could follow the suggestion mentioned by Joel to use an Alter Row transformation.
However, based on the official documentation, it only supports database sinks.
So, if you are limited by that, I would suggest parsing the file before the copy job. For example, add an Azure Function Activity to cut the extra rows if you know the specific location of the header and footer. Inside the Azure Function, just use code to alter the file.
Jay & Joel are correct in pointing you toward Data Flows to solve this problem. Use Copy Activity in ADF for data movement-focused operations and Data Flows for data transformation.
You'll find the price for data movement similar to that of data transformation.
I would solve this in Data Flow and use a Filter transformation to filter out the footer row (the "<End of file>" marker in your sample, or whatever trailing text the real files contain).
You should not need an Alter Row in this case. HTH!

Automate SSAS tabular model refresh at the table level

I am trying to automate SSAS tabular model refresh. The requirement is: depending on the tables chosen, the model will be refreshed only for those tables. I am looking for a way to dynamically build the script to process only the selected tables in the first step of a SQL Agent job and pass that dynamically built script to the next step, which will be a SQL Server Analysis Services Command step. Or maybe execute the script built in step 1 itself. But I am not sure how this could be achieved. Please let me know the possible ways.
Have you considered doing this through SSIS and executing the package from SQL Agent? You can use an Analysis Services Processing Task and select the tables that you want to process. If you want to do this in a more dynamic manner, the following items outline how it can be done.
The table names that you want to process will be stored in an object variable. One option is to query an SSAS DMV from an Execute SQL Task for the names of the tables that will be processed and output these names into an object variable. You'll need to set the Result Set to use a full result set and map the object variable in the Result Set pane. The following command will return the unique table names (the table_type filter is used to remove results prefixed with $):
select table_name
from $SYSTEM.DBSCHEMA_TABLES
where table_catalog = 'YourTabularModel'
and table_schema = 'Model'
and table_type = 'SYSTEM TABLE'
If you will be using SSAS DMVs then create an OLE DB connection manager using Microsoft OLE DB Provider for Analysis Services 13.0 as the provider. Make sure to set the initial catalog to the SSAS model with the tables that will be processed.
Add a Foreach ADO Enumerator Loop that will use the object variable as the source variable in the Collection pane. In the Variable Mappings pane, add a variable to store the table name.
Inside the Foreach Loop, add an Analysis Services Execute DDL Task.
Create a string variable with an expression that is the SSAS process command for the table. In the expression replace the table field (assuming you're using TMSL) with the variable holding the table name.

Tableau Extract API with multiple tables in a database

I am currently experimenting with the Tableau Extract API to generate some TDEs from the tables I have in a PostgreSQL database. I was able to write code to generate the TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner joined by some field, how would I generate the TDE for this?
I can see that if I am working with a small number of tables, I could use a SQL query with JOIN clauses to create one gigantic table, and generate the TDE from that table.
SELECT *
INTO new_table_1
FROM table_1 INNER JOIN table_2
ON table_1.id_1 = table_2.id_2;

SELECT *
INTO new_table_2
FROM new_table_1 INNER JOIN table_3
ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, I have some tables that have over 40 different fields, so this could get messy.
Is this even a possibility with the current version of the API?
You can read from as many tables or other sources as you want. Or use a complex query with lots of joins, or create a view and read from that. Usually, creating a view is helpful when you have a complex query joining many tables.
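A minimal sketch of the view idea, reusing the table and key names from the question (the extra selected columns and their aliases are placeholders; in practice you would list and alias columns explicitly to avoid duplicate field names):
-- Hypothetical view that flattens the joins so the extract code sees a single relation
CREATE VIEW joined_for_extract AS
SELECT t1.*,
       t2.id_2 AS table_2_id,   -- alias joined columns to avoid name clashes
       t3.id_3 AS table_3_id
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON t1.id_1 = t2.id_2
INNER JOIN table_3 AS t3 ON t1.id_1 = t3.id_3;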
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.
Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract. In fact, I have done this many times before. I am just trying to create the extracts using the API, because I need to create some calculated fields, which are near impossible to create using Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would look too complicated to comprehend (some of these tables also have the same field names).
When you say that I could read from multiple tables or sources, does that mean with the Tableau Extract API? At this point, I cannot find anything in the API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me that the extract is composed of multiple tables. This just doesn't seem to be happening with the API, which leaves me stranded. Anyway, thank you again for your replies.
Going back to the topic, this is something that I tried a few days ago in my Python code:
try:
    tdefile = tde.Extract("extract.tde")
except:
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")

tableDef = tde.TableDefinition()
# Read each column in the table and set the column data types using tableDef.addColumn
# Some code goes here...

for eachTable in tableNames:
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use a SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in the table, and set the values in each column position of the row
        # Some code goes here...
        tableAdd.insert(some_row)
    some_row.close()

tdefile.close()
When I execute this code, I get the error that eachTable has to be called "Extract".
Of course, this code has its flaws, as there is nowhere in it that specifies how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.