Cloning Power BI DirectQuery into a regular query - import

I have a Power BI dataset that someone shared with me. I would like to import it into Power BI Desktop and transform its data.
I connected to the dataset using DirectQuery and managed to create a calculated table:
My_V_Products = CALCULATETABLE(V_Products)
However, when I open Transform Data, I do not see this table. I guess this is because the table is created from a DAX expression rather than from a query.
Is there a way to import the entire table using a query, or to convert the data into transformable data?

Only if the dataset is on a Premium capacity. If it is, you can connect to the workspace's XMLA endpoint using the Analysis Services connector and create an Import table from a custom DAX query, such as EVALUATE V_Products.
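As a rough sketch of the two strings this approach needs: the server address for the Analysis Services connector and the DAX query. It assumes the usual endpoint form powerbi://api.powerbi.com/v1.0/myorg/<workspace name>. The helper names and the workspace name "Sales" are hypothetical.

```python
def xmla_endpoint(workspace_name: str) -> str:
    """Server address to enter in the Analysis Services connector (assumed URL form)."""
    return f"powerbi://api.powerbi.com/v1.0/myorg/{workspace_name}"


def evaluate_table(table_name: str) -> str:
    """DAX query that returns every row of a table; embedded quotes are doubled."""
    escaped = table_name.replace("'", "''")
    return f"EVALUATE '{escaped}'"


# "Sales" is a made-up workspace name; V_Products comes from the question.
print(xmla_endpoint("Sales"))
print(evaluate_table("V_Products"))
```

The first string goes in the connector's server field and the second in its optional query box.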


Converting data types of Foreign Keys to use Joiner in Google Cloud Data Fusion Pipeline

I am building a pipeline that connects to an on-prem Oracle DB using the Database plugin, queries two tables (table_a, table_b), and joins them using the Joiner plugin before uploading the result to a BigQuery table.
The problem is that the foreign keys used to join table_a and table_b get different data types when I use Get Schema in the Database plugin. In Joiner, I am joining the tables on table_a.customer_id = table_b.customer_id.
The type of table_a.customer_id is LONG, but table_b.customer_id is DOUBLE. In the source Oracle DB, both columns are actually integers, yet Get Schema reports them as LONG and DOUBLE.
Unsurprisingly, Joiner raises an error when I try to join on foreign keys with different data types.
Is there a way to cast/convert the columns from the tables to match so that I can use Joiner?
I've seen some examples that use the Wrangler transform to parse dates, but nothing that converts to other data types. I couldn't find any directive examples either: https://github.com/data-integrations/wrangler.
You can transform your data before joining by using any of the transform plugins that Cloud Data Fusion offers. As #muscat mentioned, you can use the Wrangler transform with the set-type directive, or you can use the Projection transform and configure its Convert field.
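To illustrate the idea in plain Python (this is not Data Fusion code): the sketch below converts the DOUBLE key to LONG first, then joins. The schemas, sample rows, and function names are all hypothetical. The join refuses mismatched key types, mimicking the error described in the question.

```python
def cast_double_to_long(rows, schema, field):
    """Return (rows, schema) with `field` converted from double to long."""
    new_rows = []
    for row in rows:
        value = row[field]
        if value != int(value):
            raise ValueError(f"{field}={value!r} would lose its fractional part")
        new_rows.append({**row, field: int(value)})
    return new_rows, {**schema, field: "long"}


def join_on(left, left_schema, right, right_schema, key):
    """Inner join that refuses key columns whose declared types differ."""
    if left_schema[key] != right_schema[key]:
        raise TypeError(f"{key}: {left_schema[key]} vs {right_schema[key]}")
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


# Made-up schemas and rows standing in for the two Oracle tables.
schema_a = {"customer_id": "long", "name": "string"}
schema_b = {"customer_id": "double", "total": "double"}
table_a = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}]
table_b = [{"customer_id": 1.0, "total": 9.5}, {"customer_id": 3.0, "total": 4.0}]

table_b, schema_b = cast_double_to_long(table_b, schema_b, "customer_id")
joined = join_on(table_a, schema_a, table_b, schema_b, "customer_id")
print(joined)
```

The cast raises an error instead of truncating, so a key that is not a whole number is caught before the join.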

Calling Snowflake Stored Procedure from Tableau

I have a Snowflake stored procedure that exports data to S3 based on dynamic input parameters. I am trying to set this up via Tableau so that I can use Tableau parameters and call the Snowflake stored procedure from Tableau. Is this possible in any way?
While there's no straightforward solution, you could accomplish this task with a series of Snowflake facilities:
Create a task that monitors information_schema.query_history() every X minutes.
Have this task check for queries executed under a Tableau session.
If any of these queries have a parameter set by your Tableau dashboard that indicates the user wants to export these results, then do so.
You can check that a session was initiated by Tableau by searching the query history for ALTER SESSION SET QUERY_TAG = { "tableau-query-origins": { "query-category": "Data" } }.
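The filtering step in the list above can be sketched as follows, in plain Python over rows already fetched from the query history. It assumes each row is a dict with query_tag and query_text keys. The EXPORT_REQUESTED marker is a made-up convention for the parameter set by the dashboard.

```python
import json


def is_tableau_query(row):
    """True when the row's query tag is JSON containing "tableau-query-origins"."""
    try:
        tag = json.loads(row.get("query_tag") or "{}")
    except json.JSONDecodeError:
        return False
    return isinstance(tag, dict) and "tableau-query-origins" in tag


def export_requests(history, marker="EXPORT_REQUESTED"):
    """Rows issued by Tableau whose text carries the (hypothetical) export marker."""
    return [r for r in history if is_tableau_query(r) and marker in r["query_text"]]


# Made-up history rows for illustration.
tableau_tag = '{"tableau-query-origins": {"query-category": "Data"}}'
history = [
    {"query_tag": tableau_tag, "query_text": "SELECT 1 /* EXPORT_REQUESTED */"},
    {"query_tag": tableau_tag, "query_text": "SELECT 2"},
    {"query_tag": "", "query_text": "SELECT 3 /* EXPORT_REQUESTED */"},
]
pending = export_requests(history)
print(pending)
```

The scheduled task would then call the stored procedure once for each row in pending.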

Importing data to SQL from Excel using Talend

I am trying to import data from Excel into SQL. I have successfully connected to the database, but when I try to retrieve the schema I do not get my table; instead, I get the schema of the database (type CATALOG).
How do I get the schema of the table to which I will export the Excel data?
I referred to this video to do the import.
http://www.youtube.com/watch?v=JDBYU9f1p-I
What you can use is tFileExcelSheetOutput: map what you need with tMap and send the result to t[DB]Input.
http://www.talendbyexample.com/talend-tdbinput-reference.html

Tableau Extract API with multiple tables in a database

I am currently experimenting with the Tableau Extract API to generate TDEs from the tables I have in a PostgreSQL database. I was able to write code that generates a TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner joined on some field, how would I generate the TDE for this?
I can see that, if I am working with a small number of tables, I could use a SQL query with JOIN clauses to create one gigantic table and generate the TDE from that table.
>> SELECT * INTO new_table_1
FROM table_1 INNER JOIN table_2
ON table_1.id_1 = table_2.id_2;
>> SELECT * INTO new_table_2
FROM new_table_1 INNER JOIN table_3
ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, I have some tables with over 40 different fields, so this could get messy.
Is this even possible with the current version of the API?
You can read from as many tables or other sources as you want. You can also use a complex query with lots of joins, or create a view and read from that. Creating a view is usually helpful when you have a complex query joining many tables.
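A minimal sketch of the view approach, using an in-memory SQLite database as a stand-in for PostgreSQL so that it runs anywhere. The table contents and the view name joined_view are made up. The column aliases show one way to keep same-named fields apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id_1 INTEGER, name TEXT);
    CREATE TABLE table_2 (id_2 INTEGER, name TEXT);
    INSERT INTO table_1 VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO table_2 VALUES (1, 'blue'), (3, 'red');
    -- Aliases keep the two same-named "name" columns apart.
    CREATE VIEW joined_view AS
        SELECT t1.id_1 AS id, t1.name AS product_name, t2.name AS color_name
        FROM table_1 AS t1
        INNER JOIN table_2 AS t2 ON t1.id_1 = t2.id_2;
""")

# The extract-building code only ever sees one flat result set.
cursor = conn.execute("SELECT * FROM joined_view")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

The program that feeds the extract then reads joined_view like any single table, and the join logic stays in the database.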
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.
Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract; in fact, I have done this many times before. I am trying to create the extracts using the API because I need to create some calculated fields that are nearly impossible to create in Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would be too complicated to comprehend (some of these tables also have the same field names).
When you say that I could read from multiple tables or sources, do you mean with the Tableau Extract API? So far, I cannot find anything in this API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me the extract is composed of multiple tables. This just doesn't seem to happen with the API, which leaves me stranded. Anyway, thank you again for your replies.
Going back to the topic, this is something I tried a few days ago in my Python code:
import os
import dataextract as tde  # assumed: the Extract API module that the tde alias refers to

try:
    tdefile = tde.Extract("extract.tde")
except Exception:
    # If the extract cannot be opened, delete the file and create it again
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")
tableDef = tde.TableDefinition()
# Read each column in table and set the column data types using tableDef.addColumn
# Some code goes here...
for eachTable in tableNames:
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in table, and set the values in each column position of each row
        # Some code goes here...
        tableAdd.insert(some_row)
        some_row.close()
tdefile.close()
When I execute this code, I get an error saying that eachTable has to be named "Extract".
Of course, this code has its flaws, as nothing in it says how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.

Using Script Task to create ADO NET (ODBC) Data Flow Source

I need some help with an SSIS Script Task (SQL 2008 R2) that dynamically creates a package. I am refining a package that copies data from a Sage Timberline (now rebranded as Sage 300) Pervasive SQL environment to a SQL Server data warehouse. I can create a package that opens the connection to Timberline and copies the data to a table in SQL Server. The problem is that, for each company in Timberline and each table in SQL, I need to create a separate data flow task. Given the three Timberline company folders and the number of tables in each folder, this would take a lot of time to create and be cumbersome to maintain and troubleshoot.
I am trying to create a package that uses a Foreach Loop to create a package that creates an ADO/ODBC source (Timberline) and an OLE DB destination (SQL), and dynamically handles the column mapping. I found code here that almost does what I need.
I tested this code and it works great using OLE DB SQL sources and destinations. What makes this script work is that it dynamically handles the column mapping. So, if you placed it into a Foreach Loop over the 100 or so tables, each iteration could dynamically create the data flow, map the columns, and then execute the new package.
My problem is that I can only connect to Timberline using ODBC, so I need to modify the script to create the source connection with ADO NET (ODBC) instead of OLE DB. I’m having a lot of trouble figuring this out. Could someone please help me out with this?
Here are a couple of other things I tried first, before this approach:
Solution: Set up a linked server to Timberline Pervasive SQL
Problem: SQL Server is 64-bit and the Timberline driver is 32-bit. Using a linked server returns an architecture mismatch error. I called Sage and they said they have no plans to release a 64-bit driver.
Solution: Use one of the SQL Transfer tasks
Problem: Only works with SQL databases. This source is a Pervasive SQL database
Solution: Use an “INSERT … INTO …” type script
Problem: This requires a linked server. See the problem above
Here’s the section of the original VB .NET code I need help with:
'To Create a package named [Sample Package]
Dim package As New Package()
package.Name = "Sample Package"
package.PackageType = DTSPackageType.DTSDesigner100
package.VersionBuild = 1
'To add Connection Manager to the package
'For source database (OLTP)
Dim OLTP As ConnectionManager = package.Connections.Add("OLEDB")
OLTP.ConnectionString = "Data Source=.;Initial Catalog=OLTP;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;"
OLTP.Name = "LocalHost.OLTP"
'To add Load Employee Dim to the package [Data Flow Task]
Dim dataFlowTaskHost As TaskHost = DirectCast(package.Executables.Add("SSIS.Pipeline.2"), TaskHost)
dataFlowTaskHost.Name = "Load Employee Dim"
dataFlowTaskHost.FailPackageOnFailure = True
dataFlowTaskHost.FailParentOnFailure = True
dataFlowTaskHost.DelayValidation = False
dataFlowTaskHost.Description = "Data Flow Task"
'-----------Data Flow Inner component starts----------------
Dim dataFlowTask As MainPipe = TryCast(dataFlowTaskHost.InnerObject, MainPipe)
' Source OLE DB connection manager to the package.
Dim SconMgr As ConnectionManager = package.Connections("LocalHost.OLTP")
' Create and configure an OLE DB source component.
Dim source As IDTSComponentMetaData100 = dataFlowTask.ComponentMetaDataCollection.[New]()
source.ComponentClassID = "DTSAdapter.OLEDBSource.2"
' Create the design-time instance of the source.
Dim srcDesignTime As CManagedComponentWrapper = source.Instantiate()
' The ProvideComponentProperties method creates a default output.
srcDesignTime.ProvideComponentProperties()
source.Name = "Employee Dim from OLTP"
' Assign the connection manager.
source.RuntimeConnectionCollection(0).ConnectionManagerID = SconMgr.ID
source.RuntimeConnectionCollection(0).ConnectionManager = DtsConvert.GetExtendedInterface(SconMgr)
' Set the custom properties of the source.
srcDesignTime.SetComponentProperty("AccessMode", 0)
' Mode 0 : OpenRowset / Table - View
srcDesignTime.SetComponentProperty("OpenRowset", "[dbo].[Employee_Dim]")
' Connect to the data source, and then update the metadata for the source.
srcDesignTime.AcquireConnections(Nothing)
srcDesignTime.ReinitializeMetaData()
srcDesignTime.ReleaseConnections()
Thanks in advance!
The C# code here is what you need if you need a Derived Column transform between the Source and Destination...
http://bifuture.blogspot.com/2011/01/ssis-adding-derived-column-to-ssis.html
To get the Source & Destination connections working, there is some secret sauce here to get things working between COM and .Net...
http://blogs.msdn.com/b/mattm/archive/2008/12/30/api-sample-ado-net-source.aspx
There is a similar page showing what to do for OleDB connections too.
Creating the source tables is easy. The ODBC metadata collections can be retrieved with GetSchema("MetaDataCollections"), which returns a list of the schema collections available for that particular ODBC driver.
Next, you'll want to see the data types returned from GetSchema("DataTypes"), so you can correctly interpret the data types for each column retrieved from GetSchema("Columns") to make your SQL Server create table script (which I'm assuming you've done).
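As a rough illustration of that last step, here is a Python sketch that turns rows shaped like the Columns collection into a CREATE TABLE script. The keys COLUMN_NAME, TYPE_NAME and ORDINAL_POSITION follow the usual ODBC column metadata names and are an assumption here. The type mapping, the fallback type, and the sample rows are made up.

```python
# Hypothetical mapping from source type names to SQL Server types;
# build the real one from the driver's GetSchema("DataTypes") output.
TYPE_MAP = {"INTEGER": "INT", "DOUBLE": "FLOAT", "DATE": "DATE"}


def create_table_script(table, columns, type_map=TYPE_MAP, default="NVARCHAR(255)"):
    """Build a CREATE TABLE statement from column metadata rows (dicts)."""
    ordered = sorted(columns, key=lambda c: c["ORDINAL_POSITION"])
    lines = [
        f"    [{c['COLUMN_NAME']}] {type_map.get(c['TYPE_NAME'].upper(), default)}"
        for c in ordered
    ]
    return f"CREATE TABLE [dbo].[{table}] (\n" + ",\n".join(lines) + "\n);"


# Made-up metadata rows for a table called Jobs.
sample = [
    {"COLUMN_NAME": "Amount", "TYPE_NAME": "double", "ORDINAL_POSITION": 2},
    {"COLUMN_NAME": "Job", "TYPE_NAME": "char", "ORDINAL_POSITION": 1},
]
script = create_table_script("Jobs", sample)
print(script)
```

Unmapped source types fall back to a wide text column, which is a deliberately cautious default and not a recommendation.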
To at least figure out which tables have primary keys, you'll need to loop over each table returned from GetSchema("Tables") in order to work with GetSchema("Indexes"). There's a bug that requires you to query the indexes one table at a time. This is easy to find by searching: create a string array of restrictions with the table name as its third element and pass it as the second argument: GetSchema("Indexes", restrictions)
What I did was load the Tables and Columns collections into object variables in my parent SSIS package. Because Timberline is so fast (not), it seemed more efficient to pull all the columns down and filter them locally, which I do to create the tables in SQL Server if necessary.
Once that is done, use the local copy of Tables again to manipulate an SSIS package in a Script task in "design mode" (change the source and destination target tables, and redo the column mappings), and execute the now-in-memory SSIS package.
It took me a while to figure this out. Both URLs above were required. I found the .Net 2.0 Dts.PipelineWrap and Dts.RuntimeWrap .dlls and copied them to the Microsoft.Net\FrameworkV2.0xxxxx folder, then referenced them in each script task that needed them, before setting up my "using DtsPW = Microsoft.SqlServer.Dts.Pipeline.Wrapper", etc.
Of note, because Timberline is 32-bit ODBC, I think it's necessary to build the SSIS package to use "X86", and target the script tasks to use .Net 2.0 framework.
I used the Derived Column code because I needed to copy multiple Timberline DBs into one SQL Server DB. Derived Column adds a "CompanyID" value to the output pipeline to SQL Server.
In the end, map the Destination's Virtual Input columns to its External Metadata columns, based off of the pipeline the Destination is attached to:
foreach (DtsPW.IDTSVirtualInputColumn100 vColumn in destVirtInput.VirtualInputColumnCollection)
{
    var vCol = destInst.SetUsageType(destInput.ID, destVirtInput, vColumn.LineageID, DtsPW.DTSUsageType.UT_READWRITE);
    destInst.MapInputColumn(destInput.ID, vCol.ID, destInput.ExternalMetadataColumnCollection[vColumn.Name].ID);
}
Anyway, that code will make more sense in the context of the bifuture.blogspot.com page.
The EzApi library could help with this too, but its AdoNet connection source is coded as a virtual class, so you'd need to implement specific classes to use it. My C# kung fu is not strong enough for that in the time I have...
Also, CozyRoc sells a toolset with custom SSIS controls (data flow Source and Destination controls...) that looks like it does this on-the-fly input-to-output column mapping as well.
My package seems to work well enough now... Oh, and one more thing: I did not have any luck with DSN-less ODBC connections to Timberline, only with: Dsn=dsnname;Uid=user;Pwd=pwd;
SSIS packages running in 64-bit mode cannot see 32-bit DSNs on a 64-bit OS, it seems... at least, it didn't work for me (Win7 64-bit, 32-bit Text ODBC DSN).