A field value that causes the Power BI Merge and left anti join to fail?
In SharePoint it is shown like this: ELEKTRÄ°K SÄ°STEMLERÄ° LTD.ÅžT
In Azure Databricks it is shown like this: ELEKTRİK SİSTEMLERİ LTD.ŞT (it looks different; note the "RİK" part)
We have a key that uses this Manufacturer Name. The key is used to compare the SharePoint list with the Azure Databricks table and work out the deleted records.
In Power BI there are a couple of records where the Merge and the left anti join don't work. These are the keys of the records that do not work. This is how the key appears in the back end of Power BI (the Power Query side):
ELEKTRÄ°K SÄ°STEMLERÄ° LTD.ÅžT (from the SharePoint list data source in Power BI)
ELEKTRÄ°K SÄ°STEMLERÄ° LTD.ÅžT (from the Azure Databricks data source in Power BI)
They look the same, so they should match and not be returned as a deleted record.
The Power BI Merge does not work with this key for some reason. Is it a special character set problem? Is there any way round it?
You can try verifying the locale settings for your source.
You can also change the column types using a specific locale.
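One way to check whether this is a character-set problem is to compare the underlying code points rather than the rendered text. The sketch below is an assumption about the likely cause, not a confirmed diagnosis. It shows that the garbled rendering is what UTF-8 bytes look like when decoded as Windows-1252, and that two strings can render identically while differing in Unicode normalization form. The helper normalize_key is hypothetical.

```python
import unicodedata

# The manufacturer name from the question, written with escapes so the
# code points are unambiguous (\u0130 is the dotted capital I, \u015e is S-cedilla).
original = "ELEKTR\u0130K S\u0130STEMLER\u0130 LTD.\u015eT"

# UTF-8 bytes misread as Windows-1252 reproduce the garbled form.
garbled = original.encode("utf-8").decode("cp1252")

# Same rendering, different code points: precomposed vs. decomposed.
composed = unicodedata.normalize("NFC", original)
decomposed = unicodedata.normalize("NFD", original)


def normalize_key(value: str) -> str:
    """Hypothetical helper: trim whitespace and normalize to NFC before comparing."""
    return unicodedata.normalize("NFC", value.strip())


print(garbled)
print(composed == decomposed)
print(normalize_key(composed) == normalize_key(decomposed + " "))
```

If the two sources turn out to differ in this way, applying the same normalization to the key column on both sides before the merge is one option to try.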
I have a Power BI dataset that someone shared with me. I would like to import it into Power BI Desktop and transform its data.
I used DirectQuery to connect to the dataset and I managed to create a calculated table:
My_V_Products = CALCULATETABLE(V_Products)
However, when I open Transform data, I do not see this table. I guess this is because the table was not created from a query but from a DAX expression.
Is there a way to import the entire table using a query, or to convert the data into transformable data?
Only if the dataset is on a Premium capacity. If it is, you can connect to the XMLA endpoint for the workspace using the Analysis Services connector and create an Import table using a custom DAX query, such as EVALUATE V_Products.
I need a data factory that will:
check an Azure blob container for csv files
for each csv file
insert a row into an Azure Sql table, giving filename as a column value
There's just a single csv file in the blob container and this file contains five rows.
So far I have the following actions:
Within the for-each action I have a copy action. I gave this a source of a dynamic dataset which had a filename set as a parameter from @item().name. However, as a result 5 rows were inserted into the target table, whereas I was expecting just one.
The for-each loop executes just once, but I don't know how to use a data source that consists only of variables holding the filename and timestamp.
You are headed in the right direction, but within the For Each you just need a Stored Procedure activity that inserts the FileName (and whatever other metadata you have available) into the Azure SQL DB table.
Like this:
Here is an example of the stored procedure in the DB:
CREATE PROCEDURE Log.PopulateFileLog (@FileName varchar(100))
AS
INSERT INTO Log.CvsRxFileLog
SELECT
    @FileName AS FileName,
    GETDATE() AS ETL_Timestamp
EDIT:
You could also execute the insert directly with a Lookup Activity within the For Each like so:
EDIT 2
This will show how to do it without a for each
NOTE: This is the most cost-effective method, especially when dealing with hundreds or thousands of files on a recurring basis.
1st, copy the output JSON array from your Lookup/Get Metadata activity using a Copy Data activity with a Source of Azure SQLDB and a Sink of Blob Storage CSV file
-------SOURCE:
-------SINK:
2nd, create another Copy Data activity with a Source of Blob Storage JSON file and a Sink of Azure SQLDB
---------SOURCE:
---------SINK:
---------MAPPING:
In essence, you save the entire JSON output to a file in Blob Storage, then copy that file to Azure SQL DB using a JSON file type. This way you have only 3 activities to run, even if you are inserting from a dataset that has 500 items in it.
Of course there is always more than one way to do things, but I don't think you need a For Each activity for this task. Activities like Lookup, Get Metadata and Filter output their results as JSON which can be passed around. This JSON can contain one or many items and can be passed to a Stored Procedure. An example pattern:
This is the sort of ELT pattern common with early ADF gen 2 (prior to Mapping Data Flows), which makes use of resources already in use in your architecture. Remember that you are charged by activity executions in ADF (e.g. multiple iterations of an unnecessary For Each loop) and that, generally, compute in Azure is expensive and storage is cheap, so think about this when implementing patterns in ADF. If you build the pattern above you have two types of compute: the compute behind your Azure SQL DB and the Azure Integration Runtime. If you add a Data Flow to that, you will have a third type of compute operating concurrently with the other two, so personally I only add these under certain conditions.
An example implementation of the above pattern:
Note the expression I am passing into my example logging proc:
@string(activity('Filter1').output.Value)
Data Flows is perfectly fine if you want a low-code approach and do not have compute resource already available to do this processing. In your case you already have an Azure SQL DB which is quite capable with JSON processing, eg via the OPENJSON, JSON_VALUE and JSON_QUERY functions.
You mention not wanting to deploy additional code which I understand, but then where did your original SQL table come from? If you are absolutely against deploying additional code, you could simply call the sp_executesql stored proc via the Stored Proc activity, use a dynamic SQL statement which inserts your record, something like this:
@concat( 'INSERT INTO dbo.myLog ( logRecord ) SELECT ''', activity('Filter1').output, ''' ')
Shred the JSON either in your stored proc or later, eg
SELECT y.[key] AS name, y.[value] AS [fileName]
FROM dbo.myLog
CROSS APPLY OPENJSON( logRecord ) x
CROSS APPLY OPENJSON( x.[value] ) y
WHERE logId = 16
AND y.[key] = 'name';
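For readers who want to prototype the shredding step outside the database, the same two-level unpacking can be sketched in Python. The payload shape used here (a JSON array of objects with name and type keys) is an assumption based on what a Get Metadata or Filter activity typically returns, and extract_file_names is a hypothetical helper, not part of ADF.

```python
import json


def extract_file_names(log_record: str) -> list[str]:
    """Return the 'name' value of every object in a JSON array string."""
    items = json.loads(log_record)  # outer level: the array of items
    # inner level: pick the 'name' key from each object, skipping items without one
    return [item["name"] for item in items if "name" in item]


# Assumed sample payload; real activity output may carry more keys.
sample = '[{"name": "file1.csv", "type": "File"}, {"name": "file2.csv", "type": "File"}]'
print(extract_file_names(sample))
```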
I am trying to use the VSTS.Feed() function in Power BI to read WorkItemSnapshot data. There are multiple problems. If I build the entire URL into a single string and call VSTS.Feed () with that, I get the correct information in Power BI desktop, but it will not refresh in Power BI online. I have been told to use the (undocumented) Query parameter, as shown below, but it is clear that this parameter is ignored. I can see that the select parameter is ignored on smaller projects, because all columns are returned. I can see that the filter parameter is ignored because the query fails on larger projects.
Does anyone have a working example of using the Query parameter with VSTS.Feed()?
let
    BaseURL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot",
    Select = "DateSK,WorkItemId,State,WorkItemType",
    Filter = "WorkItemType eq 'Bug' and State ne 'Closed' and State ne 'Removed' and DateSK ge 20180517 and DateSK le 20180615",
    Source = VSTS.Feed(BaseURL, [Query=[select=#"Select",filter=#"Filter"]])
in
    Source
Update:
With the query above, the message I get is shown below. As I said earlier, it is clearly not using the Filter parameter, and I'm assuming it is not using the Select parameter, either. I can't query everything because there is too much data, and I can't use a filter because I can't figure out a way to get the Options parameter to work. With VSTS.AccountContents, the options parameter works well, but those API endpoints don't use $ in parameter names.
Error: Query result contains 36,788,023 rows and it exceeds maximum allowed size of 300,000. Please reduce the number of records by applying additional filters
Details:
DataSourceKind=Visual Studio Team Services
ActivityId=881f7988-9863-4e03-8375-0489028f28f3
Url=https://server.analytics.visualstudio.com/DefaultCollection/Project/_odata/WorkItemSnapshot
error=Record
The query that started this whole line of questioning is simply one with a variable for a start date.
let
    startDate = DateTimeZone.ToText(Date.AddDays(DateTimeZone.UtcNow(), -45), "yyyyMMdd"),
    URL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot?$select=DateSK,WorkItemId,State,WorkItemType&$filter=WorkItemType eq 'Bug' and State ne 'Closed' and State ne 'Removed' and DateSK gt " & startDate,
    Source = VSTS.Feed(URL)
in
    Source
While this query mostly works in Power BI desktop (the select clause is ignored), the message I get when the data source is refreshed online is:
You can't schedule refresh for this dataset because one or more sources currently don't support refresh.
Discover Data Sources
Query contains unknown or unsupported data sources.
The documentation for VSTS.Feed() contradicts itself, saying both
The VSTS.Feed function has the same arguments, options and return value format as OData.Feed.
and
'VSTS.Feed' provides a subset of the Arguments and Options available through 'OData.Feed'.
To summarize: I know that I can't combine data sources in Power BI. Does VSTS.Feed() support the options parameter? If so, how do I pass a Filter and Select clause to it?
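To inspect exactly what the single-string URL in the question looks like once encoded, the same construction can be sketched in Python. This only builds the string and does not call the service. The helper names start_date_sk and build_odata_url are hypothetical, and the choice of which characters to leave unescaped is an assumption.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode


def start_date_sk(days_back, now=None):
    """Return the UTC date `days_back` days ago in yyyyMMdd form (the DateSK format)."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days_back)).strftime("%Y%m%d")


def build_odata_url(base_url, select, filter_expr):
    """Append $select and $filter to base_url, percent-encoding spaces."""
    # Keep '$', ',' and single quotes readable; everything else unsafe is escaped.
    query = urlencode(
        {"$select": select, "$filter": filter_expr},
        quote_via=quote,
        safe="$,'",
    )
    return f"{base_url}?{query}"


base = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot"
filter_expr = "WorkItemType eq 'Bug' and DateSK gt " + start_date_sk(45)
print(build_odata_url(base, "DateSK,WorkItemId,State,WorkItemType", filter_expr))
```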
To get WorkItemSnapshot data, please refer to the query below (note that it uses OData.Feed):
let
    Source = OData.Feed("https://account.analytics.visualstudio.com/project/_odata/v1.0-preview", null, [Implementation="2.0"]),
    WorkItemSnapshot_table = Source{[Name="WorkItemSnapshot",Signature="table"]}[Data]
in
    WorkItemSnapshot_table
Note: the URL format should be https://account.analytics.visualstudio.com/project/_odata/v1.0-preview, or https://account.analytics.visualstudio.com/_odata/v1.0-preview.
You can also refer to these documents:
Connect to VSTS using the Power BI OData feed
Connect using Power Query and Visual Studio Team Services (VSTS) functions
I am currently experimenting with the Tableau Extract API to generate TDE files from the tables I have in a PostgreSQL database. I was able to write code that generates a TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner joined on some field, how would I generate the TDE for this?
I can see that if I am working with a small number of tables, I could use a SQL query with JOIN clauses to create one gigantic table, and generate the TDE from that table.
>> SELECT * INTO new_table_1
FROM table_1 INNER JOIN table_2
ON table_1.id_1 = table_2.id_2;
>> SELECT * INTO new_table_2
FROM new_table_1 INNER JOIN table_3
ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, I have some tables that have over 40 different fields, so this could get messy.
Is this even a possibility with current version of the API?
You can read from as many tables or other sources as you want. You can also use a complex query with lots of joins, or create a view and read from that. Creating a view is usually helpful when you have a complex query joining many tables.
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.
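The view suggestion can be sketched as follows. This is a minimal illustration under stated assumptions: sqlite3 stands in for PostgreSQL so the example is self-contained, the name column and the sample rows are hypothetical, and the step that writes rows into the extract is left out. Aliasing the clashing column names in the view keeps the joined result unambiguous, and the rows read from it can then feed a single extract table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id_1 INTEGER, name TEXT);
    CREATE TABLE table_2 (id_2 INTEGER, name TEXT);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table_2 VALUES (1, 'x'), (3, 'z');
    -- Alias the clashing 'name' columns so the view has unambiguous fields.
    CREATE VIEW joined_view AS
    SELECT t1.id_1, t1.name AS t1_name, t2.name AS t2_name
    FROM table_1 AS t1
    INNER JOIN table_2 AS t2 ON t1.id_1 = t2.id_2;
""")

# Read the view like any single table; these rows would go into one extract table.
cursor = conn.execute("SELECT * FROM joined_view")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
conn.close()
```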
Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract. In fact, I have done this many times before. I am trying to create the extracts using the API because I need to create some calculated fields that are nearly impossible to create in Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would be too complicated to comprehend (some of these tables also have the same field names).
When you say that I could read from multiple tables or sources, does that mean with the Tableau Extract API? At this point, I cannot find anything in this API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me the extract is composed of multiple tables. This just doesn't seem to happen with the API, which leaves me stranded. Anyway, thank you again for your replies.
Going back to the topic, this is something that I tried a few days ago in my Python code:
import os
import dataextract as tde

# Open the extract file; if that fails, remove the existing file and retry.
try:
    tdefile = tde.Extract("extract.tde")
except Exception:
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")

tableDef = tde.TableDefinition()
# Read each column in table and set the column data types using tableDef.addColumn
# Some code goes here...

for eachTable in tableNames:
    # This is the call that fails unless the table name is "Extract".
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in table, and set the values in each column position of each row
        # Some code goes here...
        tableAdd.insert(some_row)
        some_row.close()

tdefile.close()
When I execute this code, I get an error saying that eachTable has to be called "Extract".
Of course, this code has its flaws, as nowhere in it is there anything that says how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.
I'm working on SQL Compact 3.5 (Microsoft). Is something like this possible?
Select From DatabaseOne.SomeTable, Join DatabaseTWO.SomeOtherTable
I need to join data from 2 different databases which I have loaded in SQL Server Management Studio (from 2 different files). I'm connected to both of them and I can switch between them via the USE keyword, but I don't know how to use both databases at the same time in a JOIN.
That is not possible, you must move all data to a single SQL Compact file.