Using VSTS.Feed() in Power BI to access OData - azure-devops

I am trying to use the VSTS.Feed() function in Power BI to read WorkItemSnapshot data, and I am running into multiple problems. If I build the entire URL into a single string and call VSTS.Feed() with that, I get the correct information in Power BI Desktop, but the dataset will not refresh in the Power BI service. I have been told to use the (undocumented) Query parameter, as shown below, but it is clear that this parameter is ignored: the select parameter is ignored on smaller projects, because all columns are returned, and the filter parameter is ignored because the query fails on larger projects.
Does anyone have a working example of using the Query parameter with VSTS.Feed()?
let
    BaseURL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot",
    Select = "DateSK,WorkItemId,State,WorkItemType",
    Filter = "WorkItemType eq 'Bug' and State ne 'Closed' and State ne 'Removed' and DateSK ge 20180517 and DateSK le 20180615",
    Source = VSTS.Feed(BaseURL, [Query=[select=#"Select", filter=#"Filter"]])
in
    Source
Update:
With the query above, the message I get is shown below. As I said earlier, it is clearly not using the Filter parameter, and I'm assuming it is not using the Select parameter, either. I can't query everything because there is too much data, and I can't use a filter because I can't figure out a way to get the Options parameter to work. With VSTS.AccountContents, the options parameter works well, but those API endpoints don't use $ in parameter names.
Error: Query result contains 36,788,023 rows and it exceeds maximum allowed size of 300,000. Please reduce the number of records by applying additional filters
Details:
DataSourceKind=Visual Studio Team Services
ActivityId=881f7988-9863-4e03-8375-0489028f28f3
Url=https://server.analytics.visualstudio.com/DefaultCollection/Project/_odata/WorkItemSnapshot
error=Record
The query that started this whole line of questioning is simply one with a variable for a start date.
let
    startDate = DateTimeZone.ToText(Date.AddDays(DateTimeZone.UtcNow(), -45), "yyyyMMdd"),
    URL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot?$select=DateSK,WorkItemId,State,WorkItemType&$filter=WorkItemType eq 'Bug' and State ne 'Closed' and State ne 'Removed' and DateSK gt " & startDate,
    Source = VSTS.Feed(URL)
in
    Source
While this query mostly works in Power BI Desktop (the select clause is ignored), the message I get when the data source is refreshed online is:
You can't schedule refresh for this dataset because one or more sources currently don't support refresh.
Discover Data Sources
Query contains unknown or unsupported data sources.
The documentation for VSTS.Feed() contradicts itself, saying both
The VSTS.Feed function has the same arguments, options and return value format as OData.Feed.
and
'VSTS.Feed' provides a subset of the Arguments and Options available through 'OData.Feed'.
To summarize: I know that I can't combine data sources in Power BI. Does VSTS.Feed() support the options parameter? If so, how do I pass a Filter and Select clause to it?
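For reference, the shape I would expect to have to pass, if the Query record is forwarded verbatim to the service, is something like the following sketch; the $-prefixed key names are only my assumption, since OData system query options carry the $ while the record fields in my query above do not:
let
    BaseURL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot",
    Source = VSTS.Feed(
        BaseURL,
        [Query = [
            // Assumed key names: quoting them lets the record fields keep the leading $.
            #"$select" = "DateSK,WorkItemId,State,WorkItemType",
            #"$filter" = "WorkItemType eq 'Bug' and State ne 'Closed' and State ne 'Removed' and DateSK ge 20180517 and DateSK le 20180615"
        ]]
    )
in
    Source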

To get WorkItemSnapshot data in Power BI, refer to the query below:
let
    Source = OData.Feed("https://account.analytics.visualstudio.com/project/_odata/v1.0-preview", null, [Implementation="2.0"]),
    WorkItemSnapshot_table = Source{[Name="WorkItemSnapshot",Signature="table"]}[Data]
in
    WorkItemSnapshot_table
Note: the URL format should be https://account.analytics.visualstudio.com/project/_odata/v1.0-preview, or https://account.analytics.visualstudio.com/_odata/v1.0-preview.
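If the 300,000-row limit is still hit, one option (just a sketch; it assumes the OData connector folds these steps into $select and $filter on the service side, which is worth verifying) is to reduce the table before loading it:
let
    Source = OData.Feed("https://account.analytics.visualstudio.com/project/_odata/v1.0-preview", null, [Implementation="2.0"]),
    WorkItemSnapshot_table = Source{[Name="WorkItemSnapshot",Signature="table"]}[Data],
    // Filter and column selection; the column names are the ones used in the question above.
    Filtered = Table.SelectRows(
        WorkItemSnapshot_table,
        each [WorkItemType] = "Bug" and [State] <> "Closed" and [State] <> "Removed" and [DateSK] >= 20180517
    ),
    Reduced = Table.SelectColumns(Filtered, {"DateSK", "WorkItemId", "State", "WorkItemType"})
in
    Reduced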
And you can refer to the documents below:
Connect to VSTS using the Power BI OData feed
Connect using Power Query and Visual Studio Team Services (VSTS) functions

Related

Spark notebook can't find table that exists in Synapse dedicated pool

I have a really strange issue that I'm having a difficult time getting to the bottom of. I'm working on a solution that allows a user to enter parameters into a pipeline, which in turn calls a notebook and passes the parameter; Scala is our language of choice. I've set the parameter cell and am using string interpolation to pass the parameter, which is the table name of a dataset that a researcher/analyst has created in our dedicated pool. See the following code:
val datain = "table1"
val datain:DataFrame = spark.read.sqlanalytics(s"dedicatedp1.datacha.$datain")
Which results in the following error:
com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: The specified table does not exist. Please provide a valid table.
The table does exist and this code has worked previously, so, other than a possible security issue that I'm working with the platform team to investigate, I'm curious if the community has any other thoughts on what may be causing this issue.

Azure Data Factory check rowcount of copied records

I am designing an ADF pipeline that copies rows from a SQL table to a folder in Azure Data Lake. After that, the rows in SQL should be deleted. But before this delete action takes place, I want to know whether the number of rows that were copied is the same as the number of rows I selected at the beginning of the pipeline.
Is there a way to get the row count of the copy activity and use it in another activity (like a lookup)?
Edit: follow-up question:
Bo Xiao's answer is OK. But then I have a follow-up question. After the copy activity I put an If Condition with the following expression:
#activity('LookUpActivity').output.firstRow.RecordsRead == #{activity('copyActivity').output.rowsCopied
But then I get the error: #activity('LookUpActivity').output.firstRow.RecordsRead == #{activity('copyActivity').output.rowsCopied
Isn't it possible to compare the output parameters of two activities to see whether they are equal?
Extra edit: I just found an error in this piece of code: I forgot a "{" at the beginning. But even then the code is still wrong. To compare two outputs from earlier activities, the expression must be:
@equals(activity('LookUpActivity').output.firstRow.RecordsRead, activity('copyActivity').output.rowsCopied)
You can find the number of copied rows (rowsCopied) in the copy activity's output.
And you can use the output value like this:
@activity('copyActivity').output.rowsCopied
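Putting the pieces together, a minimal sketch of the whole check (the activity names match the ones above; the Lookup query and the table name are assumptions about how the lookup is set up):
-- Lookup activity query (assumed): count the rows selected at the start of the pipeline
SELECT COUNT(*) AS RecordsRead FROM dbo.SourceTable

-- If Condition expression, evaluated after the copy activity has finished
@equals(activity('LookUpActivity').output.firstRow.RecordsRead, activity('copyActivity').output.rowsCopied)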

Informatica SQ returns different result

I am trying to pull data from DB2 via Informatica. I have a Source Qualifier (SQ) query that pulls a few fields based on joins across 4 different tables.
When I run the query directly in the database, it returns the expected result; however, when I run it in Informatica with the debugger, I see something else.
Please note that the data in all the columns matches perfectly, except for one single column.
The weird thing is, this is a calculated field from the table, based on a CASE statement:
CASE WHEN Column1='3' THEN 'N' ELSE 'Y' END.
Since this is a calculated field that returns a single-character string, I have connected it from the source to the SQ through a port from one of the sources that has a length of 1 character.
This returns 'Y' when executed in the database, but when I copy and paste the same query into the SQ in Informatica and run it, I get the value 'E', which should never be possible as I expect only an 'N' or a 'Y'. I have verified the column order and that the field is in the right place. This is very strange; is something going wrong because of the CASE statement?
Save yourself the hassle: put an Expression transformation after the Source Qualifier, calculate the port value there, and then forget about it.
I think I got the issue. We use Informatica PowerExchange to connect to an AS400 system (DB2), and it seems that when we set flag information in AS400 and pass it to Informatica via PowerExchange, it gets converted to binary; to solve this, there needs to be an entry in the PowerExchange configuration file.
Unfortunately, I myself was not aware that it could be related to PowerExchange instead of PowerCenter itself.
Thanks for your assistance! Below is the KB about it.
https://kb.informatica.com/solution/4/Pages/17498.aspx

Executing a query using the bq command line in Google BigQuery

I execute a query using the Python script below and the table gets populated with 2,564,691 rows. When I run the same query using the Google BigQuery console, it returns 17,379,353 rows (the query is exactly the same). I was wondering whether there is some issue with the script below. I am not sure whether --replace in bq query replaces the past result set instead of appending to it.
Any help would be appreciated.
import time

dateToday = time.strftime("%Y/%m/%d")
dateToday1 = dateToday.replace('/', '')
commandStr = r"type C:\Users\query.txt | bq query --allow_large_results --replace --destination_table table:dataset1_%s -n 1" % (dateToday1)
In the Web UI you can use the Query History option to navigate to the respective queries.
After you locate them, you can expand the respective entries and see exactly which query was executed.
I am more than sure that just by comparing the query texts you will see the source of the "discrepancy" right away!
Added:
In Query History you can see not only the query text, but also all the configuration properties that were used for the respective query, such as the Write Preference, among others. So even if the query text is the same, you can see a potential difference in configuration that will give you a clue.
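If clicking through the history is tedious, the same job metadata can also be pulled programmatically. Below is a sketch using the google-cloud-bigquery Python client; the project id and the two job ids are placeholders you would replace with the real ones from the history:
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id

# Compare the job started by the bq script with the one run from the console.
for job_id in ["job-id-from-bq-script", "job-id-from-console"]:  # placeholders
    job = client.get_job(job_id)
    print(job_id)
    print("  query:            ", job.query[:80])
    print("  write_disposition:", job.write_disposition)
    print("  use_legacy_sql:   ", job.use_legacy_sql)
    print("  destination:      ", job.destination)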

OpenStreetMap: filter out data that has been edited after some timestamp

I want to get OSM data after some timestamp, in other words the latest records after a certain timestamp. I have downloaded the .osm file of the area. I went through the Osmosis documentation but could not find a way to filter it by time. The result should be the same as when using a timestamp argument. How can I do that?
I could use Overpass, but the area is large and Overpass timed out many times.
I could use the osmconvert tool (cf. the manual: m.m.i24.cc/osmconvert.c).
Some of the following options might be useful for the task:
"--timestamp=<date_time> add a timestamp to the data\n"
"--timestamp=NOW-<seconds> add a timestamp in seconds before now\n"
What I have tried is the following:
./osmfilter austria-latest.osm --keep="$key=$school" |
./osmconvert - --all-to-nodes --csv="@id @lat @lon @timestamp $key name" --csv-headline |
but this fails. How do I get the data out of the .osm.pbf file? Should I use the drop statements, or should I specify a certain time range, from timestamp to timestamp?
Since version 0.7.50, the Overpass API provides a way to query for data which has changed since a given timestamp or within a given timeframe. It is even possible to restrict the change analysis to certain tags (or filter criteria). Please check the Overpass API wiki page for more details on the "diff" and "adiff" keywords.
Working with the Overpass API in this way is much more convenient than trying to process a full-history planet file, which takes at least 35 GB to download and requires more complex post-processing.
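For illustration, a minimal Overpass QL sketch of such a change query; the timestamp, the tag filter and the bounding box (roughly Austria, to match the extract used above) are placeholders:
/* adiff returns the changes since the given timestamp;
   adapt the date, the tag filter and the (south,west,north,east) bbox. */
[adiff:"2018-05-17T00:00:00Z"];
(
  node["amenity"="school"](46.3,9.5,49.1,17.2);
  way["amenity"="school"](46.3,9.5,49.1,17.2);
);
out meta;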
You will want to process the OSM full-history planet file (or extracts of it): https://wiki.openstreetmap.org/wiki/Planet.osm/full