Azure Data Factory V2: MDX Query on SAP BW exception Microsoft.DataTransfer.Common.Shared.HybridDeliveryException - azure-data-factory

I am trying to fetch data from an SAP BW system into Azure Data Lake using an MDX query in the SAP BW connector, but I am getting the following exception message in Azure:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column '[Measures].[SomeMeasure]' contains an invalid value '4.000-2'. Cannot convert '4.000-2' to type 'Decimal'.,Source=Microsoft.DataTransfer.Common,''Type=System.InvalidCastException,Message=Specified cast is not valid.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "Copy1"
}
From the error, I understand that some values in the measure are not actually numeric. Changing or correcting the values in the SAP system is not in my scope.
Is there any option in the Data Factory V2 SAP BW connection that lets me define the data type of the measure for input and/or output, or is there any fine-tuning of the MDX query so that I can fetch the data without errors?
This is my MDX query:
SELECT
{[Measures].[SomeMeasure]} ON COLUMNS,
NON EMPTY
{ [0COMP_CODE].[LEVEL01].MEMBERS *
[0COSTELMNT].[LEVEL01].MEMBERS }
ON ROWS
FROM SomeQube
WHERE {[0FISCPER].[K42015008]}

If you can afford to skip the incorrect rows, you can set enableSkipIncompatibleRow to true. Please reference the copy activity fault tolerance doc.
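For context, a minimal sketch of where that flag sits in the copy activity definition (property names as I understand them from the fault tolerance docs; the source/sink types, linked service name, and blob path below are placeholders, and the optional redirect block just logs the skipped rows to blob storage):
"typeProperties": {
    "source": { "type": "SapBwSource", "query": "<your MDX query>" },
    "sink": { "type": "AzureDataLakeStoreSink" },
    "enableSkipIncompatibleRow": true,
    "redirectIncompatibleRowSettings": {
        "linkedServiceName": {
            "referenceName": "SomeAzureBlobLinkedService",
            "type": "LinkedServiceReference"
        },
        "path": "redirect/erroroutput"
    }
}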

Your filter is incorrect. The SAP 0FISCPER dimension is in YYYYMMM format, so you need to enter WHERE {[0FISCPER].[2015008]}. I am almost certain the "K4" you entered there is your fiscal year variant, which you should not put there. Objects like K4 are called compounding InfoObjects (your fiscal year/period belongs to one if you envision it as a hierarchical parent/child structure). But in this specific case, you don't need to specify the fiscal year variant. Just remove K4.
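For reference, the same query with only the slicer changed (everything else exactly as in the question):
SELECT
{[Measures].[SomeMeasure]} ON COLUMNS,
NON EMPTY
{ [0COMP_CODE].[LEVEL01].MEMBERS *
[0COSTELMNT].[LEVEL01].MEMBERS }
ON ROWS
FROM SomeQube
WHERE {[0FISCPER].[2015008]}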
I highly advise against skipping "incorrect rows". My guidance is that you should always fail the job whenever an exception happens and investigate the data quality issue. Do not ever put enterprise data integrity at risk. SAP data has to be 100% accurate, not 99.9999%.

Related

IBM DataStage assumes a column is WVARCHAR while it's a date

I'm doing an ETL job. For the data source stage, I input a custom select statement. In the output tab of the data source stage, I defined the INCEPTION column's data type as Timestamp. The correct data type for INCEPTION is date; I checked it via DBeaver. But somehow IBM DataStage assumes that it is WVARCHAR. It says ODBC_Connector_0: Schema reconciliation detected a type mismatch for field INCEPTION. When moving data from field type WVARCHAR(min=0,max=10) into DATETIME(fraction=6), data corruption can occur (CC_DBSchemaRules::reportTypeMismatch, file CC_DBSchemaRules.cpp, line 1,838). I don't know why, since the database shows that INCEPTION is definitely a Date column, and I don't think I'm making a mistake. What did I do wrong and how do I fix it?
Where did DataStage get its table definition? DataStage is a computer program; it can't "decide" anything. If you import the table definition from the source, what data type is INCEPTION? If it is either Date or Timestamp, load that table definition into your DataStage job. Otherwise, explicitly convert the string using the StringToTimestamp() function in a Transformer stage.
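For example, a Transformer derivation along these lines (the link name lnk_src is hypothetical, and the format string has to match how the string actually arrives; the WVARCHAR(max=10) suggests a plain yyyy-mm-dd value):
StringToTimestamp(lnk_src.INCEPTION, "%yyyy-%mm-%dd")
If the target column really is a Date rather than a Timestamp, StringToDate(lnk_src.INCEPTION, "%yyyy-%mm-%dd") may be the closer fit.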

How to automate the execution process of data quality rules?

One of our clients has a requirement to build/develop data quality rules using HiveQL.
E.g., replace NULL values, change the date format to YYYY-MM-DD, standardize amount column values between US and EU formats, etc.
Problem Statement:
I have the set of data quality rules in one Hive table (dq_rules), and I want to execute each rule one by one and store the errors (data issues such as a null column or an incorrectly formatted date column) in another Hive table (dq_logging) for reporting/logging purposes.
Please suggest a solution keeping one thing in mind: I want to make it generic and executable for any Hive table/columns (i.e. it should be parameterized).
Restriction: I cannot use existing data quality tools. I need to complete it using Hive only (the restriction is given by the client).
Schema for Tables:
dq_rules => Validation Rule ID, Rule Category, DQ Dimension, Rule Description, Date Added, Date Retired
dq_logging => Error_ID, Source_Name, Erroneous_Source_Fields, Source_File_Record, Validation Rule ID
If anyone has a solution based on writing a shell/Python script, that will also work for me. I just need to make it an end-to-end process.
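Not an answer as such, but a minimal sketch of the driver loop described above, written in Python around the hive CLI. It assumes the rule rows carry (or can be joined to) a runnable HiveQL check query in a column, called check_sql here, which is not in the schema above, and that each check query already projects the five dq_logging columns:
import subprocess

def run_hive(sql):
    # Run a HiveQL statement through the hive CLI and return its stdout.
    result = subprocess.run(["hive", "-e", sql], capture_output=True, text=True, check=True)
    return result.stdout

# Fetch the active rules. 'check_sql' is a placeholder column assumed to hold a
# SELECT that returns the offending rows, projected as
# (error_id, source_name, erroneous_source_fields, source_file_record, validation_rule_id).
rules = run_hive("SELECT validation_rule_id, check_sql FROM dq_rules WHERE date_retired IS NULL")

for line in rules.splitlines():
    if not line.strip():
        continue
    rule_id, check_sql = line.split("\t", 1)
    # Append whatever the check query flags into the logging table.
    run_hive("INSERT INTO TABLE dq_logging SELECT v.* FROM (" + check_sql + ") v")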

Using VSTS.Feed() in Power BI to access OData

I am trying to use the VSTS.Feed() function in Power BI to read WorkItemSnapshot data. There are multiple problems. If I build the entire URL into a single string and call VSTS.Feed() with that, I get the correct information in Power BI Desktop, but it will not refresh in Power BI online. I have been told to use the (undocumented) Query parameter, as shown below, but it is clear that this parameter is ignored. I can see that the select parameter is ignored on smaller projects, because all columns are returned. I can see that the filter parameter is ignored because the query fails on larger projects.
Does anyone have a working example of using the Query parameter with VSTS.Feed()?
let
BaseURL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot",
Select = "DateSK,WorkItemId,State,WorkItemType",
Filter = "WorkItemType eq Bug and State ne Closed and State ne Removed and DateSK ge 20180517 and DateSK le 20180615",
Source = VSTS.Feed(BaseURL, [Query=[select=#"Select",filter=#"Filter"]])
in
Source
Update:
With the query above, the message I get is shown below. As I said earlier, it is clearly not using the Filter parameter, and I'm assuming it is not using the Select parameter, either. I can't query everything because there is too much data, and I can't use a filter because I can't figure out a way to get the Options parameter to work. With VSTS.AccountContents, the options parameter works well, but those API endpoints don't use $ in parameter names.
Error: Query result contains 36,788,023 rows and it exceeds maximum allowed size of 300,000. Please reduce the number of records by applying additional filters
Details:
DataSourceKind=Visual Studio Team Services
ActivityId=881f7988-9863-4e03-8375-0489028f28f3
Url=https://server.analytics.visualstudio.com/DefaultCollection/Project/_odata/WorkItemSnapshot
error=Record
The query that started this whole line of questioning is simply one with a variable for a start date.
let
startDate = DateTimeZone.ToText (Date.AddDays(DateTimeZone.UtcNow(), -45), "yyyyMMdd"),
URL = "https://server.analytics.visualstudio.com/DefaultCollection/project/_odata/WorkItemSnapshot?$select=DateSK,WorkItemId,State,WorkItemType&$filter=WorkItemType eq 'Bug' and State ne 'Closed' and State ne 'Removed' and DateSK gt " & startDate,
Source = VSTS.Feed(URL)
in
Source
While this query mostly works in Power BI desktop (the select clause is ignored), the message I get when the data source is refreshed online is:
You can't schedule refresh for this dataset because one or more sources currently don't support refresh.
Discover Data Sources
Query contains unknown or unsupported data sources.
The documentation for VSTS.Feed() contradicts itself, saying both
The VSTS.Feed function has the same arguments, options and return value format as OData.Feed.
and
'VSTS.Feed' provides a subset of the Arguments and Options available through 'OData.Feed'.
To summarize, I know that I can't combine data sources in Power BI. Does VSTS.Feed() support the options parameter? If so, how do I pass a Filter and Select clause to it?
To get WorkItemSnapshot by vsts.feed, please refer to the query below:
let
Source = OData.Feed("https://account.analytics.visualstudio.com/project/_odata/v1.0-preview", null, [Implementation="2.0"]),
WorkItemSnapshot_table = Source{[Name="WorkItemSnapshot",Signature="table"]}[Data]
in
WorkItemSnapshot_table
Note: the URL format should be https://account.analytics.visualstudio.com/project/_odata/v1.0-preview, or https://account.analytics.visualstudio.com/_odata/v1.0-preview.
And you can refer to the documents below:
Connect to VSTS using the Power BI OData feed
Connect using Power Query and Visual Studio Team Services (VSTS) functions
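If the snapshot is too large to pull whole, one variation of the query above is to filter rows and trim columns immediately after the navigation step, so that Power Query can fold the work back to the Analytics service. A sketch only; the column names come from the question, and whether the fold actually happens has to be verified against your account:
let
Source = OData.Feed("https://account.analytics.visualstudio.com/project/_odata/v1.0-preview", null, [Implementation="2.0"]),
WorkItemSnapshot_table = Source{[Name="WorkItemSnapshot",Signature="table"]}[Data],
Filtered = Table.SelectRows(WorkItemSnapshot_table, each [WorkItemType] = "Bug" and [State] <> "Closed" and [State] <> "Removed" and [DateSK] >= 20180517),
Trimmed = Table.SelectColumns(Filtered, {"DateSK", "WorkItemId", "State", "WorkItemType"})
in
Trimmed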

Issue with a numeric field in SSIS dtsx package

I have an SSIS dtsx package which is used to load data from a remote MAS DB server using a DSN-based connection. We load data from many tables into their replica tables in SQL Server. Everything was working fine until we made some changes to a table in MAS. The dtsx has been failing with the following error:
Error: 0xC02090F8 at Data Flow Task, Import Data, DataReader Source
[28866]: The value was too large to fit in the output column
"UDF_TREAD_DEPTH" (29160).
Actually, I believe it might be related to a single table field, "UDF_TREAD_DEPTH", which is a decimal field. This field is shown in the DataReader source as "numeric [DT_NUMERIC]" with Length:0, Precision:4 & Scale:2.
In the past we had simple data in the xx.xx format. Now, after the issue appeared, I see that we have data like xx.xx, xxx, etc.; however, the data type still didn't change after I refreshed the DataReader source.
I believe the precision should be updated to 5 for the data we have,
based on this description.
I'm unable to change the data type, as shown in the attached screenshot (Data Source Output column.png). When I debug this dtsx package, it errors while loading the DataReader source. If I'm reading it right, how can I fix it? If there are any other possibilities, kindly let me know.
Have you tried to edit the source with the Advanced Editor? (Right-click and select "Show Advanced Editor...".) Navigate to the input and output properties section (generally the last tab), go into the output columns section (for OLE DB, click the + next to OLE DB Source Output, then the + next to Output Columns, then highlight the column name you want to change) and change the properties of the column in question (look for Data Type Properties and change Precision and Scale as needed). If you are not able to do that, you can try deleting the source and replacing it with a new source to the same data (i.e. recreating the object will re-query the connection for column properties).
I got the data updated with the xxx.xx mask, so 100 became 100.00, and this helped the DataReader in SSIS infer the type correctly.
In addition, I also found another easy way of doing this which didn't require any cast/convert function:
UDF_TREAD_DEPTH * 1.00 as UDF_TREAD_DEPTH
This also allowed the DataReader to infer the type (i.e. precision & scale) correctly.
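In context, the source query would look something like this (the table and extra column names are hypothetical; multiplying by 1.00 simply makes the driver report a numeric type with enough precision and scale):
SELECT
SOME_OTHER_COLUMN,
UDF_TREAD_DEPTH * 1.00 AS UDF_TREAD_DEPTH
FROM SOME_MAS_TABLE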

Joining several MDX query results in a single report

I use MS SQL Server 2008 R2.
I've got a problem; please excuse the long explanation.
We've got an SSAS cube. It is under development at this time, but it is partially working and can be accessed through Excel.
There are projects: a hierarchical parent-child dimension.
There are resources assigned to the projects (e.g. man-hours, building materials, equipment): a dimension with resource types and an M2M fact table ProjectId-ResourceId-UnitsCount-Cost.
There are milestones for the projects: a dimension with milestone types (a few are defined) and an M2M fact table ProjectId-MilestoneId-... with milestone dates: planned/actual start/finish.
This is a simplified schema.
I need to create an MS Reporting Services report with the following columns:
Project Hierarchy
several columns with pre-defined, "hardcoded" resource type amounts, e.g. the business wants to see a column with man-hours spent and one with concrete consumption in cubic meters. These two clauses can be hardcoded in the query.
several columns with pre-defined, "hardcoded" milestone type dates
this is a simplified schema too; more columns with other dimension slices are needed...
The problem is that I cannot find an elegant way to create this report.
In my current version, I have to create two datasets and query the resource and milestone data in separate MDX queries.
Then I need to use the RS Lookup function to join the data in the report output.
Please advise:
Is there a possibility to query this data in a single MDX query? When I try something like this:
union({{[Dim Resource].[Measure].[man-hour]} + {[Dim Resource].[Measure].[cub-meter]}},
{[Dim Milestone].[Milestone Type].[ProjectStart]}) I get a "different dimensionality" error. Any workarounds?
If I need to output a formatted value like "X 'man-hour' / Y 'cub-meter'", I have to use the Lookup function to get both parts of the formula. Any better way?
Can I query this data any other way?
Please indicate a direction for googling,
or... should I just query the data from the source tables with SQL (this is allowed by the security restrictions)?
Thank you in advance.
Perhaps create a new 'virtual cube' to contain data from both of your existing cubes, then query that one.
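If such a combined cube exposes the resource amounts and the milestone dates side by side as measures, the single query the question asks for could look roughly like this (every name below is a hypothetical stand-in for the real cube objects, and dates-as-measures is a modelling assumption):
SELECT
{ [Measures].[Man-Hours], [Measures].[Cub-Meters Of Concrete],
[Measures].[Planned Project Start], [Measures].[Actual Project Finish] } ON COLUMNS,
NON EMPTY [Dim Project].[Projects].MEMBERS ON ROWS
FROM [CombinedCube]
Because everything on the column axis now lives in the Measures dimension, the "different dimensionality" error no longer applies.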