Data Comparison between SAP HANA and SQL Server - hash

I am working on a solution to compare datasets from SAP HANA and Azure SQL Server to check the consistency of the data on SQL Server.
Instead of pulling all the fields from HANA and doing an EXCEPT,
I was thinking of computing and comparing a checksum or hash value on both systems.
However, the hash values for the same data do not match.
Hash Values on SAP HANA
SELECT HASH_MD5(MANDT), HASH_SHA256(MANDT) from SLT_DECO100.MSKU where CHARG = 'UK2031RP' and WERKS = 'U72D'
0x25DAAD3D9E60B45043A70C4AB7D3B1C6
0x47DC540C94CEB704A23875C11273E16BB0B8A87AED84DE911F2133568115F254
Hash Values on SQL Server
select HASHBYTES('MD5', MANDT), HASHBYTES('SHA2_256', MANDT)
from consolidation.MSKU where CHARG = 'UK2031RP' and WERKS = 'U72D'
0xA4DC01E53D7E318EAB12E235758CFDC5
0x04BC92299F034949057D131F2290667DE4F97E016262874BA9703B1A72AE712A
I need help understanding why the values differ and how to perform the comparison.

The hash values can differ depending on the algorithm used and on how the input passed to the hash function is built.
The link below compares the data of the same table across two different environments by concatenating the columns with pipe delimiters in the query.
The pipe delimiter separates the data column by column, which gives accurate results.
Check here for Compare Records Using Hash Values.
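As a rough sketch of that approach (the column list is only illustrative; the delimiter, column order, data types, and character encoding have to produce the same bytes on both systems for the digests to agree):

-- SQL Server: hash a pipe-delimited concatenation of the columns
SELECT HASHBYTES('SHA2_256', CONCAT(MANDT, '|', CHARG, '|', WERKS)) AS row_hash
FROM consolidation.MSKU
WHERE CHARG = 'UK2031RP' AND WERKS = 'U72D';

-- SAP HANA: same delimiter and column order, relying on the same implicit
-- string handling as the query in the question
SELECT HASH_SHA256(MANDT || '|' || CHARG || '|' || WERKS) AS row_hash
FROM SLT_DECO100.MSKU
WHERE CHARG = 'UK2031RP' AND WERKS = 'U72D';

Keep in mind that HASHBYTES digests the exact bytes it receives: an nvarchar input (UTF-16LE) yields a different digest than the same text as varchar, and CONCAT treats NULLs as empty strings on SQL Server, so data types and NULL handling need to line up with whatever HANA hashes.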
Note: for more information, see the Microsoft Docs:
the MD2, MD4, MD5, SHA, and SHA1 algorithms are deprecated starting with SQL Server 2016 (13.x).
Use SHA2_256 or SHA2_512 instead. Older algorithms will continue working, but they will raise a deprecation event.

Related

nvarchar(), MD5, and matching data in BigQuery

I am consolidating a couple of datasets in BQ, and in order to do that I need to run MD5 on some data.
The problem I'm having is that a chunk of the data arrives already MD5'ed from Azure, and the original field is nvarchar().
I'm not as familiar with Azure, but what I find is that:
HASHBYTES('MD5',CAST('6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5' as nvarchar(max)))
returns
0xCD571D6ADB918DC1AD009FFC3786C3BC (which is the expected value)
where
HASHBYTES('MD5','6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5')
returns
0x94B04255CD3F8FEC2B3058385F96822A, which is equivalent to what I get if I run
MD5(TO_HEX(MD5('6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5'))) in BigQuery. Unfortunately that is not what I need to match; I need to match the nvarchar version in BQ, but I cannot figure out how to do that.
Figured out the problem, posting here for posterity.
The field in Azure is being stored as an nvarchar(50), which is encoded as UTF-16LE.
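A quick T-SQL illustration of that finding (using the GUID from above): HASHBYTES digests the raw bytes it is given, so the UTF-16LE bytes of an nvarchar value hash differently than the single-byte varchar form.

-- varchar vs. nvarchar input: same text, different bytes, different digest
SELECT
    HASHBYTES('MD5', '6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5')  AS md5_varchar,   -- one byte per character
    HASHBYTES('MD5', N'6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5') AS md5_nvarchar;  -- UTF-16LE, two bytes per character

So to reproduce the Azure-side digest elsewhere, the string has to be encoded as UTF-16LE before it is hashed.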

Converting data types of Foreign Keys to use Joiner in Google Cloud Data Fusion Pipeline

I am building a pipeline that connects to an on-prem Oracle DB using the Database Plugin, queries two tables (table_a, table_b), and joins those tables using Joiner Plugin, before uploading to a BigQuery table.
The problem I have now is that the Foreign Keys to join table_a and table_b have different data types when I use Get Schema in the Database Plugin. In Joiner, I am joining the tables on table_a.customer_id = table_b.customer_id.
The dtype of table_a.customer_id is LONG but table_b.customer_id is DOUBLE. In the source Oracle DB, both columns are actually integers; for some reason, though, Get Schema reports them as LONG and DOUBLE.
I am, of course, getting an error in Joiner when trying to join on foreign keys with different data types.
Is there a way to cast/convert the columns from the tables to match so that I can use Joiner?
I've seen some examples using the Wrangler Transform to parse dates, but I don't see anything for converting to other data types. I couldn't find any directive examples either: https://github.com/data-integrations/wrangler.
You can transform your data before joining it by using any of the transform plugins that Cloud Data Fusion offers. As @muscat mentioned, you can use the Wrangler transform and its Set Type directive, or you can use the Projection transform and configure the Convert field.

How to convert ABS(HASH(...)) from Legacy SQL to Standard SQL

In Legacy SQL, we can do SELECT ABS(HASH('12345')) to get a unique hash number for a value.
I am in the process of converting Legacy SQL to Standard SQL in GBQ,
so I am wondering what the best way is to convert the above function so that it gives me the same value back as Legacy SQL.
We won't expose a function that returns the same values as in legacy SQL; it uses an undocumented implementation. The closest equivalent when using standard SQL is FARM_FINGERPRINT, which uses the open-source FarmHash library.
For the expression that you provided, you would instead use ABS(FARM_FINGERPRINT('12345')).
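For example, the converted query would look roughly like this; FARM_FINGERPRINT returns a signed INT64, so wrapping it in ABS keeps the non-negative shape of the legacy expression even though the values themselves differ:

#standardSQL
SELECT ABS(FARM_FINGERPRINT('12345')) AS hash_value;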

Convert Oracle to T-SQL

Can you help me convert this Oracle rule to T-SQL?
SELECT CAST(SYS_CONTEXT('CLIENTCONTEXT', 'AccessSubject') AS NVARCHAR2(255)) AS AccessSubjectCode FROM DUAL
I would like to know what the T-SQL equivalent of SYS_CONTEXT() is.
The SYS_CONTEXT('CLIENTCONTEXT', 'AccessSubject') call returns a value configured by the client application. See https://docs.oracle.com/cd/B28359_01/network.111/b28531/app_context.htm#DBSEG98209 for details.
The most similar feature in SQL Server is the CONTEXT_INFO() function. See https://msdn.microsoft.com/en-us/library/ms180125.aspx. However, in SQL Server the context can store just a single value, of maximum 128 bytes (as opposed to Oracle, where there are multiple contexts and you can store multiple named values in each context).
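A minimal T-SQL sketch of that mechanism (the stored text is only an illustration):

-- Store a session value (at most 128 bytes)
DECLARE @ctx varbinary(128) = CAST(N'AccessSubjectCode' AS varbinary(128));
SET CONTEXT_INFO @ctx;

-- Read it back later in the same session (trim trailing zero bytes if needed)
SELECT CAST(CONTEXT_INFO() AS nvarchar(64)) AS AccessSubjectCode;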

Tableau Extract API with multiple tables in a database

I am currently experimenting with the Tableau Extract API to generate some TDEs from the tables I have in a PostgreSQL database. I was able to write code to generate a TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner joined on some field, how would I generate the TDE for this?
I can see that if I am working with a small number of tables, I could use a SQL query with JOIN clauses to create one gigantic table, and generate the TDE from that table.
SELECT *
INTO new_table_1
FROM table_1
INNER JOIN table_2 ON table_1.id_1 = table_2.id_2;

SELECT *
INTO new_table_2
FROM new_table_1
INNER JOIN table_3 ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, I have some tables that have over 40 different fields, so this could get messy.
Is this even possible with the current version of the API?
You can read from as many tables or other sources as you want, use a complex query with lots of joins, or create a view and read from that. Usually, creating a view is helpful when you have a complex query joining many tables.
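For instance, a view along these lines (using the table and key names from the question; the column list is illustrative) gives the extract code a single relation to read:

CREATE OR REPLACE VIEW extract_source AS
SELECT t1.id_1,
       t2.id_2,
       t3.id_3
       -- list the remaining columns explicitly, aliasing any duplicate names
FROM table_1 t1
JOIN table_2 t2 ON t1.id_1 = t2.id_2
JOIN table_3 t3 ON t1.id_1 = t3.id_3;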
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.
Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract; in fact, I have done this many times before. I am just trying to create the extracts using the API because I need to create some calculated fields, which are nearly impossible to create using Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would be too complicated to comprehend (some of these tables also have the same field names).
When you say that I could read from multiple tables or sources, do you mean with the Tableau Extract API? At this point, I cannot find anything in the API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me the extract is composed of multiple tables. This just doesn't seem to be happening with the API, which leaves me stranded. Anyway, thank you again for your replies.
Going back to the topic, this is something I tried a few days ago in my Python code:
import os
import dataextract as tde  # legacy Tableau Data Extract API module (assumed import)

try:
    tdefile = tde.Extract("extract.tde")
except:
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")

tableDef = tde.TableDefinition()
# Read each column in the table and set the column data types using tableDef.addColumn
# Some code goes here...

for eachTable in tableNames:
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use a SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in the table, and set the values in each column position of the row
        # Some code goes here...
        tableAdd.insert(some_row)
        some_row.close()

tdefile.close()
When I execute this code, I get an error saying that eachTable has to be called "Extract".
Of course, this code has its flaws, as there is nothing in it that specifies how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.