Azure Data Factory and PowerShell - powershell

Hi Stack team,
Actually, I plan to migrate my SQL Server 2012 databases to Azure SQL Data Warehouse using the Azure Data Factory approach.
But the problems are:
1) My database size is 4.5 TB.
2) In this approach there are three methods; the method details are in the attached image. I plan to use the third method for the migration (3. Using Azure Data Factory and PowerShell (entire database - ADF)).
So please share links related to the above method and tell me whether this migration is possible. If it is possible, please tell me how to do it.

Please refer to the link below; there's a UX solution in Azure Data Factory that is targeted specifically at migration from SQL Server to Azure SQL Data Warehouse. It's a wizard-based UX that is very easy to follow. It will automatically create tables in Azure SQL Data Warehouse and migrate the data using PolyBase, which is the most efficient way to load data into SQL DW.
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-with-data-factory
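For reference, here is a minimal PowerShell sketch of driving such a migration from the Az.DataFactory module. The resource names and the .json definition files are placeholders you would author by following the tutorial above; the copy activity inside the pipeline should be configured to load SQL DW via PolyBase (staged copy).
# Sketch only: assumes the Az module is installed and the JSON definitions exist locally.
Connect-AzAccount
$rg  = "MigrationRG"
$adf = "SqlDwMigrationAdf"
# Create the data factory.
Set-AzDataFactoryV2 -ResourceGroupName $rg -Name $adf -Location "East US"
# Linked services: on-premises SQL Server (via a self-hosted integration runtime),
# staging Blob storage for PolyBase, and the target SQL Data Warehouse.
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "SqlServerLS" -DefinitionFile ".\SqlServerLS.json"
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "StagingBlobLS" -DefinitionFile ".\StagingBlobLS.json"
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "SqlDwLS" -DefinitionFile ".\SqlDwLS.json"
# Pipeline containing the copy activity (SQL Server -> staging -> SQL DW via PolyBase).
Set-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -Name "MigrateToSqlDw" -DefinitionFile ".\MigrateToSqlDw.json"
# Run it and check the status.
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -PipelineName "MigrateToSqlDw"
Get-AzDataFactoryV2PipelineRun -ResourceGroupName $rg -DataFactoryName $adf -PipelineRunId $runId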

Related

Cannot select sink for Oracle linked service from data flow

I have a linked service that connects to an on-prem Oracle database through a self-hosted integration runtime.
I am able to access this from a data factory pipeline - the dataset that uses the linked service is called pc_payinitil.
However, the pc_payinitial dataset is not available to select from the sink tab of a sink shape of a data flow.
Am I trying to do something that's not possible - or did I just go wrong somewhere?
Currently, the Oracle dataset is not supported in mapping data flows in Azure Data Factory. It is only supported as a copy activity source/sink and in the lookup activity.
Only a limited set of datasets is supported in mapping data flows as of now. You can go through the MS documentation for more information on supported data stores in Azure Data Factory.
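For example, the same Oracle dataset can still be used from a copy activity. A hedged PowerShell sketch of deploying and running such a copy pipeline (hypothetical names; the pipeline JSON with the Oracle source and a supported sink dataset is assumed to be authored separately):
# Sketch only: the copy activity reads the Oracle dataset through the self-hosted
# integration runtime and writes to a sink dataset that data flows can then consume.
$rg  = "MyResourceGroup"
$adf = "MyDataFactory"
Set-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -Name "CopyFromOracle" -DefinitionFile ".\CopyFromOracle.json"
Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -PipelineName "CopyFromOracle"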

How to implement scd2 in snowflake tables using Azure Data Factory

I want to implement SCD2 on Snowflake tables. My source and target tables are both in Snowflake. The entire process has to be done using Azure Data Factory.
I went through the documentation given by Azure for implementing SCD2 using data flows, but when I tried to create a dataset for the Snowflake connection it showed as disabled.
Is there any way, or any documentation, where I can see the steps to create SCD2 in ADF with Snowflake tables?
Thanks
vipendra
SCD2 in ADF can be built and managed graphically via data flows. The Snowflake connector for ADF does not yet work directly with data flows. So for now, you will need to use the copy activity in an ADF pipeline to stage the dimension data in Blob or ADLS, then build your SCD2 logic in data flows using the staged data.
Your pipeline will look something like this:
[Copy Activity Snowflake-to-Blob] -> [Data Flow SCD2 logic Blob-to-Blob] -> [Copy Activity Blob-to-Snowflake]
We are working on direct connectivity to Snowflake from data flows and hope to land that soon.
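For reference, a hedged PowerShell sketch of deploying and running that staged pipeline with the Az.DataFactory module (hypothetical names; the pipeline JSON chaining the three activities via dependsOn is assumed to be authored separately):
# Sketch only: SnowflakeScd2Pipeline.json would contain three chained activities:
#   CopySnowflakeToBlob (Copy) -> Scd2DataFlow (ExecuteDataFlow) -> CopyBlobToSnowflake (Copy)
$rg  = "MyResourceGroup"
$adf = "MyDataFactory"
Set-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -Name "SnowflakeScd2" -DefinitionFile ".\SnowflakeScd2Pipeline.json"
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -PipelineName "SnowflakeScd2"
# Check how each of the three activities did.
Get-AzDataFactoryV2ActivityRun -ResourceGroupName $rg -DataFactoryName $adf -PipelineRunId $runId -RunStartedAfter (Get-Date).AddHours(-1) -RunStartedBefore (Get-Date).AddHours(1)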
If your source and target tables are both in Snowflake, you could use Snowflake Streams to do this. There's a blog post covering this in more detail at https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1
However, in short, if you have a source table source, you can put a stream on it like so:
create or replace stream source_changes on table source;
This will capture all the changes that are made to the source table. You can then build a view on that stream that establishes how you want to feed those changes into the SCD table. (The blog post uses case statements to put start and end dates on each row in the view.)
From there, you can use a Snowflake Task to automate the process of loading from the stream into the SCD only when the Stream actually has changes.

Do I need storage (of some sort) when pulling data in Azure Data Factory

*Data newbie here*
Currently, to run analytics reports on data pulled from Dynamics 365, I use Power BI.
The issue with this is that Power BI is quite slow at processing large data volumes. I carry out a number of transform steps (e.g. merge, join, deleting or renaming columns, etc.). So, when I try to run a query in Power BI with said steps, it takes a long time to complete.
So, as a solution, I decided to make use of Azure Data Factory (ADF). The plan is to use ADF to pull the data from CRM (i.e. Dynamics 365), perform transformations, and publish the data. Then I'll use Power BI for visual analytics.
My question is:
What Azure service will I need in addition to Data Factory? Will I need to store the data I pulled from CRM somewhere - like Azure Data Lake or Blob storage? Or can I do the transformation on the fly, right after the data is ingested?
Initially, I thought I could use the 'copy' activity to ingest data from CRM and start playing with the data. But using the copy activity, I needed to provide a sink (a destination for the data, which has to be storage of some sort).
I also thought I could make use of the 'lookup' activity. I tried to use it, but I'm getting errors (no exception message is produced).
I have scoured the internet for a similar process (i.e. Dynamics 365 -> Data Factory -> Power BI), but I've not been able to find one.
Most of the processes I've seen, however, utilise some sort of data storage right after data ingest.
All response welcome. Even if you believe I am going about this the wrong way.
Thanks.
A few things here:
The copy activity just moves data from a source to a sink. It doesn't modify it on the fly.
The lookup activity is just for looking up some attributes to use later in the same pipeline.
ADF cannot publish a dataset to Power BI (although it may be able to push to a streaming dataset).
Your approach is correct, but you need that last step of transforming the data. You have a lot of options here, but since you are already familiar with Power BI you can use wrangling data flows, which allow you to take a file from the data lake, apply some Power Query, and save a new file in the lake. You can also use mapping data flows, Databricks, or any other data transformation tool.
Lastly, you can pull the files from the data lake with Power BI to build your report on the data in this new file.
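To make that concrete, here is a hypothetical PowerShell sketch of the suggested flow (Dynamics 365 -> copy activity -> data lake sink -> data flow -> curated files that Power BI reads). All names and JSON definition files are placeholders, not an exact recipe:
# Sketch only: the copy activity needs a sink, so the CRM data lands in ADLS/Blob first.
$rg  = "AnalyticsRG"
$adf = "CrmIngestAdf"
# Linked services and datasets for the Dynamics 365 source and the data lake sink.
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "Dynamics365LS" -DefinitionFile ".\Dynamics365LS.json"
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "DataLakeLS" -DefinitionFile ".\DataLakeLS.json"
Set-AzDataFactoryV2Dataset -ResourceGroupName $rg -DataFactoryName $adf -Name "CrmEntityDS" -DefinitionFile ".\CrmEntityDS.json"
Set-AzDataFactoryV2Dataset -ResourceGroupName $rg -DataFactoryName $adf -Name "LakeRawDS" -DefinitionFile ".\LakeRawDS.json"
# Pipeline: copy CrmEntityDS -> LakeRawDS, then a wrangling/mapping data flow writes
# curated files in the lake that Power BI connects to.
Set-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -Name "IngestAndTransformCrm" -DefinitionFile ".\IngestAndTransformCrm.json"
Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -PipelineName "IngestAndTransformCrm"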
Of course, as always in Azure there are a lot of ways to solve problems or architect services, this is the one I consider simpler for you.
Hope this helped!

Connect ADF to ServiceNow URL

I am fairly new to Azure, but I have been doing ETL for quite some time now. I want to connect ADF to ServiceNow to bring in lists to our SQL data warehouse. Does anyone have any good articles, or know what the settings are, to achieve this?
You could use the copy activity in ADF, which supports ServiceNow as an input source and Azure Synapse Analytics (formerly Azure SQL Data Warehouse) as an output sink.
Since you are new to ADF, there are three elements you should get to know when you execute a copy activity (a minimal PowerShell sketch follows the list):
1. Linked service: https://learn.microsoft.com/en-us/azure/data-factory/concepts-linked-services
2. Dataset: https://learn.microsoft.com/en-us/azure/data-factory/concepts-datasets-linked-services
3. Pipeline: https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
If you want to execute the pipeline on a schedule, you can also add a time-based trigger to the specific pipeline.
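A hedged PowerShell sketch (Az.DataFactory module) of those three elements plus a schedule trigger; the names and the .json definition files are placeholders you would author from the concept pages above:
# Sketch only: linked services, datasets, pipeline, and an optional schedule trigger.
$rg  = "MyResourceGroup"
$adf = "MyDataFactory"
# 1. Linked services: ServiceNow source and Azure Synapse Analytics sink.
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "ServiceNowLS" -DefinitionFile ".\ServiceNowLS.json"
Set-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $adf -Name "SynapseLS" -DefinitionFile ".\SynapseLS.json"
# 2. Datasets pointing at a ServiceNow list and a Synapse table.
Set-AzDataFactoryV2Dataset -ResourceGroupName $rg -DataFactoryName $adf -Name "ServiceNowListDS" -DefinitionFile ".\ServiceNowListDS.json"
Set-AzDataFactoryV2Dataset -ResourceGroupName $rg -DataFactoryName $adf -Name "SynapseTableDS" -DefinitionFile ".\SynapseTableDS.json"
# 3. Pipeline with a copy activity from the ServiceNow dataset to the Synapse dataset.
Set-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -Name "ServiceNowToSynapse" -DefinitionFile ".\ServiceNowToSynapse.json"
# Optional: a schedule trigger so the pipeline runs on a timetable.
Set-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf -Name "DailyTrigger" -DefinitionFile ".\DailyTrigger.json"
Start-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf -Name "DailyTrigger"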

Parallelism in Azure Data Factory v2 copy activity

We are implementing a solution to achieve functionality similar to SSIS packages: copying data from one database to another (on-premises to Azure SQL). In SSIS we have options to set up parallel processing in different ways, and we can also transfer data in chunks.
Similarly, what is the best way to achieve parallelism in Azure Data Factory version 2? Please consider the scenario of transferring data for only one table.
Have a look at the Copy Activity Performance and Tuning Guide for ways to optimize transferring data into the Cloud with ADF: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
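To give a rough idea, the copy activity's typeProperties accept parallelism settings described in that guide, such as parallelCopies and dataIntegrationUnits. Below is a hedged PowerShell sketch embedding an illustrative activity fragment; the values and the source/sink type names are only examples, so check the guide for what your connectors support:
# Sketch only: an illustrative copy-activity fragment to merge into your pipeline
# definition. parallelCopies and dataIntegrationUnits are the main tuning knobs.
$copyActivity = @'
{
  "name": "CopyOneTable",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "SqlServerSource" },
    "sink":   { "type": "AzureSqlSink" },
    "parallelCopies": 8,
    "dataIntegrationUnits": 16
  }
}
'@
# After merging this into the full pipeline JSON, deploy and run it as usual, e.g.:
# Set-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -Name "CopyOneTable" -DefinitionFile ".\CopyOneTable.json"
# Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $adf -PipelineName "CopyOneTable"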