How can we copy any file within Azure Data Lake Store folders - PowerShell

We already have Move-AzureRmDataLakeStoreItem, which will move files between folders inside Azure Data Lake. What I am seeking is a way to copy files within the Data Lake without affecting the original file.
The possibilities that I know of are:
using U-SQL to EXTRACT data from the source file and then OUTPUT it to the destination file - but I am trying to copy all sorts of files (.gz, .txt, .info, .exe, .msi) and I am not sure if U-SQL can handle .gz, .exe, or .msi files
using Data Factory to copy data from/to Data Lake Store
So, my question is: do we have anything else at our disposal with which we can copy files within Azure Data Lake Store?

You have a couple of other options:
run DistCp on an HDInsight cluster - similar to the instructions provided here: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-wasb-distcp
use AdlCopy if you are copying a limited amount of data (say, tens to hundreds of GB) - https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-azure-storage-blob
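For example, AdlCopy can be invoked from a PowerShell prompt. A minimal sketch, assuming AdlCopy.exe is on the PATH and using placeholder account and folder names; a Data Lake Analytics account is used to run the copy job when both source and destination are Data Lake Store paths:

# Placeholder account and folder names; copies a folder from one
# Data Lake Store path to another using a Data Lake Analytics account.
AdlCopy.exe `
    /Source adl://myadls.azuredatalakestore.net/sourcefolder/ `
    /Dest adl://myadls.azuredatalakestore.net/destfolder/ `
    /Account mydatalakeanalytics `
    /Units 2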
Does this suffice, or do you want something natively supported by Azure Data Lake Store via its REST APIs?
Thanks,
Sachin Sheth
Program Manager, Azure Data Lake.

Related

How to download a file from a URL and store it in an AWS S3 bucket?

As stated, I'm trying to download this dataset of zip folders containing images: https://data.broadinstitute.org/bbbc/BBBC006/ and store them in an S3 bucket so I can later unzip them in the bucket, reorganize them, and pull them into a VM in smaller chunks for some computation. The problem is, I don't know how to get the data from https://data.broadinstitute.org/bbbc/BBBC006/BBBC006_v1_images_z_00.zip, for example, or any of the other ones, and then send it to S3.
This is my first time using AWS or really any cloud platform, so please bear with me :]
Amazon EC2 provides a virtual computer just like a normal Linux or Windows computer.
Amazon S3 is an object storage service where you can upload/download files.
If you wish to copy files from a website to Amazon S3, you will need to write an application or script that will:
Download the files from the website
Upload them to Amazon S3
If you wish to do it from a script, you could use the AWS Command-Line Interface (CLI).
Or, you could do it from a programming language, see: SDKs and Programming Toolkits for AWS
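For example, a minimal PowerShell sketch of the CLI approach; the bucket name and local path are placeholders, and it assumes the AWS CLI is installed and configured with credentials:

# Download one of the zip files from the website to a local path (placeholder)
Invoke-WebRequest -Uri "https://data.broadinstitute.org/bbbc/BBBC006/BBBC006_v1_images_z_00.zip" -OutFile "C:\temp\BBBC006_v1_images_z_00.zip"
# Upload the downloaded file to S3 with the AWS CLI (placeholder bucket name)
aws s3 cp "C:\temp\BBBC006_v1_images_z_00.zip" "s3://my-example-bucket/BBBC006/BBBC006_v1_images_z_00.zip"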

Some questions about Google Data Fusion

I am discovering the tool and I have some questions:
- What exactly do you mean by the type File in (Source, Sink)?
- Is it also possible to send the result of the pipeline directly to an FTP server?
I checked the documentation, but I did not find this information.
Thank you.
Short answer: File refers to the filesystem where the pipelines run. In the Data Fusion context, if you are using a File sink, the contents will be written to HDFS on the Dataproc cluster.
Data Fusion has an SFTP Put action that can be used to write to SFTP. Here is a simple pipeline showing how to write to SFTP from GCS.
Step 1: GCS Source to File Sink - this writes the content of GCS to HDFS on Dataproc when the pipeline is run.
Step 2: SFTP Put action, which takes the output of the File sink and uploads it to SFTP.
You need to configure the output path of the File sink to be the same as the source path in the SFTP Put action.

Copy items from one resource group to another in Azure Data Lake Store using PowerShell

All I want is to copy the data from a development environment to a production environment in Azure Data Lake Store. There is no QA.
These are .CSV files; the environments are nothing but different resource groups.
I tried copying the data within the same resource group using the command
Move-AzureRmDataLakeStoreItem -AccountName "xyz" -Path "/Product_Sales_Data.csv" -Destination "/mynewdirectory"
which worked fine; however, I want the data movement to take place between two different resource groups.
A possible solution that I have come across is using the Export command, which downloads the files to the local machine, and then using the Import command to upload them to a different resource group.
Import-AzureRmDataLakeStoreItem
Export-AzureRmDataLakeStoreItem
The reason for using PowerShell is to automate the process of importing/copying the files across different environments, which is essentially automating the entire deployment process.
The solution mentioned above might take care of the process, but I am looking for a better solution where a local machine or a VM is not required.
You do have a number of options; all of the below will accomplish what you are looking to achieve. Keep in mind that you need to check the limitations of each and weigh the costs. For example, Azure Functions have a limited execution time (a default maximum of 5 minutes) and local storage limitations.
Azure Logic Apps (drag-and-drop configuration)
Azure Data Factory (using the Data Lake linked service)
Azure Functions (using the Data Lake REST API)
You could use Azure Automation and PowerShell to automate your current approach (see the sketch below)
Use ADLCopy to copy between lakes (and other stores)
Choosing between them can be opinionated and subjective.
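If you stay with the Azure Automation / PowerShell option, a minimal sketch of automating the Export/Import approach described in the question might look like the following; the account names, paths, and temporary location are placeholders, and it assumes the AzureRM Data Lake Store cmdlets and an existing login context:

# Download the file from the development Data Lake Store account (placeholder names/paths)
Export-AzureRmDataLakeStoreItem -AccountName "dev-adls" -Path "/Product_Sales_Data.csv" -Destination "C:\temp\Product_Sales_Data.csv"
# Upload it to the production Data Lake Store account in the other resource group
Import-AzureRmDataLakeStoreItem -AccountName "prod-adls" -Path "C:\temp\Product_Sales_Data.csv" -Destination "/mynewdirectory/Product_Sales_Data.csv"

Note that this still stages the file on whichever machine or Automation worker runs the script; the Data Factory and AdlCopy options avoid that.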

Redirecting output to a text file located on Azure Storage - using PowerShell

Using PowerShell, what is the best way of writing output to a text file located in an Azure Storage container? Thank you.
Simply, you can't.
While this capability exists within Azure Storage with the (relatively) new append blob, it hasn't yet filtered down to PowerShell.
In order to implement this, you would either need to create a new C# cmdlet that encapsulates the functionality, or you would need to redirect the output to a standard file and then use the usual Azure Storage cmdlets to upload that file to Azure Storage.
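For example, a minimal sketch of the redirect-then-upload approach; the storage account name, key, container, and local path are placeholders, and it assumes the classic Azure.Storage cmdlets are installed:

# Redirect the output to a local file first (placeholder path)
Get-Process | Out-File -FilePath "C:\temp\output.txt"
# Build a storage context and upload the file as a blob (placeholder account/key/container)
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<storage-key>"
Set-AzureStorageBlobContent -File "C:\temp\output.txt" -Container "mycontainer" -Blob "output.txt" -Context $ctx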

Best way to stage a file from cloud storage to a Windows machine

I want to store a data file for QuickBooks in the cloud. I understand that the data file is more of a database-in-a-file, so I know that I don't want to simply have the data file itself sit in a cloud directory.
When I say 'cloud', I mean something like Google Drive or box.com.
What I see working is that I want to write a script (a .bat file, or do they have something new and improved for Windows XP, like some .NET nonsense or something?)
The script would:
1) Download the latest copy of the data file from cloud storage and put it in a directory on the local machine
2) Launch QuickBooks with that data file
3) When the user exits QuickBooks, copy the data file back up into the cloud storage.
4) Rejoice.
So, my question(s)... Is there something that already does this? Is there an easily scriptable interface to work with the cloud storage options? In my ideal world, I'd be able to say 'scp google-drive://blah/blah.dat localdir' and have it copy the file down, and do the opposite after running QB. I'm guessing I'm not going to get that.
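For illustration, the script described above might look roughly like this in PowerShell; it assumes a third-party sync tool such as rclone (not mentioned here) with a Google Drive remote already configured, and the remote name, paths, and QuickBooks install location are placeholders:

# 1) Pull the latest copy of the company file down from cloud storage (placeholder remote/paths)
rclone copy "gdrive:QuickBooks/company.qbw" "C:\QBData"
# 2) Launch QuickBooks against the local copy and wait for the user to exit
Start-Process -FilePath "C:\Program Files (x86)\Intuit\QuickBooks\QBW32.exe" -ArgumentList "C:\QBData\company.qbw" -Wait
# 3) Copy the data file back up into cloud storage
rclone copy "C:\QBData\company.qbw" "gdrive:QuickBooks"
# 4) Rejoice.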
Intuit already provides a product to do this. It is called Intuit Data Protect, and it backs up your QuickBooks company file to the cloud for you.
http://appcenter.intuit.com/intuitdataprotect
regards,
Jarred