Talend Cloud - how to work with files for jobs published to Talend Cloud

As part of our project we have developed many ETL solutions in Talend. They have been extensively tested, but only on local systems - that is, by executing them manually through the desktop software (both the free and subscription versions). We are now planning to gradually publish the jobs to Talend Cloud. I have a fair understanding of how Talend Cloud (the Management Console, etc.) works, but what keeps me in doubt are the file-based jobs we have developed.
Scenario: We have a few jobs that are file based - they access files from FTP, push files to FTP, or write data to Excel/delimited files. We cannot read/write data directly from files sitting on the FTP server, so we need a local copy first. As long as we developed and tested on the desktop software this was all fine, but I am not sure how to handle the situation once we publish these jobs to Talend Cloud.
More specifically, how do we handle/change the file paths that until now pointed to some local C/D/E drive but now need to be remapped to cloud paths? How are the Talend Cloud directories defined?
Development platform: Talend Open Studio for Data Integration, version 6.5.1
Cloud platform: Talend Integration Cloud Hybrid Edition, version 6.5.1

Let's say you are publishing an already created job to Talend Cloud.
You can run that published job on a remote engine in Talend Cloud.
So when pointing to a local directory for the FTP fetch, you have to provide a path on that remote server.
Consider: in the existing job you have c:/temp as the local directory.
Similarly, create a c:/temp directory on the remote engine so that your jobs work without any modification.
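To illustrate the suggestion above (this is not part of the original answer), a small pre-flight script run on the remote engine's host could create the directories the jobs expect before they run. c:/temp is the example path from the answer; the sub-folders are hypothetical:

```python
import os

# Directories the published jobs expect to find on the remote engine host.
# C:\temp is the example from the answer; the sub-folders below are
# hypothetical staging areas - adjust to whatever your jobs actually use.
EXPECTED_DIRS = [
    r"C:\temp",
    r"C:\temp\ftp_downloads",  # hypothetical landing area for FTP fetches
    r"C:\temp\ftp_uploads",    # hypothetical staging area for FTP pushes
]

def ensure_job_directories():
    """Create (idempotently) the local directories the Talend jobs rely on."""
    for path in EXPECTED_DIRS:
        os.makedirs(path, exist_ok=True)
        print(f"ok: {path}")

if __name__ == "__main__":
    ensure_job_directories()
```

If modifying the jobs is an option, moving such paths into Talend context variables (one value per environment) avoids having to mirror drive layouts on the remote engine at all.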

Related

DnsResolutionFailure when Azure Data Factory tries to access a File Server after an update of the Self-Hosted Integration Runtime

I have an ADF Linked Service to an on-premises File Server; it accesses the X: drive.
The Linked Service uses a Self-Hosted Integration Runtime (a VM) to access the File Server.
I have a pipeline which copies files from Azure Blob Storage to the File Server.
This has been working fine for more than a year.
However, the connection broke after the Integration Runtime received a software update last week.
Operation on target ForEach1 failed: Activity failed because an inner activity failed; Inner
activity name: Copy data, Error: ErrorCode= DnsResolutionFailure,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException, Message=x could not be
resolved., Source=Microsoft.DataTransfer.Common,'
What could be the issue?
As per my analysis, below are the findings:
Root cause:
As of version 5.22.8297.1, to improve security, the File System Connector will no longer support connecting to local disk, for example, the C: drive, as well as \\localhost.
Action Required:
For a long-term solution, the ADF product team recommends that customers serve files over a remote network share instead of from the local disk of the machine the SHIR is running on.
Temporary workaround:
Downgrade to a previous version of the SHIR until you have made the required change listed above (the long-term solution). Be sure to disable auto-update until the action is taken.
The latest version that supports the above scenario is 5.22.8285.1:
https://download.microsoft.com/download/E/4/7/E4771905-1079-445B-8BF9-8A1A075D8A10/IntegrationRuntime_5.22.8285.1.msi.
Once the action is completed, please re-enable auto-update, or manually update to the latest version as soon as possible.
Here is a GitHub issue where the ADF product team is actively engaging with users regarding this issue. Feel free to add your comments/feedback if you have any: Integration Runtime Upgrade Breaks Sink Connections
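Not from the answer above, but as a quick sanity check of the long-term fix, a small script run on the SHIR machine could confirm that the file server's hostname resolves and that the UNC share is reachable; the host and share names below are hypothetical:

```python
import os
import socket

# Hypothetical values - replace with your actual file server host and share.
FILE_SERVER_HOST = "fileserver01"
UNC_SHARE = r"\\fileserver01\adf_drop"

def check_share_reachable():
    """Check DNS resolution of the file server and visibility of the UNC share."""
    try:
        ip = socket.gethostbyname(FILE_SERVER_HOST)
        print(f"{FILE_SERVER_HOST} resolves to {ip}")
    except socket.gaierror as exc:
        print(f"DNS resolution failed for {FILE_SERVER_HOST}: {exc}")
        return False

    if os.path.isdir(UNC_SHARE):
        print(f"UNC share reachable: {UNC_SHARE}")
        return True

    print(f"UNC share not reachable: {UNC_SHARE}")
    return False

if __name__ == "__main__":
    check_share_reachable()
```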

Application deployment in a closed infrastructure

Our team has adopted an agile development style. We have a desktop application which is installed on more than 5,000 computers. These computers are in the customer's network. In the network there are distribution points, but one of them is the main point: we copy the binary files to the main point, and from there they are distributed to all the distribution points, which install the client computers.
For us this means a lot of manual work. We have our own Azure DevOps Server (TFS), which is not connected to the customer's network for source code security reasons. We can copy binary files via a shared folder, but nothing else.
How do we do application deployment? These are the steps:
1) Copy the binary files to the main distribution point.
2) Create deltas with the xdelta tool.
3) Copy all new files to all distribution points with robocopy.
4) When the copy is done, change the version in the manifest file and copy again.
5) Upgrade the databases with a manually created database alter file.
I wanted to use Jenkins to automate these steps. The problem is that the customer does not want to install any other software on his servers, and all steps need to be done inside the customer's network.
What DevOps tool should I use to automate these steps in a pipeline? The copy to the distribution points runs in parallel, as does the database deployment, because there are more than 70 database instances.
It is not just about one application; we have more applications that we would like to deploy more effectively.
Thank you.
SOLUTION: I solved this problem with the MSDeploy tool. I wrote my own small application which reads a simple XML configuration file and launches MSDeploy through the MSDeploy API. DacPac deployment is handled by SqlPackage.exe. So I can deploy the whole application with all references and dependent parts.
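The author's own tool isn't shown; as a rough sketch of the same orchestration idea (read a small config, then fan out MSDeploy and SqlPackage.exe runs in parallel), assuming msdeploy.exe and SqlPackage.exe are on the PATH and a hypothetical deploy.xml layout:

```python
import subprocess
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical config layout:
# <deployment>
#   <package source="c:\drop\app" dest="\\dp-main\apps\app"/>
#   <database server="db01" dacpac="c:\drop\app.dacpac"/>
# </deployment>

def deploy_package(source, dest):
    """Sync a binary drop to a distribution point with msdeploy.exe."""
    subprocess.run(
        ["msdeploy.exe", "-verb:sync",
         f"-source:dirPath={source}",
         f"-dest:dirPath={dest}"],
        check=True)

def deploy_dacpac(server, dacpac):
    """Publish a DacPac to one database instance with SqlPackage.exe."""
    subprocess.run(
        ["SqlPackage.exe", "/Action:Publish",
         f"/SourceFile:{dacpac}",
         f"/TargetServerName:{server}",
         "/TargetDatabaseName:AppDb"],  # hypothetical database name
        check=True)

def run(config_path="deploy.xml"):
    root = ET.parse(config_path).getroot()
    packages = [(p.get("source"), p.get("dest")) for p in root.findall("package")]
    databases = [(d.get("server"), d.get("dacpac")) for d in root.findall("database")]

    # Distribution points and database instances are independent,
    # so both sets of deployments run in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(deploy_package, s, d) for s, d in packages]
        futures += [pool.submit(deploy_dacpac, s, d) for s, d in databases]
        for f in futures:
            f.result()  # re-raise any deployment failure

if __name__ == "__main__":
    run()
```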
I think you need a configuration management tool to roll out desktop software to Windows clients.
Microsoft's in-house solution is this: https://learn.microsoft.com/en-us/configmgr/

Auto upload remote files into Google cloud storage via FTP?

I download a lot of CSV files via FTP from different sources on a daily basis. I then upload these files into Google Cloud Storage.
Are there any programs/APIs/tools to automate this?
I am looking for the best way, if possible, to load these files directly into Google Cloud Storage without having to download them locally first - something I can deploy on Google Compute Engine, so I don't need to run local programs like FileZilla/CrossFTP. The program/tool would keep checking the remote location on a regular basis and load new files into Google Cloud Storage, ensuring a checksum match.
I apologize in advance if this is too vague/generic a question.
Sorry, no. Automatically importing objects from a remote FTP server is not currently a feature of GCS.
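Since GCS will not pull from an FTP server on its own, the usual workaround is exactly the kind of poller the question describes, run on a small Compute Engine VM. A minimal sketch, assuming the standard-library ftplib and the google-cloud-storage client, with hypothetical host and bucket names:

```python
import hashlib
import io
import time
from base64 import b64encode
from ftplib import FTP

from google.cloud import storage  # pip install google-cloud-storage

FTP_HOST = "ftp.example.com"      # hypothetical FTP source
FTP_USER, FTP_PASS = "user", "secret"
BUCKET_NAME = "my-ingest-bucket"  # hypothetical GCS bucket
POLL_SECONDS = 300

def transfer_new_files():
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)

    ftp = FTP(FTP_HOST)
    ftp.login(FTP_USER, FTP_PASS)
    for name in ftp.nlst():
        blob = bucket.blob(name)
        if blob.exists():
            continue  # already uploaded on a previous run

        # Pull the file into memory (fine for modest CSVs), hash it,
        # then upload without ever writing to local disk.
        buf = io.BytesIO()
        ftp.retrbinary(f"RETR {name}", buf.write)
        data = buf.getvalue()
        local_md5 = b64encode(hashlib.md5(data).digest()).decode()

        blob.upload_from_string(data)
        blob.reload()
        if blob.md5_hash != local_md5:
            raise RuntimeError(f"checksum mismatch for {name}")
        print(f"uploaded {name} ({len(data)} bytes)")
    ftp.quit()

if __name__ == "__main__":
    while True:
        transfer_new_files()
        time.sleep(POLL_SECONDS)
```

A real poller would also want retries and handling for files that change after upload, but comparing the local MD5 against the blob's reported md5_hash covers the "ensuring a checksum match" part of the question.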

.rss file to deploy report

Is there a way to deploy .rdl files in structured folders that are not already on a report server? The goal is to deploy the structure from Windows Explorer to the target report server, which has the same structure.
I recently read this article and code, which deploys from server to server.
https://azuresql.codeplex.com/releases/view/115207
We are trying to create a build environment where the deployment of .rdl files comes directly from our source control, and we would like to use a script that is as widely used as the one in the link provided.
Thanks for your time,
In the SQL Server install folder there is an executable called rs.exe. You can use it by passing it an .rss file that contains the deployment logic you want to run. You can then bulk deploy from your folder to Reporting Services.
RS.exe
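rs.exe is driven by a VB.NET-based .rss script (rs -i script.rss -s http://server/reportserver, with -v to pass variables into the script). The answer doesn't include its script, but as a sketch of the "bulk deploy from your folder" idea, a small wrapper could walk the local folder structure and invoke rs.exe once per report, handing a hypothetical deploy.rss the file path and target folder:

```python
import os
import subprocess

REPORT_ROOT = r"C:\reports"                      # local folder mirroring the server structure
SERVER_URL = "http://reportserver/reportserver"  # hypothetical report server URL
RSS_SCRIPT = "deploy.rss"                        # hypothetical .rss script that publishes one .rdl

def deploy_all_reports():
    for dirpath, _dirnames, filenames in os.walk(REPORT_ROOT):
        # Map the local sub-folder onto the same folder path on the server.
        target_folder = "/" + os.path.relpath(dirpath, REPORT_ROOT).replace(os.sep, "/")
        if target_folder == "/.":
            target_folder = "/"
        for filename in filenames:
            if not filename.lower().endswith(".rdl"):
                continue
            rdl_path = os.path.join(dirpath, filename)
            subprocess.run(
                ["rs.exe", "-i", RSS_SCRIPT, "-s", SERVER_URL,
                 "-v", f"rdlPath={rdl_path}",
                 "-v", f"targetFolder={target_folder}"],
                check=True)

if __name__ == "__main__":
    deploy_all_reports()
```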
If you have the source .rdl files in TFS, you can use Team Build to process them and create the .rss and likely a zip package. You can then have it deployed by Release Management for Visual Studio 2013 through a specified release pipeline that pushes it from Dev -> QA -> Prod.
You can find information on how in Professional Application Lifecycle Management with Visual Studio 2013.

Azure - SSMS - PowerShell

I am working through my first Azure HDInsight tutorial. Can I do this without installing Azure Remote PowerShell on my local computer?
Can I use SSMS (2008 R2) to run the PowerShell? My first attempt at that led me down the path of using a database in Azure, but I do not think that is what I want to do (the tutorial describes setting up Storage, not a database, and then an HDInsight instance to interact with that Storage).
I am doing this tutorial: http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/
Thank you.
While you can use SQL Server and HDInsight together as part of a full pipeline, for the purposes of the getting started tutorial you want to think of them as two very different things.
The Storage referred to is a standard Windows Azure Storage account, based on blobs. It forms the backing file system for the HDInsight cluster.
As far as using PowerShell goes, it is definitely the best and easiest way to submit jobs to an HDInsight cluster. I would also recommend using a regular PowerShell console or the PowerShell ISE to work with HDInsight, rather than the one available through SSMS, since the SSMS version won't load all the Azure modules by default.
There are other ways to submit jobs if PowerShell is not your thing (if you are on OS X or Linux, for instance). You can use the REST API provided by WebHCat (documentation). If you're on Windows and prefer C# to PowerShell, you can also use the Windows Azure HDInsight Management Client from the Microsoft Hadoop SDK to submit jobs (available on CodePlex and NuGet). These require you to break out Visual Studio and write a short console program to submit your job, so they may be a bit heavy unless you're doing full-on C# streaming MapReduce and are already there.
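For the WebHCat route mentioned above, job submission is a plain HTTPS POST with basic authentication. A minimal sketch, assuming the requests library and hypothetical cluster name and credentials (check the WebHCat documentation for the exact parameters supported by your cluster version):

```python
import requests  # pip install requests

CLUSTER = "mycluster"                         # hypothetical HDInsight cluster name
USER, PASSWORD = "admin", "cluster-password"  # hypothetical cluster HTTP credentials
WEBHCAT_URL = f"https://{CLUSTER}.azurehdinsight.net/templeton/v1/hive"

def submit_hive_job(query, status_dir="/tutorial/status"):
    """Submit a Hive query via WebHCat and return the Hadoop job id."""
    response = requests.post(
        WEBHCAT_URL,
        auth=(USER, PASSWORD),
        data={
            "user.name": USER,        # user to run the job as
            "execute": query,         # the Hive statement to execute
            "statusdir": status_dir,  # storage-relative directory for stdout/stderr
        },
    )
    response.raise_for_status()
    return response.json()["id"]

if __name__ == "__main__":
    job_id = submit_hive_job("SHOW TABLES;")
    print("submitted job:", job_id)
```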
If you're after a GUI-based approach to job submission to HDInsight, you're out of luck at the moment, but you might like to check out what my team is working on at Red Gate, which will help with submitting Hive and Pig jobs.