Data Fusion: GCS Create creating folders, not objects - google-cloud-data-fusion

I am trying to create a GCS object (a file) with the GCS Create plugin of Data Fusion,
but it is creating a folder instead.
How can I have a file created instead of a folder?

It seems that the description of the plugin leads to a misunderstanding. Cloud Storage doesn't work like a conventional filesystem, so you cannot strictly create empty files. gsutil has no equivalent of the Linux touch command, and the basic operations in this product are limited to the cp command (uploading and downloading files).
Therefore, since there is no file at the storage URL you specify, it is expected that a folder is created instead of a file.
Based on this, I would like to suggest two workarounds:
If you are using this plugin to create a file as a ‘flag’, you can keep using it, since the created folder also serves as a flag (to trigger a Cloud Function, for example; see the sketch below).
If you actually need to create a file, you can use the GCS plugin in the ‘Sink’ plugin group, which writes records to one or more files in a directory on Google Cloud Storage. Files can be written in formats such as CSV, Avro, Parquet, and JSON.
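For the first workaround, here is a minimal sketch of reacting to the marker, assuming a 1st gen Cloud Function with a google.storage.object.finalize trigger on the bucket the plugin writes to; the marker prefix used below is a made-up example.

    # Minimal sketch: fires on every new object in the bucket and checks
    # whether it is the marker created by the GCS Create plugin.
    def on_marker_created(event, context):
        name = event.get("name", "")          # object path within the bucket
        if name.startswith("pipeline/_done/"):  # placeholder marker prefix
            print(f"Marker {name} appeared in {event['bucket']}, starting downstream work")
            # ... kick off whatever should run once the pipeline step finishes ...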

Related

Audio encoding, sample rate, and re-encoding in Google Cloud

Is it possible to look up the audio metadata for a file stored in Google Cloud without having to download it? When building a Google Speech-to-Text API service you pass it a gs://bucket/file.flac, and I know the sox and ffmpeg bash and Python commands for metadata lookup on locally stored files, but I can't seem to figure out a way to look up audio file metadata on a Google Cloud Storage file.
Additionally, if I have a gs://bucket/audio.wav, can I re-encode it using sox/py-sox and write the new audio.flac directly to gs://bucket/audio.flac? Or do I have to download the audio.wav to re-encode it?
Any thoughts or directions appreciated.
No, it is not possible to access the metadata you want directly in Google Cloud Storage. The command gsutil ls -L gs://[bucket_name]/[file_name] will print the metadata of that file within the bucket. You can modify that metadata, but it does not include the audio properties you are referring to. You will need to download the files, re-encode them, and upload them again.
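For reference, the same object metadata that gsutil ls -L prints can also be read with the google-cloud-storage Python client without downloading the file; note that this is only Cloud Storage metadata (size, content type, custom key/value pairs), not the audio encoding or sample rate. Bucket and object names below are placeholders.

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").get_blob("audio.flac")  # placeholder names
    print(blob.size, blob.content_type, blob.metadata)        # no audio properties here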
You cannot do that re-encoding operation inside Cloud Storage; you will need to download the file and process it the way you want before uploading it again to your bucket. However, here is a workaround if it works for you:
Create a Cloud Function triggered when your file is uploaded. Then retrieve the file you just uploaded, perform any operation you want on it (such as re-encoding it to .flac), and upload the result. (Careful: if you give the new file the same name and extension, it will overwrite the old one in the bucket.)
About your library: Cloud Functions use Python 3.7, which for the time being does not support the py-sox library, so you will need to find another one.
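Below is a rough sketch of that workaround, assuming a 1st gen Cloud Function with a google.storage.object.finalize trigger and an ffmpeg binary available in the runtime (verify this, or bundle your own converter, since py-sox is out). All names are placeholders.

    import os
    import subprocess
    from google.cloud import storage

    client = storage.Client()

    def reencode_to_flac(event, context):
        bucket_name, name = event["bucket"], event["name"]
        if not name.endswith(".wav"):
            return  # only react to .wav uploads, and don't re-trigger on the .flac

        bucket = client.bucket(bucket_name)
        local_wav = os.path.join("/tmp", os.path.basename(name))
        local_flac = local_wav[:-4] + ".flac"

        # download, convert, upload; /tmp is an in-memory filesystem on Cloud Functions
        bucket.blob(name).download_to_filename(local_wav)
        subprocess.run(["ffmpeg", "-y", "-i", local_wav, local_flac], check=True)
        bucket.blob(name[:-4] + ".flac").upload_from_filename(local_flac)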

Get file metadata as columns when importing a folder

I'm importing a folder (not a single file) from Google Storage within Google Data Prep. I need to get the file name of each file in the folder as a column in the finished dataset.

Packaging SF service into a single file

I am working through how to automate the build and deploy of my Service Fabric app. Currently I'm working on the package step, and while it is creating files within the pkg subfolder, it always creates a folder hierarchy of files, not a true package in a single file. I would swear I've seen a .sfpkg file (or something similarly named) that has everything in one file (a ZIP, maybe?). Is there some way to create such a file with msbuild?
Here's the command line I'm using currently:
msbuild myservice.sfproj "/p:Configuration=Dev;Platform=AnyCPU" /t:Package /consoleloggerparameters:verbosity=minimal /maxcpucount
I'm concerned about not having a single file because it seems inefficient in sending a new package up to my clusters, and it's harder for me to manage a bunch of files on a build automation server.
I believe you read about the .sfpkg at
https://azure.microsoft.com/documentation/articles/service-fabric-get-started-with-a-local-cluster
Note that internally we do not yet support provisioning a .sfpkg file. This is a feature that will be coming soon (date TBD). Instead, we upload each file in the application package.
Update (SF 6.1 - April 2018)
Since 6.1 it is possible to create a ZIP file (*.sfpkg) and upload it to an external store. Service Fabric executes a GET operation to download the sfpkg application package. For more info, see https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-package-apps#create-an-sfpkg
NOTE: This only works with external provisioning; the Azure image store still doesn't support sfpkg files.
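As a quick sketch of how the .sfpkg can be produced from the msbuild package output: per the linked docs it is simply a ZIP of the application package folder with its extension changed. The paths below are placeholders for your own build layout.

    import shutil

    pkg_dir = r"myservice\pkg\Dev"               # output folder of the /t:Package target (placeholder)
    archive = shutil.make_archive("MyApp", "zip", root_dir=pkg_dir)
    shutil.move(archive, "MyApp.sfpkg")          # an .sfpkg is just the ZIP with a renamed extension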

Talend issue while copying local files to HDFS

Hi, I want to know how to copy files from the source file system (local file system) to HDFS, and, if a source file has already been copied to HDFS, how to skip that file so it is not copied to HDFS again, using Talend.
Thanks
Venkat
To copy files from the local file system to HDFS, you need to use the tHDFSPut component if you have Talend for Big Data. If you use Talend for Data Integration, you can use the tSystem component with the right command.
To avoid duplicated files, you need to create a table in an RDBMS and keep track of all copied files. Each time the job starts copying a file, it should check whether the file already exists in the table (see the sketch below).
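This is not Talend code, just a plain-Python illustration of the bookkeeping described above, using SQLite as a stand-in for the RDBMS; in Talend the same check would be wired up with database input/output components around tHDFSPut. Table and directory names are placeholders.

    import os
    import sqlite3

    conn = sqlite3.connect("copied_files.db")  # stand-in for your RDBMS
    conn.execute("CREATE TABLE IF NOT EXISTS copied_files (name TEXT PRIMARY KEY)")

    def already_copied(filename):
        row = conn.execute("SELECT 1 FROM copied_files WHERE name = ?", (filename,)).fetchone()
        return row is not None

    def mark_copied(filename):
        conn.execute("INSERT INTO copied_files (name) VALUES (?)", (filename,))
        conn.commit()

    for f in os.listdir("/data/incoming"):       # placeholder source directory
        if not already_copied(f):
            # ... hand the file to the HDFS copy step here (e.g. tHDFSPut) ...
            mark_copied(f)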

Google Compute Startup Script PHP Files From Bucket

I'd like to automatically load a folder full of PHP files from a bucket when an instance starts up. My PHP files are normally located at /var/www/html.
How do I write a startup script for this?
I think this would be enormously useful for people such as myself who are trying to deploy autoscaling, but don't want to have to create a new image with their PHP files every time they want to deploy changes. It would also be useful as a way of keeping a live backup in Cloud Storage.