Corrupted file from Google Drive API (REST)

I want to use the Google Drive API as a backup solution. I have a zipped folder that is uploaded with curl to a service account using uploadType=resumable.
The zipped (tar) file is a ~30 MB folder and seems to upload to the Drive API without errors.
My issue is that I can't unzip the file (it seems corrupted) after downloading it from my service account. When I try to untar it, I get:
gzip: stdin: not in gzip format.
tar: Child returned status 1.
tar: Error is not recoverable: exiting now.
This already worked back in September. The files that I was able to download and extract back then no longer work either.
I know the downloaded file is the same size as the original file (so it is not empty),
and I specify X-Upload-Content-Type: application/x-gtar in my header.
Thanks
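One way to narrow this down is to look at the first bytes of the downloaded file. The "not in gzip format" message means the data does not begin with the gzip magic bytes (0x1f 0x8b), so seeing what it does begin with can show what was actually stored or returned. The following is only a diagnostic sketch in Python; the function names are made up for illustration and nothing here is specific to the Drive API.

```python
import sys

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_like_gzip(path):
    """Return True if the file begins with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def peek(path, count=80):
    """Return the first `count` bytes so they can be inspected by eye."""
    with open(path, "rb") as handle:
        return handle.read(count)


if __name__ == "__main__":
    # Usage: python check_gzip.py downloaded-file [more files...]
    for path in sys.argv[1:]:
        print(path, "gzip" if looks_like_gzip(path) else "NOT gzip", peek(path))
```

Running the same check on the original archive and on the downloaded copy shows whether the two files differ from the very first bytes.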

Related

Unable to read file from partitioned directory

I am unable to read a file from a partitioned directory in DBFS,
but other files are read without problems in normal scenarios.
Am I missing something? Is there an alternative?
[Screenshot: failed run]
Screengrab for successful run:
[Screenshot: successful run]
Please change the path in the failed scenario to /dbfs/<path> instead of dbfs:/
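As a small illustration of that advice, the sketch below rewrites a dbfs:/ URI into the /dbfs/ form. It assumes the failing code uses local file APIs (such as Python's open), which see DBFS through the /dbfs mount, while the dbfs:/ form is what Spark APIs take. The helper name and the example path are hypothetical.

```python
def to_local_dbfs_path(path):
    """Convert 'dbfs:/x/y' to '/dbfs/x/y'; leave other paths unchanged."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # Drop the scheme and any extra leading slashes (e.g. 'dbfs:///x').
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


# Hypothetical partitioned path, for illustration only.
print(to_local_dbfs_path("dbfs:/mnt/data/year=2020/part-0000.csv"))
```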

Pull from and Push to S3 using Perl

Hi everyone! I have what I assume to be a simple problem, but I could use a hand digging in. I have a server that preprocesses data before translation. This is done by a series of Perl scripts developed over a decade ago (but they work!). This virtual server is being lifted into AWS. The change this makes for my scripts is that the location they pull from and the location they write to will now be S3 buckets.
The workflow is: copy all files in the source location to the local drive, preprocess the data file by file, and when complete move the preprocessed files to a final destination.
process_file ($workingDir, $dirEntry);
final_move;
move("$downloadDir/$dirEntry", "$archiveDir") or die "ERROR: Archive file $downloadDir/$dirEntry -> $archiveDir FAILED $!\n";
unlink("$workingDir/$dirEntry");
So, in this case $dir and $archiveDir are S3 buckets.
Any advice on adapting this is appreciated.
TIA,
VtR
You have a few options.
Use a system like s3fs-fuse to mount your S3 bucket as a local drive. This would presumably require the smallest changes to your existing code.
Use the AWS Command Line Interface to copy your files to your S3 bucket.
Use the Amazon API (through something like Paws) to upload your files to S3.
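To make the second option a bit more concrete, here is a rough sketch of wrapping the AWS CLI around the existing workflow: pull everything down before preprocessing, push the results up afterwards. It is written in Python only for illustration (a Perl script could issue the same commands with system()). The bucket names and local directories are placeholders, and the commands are only printed unless dry_run is turned off.

```python
import subprocess


def s3_copy_command(source, destination, recursive=True):
    """Build an `aws s3 cp` command line as an argument list."""
    command = ["aws", "s3", "cp", source, destination]
    if recursive:
        command.append("--recursive")
    return command


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


# Placeholder locations, not taken from the question.
run(s3_copy_command("s3://example-source-bucket/incoming/", "/data/download/"))
# ... the existing preprocessing would run here on the local copies ...
run(s3_copy_command("/data/processed/", "s3://example-archive-bucket/processed/"))
```

This keeps the preprocessing step unchanged, since it still works on local files.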

SFTP file uploading and downloading at same time

A cronjob runs every 3 hours to download a file using SFTP. The scheduled program is written in Perl and the module used is Net::SFTP::Foreign.
Can Net::SFTP::Foreign download files that are only partially uploaded over SFTP?
If so, do we need to check the file's modification date on the SFTP server to detect that the copy has completed?
Suppose someone is uploading a new file over SFTP and the upload/copy is still in progress. If a download is attempted at the same time, do I need to code for the possibility of fetching only part of a file?
It's not a question of the SFTP client you use, that's irrelevant. It's how the SFTP server handles the situation.
Some SFTP servers may lock the file being uploaded, preventing you from accessing it, while it is still being uploaded. But most SFTP servers, particularly the common OpenSSH SFTP server, won't lock the file.
There's no generic solution to this problem. Checking for timestamp or size changes may work for you, but it's hardly reliable.
There are some common workarounds to the problem:
Have the uploader upload a "done" file once the upload finishes. Make your program wait for the "done" file to appear.
Have a dedicated "upload" folder and have the uploader (atomically) move the uploaded file to a "done" folder. Make your program look in the "done" folder only.
Have a file naming convention for files being uploaded (".filepart") and have the uploader (atomically) rename the file to its final name after the upload. Make your program ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some FTP servers have this functionality built in. For example, ProFTPD with its HiddenStores directive.
A gross hack is to periodically check the file attributes (size and time) and consider the upload finished if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have a clear end-of-file marker (like XML or ZIP), so you can tell when you have downloaded an incomplete file.
For details, see my answer to SFTP file lock mechanism.
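On the download side, the "done" file and ".filepart" conventions above come down to filtering the remote directory listing before fetching anything. A minimal sketch, written in Python for illustration and working on a plain list of file names (however that list is obtained): the ".done" marker suffix and the function names are assumptions, not part of any library.

```python
def without_partials(names, partial_suffix=".filepart"):
    """Drop names that follow the in-progress naming convention."""
    return [name for name in names if not name.endswith(partial_suffix)]


def with_done_marker(names, marker_suffix=".done"):
    """Keep only data files that have a matching marker file in the listing."""
    listing = set(names)
    return sorted(
        name for name in listing
        if not name.endswith(marker_suffix) and name + marker_suffix in listing
    )


# Example listing; the file names are made up.
listing = ["a.csv", "a.csv.done", "b.csv", "c.csv.filepart"]
print(without_partials(listing))
print(with_done_marker(listing))
```

Either filter relies on the uploader following the convention, which is why these workarounds need cooperation from the uploading side.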
The easiest way to do that when the upload process is also under your control is to upload files using temporary names (for instance, foo-20170809.tgz.temp) and, once the upload finishes, rename them (the Net::SFTP::Foreign put method supports the atomic option, which does just that). Then, on the download side, filter out the files whose names correspond to temporary files.
Anyway, the Net::SFTP::Foreign get and rget methods can be instructed to resume a transfer by passing the option resume => 1.
Also, if you have full SSH access to the SFTP server, you could check if some other process is still writing to the file to be downloaded using fuser or some similar tool (though, note that even then, the file may be incomplete if for instance there is some network issue and the uploader needs to reconnect before resuming the transfer).
You can check the size of the file:
1. Connect to SFTP.
2. Check file size.
3. Sleep for 5/10 seconds.
4. Check file size again.
5. If the size did not change, download the file; if the size changed, go back to step 3.
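The polling loop described above could look roughly like this. It is a sketch in Python rather than Perl, and it takes the size lookup as a callable (get_size is a placeholder for whatever stat call the SFTP client offers), so the loop itself does not depend on a particular library. As noted in an earlier answer, an unchanged size is only a heuristic: a stalled upload looks the same as a finished one.

```python
import time


def wait_until_stable(get_size, interval=5.0, max_checks=10, sleep=time.sleep):
    """Poll get_size() until two consecutive readings match.

    Returns the stable size, or None if the size was still changing
    after max_checks comparisons.
    """
    previous = get_size()
    for _ in range(max_checks):
        sleep(interval)
        current = get_size()
        if current == previous:
            return current
        previous = current
    return None


# Simulated size readings standing in for repeated remote stat calls.
readings = iter([100, 250, 400, 400])
print(wait_until_stable(lambda: next(readings), interval=0))
```

The max_checks limit is there so that a file which never stops growing does not block the cron job forever.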

On resume gsutil seems to re-upload files

I'm trying to upload data to Google Cloud Storage from a disk with ~3000 files totalling 1TB. I'm using gsutil cp -R <disk-top-directory> <bucket>. My understanding is that, if gsutil is resumed/restarted, it uses checksums to determine when a file has already been uploaded and skips over it.
It doesn't appear to be doing this: it appears to be resuming the upload from the top and replacing the files all over again. When I run successive gsutil ls -Rl <bucket/disk-top-directory> ten minutes apart and compare them with diff, I see what appears to be the same files with the same sizes but a changed (newer) date. (i.e. consistent with the same file being re-uploaded.)
For example:
< 404104811 2014-04-08T14:13:44Z gs://my-bucket/disk-top-directory/dir1/dir2/dir3/dir4/dir5/file-20.tsv.bz2
---
> 404104811 2014-04-08T14:43:48Z gs://my-bucket/disk-top-directory/dir1/dir2/dir3/dir4/dir5/file-20.tsv.bz2
The machine I'm using to read the disk and transfer files is running Ubuntu 13.10. I installed gsutil using the pip instructions for Debian and Ubuntu.
Am I misunderstanding how gsutil's resumable transfers are supposed to work? If not, is there any diagnosis and fix to get the correct resume behavior? Thanks in advance!
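The listing comparison described in the question can also be scripted instead of using diff by eye. A small sketch in Python, assuming each object line of the gsutil ls -l output has the form "size timestamp url" as in the example above; the function names are illustrative only.

```python
def parse_listing(text):
    """Map object URL -> (size, timestamp) from `gsutil ls -l` style lines."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip directory headers, totals and anything else that is not an object line.
        if len(parts) == 3 and parts[0].isdigit() and parts[2].startswith("gs://"):
            size, stamp, url = parts
            entries[url.strip()] = (int(size), stamp)
    return entries


def reuploaded(before, after):
    """Return URLs whose size is unchanged but whose timestamp differs."""
    old, new = parse_listing(before), parse_listing(after)
    return sorted(
        url for url, (size, stamp) in new.items()
        if url in old and old[url][0] == size and old[url][1] != stamp
    )
```

Feeding it two saved listings taken ten minutes apart gives the set of objects that appear to have been written again in between.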
You need to use the -n (No-clobber) switch to prevent the re-uploading of objects that already exist at the destination.
gsutil cp -Rn <disk-top-directory> <bucket>
From the help (gsutil help cp)
-n No-clobber. When specified, existing files or objects at the
destination will not be overwritten. Any items that are skipped
by this option will be reported as being skipped. This option
will perform an additional HEAD request to check if an item
exists before attempting to upload the data. This will save
retransmitting data, but the additional HTTP requests may make
small object transfers slower and more expensive.
Also, according to this, when transferring files over 2 MB, gsutil automatically uses a resumable transfer mode.
If you're open to working with the (still beta) gsutil v4, that version of gsutil has an rsync command. You can get this by running:
gsutil update gs://prerelease/gsutil_4.0beta2pre_minus_m_sugg.tar.gz
Please be sure to read the release notes before switching to this major new release, especially if you're using gsutil v3 in scripts.

Windows service run by domain account cannot access file despite full control

I have created a C# service that:
- Picks up and opens a local text file
- Opens an Excel-file used as template (saved locally)
- Fills in the data from the text file in the excel file
- Saves the Excel file to a network folder.
The service runs under a domain account (our network admin will not let me give the local system account rights on the network). When the service tries to open the template, I get an access denied error:
Microsoft Excel cannot access the file 'C:\BloxVacation\Template\BloxTemplate.xlsm'. There are several possible reasons:
• The file name or path does not exist.
• The file is being used by another program.
• The workbook you are trying to save has the same name as a currently open workbook.
The file does exist and the path is correct.
The file is not used by another user or program.
I try to OPEN the workbook (no other workbook is open), not SAVE it.
I have received the same error using the system account. The reason for this is that, when using interopservices, the system account needs a desktop folder (bug in Windows 7: http://forums.asp.net/t/1585488.aspx).
C:\Windows\System32\config\systemprofile\Desktop
C:\Windows\SysWOW64\config\systemprofile\Desktop
Create those 2 folders and the error disappears for the system account.
I have given the domain user rights to those folders and the error disappears as well; however, the service hangs on the line of code where I open the Excel file. When I execute the exact same code with the system account, the code executes fine (note: I save the file locally).
objXL.Workbooks.Open(BloxVacationService.ExcelTemplateFilePath)
Does anybody have an idea how to solve this issue without having to rewrite the entire service in OpenXML? Thank you very much in advance.
If you have done all the things described in the question and it still doesn't work (as was the case for me), the answer is pretty simple:
Make the domain user a local admin on the machine that runs the service. That solved the problem.