Does IPFS provide block-level file copying feature? - delta

Update 11-14-2019: Please see the following discussion for feature request to IPFS: git-diff feature: Improve efficiency of IPFS for sharing updated file. Decrease file/block duplication
There is a .tar.gz file, which contains a data.txt file, file.tar.gz (~1 GB) stored in my IPFS-repo, which is pulled from another node-a.
I open the data.txt file and added a single character in a random locations in the file(beginning of the file, middle of the file, and end of the file), and compress it again as file.tar.gz and store it in my IPFS-repo.
When node-a wants to re-get the updated tar.gz file a re-sync will take place.
I want to know using IPFS whether the entire 1 GB of file will be synced or there exists a way by which parts of the file that is changed (called the delta) get synced.
Similar question is asked for the Google Drive: Does Google Drive sync entire file again on making small changes?
The feature you are asking is called "Block-Level File Copying". With
this feature, when you make a change to a file, rather than copying
the entire file from your hard drive to the cloud server again, only
the parts of the file that changed (called the delta) get sent.
As far as I know, Drobox, pCloud, and OneDrive, which however only supports it for Microsoft Office documents, offers block level sync.

Related

how to create a script that allows to use the path list as a reference for copying files in PowerShell in .bat script

I'm looking for a way to automate archiving where after I plug my two external drives I can copy all my resources. The problem is that I have different file structures on my laptop and on both external drives so I need to select specific folders to be copied. It means that I can't select one root folder and copy it straightforward. I tried to find a way to declare more than one path in the cp command and in the copy command, without success. An example path:
/my_programming_stuff
/folder1
/folder2
/folder3
/folder4
I want to select only the first 3 folders to copy them into external drive1 and external drive 2. The idea is to create a .bat file that will copy everything at once ( in the best case scenario it will be copied simultaneously on both external drives, so it will be much faster). Another problem is that there needs to be a bypass the ntfs long path limitations (max. 260 characters).
Flags that I want to use:
Copy the files and directories and all of their attributes,
including ownerships and permissions.
Recursively copy directories and their contents.
When copying files from one directory to another, only
copy files that either doesn't exist or are newer than the
existing corresponding files, in the destination
directory.
data verification (so it's certain that the copy was verified)
progression bar with time eta
Until now I was using Total Commander to do this but every day I need to pick only a few folders to be copied which takes time and is inefficient.
I have experience with Bash and PowerShell but I am not sure how to handle this topic.
Create a static batch file with robocopy commands. I think /copyall is the only switch you need to specify for all this. Other defaults should satisfy requirements.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy
I think your time will be better spent learning how to use either FastCopy or FreeFileSynce. I used FreeFileSync some years ago but got disgusted with the it's constantly changing format of its xml file used for starting a backup, so I switched to FastCopy. But it looks like FreeFileSync may be getting their act together and I aim to do some experiments over the summer to see if I want to switch back to it.
Both can handle the long filename format issues, both can be executed by a batch file, both seem to have a lot of quality, but FreeFileSync has more features - and more bloated because of the features. But speed wise, I think FastCopy is probably one of the better products out there and very streamline in use and design.

Force overwrite or delete file in use (executable that currently runs)

I'm looking for solution to delete or (preferably directly) overwrite source of an exe file while it is running.
To explain further before you get it all wrong, I'll give an example:
I have an exe file on drive D:\ which I run (with previously posted question's answer, giving params to "Start in" folder on C:\Program Files\MyProgram\" so it finds its dlls.
Now after the file is running, I'd like to rewrite the file's byte stream (just like opening it in hex editor...), or at least delete it so I can copy over new exe file directly using same name.
So far the solution I'm using is that I trigger format D: command for the whole drive D:\ (which, in my case is ramdisk and thumb-drive, as I only have this exe on it, I copy it there as necessary), since that removes the file and let's me copy new file there.
Trying to use del myProgram.exe even with -force flag triggers error that access to the file is denied. Same goes if I try to overwrite the contents of the file.
Is there any alternative to do that without using the format command, as that requires to have partition drive only for the purpose?
Update: Note: MoveFileEx and similar techniques that require termination of the process or system restart/reboot are not qualified as a solution. This should be done while the process is running without further actions that can compromise the process's run state.
On a side note, when formatting the drive using the Powershell's format command, the file is gone, although if viewing the partition using Hex viewer tool, there is full binary (hex) content of the exe visible there and an be restored using just as simple as copy-paste technique. This is one of the points as to where overwriting the file contents would be preferable than deleting the file directly.
Please note: This is a knowledge and skills based question, and would therefore appreciate sparing the moral and security-concerning comments about such actions and behaviour.
For deleting/replacing/overwriting a file at least two conditions must be met:
The user performing the operation must have the required permissions to do so. This can be verified for instance via Get-Acl or icacls.
Windows must not have an open handle to the file. This can be checked for instance with tools like Process Explorer or handle. These tools can also be used to forcibly close open handles, although that's not recommended as it may cause data loss and/or damage to the files in question. I'm not sure, though, if it's actually possible to close handles to an executable without terminating the process.
Note that antivirus software is likely to interfere with this kind of operation.
The basic problem here is that Windows loads from the .EXE upon demand, it's not all read in at once.
If you destroy the original file what happens when it tries to load in a page that no longer exists?
If I had to write something of this sort I would copy the .exe to a temporary location (beware that running code from the temp directory may be prohibited), run the new .exe, terminate the old one and then do what I want to it.

Copy multiple .gz files from one GCS bucket to another bucket in Java

I want to copy multiple .gz files from one gcs bucket to another. File name pattern has prefix as 'Logs_' and suffix as date like '20160909',so full file name will be Logs_2016090.gz, Logs_20160908.gz etc. I want to copy all files starting with Logs_ from one gcs bucket to another gcs bucket. For this I am using wildcard character * at the end like Logs_*.gz for copy operation as below:
Storage.Objects.Copy request =
storageService
.objects()
.copy("source_bucket", "Logs_*.gz", "destination_bucket", ".", content);
Above I am using "." because all files has to be copied to destination_bucket, so I can't specify single file name there. Unfortunately, this code doesn't work and error that file doesn't exist. I am not sure what change is required here. Any java link or any piece of code will be helpful. Thanks !!
While the gsutil command-line utility happily supports wildcards, the GCS APIs themselves are lower level commands and do not. The storage.objects.copy method must have one precise source and one precise destination.
I recommend one of the following:
Use a small script invoking gsutil, or
Make a storage.objects.list call to get the names of all matching source objects, then iterate over them, calling copy for each, or
If you're dealing with more than, say, 10 TB or so of gzip files, consider using Google's Cloud Storage Transfer Service to copy the files.

Talend tWaitForFile insufficiency

We have a producer process that write files into a specific folder, which run continuously, we have to read files one by one using talend, there is 2 issues:
The 1st: tWaitForFile read only files which exist before its starting, so files which have created after the component starting are not visible for it.
The 2nd: There is no way to know if the file is released by the producer process, it may be read while it is not completely written, the parameter _wait_release_ of tWaitForFile does not work on Linux system !
So how can make Talend read complete written files from a directory that have an increasing files number ?
I'm not sure what you mean by your first issue. tWaitForFile has options to trigger when files are created, modified or deleted in a folder.
As for the second issue, your best bet here is for the file producer to be creating an OK or control file which is a 0 byte touch when it has finished writing the file you want.
In this case you simply look for the appearance of the OK file and then pick up the relevant completed file. If you name the 2 files the same but with a different file extension (the OK file is typically called ".OK" then this should be easy enough to look for. So you would set your tWaitForFile to look for "*.OK" files and then connect this to an iterate to a tFileInputDelimited (in the case you want to pick up a delimited text file) and then declare the file name as ((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).substring(0,((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).length()-3) + ".txt"
I've included some screenshots to help you below:

System.IO - Does BinaryReader/Writer read/write exactly what a file contains? (abstract concept)

I'm relatively new to C# and am attempting to adapt a text encryption algorithm I designed in wxMaxima into a Binary encryption program in C# using Visual Studio forms. Because I am new to reading/writing binary files, I am lacking in knowledge regarding what happens when I try to read or write to a filestream.
For example, instead of encrypting a text file as I've done in the past, say I want to encrypt an executable or any other form of binary file.
Here are a few questions I don't understand:
When I open a file stream and use binaryreader will it read in an absolute duplicate of absolutely everything in the file? I want to be able to, for example, read in an entire file, delete the original file, then create a new file with the old name and write the entire binary stream back. Will this reproduce the original file exactly or will there be some sort of corruption that must otherwise be accounted for?
Because it's an encryption program, I was hoping to add in a feature that would low-level "format" the original file before deleting it so it would be theoretically inaccessible by combing the physical data of a harddisk. If I use binarywriter to overwrite parts of the original file with gibberish will it be put on the same spot on the harddisk or will the file become fragmented and actually just redirect via the FAT to some other portion of the harddisk? Obviously there's no point in overwriting the original file with gibberish if it's not over-writing the original cluster on the harddisk.
For your first question: A BinaryReader is not what you want. The name is a bit misleading: it "Reads primitive data types as binary values in a specific encoding." You probably want a FileStream.
Regarding the second question: That will not be easy: please see the "How SDelete Works" section of SDelete for an explanation. Brief extract in case that link breaks in the future:
"Securely deleting a file that has no special attributes is relatively straight-forward: the secure delete program simply overwrites the file with the secure delete pattern. What is more tricky is securely deleting Windows NT/2K compressed, encrypted and sparse files, and securely cleansing disk free spaces.
Compressed, encrypted and sparse are managed by NTFS in 16-cluster blocks. If a program writes to an existing portion of such a file NTFS allocates new space on the disk to store the new data and after the new data has been written, deallocates the clusters previously occupied by the file."