OpenStack Swift client for the fastest synchronisation of a huge number of files?

I have a folder with a lot of files (~50k, ~3 GB). I need to sync this folder recursively to a container in my OpenStack Swift-like storage.
I have tried the Cyberduck CLI (duck), but it crashes on the huge file list during the prepare step.
I am trying the supload utility, but it is very slow :(
Can somebody recommend a better approach (ideally a CLI tool) for this situation?

You should use the official python-swiftclient package and simply:
# load your openstack credentials
source openrc.sh
cd path_to_directory_you_want_to_sync
# upload all files recursively, preserving relative paths;
# --changed skips files that have not changed since the last upload
swift upload --changed your_container *
Swift does not support rsync-like synchronization, but I use this little script to delete from the container the files you deleted locally, and to upload new files without asking Swift to compare every file:
#!/bin/bash
# Usage: ./swift_sync.sh <container> <directory>
cd "$2" || exit 1
# Compare the local file list with the container listing:
# lines starting with ">" exist only remotely, "<" only locally.
diff <(find * -type f -print | sort) <(swift list "$1" | sort) | while read -r x; do
    if [[ $x == \>* ]]; then
        echo "Need to delete ${x:2}"
        swift delete "$1" "${x:2}"
    elif [[ $x == \<* ]]; then
        echo "Need to upload ${x:2}"
        swift upload "$1" "${x:2}"
    fi
done
cd -
Use it with:
./swift_sync.sh your_container directory_to_sync

Try http://rclone.org/docs/
It has "sync" operation and "Bandwidth limit".
If this not solve your speed problem, the root cause is not on the clients.
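For reference, a minimal rclone sketch, assuming you have already created a Swift remote with rclone config (the remote name myswift is hypothetical, and --bwlimit is optional):
# sync the local directory to your container through the Swift remote
rclone sync --bwlimit 8M path_to_directory_you_want_to_sync myswift:your_container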

Related

GSUTIL CP using file size

I am trying to copy files from a directory on my Google Compute Engine instance to a Google Cloud Storage bucket. I have it working; however, there are ~35k files but only ~5k have any data in them.
Is there any way to only copy files above a certain size?
I've not tried this but...
You should be able to do this using a resumable transfer and setting the threshold to 5k (it defaults to 8 MiB). See: https://cloud.google.com/storage/docs/gsutil/commands/cp#resumable-transfers
It may be advisable to set BOTO_CONFIG specifically for this copy, (a) to be intentional and (b) to remind yourself how it works. See: https://cloud.google.com/storage/docs/boto-gsutil
Resumable uploads have the added benefit, of course, of resuming after any failures.
Recommendation: try this on a small subset first and confirm it works to your satisfaction.
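A hypothetical sketch of that setup (the config path and threshold value are placeholders; resumable_threshold is in bytes):
# one-off boto config, e.g. /tmp/boto_for_copy, containing:
#   [GSUtil]
#   resumable_threshold = 5120
BOTO_CONFIG=/tmp/boto_for_copy gsutil -m cp -r ./mydir gs://my-bucket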
While it's not possible to do this with gsutil alone, you can do it by filtering the file names and using the -I flag on the cp command to process them. If you're using a Linux Compute Engine instance, you can do it with the du and awk commands:
du -b * | awk '{ if ($1 > 1000) print $2 }' | gsutil -m cp -I gs://bucket2
The command gets the size in bytes of each file in the current directory with du -b (plain du reports sizes in 1 KiB blocks, so the -b flag matters here) and only copies the files larger than 1000 bytes to bucket2; you can change that value to adjust it to your needs. Note that awk prints only the second field, so filenames containing whitespace will break this pipeline.
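Alternatively, a sketch that avoids the du block-size subtlety by using find's -size test, where +1000c means "more than 1000 bytes" (assumes GNU find for -maxdepth):
find . -maxdepth 1 -type f -size +1000c | gsutil -m cp -I gs://bucket2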

Utility/Tool to get hash value of a data block in ext3

I have been searching for a utility/tool that can provide the md5sum (or any other checksum) of a data block referenced by an ext3 inode.
The requirement is to verify whether certain data blocks get zeroed after a particular operation.
I am new to file systems and do not know whether any existing tool can do the job, or whether I need to write this test utility myself.
Thanks...
A colleague provided a very elegant solution. Here is the script.
It takes the name of a file as a parameter and assumes a filesystem block size of 4 KB.
A further extension of this idea:
If you know the data blocks associated with the file (via stat), you can use the 'skip' option of the 'dd' command to build small files, each one block long, and then take the md5sum of each. This way you can get md5sums directly from the block device. Not something you would want to do every day, but a nice analytical trick.
==================================================================================
#!/bin/bash
# Takes the target file name as its only argument; assumes 4 KB blocks
absname=$1
testdir="/root/test/"
mdfile="md5"
blksize=4096
fname=$(basename "$absname")
# File size in bytes (parsing ls output with cut is fragile; stat is reliable)
fsize=$(stat -c %s "$absname")
# Number of whole blocks; a trailing partial block is ignored
numblk=$(( fsize / blksize ))
x=1
# Create the test directory, if it does not exist already
if [[ ! -d $testdir ]]; then
    mkdir -p "$testdir"
fi
# Split the file into 1-block pieces and record the md5sum of each
while [[ $x -le $numblk ]]; do
    (( s = x - 1 ))
    dd if="$absname" of="$testdir$fname$x" bs="$blksize" count=1 skip="$s" 2>/dev/null
    md5sum "$testdir$fname$x" >> "$testdir$mdfile"
    (( x = x + 1 ))
done
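For the block-device extension described above, a hypothetical sketch (the device /dev/sdb1, the in-filesystem path, and block number 123456 are all placeholders; the filesystem should be unmounted or quiescent so the blocks are stable):
# list the block numbers allocated to a file on an ext2/3/4 filesystem
debugfs -R "stat /path/inside/fs/myfile" /dev/sdb1
# then hash a single 4 KiB block straight from the block device
dd if=/dev/sdb1 bs=4096 skip=123456 count=1 2>/dev/null | md5sum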

Recursively replace colons with underscores in Linux

First of all, this is my first post here and I must specify that I'm a total Linux newb.
We have recently bought a QNAP NAS box for the office. On this box we have a large amount of data which was copied off an old Mac Xserve machine. A lot of files and folders originally had forward slashes in their names (HFS+ should never have allowed this in the first place), which, when copied to the NAS, were all replaced with colons.
I now want to rename all colons to underscores, and have found the following commands in another thread here: pitfalls in renaming files in bash
However, the flavour of Linux on this box does not understand the rename command, so I'm having to use mv instead. I have tried the code below, but it only works for files in the current folder; is there a way to change it to include all subfolders?
for f in *.*; do mv -- "$f" "${f//:/_}"; done
I have found that I can locate all the files and folders in question using the find command as follows:
Files:
find . -type f -name "*:*"
Folders:
find . -type d -name "*:*"
I have been able to export a list of the results above by using
find . -type f -name "*:*" > files.txt
I tried the command below, but find gives an error saying it doesn't understand the -exec switch. Is there a way to pipe this all into one command, or could I somehow use the files I exported previously?
find . -depth -name "*:*" -exec bash -c 'dir=${1%/*} base=${1##*/}; mv "$1" "$dir/${base//:/_}"' _ {} \;
Thank you!
Vincent
So your for loop code works, but only in the current dir. Also, you are able to use find to build a file with all the files with : in the filename.
So, as you've already done all this, I would just loop over each line of your file, and perform the same mv command.
Something like this:
for f in `cat files.txt`; do mv $f "${f//:/_}"; done
EDIT:
As pointed out by tripleee, using a while loop is a better solution (the for/cat version splits filenames on whitespace), e.g.:
while read -r f; do mv "$f" "${f//:/_}"; done <files.txt
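If you also need to rename the folders (files.txt only lists files), a sketch that pipes find straight into the same loop; -depth makes find print the deepest entries first, so a directory's contents are renamed before the directory itself:
find . -depth -name "*:*" | while IFS= read -r f; do
    dir=${f%/*}
    base=${f##*/}
    mv -- "$f" "$dir/${base//:/_}"
done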
Hope this helps.
Will

File movement issue on NFS file system on Unix box

Currently there are 4.5 million files in a single directory on an NFS file system. As a result, any read or write operation on that directory causes a huge delay.
To overcome this problem, all the files in that directory will be moved into different directories based on their year of creation.
Apparently, the find command we are using with the -ctime option is not working because of the huge number of files.
We tried listing the files by year of creation and then feeding the list to a script that moves them in a for loop, but even that failed, as ls -lrt simply hung.
Is there any other way to tackle this problem?
Please help.
Script contents:
1) filelist.sh
ls -tlr|awk '{print $8,$9,$6,$7}'|grep ^2011|awk '{print $2,$1,$3,$4}' 1>>inboundstore_$1.txt 2>>Error_$1.log
ls -tlr|awk '{print $8,$9,$6,$7}'|grep ^2011|wc -l 1>>count_$1.log
2) filemove.sh
INPUT_FILE=$1       ## text file which has the list of files from the previous script
FINAL_LOCATION=$2   ## destination directory
if [ -r $INPUT_FILE ]
then
    for file in `cat $INPUT_FILE`
    do
        echo "TIME OF FILE COPY OF [$file] IS : `date`" >> xyz/IBSCopyTime.log
        mv $file $FINAL_LOCATION
    done
else
    echo "$INPUT_FILE does not exist"
fi
Use the readdir iterator: process directory entries one at a time as they are read, instead of asking ls to stat and sort all 4.5 million entries before it can print anything.
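For instance, find streams entries via readdir() and never sorts them, so it copes where ls -lrt hangs. A sketch assuming GNU find and mv, that the year you are matching is the modification year, and a placeholder destination /archive/2011:
mkdir -p /archive/2011
# move every regular file last modified during 2011, batching moves with {} +
find . -maxdepth 1 -type f -newermt 2011-01-01 ! -newermt 2012-01-01 \
    -exec mv -t /archive/2011 {} +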

How can I remove a file based on its creation date time in Perl?

My webapp is hosted on a Unix server, using MySQL as the database.
I wrote a Perl script to back up my database. The Perl script is inside the cgi-bin folder and it is working. I only need to set up the cron job and run the Perl script once a day.
The backups are stored in a folder named db_backups. However, I also want to add a command inside my Perl script to remove any files inside the folder db_backups that are older than, say, 10 days.
I have searched high and low for Unix commands and cannot find anything that matches what I need.
if (-M $file > 10) { unlink $file }   # -M gives the file's age in days
or, coupled with File::Find::Rule
my $ten_days_ago = time() - 10 * 86400;
my @to_delete = File::Find::Rule->file()
                                ->mtime("<=$ten_days_ago")
                                ->in("/path/to/db_backup");
unlink @to_delete;
On Unix you can't, because the file's creation date is not stored in the filesystem.
You may want to check out stat, and -M (modification time)/-C (inode change time)/-A (access time) if you want a simple expression with relative timestamps (how long ago).
i have searched high and low for unix commands
and cannot find anything that matches what i needed.
Check out find(1) and xargs(1). Warning: these commands may change your life at the shell prompt.
$ find /path/to/backup -type f -mtime +10 -print0 | xargs -0 echo rm -f
When you're confident that it will Do What You Want (tm), remove the echo. It says, roughly: starting in /path/to/backup, descend looking for plain files whose mtime is more than 10 days old, and print their names to xargs, which passes those names to rm in batches.
(-print0 and its complement -0 are GNU extensions, available on Linux, which let you deal with whitespace in filenames safely.)
You should be able to do it without resorting to Unix commands. Loop through the files in your directory, use stat on each file to get its last-modified time, then unlink the file if it is older than you want.
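A minimal sketch of that loop in Perl (the /path/to/db_backups prefix and the 10-day cutoff are placeholders):
opendir(my $dh, "/path/to/db_backups") or die "opendir: $!";
while (defined(my $name = readdir $dh)) {
    my $file = "/path/to/db_backups/$name";
    next unless -f $file;                 # skip . .. and subdirectories
    my $mtime = (stat $file)[9];          # last-modify time, epoch seconds
    unlink $file if time() - $mtime > 10 * 86400;
}
closedir $dh;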