How to print the progress of files being copied in bash [duplicate]

I suppose I could compare the number of files in the source directory to the number of files in the target directory as cp progresses, or perhaps do it with folder size instead? I tried to find examples, but all bash progress bars seem to be written for copying single files. I want to copy a bunch of files (or a directory, if the former is not possible).

You can also use rsync instead of cp like this:
rsync -Pa source destination
This gives you a progress bar and an estimated time of completion. Very handy.

To show a progress bar while doing a recursive copy of files & folders & subfolders (including links and file attributes), you can use gcp (easily installed in Ubuntu and Debian by running "sudo apt-get install gcp"):
gcp -rf SRC DEST
Here is the typical output while copying a large folder of files:
Copying 1.33 GiB 73% |##################### | 230.19 M/s ETA: 00:00:07
Notice that it shows just one progress bar for the whole operation. If you want a progress bar per file instead, you can use rsync:
rsync -ah --progress SRC DEST

You may want to have a look at the tool vcp. It's a simple copy tool with two progress bars: one for the current file and one for the overall operation.
EDIT
Here is the link to the sources: http://members.iinet.net.au/~lynx/vcp/
Manpage can be found here: http://linux.die.net/man/1/vcp
Most distributions have a package for it.

Here is another solution: use the tool bar.
You could invoke it like this:
#!/bin/bash
filesize=$(du -sb "$1" | awk '{ print $1 }')
tar -cf - -C "$1" ./ | bar --size "$filesize" | tar -xf - -C "$2"
You have to go through tar, and it will be inaccurate for small files. You must also make sure that the target directory exists. But it is a way.

My preferred option is Advanced Copy, as it uses the original cp source files.
$ wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.32.tar.xz
$ tar xvJf coreutils-8.32.tar.xz
$ cd coreutils-8.32/
$ wget --no-check-certificate https://raw.githubusercontent.com/jarun/advcpmv/master/advcpmv-0.8-8.32.patch
$ patch -p1 -i advcpmv-0.8-8.32.patch
$ ./configure
$ make
The new programs are now located in src/cp and src/mv. You may choose to replace your existing commands:
$ sudo cp src/cp /usr/local/bin/cp
$ sudo cp src/mv /usr/local/bin/mv
Then you can use cp as usual, or specify -g to show the progress bar:
$ cp -g src dest

A simple Unix way is to go to the destination directory and run watch -n 5 du -s . (you could make it prettier by rendering the result as a bar). This helps in environments where you have only the standard Unix utilities and cannot install anything extra. du -s is the key; watch just reruns it every 5 seconds.
Pros: works on any Unix system. Cons: no progress bar.
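If you do want a bar on top of the du numbers, here is a minimal sketch in plain bash. The function names draw_bar and dir_kb are made up for this example, and it assumes that comparing du -sk of the two directories is a good enough estimate of progress (it will drift a little because of filesystem overhead).

```shell
#!/usr/bin/env bash
# draw_bar DONE TOTAL [WIDTH]: print a text bar such as "[#####-----]  50%".
# (Hypothetical helper, not part of any standard tool.)
draw_bar() {
    local done_n=$1 total=$2 width=${3:-20}
    if (( total <= 0 )); then total=1; fi            # avoid division by zero
    if (( done_n > total )); then done_n=$total; fi  # clamp to 100%
    local filled=$(( done_n * width / total ))
    local percent=$(( done_n * 100 / total ))
    local bar="" i
    for (( i = 0; i < width; i++ )); do
        if (( i < filled )); then bar+="#"; else bar+="-"; fi
    done
    printf '[%s] %3d%%\n' "$bar" "$percent"
}

# dir_kb DIR: total size of DIR in kilobytes, as reported by du -sk.
dir_kb() {
    du -sk "$1" | awk '{ print $1 }'
}
```

While cp runs in another terminal, something like while sleep 5; do draw_bar "$(dir_kb DEST)" "$(dir_kb SRC)"; done would redraw the bar every 5 seconds (SRC and DEST being your own paths).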

To add another option, you can use cpv. It uses pv to imitate the usage of cp.
It works like pv but you can use it to recursively copy directories
You can get it here

There's a tool pv to do this exact thing: http://www.ivarch.com/programs/pv.shtml
There's an Ubuntu package for it in apt.

How about something like
find . -type f | pv -s $(find . -type f | wc -c) | xargs -i cp {} --parents /DEST/$(dirname {})
It finds all the files in the current directory, pipes the list through pv with an estimated size so that the progress meter works, and then pipes it to a cp command with the --parents flag so that the DEST path matches the SRC path.
One problem I have yet to overcome is that if you issue this command
find /home/user/test -type f | pv -s $(find /home/user/test -type f | wc -c) | xargs -i cp {} --parents /www/test/$(dirname {})
the destination path becomes /www/test/home/user/test/....FILES... and I am unsure how to tell the command to get rid of the '/home/user/test' part. That is why I have to run it from inside the SRC directory.
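One possible way around the path problem is to do the cd inside a subshell, so the paths that find prints stay relative to SRC without changing the directory of your own shell. This is only a sketch: copy_tree is a name I made up, it assumes GNU cp (for --parents and -t) and GNU xargs (for -d and -r), and it counts files rather than bytes, falling back to cat when pv is not installed.

```shell
#!/usr/bin/env bash
# copy_tree SRC DEST: copy every regular file under SRC into DEST,
# keeping paths relative to SRC. (Hypothetical helper; needs GNU cp/xargs.)
copy_tree() {
    local src=$1 dest total
    mkdir -p "$2" || return 1
    dest=$(cd "$2" && pwd) || return 1     # absolute path, because we cd below
    local meter=(cat)                      # fallback when pv is not installed
    if command -v pv >/dev/null 2>&1; then
        total=$(cd "$src" && find . -type f | wc -l)
        meter=(pv -l -s "$total")          # -l: count lines, one line per file
    fi
    (
        # The subshell keeps the cd from leaking into the caller.
        cd "$src" || exit 1
        find . -type f | "${meter[@]}" | xargs -d '\n' -r cp --parents -t "$dest"
    )
}
```

With this, copy_tree /home/user/test /www/test should produce /www/test/....FILES... rather than /www/test/home/user/test/....FILES....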

Check the source code for progress_bar in my git repository below:
https://github.com/Kiran-Bose/supreme
You can also try the custom bash script package supreme to see how the progress bar works with the cp and mv commands.
Functionality overview
(1)Open Apps
----Firefox
----Calculator
----Settings
(2)Manage Files
----Search
----Navigate
----Quick access
|----Select File(s)
|----Inverse Selection
|----Make directory
|----Make file
|----Open
|----Copy
|----Move
|----Delete
|----Rename
|----Send to Device
|----Properties
(3)Manage Phone
----Move/Copy from phone
----Move/Copy to phone
----Sync folders
(4)Manage USB
----Move/Copy from USB
----Move/Copy to USB

There is a command called progress (https://github.com/Xfennec/progress), a coreutils progress viewer.
Just run progress in another terminal to see the copy/move progress. For continuous monitoring, use the -M flag.

Related

gsutil command to delete old files from last day

I have a bucket in Google Cloud Storage with a tmp folder in it. Thousands of files are created in this directory each day. I want to delete files that are older than 1 day every night. I could not find a gsutil argument for this job, so I had to use a classic, simple shell script. But the files are being deleted very slowly.
I have 650K files accumulated in the folder, and 540K of them must be deleted. But my own shell script ran for 1 day and deleted only 34K files.
The gsutil lifecycle feature cannot do exactly what I want: it cleans the whole bucket. I just want to delete the files under a certain folder regularly, and I want the deletion to be faster.
I'm open to your suggestions and your help. Can I do this with a single gsutil command, or with a different method?
Simple script I created for testing (I prepared it to delete files in bulk temporarily):
## step 1 - I pull the files together with the date format and save them to the file list1.txt.
gsutil -m ls -la gs://mygooglecloudstorage/tmp/ | awk '{print $2,$3}' > /tmp/gsutil-tmp-files/list1.txt
## step 2 - I filter the information saved in the file list1.txt. Based on the current date, I save the old dated files to file list2.txt.
cat /tmp/gsutil-tmp-files/list1.txt | awk -F "T" '{print $1,$2,$3}' | awk '{print $1,$3}' | awk -F "#" '{print $1}' |grep -v `date +%F` |sort -bnr > /tmp/gsutil-tmp-files/list2.txt
## step 3 - After the above process, I add the gsutil delete command to the first line and convert it into a shell script.
cat /tmp/gsutil-tmp-files/list2.txt | awk '{$1 = "/root/google-cloud-sdk/bin/gsutil -m rm -r "; print}' > /tmp/gsutil-tmp-files/remove-old-files.sh
## step 4 - I set the script permissions and delete the old lists.
chmod 755 /tmp/gsutil-tmp-files/remove-old-files.sh
rm -rf /tmp/gsutil-tmp-files/list1.txt /tmp/gsutil-tmp-files/list2.txt
## step 5 - I run the shell script and I destroy it after it is done.
/bin/sh /tmp/gsutil-tmp-files/remove-old-files.sh
rm -rf /tmp/gsutil-tmp-files/remove-old-files.sh
There is a very simple way to do this, for example:
gsutil -m ls -l gs://bucket-name/ | grep 2017-06-23 | grep .jpg | awk '{print $3}' | gsutil -m rm -I
There isn't a simple way to do this with gsutil or object lifecycle management as of today.
That being said, would it be feasible for you to change the naming format for the objects in your bucket? That is, instead of uploading them all under "gs://mybucket/tmp/", you could append the current date to that prefix, resulting in something like "gs://mybucket/tmp/2017-12-27/". The main advantages to this would be:
Not having to do a date comparison for every object; you could run gsutil ls "gs://mybucket/tmp/" | grep "gs://[^/]\+/tmp/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/$" to find those prefixes, then do date comparisons on the last portion of those paths.
Being able to supply a smaller number of arguments on the command line (prefixes, rather than the name of each individual file) to gsutil -m rm -r, thus being less likely to pass in more arguments than your shell can handle.
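The date comparison on the prefixes could look roughly like this. It is only a sketch under the naming scheme suggested above: old_prefixes is a hypothetical helper, and it relies on YYYY-MM-DD dates sorting the same way as plain strings.

```shell
#!/usr/bin/env bash
# old_prefixes CUTOFF: read object prefixes on stdin and print those whose
# last path component is a YYYY-MM-DD date strictly earlier than CUTOFF.
# (Hypothetical helper for illustration.)
old_prefixes() {
    local cutoff=$1 prefix day
    while IFS= read -r prefix; do
        day=${prefix%/}          # strip the trailing slash
        day=${day##*/}           # keep only the last path component
        # Ignore anything that is not a date-named prefix.
        [[ $day =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue
        # ISO dates sort lexicographically, so a string comparison is enough.
        if [[ $day < $cutoff ]]; then
            printf '%s\n' "$prefix"
        fi
    done
}
```

A nightly job could then run something like gsutil ls "gs://mybucket/tmp/" | old_prefixes "$(date +%F)" | xargs -r gsutil -m rm -r, which removes every date prefix before today. I have not tested that pipeline against a real bucket, so try it with echo in front of the final gsutil first.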

Batch rename with command line

I have some files: file1.txt, file2.txt and I would like to rename them like this: file1.something.txt and file2.something.txt
I looked for some similar questions and I came up with this:
for i in file*.txt; do echo mv $i file*.something.txt; done
but unfortunately the output is:
mv file1.txt file*.something.txt
mv file2.txt file*.something.txt
and therefore only 1 file is created.
Could somebody please help?
(I am using a macbook air, I am not sure if this is relevant)
Thank you very much
Try this:
rename -n 's/\.txt$/.something.txt/' *.txt
(remove the -n switch when your tests are OK)
There are other tools with the same name which may or may not be able to do this, so be careful.
If you run the following command (GNU)
$ file "$(readlink -f "$(type -p rename)")"
and you have a result like
.../rename: Perl script, ASCII text executable
and not containing:
ELF
then this seems to be the right tool =)
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo update-alternatives --set rename /path/to/rename
(replace /path/to/rename with the path to your Perl rename command).
If you don't have this command, search your package manager to install it, or install it manually.
Last but not least, this tool was originally written by Larry Wall, the creator of Perl.
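If the Perl rename is not available (it is not installed by default on a Mac, as far as I know), a plain bash loop with parameter expansion does the same job. A minimal sketch; add_infix is just a name picked for this example:

```shell
#!/usr/bin/env bash
# add_infix INFIX FILE...: rename NAME.txt to NAME.INFIX.txt for each FILE.
# (Hypothetical helper for illustration.)
add_infix() {
    local infix=$1 f
    shift
    for f in "$@"; do
        # Skip names that do not end in .txt and globs that matched nothing.
        [[ $f == *.txt && -e $f ]] || continue
        # ${f%.txt} is the name with the trailing .txt removed.
        mv -- "$f" "${f%.txt}.$infix.txt"
    done
}
```

Calling add_infix something file*.txt turns file1.txt into file1.something.txt and file2.txt into file2.something.txt. The original attempt failed because file*.something.txt on the target side is never expanded per file; the new name has to be built from $i itself.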

How to rename files downloaded with wget -r

I want to download an entire website using the wget -r command and change the name of the file.
I have tried with:
wget -r -o doc.txt "http....
hoping that the OS would automatically create files in order, like doc1.txt, doc2.txt, but it actually saves the stdout stream in that file.
Is there any way to do this with just one command?
Thanks!
-r tells wget to recursively get resources from a host.
-o file saves log messages to file instead of standard error. I think that is not what you are looking for; I think you mean -O file.
-O file stores the resource(s) in the given file, instead of creating a file in the current directory with the name of the resource. If used in conjunction with -r, it causes wget to store all resources concatenated in that file.
Since wget -r downloads and stores more than one file, recreating the server's file tree on the local system, it makes no sense to give the name of a single file to store them in.
If what you want is to rename all downloaded files to match the pattern docX.txt, you can do it with a separate command after wget has finished:
wget -r http....
i=1
while IFS= read -r file
do
    mv "$file" "$(dirname "$file")/doc$i.txt"
    i=$(( i + 1 ))
done < <(find . -type f)

How to prevent "find" from diving deeper than the current directory

I have many directories with lots of files inside them.
I've just compressed each directory into filename.tar.gz, someothername.tar.gz, etc.
After compressing, I use this bash command to delete everything except files whose names contain .tar.gz:
find . ! -name '*.tar.gz*' | xargs rm -r
But the problem is that find dives too deep into the directories. Because a directory has already been deleted while find still descends into it, many messages are displayed, such as:
rm: cannot remove `./dirname/index.html': No such file or directory
So how can I prevent find from diving deeper than this level (the current directory)?
You can use ls instead of find for your problem:
ls | grep -v '\.tar\.gz' | xargs rm -rf
You can tell find the max depth to recurse:
find -maxdepth 1 ....
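Spelled out for the question above, the -maxdepth approach might look like the sketch below. keep_archives is a made-up name, and -mindepth/-maxdepth assume GNU or BSD find (they are not in POSIX).

```shell
#!/usr/bin/env bash
# keep_archives DIR: delete everything directly inside DIR except *.tar.gz.
# (Hypothetical helper for illustration; it deletes files, so test it first.)
keep_archives() {
    # -mindepth 1 skips DIR itself; -maxdepth 1 stops find from descending,
    # so rm -r never sees children of a directory it has already removed.
    find "$1" -mindepth 1 -maxdepth 1 ! -name '*.tar.gz' -exec rm -r -- {} +
}
```

Run it as keep_archives . from the directory that holds the archives. Replacing -exec rm -r -- {} + with -print first shows what would be deleted.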

How can you recursively copy all of the *.foo files in src to target using cp and/or find?

cp -v -ur path/to/jsps/ /dest/path/
The above command copies all of the files that have been updated from the source directory to the destination, preserving the directory structure.
What I can't figure out is how to copy only *.someExtention files. I know that you can use something like:
find -f -name *.jsp -exec some awesome commands {}
But I don't know how to do it (and I don't have time to read the info pages in detail).
All help is greatly appreciated.
Thanks,
LES
If you want to use find / cp then the following should do the trick:
find . -type f -name '*.jsp' -exec cp --parents {} /dest/path \;
but rsync is probably the better tool.
rsync might help - you can tell it to just copy certain files with a combination of include and exclude options, e.g.
rsync -a \
--include='*.foo' \
--include='*/' \
--exclude='*' \
path/to/jsps/ /dest/path/
See the manual and look at the section entitled FILTER RULES for more.