rename command doesn't rename - perl

This should work on my CentOS 6.6 but somehow the file name is not changed. What am I missing here?
rename -f 's/silly//' sillytest.zi
This should rename sillytest.zi to test.zi but the name is not changed. Of course I can use mv command but I want to apply to many files and patterns.

There are two different rename utilities commonly used on GNU/Linux systems.
util-linux version
On Red Hat-based systems (such as CentOS), rename is a compiled executable provided by the util-linux package. It’s a simple program with very simple usage (from the relevant man page):
rename from to file...
rename will rename the specified files by replacing the first occurrence of from in their name by to.
Newer versions also support a useful -v, --verbose option.
NB: If a file already exists whose name coincides with the new name of the file being renamed, then this rename command will silently (without warning) over-write the pre-existing file.
Example
Fix the extension of HTML files so that all .htm files have a four-letter .html suffix:
rename .htm .html *.htm
Example from question
To rename sillytest.zi to test.zi, replace silly with an empty string:
rename silly '' sillytest.zi
Perl version
On Debian-based systems, rename is a Perl script which is much more capable, as you get the benefit of Perl’s rich set of regular expressions.
Its usage is (from its man page):
rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]
rename renames the filenames supplied according to the rule specified as the first argument.
This rename command also includes a -v, --verbose option. Equally useful is its -n, --no-act option, which can be used as a dry run to see which files would be renamed. Also, it won’t over-write pre-existing files unless the -f, --force option is used.
Example
Fix the extension of HTML files:
rename 's/\.htm$/.html/' *.htm
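If it is unclear which rename variant a machine has, a plain shell loop around mv can stand in for the simple "replace the first occurrence" case. The following is only a minimal sketch: rename_first is a hypothetical helper name, not part of either rename utility, and unlike the util-linux version it skips a file when the target name already exists.

```shell
# rename_first FROM TO FILE...
# Replace the first occurrence of FROM in each file name with TO.
# (hypothetical helper, POSIX shell only)
rename_first() {
    from=$1
    to=$2
    shift 2
    for f in "$@"; do
        case $f in
            *"$from"*) ;;        # name contains FROM: carry on
            *) continue ;;       # nothing to replace in this name
        esac
        # text before the first FROM + TO + text after the first FROM
        new=${f%%"$from"*}$to${f#*"$from"}
        [ -e "$new" ] && { echo "skip $f: $new exists" >&2; continue; }
        mv -- "$f" "$new"
    done
}
```

With this defined, rename_first silly '' sillytest.zi and rename_first .htm .html *.htm mirror the two util-linux examples above.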

Related

How do I diff only certain files?

I have a list of files (a subset of the files in a directory) and I want to generate a patch that includes only the differences in those files.
From the diff manual, it looks like I can exclude (-x), but would need to specify that for every file that I don't want to include, which seems cumbersome and difficult to script cleanly.
Is there a way to just give diff a list of files? I've already isolated the files with the changes into a separate directory, and I also have a file with the list of filenames, so I can present it to diff in whichever way works best.
What I've tried:
cd filestodiff/
for i in *; do diff "/fileswithchanges/$i" "/fileswithoutchanges/$i" >> mypatch.diff; done
However patch doesn't see this as valid input because there's no filename header included.
patchutils provides filterdiff that can do this:
diff -ur old/ new/ | filterdiff -I filelist > patchfile
It is packaged for several Linux distributions.
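Where patchutils is not installed, a rough alternative (not the filterdiff approach above) is to loop over the list and let diff -u write the filename headers that patch expects. This is a sketch under assumptions: make_patch is a hypothetical helper, the list holds one relative path per line, and every listed file exists in both trees.

```shell
# make_patch OLD_DIR NEW_DIR LIST
# Print a unified diff covering only the files named in LIST.
# (hypothetical helper)
make_patch() {
    old=$1
    new=$2
    list=$3
    while IFS= read -r f; do
        [ -n "$f" ] || continue      # ignore blank lines in the list
        # diff exits with status 1 when the files differ; that is expected here
        diff -u "$old/$f" "$new/$f" || true
    done < "$list"
}
```

Usage would look like make_patch old new filelist > patchfile. Because the headers read old/NAME and new/NAME, the result is meant to be applied with patch -p1.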

Can we wget with file list and renaming destination files?

I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will avoid the generation of the host's subdirectories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my files-to-download.txt has these kind of files:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of URLs the way I do now, keeping my parameters.
b.- get all files in the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest 2 approaches -
Use the "-nc" or the "--no-clobber" option. From the man page -
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.
When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.
A combination with -O/--output-document is only accepted if the given output file does not exist.
Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However, the logic is sound, and with a few modifications you can get it to work on other platforms and scripting languages as well.
Sample pseudocode bash script -
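# read the list line by line instead of word-splitting the output of cat
while IFS= read -r url; do
    # keep your other flags, but drop -i (and -N, which does not combine with -O)
    wget <all your flags except the -i flag> -O /path/to/custom/directory/filename "$url"
done < list-of-files-to-download.txt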
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
This offers much more control over your downloads.
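One way to pick the per-file name in such a loop is to fold the URL's directory path into the file name, so that same-named pictures from different directories stay distinct. The sketch below is only an illustration: flat_name and download_all are hypothetical helpers, the "_" separator is an arbitrary choice, and -N is left out because it does not combine with -O, so every run downloads each file again.

```shell
# flat_name URL
# Print a local file name built from the URL path, with "/" replaced by "_".
# (hypothetical helper)
flat_name() {
    path=${1#*://}      # drop the scheme, e.g. "http://"
    path=${path#*/}     # drop the host name
    printf '%s\n' "$path" | tr '/' '_'
}

# download_all LIST DEST_DIR
# Fetch every URL in LIST into DEST_DIR under its flattened name.
# (hypothetical helper; add your own wget flags as needed)
download_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # ignore blank lines in the list
        wget -O "$2/$(flat_name "$url")" "$url"
    done < "$1"
}
```

For the list in the question, flat_name would turn http://website/directory1/picture-same-name.jpg into directory1_picture-same-name.jpg.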

GNU make: Can I delete obsolete files?

In my workflow, I have lots of xxx.smr files in a folder and I need to convert them into another file format, xxx_step3.mat, by importing some data from xxx_info.xlsx. I learned that GNU make is powerful at keeping all the files up-to-date.
In a very simple "explicit" format (without sophisticated wild card usage), Makefile for this process would look like this. To handle multiple xxx.smr files and their descendants, I should be able to do that by modifying this file.
.PHONY: all clean
all: xxx_step3.mat
xxx_step3.mat: xxx_step2.mat xxx_info.xlsx
	matlab -r "merge2files('xxx_step2.mat', 'xxx_info.xlsx')"
xxx_step2.mat: xxx_step1.mat
	matlab -r "convertmat('xxx_step1.mat')"
xxx_info.xlsx: master.xlsx
	matlab -r "extractfromMasterxlsx('master.xlsx', 'xxx_info.xlsx')"
xxx_step1.mat: xxx_step0.smr
	@echo "\nCreate " $@
	# I can't do this step from the command line so I leave message
clean:
	rm -f xxx_step1.mat xxx_step2.mat xxx_step3.mat xxx_info.xlsx
However, I realized that, when some of xxx.smr files were found to be surplus and deleted at some point, running GNU make with this Makefile does not delete the obsolete descendant files, including all the intermediate files and the final xxx_step3.mat files, that are dependent on those deleted xxx.smr files.
For example, I start with the three xxx.smr files and run Make.
A.smr, B.smr, C.smr
It will create all the descendants, including the final target files:
A_step3.mat, B_step3.mat, C_step3.mat
Later, say, I find that B.smr contains a fatal error and decide to delete it from the folder.
A.smr, C.smr
Running Make at this stage will result in ... no change, because both A_step3.mat and C_step3.mat are newer than their direct prerequisites (and than A.smr and C.smr). However, I actually need to remove all the descendants of B.smr, such as B_step1.mat, B_step2.mat, B_step3.mat, and B_info.xlsx. If those obsolete files are kept, the final target B_step3.mat will be included in the subsequent analyses and affect the results.
I wonder if there is a "smart" way of removing xxx_step1.mat, xxx_step2.mat, xxx_step3.mat, xxx_info.xlsx files, when their corresponding xxx.smr files have been deleted.
Or should I just implement this with MATLAB or Python etc?
Since a Makefile is a collection of shell commands, on your clean: target, you can collect and remove all the files that correspond to your xxx.smr files using a for loop and parameter expansion/substring matching. To find all files that correspond to each xxx.smr file, find all xxx.smr files. Then for each xxx.smr, extract xxx and remove all xxx_step?.* and xxx_info.* files. After each of the step? and info files are removed, then remove xxx.smr. In multi-line form it would look like:
for i in *.smr; do
    base=${i%.*}
    # keep the glob characters outside the quotes so the shell expands them
    rm -f "${base}"_step?.* "${base}"_info.*
    rm -f "$i"
done
Or, in a single line:
for i in *.smr; do base=${i%.*}; rm -f "${base}"_step?.* "${base}"_info.*; rm -f "$i"; done
Note this will remove all xxx_step... and xxx_info... files for each xxx.smr file. Make sure this is what you intend and run on a test directory first. You can tighten the extensions above to just remove xxx_info.xlsx by replacing xxx_info.* with xxx_info.xlsx, etc...
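For the case described in the question, where B.smr is already gone and only its descendants should be removed, the check can be turned around: walk over the derived files and delete those whose source no longer exists. This is a sketch under assumptions: remove_orphans is a hypothetical helper, sources are named xxx.smr, derived files are named xxx_stepN.mat and xxx_info.xlsx, and everything lives in the current directory.

```shell
# remove_orphans
# Delete derived files in the current directory whose source .smr is missing.
# (hypothetical helper; try it on a copy of the folder first)
remove_orphans() {
    for f in *_step[0-9].mat *_info.xlsx; do
        [ -e "$f" ] || continue        # skip patterns that matched nothing
        base=${f%_step[0-9].mat}       # strip a _stepN.mat suffix, if present
        base=${base%_info.xlsx}        # strip an _info.xlsx suffix, if present
        if [ ! -e "$base.smr" ]; then
            rm -f -- "$f"
        fi
    done
}
```

Inside a Makefile recipe, each $ in the loop would have to be written as $$ so that make passes it on to the shell.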

wget to exclude certain naming structures

My company has a local production server I want to download files from that have a certain naming convention. However, I would like to exclude certain elements based on a portion of the name. Example:
folder client_1234
file 1234.jpg
file 1234.ai
file 1234.xml
folder client_1234569
When wget is run, I want it to bypass all folders and files with "1234". I have researched and came across ‘--exclude list’, but that appears to be only for directories, and ‘reject = rejlist’, which appears to be for file extensions. Am I missing something in the manual here?
EDIT:
this should work.
wget has the options -A <accept_list> and -R <reject_list>, which, according to the manual page, accept either suffixes or patterns. These are separate from the -I <include_dirs> and -X <exclude_dirs> options, which, as you note, only deal with directories. Given the example you list, and since you want to skip these files rather than fetch them, something along the lines of -R "*1234*" might be what you need, although I'm not entirely sure that's exactly the naming convention you're after...

Why can't Perl find my file that is in ClearCase?

This piece of Perl is telling me that a file in ClearCase doesn't exist when it does:
my $x = "PATH/TO/FILE";
if (-e $x) {
    print "This file exists on the file system\n";
} else {
    print "I can't see this file\n";
}
Has anyone seen this before?
Some files return fine. Got me stumped.
Clearcase view is dynamic, hosted on unix.
This piece of code is returning that a file exists and another doesn't when they are in the same folder on the same view.
What Aric is trying to tell you is that ClearCase uses extended path names, "extended" because it extends the file name with version path.
So in a dynamic view, any file can be described to reveal its versioning path:
$ ct ls
myFile
$ ct descr -l myFile
myFile@@/main/3
In a dynamic view, you can actually explore the versions of a file (hence the "files as directories" part):
$ cd myFile@@
$ ls
main
$ cd main
$ ls
3
$ cat 3
... // content of third version of myFile
Now, if ClearQuest (the issue tracking system) were used here, it would reference ClearCase activities (change sets, that is, sets of file versions).
But with ClearCase, a version of a file (referenced by ClearQuest or obtained through other means) may well have been removed from the dynamic view (with "rmname", actually).
Meaning a file may be referenced by ClearQuest or by some ClearCase activity, but not be visible directly with ClearCase in the dynamic view.
However, its extended path name would still be accessible in that same dynamic view.
ClearCase stores its 'files' as directories. You can cd into a file and get into the actual directory that it uses to store all of the revisions of that file. While the OS hooks usually work, perhaps this is causing Perl to not recognize some of the files.