Primary and secondary numerical sorting of file names by delimiter - gnu-sort

I have some files named like so:
S241R39.txt
S241R40.txt
S241R41.1.txt
S241R41.2.txt
S241R42.1.txt
S241R42.2.txt
I want to be able to sort these in this order:
S241R39.txt
S241R40.txt
S241R41.1.txt
S241R42.1.txt
S241R41.2.txt
S241R42.2.txt
Here, I want 41.1 to come before 42.1, and 42.1 to come before 41.2
For files that don't end in .1 or .2, this sorts them correctly:
ls -1 *.txt | sort -V
Does anyone have any suggestions for how I can adjust this to give me my desired output?

You can use sort -t . -k2n -k1V:
printf '%s\n' *.txt | sort -t . -k2n -k1V
S241R39.txt
S241R40.txt
S241R41.1.txt
S241R42.1.txt
S241R41.2.txt
S241R42.2.txt
The sort options used are:
-t .: use the dot as the field delimiter
-k2n -k1V: sort first on the key starting at field 2 (numeric), then on the key starting at field 1 (version order)
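A variant worth considering, sketched under the assumption that GNU sort is available (-V is a GNU extension): writing the keys as -k2,2n and -k1,1V limits each comparison to a single field instead of letting it run to the end of the line. The function name sort_by_suffix is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: reads file names on stdin and sorts them by the
# numeric suffix after the first dot, then by the base name in version order.
# Assumes GNU sort, since -V is not part of POSIX sort.
sort_by_suffix() {
    # -t .    : split each name into fields at dots
    # -k2,2n  : primary key is field 2 only, compared numerically
    #           ("txt" has no leading digits, so it counts as 0)
    # -k1,1V  : secondary key is field 1 only, compared as a version string
    sort -t . -k2,2n -k1,1V
}

printf '%s\n' S241R42.2.txt S241R41.1.txt S241R40.txt \
    S241R42.1.txt S241R39.txt S241R41.2.txt | sort_by_suffix
```

Feeding the names through printf rather than ls avoids parsing ls output, as in the answer above.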

how to replace with sed when source contains $

I have a file that contains:
$conf['minified_version'] = 100;
I want to increment that 100 with sed, so I have this:
sed -r 's/(.*minified_version.*)([0-9]+)(.*)/echo "\1$((\2+1))\3"/ge'
The problem is that this strips the $conf from the original, along with any indentation spacing. What I have been able to figure out is that it's because it's trying to run:
echo " $conf['minified_version'] = $((100+1));"
so of course it's trying to replace the $conf with a variable which has no value.
Here is an awk version:
$ awk '/minified_version/{$3+=1} 1' file
$conf['minified_version'] = 101
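This looks for lines that contain minified_version. Whenever such a line is found, the third field, $3, is incremented by 1. Note that the trailing semicolon is lost, because awk converts "100;" to the number 100 before adding, and that assigning to a field rebuilds the line with single spaces, so any indentation is lost as well.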
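If the indentation and the trailing semicolon need to survive, one option is to edit only the digits in place instead of reassigning a field. A minimal sketch, assuming every matching line has the form shown in the question (a number directly followed by a semicolon); the function name bump_minified_version is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: increments the number in front of the semicolon on
# lines mentioning minified_version and leaves the rest of the line as is.
bump_minified_version() {
    awk '
        /minified_version/ && match($0, /[0-9]+;/) {
            # RSTART and RLENGTH delimit the digits plus the semicolon;
            # take only the digits and add 1.
            n = substr($0, RSTART, RLENGTH - 1) + 1
            # Rebuild the line: text before the digits, new number, then
            # everything from the semicolon onwards.
            $0 = substr($0, 1, RSTART - 1) n substr($0, RSTART + RLENGTH - 1)
        }
        { print }
    '
}

printf '%s\n' "    \$conf['minified_version'] = 100;" | bump_minified_version
```

Because awk never passes the line through a shell, the literal $conf needs no escaping beyond what the calling shell requires.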
My suggested approach to this would be to have a file on-disk that contained nothing but the minified_version number. Then, incrementing that number would be as simple as:
minified_version=$(< minified_version)
printf '%s\n' "$(( minified_version + 1 ))" >minified_version
...and you could just put a sigil in your source file where that needs to be replaced. Let's say you have a file named foo.conf.in that contains:
$conf['minified_version'] = #MINIFIED_VERSION#
...then you could simply run, in your build process:
sed -e "s/#MINIFIED_VERSION#/$(<minified_version)/g" <foo.conf.in >foo.conf
This has the advantage that you never have code changing foo.conf.in, so you don't need to worry about bugs overwriting the file's contents. It also means that if you're checking your files into source control, so long as you only check in foo.conf.in and not foo.conf you avoid potential merge conflicts due to context near the version number changing.
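The two steps can be tried end to end in a scratch directory. This is only a sketch: the function names bump_version and render_conf, the starting value 100, and the semicolon added after the sigil in the template are assumptions for the demonstration. It uses $(cat file) so that it also runs under a plain POSIX sh, where the bash-only $(<file) form is unavailable.

```shell
#!/bin/sh
# Hypothetical helpers for the counter-file workflow; $1 is the directory
# holding minified_version, foo.conf.in and the generated foo.conf.

# Step 1: read the counter, then overwrite it with the incremented value.
bump_version() {
    current=$(cat "$1/minified_version")
    printf '%s\n' "$(( current + 1 ))" > "$1/minified_version"
}

# Step 2: generate foo.conf from the template; foo.conf.in is never modified.
render_conf() {
    sed -e "s/#MINIFIED_VERSION#/$(cat "$1/minified_version")/g" \
        < "$1/foo.conf.in" > "$1/foo.conf"
}

# Demonstration in a throwaway directory.
work=$(mktemp -d)
printf '%s\n' 100 > "$work/minified_version"
printf '%s\n' "\$conf['minified_version'] = #MINIFIED_VERSION#;" > "$work/foo.conf.in"
bump_version "$work"
render_conf "$work"
cat "$work/foo.conf"
rm -rf "$work"
```

The counter file contains only digits, so interpolating it into the sed expression is safe here; arbitrary text would need escaping first.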
Now, if you did want to do the native operation in-place, here's a somewhat overdesigned approach written in pure native bash (reading from infile and writing to outfile; just rename outfile back over infile when successful to make this an in-place replacement):
target='$conf['"'"'minified_version'"'"'] = '
suffix=';'
while IFS= read -r line; do
  if [[ $line = "$target"* ]]; then
    value=${line##*=}
    value=${value%"$suffix"}
    new_value=$(( value + 1 ))
    printf '%s\n' "${target}${new_value}${suffix}"
  else
    printf '%s\n' "$line"
  fi
done <infile >outfile

Opening last edited file from a directory

I'm trying to run a perl script that opens the last edited file in a directory.
I know how to open a single file, as in the sample below, but not how to find the last-edited file in /home/test/
open(CONFIG_FILE,"/home/test/test.txt");
while (<CONFIG_FILE>){
}
close(CONFIG_FILE);
How can I do that?
This reads all files in /home/test/ and takes the newest one:
my ($last_modified) = sort { -M $a <=> -M $b } grep -f, </home/test/*>;
Check perldoc -f -X
Try this:
chomp(my $file = qx(cd /home/test && ls -1t | sed q));
This solution works by evaluating a shell command. sed q is a trick to display only the first line.
qx() executes the shell command and returns its output; see perldoc -f qx
Or with pure-perl :
use File::DirList;
my $files = File::DirList::list('/home/test', 'M');
my $file = $files->[0]->[13];
File::DirList::list('/home/test', 'M') returns a list sorted by modification date; we just pick the first entry. 13 is the index of the filename.
See perldoc File::DirList
There's an example of how to read the files in a directory in the documentation for the readdir function.
Once you get that going, you'll want to use the stat function to check the mtime of each file as you look at it in the readdir.
All the rest is just programming ;-)
Alternately, you could take the last item from ls sorted by date: ls -tr | tail -1.
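The "look at each file and compare modification times" idea can also be sketched in shell without parsing ls output. This is an illustration rather than the Perl readdir/stat code the answer has in mind: the function name newest_file is made up, and it relies on the -nt test, which bash, dash and most other shells support.

```shell
#!/bin/sh
# Hypothetical helper: prints the most recently modified regular file in
# the directory given as $1, or nothing if there is none.
newest_file() {
    newest=
    for f in "$1"/*; do
        # Skip directories and anything else that is not a regular file.
        [ -f "$f" ] || continue
        # Keep f if it is the first candidate or newer than the current best.
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    fi
}

newest_file /home/test
```

Unlike ls -tr | tail -1, this skips subdirectories and copes with file names containing newlines, at the cost of ignoring dotfiles.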

Print line numbers after comparison

Can someone tell me the best way to print the number of different lines in 2 files. I have 2 directories with 1000s of files and I have a perl script that compares all files in dir1 with all files in dir2 and outputs the difference to a different file. Now I need to add something like Filename - # of different lines
File1 - 8
File2 - 30
Right now I am using
my $diff = `diff -y --suppress-common-lines "$DirA/$file" "$DirB/$file"`;
But along with this I also need to print how many lines are different in each one of those 1000 files.
Sorry, this is a duplicate of my previous thread. I would be glad if a moderator could delete the previous one.
Why even use Perl for this?
for i in "$dirA"/*; do file="${i##*/}"; echo "$file - $(diff -y --suppress-common-lines "$i" "$dirB/$file" | wc -l)" ; done > diffs.txt
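The same loop can be split into two small functions, which makes the per-file count easier to reuse. A sketch under stated assumptions: GNU diff is installed (-y and --suppress-common-lines are not POSIX), and the names count_diff_lines and report_diffs are invented for this example.

```shell
#!/bin/sh
# Assumes GNU diff: -y and --suppress-common-lines are not POSIX options.

# Print the number of differing lines between two files.
count_diff_lines() {
    # diff exits with status 1 when the files differ; only its output is used.
    diff -y --suppress-common-lines "$1" "$2" | wc -l | tr -d ' '
}

# Print "name - count" for every regular file present in both directories.
report_diffs() {
    for path in "$1"/*; do
        name=${path##*/}
        [ -f "$path" ] && [ -f "$2/$name" ] || continue
        printf '%s - %s\n' "$name" "$(count_diff_lines "$path" "$2/$name")"
    done
}

# Demonstration on two throwaway directories.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
printf '%s\n' 1 2 3 > "$demo/a/File1"
printf '%s\n' 1 X 3 Y > "$demo/b/File1"
report_diffs "$demo/a" "$demo/b"
rm -rf "$demo"
```

Each changed, added or deleted line produces one line of side-by-side output, so the count treats a changed line as a single difference. Files that exist in only one directory are skipped rather than reported.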

Matlab how to change file names from 1_x_10_a.jpg to 01_x_010_a.jpg

I have really a lot of files named like:
1_x_0_a.jpg, 1_x_0_b.jpg, 1_x_5_a.jpg ... 15_x_160_a.jpg, 15_x_160_b.jpg, 15_x_165_a.jpg
I would like to change the file names as follows:
01_x_000_a.jpg, 01_x_000_b.jpg, 01_x_005_a.jpg
So the number before x should have 2 digits and the number after x should have 3 digits.
The following code should work on relatively recent versions of MATLAB.
fileStruct = dir('*.jpg'); % List only the image files; a bare dir also returns '.' and '..'
files = {fileStruct.name};
for oldFile = files
    oldFile = oldFile{1}; % Takes string out of cell
    % Embedding the sprintf in a regexprep only works in certain versions
    newFile = regexprep(oldFile, '^(\d+)', '${sprintf(''%02d'', str2num($1))}');
    newFile = regexprep(newFile, '(?<=_)(\d+)(?=_)', '${sprintf(''%03d'', str2num($1))}');
    if ~strcmp(oldFile, newFile) % Skip files whose name is already in the target form
        movefile(oldFile, newFile);
    end
end
Here are some steps that should help you:
Use dir to get a list of file names.
Use regexprep to replace the leading number with a zero-padded two-digit number
Use regexprep to replace the middle number with a zero-padded three-digit number
Use movefile to change the file names
Note that I have not tried it. MATLAB's rename is not the right function here: its documentation refers to FTP sites because it renames files on an FTP server, whereas movefile renames local files. If that does not work, I guess you can just copy all the files and then remove the old ones.
If you are on a Unix or Linux machine you can try this small shell script:
In a terminal go to the directory where you have your files.
You can first try it without really renaming your files, by replacing the mv with echo to see if it works as expected.
for file in *; do
    mv "$file" "$(echo "$file" | awk -F '_' '{ printf "%02d_%s_%03d_%s\n", $1, $2, $3, $4 }')"
done
or as a one-liner:
for file in *; do mv "$file" "$(echo "$file" | awk -F '_' '{ printf "%02d_%s_%03d_%s\n", $1, $2, $3, $4 }')"; done
For files
1_x_0_a.jpg
1_x_0_b.jpg
1_x_5_a.jpg
15_x_160_a.jpg
15_x_160_b.jpg
15_x_165_a.jpg
I get the result
01_x_000_a.jpg
01_x_000_b.jpg
01_x_005_a.jpg
15_x_160_a.jpg
15_x_160_b.jpg
15_x_165_a.jpg
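The name computation can be pulled into a function so that it can be checked on its own before any file is touched. A minimal sketch, assuming every name has exactly four underscore-separated parts as in the examples; padded_name is a made-up name, and the loop only prints the mv commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: prints the zero-padded form of a name such as
# 1_x_5_a.jpg. Assumes exactly four underscore-separated parts.
padded_name() {
    # awk reads "008" as decimal 8, so already-padded names pass through
    # unchanged; the shell's own printf would treat a leading 0 as octal.
    printf '%s\n' "$1" |
        awk -F _ '{ printf "%02d_%s_%03d_%s\n", $1, $2, $3, $4 }'
}

# Dry run: show the commands instead of executing them.
for name in 1_x_0_a.jpg 1_x_5_a.jpg 15_x_165_a.jpg; do
    echo mv "$name" "$(padded_name "$name")"
done
```

Once the printed commands look right, the loop can iterate over *.jpg and drop the echo.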

Best way to parse this particular string using awk / sed?

I need to get a particular version string from a file (call it version.lst) and use it to compare another in a shell script. For example sake, the file contains lines that look like this:
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
.. and so on. Let's say I am trying to grab the first version (in this case, V1.000) from APP1. Obviously, the versions can change and I want this to be dynamic. What I have right now works:
var=`cat version.lst | grep " -- APP1" | grep -Eo 'V[0-9]\.[0-9]{3}'`
The first grep selects the line containing APP1 and the second grep extracts the version string. However, I hear grep is not the way to do this, so I'd like to learn the best way using awk or sed. Any ideas? I am new to both and haven't found a tutorial easy enough to learn their syntax. Do they support egrep? Thanks!
Try this to get the complete version:
#!/bin/sh
app=APP1
var=$(awk -v "app=$app" '$NF == app {print $1}' version.lst)
or to get only the major version number, the last line could be:
var=$(awk -v "app=$app" '$NF == app {split($1,a,"."); print a[1]}' version.lst)
Using sed to get the complete version:
var=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" version.lst)
or this to get only the major version number:
var=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" version.lst)
Explanations:
The second AWK command:
-v "app=$app" - set an AWK variable equal to a shell variable
$NF == app - if the last field is equal to the contents of the variable (NF is the number of fields, so $NF is the contents of the NFth field)
{split($1,a,".") - then split the first field at the dot
print a[1] - and print the first part of the result of the split
The sed commands:
-n - don't print any output unless directed to
"/ $app\$/ - for any line that ends with (\$) the contents of the shell variable $app (note that double quotes are used to allow the variable to be expanded, and it's a good idea to escape the second dollar sign)
s/^\([^ ]*\).*/\1/p" - starting at the beginning of the line (^), capture (\( \)) the sequence of zero or more (*) non-space characters ([^ ]) (or non-dots in the second version), then match but don't capture the rest of the line (.*); replace the matched text (the whole line in this case) with the captured string (\1 refers to the first, and here only, capture group), which is the version number, and print it (p)
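Putting the awk variant to use in a script could look like the following. It is a sketch only: the function name get_version, the temporary sample file and the expected value V1.000 are assumptions for the demonstration, and the added exit stops at the first matching line.

```shell
#!/bin/sh
# Hypothetical helper: prints the version (first field) of the first line in
# file $1 whose last field equals the application name $2.
get_version() {
    awk -v "app=$2" '$NF == app { print $1; exit }' "$1"
}

# Demonstration with a throwaway copy of the sample data.
lst=$(mktemp)
cat > "$lst" <<'EOF'
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
EOF

current=$(get_version "$lst" APP1)
if [ "$current" = "V1.000" ]; then
    echo "APP1 is at the expected version"
else
    echo "APP1 is at $current"
fi
rm -f "$lst"
```

Comparing with = is a plain string comparison; ordering two versions would need something like sort -V on top of this.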
If I understood correctly: egrep "APP1$" version.lst | awk '{print $1}'
$ awk '/^V1\.00.* APP1$/{print $NF}' version.lst
APP1
That regular expression matches lines that start with "V1.00", followed by any number of any other characters, ending with " APP1". The backslash in the middle there might be really important--it matches only ".", and so it excludes (probably corrupt) lines that might begin with, say, "V1a00". The space before "APP1" excludes things like "APP2_APP1".
"NF" is an automatically generated variable that contains the number of fields in the input line. It's also the number of the last field, which happens to be the one you're interested in.
There are a couple of ways to prune off the "V1". Here's one way, although you and I might not be talking about quite the same thing.
$ awk '/^V1\.00.* APP1$/{print substr($1, 1, index($1, ".") - 1), $NF}' version.lst
V1 APP1