Best way to parse this particular string using awk / sed?

I need to get a particular version string from a file (call it version.lst) and compare it against another in a shell script. For example's sake, the file contains lines that look like this:
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
.. and so on. Let's say I am trying to grab the first version (in this case, V1.000) from APP1. Obviously, the versions can change and I want this to be dynamic. What I have right now works:
var=`cat version.lst | grep " -- APP1" | grep -Eo 'V[0-9]\.[0-9]{3}'`
The first grep selects the line containing APP1 and the second extracts the version string. However, I hear grep is not the way to do this, so I'd like to learn the best way using awk or sed. Any ideas? I am new to both and haven't found a tutorial easy enough to learn the syntax from. Do they support egrep? Thanks!

Try this to get the complete version:
#!/bin/sh
app=APP1
var=$(awk -v "app=$app" '$NF == app {print $1}' version.lst)
or to get only the major version number, the last line could be:
var=$(awk -v "app=$app" '$NF == app {split($1,a,"."); print a[1]}' version.lst)
Using sed to get the complete version:
var=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" version.lst)
or this to get only the major version number:
var=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" version.lst)
Explanations:
The second AWK command:
-v "app=$app" - set an AWK variable equal to a shell variable
$NF == app - if the last field is equal to the contents of the variable (NF is the number of fields, so $NF is the contents of the NFth field)
{split($1,a,".") - then split the first field at the dot
print a[1] - and print the first part of the result of the split
The sed commands:
-n - don't print any output unless directed to
"/ $app\$/ - for any line that ends with (\$) a space followed by the contents of the shell variable $app (note that double quotes are used so the variable is expanded, and it's a good idea to escape the second dollar sign)
s/^\([^ ]*\).*/\1/p" - starting at the beginning of the line (^), capture (\(\)) a run of zero or more (*) non-space characters ([^ ]) (non-dots in the second version), then match but don't capture the rest of the line (.*); replace the matched text (the whole line in this case) with the captured string (the version number; \1 refers to the first, and here only, capture group) and print it (p)
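To try the awk and sed commands above without touching a real file, here is a minimal sketch that recreates the sample data from the question in a temporary directory. The temporary path and the variable names (full_awk, major_sed, and so on) are only for illustration, and APP3 is chosen so the result differs from the other two lines.

```shell
# Build a throwaway copy of the sample file from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/version.lst" <<'EOF'
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
EOF

app=APP3

# awk: compare the last field with the shell-supplied variable.
full_awk=$(awk -v "app=$app" '$NF == app {print $1}' "$tmpdir/version.lst")
major_awk=$(awk -v "app=$app" '$NF == app {split($1, a, "."); print a[1]}' "$tmpdir/version.lst")

# sed: select lines ending in " APP3", keep only the captured prefix.
full_sed=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" "$tmpdir/version.lst")
major_sed=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" "$tmpdir/version.lst")

echo "$full_awk $major_awk $full_sed $major_sed"   # V1.500 V1 V1.500 V1
rm -rf "$tmpdir"
```

The captured value can then be compared as an ordinary string, for example with [ "$full_awk" = "V1.500" ].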

If I understood correctly: egrep "APP1$" version.lst | awk '{print $1}'

$ awk '/^V1\.00.* APP1$/{print $NF}' version.lst
APP1
That regular expression matches lines that start with "V1.00", followed by any number of any other characters, ending with " APP1". The backslash in the middle there might be really important--it matches only ".", and so it excludes (probably corrupt) lines that might begin with, say, "V1a00". The space before "APP1" excludes things like "APP2_APP1".
There are a couple of ways to prune off the "V1". Here's one way, although you and I might not be talking about quite the same thing.
$ awk '/^V1\.00.* APP1$/{print substr($1, 1, index($1, ".") - 1), $NF}' version.lst
V1 APP1

Related

how to replace with sed when source contains $

I have a file that contains:
$conf['minified_version'] = 100;
I want to increment that 100 with sed, so I have this:
sed -r 's/(.*minified_version.*)([0-9]+)(.*)/echo "\1$((\2+1))\3"/ge'
The problem is that this strips the $conf from the original, along with any indentation spacing. What I have been able to figure out is that it's because it's trying to run:
echo " $conf['minified_version'] = $((100+1));"
so of course it's trying to replace the $conf with a variable which has no value.
Here is an awk version:
$ awk '/minified_version/{$3+=1} 1' file
$conf['minified_version'] = 101
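This looks for lines that contain minified_version. Whenever such a line is found, the third field, $3, is incremented by 1. Note that, as the output shows, the trailing semicolon is dropped: $3 is "100;", which awk converts to the number 100 before adding.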
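Because $3 holds the text "100;", the arithmetic above loses the semicolon, and assigning to a field makes awk rebuild the line, which also collapses leading indentation. If both need to survive, one possible variation (a sketch, not the answerer's method) is to locate the trailing number with match() and splice the incremented value back in. The indented sample line and the variable names are assumptions for illustration.

```shell
# Sample input with leading indentation, as described in the question.
input="    \$conf['minified_version'] = 100;"

result=$(printf '%s\n' "$input" | awk '
/minified_version/ && match($0, /[0-9]+;[ \t]*$/) {
    # RSTART is the position of the first digit of the trailing number.
    num = substr($0, RSTART, RLENGTH) + 0
    # Keep everything before the number untouched, then append num+1 and ";".
    $0 = sprintf("%s%d;", substr($0, 1, RSTART - 1), num + 1)
}
{ print }')

printf '%s\n' "$result"
```

This only handles lines where the number is the last thing before the semicolon; other layouts would need a different pattern.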
My suggested approach to this would be to have a file on-disk that contained nothing but the minified_version number. Then, incrementing that number would be as simple as:
minified_version=$(< minified_version)
printf '%s\n' "$(( minified_version + 1 ))" >minified_version
...and you could just put a sigil in your source file where that needs to be replaced. Let's say you have a file named foo.conf.in that contains:
$conf['minified_version'] = #MINIFIED_VERSION#
...then you could simply run, in your build process:
sed -e "s/#MINIFIED_VERSION#/$(<minified_version)/g" <foo.conf.in >foo.conf
This has the advantage that you never have code changing foo.conf.in, so you don't need to worry about bugs overwriting the file's contents. It also means that if you're checking your files into source control, so long as you only check in foo.conf.in and not foo.conf you avoid potential merge conflicts due to context near the version number changing.
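The counter-file workflow described above can be exercised end to end in a scratch directory. This is a sketch under a few assumptions: the starting value 100 is taken from the question, the directory is temporary, and $(cat ...) replaces $(< ...) so the snippet also runs under a plain POSIX sh.

```shell
workdir=$(mktemp -d)

# Starting state: the counter file and the template, named as in the answer.
printf '%s\n' 100 > "$workdir/minified_version"
printf '%s\n' "\$conf['minified_version'] = #MINIFIED_VERSION#" > "$workdir/foo.conf.in"

# Bump the counter.
current=$(cat "$workdir/minified_version")
printf '%s\n' "$(( current + 1 ))" > "$workdir/minified_version"

# Render the real config from the template; the template itself is never modified.
sed -e "s/#MINIFIED_VERSION#/$(cat "$workdir/minified_version")/g" \
    < "$workdir/foo.conf.in" > "$workdir/foo.conf"

rendered=$(cat "$workdir/foo.conf")
template=$(cat "$workdir/foo.conf.in")
printf '%s\n' "$rendered"
rm -rf "$workdir"
```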
Now, if you did want to do the native operation in-place, here's a somewhat overdesigned approach written in pure native bash (reading from infile and writing to outfile; just rename outfile back over infile when successful to make this an in-place replacement):
target='$conf['"'"'minified_version'"'"'] = '
suffix=';'
while IFS= read -r line; do
    if [[ $line = "$target"* ]]; then
        value=${line##*=}
        value=${value%$suffix}
        new_value=$(( value + 1 ))
        printf '%s\n' "${target}${new_value}${suffix}"
    else
        printf '%s\n' "$line"
    fi
done <infile >outfile

how to find a duplicate value between two files and print

I have two files: one with a single IP address (I have already used Perl to strip out the IP) and one that has IPs with more info. I need to compare them, or use Perl, to find the IP that appears in both files. The second file, with the extra info, must remain intact, and when a duplicate is found the entire line from the second file should be printed.
file1 content example (just ip no comma etc)
114.42.141.131
file2 content example (need all this info to print when match found)
114.42.141.131,Host TW,Taipei,25.0391998291,121.525001526
This is a little beyond my skills. Any help would be greatly appreciated!!!
Thank you!
To match on the first field, all you need is:
awk -F, 'FNR==NR { a[$1]; next } $1 in a' file1 file2
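A quick way to see what this command does is to run it on two small temporary files. In the sketch below the 114.42.141.131 lines come from the question; the 10.0.0.1 line is a made-up non-matching entry added only for contrast.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 114.42.141.131 > "$tmpdir/file1"
cat > "$tmpdir/file2" <<'EOF'
10.0.0.1,Host XX,Nowhere,0.0,0.0
114.42.141.131,Host TW,Taipei,25.0391998291,121.525001526
EOF

# While reading file1 (FNR==NR), remember each IP as an array key.
# While reading file2, print lines whose first comma-separated field is a known key.
matches=$(awk -F, 'FNR==NR { a[$1]; next } $1 in a' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$matches"
rm -rf "$tmpdir"
```

Because the comparison is on the whole first field, an address such as 14.42.141.131 in file1 would not match the 114.42.141.131 line.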
I assume you have shell access.
If the first file contains only the IP, then you can do something like:
REF_IP=`cat file1`
Then, you can use grep from the second file:
grep "${REF_IP}" file2
The result should be the line with the duplicated address.
Note: The actual syntax might be slightly different (I don't have access to a shell right now)
HTH
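One caveat with the grep approach: the address is used as a regular expression, so each "." matches any character. A hedged variation is to pass file1 as a list of fixed strings with -F -f, both standard grep options. The first line of file2 below is an invented decoy that shows the difference.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 114.42.141.131 > "$tmpdir/file1"
cat > "$tmpdir/file2" <<'EOF'
114x42y141z131,not an IP at all
114.42.141.131,Host TW,Taipei,25.0391998291,121.525001526
EOF

# Pattern treated as a regex: the dots also match x, y and z.
regex_hits=$(grep -c "$(cat "$tmpdir/file1")" "$tmpdir/file2")

# Patterns read from file1 and treated as literal strings.
fixed_hits=$(grep -c -F -f "$tmpdir/file1" "$tmpdir/file2")

echo "regex: $regex_hits, fixed: $fixed_hits"   # regex: 2, fixed: 1
rm -rf "$tmpdir"
```

Fixed-string matching is still a substring match, so it does not replace a field-wise comparison when addresses can be prefixes of one another.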
Take a look at this one-liner and see if it is what you want:
Note: this prints the matching line from file2 only once, and only for IPs that occur more than once in file1. It also assumes there are no duplicate IPs in file2.
awk -F, 'NR==FNR{p[$1]=$0;next}{a[$0]++}END{for(x in a)if (a[x]>1)print p[x]}' file2 file1
little test:
kent$ head f1 f2
==> f1 <==
1.1.1.1
1.1.1.1
1.1.1.1
2.2.2.2
==> f2 <==
1.1.1.1,Host TW,Taipei,25.0391998291,121.525001526
2.2.2.2,this is for 2.2.
kent$ awk -F, 'NR==FNR{p[$1]=$0;next}{a[$0]++}END{for(x in a)if (a[x]>1)print p[x]}' f2 f1
1.1.1.1,Host TW,Taipei,25.0391998291,121.525001526

How to pipe Bash Shell command's output line by line to Perl for Regex processing?

I have some output data from some Bash shell commands. The output is delimited line by line with "\n" or "\0". Is there any way to pipe the output into Perl and process the data line by line within Perl (just like piping the output to awk, but in a Perl context)? I suppose the command may look something like this:
Bash Shell command | perl -e 'some perl commands' | another Bash Shell command
Suppose I want to substitute every ":" character with "#" on a line-by-line basis (not a global substitution; I may use a condition, e.g. odd or even line, to decide whether the current line should have the substitution). How could I achieve this?
See perlrun.
perl -lpe's/:/#/g' # assumes \n as input record separator
perl -0 -lpe's/:/#/g' # assumes \0 as input record separator
perl -lne'if (0 == $. % 2) { s/:/#/g; print; }' # modify and print even lines
Yes, Perl may appear at any place in a pipeline, just like awk.
The command line switch -p (if you want automatic printing) or -n (if you don't want it) will do what you want. The line contents are in $_ so:
perl -pe's/:/#/g'
would be a solution. Generally, you want to read up on the '<>' (diamond) operator which is the way to go for non-oneliners.

Perl oneliner match repeating itself

I'm trying to read a specific section of a line out of a file with Perl.
The file in question is of the following syntax.
# Sets $USER1$
$USER1$=/usr/....
# Sets $USER2$
#$USER2$=/usr/...
My oneliner is simple,
perl -ne 'm/^\$USER1\$\s*=\s*(\S*?)\s*$/m; print "$1";' /my/file
For some reason I'm getting the extraction for $1 repeated several times over, apparently once for every line in the file after my match occurs. What am I missing here?
You are executing print for every line of the file because print gets called for every line, whether the regex matches or not. Replace the first ; with an &&.
From perlre:
NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
Try this instead:
perl -ne 'print "$1" if m/^\$USER1\$\s*=\s*(\S*?)\s*$/m;' /my/file
$ cat test.txt
# Sets $USER1$
$USER1$=/usr/....
# Sets $USER2$
#$USER2$=/usr/...
$ perl -nle 'print if /^\$USER1/;' test.txt
$USER1$=/usr/....
Try this
perl -ne 'print "$1\n" if /^\$USER1\$=(.*)$/;' file

replacing a variable in shell script using perl

I have a variable in a shell script,
var=1234_number
I want to remove everything except the digits from $var. How can I do it using a Perl one-liner?
You might be looking for something to edit the shell script, in which case, this might be sufficient:
perl -i.bak -pe 's/\b(var=\d+).*/$1/' shellscript.sh
The '-i' overwrites the original file, saving a copy in shellscript.sh.bak; the substitute command finds assignments to 'var' (and not any longer name ending 'var') followed by an equals sign, some digits, and any non-digits, and leaves behind just the assignment of digits.
In the example, it gives:
var=1234
Note that the Perl regex is not foolproof - it will mangle this (dropping the closing brace).
: ${var=1234_number}
Dealing with all such possible variants is extremely tricky:
echo $var=$other
OTOH, you might be looking to eliminate the non-digits from a variable within a shell script, in which case:
var=$(echo $var | perl -pe 's/\D//g')
You could also use 'sed' for the job:
var=$(echo $var | sed 's/[^0-9]//g')
No need to use anything but the shell for this
var=1234_abcd
var=${var%_*}
echo $var # => 1234
See 'Parameter Expansion' in the bash manual.
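For comparison, here is a small sketch of three shell-only ways to reduce the variable to its digits. The first is the expansion shown above; the other two are alternatives that do not depend on an underscore being present. The variable names are illustrative, and everything here is POSIX sh.

```shell
var=1234_number

# Remove the shortest suffix that starts at the last underscore.
by_underscore=${var%_*}

# Remove the longest suffix that starts at the first non-digit character.
leading_digits=${var%%[!0-9]*}

# Delete every non-digit wherever it occurs, using tr.
all_digits=$(printf '%s' "$var" | tr -cd '0-9')

echo "$by_underscore $leading_digits $all_digits"   # 1234 1234 1234
```

The three differ on other inputs: for a value like 12ab34, the first leaves the string unchanged, the second yields 12, and the third yields 1234.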