Awk/Perl/Sed column substitution based on a text code

I have a text file with the following content
L,4m,06/03/2013
L,33GJm,06/03/2013,G
L,44Bm,06/03/2013,B
L,4q,08/03/2013
J,4m,04/03/2013
J,3GU,04/03/2013,G
J,3jm,04/03/2013
J,3GJ,04/03/2013,G
J,44Bm,06/03/2013,B
J,34Bq,08/03/2013,B
M,4v,12/03/2013
D,3GU,12/03/2013,G
D,4B,11/03/2013,B
D,4m,12/03/2013
D,3GJ,13/03/2013,G
D,3GU,13/03/2013,G
D,4B,14/03/2013,B
D,4B,14/03/2013,B
D,34Bm,14/03/2013,B
L,33BUq,11/03/2013,B
L,3BJUq,11/03/2013,B
L,44Bq,14/03/2013,B
L,44Bq,14/03/2013,B
L,3Bq,15/03/2013,B
L,3q,15/03/2013
J,34Bjq,11/03/2013,B
J,33GUm,12/03/2013,G
J,4q,13/03/2013
J,33GUq,13/03/2013,G
J,33GUq,13/03/2013,G
J,4q,13/03/2013
M,3BU,18/03/2013,B
M,4B,18/03/2013,B
M,4B,18/03/2013,B
M,3GJ,19/03/2013,G
M,3GJ,19/03/2013,G
D,4B,22/03/2013,B
D,3BU,22/03/2013,B
L,34Bv,18/03/2013,B
L,3jm,19/03/2013
L,4m,19/03/2013
L,33GJm,19/03/2013,G
L,33GUm,19/03/2013,G
J,33BUm,18/03/2013,B
J,4m,18/03/2013
J,4B,18/03/2013,B
J,33BUm,18/03/2013,B
J,4q,22/03/2013
J,4q,22/03/2013
A,3GJ,28/03/2013,G
M,4B,27/03/2013,B
D,4B,25/03/2013,B
L,44Bq,25/03/2013,B
L,34Bq,25/03/2013,B
L,34Bq,25/03/2013,B
L,33BUa,26/03/2013,B
L,33BUq,26/03/2013,B
L,33BUq,26/03/2013,B
L,34Bq,27/03/2013,B
L,34Bq,27/03/2013,B
L,4B,27/03/2013,B
L,34Bq,27/03/2013,B
L,4a,28/03/2013
I want to translate the second column based on the following coding system.
If $2 starts with a 1 or 2 - Change $2 to Excellent
If $2 contains 3BU or 3GU - Change $2 to Good
If $2 contains 3BJ or 3GJ - Change $2 to OK
If $2 starts with a 4 - Change $2 to Poor
If $2 starts with a 5 - Change $2 to Terrible
I can find and change the 3BUs to Good easy enough using the following command
awk 'BEGIN{FS=",";OFS=","} {if ($2~ /3(B|G)U/)print $1,"Good",$3}' file | sponge file
However, this way I lose all the other, non-3(B|G)U lines. I could use if/else branches, though this seems inelegant. I have tried to use gensub to solve the problem
awk -F, '{gensub(/3(B|G)U/,Good,"",2)}1' file
But this prints the file contents without substitution. Any hints?
Desired output
L,Poor,06/03/2013
L,OK,06/03/2013,G
L,Poor,06/03/2013,B
L,Poor,08/03/2013
J,Poor,04/03/2013
J,Good,04/03/2013,G
A perl or sed one-liner would also be helpful as this code forms part of a bash shell script

If you want to stick with shell:
(
IFS=,
while read -ra f; do # pick more appropriate variable names
case ${f[1]} in
[12]*) f[1]=Excellent ;;
*3[BG]U*) f[1]=Good ;;
*3[BG]J*) f[1]=OK ;;
4*) f[1]=Poor ;;
5*) f[1]=Terrible ;;
esac
echo "${f[*]}"
done < file
) > tmp && mv tmp file
I ran that in a subshell to localize changes to $IFS
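If the subshell feels heavy, here is a minimal sketch of the same case logic that keeps the IFS change local by prefixing it to read instead. The function names grade_code and translate_file are made up for illustration, and the sketch assumes every line has at least two comma-separated fields.

```shell
# grade_code: hypothetical helper that maps one raw code to its label.
# Patterns are tried top to bottom and the first match wins.
grade_code() {
    case $1 in
        [12]*)     echo Excellent ;;
        *3[BG]U*)  echo Good ;;
        *3[BG]J*)  echo OK ;;
        4*)        echo Poor ;;
        5*)        echo Terrible ;;
        *)         echo "$1" ;;   # no rule applies: keep the code unchanged
    esac
}

# translate_file: rewrite column 2 of comma-separated lines read from stdin.
# "IFS=, read" changes IFS only for that one read command.
translate_file() {
    while IFS=, read -r first code rest; do
        if [ -n "$rest" ]; then
            printf '%s,%s,%s\n' "$first" "$(grade_code "$code")" "$rest"
        else
            printf '%s,%s\n' "$first" "$(grade_code "$code")"
        fi
    done
}
```

It would be used as translate_file < file > tmp && mv tmp file. Forking a subshell per line for grade_code is slow on big inputs, so treat it as a readability sketch rather than the fast option.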

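A sed solution too (the 3[BG]U and 3[BG]J rules allow other characters before the code, since the field only has to contain it):
sed -e 's/^\(.,\)[12][^,]*/\1Excellent/' -e 's/^\(.,\)[^,]*3[BG]U[^,]*/\1Good/' -e 's/^\(.,\)[^,]*3[BG]J[^,]*/\1OK/' -e 's/^\(.,\)4[^,]*/\1Poor/' -e 's/^\(.,\)5[^,]*/\1Terrible/' <filename>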

$ awk '
BEGIN { FS=OFS="," }
$2 ~ /^(1|2)/ { $2 = "Excellent" }
$2 ~ /3(B|G)U/ { $2 = "Good" }
$2 ~ /3(B|G)J/ { $2 = "OK" }
$2 ~ /^4/ { $2 = "Poor" }
$2 ~ /^5/ { $2 = "Terrible" }
1
' foo.txt | head -n 10
L,Poor,06/03/2013
L,OK,06/03/2013,G
L,Poor,06/03/2013,B
L,Poor,08/03/2013
J,Poor,04/03/2013
J,Good,04/03/2013,G
J,3jm,04/03/2013
J,OK,04/03/2013,G
J,Poor,06/03/2013,B
J,34Bq,08/03/2013,B
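The original attempt in the question lost lines because it printed only inside the if. For comparison, here is a sketch of the if/else form with one unconditional print, so unmatched lines pass through untouched and only the first matching rule applies. The wrapper name translate_codes is just an assumption for the example.

```shell
# translate_codes [FILE...]: relabel column 2, print every line.
translate_codes() {
    awk '
    BEGIN { FS = OFS = "," }
    {
        if      ($2 ~ /^[12]/)   $2 = "Excellent"
        else if ($2 ~ /3[BG]U/)  $2 = "Good"
        else if ($2 ~ /3[BG]J/)  $2 = "OK"
        else if ($2 ~ /^4/)      $2 = "Poor"
        else if ($2 ~ /^5/)      $2 = "Terrible"
        print    # runs for every line, matched or not
    }' "$@"
}
```

With no file argument it reads standard input, e.g. translate_codes < foo.txt.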

perl -pe 's{,(\w+)}{ $_ = /^[12]/ ?"Excellent" :/3[BG]U/ ?"Good" :/3[BG]J/ ?"OK" :/^4/ ?"Poor" :/^5/ ?"Terrible" :$_ for $v=$1; ",$v" }e'
More readable version,
s{,(\w+)}{
for ($v = $1) {
$_ = /^[12]/ ?"Excellent"
:/3[BG]U/ ?"Good"
:/3[BG]J/ ?"OK"
:/^4/ ?"Poor"
:/^5/ ?"Terrible"
:$_;
}
",$v";
}e;

Related

using command line tools to extract and replace texts for translations

For an application, I have a language file of the form
first_identifier = English words
second_identifier = more English words
and need to translate it to further languages. In a first step I'm required to extract the right side of those texts resulting in a file like ...
English words
more English words
... How can I achieve that? Using grep maybe?
Next I'd use a translation tool and receive something like
German words
more German words
that now needs to be inserted into the first file again (replacing the English words with the German ones). I was thinking about using sed maybe, but I don't know how to use it for this purpose. Or do you have other recommendations?

To do it as you describe would be:
$ cat tst.sh
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"; exit' 0
sed 's/[^ =]* = //' "${@:--}" > "$tmp" &&
tr 'a-z' 'A-Z' < "$tmp" |
awk '
BEGIN { OFS = " = " }
NR == FNR {
ger[NR] = $0
next
}
{
sub(/ = .*/,"")
print $0, ger[FNR]
}
' - "$tmp"
$ ./tst.sh file
English words = ENGLISH WORDS
more English words = MORE ENGLISH WORDS
but you don't need a temp file for that:
$ cat tst.sh
#!/usr/bin/env bash
sed 's/[^ =]* = //' "$@" |
tr 'a-z' 'A-Z' |
awk '
BEGIN { OFS = " = " }
NR == FNR {
ger[NR] = $0
next
}
{
sub(/ = .*/,"")
print $0, ger[FNR]
}
' - "$@"
$ ./tst.sh file
first_identifier = ENGLISH WORDS
second_identifier = MORE ENGLISH WORDS
and I think this might be what you really want anyway so your translation tool can translate 1 line at a time instead of the whole input at once which might produce different results:
$ cat tst.sh
#!/usr/bin/env bash
while IFS= read -r line; do
id="${line%% = *}"
eng="${line#* = }"
ger="$(tr 'a-z' 'A-Z' <<<"$eng")"
printf '%s = %s\n' "$id" "$ger"
done < "${@:--}"
$ ./tst.sh file
first_identifier = ENGLISH WORDS
second_identifier = MORE ENGLISH WORDS
Just replace tr 'a-z' 'A-Z' < "$tmp" or tr 'a-z' 'A-Z' <<<"$eng" with the call to whatever translation tool you have in mind.
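If the translation tool only works on whole files, the two-step workflow from the question might look like the sketch below. The function names extract_values and merge_values are hypothetical, and it assumes the translated file has exactly one line per line of the original, in the same order.

```shell
# extract_values FILE: print only the text to the right of " = ".
extract_values() {
    sed 's/^[^=]*= //' "$1"
}

# merge_values TRANSLATED ORIGINAL: pair line N of TRANSLATED with the
# identifier found on line N of ORIGINAL.
merge_values() {
    awk '
    NR == FNR { translated[FNR] = $0; next }   # first file: remember each line
    { sub(/ = .*/, ""); print $0 " = " translated[FNR] }
    ' "$1" "$2"
}
```

For example: extract_values lang.txt > english.txt, feed english.txt to the translator to get german.txt, then merge_values german.txt lang.txt > lang_de.txt. The file names are placeholders.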

Replace or append in configuration file with sed

I would like to replace or append in a configuration file like sshd_config:
Key1 value
#Key2 value
The idea of the command is:
$ cmd Key1 home file
$ cmd Key2 house file
$ cmd Key3 flat file
So the resulting file is:
Key1 home
Key2 house
Key3 flat
Any help is more than welcome.
I have taken this as an example but the one that comments and uncomments is not properly working.
Besides I have managed with other options but only for comments or uncommented lines and I want everything in one command if possible.
sed '/^Key\s/{h;s/\(\s\).*/\1newvalue/};${x;/^$/{s//Key newvalue/;H};x}' file
This one works if the Key exists, but how do I append it if it doesn't?
sed -i 's/^#\(Key\s\).*/\1newvalue/g' file
Thanks a lot. I have tried to understand sed, but its different spaces are quite complex and I don't know how to match the key both with and without the leading #.
Edit: Stdout output with -i inplace
$ sudo tee -a /usr/local/bin/conf-space-replace-or-append > /dev/null << 'EOL'
#!/bin/bash
awk -i inplace -v key="$1" -v val="$2" '
($1 == key) || ($1 == "#"key) { $0 = key OFS val; done=1 }
{ print }
END { if (!done) print key, val }
' "$3" > /dev/null
EOL
$ sudo chmod +x /usr/local/bin/conf-space-replace-or-append
$ sudo conf-space-replace-or-append Port 22 /etc/ssh/sshd_config

sed is for doing s/old/new on an individual line, that is all. For anything else you should be using awk for clarity, simplicity, portability, efficiency, etc., etc.
Just put the following in a file named cmd and execute it as you show in your question.
awk -v key="$1" -v val="$2" '
($1 == key) || ($1 == "#"key) { next }
{ print }
END { print key, val }
' "$3"
The above will delete the existing key+val if present and always append the new pair to the end of the file. If you'd rather keep an existing key in its original position in the file and only add new key+val pairs to the end then that's just a tweak:
awk -v key="$1" -v val="$2" '
($1 == key) || ($1 == "#"key) { $0 = key OFS val; done=1 }
{ print }
END { if (!done) print key, val }
' "$3"
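Both awk scripts above write to standard output, so the file itself still has to be replaced. A minimal sketch of a wrapper that does this through a temp file follows; the name set_conf_key is an assumption, and note that awk -v interprets backslash escapes in the value.

```shell
# set_conf_key KEY VALUE FILE: replace "KEY ..." or "#KEY ..." in place,
# or append "KEY VALUE" if neither form is present.
set_conf_key() {
    key=$1 val=$2 file=$3
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v val="$val" '
    ($1 == key) || ($1 == "#" key) { $0 = key OFS val; done = 1 }
    { print }
    END { if (!done) print key, val }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file's owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}
```

Calling set_conf_key Key1 home file, then Key2 house, then Key3 flat on the sample from the question should give the three-line result shown there.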

Merge two lines into one within a configuration file

I have several AIX systems with a configuration file, let's call it /etc/bar/config. The file may or may not have a line declaring values for foo. An example would be:
foo = A_1,GROUP_1,USER_1,USER_2,USER_3
The foo line may or may not be the same on all systems. Different systems may have different values and a different number of values. My task is to add "bare minimum" values in the config file on all systems. The bare minimum line will look like this.
foo = A_1,USER_1,SYS_1,SYS_2
If the line does not exist, I must create it. If the line does exist, I must merge the two lines. Using my examples, the result would be this. The order of the values does not matter.
foo = A_1,GROUP_1,USER_1,USER_3,USER_2,SYS_1,SYS_2
Obviously I want a script to do my work. I have the standard sh, ksh, awk, sed, grep, perl, cut, etc. Since this is AIX, I do not have access to the GNU versions of these utilities.
Originally, I had a script with these commands to replace the entire foo line.
cp /etc/bar/config /etc/bar/config.$$
sed "s/foo = .*/foo = A_1,USER_1,SYS_1,SYS_2/" /etc/bar/config.$$ > /etc/bar/config
But this simply replaces the line. It does not take into consideration any pre-existing configuration, including a line that's missing. And I'm doing other configuration modifications in the script, such as adding completely unique lines to other files and restarting a process, so I'd prefer this be some type of shell-based code snippet I can add to my change script. I am open to other options, especially if the solution is simpler.

Some dirty bash/sed:
#!/usr/bin/bash
input_file="some_filename"
v=$(grep -n '^foo *=' "$input_file")
lineno=$(cut -d: -f1 <<< "${v}0:")
base="A_1,USER_1,SYS_1,SYS_2,"
if [[ "$lineno" == 0 ]]; then
echo "foo = A_1,USER_1,SYS_1,SYS_2" >> "$input_file"
else
all=$(sed -n ${lineno}'s/^foo *= */'"$base"'/p' "$input_file" | \
tr ',' '\n' | sort | uniq | tr '\n' ',' | \
sed -e 's/^/foo = /' -e 's/, *$//' -e 's/  */ /g')
sed -i "${lineno}"'s/.*/'"$all"'/' "$input_file"
fi

Untested bash, etc.
config=/etc/bar/config
default=A_1,USER_1,SYS_1,SYS_2
pattern='^foo[[:blank:]]*=[[:blank:]]*' # shared with grep and sed
if grep -q "$pattern" "$config" # test grep itself; a grep | sed pipeline reports sed's status, which is always 0
then
current=$( sed -n "s/$pattern//p" "$config" )
new=$( echo "$current,$default" | tr ',' '\n' | sort | uniq | paste -sd, - )
sed "s/$pattern.*/foo = $new/" "$config" > "$config.$$.tmp" &&
mv "$config.$$.tmp" "$config"
else
echo "foo = $default" >> "$config"
fi
A vanilla perl solution:
perl -i -lpe '
BEGIN {%foo = map {$_ => 1} qw/A_1 USER_1 SYS_1 SYS_2/}
if (s/^foo\s*=\s*//) {
$found=1;
$foo{$_}=1 for split /,/;
$_ = "foo = " . join(",", keys %foo);
}
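# with -i, anything printed from END goes to STDOUT, not the file, so append on the last input line instead
$_ .= "\nfoo = " . join(",", keys %foo) if eof && !$found;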
' /etc/bar/config

This Perl code will do as you ask. It expects the path to the file to be modified as a parameter on the command line.
Note that it reads the entire input file into the array @config and then overwrites the same file with the modified data.
It works by building a hash %values from a combination of the items already present in the foo = line and the list of default items in @defaults. The combination is sorted in alphabetical order and joined with a comma.
use strict;
use warnings;
my @defaults = qw/ A_1 USER_1 SYS_1 SYS_2 /;
my ($file) = @ARGV;
my @config = <>;
open my $out_fh, '>', $file or die $!;
select $out_fh;
for ( @config ) {
if ( my ($pfx, $vals) = /^(foo \s* = \s* ) (.+) /x ) {
my %values;
++$values{$_} for $vals =~ /[^,\s]+/g;
++$values{$_} for @defaults;
print $pfx, join(',', sort keys %values), "\n";
}
else {
print;
}
}
close $out_fh;
output
foo = A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3

Since you didn't provide sample input and expected output I couldn't test this but this is the right approach:
awk '
/foo = / { old = ","$3; next }
{ print }
END {
split("A_1,USER_1,SYS_1,SYS_2"old,all,/,/)
for (i in all)
if (!seen[all[i]]++)
new = (new ? new "," : "") all[i]
print "foo =", new
}
' /etc/bar/config > tmp && mv tmp /etc/bar/config
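The core of the shell answers above is the merge-and-dedupe step. Isolated as a sketch, using only tr, sed, sort and paste (so it should not need GNU tools), it could look like this. The function name merge_csv is hypothetical and LC_ALL=C is only there to make the sort order predictable.

```shell
# merge_csv LIST1 LIST2: union of two comma-separated lists, duplicates removed.
merge_csv() {
    printf '%s\n' "$1,$2" |
        tr ',' '\n' |        # one value per line
        sed '/^$/d' |        # drop empties (e.g. when LIST1 is "")
        LC_ALL=C sort -u |   # sort and remove duplicates
        paste -sd, -         # join the lines back with commas
}
```

For the lists in the question, merge_csv "A_1,GROUP_1,USER_1,USER_2,USER_3" "A_1,USER_1,SYS_1,SYS_2" prints A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3.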

hash using sha1sum using awk

I have a "pipe-separated" file that has about 20 columns. I want to just hash the first column which is a number like account number using sha1sum and return the rest of the columns as is.
What's the best way I can do this using awk or sed?
Accountid|Time|Category|.....
8238438|20140101021301|sub1|...
3432323|20140101041903|sub2|...
9342342|20140101050303|sub1|...
Above is an example of the text file showing just 3 columns. Only the first column has the hash function applied to it. The result should look like:
Accountid|Time|Category|.....
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...

What the Best Way™ is is up for debate. One way to do it with awk is
awk -F'|' 'BEGIN { OFS=FS } NR == 1 { print } NR != 1 { gsub(/'\''/, "'\'\\\\\'\''", $1); command = ("echo '\''" $1 "'\'' | sha1sum -b | cut -d\\ -f 1"); command | getline hash; close(command); $1 = hash; print }' filename
That is
BEGIN {
OFS = FS # set output field separator to field separator; we will use
# it because we meddle with the fields.
}
NR == 1 { # first line: just print headers.
print
}
NR != 1 { # from there on do the hash/replace
# this constructs a shell command (and runs it) that echoes the field
# (singly-quoted to prevent surprises) through sha1sum -b, cuts out the hash
# and gets it back into awk with getline (into the variable hash)
# the gsub bit is to prevent the shell from barfing if there's an apostrophe
# in one of the fields.
gsub(/'/, "'\\''", $1);
command = ("echo '" $1 "' | sha1sum -b | cut -d\\ -f 1")
command | getline hash
close(command)
# then replace the field and print the result.
$1 = hash
print
}
You will notice the differences between the shell command at the top and the awk code at the bottom; that is all due to shell expansion. Because I put the awk code in single quotes in the shell commands (double quotes are not up for debate in that context, what with $1 and all), and because the code contains single quotes, making it work inline leads to a nightmare of backslashes. Because of this, my advice is to put the awk code into a file, say foo.awk, and run
awk -F'|' -f foo.awk filename
instead.
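Another way around the quoting problem is to never build a command string at all. The sketch below does the same job in plain shell, passing the field to sha1sum on standard input; hash_first_column is an invented name, and it assumes sha1sum is installed and every data line contains at least one |. Like the awk version, it hashes the value followed by a newline (what echo would send) and starts one sha1sum per line, so it is not fast.

```shell
# hash_first_column FILE: copy the header, replace column 1 of every
# other line with its SHA-1 hash.
hash_first_column() {
    {
        IFS= read -r header && printf '%s\n' "$header"
        while IFS= read -r line; do
            id=${line%%|*}      # text before the first |
            rest=${line#*|}     # everything after the first |
            hash=$(printf '%s\n' "$id" | sha1sum | cut -d' ' -f1)
            printf '%s|%s\n' "$hash" "$rest"
        done
    } < "$1"
}
```

Because the account id is never pasted into shell code, apostrophes or other odd characters in it need no escaping.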

Here's an awk executable script that does what you want:
#!/usr/bin/awk -f
BEGIN { FS=OFS="|" }
FNR != 1 { $1 = encodeData( $1 ) }
47
function encodeData( fld ) {
cmd = sprintf( "echo %s | sha1sum", fld )
cmd | getline output
close( cmd )
split( output, arr, " " )
return arr[1]
}
Here's the flow break down:
Set the input and output field separators to |
When the row isn't the first (header) row, re-assign $1 to an encoded value
Print the entire row when 47 is true (always)
Here's the encodeData function break down:
Create a cmd to feed data to sha1sum
Feed it to getline
Close the cmd
On my system, there's extra info after the sha1sum hash, so I discard it by splitting the output
Return the first field of the sha1sum output.
With your data, I get the following:
Accountid|Time|Category|.....
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...
Run it by calling awk.script data (or ./awk.script data if you use bash)
EDIT by EdMorton:
sorry for the edit, but your script above is the right approach but needs some tweaks to make it more robust and this is much easier than trying to describe them in a comment:
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==1 { for (i=1; i<=NF; i++) f[$i] = i; next }
{ $(f["Accountid"]) = encodeData($(f["Accountid"])); print }
function encodeData( fld, cmd, output ) {
cmd = "echo \047" fld "\047 | sha1sum"
if ( (cmd | getline output) > 0 ) {
sub(/ .*/,"",output)
}
else {
print "failed to hash " fld | "cat>&2"
output = fld
}
close( cmd )
return output
}
$ awk -f tst.awk file
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...
The f[] array decouples your script from hard-coding the number of the field that needs to be hashed, the additional args for your function make them local and so always null/zero on each invocation, the if on getline means you won't return the previous success value if it fails (see http://awk.info/?tip/getline) and the rest is maybe more style/preference with a bit of a performance improvement.
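To see the f[] idea on its own, here is a small sketch that looks a column up by its header text rather than by number. The wrapper name column_by_name and the error message are assumptions made for the example.

```shell
# column_by_name NAME [FILE...]: print the values of the pipe-separated
# column whose header is NAME.
column_by_name() {
    name=$1; shift
    awk -v name="$name" '
    BEGIN { FS = "|" }
    NR == 1 {
        for (i = 1; i <= NF; i++) f[$i] = i   # header text -> column number
        if (!(name in f)) { print "no column named " name | "cat >&2"; exit 1 }
        next
    }
    { print $(f[name]) }
    ' "$@"
}
```

If someone later reorders the columns in the data file, column_by_name Time file keeps working without any change to the script.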

Make some replacements on a bunch of files depending on the number of columns per line

I'm having a problem dealing with some files. I need to count the columns on every line of a file and, depending on the number of columns, append several ',' to the end of each line. All lines should have 36 columns separated by ','.
This line solves my problem, but how do I run it on a folder with several files in an automated way?
awk ' BEGIN { FS = "," } ;
{if (NF == 32) { print $0",,,," } else if (NF==31) { print $0",,,,," }
}' <SOURCE_FILE> > <DESTINATION_FILE>
Thank you for all your support
R&P

The answer depends on your OS, which you haven't told us. On UNIX and assuming you want to modify each original file, it'd be:
for file in *
do
awk '...' "$file" > tmp$$ && mv tmp$$ "$file"
done
Also, in general to get all records in a file to have the same number of fields you can do this without needing to specify what that number of fields is (though you can if appropriate):
$ cat tst.awk
BEGIN { FS=OFS=","; ARGV[ARGC++] = ARGV[ARGC-1] }
NR==FNR { nf = (NF > nf ? NF : nf); next }
{
tail = sprintf("%*s",nf-NF,"")
gsub(/ /,OFS,tail)
print $0 tail
}
$
$ cat file
a,b,c
a,b
a,b,c,d,e
$
$ awk -f tst.awk file
a,b,c,,
a,b,,,
a,b,c,d,e
$
$ awk -v nf=10 -f tst.awk file
a,b,c,,,,,,,
a,b,,,,,,,,
a,b,c,d,e,,,,,
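When the target width is known up front (36 in the question), a shorter sketch is possible: assigning to a field past NF makes awk extend the record with empty fields joined by OFS. The wrapper name pad_fields is hypothetical, and lines that already have more than the requested number of fields are left as they are.

```shell
# pad_fields N [FILE...]: pad every comma-separated line to at least N fields.
pad_fields() {
    n=$1; shift
    # Assigning $n when NF < n grows the record to n fields.
    awk -v n="$n" 'BEGIN { FS = OFS = "," } NF < n { $n = "" } 1' "$@"
}
```

For the question's data that would be pad_fields 36 file, which replaces the separate NF == 32 and NF == 31 branches.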

It's a short one-liner with Perl:
perl -i.bak -F, -alpe '$_ .= "," x (36-@F)' *

if this is only a single folder without subfolders, use:
for oldfile in /path/to/files/*
do
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
done
if you also want to include subdirectories recursively, it's probably easiest to put the awk+redirection into a small shell-script, like this:
#!/bin/bash
oldfile=$1
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
and then run this script (let's call it runawk.sh) via find:
find /path/to/files/ -type f -not -name "*.new" -exec runawk.sh \{\} \;
