Swapping digits - sed

I would like to swap each digit with the one before it:
E.g. 123456 becomes 214365
How would I do this using sed/awk in a bash environment?

echo 123456 | sed 's/\([0-9]\)\([0-9]\)/\2\1/g'
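To make the behaviour of that substitution easier to see, here is a minimal sketch that wraps it in a function. The name swap_pairs is hypothetical, not something from the answer. Each match consumes two adjacent digits and emits them reversed, and the g flag repeats this for every non-overlapping pair, so a trailing unpaired digit is left where it is.

```shell
# swap_pairs is a hypothetical helper name used only for illustration.
# It swaps every non-overlapping pair of adjacent digits in its argument.
swap_pairs() {
  printf '%s\n' "$1" | sed 's/\([0-9]\)\([0-9]\)/\2\1/g'
}

swap_pairs 123456   # prints 214365
swap_pairs 12345    # prints 21435 (the last digit has no partner)
```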

Following your comment on x13n's answer, which answers your question, it seems that you want to be more specific about which digits you swap.
I'd use awk to swap all pairs of digits in the second column:
bash-3.2$ gawk -V | sed -n 1p
GNU Awk 4.0.0
$ echo 254789123456,5306153059630141,639027041150453 | gawk -F',' '
BEGIN {
    OFS=","
}
{
    $2 = gensub(/(.)(.)/, "\\2\\1", "g", $2)
    print
} '
254789123456,3560510395361014,639027041150453
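gensub is a gawk extension, so the command above will not run under other awks. As a rough alternative, assuming only POSIX awk features, the same pair swap can be built with substr in a loop. The function name swap_field2 is made up for this sketch.

```shell
# swap_field2 is a hypothetical name. It reads comma-separated lines on
# stdin and swaps adjacent character pairs in the second field only.
swap_field2() {
  awk -F',' -v OFS=',' '{
    out = ""
    n = length($2)
    # Walk the field two characters at a time, emitting each pair reversed.
    for (i = 1; i + 1 <= n; i += 2) {
      out = out substr($2, i + 1, 1) substr($2, i, 1)
    }
    # An odd-length field keeps its last character in place.
    if (n % 2) {
      out = out substr($2, n, 1)
    }
    $2 = out
    print
  }'
}

echo 254789123456,5306153059630141,639027041150453 | swap_field2
```

Like the gensub version, this swaps any characters, not just digits.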
You've asked a number of questions about sed and awk, so I'd recommend getting hold of sed & awk, Second Edition.
I'd also recommend reading Jon Skeet's guide to writing the perfect question, which will help you get the answer you need when asking such questions.

Related

How to replace consecutive symbols using only one sed command?

I have a simple .csv file whose lines hold 't' values. Here is an example:
2ABC;t;t;t;tortuga;fault;t;t;bored
I want to replace them with '1' using sed.
If I run sed "s/;t;/;1;/g" I get the following result:
2ABC;1;t;1;tortuga;fault;1;t;bored
As you can see, only every other one of the consecutive ';t;' fields has been replaced. Yes, I can replace all of them with sed -e "s/;t;/;1;/g" -e "s/;t;/;1;/g", but this is boring.
How can I make the replacement with one sed command?
If there is something to replace, branch to replace again.
sed ': again; /;t;/{ s//;1;/; b again }'
Overall, parsing CSV with sed is crude. Consider awk.
awk -F';' -v OFS=';' '{ for(i=1;i<=NF;++i) if ($i=="t") $i=1 } 1'
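The single s///g pass misses every second field because matches cannot overlap: the trailing ';' of one match is also the leading ';' of the next ';t;'. The sketch below is one way to write the loop idea with separate -e expressions, which should also be accepted by non-GNU seds. The function name replace_t_fields is an assumption for illustration.

```shell
# replace_t_fields is a hypothetical name. It replaces one ';t;' at a time
# and jumps back to the label for as long as a substitution succeeded.
replace_t_fields() {
  sed -e ':again' -e 's/;t;/;1;/' -e 't again'
}

echo '2ABC;t;t;t;tortuga;fault;t;t;bored' | replace_t_fields
# prints 2ABC;1;1;1;tortuga;fault;1;1;bored
```

Note that a 't' in the first or last field has no semicolon on both sides, so this pattern leaves it untouched.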
Lookarounds are helpful in such cases:
$ s='t;2ABC;t;t;t;tortuga;fault;t;t;bored;t'
$ echo "$s" | perl -lpe 's/(?<![^;])t(?![^;])/1/g'
1;2ABC;1;1;1;tortuga;fault;1;1;bored;1
gawk-specific solution:
echo '2ABC;t;t;t;tortuga;fault;t;t;bored' | gawk -be '(ORS = RT)^!(NF = NF)' FS='^t$' OFS=1 RS=';'
cross-awk solution:
echo '2ABC;t;t;t;tortuga;fault;t;t;bored' | {m,g,n}awk 'gsub(FS, OFS, $!(NF = NF))^_' FS=';t;' OFS=';1;' RS=
2ABC;1;1;1;tortuga;fault;1;1;bored

Search and remove floating numbers in scientific form using REGEX and SED

I was working on a solution using a shell script (bash) to search for floating-point numbers whose exponent has 3 or more digits (e.g. 11.1234567e+300) and remove the e+ and the digits after it.
I was using grep to search for them, but I'm having trouble applying it in sed.
grep -E '([[:digit:]]+[.])[[:digit:]]+[eE][+-][[:digit:]]{3}' filename
Sample data would be something like below.
COL1,COL2,COL3,COL4
TEXT123,11.12345,12.12345e+300,13.123456
Any help would be greatly appreciated.
If I understand your question, and you want to search for the e+/-[[:digit:]]{3} part and replace it with nothing, you could use:
sed -E 's/([[:digit:]]+[.][[:digit:]]+)[eE][+-][[:digit:]]{3}/\1/g' file
Example Use/Output
$ sed -E 's/([[:digit:]]+[.][[:digit:]]+)[eE][+-][[:digit:]]{3}/\1/g' file
COL1,COL2,COL3,COL4
TEXT123,11.12345,12.12345,13.123456
Let me know if this is what you intended and if not, I'm happy to help further.
@TimurShtatland brings up a good point. If your intent was to keep the [Ee][+-] as the plain reading of your question indicates, you could use:
$ sed -E ':a;s/([[:digit:]]+[.][[:digit:]]+[eE][+-])[[:digit:]]{3}/\1/;ta' file
(though I'm not sure what purpose leaving the [Ee][+-] would serve)
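One detail worth checking in the commands above: {3} matches exactly three exponent digits, so the non-looping command would turn 1.5e+1234 into 1.54. A hedged variant, assuming the goal is to drop any exponent of three or more digits, uses {3,} instead. The function name strip_big_exponent is invented for this sketch.

```shell
# strip_big_exponent is a hypothetical name. It removes e+NNN / e-NNN
# suffixes with three or more digits and leaves shorter exponents alone.
strip_big_exponent() {
  sed -E 's/([0-9]+[.][0-9]+)[eE][+-][0-9]{3,}/\1/g'
}

echo 'TEXT123,11.12345,12.12345e+300,13.123456' | strip_big_exponent
# prints TEXT123,11.12345,12.12345,13.123456
```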
Use this Perl one-liner:
perl -pe 's{\b ( \d+ [.]? \d* ) [Ee] [-+]? \d{3,} }{$1}xg' in_file > out_file
For example:
echo 'TEXT123,11.12345,12.12345e+300,13.123456' | perl -pe 's{\b ( \d+ [.]? \d* ) [Ee] [-+]? \d{3,} }{$1}xg'
Output:
TEXT123,11.12345,12.12345,13.123456

Replace string with substring in lowercase using sed / awk / tr / perl?

I have a plaintext file containing multiple instances of the pattern $$DATABASE_*$$ and the asterisk could be any string of characters. I'd like to replace the entire instance with whatever is in the asterisk portion, but lowercase.
Here is a test file:
$$DATABASE_GIBSON$$
test me $$DATABASE_GIBSON$$ test me
$$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ test
$$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$
Here is the desired output:
gibson
test me gibson test me
gibson test gibson test
gibson gibsongibson
How do I do this with sed/awk/tr/perl?
Here's the perl version I ended up using.
perl -p -i.bak -e 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' inputFile
Unfortunately there's no easy, foolproof way with awk, but here's one approach:
$ cat tst.awk
{
    gsub(/[$][$]/,"\n")
    head = ""
    tail = $0
    while ( match(tail, "\nDATABASE_[^\n]+\n") ) {
        head = head substr(tail,1,RSTART-1)
        trgt = substr(tail,RSTART,RLENGTH)
        tail = substr(tail,RSTART+RLENGTH)
        gsub(/\n(DATABASE_)?/,"",trgt)
        head = head tolower(trgt)
    }
    $0 = head tail
    gsub("\n","$$")
    print
}
$ cat file
The quick brown $$DATABASE_FOX$$ jumped over the lazy $$DATABASE_DOG$$s back.
The grey $$DATABASE_SQUIRREL$$ ate $$DATABASE_NUT$$s under a $$DATABASE_TREE$$.
Put a dollar $$DATABASE_DOL$LAR$$ in the $$ string.
$ awk -f tst.awk file
The quick brown fox jumped over the lazy dogs back.
The grey squirrel ate nuts under a tree.
Put a dollar dol$lar in the $$ string.
Note the trick of converting $$ to a newline character so that we can negate that character in the match() regexp. Without it (i.e. if we used ".+" instead of "[^\n]+"), greedy matching would cause trouble: if the pattern appeared twice on one input line, the match would extend from the start of the first occurrence to the end of the second.
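The difference between the greedy match and the newline trick can be seen in isolation with the following sketch. Both function names are hypothetical, and each prints only the first match found on a line rather than performing the full replacement.

```shell
# first_tag_greedy (hypothetical name): ".+" runs on to the last $$ on the line.
first_tag_greedy() {
  awk '{ if (match($0, "[$][$]DATABASE_.+[$][$]")) print substr($0, RSTART, RLENGTH) }'
}

# first_tag_sentinel (hypothetical name): $$ becomes a newline first, so
# "[^\n]+" cannot run past the closing delimiter of the first tag.
first_tag_sentinel() {
  awk '{
    gsub(/[$][$]/, "\n")
    if (match($0, "\nDATABASE_[^\n]+\n")) {
      tag = substr($0, RSTART, RLENGTH)
      gsub(/\n/, "$$", tag)   # restore the delimiters for display
      print tag
    }
  }'
}

line='a $$DATABASE_FOX$$ b $$DATABASE_DOG$$ c'
printf '%s\n' "$line" | first_tag_greedy     # prints $$DATABASE_FOX$$ b $$DATABASE_DOG$$
printf '%s\n' "$line" | first_tag_sentinel   # prints $$DATABASE_FOX$$
```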
This one works with complicated examples.
perl -ple 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' filename.txt
And for simpler examples :
echo '$$DATABASE_GIBSON$$' | sed 's#$$DATABASE_\(.*\)\$\$#\L\1#'
In GNU sed, \L converts the rest of the replacement to lower case (use \E to stop the conversion if needed).
Using awk alone:
> echo '$$DATABASE_AWESOME$$' | awk '{sub(/.*_/,"");sub(/\$\$$/,"");print tolower($0);}'
awesome
Note that I'm in FreeBSD, so this is not GNU awk.
But this can be done using bash alone:
[ghoti@pc ~]$ foo='$$DATABASE_AWESOME$$'
[ghoti@pc ~]$ foo=${foo##*_}
[ghoti@pc ~]$ foo=${foo%\$\$}
[ghoti@pc ~]$ foo=${foo,,}
[ghoti@pc ~]$ echo $foo
awesome
Of the above substitutions, all except the last one (${foo,,}) will work in any POSIX shell. If you don't have bash, you can instead use tr for this step:
$ echo $foo
AWESOME
$ foo=$(echo "$foo" | tr '[:upper:]' '[:lower:]')
$ echo $foo
awesome
$
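Put together, the portable steps above might look like the following sketch. The function name db_name_lower is hypothetical. It assumes the whole argument is a single tag and uses only POSIX parameter expansion plus tr.

```shell
# db_name_lower is a hypothetical name. It expects a single tag such as
# $$DATABASE_AWESOME$$ as its only argument.
db_name_lower() {
  name=${1##*_}        # drop everything up to and including the last underscore
  name=${name%\$\$}    # drop the trailing $$
  printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]'
}

db_name_lower '$$DATABASE_AWESOME$$'   # prints awesome
```

Because ##*_ removes everything through the last underscore, a name that itself contains an underscore would be truncated to its final part.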
UPDATE:
Per comments, it seems that what the OP really wants is to strip the substring out of any text in which it is included -- that is, our solutions need to account for the possibility of leading or trailing spaces, before or after the string he provided in his question.
> echo 'foo $$DATABASE_KITTENS$$ bar' | sed -nE '/\$\$[^$]+\$\$/{;s/.*\$\$DATABASE_//;s/\$\$.*//;p;}' | tr '[:upper:]' '[:lower:]'
kittens
And if you happen to have pcregrep on your path (from the devel/pcre FreeBSD port), you can use that instead, with lookaheads:
> echo 'foo $$DATABASE_KITTENS$$ bar' | pcregrep -o '(?!\$\$DATABASE_)[A-Z]+(?=\$\$)' | tr '[:upper:]' '[:lower:]'
kittens
(For Linux users reading this: this is equivalent to using grep -P.)
And in pure bash:
$ shopt -s extglob
$ foo='foo $$DATABASE_KITTENS$$ bar'
$ foo=${foo##*(?)\$\$DATABASE_}
$ foo=${foo%%\$\$*(?)}
$ foo=${foo,,}
$ echo $foo
kittens
Note that NONE of these three updated solutions will handle situations where multiple tagged database names exist in the same line of input. That's not stated as a requirement in the question either, but I'm just sayin'....
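For completeness, here is a rough sketch of how the multiple-tags-per-line case could be handled with a loop over the same kind of parameter expansions. It is not taken from any of the answers; the function name lower_db_tags is made up, and tr stands in for ${foo,,} so that it should also run outside bash.

```shell
# lower_db_tags is a hypothetical name. The ( ... ) body runs in a subshell
# so the working variables do not leak into the caller.
lower_db_tags() (
  line=$1
  out=
  while :; do
    # Stop once no complete $$DATABASE_...$$ tag remains.
    case $line in
      *'$$DATABASE_'*'$$'*) ;;
      *) break ;;
    esac
    out=$out${line%%'$$DATABASE_'*}   # keep the text before the first tag
    line=${line#*'$$DATABASE_'}       # drop through the tag opener
    name=${line%%'$$'*}               # tag name up to the closing $$
    line=${line#*'$$'}                # drop the name and the closing $$
    out=$out$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  done
  printf '%s\n' "$out$line"
)

lower_db_tags '$$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$'
# prints gibson gibsongibson
```

To process a file, the function can be called once per line from a while read loop.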
You can do this in a pretty foolproof way with the supercool command cut :)
echo '$$DATABASE_AWESOME$$' | cut -d'$' -f3 | cut -d_ -f2 | tr 'A-Z' 'a-z'
This might work for you (GNU sed):
sed 's/$\$/\n/g;s/\nDATABASE_\([^\n]*\)\n/\L\1/g;s/\n/$$/g' file
Here is the shortest (GNU) awk solution I could come up with that does everything requested by the OP:
awk -vRS='[$][$]DATABASE_([^$]+[$])+[$]' '{ORS=tolower(substr(RT,12,length(RT)-13))}1'
Even if the string indicated by the asterisk (*) contained one or more single dollar signs ($) and/or line breaks, this solution should still work.
awk '{gsub(/\$\$DATABASE_GIBSON\$\$/,"gibson")}1' file
gibson
test me gibson test me
gibson test gibson test
gibson gibsongibson
echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}'
awk takes whatever input it is given, in this case the echoed string, applies the tolower function, and prints the result. The single quotes stop the shell from expanding $$ to its process ID.
For your bash script you can do something like this and use the variable DBLOWER
DBLOWER=$(echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}');

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon

I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp, but it could be any of a number of text strings).
I tried various sed / awk solutions but I can't seem to get it working. I have the feeling this is a simple problem so, sorry about that.
This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'
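One thing the command above does not address is a line without TYPE=, which sed passes through unchanged. A small sketch, assuming you would rather get no output in that case, combines -n with the p flag. The function name get_type is hypothetical, and the trailing semicolon is made optional here so a TYPE= at the end of the line also works.

```shell
# get_type is a hypothetical name. It prints the value after TYPE= up to the
# next semicolon (or end of line) and prints nothing for lines without TYPE=.
get_type() {
  sed -n 's/.*TYPE=\([^;]*\).*/\1/p'
}

echo 'foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0' | get_type   # prints snp
```

As with the original pattern, a key that merely ends in TYPE= (for example SUBTYPE=) would also match.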
Requires GNU grep:
grep -Po '(?<=TYPE=)[^;]+'
meaning: preceded by "TYPE=", find some non-semicolon characters
One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt
Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as a set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp
You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo $t
snp

Using sed to extract a matched pattern using ' as pattern separator

I just can't get my head around pattern matching in sed, and what's worse, there are quotes as separators.
I do:
cat file | grep \'*.s\'
and get:
'PhaseRayA: ' 'sca/sca_out/sc_ray_a.s'
'PhaseRayO: ' 'sca/sca_out/sc_ray_o.s'
as output. And now I want to extract the:
sca/sca_out/sc_ray_a.s
sca/sca_out/sc_ray_o.s
So my pattern would be '*.s', with the quotes being part of the pattern but not part of the wanted result.
Any ideas on that? I guess sed will do the job but I have no clue how...
Thanks for any help...
All the best, André
Your question is a little ambiguous, but this should do what I think you mean:
sed -e "s/'[^']*' *'//" -e "s/'//" file
You might want to consider awk:
$ cat test.txt
'PhaseRayA: ' 'sca/sca_out/sc_ray_a.s'
'PhaseRayO: ' 'sca/sca_out/sc_ray_o.s'
$ awk -F "'" '{print $4}' test.txt
sca/sca_out/sc_ray_a.s
sca/sca_out/sc_ray_o.s
I tend to use sed to edit files and awk to process them. awk is built for breaking up records.
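Since awk can filter as well as split, the grep step from the question could arguably be folded into the same command. The sketch below assumes the wanted value is always the second quoted string on the line (field 4 when splitting on the single quote). The function name quoted_s_paths is made up for illustration.

```shell
# quoted_s_paths is a hypothetical name. It prints the second quoted string
# of each line, but only when that string ends in ".s".
quoted_s_paths() {
  awk -F"'" '$4 ~ /\.s$/ { print $4 }'
}

printf '%s\n' "'PhaseRayA: ' 'sca/sca_out/sc_ray_a.s'" "'Other: ' 'x/y.txt'" | quoted_s_paths
# prints sca/sca_out/sc_ray_a.s
```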
Give this a try:
sed "s/.*'\([^']*\)'/\1/" inputfile
Similarly:
sed 's/.*\o47\([^\o47]*\)\o47/\1/' inputfile # that's the letter "o" between the backslash and the 4
or
sed 's/.*\x27\([^\x27]*\)\x27/\1/' inputfile