SED with "unmatched /" and odd behavior - sed

I have an input file that looks like this, and I was trying a couple of transformations with sed:
Original File :
DATA: gt_alv LIKE zfica_paym_cancel_alv OCCURS 0 WITH HEADER LINE.
DATA: gt_out LIKE zfica_paym_cancel_out OCCURS 0 WITH HEADER LINE.
I want to clean some characters based on their hexadecimal values, but I'm getting odd behavior out of the box. (Run in MobaXterm.)
When i run :
sed -e $'s/\x20/\x040/g' testsed.txt
the output is :
DATA:0gt_alv0LIKE0zfica_paym_cancel_alv0OCCURS000WITH0HEADER0LINE.
DATA:0gt_out0LIKE0zfica_paym_cancel_out0OCCURS000WITH0HEADER0LINE.
and it works as intended. But when I try to match the null byte \x00, I get the following error:
sed -e $'s/\x00/\x040/g' testsed.txt
produces :
sed: unmatched '/'
I have tried various combinations. Giving the ASCII codes in octal doesn't work at all, but hex numbers work just fine, except for the \x00 null. Can someone explain to me why this happens and, if possible, how to resolve it? Is it a problem with Moba? Thank you.

Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard.
From : http://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
You should try to escape the \ in your pattern :
sed -e $'s/\\x00/\x040/g' null.txt

You simply cannot have a NUL byte in a shell string because it signals end of string to the underlying C library. Use a tool which handles internal NUL bytes in strings correctly.
perl -pe 's/\x00/\x40/g' testsed.txt
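If Perl is not at hand, the same idea (let the tool decode the escape, so that no NUL byte ever has to sit inside a shell string) can be sketched with tr, or with sed in plain single quotes. This is only a sketch: the sample data and the variable names are made up for illustration, and the sed line assumes GNU sed, which decodes \x00 itself.

```shell
# Sample file with embedded NUL bytes. printf decodes \000 itself,
# so the NUL never has to live inside a shell variable.
sample=$(mktemp)
printf 'a\000b\000c\n' > "$sample"

# tr decodes the octal escape on its own.
with_tr=$(tr '\000' '@' < "$sample")

# Assumption: GNU sed, which decodes \x00 itself when the script is in
# plain single quotes (no $'...' expansion by the shell).
with_sed=$(sed 's/\x00/@/g' "$sample")

printf '%s\n' "$with_tr" "$with_sed"
rm -f "$sample"
```

The difference from the failing command is only the quoting: with $'...' the shell turns \x00 into a real NUL and the string is cut off there, which is why sed only ever saw s/ and complained about the unmatched delimiter.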

Related

Decode sed expression

I would like to understand the sed part of this code:
/usr/local/bin/pcsensor -l60 -n | sed -e "s/^.*\$/PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:\0/"
(the input) pcsensor produces:
2016/09/19 22:41:31 Temperature 90.50F 32.50C
The code produces (output):
PUTVAL downloads/exec-environmental/temperature-cpu interval=30 N:32.50
I am hoping that understanding the sed expression will help me to knock the last digit off (so the temp is only 1 decimal place).
Updated: My booboo (it was late):
the -n in the first part of the command outputs this:
32.50
Which works fine in an echo/printf
printf "32.50 %s\n"| sed -e "s/^.*\$/PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:\0/"
About
sed -e "s/^.*\$/PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:\0/"
This is 1 sed command, namely the s/.../.../ for "substitute". In simple terms, it does a single "search and replace" for every line that it gets to work on.
The "search" part is ^.*\$, the "replacement" part is PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:\0 (the final / only closes the command).
^.*\$ is a simple regular expression that here stands for "everything" or "the whole line". (Inside double quotes the shell turns \$ into a plain $, the end-of-line anchor.) So, the s command will replace the whole line with
PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:\0
As Benjamin W. pointed out the use of \0 is "weird". It apparently was meant as a so-called reference, so that the part we searched for is appended after the text "PUTVAL(...)val=30 N:".
I have several issues with the way this is presented, though.
\0 is not in the manpage of my Debian GNU Sed 4.2.2.
Quoting the sed command with " is not needed here and makes things unnecessarily complicated and error-prone. Single quotes should be used instead.
A \0 anywhere in a Shell and especially in Sed could very well stand for a null character which here raises even more red flags due to the " quoting.
Using sed just to prepend a text is "useless use of Sed".
Since you asked about sed, here is how I would write it:
sed -e 's/^.*$/PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:&/'
& stands for "what the search part found". In your case, the whole line.
In order to cut off the last decimal, there are many ways to achieve this. A rather simple approach assumes that the input always has 2 decimals. Then we could prepend a command that replaces the last character (.$) with "nothing" (//):
sed -e 's/.$//;s/^[0-9][0-9]*\.[0-9]/PUTVAL downloads\/exec-environmental\/temperature-cpu interval=30 N:&/'
However, as I said, sed is overkill here. You could just use for instance printf:
text='PUTVAL downloads/exec-environmental/temperature-cpu interval=30 N:'
printf "%s%3.1f\n" "$text" $(/usr/local/bin/pcsensor -l60 -n)
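To try the two sed steps without the sensor attached, here is a minimal sketch that feeds a fixed reading through them. The value 32.50 stands in for the pcsensor output and the variable names are made up; using | as the s delimiter is my own choice, so that the slashes in the path need no escaping.

```shell
# Stand-in for the output of: /usr/local/bin/pcsensor -l60 -n
reading='32.50'

# First command: drop the last character (the second decimal).
# Second command: & re-inserts the whole matched line after the prefix.
line=$(printf '%s\n' "$reading" |
    sed -e 's/.$//' \
        -e 's|.*|PUTVAL downloads/exec-environmental/temperature-cpu interval=30 N:&|')

printf '%s\n' "$line"
```

Note that this truncates the value, whereas the printf "%3.1f" variant rounds it.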

sed command over multiple lines not working

I am using sed to replace 14 different abbreviations like CA_23456, CB_scaffold34532,... with 'proper' names in a file and it works putting it all on one line.
acc=$1
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/;s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/;s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/;s/CS_[A-Z]*[a-z]*[0-9]*/Cupressus_sempervirens/;s/CT_[A-Z]*[a-z]*[0-9]*/Cupressus_torulosa/;s/JD_[A-Z]*[a-z]*[0-9]*/Juniperus_drupacea/;s/JF_[A-Z]*[a-z]*[0-9]*/Juniperus_flaccida/;s/JI_[A-Z]*[a-z]*[0-9]*/Juniperus_indica/;s/JP_[A-Z]*[a-z]*[0-9]*/Juniperus_phoenicea/;s/JX_[A-Z]*[a-z]*[0-9]*/Juniperus_procera/;s/JS_[A-Z]*[a-z]*[0-9]*/Juniperus_scopulorum/;s/MD_[A-Z]*[a-z]*[0-9]*/Microbiota_decussata/;s/XN_[A-Z]*[a-z]*[0-9]*/Xanthocyparis_nootkatensis/;s/XV_[A-Z]*[a-z]*[0-9]*/Xanthocyparis_vietnamensis/' ${acc}.nex > ${acc}_replaced.nex
To make it more readable I'd like to have the command split over multiple lines using '\' (not all the replacements are shown for brevity)
acc=$1
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/;\
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/;\
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/'\
${acc}.nex > ${acc}_replaced.nex
However, I get an error message: sed: -e expression #1, char 168: unterminated address regex. I have looked at the answers to similar problems on various webforums and tried various things (using 's/.../.../' on every line, leaving ';' out,....) but I can't get it to work. What am I doing wrong?
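Drop the \ characters that are meant to escape the newlines. Inside single quotes the shell does not treat them as line continuations, so they are passed through to sed, which rejects them as a syntax error. However, I would suggest putting the commands into a file and running it like this: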
sed -f script.sed input
where script.sed looks like this:
s/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/
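A quick way to check the script-file approach end to end is sketched below. The temporary files stand in for script.sed and the .nex input, and the two sample lines are invented for the test.

```shell
# Hypothetical temp files standing in for script.sed and the .nex input.
script=$(mktemp)
sample=$(mktemp)

# One sed command per line: no semicolons, no backslashes, no shell quoting
# inside the script file itself.
printf '%s\n' \
    's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/' \
    's/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/' \
    's/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/' > "$script"

printf '%s\n' 'CA_23456 CB_scaffold34532' 'CM_7' > "$sample"

renamed=$(sed -f "$script" "$sample")
printf '%s\n' "$renamed"
rm -f "$script" "$sample"
```

With 14 replacements, the script file also keeps the list of names out of the shell script, so it can be edited without touching any quoting.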
Remove the backslashes from the sed code.
Inside single-quoted shell strings, backslashes are not needed to escape newlines and are not removed because they are not parsed as escape characters. This has the effect that sed sees them as part of its code, and it then expects to find an address regex with a different delimiter than / before the command ends at the next newline (similar to \,/home/, !d). This address regex does not appear (nor an associated command), and so sed complains about invalid code.
Apart from that: The semicolons in the sed code are no longer necessary when you terminate commands with newlines, and anything involving shell variables should be quoted to avoid splitting in case of whitespace.
In sum:
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/' \
"${acc}.nex" > "${acc}_replaced.nex"

Using sed to convert singular/plural words into uppercase

Using one sed command I'm trying to convert all occurrences of test and tests found in a .txt file into all caps. I also want to print only the converted lines, so I'm using -n. I've been playing around for it for over an hour. The problem is that I'm able to convert one or the other (either test or tests) but not both.
Any help would be so greatly appreciated. Thank you!
Use this
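sed -n -e 's/tests/TESTS/g; s/test/TEST/g; T; p' input.txt
The semicolons let you execute multiple commands. T (a GNU extension) skips the rest of the script when neither substitution changed the line, so with -n only the converted lines reach p.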
This might work for you (GNU sed):
sed 's/\<tests\?\>/\U&/gp;d' file
This uppercases (\U&) the whole words (\<...\>) test and tests, where the trailing s is optional (s\?). The p flag prints each changed line and d deletes everything else.
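A small sketch to see the word boundaries at work, assuming GNU sed (\<, \>, \? and \U are all GNU extensions). The sample lines are invented, and -n with the p flag is used here in place of the trailing d.

```shell
sample=$(mktemp)
printf '%s\n' 'one test here' 'no match' 'two tests, a testing run' > "$sample"

# -n suppresses the default output; the p flag prints only lines where
# at least one substitution happened.
converted=$(sed -n 's/\<tests\?\>/\U&/gp' "$sample")

printf '%s\n' "$converted"
rm -f "$sample"
```

The word "testing" is left alone because \> requires the match to end at a word boundary.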
Sorry for the late response, but here is hopefully an understandable one with basic regex (no extended regex):
sed 's:\<test\(s*\)\>:TEST\U\1:g' < inputFile.txt > outputFile.txt; grep -n TEST outputFile.txt
Explanation:
: delimiter (instead of usual /)
\<test\> matches test. The character before the first t can be any character except a letter, number or underscore. Same applies for the character after the last t.
\(\) remember what is inside the parentheses.
s* match zero or more s's.
\U\1 inserts the first remembered match (i.e. any number of s's matched), converted to uppercase by \U (a GNU sed extension) so that tests becomes TESTS rather than TESTs.
The rest is hopefully clear. Otherwise leave a comment.

sed rare-delimiter (other than & | / ?...)

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).
Is there a complex delimiter (with two characters?) that can fix the error:
sed: -e expression #1, char 22: unknown option to `s'
The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.
At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:
$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'
In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
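The same trick can be written without $'...' by building the delimiter with printf first, which may be easier to read. A minimal sketch, with made-up sample values:

```shell
# SOH (octal 001) as the s delimiter; printf builds the byte portably.
soh=$(printf '\001')

# A pattern containing several of the "usual" delimiter characters.
pattern='a/b|c#d'
replacement='X'

result=$(printf '%s\n' 'see a/b|c#d here' |
    sed "s${soh}${pattern}${soh}${replacement}${soh}g")

printf '%s\n' "$result"
```

This only removes the delimiter problem. The pattern is still a regular expression, so characters such as . * [ and \ keep their special meaning, and & and \ stay special in the replacement.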
Another way to do this is to use Shell Parameter Substitution.
${parameter/pattern/replace} # substitute replace for pattern once
or
${parameter//pattern/replace} # substitute replace for pattern everywhere
Here is a quite complex example that is difficult with sed:
$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\#]"
$ echo "${parameter//$pattern/$replace}"
result is:
Common sed delimiters: [/_%:\#]
However, this only works on bash variables, not on files, where sed excels.
There is no such option for multi-character expression delimiters in sed, but I doubt you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.
You need the nested delimiter facility that Perl offers. It lets you match, substitute, and transliterate without worrying about the delimiter appearing in your contents. Since Perl is a superset of sed, you should be able to use it for whatever you use sed for.
Consider this:
$ perl -nle 'print if /something/' inputs
Now if your something contains a slash, you have a problem. The way to fix this is to change the delimiter, preferably to a bracketing one. So for example, you could have anything you like in the $WHATEVER shell variable (provided the brackets are balanced), which gets interpolated by the shell before Perl is even called here:
$ perl -nle "print if m($WHATEVER)" /usr/share/dict/words
That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.
If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator, so:
$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ $whatever) { ... }
With the help of Jim Lewis, I finally did a test before using sed :
if [ `echo $1 | grep '|'` ]; then
grep ".*$1.*:" $DB_FILE | sed "s#^.*$1*.*\(:\)## "
else
grep ".*$1.*:" $DB_FILE | sed "s|^.*$1*.*\(:\)|| "
fi
Thanks for help
Wow. I totally did not know that you could use any character as a delimiter.
At least half the time I use sed and BREs, it's on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes which I'm not even sure won't die on some combination I didn't think of. But if you can exclude just some character class (or even just one character):
echo '#01Y $#1+!' | sed -e 'sa$#1+ashita' -e 'su#01YuHolyug'
> > > Holy shit!
That's so much easier.
Escaping the delimiter inline for BASH to parse is cumbersome and difficult to read (although the delimiter does need escaping for sed's benefit when it's first used, per-expression).
To pull together thkala's answer and user4401178's comment:
DELIM=$(echo -en "\001");
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"
This example prints every line from the first one matching ${STARTING_SEARCH_TERM} through the next one matching ${ENDING_SEARCH_TERM}. It works as long as neither search term contains the SOH (start of heading) character, ASCII code 001, which is used as the delimiter.
There's no universal separator, but it can be escaped by a backslash for sed to not treat it like separator (at least unless you choose a backslash character as separator).
Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.
If you're in a bash environment, you can use bash substitution to escape sed separator, like this:
safe_replace () {
sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}
It's pretty self-explanatory, except for the bizarre part.
Explanation to that:
${1//\//\\\/}
${ - bash expansion starts
1 - first positional argument - the pattern
// - bash pattern substitution pattern separator "replace-all" variant
\/ - literal slash
/ - bash pattern substitution replacement separator
\\ - literal backslash
\/ - literal slash
} - bash expansion ends
example use:
$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta
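For shells without ${var//...} substitution, the escaping can be done by sed itself. The sketch below is a hypothetical variant (safe_replace_posix is my own name, not part of the answer above). It also escapes & and \ in the replacement, which safe_replace leaves alone. The pattern is still treated as a regular expression.

```shell
# Hypothetical helper: like safe_replace, but escapes with sed instead of
# bash parameter substitution.
safe_replace_posix() {
    # Escape the / delimiter in the pattern (| is used as delimiter here
    # so that / needs no escaping in this helper command).
    pat=$(printf '%s\n' "$1" | sed 's|/|\\/|g')
    # Escape /, & and \ in the replacement, where all three are special.
    rep=$(printf '%s\n' "$2" | sed 's|[/&\\]|\\&|g')
    sed "s/${pat}/${rep}/g"
}

result=$(printf '%s\n' 'ka/pus/ta' | safe_replace_posix '/pus/' '/re&/')
printf '%s\n' "$result"
```

Without the extra escaping, the & in the replacement would be expanded to the matched text /pus/ instead of being inserted literally.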

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//g'
But that removed all the newlines and put everything on one line. Is there some way I can tell Perl to not include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
@sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which strips the newline from each input line before the substitution runs and re-adds it at the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).
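To check what the tr command actually keeps, here is a small sketch run against the two sample lines from the question (written to a temporary file, which is my own scaffolding).

```shell
sample=$(mktemp)
printf '%s\n' 'my line - some words & text' "oh lóok i've got some characters" > "$sample"

# Delete blanks (space, tab) and punctuation; newlines are in neither
# class, so the line structure survives.
cleaned=$(tr -d '[:blank:][:punct:]' < "$sample")

printf '%s\n' "$cleaned"
rm -f "$sample"
```

One difference from the \W approach: the underscore is in [:punct:], so tr deletes it, whereas \W treats it as a word character and keeps it.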