sed and special characters

I'm trying the following sed command, but I have no luck with special characters:
echo "x#asdf" | sed "s/\([^-]\)#/\1\n/g"
x
asdf
But if I use a special character (as in my test.txt):
echo "ä#asdf" | sed "s/\([^-]\)#/\1\n/g"
ä#asdf
Why?
This works:
echo "ü#asdf" | sed "s/ü/-/g"
-#asdf
But this doesn't:
echo "ü#asdf" | sed "s/[ü]/-/g"
ü#asdf

I'm not sure about this, because your sed commands work fine for me (GNU sed 4.1.5), but try invoking sed this way:
$ LANG=de_DE.UTF-8 sed ...
See this post for more information: "Why does sed fail with International characters and how to fix?"
If this doesn't work, it may help to upgrade to GNU sed 4.2, if you can. The NEWS file says "multibyte processing fixed" for 4.2 but does not go into further detail.
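To see why the locale matters, here is a minimal sketch that runs both of the question's commands under the byte-oriented C locale, where every byte counts as one character. It assumes GNU sed (for \n in the replacement) and a script saved as UTF-8; LC_ALL=C is a swapped-in comparison, not something the answer above suggests.

```shell
#!/bin/sh
# Assumes GNU sed and that this script file is saved as UTF-8.

# Under LC_ALL=C, [^-] matches any single byte, including the second byte
# of the two-byte UTF-8 "ä", so the substitution fires and a newline is inserted.
printf 'ä#asdf\n' | LC_ALL=C sed 's/\([^-]\)#/\1\n/g'

# The same byte-wise view breaks the bracket case in another way: [ü] becomes
# a set of the two bytes of "ü", and each byte is replaced separately.
printf 'ü#asdf\n' | LC_ALL=C sed 's/[ü]/-/g'   # prints --#asdf
```

In a UTF-8 locale, such as the answer's LANG=de_DE.UTF-8 (if that locale is installed), ä and ü are single characters, so the original commands should behave as expected. If LC_ALL or LC_CTYPE is set, it overrides LANG; running locale shows which values sed will actually see.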

Related

manipulation of text by sed command

I have a file containing genome IDs such as NZ_FLAT01000030.1_173, and I need to trim each of them to this form: NZ_FLAT01000030.1
I tried a few commands, but none gave me exactly that:
sed 's/_/\t/' output: NZ FLAT01000030.1_173
sed -r 's/_//' output: NZFLAT01000030.1_173
sed -r 's/_//g' output: NZFLAT01000030.1173
How can I do that by using sed command?
Are you trying to remove the underscore and the digits following it?
echo 'NZ_FLAT01000030.1_173' | sed -E 's/_[0-9]+//g'
NZ_FLAT01000030.1
Or, to remove everything from the last underscore onward:
$ echo 'NZ_FLAT01000030.1_173' | sed 's/_[^_]*$//'
NZ_FLAT01000030.1
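The two answers above differ when an ID has digits right after its first underscore. With a hypothetical ID NZ_12345.1_173 (not from the question), the global _[0-9]+ pattern also removes _12345, while the anchored pattern removes only the final _suffix:

```shell
#!/bin/sh
# Hypothetical ID whose first underscore is also followed by digits.
id='NZ_12345.1_173'

# Removes every "_<digits>" run, including the one after "NZ".
echo "$id" | sed -E 's/_[0-9]+//g'   # prints NZ.1

# Removes only the last underscore and whatever follows it.
echo "$id" | sed 's/_[^_]*$//'       # prints NZ_12345.1
```

For IDs shaped like the question's, both give NZ_FLAT01000030.1; the anchored form is the safer choice if the part before the final underscore may itself contain underscore-digit runs.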

sed replacement \1 not working

Typing the following on the command line:
echo happy | sed -r s/\([p]\)\([p]\)/*\1*\2*/
I expect the following result:
ha*p*p*y
Instead, this is the result:
ha*1*2*y
I am using Ubuntu 12.04.3 LTS (GNU/Linux 3.2.0-53-generic x86_64)
The shell is -ksh
sed is 4.2.1 December 2010
The -r option allowed me to use \( and \). I thought it would also enable \1 and \2 but that doesn't seem to be the case. Is there another option I'm overlooking?
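Because the script is unquoted on the command line, the shell interprets and removes your backslashes, so sed never sees them: it actually receives s/([p])([p])/*1*2*/. With -r the bare parentheses still form groups, which is why the match works, but the replacement is just the literal text *1*2*.
Instead, try one of these. Note the single quotes, which preserve the backslashes: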
echo happy | sed -r 's/([p])([p])/*\1*\2*/'
or
echo happy | sed 's/\([p]\)\([p]\)/*\1*\2*/'
You don't need -r; just use:
echo happy | sed 's/\(p\)\(p\)/*\1*\2*/'
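A quick way to confirm the explanation is to print the argument instead of running sed on it. This sketch uses printf '%s\n' as a stand-in for sed, and set -f so the unquoted * and [p] are not treated as glob patterns:

```shell
#!/bin/sh
set -f   # disable pathname expansion for the unquoted word below

# Unquoted: the shell strips every backslash before the command runs.
printf '%s\n' s/\([p]\)\([p]\)/*\1*\2*/      # prints s/([p])([p])/*1*2*/

# Single-quoted: the script reaches the command byte for byte.
printf '%s\n' 's/\([p]\)\([p]\)/*\1*\2*/'    # prints s/\([p]\)\([p]\)/*\1*\2*/

# With the quotes in place, sed sees \1 and \2 as back-references.
echo happy | sed 's/\([p]\)\([p]\)/*\1*\2*/' # prints ha*p*p*y
```

The first line reproduces exactly what sed -r received in the question, which is why the output was ha*1*2*y.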

How to add new line using sed on MacOS?

I want to insert a newline between </a> and <a><a>, so that
</a><a><a>
becomes
</a>
<a><a>
I tried
sed 's#</a><a><a>#</a>\n<a><a>#g' filename
but it didn't work.
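Two ways that work on a Mac:
echo foo | sed 's/f/f\'$'\n/'
echo foo | gsed 's/f/f\n/g'
(The first uses the shell's $'\n' quoting to put a real newline after the backslash; gsed is GNU sed, for example as installed by Homebrew's gnu-sed package.)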
Some seds, notably the macOS/BSD one, don't interpret \n in the replacement as a newline; you need to use an actual newline, preceded by a backslash:
$ echo foo | sed 's/f/f\n/'
fnoo
$ echo foo | sed 's/f/f\
> /'
f
oo
$
Or you can use:
echo foo | sed $'s/f/f\\\n/'
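Applied to the question's own input, the backslash-plus-real-newline form works with both GNU sed and the BSD sed that ships with macOS. A sketch, using a temporary file as a stand-in for the OP's filename:

```shell
#!/bin/sh
# Sketch: a temporary file stands in for the question's "filename".
f=$(mktemp)
printf '%s\n' '</a><a><a>' > "$f"

# In the replacement, a backslash followed by a real newline means "newline"
# to both GNU sed and the BSD sed on macOS. -i.bak (suffix attached, no space)
# edits in place and keeps a backup with both implementations.
sed -i.bak 's#</a><a><a>#</a>\
<a><a>#g' "$f"

cat "$f"            # prints </a> and <a><a> on separate lines
rm -f "$f" "$f.bak"
```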
...or you can just pound on it! This worked for me with insert (i) on Mac / OS X:
sed "2 i \\\n${TEXT}\n\n" -i ${FILE_PATH_NAME}
sed "2 i \\\nSomeText\n\n" -i textfile.txt

How do I get rid of this unicode character?

Any idea how to get rid of this irritating character, U+0092, from a bunch of text files? I've tried all of the commands below, but none of them works. In the character map it appears as U+0092 <control>.
sed -i 's/\xc2\x92//' *
sed -i 's/\u0092//' *
sed -i 's///' *
Ah, I've found a way:
CHARS=$(python2 -c 'print u"\u0092".encode("utf8")')
sed 's/['"$CHARS"']//g'
But is there a direct sed method for this?
Try sed "s/\`//g" *. (I added the g so it will remove all the backticks it finds).
EDIT: It's not a backtick that OP wants to remove.
Following the solution in this question, this ought to work:
sed 's/\xc2\x92//g'
To demonstrate it does:
$ CHARS=$(python -c 'print u"asdf\u0092asdf".encode("utf8")')
$ echo $CHARS
asdf<funny glyph symbol>asdf
$ echo $CHARS | sed 's/\xc2\x92//g'
asdfasdf
Seeing as it's something you tried already, perhaps what is in your text file is not U+0092?
This might work for you (GNU sed):
echo "string containing funny character(s)" | sed -n 'l0'
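This displays the string as sed sees it, with non-printable bytes shown in octal. Then use:
echo "string containing funny character(s)" | sed 's/\onnn//g'
where nnn is the octal value, to delete it/them. For a multibyte character, chain one \onnn per byte; for example, U+0092 encoded as UTF-8 is \o302\o222.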
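A worked example of that two-step approach, assuming GNU sed (l0 and the \o escape are GNU features) and forcing LC_ALL=C so every byte is shown and matched individually. In UTF-8, U+0092 is the two bytes C2 92, which are octal 302 and 222:

```shell
#!/bin/sh
# Build a test line containing U+0092 as raw UTF-8 bytes (octal 302 222).
line=$(printf 'asdf\302\222asdf')

# Step 1: show the line unambiguously; non-printable bytes appear as \nnn.
printf '%s\n' "$line" | LC_ALL=C sed -n 'l0'             # prints asdf\302\222asdf$

# Step 2: delete that byte sequence using one \o escape per byte.
printf '%s\n' "$line" | LC_ALL=C sed 's/\o302\o222//g'   # prints asdfasdf
```

If step 1 shows different octal values, the file does not contain UTF-8 U+0092, which would explain why the \xc2\x92 attempt had no effect.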

Change sed line separator to NUL to act as "xargs -0" prefilter?

I'm running a command line like this:
filename_listing_command | xargs -0 action_command
Where filename_listing_command uses null bytes to separate the files -- this is what xargs -0 wants to consume.
Problem is that I want to filter out some of the files. Something like this:
filename_listing_command | sed -e '/\.py/!d' | xargs ac
but I need to use xargs -0.
How do I change the line separator that sed wants from newline to NUL?
If you've landed here looking for an answer and are using GNU sed 4.2.2 or later, it now has a -z (--null-data) option that does what the OP is asking for: lines are separated by NUL instead of newline.
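A sketch of that -z filter, assuming GNU sed 4.2.2+ and an xargs that supports -0. printf stands in for the hypothetical filename_listing_command, and printf '[%s]\n' stands in for action_command:

```shell
#!/bin/sh
# Three NUL-terminated names; the last one contains a newline on purpose.
printf '%s\0' 'keep me.py' 'drop me.txt' 'odd
name.py' |
  sed -z '/\.py$/!d' |      # same filter as the question, one NUL record at a time
  xargs -0 printf '[%s]\n'  # every surviving name arrives as a single argument
```

Because $ matches the end of each NUL-terminated record, the name containing a newline is still treated as one file.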
Pipe it through grep:
filename_listing_command | grep -vz '\.py$' | xargs -0 action_command
The -z option makes GNU grep treat both input and output as NUL-terminated records, and -v inverts the match (excludes). (-Z, --null, only changes what is printed after file names, so it isn't needed here.)
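The grep version as a runnable sketch, assuming GNU grep, with printf again standing in for the hypothetical listing and action commands. Note that it excludes .py names, the opposite of the question's sed filter, which keeps them:

```shell
#!/bin/sh
# Assumes GNU grep: -z reads and writes NUL-terminated records.
printf '%s\0' 'a.py' 'b.txt' 'c.py' |
  grep -vz '\.py$' |          # -v drops every name ending in .py
  xargs -0 printf '[%s]\n'    # prints [b.txt]
```

Dropping -v gives the question's behavior of keeping only the .py names.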
Edit:
Try this if you prefer to use sed:
filename_listing_command | sed 's/[^\x0]*\.py\x0//g' | xargs -0 action_command
If none of your file names contain newline, then it may be easier to read a solution using GNU Parallel:
filename_listing_command | grep -v '\.py$' | parallel ac
Learn more about GNU Parallel http://www.youtube.com/watch?v=OpaiGYxkSuQ
With help from Tom Hale and that answer, we have:
sed -nzE "s/^$PREFIX(.*)/\1/p"
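A sketch of that command in a full pipeline, assuming GNU sed with -z and -E. PREFIX=build- is a hypothetical value; because it is spliced into the regex, characters such as . would need escaping, and a / would end the s command early:

```shell
#!/bin/sh
PREFIX='build-'   # hypothetical prefix, interpreted as part of an ERE

# -n plus the p flag prints only the names that started with $PREFIX,
# with the prefix removed; all other names are dropped.
printf '%s\0' 'build-a.py' 'other.py' 'build-b.txt' |
  sed -nzE "s/^$PREFIX(.*)/\1/p" |
  xargs -0 printf '%s\n'      # prints a.py and b.txt
```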