How to inject a line feed to replace a delimiter - sed

/usr/bin/sed 's/,/\\n/g' comma-delimited.txt > newline-separated.txt
This doesn't work for me. I just get the ',' removed but the tokens are now just not delimited.

You must have an older version of sed, so you need to put a literal LF char in your substitution, i.e.
/usr/bin/sed 's/,/
/g' comma-delimited.txt > newline-separated.txt
You may even need to escape the LF, so make sure there are no white space chars after the last char '\'
/usr/bin/sed 's/,/\
/g' comma-delimited.txt > newline-separated.txt
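A minimal sketch of the portable forms discussed above, assuming a POSIX shell. The function name split_commas and the sample input are made up for illustration. A backslash followed by a real newline in the replacement should be accepted by any POSIX sed, and tr is an alternative when the delimiter is a single character.

```shell
# Hypothetical helper: put each comma-separated token on its own line.
split_commas() {
  # The replacement is a backslash followed by a real newline,
  # which POSIX sed accepts even where '\n' is not understood.
  sed 's/,/\
/g'
}

with_sed=$(printf 'a,b,c\n' | split_commas)

# tr maps one single character to another, so it covers this case too.
with_tr=$(printf 'a,b,c\n' | tr ',' '\n')

printf '%s\n' "$with_sed"
```

Both variables should end up holding the three tokens on separate lines.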

This might work for you:
echo a,b,c,d,e | sed 'G;:a;s/,\(.*\(.\)\)/\2\1/;ta;s/.$//'
a
b
c
d
e
Explanation:
Appends a newline to the pattern space. G
Substitute ,'s with the last character in the pattern space i.e. the \n :a;s/,\(.*\(.\)\)/\2\1/;ta
Remove the newline. s/.$//

I tried the following, looks clumsy but does the work. Easy to understand. I use tr to do the replacement of the placeholder §. Only caveat is the placeholder, must be something NOT in the string(s).
ps -fu $USER | grep java | grep DML| sed -e "s/ -/§ -/g" | tr "§" "\n"
will give you an indented output of the commandline. DML is just some servername.

On AIX 7, answer #3 worked well:
I need to insert a newline at the beginning of a paragraph so I can do grep -p to filter for 'mksysb' in the resulting 'stanza'.
lsnim -l | /usr/bin/sed 's/^[a-zA-Z]/\^J&/'
(Actually, the line as typed had an escaped newline:
lsnim -l | /usr/bin/sed 's/^[a-zA-Z]/\
&/'
and recalling the command from history showed the ^J syntax.)

Related

Use sed to replace every character by itself followed by $n times a char?

I'm trying to run the command below to replace every char in DECEMBER by itself followed by $n question marks. I tried both escaping {$n} like so \{$n\} and leaving it as is. Yet my output just keeps being D?{$n}E?{$n}... Is it just not possible to do this with sed?
How should I go about this?
echo 'DECEMBER' > a.txt
sed -i "s%\(.\)%\1\(?\){$n}%g" a.txt
cat a.txt
This might work for you (GNU sed):
n=5
sed -E ':a;s/[^\n]/&\n/g;x;s/^/x/;/x{'"$n"'}/{z;x;y/\n/?/;b};x;ba' file
Append a newline to each non-newline character in a line $n times then replace all newlines by the intended character ?.
N.B. The newline is chosen as the initial substitute character as it is not possible for it to be within a line (sed uses newlines to separate lines) and if the final substitution character already exists within the current line, the substitutions are correct.
Range (also, interval or limiting quantifiers), like {3} / {3,} / {3,6}, are part of regex, and not replacement patterns.
You can use
sed -i "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" a.txt
See the online demo:
#!/bin/bash
sed "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" <<< "DECEMBER"
# => D???????E???????C???????E???????M???????B???????E???????R???????
Here, . matches any char, and & in the replacement pattern puts it back and $(for i in {1..7}; do echo -n '?'; done) adds seven question marks right after it.
This one-liner should do the trick:
sed 's/./&'$(printf '%*s' "$n" '' | tr ' ' '?')'/g' a.txt
with the assumption that $n expands to a positive integer and the command is executed in a POSIX shell.
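A small sketch of the same idea with the repeat string built first and then quoted, assuming a POSIX shell. The value n=3 and the variable names are chosen only for illustration. Quoting the expansion keeps the shell from treating the question marks as a glob pattern.

```shell
n=3

# Build a string of $n question marks: pad an empty string to width n,
# then turn the padding spaces into '?'.
marks=$(printf '%*s' "$n" '' | tr ' ' '?')

# & in the replacement stands for the matched character; the quantifier
# lives in how $marks was built, not in the replacement itself.
result=$(printf 'DECEMBER\n' | sed "s/./&$marks/g")

printf '%s\n' "$result"   # D???E???C???E???M???B???E???R???
```

This only holds while the repeated character is not special in a sed replacement (for example &, \ or the / delimiter).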
Efficiently using any awk in any shell on every Unix box after setting n=2:
$ awk -v n="$n" '
BEGIN {
new = sprintf("%*s",n,"")
gsub(/./,"?",new)
}
{
gsub(/./,"&"new)
print
}
' a.txt
D??E??C??E??M??B??E??R??
To make the changes "inplace" use GNU awk with -i inplace just like GNU sed has -i.
Caveat: if the character you want to use in the replacement text is &, then you'd need to use gsub(/./,"\\\\\\&",new) in the BEGIN section to make sure it is treated as a literal instead of a backreference metacharacter. You'd have that issue and more (e.g. handling \1 or /) with any sed solution. Any solution that uses double quotes around the script would also have trouble with $ characters, and the solutions that leave a shell expansion unquoted would have further problems with globbing characters.

How to replace \n by space using sed command?

I have to collect a select query data to a CSV file. I want to use a sed command to replace \n from the data by a space.
I'm using this:
query | sed "s/\n/ /g" > file.csv .......
But it is not working. Only \ is getting removed, while it should also remove n and add a space. Please suggest something.
You want to replace newline with space, not necessarily using sed.
Use tr:
tr '\n' ' '
\n is special to sed: it stands for the newline character. To replace a literal \n, you have to escape the backslash:
sed 's/\\n/ /g'
Notice that I've used single quotes. If you use double quotes, the backslash has a special meaning if followed by any of $, `, ", \, or newline, i.e., "\n" is still \n, but "\\n" would become \n.
Since we want sed to see \\n, we'd have to use one of these:
sed "s/\\\n/ /g" – the first \\ becomes \, and \n doesn't change, resulting in \\n
sed "s/\\\\n/ /g" – both pairs of \\ are reduced to \ and sed gets \\n as well
but single quotes are much simpler:
$ sed 's/\\n/ /g' <<< 'my\nname\nis\nrohinee'
my name is rohinee
From comments on the question, it became apparent that sed had nothing to do with removing the backslashes; the OP tried
echo my\nname\nis | sed 's/\n/ /g'
but the backslashes are removed by the shell:
$ echo my\nname\nis
mynnamenis
so even if the correct \\n were used, sed wouldn't find any matches. The correct way is
$ echo 'my\nname\nis' | sed 's/\\n/ /g'
my name is
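To keep the two cases apart, here is a short sketch that assumes a POSIX shell and uses made-up variable names. The first pipeline handles data containing the two characters backslash and n; the second handles real newline characters, using paste rather than sed.

```shell
# Case 1: the data contains a literal backslash followed by 'n'.
# printf '%s' does not interpret backslashes in its argument.
literal=$(printf '%s\n' 'my\nname\nis' | sed 's/\\n/ /g')

# Case 2: the data contains real newlines. sed reads line by line, so the
# newline is never in the pattern space; paste -s joins the lines instead.
joined=$(printf 'my\nname\nis\n' | paste -s -d ' ' -)

printf '%s\n' "$literal" "$joined"
```

Both variables should hold "my name is".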

Sed replacing last comma fails

I'm working on a sed script that takes a bunch of lines and turns them into an argument list for matlab (single quoted, comma separated).
It's working well so far:
[script to generate list] | sed -n "s#\(.*$\)#'\1',#p#" | tr '\n' ' '
But this leaves me with a trailing comma.
By testing, I can remove it with
[list of comma separated values] | sed -n 's#,$##p#'
but, when putting it all together:
[script to generate list] | sed -n "s#\(.*$\)#'\1',#;s#,$##p#" | tr '\n' ' '
Outputs nothing.
I feel like it has something to do with not having a p in the first line of the sed script, but I don't want it to print those values, I want them sent to the next line in the script (isn't that the default?)
Edit:
[script to generate list] Outputs a list of directories, for example:
./work/matlab_stun_gun/tex/fullTest.pdf
./Downloads/Howfar(tetra2) fixed.pdf
./work/savdocs/win_tests/tex/texReport.pdf
./Downloads/AcademicAudit.pdf
./work/matlab_stun_gun/report.pdf
./Downloads/PMB_4DVMC.pdf
./work/savdocs/win_tests/tex/mouseHeatMap.pdf
./Downloads/Geometry.pdf
./work/savdocs/win_tests/tex/mouseHeatMap.pdf
./work/matlab_stun_gun/tex/fullTest.pdf
The list generator is just find . -name "*.pdf" | pickl -n 10, adjusted for file type/ number etc. This is going to become a general purpose script.
Expected output would be :
'./work/savdocs/win_tests/tex/mClickss.pdf', './Downloads/Howfar(tetra2) fixed.pdf', './Downloads/MedPhys_defDOSXYZ.pdf', './Downloads/MedPhys_defDOSXYZ.pdf', './report.pdf', './work/savdocs/win_tests/tex/cSwitchs.pdf', './tex/zoomIn.pdf', './tex/fullTest.pdf', './temp/tex/zoomIn.pdf', './tex/zoomIn.pdf'
(Note the lack of trailing comma)
You are experiencing a multi-faceted problem here, in the sense that each of your attempts has something wrong with it.
Starting with [list of comma separated values] | sed -n 's#,$##p#', keep in mind that tr effectively makes your separator ', ' (comma-space) instead of just ',' comma. This means that you will output nothing from the second sed expression. You can fix that by matching with sed -n 's#, $##p#'. If you insist on using the -n flag, that is the correct solution. In full:
[script to generate list] | \
sed -n "s#\(.*$\)#'\1',#p#" | \
tr '\n' ' ' | \
sed -n 's#, $##p#'
The problem with your combination attempt, [script to generate list] | sed -n "s#\(.*$\)#'\1',#;s#,$##p#" | tr '\n' ' ', is that you need to apply tr before you remove the trailing commas. Even if this were to print anything, you would be adding a comma, stripping it off immediately on each line, and then replacing newlines with spaces. The correct order is already shown above.
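Multiple commands in sed can be separated with ; or given as separate -e expressions. Within one sed process, each command operates on the result of the previous one, much like a pipeline of sed processes but far more efficiently. The reason sed -n "s#\(.*$\)#'\1',#;s#,$##p#" prints nothing is the double quotes: the shell expands $# (the number of positional parameters) before sed runs, so sed never sees the pattern ,$ and the p lands in the replacement instead of acting as a flag. Single-quoting the second expression avoids that:
sed -n -e "s#\(.*$\)#'\1',#" -e 's#,$##p'
This is of course going to strip off the commas as soon as you add them to each line, but it shows the correct syntax for doing so.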
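One thing worth checking with scripts like these is how the shell treats $# inside double quotes: it is the number of positional parameters, and it is expanded before sed sees the script. The sketch below assumes a POSIX shell and sets two positional parameters purely for illustration; it prints the text sed would receive without running sed.

```shell
set -- one two                 # two positional parameters, so $# is 2

# Inside double quotes the shell replaces $# before sed ever runs.
double_quoted=$(printf '%s' "s#, $##p")

# Inside single quotes the script reaches sed unchanged.
single_quoted=$(printf '%s' 's#, $##p')

printf '%s\n' "$double_quoted" "$single_quoted"
```

In the double-quoted form the end-of-line anchor is gone and p has become the replacement text, which is why using # as the delimiter together with $ calls for single quotes.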
Further Improvements
You probably don't need the -n flag for sed and, consequently, the p flag for the s command. The -n flag is only useful if you want to print just the matching lines, but you want to print everything, so it does not apply here.
You also don't need an explicit capture group, since GNU sed's \0 (or the portable &) in the replacement stands for the entire match. Here is an example:
[script to generate list] | sed "s/.*/'\0',/" | tr '\n' ' ' | sed 's/, $//'
Finally, there are alternatives to removing the trailing bits of the string without starting a subprocess, especially since you are already enclosing your expression in $(...):
RESULT=$([script to generate list] | sed "s/.*/'\0',/" | tr '\n' ' ')
RESULT=${RESULT%, }
OR
RESULT=${RESULT::-2}
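Put together, the whole pipeline might look like the sketch below. It assumes a POSIX shell, and the two file names stand in for the output of the list generator. It uses & rather than \0 so that it does not depend on GNU sed.

```shell
# Stand-in for "[script to generate list]": two paths, one with a space.
list=$(printf '%s\n' './a.pdf' './b c.pdf' |
  sed "s/.*/'&',/" |      # wrap each line in single quotes, add a comma
  tr '\n' ' ')            # join the lines with spaces

# Strip the trailing ", " with parameter expansion, no extra process.
list=${list%, }

printf '%s\n' "$list"   # './a.pdf', './b c.pdf'
```

File names that themselves contain a single quote or a newline would need extra handling that this sketch leaves out.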

UNIX Replacing a character sequence in either tr or sed

Have a file that has been created incorrectly. There are several space delimited fields in the file but one text field has some unwanted newlines. This is causing a big problem.
How can I remove these characters but not the wanted line ends?
file is:
'Number field' 'Text field' 'Number field'
1 Some text 999999
2 more
text 111111111
3 Even more text 8888888888
EOF
So there is a NL after the word "more".
I've tried sed:
sed 's/.$//g' test.txt > test.out
and
sed 's/\n//g' test.txt > test.out
But none of these work. The newlines do not get removed.
tr -d '\n' does too much - I need to remove ONLY the newlines that are preceded by a space.
How can I delete newlines that follow a space?
SunOS 5.10 Generic_144488-09 sun4u sparc SUNW,Sun-Fire-V440
A sed solution is
sed '/ $/{N;s/\n//}'
Explanation:
/ $/: whenever the line ends in space, then
N: append a newline and the next line of input, and
s/\n//: delete the newline.
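The same commands written one per line, which older seds tend to accept more readily than the one-line form with braces. This is a sketch with made-up sample data that assumes the broken line ends in a space, as in the question.

```shell
fixed=$(printf '1 Some text 999999\n2 more \ntext 111111111\n' |
  sed '/ $/{
N
s/\n//
}')

printf '%s\n' "$fixed"
```

The second and third input lines should come out as a single line. A field broken across three or more lines would need a loop, since this version joins only one following line.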
It might be simplest with Perl:
perl -p0 -e 's/ \n/ /g'
The -0 flag sets Perl's input record separator to the NUL character, so a text file (which contains no NULs) is read as a single record. Then we can substitute using s in the usual way. You can, of course, also add the -i option to edit the file in-place.
How can I delete newlines that follow a space?
If you want every occurrence of $' \n' in the original file to be replaced by a space ($' '), and if you know of a character (e.g. a control character) that does not appear in the file, then the task can be accomplished quite simply using sed and tr (as you requested). Let's suppose, for example, that control-A is a character that is not in the file. For the sake of simplicity, let's also assume we can use bash. Then the following script should do the job:
#!/bin/bash
A=$'\01'
tr '\n' "$A" | sed "s/ $A/ /g" | tr "$A" '\n'

Skip/remove non-ascii character with sed

Chip,Dirkland,DrobæSphere Inc,cdirkland#hotmail.com,usa
I've been trying to use sed to modify email addresses in a .csv but the line above keeps tripping me up, using commands like:
sed -i 's/[\d128-\d255]//' FILENAME
from this stackoverflow question
doesn't seem to work as I get an 'invalid collation character' error.
Ideally I don't want to change that combined AE character at all, I'd rather sed just skip right over it as I'm not trying to manipulate that text but rather the email addresses. As long as that AE is in there though it causes my sed substitution to fail after one line, delete the character and it processes the whole file fine.
Any ideas?
This might work for you (GNU sed):
echo "Chip,Dirkland,DrobæSphere Inc,cdirkland#hotmail.com,usa" |
sed 's/\o346/a+e/g'
Chip,Dirkland,Droba+eSphere Inc,cdirkland#hotmail.com,usa
Then do what you have to do and after to revert do:
echo "Chip,Dirkland,Droba+eSphere Inc,cdirkland#hotmail.com,usa" |
sed 's/a+e/\o346/g'
Chip,Dirkland,DrobæSphere Inc,cdirkland#hotmail.com,usa
If you have tricky characters in strings and want to understand how sed sees them use the l0 command. Also very useful for debugging difficult regexps.
echo "Chip,Dirkland,DrobæSphere Inc,cdirkland#hotmail.com,usa" |
sed -n 'l0'
Chip,Dirkland,Drob\346Sphere Inc,cdirkland#hotmail.com,usa$
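What l prints depends on the input encoding and the locale. A sketch assuming the æ is stored as UTF-8 (two bytes, octal 303 246) and sed runs in the C locale; the sample string is built with printf so the bytes are explicit.

```shell
# A tab plus the two UTF-8 bytes of "æ", followed by a newline.
seen=$(printf 'a\tb\303\246\n' | LC_ALL=C sed -n 'l')

# l shows the tab as \t, each non-printable byte in octal, and marks
# the end of the line with $.
printf '%s\n' "$seen"
```

With Latin-1 input, as in the output shown above, the same character is the single byte \346.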
sed -i 's/[^[:print:]]//' FILENAME
Also, this acts like dos2unix
The issue you are having is the locale.
If you want to use a collation range like that, you need to change the character type and the collation type.
The range fails because the bytes \x80 through \xff are not valid characters on their own in a UTF-8 string.
Note that \u0080 != \x80 in UTF-8.
Anyway, to get this to work just do
LC_ALL=C sed -i 's/[\d128-\d255]//' FILENAME
This overrides LC_CTYPE and LC_COLLATE for that one command and does what you want.
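A sketch of both options under the C locale, assuming UTF-8 input. The sample line is a shortened version of the one in the question, and replacing # with @ is only a stand-in for whatever edit the email addresses actually need.

```shell
# "Drob" + UTF-8 bytes of "æ" + the rest of the line, built with printf.
line=$(printf 'Drob\303\246Sphere Inc,cdirkland#hotmail.com')

# Option 1: leave the non-ASCII bytes alone. In the C locale sed works
# byte by byte, so they pass through untouched.
kept=$(printf '%s\n' "$line" | LC_ALL=C sed 's/#/@/')

# Option 2: delete every byte in the range 0x80-0xFF (octal 200-377).
stripped=$(printf '%s\n' "$line" | LC_ALL=C tr -d '\200-\377')

printf '%s\n' "$kept" "$stripped"
```

Option 1 matches the wish to skip over the character; option 2 is the byte-level equivalent of the [\d128-\d255] range.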
I came here trying the sed command s/[\x00-\x1F]/ /g;, which gave me the same error message.
In this case it suffices to remove \x00 from the range, yielding s/[\x01-\x1F]/ /g;
Unfortunately it seems like all characters above and including \x7F and some others are disallowed, as can be seen with this short script:
for (( i=0; i<=255; i++ )); do
printf "== $i - \x$(echo "ibase=10;obase=16;$i" | bc) =="
echo '' | sed -E "s/[\d$i-\d$((i+1))]//g"
done
Note that the problem is only the use of those characters to specify a range. You can still list them all manually or per script. E.g. to come back to your example:
sed -i 's/[\d128-\d255]//' FILENAME
would become
c=; for (( i=128; i<=255; i++ )); do c="$c\d$i"; done
sed -i 's/['"$c"']//' FILENAME
which would translate to:
sed -i 's/[\d128\d129\d130\d131\d132\d133\d134\d135\d136\d137\d138\d139\d140\d141\d142\d143\d144\d145\d146\d147\d148\d149\d150\d151\d152\d153\d154\d155\d156\d157\d158\d159\d160\d161\d162\d163\d164\d165\d166\d167\d168\d169\d170\d171\d172\d173\d174\d175\d176\d177\d178\d179\d180\d181\d182\d183\d184\d185\d186\d187\d188\d189\d190\d191\d192\d193\d194\d195\d196\d197\d198\d199\d200\d201\d202\d203\d204\d205\d206\d207\d208\d209\d210\d211\d212\d213\d214\d215\d216\d217\d218\d219\d220\d221\d222\d223\d224\d225\d226\d227\d228\d229\d230\d231\d232\d233\d234\d235\d236\d237\d238\d239\d240\d241\d242\d243\d244\d245\d246\d247\d248\d249\d250\d251\d252\d253\d254\d255]//' FILENAME
In this case there is a way to just skip non-ASCII chars, not bothering with removing.
LANG=C sed /someemailpattern/
See https://bugzilla.redhat.com/show_bug.cgi?id=440419 and Will sed (and others) corrupt non-ASCII files?.
How about using awk for this? We set the field separator to the empty string, so that each character becomes its own field, then loop over the characters. An if statement checks whether each one matches our character class. If it does we print it, else we ignore it.
awk -v FS="" '{for(i=1;i<=NF;i++) if($i ~ /[A-Za-z,.# ]/) printf "%s", $i}'
Test:
[jaypal:~/Temp] echo "Chip,Dirkland,DrobæSphere Inc,cdirkland#hotmail.com,usa" |
awk -v FS="" '{for(i=1;i<=NF;i++) if($i ~ /[A-Za-z,.# ]/) printf "%s", $i}'
Chip,Dirkland,DrobSphere Inc,cdirkland#hotmail.com,usa
Update:
awk -v FS="" '{for(i=1;i<=NF;i++) if($i ~ /[A-Za-z,.# ]/) printf "%s", $i; printf "\n"}' < datafile.csv > asciidata.csv
I have added printf "\n" after the loop to keep the lines separate.