Use sed to replace every character by itself followed by $n times a char?


I'm trying to run the command below to replace every character in DECEMBER with itself followed by $n question marks. I tried both escaping the braces in {$n} and leaving them as is, yet my output is always D?{$n}E?{$n}... Is it just not possible to do this with sed?
How should I go about this?
echo 'DECEMBER' > a.txt
sed -i "s%\(.\)%\1\(?\){$n}%g" a.txt
cat a.txt

This might work for you (GNU sed):
n=5
sed -E ':a;s/[^\n]/&\n/g;x;s/^/x/;/x{'"$n"'}/{z;x;y/\n/?/;b};x;ba' file
Append a newline to each non-newline character in the line, repeat this $n times (the hold space keeps a counter of x characters), then replace all newlines with the intended character ?.
N.B. The newline is chosen as the interim substitute character because it cannot occur within a line (sed uses newlines to separate lines), so the substitutions remain correct even if the final substitution character already exists within the current line.

Range quantifiers (also called interval or limiting quantifiers), like {3} / {3,} / {3,6}, are part of regex syntax, not of replacement patterns.
You can use
sed -i "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" a.txt
See the online demo:
#!/bin/bash
sed "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" <<< "DECEMBER"
# => D???????E???????C???????E???????M???????B???????E???????R???????
Here, . matches any char, and & in the replacement pattern puts it back and $(for i in {1..7}; do echo -n '?'; done) adds seven question marks right after it.
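To make the point about quantifiers concrete, here is a minimal sketch. It assumes a sed that accepts -E (GNU and BSD sed both do); the count n and the sample strings are made up for illustration. Note that bash's {1..$n} does not expand when the bound is a variable, which is why the sketch uses a plain counting loop instead.

```shell
# {3} is a quantifier in the pattern but plain text in the replacement.
in_pattern=$(printf 'aaa\n' | sed -E 's/a{3}/X/')
in_replacement=$(printf 'a\n' | sed -E 's/a/&?{3}/')
printf '%s\n%s\n' "$in_pattern" "$in_replacement"   # prints X, then a?{3}

# Build the padding in the shell instead; n is a hypothetical count.
n=3
pad=''
i=0
while [ "$i" -lt "$n" ]; do
    pad="${pad}?"
    i=$((i + 1))
done
result=$(printf 'DEC\n' | sed "s/./&$pad/g")
printf '%s\n' "$result"   # prints D???E???C???
```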

This one-liner should do the trick:
sed 's/./&'$(printf '%*s' "$n" '' | tr ' ' '?')'/g' a.txt
with the assumption that $n expands to a positive integer and the command is executed in a POSIX shell.

Efficiently using any awk in any shell on every Unix box after setting n=2:
$ awk -v n="$n" '
BEGIN {
new = sprintf("%*s",n,"")
gsub(/./,"?",new)
}
{
gsub(/./,"&"new)
print
}
' a.txt
D??E??C??E??M??B??E??R??
To make the changes "inplace" use GNU awk with -i inplace just like GNU sed has -i.
Caveat: if the character you want to use in the replacement text is &, then you'd need to use gsub(/./,"\\\\\\&",new) in the BEGIN section to make sure it is treated as a literal instead of a backreference metacharacter. You'd have that issue and more (e.g. handling \1 or /) with any sed solution. Any solution that uses double quotes around the script would have further issues with handling $s, and the solutions that let the shell expand an unquoted command substitution would have even more issues with globbing characters.
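One way to fold the & caveat into a reusable helper is sketched below. The function name repeat_after is hypothetical, and instead of the gsub() call quoted above it builds the padding in a loop, storing & as \& so that gsub() treats it literally. Only the & case is handled here.

```shell
# Hypothetical helper: append COUNT copies of CHAR after every input character.
repeat_after() {
    awk -v c="$1" -v n="$2" '
    BEGIN {
        # A bare & in a gsub() replacement means "the matched text",
        # so store it as \& to get a literal ampersand.
        if (c == "&") c = "\\&"
        for (i = 0; i < n; i++) new = new c
    }
    { gsub(/./, "&" new); print }'
}

printf 'DEC\n' | repeat_after '?' 2   # prints D??E??C??
printf 'DEC\n' | repeat_after '&' 2   # prints D&&E&&C&&
```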

Related

Data transformation using sed

I have a file like:
A
B
C
D

E
F
G
H

I
J
K
L
and I want it to come out like
A,B,C,D
E,F,G,H
I'm assuming I'd use sed, but actually I'm not even sure if that's the best tool. I'm open to using anything commonly available on a Linux system.
In perl, I did it like this ... it works, but it's dirty and has a trailing comma. Was hoping for something simpler:
$ perl -ne 'if (/^(\w)\R/) {print "$1,";} else {print "\n";}' test
A,B,C,D,
E,F,G,H,
I,J,K,L,

Set the input record separator to paragraph mode (-00) and then split each record on any remaining whitespace:
$ perl -00 -ne 'print join("," => split), "\n"' test
Add -l to enable automatic newlines (but make sure it comes before -00, because we want $\ to be set to the value of $/ before modification):
$ perl -l -00 -ne 'print join("," => split)' test
Add -a to enable autosplit mode and implicitly split to @F:
$ perl -l -00 -ane 'print join("," => @F)' test
Swap out -n for -p for automatic printing:
$ perl -l -00 -ape '$_ = join("," => @F)' test

You could use
awk 'BEGIN {RS=""; FS="\n"; ORS="\n"; OFS=","} {$1=$1} 1' file
I see the gawk manual says this:
If RS is set to the null string, then records are separated by blank lines. When RS is set to the null string, the newline character always acts as a field separator, in addition to whatever value FS may have.
So we don't actually need to specify FS to get the desired output:
awk 'BEGIN {RS=""; ORS="\n"; OFS=","} {$1=$1} 1' file
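Dropping FS is fine for single-word lines like the ones in the question. As a small sketch with made-up input, the difference shows up once a line contains a space: with the default FS the space also splits fields, while FS="\n" keeps each line as one field.

```shell
# With the default FS, spaces inside a line also split fields.
printf 'a b\nc\n\nd\n' | awk 'BEGIN {RS=""; OFS=","} {$1=$1} 1'
# prints: a,b,c  then  d

# With FS="\n", only newlines split fields, so "a b" stays intact.
printf 'a b\nc\n\nd\n' | awk 'BEGIN {RS=""; FS="\n"; OFS=","} {$1=$1} 1'
# prints: a b,c  then  d
```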

xargs could do it,
$ xargs -n4 < file | tr ' ' ','
A,B,C,D
E,F,G,H
I,J,K,L

Replacing newlines with sed is a bit complicated (see this question). It is easier to use tr for the newlines. The rest can be done by sed.
The following command assumes that yourFile does not contain any ,.
tr '\n' , < yourFile | sed 's/,*$/\n/;s/,,/\n/g'
The tr part converts all newlines to ,. The resulting string will have no newlines.
s/,*$/\n/ removes trailing commas and appends a newline (text files usually end with a newline).
s/,,/\n/g replaces ,, by a newline. Two consecutive commas appear only where your original file contained two consecutive newlines, that is where the sections are separated by an empty line.
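The two stages can be watched separately on a tiny made-up input. This sketch assumes GNU sed, since \n in the replacement part is a GNU extension that the command above already relies on.

```shell
# Stage 1: every newline becomes a comma (no trailing newline remains).
joined=$(printf 'A\nB\n\nC\nD\n' | tr '\n' ',')
printf '%s\n' "$joined"   # prints A,B,,C,D,

# Stage 2: trailing comma -> newline, doubled commas -> newline (GNU sed).
printf '%s' "$joined" | sed 's/,*$/\n/;s/,,/\n/g'
# prints: A,B  then  C,D
```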

Escaping a variable with special characters within sed - comment and uncomment an arbitrary line of source code

I need to comment out a line in a crontab file through a script, so it contains directories, spaces and symbols. This specific line is stored in a variable and I am starting to get mixed up on how to escape the variable. Since the line changes on a regular basis I don't want any escaping in there. I don't want to simply add # in front of it, since I also need to switch it around and replace the line again with the original without the #.
So the goal is to replace $line with #$line (comment) with the possibility to do it the other way around (uncomment).
So I have a variable:
line="* * * hello/this/line & /still/this/line"
This is a line that occurs in a file, file.txt, which needs to be commented out.
First try:
sed -i "s/^${line}/#${line}/" file.txt
Second try:
sed -i 's|'${line}'|'"#${line}"'|g' file.txt

choroba's helpful answer shows an effective solution using perl.
sed solution
If you want to use sed, you must use a separate sed command just to escape the $line variable value, because sed has no built-in way to escape strings for use as literals in a regex context:
lineEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$line") # escape $line for use in regex
sed -i "s/^$lineEscaped\$/#&/" file.txt # Note the \$ to escape the end-of-line anchor $
With BSD/macOS sed, use -i '' instead of just -i for in-place updating without backup.
And the reverse (un-commenting):
sed -i "s/^#\($lineEscaped\)\$/\1/" file.txt
See this answer of mine for an explanation of the sed command used for escaping, which should work with any input string.
Also note how variable $lineEscaped is only referenced once, in the regex portion of the s command, whereas the substitution-string portion simply references what the regex matched (which avoids the need to escape the variable again, using different rules):
& in the substitution string represents the entire match, and \1 the first capture group (parenthesized subexpression, \(...\)).
For simplicity, the second sed command uses double quotes in order to embed the value of shell variable $lineEscaped in the sed script, but it is generally preferable to use single-quoted scripts so as to avoid confusion between what the shell interprets up front vs. what sed ends up seeing.
For instance, $ is special to both the shell and sed, and in the above script the end-of-line anchor $ in the sed regex must therefore be escaped as \$ to prevent the shell from interpreting it.
One way to avoid confusion is to selectively splice double-quoted shell-variable references into the otherwise single-quoted script:
sed -i 's/^'"$lineEscaped"'$/#&/' file.txt
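The round trip can be tried without touching a file by running the same commands as filters. This is only a sketch: escape_regex is a hypothetical wrapper around the escaping command above, 'other line' is made-up input, and the sed is assumed to accept / inside a bracket expression when / is also the delimiter (GNU and BSD sed do).

```shell
# Hypothetical helper wrapping the escaping command from the answer.
escape_regex() {
    printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

line='* * * hello/this/line & /still/this/line'
esc=$(escape_regex "$line")

# Comment out only the exact line; 'other line' passes through unchanged.
commented=$(printf '%s\n' "$line" 'other line' | sed "s/^$esc\$/#&/")
printf '%s\n' "$commented"

# Reverse: strip the leading # again.
restored=$(printf '%s\n' "$commented" | sed "s/^#\($esc\)\$/\1/")
printf '%s\n' "$restored"
```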
awk solution
awk offers literal string matching, which obviates the need for escaping:
awk -v line="$line" '$0 == line { $0 = "#" $0 } 1' file.txt > $$.tmp && mv $$.tmp file.txt
If you have GNU Awk v4.1+, you can use -i inplace for in-place updating.
And the reverse (un-commenting):
awk -v line="#$line" '$0 == line { $0 = substr($0, 2) } 1' file.txt > $$.tmp &&
mv $$.tmp file.txt
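One thing to keep in mind with -v is that awk processes backslash escape sequences in the assigned value, so a line containing \t or \n will no longer compare equal. A possible workaround, sketched here with a made-up line, is to pass the value through the environment and read it from ENVIRON, which leaves it untouched.

```shell
line='C:\temp\new & more'   # made-up line containing backslashes

# -v turns \t and \n into a tab and a newline, so the comparison fails
# and the line is printed unchanged:
printf '%s\n' "$line" | awk -v line="$line" '$0 == line { $0 = "#" $0 } 1'

# ENVIRON passes the value through as is, so the line is commented out:
printf '%s\n' "$line" | line="$line" awk '$0 == ENVIRON["line"] { $0 = "#" $0 } 1'
```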

Perl has ways to do the quoting/escaping for you:
line=$line perl -i~ -pe '$regex = quotemeta $ENV{line}; s/^$regex/#$ENV{line}/' -- input.txt

UNIX Replacing a character sequence in either tr or sed

Have a file that has been created incorrectly. There are several space delimited fields in the file but one text field has some unwanted newlines. This is causing a big problem.
How can I remove these characters but not the wanted line ends?
file is:
'Number field' 'Text field' 'Number field'
1 Some text 999999
2 more
text 111111111
3 Even more text 8888888888
EOF
So there is a NL after the word "more".
I've tried sed:
sed 's/.$//g' test.txt > test.out
and
sed 's/\n//g' test.txt > test.out
But none of these work. The newlines do not get removed.
tr -d '\n' does too much - I need to remove ONLY the newlines that are preceded by a space.
How can I delete newlines that follow a space?
SunOS 5.10 Generic_144488-09 sun4u sparc SUNW,Sun-Fire-V440

A sed solution is
sed '/ $/{N;s/\n//}'
Explanation:
/ $/: whenever the line ends in space, then
N: append a newline and the next line of input, and
s/\n//: delete the newline.
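The command joins one continuation line per match. If a field can be broken across several lines in a row, a looping variant may be needed; the sketch below is one possible form, not part of the original answer, and the sample input is made up. It uses $!N so that a final line ending in a space is not lost on seds that discard the pattern space when N has no next line.

```shell
input=$(printf '1 Some text 999999\n2 more \ntext 111111111\n3 Even \nmore \ntext 8888888888')

# The answer's command: "3 Even more " is joined once, its tail stays separate.
printf '%s\n' "$input" | sed '/ $/{N;s/\n//;}'

# Looping variant: keep joining while the line still ends in a space.
printf '%s\n' "$input" | sed -e ':a' -e '/ $/{' -e '$!N' -e 's/\n//' -e 'ta' -e '}'
# prints three lines, the last being: 3 Even more text 8888888888
```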

It might be simplest with Perl:
perl -p0 -e 's/ \n/ /g'
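The -0 flag sets the input record separator to the NUL character, so for a file without NUL bytes Perl reads the entire file as one record (-0777 is the explicit slurp mode). Then we can substitute using s in the usual way. You can, of course, also add the -i option to edit the file in-place.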

How can I delete newlines that follow a space?
If you want every occurrence of $' \n' in the original file to be replaced by a space ($' '), and if you know of a character (e.g. a control character) that does not appear in the file, then the task can be accomplished quite simply using sed and tr (as you requested). Let's suppose, for example, that control-A is a character that is not in the file. For the sake of simplicity, let's also assume we can use bash. Then the following script should do the job:
#!/bin/bash
A=$'\01'
tr '\n' "$A" | sed "s/ $A/ /g" | tr "$A" '\n'

How to insert strings containing slashes with sed? [duplicate]

I have a Visual Studio project, which is developed locally. Code files have to be deployed to a remote server. The only problem is the URLs they contain, which are hard-coded.
The project contains URLs such as ?page=one. For the link to be valid on the server, it must be /page/one .
I've decided to replace all URLs in my code files with sed before deployment, but I'm stuck on slashes.
I know this is not a pretty solution, but it's simple and would save me a lot of time. The total number of strings I have to replace is fewer than 10. A total number of files which have to be checked is ~30.
An example describing my situation is below:
The command I'm using:
sed -f replace.txt < a.txt > b.txt
replace.txt which contains all the strings:
s/?page=one&/pageone/g
s/?page=two&/pagetwo/g
s/?page=three&/pagethree/g
a.txt:
?page=one&
?page=two&
?page=three&
Content of b.txt after I run my sed command:
pageone
pagetwo
pagethree
What I want b.txt to contain:
/page/one
/page/two
/page/three

The easiest way would be to use a different delimiter in your search/replace lines, e.g.:
s:?page=one&:/page/one:g
You can use any character as a delimiter that's not part of either string. Or, you could escape it with a backslash:
s/\//foo/
Which would replace / with foo. You'd want to use the escaped slash in cases where you don't know what characters might occur in the replacement strings (if they are shell variables, for example).

The s command can use any character as a delimiter; whatever character comes after the s is used. I was brought up to use a #. Like so:
s#?page=one&#/page/one#g

A very useful but lesser-known fact about sed is that the familiar s/foo/bar/ command can use any punctuation, not only slashes. A common alternative is s#foo#bar#, from which it becomes obvious how to solve your problem.

Add \ before the slashes, since / is the delimiter (? needs no escaping in a basic regex, and \? is a quantifier in GNU sed):
s/?page=one&/\/page\/one/g
etc.

In a system I am developing, the string to be replaced by sed is input text from a user which is stored in a variable and passed to sed.
As noted earlier on this post, if the string contained within the sed command block contains the actual delimiter used by sed - then sed terminates on syntax error. Consider the following example:
This works:
$ VALUE=12345
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345
This breaks:
$ VALUE=12345/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
sed: -e expression #1, char 21: unknown option to `s'
Replacing the default delimiter is not a robust solution in my case as I did not want to limit the user from entering specific characters used by sed as the delimiter (e.g. "/").
However, escaping any occurrences of the delimiter in the input string would solve the problem.
Consider the below solution of systematically escaping the delimiter character in the input string before having it parsed by sed.
Such escaping can be implemented as a replacement using sed itself. This replacement is safe even if the input string contains the delimiter, because the input string is not part of the sed command block:
$ VALUE=$(echo ${VALUE} | sed -e "s#/#\\\/#g")
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345/6
I have converted this to a function to be used by various scripts:
escapeForwardSlashes() {
    # Validate parameters
    if [ -z "$1" ]
    then
        echo "Error - no parameter specified!" >&2
        return 1
    fi
    # Perform replacement (the argument is quoted so whitespace and glob characters survive)
    printf '%s\n' "$1" | sed -e "s#/#\\\/#g"
    return 0
}
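Forward slashes are not the only characters that are special in the replacement part of s///: & and \ are too. A slightly broader variant could look like the sketch below. The helper name escape_replacement and the sample value are hypothetical, and multi-line values are not handled.

```shell
# Hypothetical helper: escape \, & and / for use in an s/// replacement.
escape_replacement() {
    printf '%s\n' "$1" | sed 's,[\\&/],\\&,g'
}

value='12345/6 & more'   # example input, not from the original post
printf 'MyVar=%%DEF_VALUE%%\n' |
    sed "s/%DEF_VALUE%/$(escape_replacement "$value")/g"
# prints: MyVar=12345/6 & more
```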

this line should work for your 3 examples:
sed -r 's#\?(page)=([^&]*)&#/\1/\2#g' a.txt
I used -r to save some escaping.
The expression is generic enough to cover your one, two and three cases, so you don't have to write the substitution three times.
Test with your example (a.txt):
kent$ echo "?page=one&
?page=two&
?page=three&"|sed -r 's#\?(page)=([^&]*)&#/\1/\2#g'
/page/one
/page/two
/page/three

replace.txt should be
s/?page=/\/page\//g
s/&//g

Please see this article:
http://netjunky.net/sed-replace-path-with-slash-separators/
It just uses | instead of / as the delimiter.

Great answer from Anonymous. \ solved my problem when I tried to escape quotes in HTML strings.
So if you use sed to return some HTML templates (on a server), use double backslash instead of single:
var htmlTemplate = "<div style=\\"color:green;\\"></div>";

A simpler alternative is to use AWK, as in this answer:
awk '$0="prefix"$0' file > new_file

You may use an alternative regex delimiter in a search pattern (address) by backslashing its first occurrence:
sed '\,{some_path},d'
For the s command:
sed 's,{some_path},{other_path},'
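As a quick illustration with concrete paths (the paths are made up), both forms can be run on inline input:

```shell
# Address with a custom delimiter: the first comma must be backslashed.
printf '/usr/bin/vi\n/opt/tool\n' | sed '\,^/usr/,d'
# prints: /opt/tool

# s command with a custom delimiter: no backslash needed.
printf '/usr/bin/vi\n' | sed 's,/usr/bin,/usr/local/bin,'
# prints: /usr/local/bin/vi
```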

Skip/remove non-ascii character with sed

Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa
I've been trying to use sed to modify email addresses in a .csv but the line above keeps tripping me up, using commands like:
sed -i 's/[\d128-\d255]//' FILENAME
from this stackoverflow question
doesn't seem to work as I get an 'invalid collation character' error.
Ideally I don't want to change that combined AE character at all, I'd rather sed just skip right over it as I'm not trying to manipulate that text but rather the email addresses. As long as that AE is in there though it causes my sed substitution to fail after one line, delete the character and it processes the whole file fine.
Any ideas?

This might work for you (GNU sed):
echo "Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa" |
sed 's/\o346/a+e/g'
Chip,Dirkland,Droba+eSphere Inc,cdirkland@hotmail.com,usa
Then do what you have to do, and afterwards revert with:
echo "Chip,Dirkland,Droba+eSphere Inc,cdirkland@hotmail.com,usa" |
sed 's/a+e/\o346/g'
Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa
If you have tricky characters in strings and want to understand how sed sees them, use the l0 command (see here). It is also very useful for debugging difficult regexps.
echo "Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa" |
sed -n 'l0'
Chip,Dirkland,Drob\346Sphere Inc,cdirkland@hotmail.com,usa$
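The plain l command (without the GNU-specific 0 argument) is enough for short lines. A small sketch, assuming the æ is stored as the single Latin-1 byte 0xE6 and forcing the C locale so the byte is shown as an octal escape; the sample string is made up:

```shell
# l prints tabs as \t, non-printable bytes as octal, and marks the line end with $.
printf 'a\tDrob\346Sphere\n' | LC_ALL=C sed -n 'l'
# prints: a\tDrob\346Sphere$
```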

sed -i 's/[^[:print:]]//' FILENAME
Also, this acts like dos2unix

The issue you are having is the locale.
If you want to use a collation range like that, you need to change the character type and the collation type.
This fails because \x80 through \xff are not valid characters on their own in a UTF-8 string.
Note that \u0080 != \x80 in UTF-8.
Anyway, to get this to work, just do
LC_ALL=C sed -i 's/[\d128-\d255]//' FILENAME
This will override LC_CTYPE and LC_COLLATE for the one command and do what you want.
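The effect of the C locale can be checked on inline input. This sketch swaps in the POSIX class [^[:print:]] for the GNU-specific \d range and adds the g flag so every such byte on the line is removed; the sample string assumes a single Latin-1 byte 0xE6.

```shell
# In the C locale every byte is one character, and bytes >= 0x80 are not printable.
printf 'Drob\346Sphere Inc\n' | LC_ALL=C sed 's/[^[:print:]]//g'
# prints: DrobSphere Inc
```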

I came here trying this sed command s/[\x00-\x1F]/ /g;, which gave me the same error message.
In this case it simply suffices to remove the \x00 from the range, yielding s/[\x01-\x1F]/ /g;
Unfortunately it seems like all characters above and including \x7F and some others are disallowed, as can be seen with this short script:
for (( i=0; i<=255; i++ )); do
printf "== $i - \x$(echo "ibase=10;obase=16;$i" | bc) =="
echo '' | sed -E "s/[\d$i-\d$((i+1))]]//g"
done
Note that the problem is only the use of those characters to specify a range. You can still list them all manually or per script. E.g. to come back to your example:
sed -i 's/[\d128-\d255]//' FILENAME
would become
c=; for (( i=128; i<=255; i++ )); do c="$c\d$i"; done
sed -i 's/['"$c"']//' FILENAME
which would translate to:
sed -i 's/[\d128\d129\d130\d131\d132\d133\d134\d135\d136\d137\d138\d139\d140\d141\d142\d143\d144\d145\d146\d147\d148\d149\d150\d151\d152\d153\d154\d155\d156\d157\d158\d159\d160\d161\d162\d163\d164\d165\d166\d167\d168\d169\d170\d171\d172\d173\d174\d175\d176\d177\d178\d179\d180\d181\d182\d183\d184\d185\d186\d187\d188\d189\d190\d191\d192\d193\d194\d195\d196\d197\d198\d199\d200\d201\d202\d203\d204\d205\d206\d207\d208\d209\d210\d211\d212\d213\d214\d215\d216\d217\d218\d219\d220\d221\d222\d223\d224\d225\d226\d227\d228\d229\d230\d231\d232\d233\d234\d235\d236\d237\d238\d239\d240\d241\d242\d243\d244\d245\d246\d247\d248\d249\d250\d251\d252\d253\d254\d255]//' FILENAME

In this case there is a way to just skip non-ASCII chars, not bothering with removing.
LANG=C sed /someemailpattern/
See https://bugzilla.redhat.com/show_bug.cgi?id=440419 and Will sed (and others) corrupt non-ASCII files?.
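A minimal sketch of the skip-over approach, assuming the æ is a single Latin-1 byte (0xE6) and using a made-up substitution that rewrites the email domain to example.com. LC_ALL is used instead of LANG because it takes precedence over any other locale variable that may be set.

```shell
# In the C locale, .* happily spans the 0xE6 byte, so the match does not stop there.
printf 'Chip,Drob\346Sphere Inc,cdirkland@hotmail.com,usa\n' |
    LC_ALL=C sed 's/^\(.*\),\([^,@]*\)@[^,]*,/\1,\2@example.com,/'
# prints the line with the byte untouched and the domain replaced
```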

How about using awk for this? We set the field separator to the empty string so that each character becomes its own field (this works in gawk and mawk; POSIX leaves an empty FS unspecified). Then we loop over each character and use an if statement to check whether it matches our character class. If it does we print it, else we ignore it.
awk -v FS="" '{for(i=1;i<=NF;i++) if($i ~ /[A-Za-z,.@ ]/) printf "%s", $i}'
Test:
[jaypal:~/Temp] echo "Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa" |
awk -v FS="" '{for(i=1;i<=NF;i++) if($i ~ /[A-Za-z,.@ ]/) printf "%s", $i}'
Chip,Dirkland,DrobSphere Inc,cdirkland@hotmail.com,usa
Update:
awk -v FS="" '{for(i=1;i<=NF;i++) if($i ~ /[A-Za-z,.@ ]/) printf "%s", $i; printf "\n"}' < datafile.csv > asciidata.csv
I have added printf "\n" after the loop to keep the lines separate.