awk and sed challenge: adjust the text width - sed

So let's see how we can do this: limit the text width to a certain value, say, 10.
Lines longer than 10 characters should be broken into multiple lines.
Example:
A text file:
01234567
01234567890123456789abcd
0123
should be changed to:
01234567
0123456789
0123456789
abcd
0123
So how can we do it using sed or awk, with a command that is as short as possible?

Use the proper tool for the job...
fold -w 10
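As a quick check of fold against the sample from the question, here is a minimal sketch. The function name wrap_with_fold is hypothetical and only forwards to the standard fold utility.

```shell
# wrap_with_fold is a hypothetical helper name; it only forwards to fold.
# Usage: wrap_with_fold [width] < input   (width defaults to 10)
wrap_with_fold() {
  fold -w "${1:-10}"
}

# Sample input from the question.
printf '01234567\n01234567890123456789abcd\n0123\n' | wrap_with_fold 10
```

Note that fold -w counts screen columns, so for plain ASCII input like the sample this is the same as counting characters.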

Or, marginally shorter (than Jonathan Dursi's answer):
sed -e 's/.\{10\}/&\
/g' text.file
sed -e 's/.\{10,10\}/&\
/g' text.file
Tested on MacOS X 10.6.4, which does not use GNU sed.
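One caveat with the substitution above: a line whose length is an exact multiple of 10 gets a newline appended after its last chunk, which shows up as an empty line in the output. A possible workaround, sketched here under the assumption that your sed accepts \n in the pattern to match an embedded newline, is to strip that trailing newline in a second expression. The name wrap_sed is hypothetical.

```shell
# wrap_sed is a hypothetical helper name. Width is fixed at 10 as in the answer.
wrap_sed() {
  # 1st expression: insert a newline after every run of 10 characters.
  # 2nd expression: drop the embedded newline left at the very end
  #                 when the line length was an exact multiple of 10.
  sed -e 's/.\{10\}/&\
/g' -e 's/\n$//'
}

printf '0123456789\n01234567890123456789abcd\n' | wrap_sed
```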

$ sed -e 's/\(..........\)/\1\n/g' foo.txt
or, if that doesn't work (e.g., you don't have a sufficiently new GNU sed), just insert a literal newline and make sure it's escaped with a backslash:
$ sed -e 's/\(..........\)/\1\
/g' foo.txt
You can pretty much transliterate that into awk, too:
$ awk '{ gsub(/........../, "&\n" ) ; print}' foo.txt

In awk with a variable width:
awk -v WIDTH=5 '{ gsub(".{"WIDTH"}", "&\n"); printf "%s", $0 }; !/\n$/ { print "" }'
The final statement prevents the printing of extra newlines when the line is an exact multiple of the maximum line width.
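If your awk does not support interval expressions such as .{5} (some older implementations do not, or need a flag to enable them), the same wrapping can be sketched with substr() instead of a regex. This is an alternative technique, not the gsub approach above; the function name wrap_awk and the variable width are made up for this example.

```shell
# wrap_awk is a hypothetical helper name.
# Usage: wrap_awk [width] < input   (width defaults to 10)
wrap_awk() {
  awk -v width="${1:-10}" '{
    # Print the line in chunks of exactly `width` characters
    # for as long as more than `width` characters remain.
    while (length($0) > width) {
      print substr($0, 1, width)
      $0 = substr($0, width + 1)
    }
    # Print the remainder (at most `width` characters, possibly empty input line).
    print
  }'
}

printf '01234567\n01234567890123456789abcd\n0123\n' | wrap_awk 10
```

Because the loop only continues while more than width characters remain, a line that is an exact multiple of the width does not produce an extra empty line.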

Related

How to replace consecutive symbols using only one sed command?

I have a simple .csv file with lines that holds 't' values. Here is the example:
2ABC;t;t;t;tortuga;fault;t;t;bored
I want to replace them with '1' using sed.
If I run sed "s/;t;/;1;/g" I get the following result:
2ABC;1;t;1;tortuga;fault;1;t;bored
As you can see, only every other one of the consecutive ';t;' has been replaced. Yes, I can replace all ';t;' with sed -e "s/;t;/;1;/g" -e "s/;t;/;1;/g" but this is boring.
How can I make the replacement by one sed command?
If there is something to replace, branch to replace again.
sed ': again; /;t;/{ s//;1;/; b again }'
Overall, parsing csv with sed is crude. Consider awk.
awk -F';' -v OFS=';' '{ for(i=1;i<=NF;++i) if ($i=="t") $i=1 } 1'
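The branch loop above can also be written with the t command, which jumps to a label only if a substitution has succeeded since the last input line was read or the last t was taken. The sketch below splits the script into separate -e expressions, since some non-GNU sed implementations expect a label to end at a newline. The name replace_t_fields is hypothetical. Like the original, it only handles 't' fields that have a ';' on both sides, so a 't' in the first or last column is left alone.

```shell
# replace_t_fields is a hypothetical helper name.
replace_t_fields() {
  # Replace one ';t;' at a time and loop until no substitution succeeds,
  # so that overlapping matches such as ';t;t;' are all handled.
  sed -e ':again' -e 's/;t;/;1;/' -e 't again'
}

printf '2ABC;t;t;t;tortuga;fault;t;t;bored\n' | replace_t_fields
# prints: 2ABC;1;1;1;tortuga;fault;1;1;bored
```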
Lookarounds are helpful in such cases:
$ s='t;2ABC;t;t;t;tortuga;fault;t;t;bored;t'
$ echo "$s" | perl -lpe 's/(?<![^;])t(?![^;])/1/g'
1;2ABC;1;1;1;tortuga;fault;1;1;bored;1
echo '2ABC;t;t;t;tortuga;fault;t;t;bored' |
— gawk-specific solution
gawk -be '(ORS = RT)^!(NF = NF)' FS='^t$' OFS=1 RS=';'
— cross-awk-solution
{m,g,n}awk 'gsub(FS, OFS, $!(NF = NF))^_' FS=';t;' OFS=';1;' RS=
2ABC;1;1;1;tortuga;fault;1;1;bored

How to replace only specific spaces in a file using sed?

I have this content in a file where I want to replace spaces at certain positions with pipe symbol (|). I used sed for this, but it is replacing all the spaces in the string. But I don't want to replace the space for the 3rd and 4th string.
How to achieve this?
Input:
test test test test
My attempt:
sed -e 's/ /|/g' file.txt
Expected Output:
test|test|test test
Actual Output:
test|test|test|test
sed 's/ /\
/3;y/\n / |/'
Since a newline cannot appear in the pattern space when sed has just read a single line, you can change the third space to a newline as a temporary marker, then use y to translate the newline to a space and every remaining space to a pipe.
GNU sed can use \n in the replacement text:
sed 's/ /\n/3;y/\n / |/'
If the original input doesn't contain any pipe characters, you can do
sed -e 's/ /|/g' -e 's/|/ /3' file
to retain the third white space. Otherwise see other answers.
You could replace the 'first space' twice, e.g.
sed -e 's/ /|/' -e 's/ /|/' file.txt
Or, if you want to specify the positions (e.g. the 2nd and 1st spaces):
sed -e 's/ /|/2' -e 's/ /|/1' file.txt
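The order of the two expressions matters, because the second substitution counts spaces on the line as already modified by the first. A small sketch of the difference (the function names are hypothetical):

```shell
# Replace the 2nd space first, then the 1st: both numbers still refer
# to positions on the original line.
second_then_first() { sed -e 's/ /|/2' -e 's/ /|/1'; }

# Replace the 1st space first: the original 2nd space is now the 1st,
# so the /2 flag ends up hitting the original 3rd space.
first_then_second() { sed -e 's/ /|/1' -e 's/ /|/2'; }

printf 'test test test test\n' | second_then_first   # test|test|test test
printf 'test test test test\n' | first_then_second   # test|test test|test
```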
Using GNU sed to replace the first and second one or more whitespace chunks:
sed -i -E 's/\s+/|/;s/\s+/|/' file
Details
-i - in-place editing of the file
-E - POSIX ERE syntax enabled
s/\s+/|/ - replaces the first one or more whitespace chars
; - and then
s/\s+/|/ - replaces the next run of one or more whitespace chars, i.e. the second one on the original line (if present).
Keep it simple and use awk, e.g. using any awk in any shell on every Unix box no matter what other characters your input contains:
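$ awk '{n=NF-2; for (i=1;i<=n;i++) sub(/ /,"|")} 1' file
test|test|test test
The above replaces all but the last " " on each line. The count is stored in n before the loop because each sub() changes $0, which makes awk re-split the record and update NF. If you want to replace a specific number, e.g. 2, then just set n to 2.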

Remove all the characters from string after last '/'

I have the following input file and I need to remove all the characters from the strings that appear after the last '/'. I'll also show my expected output below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.
You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep matches greedily, meaning .* consumes as many characters as possible, including any /, up to the last / in the path.
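A minimal sketch of the grep approach run against the sample input. The function name strip_after_last_slash is hypothetical. The -o option is not part of POSIX grep, but GNU and BSD grep both provide it. Lines that contain no '/' at all produce no output.

```shell
# strip_after_last_slash is a hypothetical helper name.
strip_after_last_slash() {
  # -o prints only the matched part; '.*/' greedily matches
  # everything up to and including the last slash on the line.
  grep -o '.*/'
}

printf '%s\n' \
  '/start/one/two/stopone.js' \
  '/start/one/two/three/stoptwo.js' \
  '/start/one/stopxyz.js' | strip_after_last_slash
```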
Original Answer:
You can use dirname (note that its output does not include the trailing slash):
while IFS= read -r line ; do
dirname "$line"
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt
perl -lne 'print $1 if(/(.*\/)/)' your_file
Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file
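For completeness, the same result can be sketched in the shell alone with parameter expansion, which is a different technique from the sed, grep and awk answers above. The function name strip_filename is hypothetical, and the sketch assumes every input line contains at least one '/'.

```shell
# strip_filename is a hypothetical helper name.
# Assumption: each input line contains at least one '/'.
strip_filename() {
  while IFS= read -r path; do
    # ${path%/*} removes the shortest suffix matching '/*',
    # i.e. the last slash and everything after it; the slash is added back.
    printf '%s/\n' "${path%/*}"
  done
}

printf '%s\n' \
  '/start/one/two/stopone.js' \
  '/start/one/two/three/stoptwo.js' \
  '/start/one/stopxyz.js' | strip_filename
```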

How to change part of the string using sed?

I have a file data.txt with the following strings:
text-common-1.1.1-SNAPSHOT.jar
text-special-common-2.1.2-SNAPSHOT.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-3.3.3-SNAPSHOT.jar
I want to change all of the text-something-digits-something.jar to text-something-5.0.jar.
Here is my script with sed (GNU sed version 4.2.1), but it doesn't work, I don't know why:
#!/bin/bash
for t in ./data.txt
do
sed -i "s/\(text-[a-z]*-(\d|\.)*\).*\(.jar\)/\15.0\2/" ${t}
done
What is wrong with my sed usage?
How about this awk
awk '/^text/ {sub(/[0-9].*\./,"5.0.")}1'
text-common-5.0.jar
text-special-common-5.0.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-5.0.jar
Changing text-something-digits-something.jar to text-something-5.0.jar
amounts to replacing digits-something with 5.0.
The /^text/ condition also makes sure that only lines starting with text are changed.
I think a simpler approach might be enough: sed -r -e 's/(text-(.*-)?common-)([0-9\.]+)(-.*\.jar)/\15.0\4/' < your_data.
Another way of saying the same thing with perl: perl -pe 's/(text-(?:(.*-))*common-)([\d\.]+)(-.*\.jar)/${1}5.0${4}/' < your_data.
#!/bin/bash
for t in ./data.txt
do
sed -i '/^text-/ s/[.0-9]\{1,\}-something\(\.jar\)$/5.0\1/' "${t}"
# for "any" something
#sed -i '/^text-/ s/[.0-9]\{1,\}-[^?]\{1,\}\(\.jar\)$/5.0\1/' "${t}"
done
This selects the lines starting with text- and replaces the version digits where they are present.
Using sed:
sed '/^text-/ s/-[0-9.]*-/-5.0-/' file
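Note that the expression above keeps the -SNAPSHOT part, giving names like text-common-5.0-SNAPSHOT.jar. To get text-something-5.0.jar as asked in the question, one possible sketch is shown below. It assumes the version starts with a digit, is preceded by '-', and is followed by '-' plus arbitrary text ending in .jar at the end of the line, and that lines not starting with text- should stay untouched. The function name set_version_5 is hypothetical.

```shell
# set_version_5 is a hypothetical helper name; it reads lines on stdin.
set_version_5() {
  # On lines starting with "text-", replace "-<version>-<anything>.jar"
  # at the end of the line with "-5.0.jar".
  sed '/^text-/ s/-[0-9][0-9.]*-.*\.jar$/-5.0.jar/'
}

printf '%s\n' \
  'text-common-1.1.1-SNAPSHOT.jar' \
  'text-special-common-2.1.2-SNAPSHOT.jar' \
  'some-text-variant-1.1.1-SNAPSHOT.jar' \
  'text-another-variant-text-3.3.3-SNAPSHOT.jar' | set_version_5
```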

How to find and replace all percent, plus, and pipe signs?

I have a document containing many percent, plus, and pipe signs. I want to replace them with a code, for use in TeX.
% becomes \textpercent.
+ becomes \textplus.
| becomes \textbar.
This is the code I am using, but it does not work:
sed -i "s/\%/\\\textpercent /g" ./file.txt
sed -i "s/|/\\\textbar /g" ./file.txt
sed -i "s/\+/\\\textplus /g" ./file.txt
How can I replace these symbols with this code?
Test script:
#!/bin/bash
cat << 'EOF' > testfile.txt
1+2+3=6
12 is 50% of 24
The pipe character '|' looks like a vertical line.
EOF
sed -i -r 's/%/\\textpercent /g;s/[+]/\\textplus /g;s/[|]/\\textbar /g' testfile.txt
cat testfile.txt
Output:
1\textplus 2\textplus 3=6
12 is 50\textpercent of 24
The pipe character '\textbar ' looks like a vertical line.
This was already suggested in a similar way by @tripleee, and I see no reason why it should not work. As you can see, my platform uses the very same version of GNU sed as yours. The only difference to @tripleee's version is that I use the extended regex mode, so I have to either escape the pipe and the plus or put them into a character class with [].
nawk '{gsub(/%/,"\\textpercent");gsub(/\+/,"\\textplus");gsub(/\|/,"\\textbar"); print}' file
Tested below:
> echo "% + |" | nawk '{gsub(/%/,"\\textpercent");gsub(/\+/,"\\textplus");gsub(/\|/,"\\textbar"); print}'
\textpercent \textplus \textbar
Use single quotes:
$ cat in.txt
foo % bar
foo + bar
foo | bar
$ sed -e 's/%/\\textpercent /g' -e 's/\+/\\textplus /g' -e 's/|/\\textbar /g' < in.txt
foo \textpercent bar
foo \textplus bar
foo \textbar bar
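Since + and | are treated differently in basic and extended regular expressions (and \+ is a GNU extension in basic mode), a sketch that sidesteps the question is to put each symbol in a bracket expression, where it is always literal. The function name tex_escape_symbols is hypothetical. The trailing space after each macro is kept from the question's own commands.

```shell
# tex_escape_symbols is a hypothetical helper name; it reads text on stdin.
tex_escape_symbols() {
  # [+] and [|] match the literal characters in both basic and
  # extended regex modes; '\\' in the replacement emits one backslash.
  sed -e 's/%/\\textpercent /g' \
      -e 's/[+]/\\textplus /g' \
      -e 's/[|]/\\textbar /g'
}

printf '%s\n' '1+2+3=6' '12 is 50% of 24' 'a|b' | tex_escape_symbols
```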