Recursively remove trailing characters - solaris

I just copied a couple of files from Windows to Unix and they all have ^M at the end. I know how to remove them using vi, but I can only do one file at a time. Is there a way I can do it for all the files in the folder? There are about 60 files, and doing it manually for all of them is time-consuming!
I'm open to using other tools as well!
PS: The OS is Solaris
Thanks

For posterity, let's post the solution from within VI. You can remove the Ctrl-M at the end of every line like this:
:%s/^V^M$//
Note that this is what you type, where ^V means Ctrl-V and ^M means Ctrl-M. The idea here is that ^V will "escape" the following ^M, so that you can match it in the substitution regex.
And the % expression means "do this on every line".
Note that this may or may not work in vim, depending on your settings.
But your question asks how to do this in vi, in which you can't easily make a change to multiple files. If you're open to using other tools, please indicate so in your question.
You can use sed on a single file or stream:
$ printf 'one\r\ntwo\r\n' > /tmp/test.txt
$ od -c < /tmp/test.txt
0000000 o n e \r \n t w o \r \n
0000012
$ sed -i'' -e 's/^M$//' /tmp/test.txt
$ od -c < /tmp/test.txt
0000000 o n e \n t w o \n
0000010
$
In this case, in /bin/sh in FreeBSD, I escaped the ^M by ... you guessed it ... using ^V.
When using sed's -i option, you can specify multiple files and they will all be modified in place, perhaps eliminating the need to wrap this in a script. If you want to put this into a script anyway, I recommend you try to do so, and then ask for help if it doesn't work. That's the StackOverflow Way. :-)
Or just use Jonathan's for loop example. You don't need temp files.
UPDATE
If your sed does not have a -i option, then you can still do this pretty easily in a for loop:
[ghoti@pc ~]$ od -c /tmp/test1.txt
0000000 o n e \r \n t w o \r \n
0000012
[ghoti@pc ~]$ for f in /tmp/test*.txt; do sed -e 's/^M$//' "$f" > /tmp/temp.$$ && mv -v /tmp/temp.$$ "$f"; done
/tmp/temp.26687 -> /tmp/test1.txt
/tmp/temp.26687 -> /tmp/test2.txt
[ghoti@pc ~]$ od -c /tmp/test1.txt
0000000 o n e \n t w o \n
0000010

If you don't have a dos2unix or dtou command on your machine, you can use tr instead:
for file in "$@" # List of files passed as arguments to the script
do
tr -d '\015' < "$file" > tmp.$$
cp tmp.$$ "$file"
done
rm tmp.$$
You can add trap commands around that to clean up if you interrupt. Using cp instead of mv preserves owner, permissions, symlinks, hard links.
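As a rough sketch of the trap idea, the loop could be wrapped as below. This assumes mktemp is available (it may not be on older Solaris releases), and strip_cr is a made-up helper name, not a standard command:

```sh
#!/bin/sh
# Sketch only: strip_cr is a hypothetical helper name, and mktemp is
# assumed to exist on the system.
strip_cr() {
    tmp=$(mktemp "${TMPDIR:-/tmp}/stripcr.XXXXXX") || return 1
    # Remove the temp file if the script is interrupted (HUP, INT, TERM)
    trap 'rm -f "$tmp"; exit 1' HUP INT TERM
    for file in "$@"
    do
        # cp (not mv) keeps the owner, permissions and links of "$file"
        tr -d '\015' < "$file" > "$tmp" && cp "$tmp" "$file"
    done
    rm -f "$tmp"
    trap - HUP INT TERM
}

# Demo on a throwaway file
demo=$(mktemp "${TMPDIR:-/tmp}/demo.XXXXXX")
printf 'one\r\ntwo\r\n' > "$demo"
strip_cr "$demo"
od -c < "$demo"
```

In a real script you would call strip_cr "$@" (or strip_cr *.txt) instead of the demo lines.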

use the command dos2ux.
dos2ux file >file2

Related

Replace first line in directory files

I would like to execute this make command to first replace the first line of all csv files inside the directory and then replace the # for commas through the other lines.
The second command is working fine and does what it is supposed to do, but the first one only replaces the line on the first file.
Could anyone give me a help on that?
csv:
$(DOCKER_RUN) npm run csv-generator
make format-csv
format-csv:
@sed -i '' '1 s/^.*$$/"bar","repository"/g' $(CURDIR)/foo/npm/*.csv
#sed -i '' 's/\(.*\)#/\1","/g' $(CURDIR)/foo/npm/*.csv
The reason that the first sed command "fails" is that sed doesn't reset the line counter between input files (not on your system, and not on my Mac OS X machine either; see comments):
$ cat test1
a
b
g
$ cat test2
aa
bb
cc
$ sed -n '=' test1 test2 # the '=' sed command outputs line numbers
1
2
3
4
5
6
This is why the first sed command isn't doing what you want it to do, it only affects the first file's first line.
The solution is to loop over the files and call sed for each of them (untested in Makefile):
@for f in $(CURDIR)/foo/npm/*.csv; do \
sed -i '' '1 s/^.*$$/"bar","repository"/g' "$$f"; \
done
Using find and xargs will also work, just make sure that find isn't picking up files further down in the folders.
EDIT: In light of the comments on this answer, I would recommend avoiding the use of sed -i on multiple files altogether, and convert both statements into for-loops (in this case, they may be collapsed into one loop with two statements):
@for f in $(CURDIR)/foo/npm/*.csv; do \
sed -i '' '1 s/^.*$$/"bar","repository"/g' "$$f"; \
sed -i '' 's/\(.*\)#/\1","/g' "$$f"; \
done
In my experience, using for-loops in Makefiles seems to be far more common compared to using find and xargs. This is probably due to incompatibility between find and xargs versions between Unices. It also makes the Makefile a lot easier to read if one uses explicit loops.
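Outside of make (so with a single $ rather than $$), the per-file loop can be sketched without sed -i at all, which sidesteps the GNU/BSD difference. The directory and file names below are made up for the demonstration:

```sh
# Sketch only: demo files in a throwaway directory (names are hypothetical)
dir=$(mktemp -d "${TMPDIR:-/tmp}/csvdemo.XXXXXX")
printf 'old header\na#b\n' > "$dir/one.csv"
printf 'old header\nc#d\n' > "$dir/two.csv"

for f in "$dir"/*.csv; do
    # One sed run per file, so line 1 really is the first line of each file
    sed -e '1s/^.*$/"bar","repository"/' \
        -e 's/\(.*\)#/\1","/' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

head -n 2 "$dir"/*.csv
```

Because sed is started afresh for every file, the address 1 applies to each file's own first line.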
I managed to solve with:
@find $(CURDIR)/foo/npm -name "*.csv" -type f | xargs -L 1 sed -i '' '1 s/^.*$$/"bar"/g'

how to replace the tabs with empty space in each file of a directory

I would like to replace the tabs in each file of a directory with the corresponding empty space. I found already a solution 11094383, where you can replace tabs with given number of empty spaces:
> find ./ -type f -exec sed -i 's/\t/    /g' {} \;
In the solution above tabs are replaced with four spaces. But in my case tabs can occupy more spaces - e.g. 8.
An example of file with tabs, which should be replaced with 8 spaces is:
NSMl1 100 PSHELL 0.00260 400000 400200 400300
400400 400500 400600 400700 400800 400900
401000 401100 400100 430000 430200 430300
430400 430500 430600 430700 430800 430900
431000 431100 430100 401200 431200
Here the lines with tabs are the third to the fifth lines.
An example of a file with tabs, which should be replaced with 4 spaces, is:
RBE2 1101001 5000511 123456 1100
Could anybody help?
The classic answer is to use the pr command with options to expand tabs into an appropriate number of spaces, turning off the pagination features:
pr -e8 -l1 -t …files…
The tricky part is getting the file over-written that seems to be part of the question. Of course, sed in the GNU and BSD (Mac OS X) incarnations supports overwriting with the -i option — with variant behaviours between the two as BSD sed requires a suffix for the backup files and GNU sed does not. However, sed does not (readily) support converting tabs to an appropriate number of blanks, so it isn't wholly appropriate.
There's a script overwrite (which I abbreviate to ow) in The UNIX Programming Environment that can do that. I've been using the script since 1987 (first checkin — last updated in 2005).
#!/bin/sh
# Overwrite file
# From: The UNIX Programming Environment by Kernighan and Pike
# Amended: remove PATH setting; handle file names with blanks.
case $# in
0|1) echo "Usage: $0 file command [arguments]" 1>&2
exit 1;;
esac
file="$1"
shift
new=${TMPDIR:-/tmp}/ovrwr.$$.1
old=${TMPDIR:-/tmp}/ovrwr.$$.2
trap "rm -f '$new' '$old' ; exit 1" 0 1 2 15
if "$@" >"$new"
then
cp "$file" "$old"
trap "" 1 2 15
cp "$new" "$file"
rm -f "$new" "$old"
trap 0
exit 0
else
echo "$0: $1 failed - $file unchanged" 1>&2
rm -f "$new" "$old"
trap 0
exit 1
fi
It would be possible and arguably better to use the mktemp command on most systems these days; it didn't exist way back then.
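A simplified sketch of the same idea with mktemp might look like the following. The function name, the single temporary file, and the reduced signal handling are assumptions made for illustration; this is not the script from the book:

```sh
#!/bin/sh
# Sketch only: a cut-down overwrite using mktemp (no backup copy, no traps)
overwrite() {
    if [ $# -lt 2 ]; then
        echo "Usage: overwrite file command [arguments]" 1>&2
        return 1
    fi
    file=$1
    shift
    new=$(mktemp "${TMPDIR:-/tmp}/ovrwr.XXXXXX") || return 1
    if "$@" > "$new"
    then
        cp "$new" "$file"      # copy back only after the command succeeded
        status=$?
    else
        echo "overwrite: $1 failed - $file unchanged" 1>&2
        status=1
    fi
    rm -f "$new"
    return $status
}

# Demo: sort a throwaway file in place
demo=$(mktemp "${TMPDIR:-/tmp}/demo.XXXXXX")
printf 'b\na\n' > "$demo"
overwrite "$demo" sort "$demo"
cat "$demo"
```

mktemp picks an unpredictable name, which avoids clashes that a fixed name built from $$ can run into.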
In the context of the question, you could then use:
find . -type f -exec ow {} pr -e8 -t -l1 {} \;
You do need to process each file separately.
If you are truly determined to use sed for the job, then you have your work cut out. There's a gruesome way to do it. There is a notational problem; how to represent a literal tab; I will use \t to denote it. The script would be stored in a file, which I'll assume is script.sed:
:again
/^\(\([^\t]\{8\}\)*\)\t/s//\1        /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{1\}\)\t/s//\1\3       /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{2\}\)\t/s//\1\3      /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{3\}\)\t/s//\1\3     /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{4\}\)\t/s//\1\3    /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{5\}\)\t/s//\1\3   /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{6\}\)\t/s//\1\3  /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{7\}\)\t/s//\1\3 /
t again
That's using the classic sed notation.
You can then write:
sed -f script.sed …data-files…
If you have GNU sed or BSD (Mac OS X) sed, you can use the extended regular expressions instead:
:again
/^(([^\t]{8})*)\t/s//\1        /
/^(([^\t]{8})*)([^\t]{1})\t/s//\1\3       /
/^(([^\t]{8})*)([^\t]{2})\t/s//\1\3      /
/^(([^\t]{8})*)([^\t]{3})\t/s//\1\3     /
/^(([^\t]{8})*)([^\t]{4})\t/s//\1\3    /
/^(([^\t]{8})*)([^\t]{5})\t/s//\1\3   /
/^(([^\t]{8})*)([^\t]{6})\t/s//\1\3  /
/^(([^\t]{8})*)([^\t]{7})\t/s//\1\3 /
t again
and then run:
sed -r -f script.sed …data-files… # GNU sed
sed -E -f script.sed …data-files… # BSD sed
What do the scripts do?
The first line sets a label; the last line jumps to that label if any of the s/// operations in between made a substitution. So, for each line of the file, the script loops until there are no matches made, and hence no substitutions performed.
The 8 substitutions deal with:
A block of zero or more sequences of 8 non-tabs, which is captured, followed by
a sequence of 0-7 more non-tabs, which is also captured, followed by
a tab.
It replaces that match with the captured material, followed by an appropriate number of spaces.
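The arithmetic behind "an appropriate number of spaces" is simply padding to the next multiple of 8. As an illustration of that rule (using awk and column counting rather than the sed script itself; expand_tabs is a made-up name), one could write:

```sh
# Sketch only: expand tabs to 8-column stops by tracking the output column.
expand_tabs() {
    awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\t") {
                # pad with spaces up to the next multiple of 8
                pad = 8 - (length(out) % 8)
                while (pad-- > 0) out = out " "
            } else {
                out = out c
            }
        }
        print out
    }'
}

# "ab" is followed by 6 spaces, "cdefghijk" (ending in column 17) by 7
printf 'ab\tcdefghijk\tx\n' | expand_tabs
```

A tab after 0 characters becomes 8 spaces and a tab after 7 characters becomes 1 space, which matches the eight cases the substitutions enumerate.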
One curiosity found during the testing is that if a line ends with white space, the pr command removes that trailing white space.
There's also the expand command on some systems (BSD or Mac OS X at least), which preserves the trailing white space. Using that is simpler than pr or sed.
With these sed scripts, and using the BSD or GNU sed with backup files, you can write:
find . -type f -exec sed -i.bak -r -f script.sed {} +
(GNU sed notation; substitute -E for -r for BSD sed.)
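For the expand route mentioned above, a minimal sketch could look like this. It assumes expand and mktemp are installed, and the file is a throwaway created just for the demo:

```sh
# Sketch only: expand tabs in one file via a temporary file
demo=$(mktemp "${TMPDIR:-/tmp}/tabs.XXXXXX")
printf 'ab\tcd\t\n' > "$demo"

tmp=$(mktemp "${TMPDIR:-/tmp}/expand.XXXXXX")
# -t 8 sets tab stops every 8 columns; copy back only if expand succeeded
expand -t 8 "$demo" > "$tmp" && cp "$tmp" "$demo"
rm -f "$tmp"

od -c < "$demo"
```

The trailing tab on the sample line becomes trailing spaces, in line with the note that expand preserves trailing white space.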

remove ^M characters from file using sed

I have this line inside a file:
ULNET-PA,client_sgcib,broker_keplersecurities
,KEPLER
I try to get rid of that ^M (carriage return) character so I used:
sed 's/^M//g'
However this does remove everything after ^M:
[root@localhost tmp]# vi test
ULNET-PA,client_sgcib,broker_keplersecurities^M,KEPLER
[root@localhost tmp]# sed 's/^M//g' test
ULNET-PA,client_sgcib,broker_keplersecurities
What I want to obtain is:
[root@localhost tmp]# vi test
ULNET-PA,client_sgcib,broker_keplersecurities,KEPLER
Use tr:
tr -d '^M' < inputfile
(Note that the ^M character can be input using Ctrl+V Ctrl+M)
EDIT: As suggested by Glenn Jackman, if you're using bash, you could also say:
tr -d $'\r' < inputfile
still the same line:
sed -i 's/^M//g' file
When you type the command, for ^M you type Ctrl+V Ctrl+M.
actually if you have already opened the file in vim, you can just in vim do:
:%s/^M//g
same, ^M you type Ctrl-V Ctrl-M
You can simply use dos2unix which is available in most Unix/Linux systems. However I found the following sed command to be better as it removed ^M where dos2unix couldn't:
sed 's/\r//g' < input.txt > output.txt
Hope that helps.
Note: ^M is actually carriage return character which is represented in code as \r
What dos2unix does is most likely equivalent to:
sed 's/\r\n/\n/g' < input.txt > output.txt
It doesn't remove \r when it is not immediately followed by \n and replaces both with just \n. This fails with certain types of files like one I just tested with.
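To see the difference between deleting every carriage return and deleting only the ones at the end of a line, here is a small sketch on made-up sample data. It builds the carriage return with printf, so nothing has to be typed with Ctrl-V:

```sh
# Sketch only: compare "remove all CRs" with "remove trailing CRs"
cr=$(printf '\r')          # a literal carriage return
sample=$(mktemp "${TMPDIR:-/tmp}/crdemo.XXXXXX")
printf 'a\rb\r\nc\r\n' > "$sample"

# Remove every carriage return, wherever it appears
all_removed=$(tr -d '\r' < "$sample")

# Remove a carriage return only when it is the last character on a line
trailing_only=$(sed "s/${cr}\$//" < "$sample")

printf '%s\n' "$all_removed" | od -c
printf '%s\n' "$trailing_only" | od -c
```

The first result loses the carriage return between a and b; the second keeps it and only drops the ones before each line feed.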
alias dos2unix="sed -i -e 's/'\"\$(printf '\015')\"'//g' "
Usage:
dos2unix file
If Perl is an option:
perl -i -pe 's/\r\n$/\n/g' file
-i edits the file in place (use -i.bak instead to keep a .bak copy of the input file)
\r = carriage return
\n = linefeed
$ = end of line
s/foo/bar/g = globally substitute "foo" with "bar"
In awk:
sub(/\r/,"")
If it is in the end of record, sub(/\r/,"",$NF) should suffice. No need to scan the whole record.
A better way to achieve this:
tr -d '\015' < inputfile_name > outputfile_name
Later rename the file to original file name.
I agree with @twalberg (see accepted answer comments, above): dos2unix on Mac OSX covers this. Quoting man dos2unix:
To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac"
I settled on 'mac2unix', which got rid of my less-cmd-visible '^M' entries, introduced by an Apple 'Messages' transfer of a bash script between 2 Yosemite (OSX 10.10) Macs!
I installed 'dos2unix', trivially, on Mac OSX using the popular Homebrew package installer; I highly recommend it and its companion command, Cask.
This is clean and simple and it works:
sed -i 's/\r//g' file
where \r of course is the equivalent for ^M.
Simply run the following command:
sed -i -e 's/\r$//' input.file
I verified this as valid in Mac OSX Monterey.
remove any \r :
nawk 'NF+=OFS=_' FS='\r'
gawk 3 ORS= RS='\r'
remove end of line \r :
mawk2 8 RS='\r?\n'
mawk -F'\r$' NF=1

Hex String Replacement Using sed

I'm having some trouble getting sed to do a find/replace of some hex characters. I want to replace all instances within a file of the following hexadecimal string:
0x0D4D5348
with the following hexadecimal string:
0x0D0A4D5348
How can I do that?
EDIT: I'm trying to do a hex find/replace. The input file does not have the literal value of "0x0D4D5348" in it, but it does have the ASCII representation of that in it.
GNU sed v3.02.80, GNU sed v1.03, and HHsed v1.5 by Howard Helman
all support the notation \xNN, where "NN" are two valid hex numbers, 00-FF.
Here is how to replace a HEX sequence in your binary file:
$ sed 's/\x0D\x4D\x53\x48/\x0D\x0A\x4D\x53\x48/g' file > temp; rm file; mv temp file
As @sputnik pointed out, you can use sed's in-place functionality. One caveat, though: if you use it on OS/X, you have to add an empty set of quotes:
$ sed -i '' 's/\x0D\x4D\x53\x48/\x0D\x0A\x4D\x53\x48/g' file
This is because in-place sed on OS/X takes a parameter giving the extension to add to the file name when making a backup, since it creates a temp file first. But then, OS/X's sed doesn't support \x.
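Where sed lacks \x escapes, one possible fallback is awk (a different tool, not the sed approach above), whose regular expressions and strings understand \r and \n. The sample data below is made up, and awk will add a final newline if the input does not end with one:

```sh
# Sketch only: 0x0D is \r, 0x0A is \n, and 0x4D5348 is the ASCII text "MSH"
printf 'A\rMSH|1\rMSH|2' |
    awk '{ gsub(/\rMSH/, "\r\nMSH"); print }' |
    od -c
```

Redirect the awk output to a temporary file and move it over the original if the change should be saved.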
This worked for me on Linux and OSX.
Replacing in-place:
sed -i '.bk' 's'/`printf "\x03"`'/foo/g' index.html
(See @Ernest's comment in the answer by @tolitius)
In Bash on OS/X, you can use a command like this:
# this command will create a variable named a which contains '\r\n' in it
a=`echo -e "hello\r\nworld\r\nthe third line\r\n"`
echo "$a" | sed $'s/\r//g' | od -c
and now you can see the output characters :
0000000 h e l l o \n w o r l d \n t h e
0000020 t h i r d l i n e \n
0000033
You should notice the difference between 's/\r//g' and $'s/\r//g'.
Based on the above, you can use a command like this to replace a hex string:
echo "$a" | sed $'s/\x0d//g' | od -c

Have sed make substitute on string but SKIP first occurrence

I have been through the sed one-liners but am still having trouble with my goal. I want to substitute matching strings on all but the first occurrence in a line. My exact usage would be:
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | sed <<MAGIC HAPPENS>
cd /Users/joeuser/bump\ bonding/initial\ trials
The line replaces the space in bump bonding with backslash-space, giving bump\ bonding, so that I can execute this line (when the spaces aren't escaped, I can't cd to it).
Update: I solved this by just using single quotes and outputting
cd 'blah blah/thing/another space/'
and then using source to execute the command. But it didn't answer my question. I'm still curious though... how would you use sed to fix it?
s/ /\\ /2g
The 2 specifies that the second one should apply, and the g specifies that all the rest should apply too. (This probably only works on GNU sed. According to the Open Group Base Specification, "If both g and n are specified, the results are unspecified.")
You can avoid the problem of combining g and n: replace all of them, then undo the first one:
sed -e 's/ /\\ /g' -e 's/\\ / /1'
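Applied to the example from the question, that pair of substitutions can be tried like this (a small sketch; the variable name escaped is only for illustration):

```sh
# Escape every space, then turn the first escaped space back into a plain one
escaped=$(echo 'cd /Users/joeuser/bump bonding/initial trials' |
    sed -e 's/ /\\ /g' -e 's/\\ / /')
printf '%s\n' "$escaped"
# prints: cd /Users/joeuser/bump\ bonding/initial\ trials
```

The trailing 1 flag is optional here, since a substitution without g already acts on the first match only.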
Here's another method which uses the t branch-if-substituted command:
sed ':a;s/\([^ ]* .*[^\\]\) \(.*\)/\1\\ \2/;ta'
which has the advantage of leaving existing backslash-space sequences in the input intact.
use awk
$ echo cd 'blah blah/thing/another space/' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd blah\ blah/thing/another\ space/
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd /Users/joeuser/bump\ bonding/initial\ trials