The standard Linux patch command appears to be hard-coded for Unix text files only.
PS: I do not want to convert everything to Unix line endings and then convert the result back.
I've run into this problem before a few times. This is what I've discovered:
The Linux patch command will not recognize a patchfile that has CRLF in the patch 'meta-lines'.
The line-endings of the actual patch content must match the line endings of files being patched.
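Before patching, it can help to check which convention each file actually uses. The following is a minimal sketch, assuming a POSIX shell and grep; the function name line_endings is made up for illustration:

```shell
# Hypothetical helper: report whether a file uses LF, CRLF, or a mix of both.
line_endings() {
    cr=$(printf '\r')
    # grep -c prints 0 but returns non-zero when nothing matches, hence "|| true".
    total=$(grep -c '' "$1" || true)
    crlf=$(grep -c "${cr}\$" "$1" || true)
    if [ "$crlf" -eq 0 ]; then
        echo LF
    elif [ "$crlf" -eq "$total" ]; then
        echo CRLF
    else
        echo mixed
    fi
}
```

Running line_endings patchfile.diff and then line_endings on each target file shows whether the two sides agree.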
So this is what I did:
Use dos2unix to convert patch files to LF line-endings only.
Use dos2unix to convert the files being patched to LF line-endings only.
Apply patch.
You can use unix2dos to convert patched files back to CRLF line-endings if you want to maintain that convention.
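Where dos2unix and unix2dos are not installed, the same conversions can be approximated with sed. This is only a sketch, assuming GNU sed (for -i and the \r escape); the function names to_lf and to_crlf are invented here and are not part of any tool:

```shell
# Hypothetical stand-ins for dos2unix / unix2dos, assuming GNU sed.
to_lf() {
    # Drop one trailing carriage return from every line.
    sed -i 's/\r$//' "$@"
}

to_crlf() {
    # Strip any existing CR first so already-converted lines don't end in CR CR LF.
    sed -i -e 's/\r$//' -e 's/$/\r/' "$@"
}
```

Both functions accept several file names at once, for example to_lf patchfile.diff file1 file2.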
Use the --binary option. Here is the relevant snippet from the man page:
--binary
Write all files in binary mode, except for standard output and /dev/tty. When reading, disable
the heuristic for transforming CRLF line endings into LF line endings. This option is needed
on POSIX systems when applying patches generated on non-POSIX systems to non-POSIX files. (On
POSIX systems, file reads and writes never transform line endings. On Windows, reads and writes
do transform line endings by default, and patches should be generated by diff --binary when
line endings are significant.)
Combined:
dos2unix patchfile.diff
dos2unix $(grep 'Index:' patchfile.diff | awk '{print $2}')
patch --verbose -p0 -i patchfile.diff
unix2dos $(grep 'Index:' patchfile.diff | awk '{print $2}')
The last line depends on whether you want to keep the CRLFs or not.
This is a solution one of our guys came up with in our office, so I'm not taking credit for it but it works for me here.
We have a situation of mixed linux and windows line endings in the same file sometimes, and we also create patch files from windows and apply them on linux.
If you are experiencing a patch problem after creating your patch file on Windows, or you have mixed line endings, then do this:
dos2unix patch-file
dos2unix $(sed -n 's/^Index: //p' patch-file)
patch -p0 -i patch-file
Use perl -i.bak -pe 's/\R/\n/g' inputfile to convert any line ending to the standard LF.
Problem Background
We have several thousand large (more than 10M lines) text files of tabular data produced by a Windows machine, which we need to prepare for upload to a database.
We need to change the file encoding of these files from cp1252 to utf-8, replace any bare Unix LF sequences (i.e. \n) with spaces, then replace the DOS line end sequences ("CR-LF", i.e. \r\n) with Unix line end sequences (i.e. \n).
The dos2unix utility is not available for this task.
We initially had a bash function that packaged these operations together using iconv and sed, with iconv doing the encoding and sed dealing with the LF/CRLF sequences. I'm trying to replace part of this bash function with a perl command.
Example Code
Based on some helpful code review, I want to change this function to a perl script.
The author of the code review suggested the following perl to replace CRLF (i.e. "\r\n") with LF ("\n").
perl -g -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;'
The explanation for why this is better than what we had previously makes perfect sense, but this line fails for me with:
Unrecognized switch: -g (-h will show valid options).
More interestingly, the author of the code review also suggests it is possible to perform the decode/recode in a perl script, too, but I am completely unsure where to start.
Questions
Please can someone explain why the suggested answer fails with Unrecognized switch: -g (-h will show valid options).?
If it helps, the line is supposed to receive piped input from iconv as follows (though I am interested in learning how to use perl to do the decoding/recoding step, too):
iconv --from-code=CP1252 --to-code=UTF-8 "$1" | \
perl -g -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;' \
> "$2"
(Highly simplified) example input for testing:
apple|orange|\n|lemon\r\nrasperry|strawberry|mango|\n\r\n
Desired output:
apple|orange| |lemon\nrasperry|strawberry|mango| \n
Perl added the command-line switch -g in v5.36.0 as an alias for -0777, i.e. 'gulp mode', which reads the whole input at once.
This works in Perl version v5.36.0:
s=$(printf "Line 1\nStill Line 1\r\nLine 2\r\nLine 3\r\n")
perl -g -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;' <<<"$s"
Prints:
Line 1 Still Line 1
Line 2
Line 3
With any version of Perl earlier than v5.36.0, you would do:
perl -0777 -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;' <<<"$s"
# same
BTW, the conversion you are looking for is way easier in this case with awk, since it is close to awk's defaults.
Just do this:
awk -v RS="\r\n" '{gsub(/\n/," ")} 1' <<<"$s"
Line 1 Still Line 1
Line 2
Line 3
Or, if you have a file:
awk -v RS="\r\n" '{gsub(/\n/," ")} 1' file
This is superior to the posted perl solution since the file is processed record by record (each block of text separated by \r\n) rather than having to read the entire file into memory.
(On Windows you may need to do awk -v RS="\r\n" -v ORS="\n" '...')
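Tying this back to the question's pipeline, the awk step can take iconv's output directly, so the re-encoding and the line-ending cleanup happen in one stream. This is a sketch under the assumption that awk accepts a multi-character RS (gawk and mawk do; POSIX leaves it unspecified); the function name convert is hypothetical:

```shell
# Hypothetical wrapper: $1 is the cp1252 input file, $2 the UTF-8 output file.
convert() {
    iconv -f CP1252 -t UTF-8 "$1" |
        awk -v RS='\r\n' -v ORS='\n' '{ gsub(/\n/, " ") } 1' > "$2"
}
```

Called as convert input.txt output.txt, it streams the file record by record, so memory use stays independent of file size.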
Another note:
You can get similar behavior from Perl by:
Setting the input record separator to the fixed string $/="\r\n" in a BEGIN block;
Use the -l switch so every line has the input record separator removed;
Use tr for speedy replacement of \n with ' ';
Possibly set the output record separator, $\="\n", on Windows.
Full command:
perl -lpE 'BEGIN{$/="\r\n"} tr/\n/ /' file
The error message is about the command line switch -g you use in perl -g -pe .... It is not about the /g flag on the regex, which is valid (but useless without slurping, since there is only a single \n in a line anyway, and -p reads line by line).
This switch simply does not exist with the perl version you are using. It was only added with perl 5.36, so you are likely using an older version. Try -0777 instead.
I am almost certainly missing something basic. My file contains lines like
fooLOCATION=sdfmsvdnv
fooLOCATION=
barLOCATION=sadssf
barLOCATION=
and I want to delete all lines ending with LOCATION=.
sed -i '/LOCATION=$/d' file
does not work: it deletes nothing. I have tried endless variations, but I don't get it. What inline sed command can do this?
There are two approaches here: either print all non-matching lines with
sed -n -i '/LOCATION=$/!p' file
or delete all matching lines with
sed -i '/LOCATION=$/d' file
The first uses the -n command-line option to suppress the default action of printing each line. We then test for lines that end in LOCATION= and invert the pattern (only keeping those that don't match). When we get a desirable line, we print it with the p command.
The second looks for lines matching the end of line pattern, and deletes those that do.
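One caveat when combining these flags, at least with GNU sed: -i takes an optional backup suffix, so the fused form -in is read as -i with suffix n rather than as -i plus -n. Printing is then not suppressed and a backup file with an n appended to its name appears. Keeping the two options separate avoids that. A small demonstration on a scratch file (created with mktemp purely for illustration):

```shell
# Build a scratch copy of the sample data.
demo=$(mktemp)
printf 'fooLOCATION=sdfmsvdnv\nfooLOCATION=\nbarLOCATION=sadssf\nbarLOCATION=\n' > "$demo"

# -n and -i given separately: auto-print is off, only non-matching lines are written back.
sed -n -i '/LOCATION=$/!p' "$demo"
cat "$demo"
```

After this runs, the scratch file holds only the two lines that have a value after LOCATION=.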
Your file contains blank lines, and both of these keep those. If we don't want to keep those, we can change the first option to
sed -n -i '/^$/!{/LOCATION=$/!p}' file
which first checks if a line is not empty, and only bothers checking if it should be printed if it isn't empty. We can modify the second option to
sed -i '/^$/d;/LOCATION=$/d' file
which deletes blank lines and then checks about deleting the other pattern.
We can modify the commands to work with different line endings by allowing for the difference in the pattern. The difference between line endings on Unix/Linux (\n) and Windows (\r\n) is the presence of an extra carriage return on Windows. Modifying the four commands above to accept either, we get
sed -n -i '/LOCATION=\r\{0,1\}$/!p' file
sed -i '/LOCATION=\r\{0,1\}$/d' file
sed -n -i '/^\r\{0,1\}$/!{/LOCATION=\r\{0,1\}$/!p}' file
sed -i '/^\r\{0,1\}$/d;/LOCATION=\r\{0,1\}$/d' file
Note that in each of these we allow an optional \r before the end of line. We use the curly bracket notation, as sed does not support the question mark optional quantifier in basic mode (with GNU sed's -r or -E option, which enables extended regular expressions, we can replace \{0,1\} with ?).
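With extended regular expressions the optional carriage return becomes a plain \r?. Here is a quick check, assuming GNU sed (for -E and the \r escape) and a scratch file created only for the demonstration:

```shell
demo=$(mktemp)
# Two CRLF-terminated lines followed by two LF-terminated lines.
printf 'fooLOCATION=x\r\nfooLOCATION=\r\nbarLOCATION=y\nbarLOCATION=\n' > "$demo"

# Delete lines ending in LOCATION=, with or without a trailing CR.
sed -E -i '/LOCATION=\r?$/d' "$demo"
od -c "$demo"
```

The od -c dump makes the surviving line endings visible, so it is easy to confirm that the remaining lines kept their original CRLF or LF terminators.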
On a Windows shell, all of the options above require double quotes instead of single quotes.
Your command does work for me:
$ sed -i '/LOCATION=$/d' file
Results, viewed using cat:
$ cat file
fooLOCATION=sdfmsvdnv
barLOCATION=sadssf
Note
If a file has non-Unix line endings such as files from Windows with DOS-formatted line-endings, it can be a reason for failure. A typical remedy is to use dos2unix:
$ dos2unix file
This converter fixes the newline issues, so that file will now have Unix-style line endings. Sed should now properly recognize those line endings, so retry your sed command and it should work.
This might work for you (GNU sed):
sed -i '/LOCATION=\s*$/d' file
This deletes the line if LOCATION= is at the end of the line or if there is any optional white space following the pattern.
I have a script on a CentOS server, which I wrote on the server using Vim. The script edits a configuration file. When I check the configuration file after it has been edited, there is a ^M at the end of every line that was NOT edited. The lines that were edited are fine.
cat hibernate.properties |
sed -i.bk \
-e 's%\(^hibernate\.connection\.url\=ristor:jdbc:postgresql:\/\/127\.0\.0\.1/\).*%\'1$dbname'%' \
-e 's/\(^hibernate\.connection\.username\=\).*/\'1$dbuser'/' \
-e 's/\(^hibernate\.connection\.password\=\).*/\'1$pws'/' hibernate.properties
This is the code that is being used to edit the configuration file. Why is it putting ^M at the end of every line that is NOT edited?
The ^M being shown are probably windows-style line endings on some lines. Try to run your file through dos2unix before running your script.
For example:
dos2unix hibernate.properties
This is not likely to have added \r; it's more likely that the file had them already and was detected as dos fileformat by vim. Your script actually removed the \r from each line it touched, so vim no longer considers the file dos and therefore shows the carriage returns that are still left. Once you remove them (%s/<Ctrl-V><Ctrl-M>$// in vim should do), it is not likely to happen again.
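If dos2unix is not at hand, one option is to have the same sed invocation strip the carriage return from every line before substituting, so edited and unedited lines end up consistent. This is a sketch assuming GNU sed; the scratch file, the second property, and the value of dbuser are placeholders, not the asker's real settings:

```shell
props=$(mktemp)
printf 'hibernate.connection.username=old\r\nhibernate.show_sql=false\r\n' > "$props"
dbuser=newuser   # placeholder value

# The first expression drops a trailing CR on every line; the second does the edit.
sed -i.bk \
    -e 's/\r$//' \
    -e "s/^\(hibernate\.connection\.username=\).*/\1$dbuser/" "$props"
```

The original content, carriage returns included, is still available in the .bk backup if the result needs to be compared.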
I am using a very simple sed script removing comments : sed -e 's/--.*$//'
It works great until non-ascii characters are present in a comment, e.g.: -- °.
This line does not match the regular expression and is not substituted.
Any idea how to get . to really match any character?
Solution :
Since file says it is ISO-8859 text, the LANG environment variable must be changed before calling sed:
LANG=iso8859 sed -e 's/--.*//' -
It works for me. It's probably a character encoding problem.
This might help:
Why does sed fail with International characters and how to fix?
http://www.barregren.se/blog/how-use-sed-together-utf8
@julio-guerra: I ran into a similar situation, trying to delete lines like the following (note the Æ character):
--MP_/yZa.b._zhqt9OhfqzaÆC
in a file, using
sed 's/^--MP_.*$//g' my_file
The file encoding indicated by the Linux file command was
file my_file: ISO-8859 text, with very long lines
file -b my_file: ISO-8859 text, with very long lines
file -bi my_file: text/plain; charset=iso-8859-1
I tried your solution (clever!), with various permutations; e.g.,
LANG=ISO-8859 sed 's/^--MP_.*$//g' my_file
but none of those worked. I found two workarounds:
The following Perl expression worked, i.e. deleted that line:
perl -pe 's/^--MP_.*$//g' my_file
[For an explanation of the -pe command-line switches, refer to this StackOverflow answer:
Perl flags -pe, -pi, -p, -w, -d, -i, -t? ]
Alternatively, after converting the file encoding to UTF-8, the sed expression worked (the Æ character remained, but was now UTF8-encoded):
iconv -f iso-8859-1 -t utf-8 my_file > my_file.utf8
As I am working with lots (1000's) of emails with various encodings, that undergo intermediate processing (bash-scripted conversions to UTF-8 do not always work), for my purposes "solution 1" above will probably be the most robust solution.
Notes:
sed (GNU sed) 4.4
perl v5.26.1 built for x86_64-linux-thread-multi
Arch Linux x86_64 system
The documentation of GNU sed's z command mentions this effect (my emphasis):
This command empties the content of pattern space. It is usually
the same as 's/.*//', but is more efficient and works in the
presence of invalid multibyte sequences in the input stream. POSIX
mandates that such sequences are not matched by '.', so that
there is no portable way to clear sed's buffers in the middle of
the script in most multibyte locales (including UTF-8 locales).
It seems likely that you are running sed in a UTF-8 (or other multibyte) locale. You'll want to set LC_CTYPE (that's finer-grained than LANG, and won't affect translation of error messages). Valid locale names usually look like en_US.iso88591 or (for the location in your profile) fr_FR.iso88591, not just the encoding on its own; you might be able to see the full list with locale -a.
Example:
LC_CTYPE=fr_FR.iso88591 sed -e 's/--.*//'
Alternatively, if you know that the non-comment parts of the line contain only ASCII, you could split the line at a comment marker, print the first part and discard the remainder:
sed -e 's/--/\n/' -e 'P' -e 'd'
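Another option, not mentioned above and offered here only as a sketch, is to run sed in the C locale, where every byte counts as a character, so . also matches bytes that are invalid in UTF-8. This assumes the comment marker and the code before it are plain ASCII:

```shell
# \260 is the ISO-8859-1 byte for the degree sign, which is invalid on its own in UTF-8.
printf 'SELECT 1; -- \260\n' | LC_ALL=C sed -e 's/--.*$//'
```

Unlike a named locale such as fr_FR.iso88591, the C locale does not need to be generated or installed first, which can matter on minimal systems.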
I have the following line in my proftpd log (line 78 to be precise)
Deny from 1.2.3.4
I also have a script which rolls through my logs looking for people using brute force attacks and then stores their IP (ready for blacklisting). What I'm struggling with is inserting (presumably with sed) at the end of that specific line. This is what I've got so far:
sed "77i3.4.5.6" /opt/etc/proftpd.conf >> /opt/etc/proftpd.conf
Now one would presume this would work perfectly, however it actually does the following (lines 77 through 78):
3.4.5.6
Deny from 1.2.3.4
I suspect this is due to my dated version of sed. Are there any other ways of achieving the same thing? Also, the >> causes the config to be duplicated at the end of the file (again, I'm sure this is a limitation of my version of sed). This is running a homebrew Linux kernel on my NAS. Sed options below:
root@NAS:~# sed
BusyBox v1.7.0 (2009-04-29 19:12:57 JST) multi-call binary
Usage: sed [-efinr] pattern [files...]
Options:
-e script       Add the script to the commands to be executed
-f scriptfile   Add script-file contents to the commands to be executed
-i              Edit files in-place
-n              Suppress automatic printing of pattern space
-r              Use extended regular expression syntax
If no -e or -f is given, the first non-option argument is taken as the
sed script to interpret. All remaining arguments are names of input files;
if no input files are specified, then the standard input is read. Source
files will not be modified unless -i option is given.
Cheers for your help guys.
This has nothing to do with the version of sed; this is just plain old Doing It Wrong.
sed -i '77s/$/,3.4.5.6/' /opt/etc/proftpd.conf
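The s/$/.../ form appends to the end of the addressed line, whereas the i command inserts a new line before it, and redirecting with >> appends sed's entire output to the file a second time. As a sketch, the line can also be addressed by its content instead of its number, which keeps working if the file grows or shrinks. This assumes a sed with -i (the BusyBox usage text in the question lists it) and a single Deny from line; the function name deny_ip and the scratch config are illustrative only:

```shell
# Hypothetical helper: append an IP to the existing "Deny from" line of a config file.
deny_ip() {
    # $1 = config file, $2 = IP address to add
    sed -i "/^[[:space:]]*Deny from /s/\$/,$2/" "$1"
}

# Made-up sample config for the demonstration.
conf=$(mktemp)
printf '<Limit LOGIN>\nDeny from 1.2.3.4\n</Limit>\n' > "$conf"
deny_ip "$conf" 3.4.5.6
cat "$conf"
```

Because -i edits the file in place, no shell redirection is involved and nothing is duplicated.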