Generally, my developers and I prefer UNIX line endings. I have updated the end-of-line setting in VSCode to the following...
"files.eol": "\n"
This works when creating new files; however, any pre-existing file from our source code still defaults to CRLF. Is there any way to set our line endings to \n across all files, even the pre-existing files from our repo's source code?
When VSCode opens a file, if the file has at least one line terminator in it, its per-file EOL value will be set according to whatever is the most common line terminator in the file, regardless of files.eol. That means that if files are opening as CRLF, then those files are already CRLF on disk.
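A rough restatement of that heuristic in code (a Java sketch of the rule as described, not VSCode's actual implementation):

    // Count CRLF pairs versus bare LFs; the more common terminator wins.
    // With no terminators at all, files.eol applies.
    public class EolGuess {
        public static String guess(String text) {
            int crlf = 0, bareLf = 0;
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == '\n') {
                    if (i > 0 && text.charAt(i - 1) == '\r') crlf++;
                    else bareLf++;
                }
            }
            if (crlf == 0 && bareLf == 0) return "use files.eol";
            return crlf >= bareLf ? "CRLF" : "LF"; // tie-breaking here is arbitrary
        }
    }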
My guess is that your SCM is checking them out with CRLF endings; you can probably adjust its configuration so it checks them out as LF instead.
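If the SCM is Git (an assumption; the question doesn't name it), a line like "* text eol=lf" in a .gitattributes file makes it check text files out with LF. For the files already sitting in the working copy, a one-off conversion also works. A minimal Java sketch, assuming UTF-8 text files; before running it on a real repository you would want to skip binary files and the SCM's own metadata directory (.git, CVS, ...):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.util.stream.Stream;

    // One-off CRLF -> LF conversion for every regular file under a root.
    public class CrlfToLf {
        public static void main(String[] args) throws IOException {
            Path root = Paths.get(args.length > 0 ? args[0] : ".");
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isRegularFile).forEach(CrlfToLf::convert);
            }
        }

        private static void convert(Path file) {
            try {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                String lf = text.replace("\r\n", "\n");
                if (!lf.equals(text)) {
                    Files.write(file, lf.getBytes(StandardCharsets.UTF_8));
                }
            } catch (IOException e) {
                System.err.println("Skipping " + file + ": " + e.getMessage());
            }
        }
    }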
Related
How to read ASCII files with mixed line endings (Windows and Unix) and UTF-16 Big Endian files in SAP?
Background: our ABAP application must read some of our configuration files. Most of them are ASCII files (normal text files) and one is UTF-16 Big Endian. So far, the files were read in ASCII mode, and everything was fine during our tests.
However, the following happened at customers: the configuration files live on a Linux system, so they have Unix line endings. People fetch the configuration files via FTP or similar and transfer them to a Windows machine, where they adapt some of the settings. Depending on the editor used, our customers end up with mixed line endings.
Those mixed line endings cause trouble when the file is read in ASCII mode in ABAP: the file is read up to the point where the line endings change, plus a bit more, but not to the end.
I suggested reading the file in BINARY mode, removing all the CRs, and then replacing all the remaining LFs with CR LF. That worked fine, except for the UTF-16 BE file, for which this approach produced a mess, so the whole change was reverted.
I'm not an ABAP developer; I just have to test this. With my background in other programming languages, I have to assume there is a solution, and I'm inclined to reject a "CAN'T FIX" resolution of this bug.
You can use CL_ABAP_FILE_UTILITIES=>CHECK_FOR_BOM to determine which encoding the file has and then use the constants of class CL_ABAP_CHAR_UTILITIES to process it further.
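The same idea sketched outside ABAP, in Java for illustration (the BOM byte values are the standard signatures): detect the encoding from the BOM, decode the whole file with that charset, and only then normalize the line endings. Doing the CR/LF surgery on raw bytes is exactly what breaks UTF-16 BE, where every character is two bytes and a lone 0x0A byte can be half of a character.

    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Decode first, then fix line endings as characters, not bytes.
    public class MixedEolReader {
        public static String read(Path file) throws IOException {
            byte[] raw = Files.readAllBytes(file);
            Charset cs = StandardCharsets.US_ASCII; // the plain config files
            int bom = 0;
            if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
                cs = StandardCharsets.UTF_16BE; // FE FF = UTF-16 Big Endian BOM
                bom = 2;
            }
            String text = new String(raw, bom, raw.length - bom, cs);
            // Tolerate CRLF, CR, and LF mixed within one file.
            return text.replace("\r\n", "\n").replace("\r", "\n");
        }
    }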
Here's a simplified version of my problem: I have two text files, with different data but an identical first line, generated by the same program, although possibly on different OSes. When Emacs reads one of them it says it is in DOS format, while it does not say so when reading the other.
I used several hex editors (Bless, GHex, Okteta on Kubuntu) and in all of them I see the same thing, which is that every line ends with the sequence 0D 0A (CR LF) in both files, including the last line.
So my question is: how does Emacs determine what is a DOS file and what is not, and is there something else in the file that the hex editors would not show, or would add?
Both files have the same name, in different directories. Also, I came upon this problem because I have C++ code that parses strings and failed on the file that Emacs lists as DOS, so the issue really is in the file content.
Last note: you will notice there is no C/C++ tag. I'm not looking for advice on how to modify my C++ code to handle the situation. I know how to do it.
Thanks for your help
Emacs handles DOS files by converting CRLF to LF when reading the file and converting LF back into CRLF when writing it out. So if there were a lone LF in the file, reading and writing would end up adding a CR even if the buffer had not been modified. For this reason, if such a lone LF is hidden in the middle of the file, Emacs will treat the file not as DOS but as a UNIX file.
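Roughly, in code (a Java sketch of the rule as described above, not of Emacs's actual coding-system detection):

    // "DOS" only if there is at least one CRLF and no bare LF; a single
    // bare LF anywhere forces "UNIX", because converting the file back
    // to CRLF on save would otherwise add a CR to that line.
    public class EmacsEolRule {
        public static String classify(byte[] content) {
            boolean sawCrlf = false;
            for (int i = 0; i < content.length; i++) {
                if (content[i] == '\n') {
                    if (i > 0 && content[i - 1] == '\r') sawCrlf = true;
                    else return "UNIX";
                }
            }
            return sawCrlf ? "DOS" : "UNIX";
        }
    }

So the file Emacs does not call DOS most likely has a lone 0A somewhere that the hex-editor inspection missed; searching for 0A bytes not preceded by 0D should find it.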
First, let me explain what I am doing. I have a CVS repository in which I store 5,000 Data Definition Language files. These 5,000 files are generated by an external data modeling application; they are text and have Windows CRLFs. During development, if I need to make a change, I re-generate the 5,000 files and then overwrite the contents of my local CVS workspace in Eclipse. The full overwrite/replacement is to make sure that I don't miss any updates to files. After overwriting/replacing the files, I use Eclipse to do a Team > Synchronize with Repository. When I do this, the comparison flags every single file as an outgoing change, because it appears not to ignore CRLFs in its comparison. I have "Ignore whitespace" enabled, and the Eclipse documentation states that it should be ignoring CRLFs:
Ignore whitespace option:
Causes the comparison to ignore differences which are whitespace characters
(spaces, tabs, etc.). Also causes differences in line terminators (LF
versus CRLF) to be ignored.
When I open the files in the text compare, it shows no diffs, but there is an extra CRLF at the top of one of the files. Is this a bug, or is there an option I am missing in Eclipse? It looks like the problem is that it doesn't ignore CRLFs that are on their own line.
The Eclipse compare dialog doesn't have a bug; you're just confused because you're seeing the output of several independent problems.
The option "ignore whitespace" only reduces the amount of changes that the compare dialog shows; it has no effect whatsoever on the differences that CVS sees. So as long as the files have the wrong line ending, CVS will complain.
Some version control systems allow you to specify converters to solve this issue, CVS doesn't. So you really need to generate files with the correct line endings.
The "single file with extra CRLF" really has a an extra CRLF. Find out why and fix that to make the difference go away.
When generating files, you should never use PrintStream or PrintWriter. It is tempting, but these two have many bugs (for example, close() doesn't flush(), violating their API contract), plus they use platform-dependent line endings, which is almost never what you want. Yes, it might work by accident, but trust me on this: that's not what you want. You don't want your paycheck filed by accident, either, right?
Even if you use neither PrintStream nor PrintWriter, avoid the system property line.separator for the same reasons.
I suggest writing a helper class which has many of the methods of PrintStream/PrintWriter but none of the bugs. It should also allow you to set the line delimiter to whatever you need; a sketch follows below.
Note: if you use a Writer, make sure you also specify the charset/encoding, or the character-to-byte conversion will be as random as the line endings.
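A minimal sketch of such a helper (hypothetical name; the point is that the line delimiter and the charset are both explicit, and close() propagates I/O errors instead of swallowing them):

    import java.io.IOException;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // PrintWriter replacement with a fixed, explicit line delimiter
    // (never line.separator) and an explicit charset.
    public class LineWriter implements AutoCloseable {
        private final Writer out;
        private final String eol;

        public LineWriter(Path file, String eol) throws IOException {
            this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
            this.eol = eol;
        }

        public void println(String line) throws IOException {
            out.write(line);
            out.write(eol); // always the delimiter we were given
        }

        @Override
        public void close() throws IOException {
            out.close(); // flushes; I/O errors are not swallowed
        }
    }

Used as try (LineWriter w = new LineWriter(path, "\n")) { w.println(ddl); }, the generated files come out with the same line endings on every platform, so CVS has nothing to complain about.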
What can cause Notepad++ to write new lines as CRLF in one file and only LF in the other?
Both files were created in the same folder on the same OS, and no modifications to Notepad++ preferences were made, AFAIK... Is there any option in Notepad++ that changes how new lines are defined?
Go to Edit -> EOL Conversion and change the setting to Windows Format, Unix Format, or Old Mac Format (depending on your preference). Also worth checking: the default format for new documents under Settings -> Preferences -> New Document, since that decides which EOLs a brand-new file gets, while a file that already exists keeps whatever endings it was saved with.
I am using a Perl script to read in a file, but I'm not sure what encoding the file is in. Basically, my file is a list of book titles, but each book has other info associated with it (author, publication date, etc.), so each book title sits within a discrete chunk of data for the book. I iterate through the file line by line until I find the regular expression '/Book Title: (.*)/' and capture what's in the parentheses. Then I create a separate .txt file whose name is the book title. However, on my Unix server, when I look at the name of the file, it's actually not, for example, 'LordOfTheFlies.txt' but rather 'LordOfTheFlies^M.txt'.
What is this '^M'? Is that a weird end-of-line encoding I'm not taking into account? I tried chomp, but it doesn't seem to be working. What is the best file encoding for working with Perl?
It's the additional carriage return character that Windows systems insert before line feed characters (M == 13th letter, hence ASCII 13 is visualised as ^M).
It has nothing to do with file encoding; it's just the line-ending policy biting you. Perl is usually good at handling line-ending characters correctly, but when they turn up somewhere it doesn't expect (here, inside the captured title that becomes a filename), you have to deal with them yourself. You can use s/\r// instead of chomp() to get them out.
Before processing the file, you need to know its encoding, which is determined by the producer of the file.
That '^M' is Control-M, which is a carriage return and is not used to end lines on Unix. It looks like the file was created on Windows and transferred to Unix; the character can also be added by FTP when text files are transferred as binaries.
Try chop instead of chomp: chomp only removes the trailing newline character ($/), while chop removes the last character whatever it is. s/\r// is also good.
For your general question, you might want to use an appropriate module for the file type you have, to make your life easier with Perl.