Last string appears at beginning of line in formatted output - perl

Has anyone any idea why the following would format itself in a weird way? In several years I've had no problem with creating simple text output but this problem has me baffled.
I'm using the line
print "$BC,$Ttl,$FN,$SN,$Finalage,$OurLoc,$OurDT,$FinalPC\n";
Every value is a simple text string on which I've run "chomp" to remove return characters.
I would expect the output to look like the following:
*DD10099999,,Information Services,Guest Ticket 2,41,C G,03/11/2020,NE8 9BB*
$BC is the first item and $FinalPC is the postcode at the end.
Instead I get:
*,NE8 9BB99, ,Information Services,Guest Ticket 2,41,C G,03/11/2020*
The final item has somehow moved to the beginning of the line and overwritten the first item. This is happening consistently on every line of my screen and text file output and I'm completely stumped as to why. The data is read from a text file and compared with database output which is also simple text. There are no occurrences of \b anywhere in my code. Why would a backspace character get into it?

The string in $OurDT ends with a carriage return, which causes your terminal to home the cursor. Presumably, the value of $OurDT came from a Windows file read on a unixy machine.
One option is to fix the file (e.g. by using the dos2unix utility).
Another is to accept both CRLF and LF as line endings (e.g. by using s/\s+\z// instead of chomp).

Related

Trouble rendering CSV data as an interactive table in GitHub

When viewed, any .csv file committed to a GitHub repository automatically renders as an interactive table, complete with headers and row numbering. By default, the first row is your header row. The tables were supposed to look nice as below:
However, there's an error happening in my tabular data, and despite indicating the error, I can't fix it:
I'm using a .csv file with a semicolon separator. Does anyone have an idea of what's happening?
According to the docs, Github can only do its lay-out thing with .csv (comma-separated) and .tsv (tab-separated) files.
Using a semicolon as a separator isn't supported, at least not officially, and a spurious comma in a semicolon-separated file could well throw the algorithm off.
You could try replacing all semicolons with tabs and see how you fare.
If that doesn't work, try using commas as separators and enclose all text table cell data with quotes, like:
"Liver fibrosis, sclerosis, and cirrhosis","c370800","102922","Cystic fibrosis related cirrhosis","Diagnosis of liver fibrosis, sclerosis, and cirrhosis"
Note: no spaces after the commas. Also, if you have quotes in the text fields, you will have to escape those to "" (two quotes), or the algorithm will get confused.
You may get away with using quotes only for the offending text data, but that could well be more difficult to generate than just putting the quotes around all fields.

Unity 2019 - linebreak \n not working for UI text elements

I am having some difficulty getting linebreaks to work for my Unity UI elements. (Unity 2019.2.17f1 Personal)
What I'm doing is:
string twoLinesOfText = LanguagePack.getTextByID(ID);
result:
twoLinesOfText = "Text line 1\nText line 2"
Expected output:
Text line 1
Text line 2
Reality:
Text line 1\nText line 2
I have tried using "\n", "\\n" and "\r\n". None of these give the intended result.
I assign the text to the component using
UITextComponent.GetComponent<Text>().text = twoLinesOfText;
Can this direct assignment be a problem? Do i need to push my string through a toString() or parse it somehow for the \n to be recognised?
Workaround:
I have a workaround. By using an XML file for my LanguagePack, and inserting (enter) linebreaks in the base file, I feed the linebreaks into my Unity UI elements. Obviously this is not ideal.
Reading back the strings in Debug.Log does not show which linebreak code was ultimately used: it just breaks the string according to the (enter) linebreaks in the XML file.
You can't import it trought Language Package. What you should do is :
string line1 = LanguagePackage.getTextByID(ID1);
string line2 = LanguagePackage.getTextByID(ID2);
string twoLinesOfText = line1 + "\n" + line2;
UITextComponent.GetComponent<Text>().text = twoLinesOfText;
Run into this problem myself, a little investigation showed that what I thought was \n in the string had been converted to \\n so it showed in the text box as \n.
Converting it during debugging to just \n got me the multiline text I wanted.
Now to investigate where in my data chain it got converted :-)
Ok, investigation complete. A file was saved, on my PC from a program in Visual Basic using the File.WriteAllLines function, one of those lines had a couple of instances of \n. A look at that file in notepad shows it had correctly written that line. The problem came when I used File.ReadAllLines in my unity program as it converted those \n instances to \\n. As far as I can tell this is not a documented action, in fact it's possible, on reading the MS docs, to think that it would have split that line into multiple lines, which it doesn't do.
I checked in my VB program and File.ReadAllLines does not behave in this way there. It's probably something to do with the environment, VB does not use \n, C# does. I fixed the problem by tagging a replace onto the string e.g. string.Replace("\\n", "\n"). It's entirely possible that attempting to write a string from C# with File.WriteAllLines could also mess with \n.
Geez, this was hard to write as the Editor here messes with \\n and convert it to \n and I end up having to use \\\n
For people who encounter this issue. You Could try to use some HTML similar syntax and see whether it works or not.
Eg:
Using for newline instead of \n

What code format shows proper line breaks?

I am exporting some Access tables to txt files and there are a lot of problems with the txt file. One of those problems being line breaks not visible in the txt file itself. If I copy a line with a line break into Notepad++ from Notepad, it'll break into 2 lines.
So I believe this may be a code format problem, but I can't find the proper one to resolve this. I'm currently exporting to the default Western European, but should I export tot UTF, Unicode, ASCII or something else?
When exporting from MS Access (or VB/VBA in general), make sure you're using vbCrLf constant (Carriage Return plus Line Feed) for line breaks. That constant corresponds to HEX values 0D 0A.
In Windows, it is a convention to use the above 2 characters together as line breaks, while in many other platforms, such as Unix/Linux/MacOS/etc. typically just 0A is used.
That brings up an issue: Notepad, the standard Windows text file viewer, cannot deal with 0A alone and does not treat such symbols as line breaks. More advanced editors, such as Notepad++ or UltraEdit, display such files correctly, though.
The CSV export function in Microsoft Office applications (Excel, Access) terminate a data row with CR+LF and write for a line break within a data value (multi-line string) just LF into the file. (I think just CR was written into the CSV file for a line break in older versions of Office before Office 2007.)
Most text editors detect those LF without CR (respectively CR without LF) and convert them to CR+LF on loading the CSV file which results on viewing of the CSV file in text editor in supposed wrong CSV lines as number of data values is not correct on data rows with data values containing a line break.
However, newline characters within a double quoted value in a CSV file are correct according to CSV specification as described in Wikipedia article about Comma-separated values.
But most applications with support on import from CSV file do not support CSV files with newline characters within a double quoted value and therefore some data values are imported wrong. Also regular expression replaces can't be done on a CSV file with newline characters within a data value because the number of separator character is not constant on all lines.
UltraEdit has for editing such CSV files with only LF (or CR) for a line break within a data value a special configuration setting. At Advanced - Configuration - File Handling - DOS/Unix/Mac Handling the option Never prompt to convert files to DOS format or Prompt to convert if file is not DOS format with clicking on button No if this prompt is displayed must be selected and additionally Only recognize DOS terminated lines (CR/LF) as new lines for editing must be enabled.
The CSV file with CR+LF for end of data row and only LF (or CR) for a line-break within a data value is loaded with those settings in UltraEdit with number of lines equal the number of data rows. And the line-feeds without carriage return (respectively the carriage returns without line-feed) in the CSV file are displayed as character in the lines with a small rectangle as no font has a glyph for a carriage return or line-feed defined because they are whitespace characters with no width. A Perl regular expression find searching for \r(?!\n)|\n(?<!\r) can be used now to find those line breaks within data values and replace them with something different like a space character or remove them.
Which character encoding (ASCII, ANSI, Unicode (UTF-16), UTF-8) to use on export depends on which characters can exist in string values. A Unicode encoding is necessary if string values can have also characters not included in local code page.

curious about what CR and LF means

I know that file line separaters are very different under certain operating systems, for windows it's CRLF, under linux is LF, and under MacOS is CR. But who on earth named those ascii characters? Are those (LF and CR etc.) abbreviation or something else? And dose every ascii character have a name like this?
CR stands for Carriage Return, LF stands for Line Feed. These names come from the age of typewriters. In order to start writing on the next line, you would push your carriage (the moving part of the typewriter) all the way back to the left, then engage the feed lever to pull the paper one line up.
And yes, other "control characters" have names like those too. See here: http://en.wikipedia.org/wiki/ASCII#ASCII_control_characters
CR stands for "carriage return", which means the returning of the typewriters head to the start of the line. LF is for "line feed" which advances the sheet of paper in the typewriter to the next line.
In most typewriters, CR and LF could be triggered by a single mechanism, but sometimes you also had an additional line feed key to quickly advance to the next line without moving the head (useful for formulars). And you could also omit the LF action on CR in order to write to a given line more than once.

How to use '^#' in Vim scripts?

I'm trying to work around a problem with using ^# (i.e., <ctrl-#>) characters in Vim scripts. I can insert them into a script, but when the script runs it seems the line is truncated at the point where a ^# was located.
My kludgy solution so far is to have a ^# stored in a variable, then reference the variable in the script whenever I would have quoted a literal ^#. Can someone tell me what's going on here? Is there a better way around this problem?
That is one reason why I never use raw special character values in scripts. While ^# does not work, string <C-#> in mappings works as expected, so you may use one of
nnoremap <C-#> {rhs}
nnoremap <Nul> {rhs}
It is strange, but you cannot use <Char-0x0> here. Some notes about null byte in strings:
Inserting null byte into string truncates it: vim uses old C-style strigs that end with null byte, thus it cannot appear in strings. These strings are very inefficient, so if you want to generate a very large text, try accumulating it into a list of lines (using setline is very fast as buffer is represented as a list of lines).
Most functions that return list of strings (like readfile, getline(start, end)) or take list of strings (like writefile, setline, append) treat \n (NL) as Null. It is also the internal representation of buffer lines, see :h NL-used-for-Nul.
If you try to insert \n character into the command-line, you will get Null shown (but this is really a newline). If you want to edit a file that has \n in a filename (it is possible on *nix), you will need to prepend newline with backslash.
The byte ctrl-# is also known as '\0'. Many languages, programs, etc. use it as an "end of string" marker, so it's not surprising that vim gets confused there. If you must use this byte in the middle of a script string, it sounds like your workaround is a decent one.