Em dash ("—") gets replaced by characters â€” upon update in WinCvs


In WinCvs, when I update a file (in my case a .sql file), the em dash (—) gets replaced by the characters â€”. How can I get rid of this?
I am using WinCvs version 2.1.1.1 (Build 1).
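This thread has no answer to the WinCvs question itself. As background, the garbled text is consistent with a classic encoding mismatch: U+2014 EM DASH is stored in UTF-8 as the three bytes E2 80 94, and those bytes, read as Windows-1252, display as â, € and ”. The Python sketch below reproduces and reverses that mismatch. It says nothing about WinCvs settings, and whether this is what happens during your update is an assumption; the helper names are hypothetical.

```python
# Sketch: reproduce the "â€”" garbling and undo it.
# Assumption: the file is stored as UTF-8 but later read as Windows-1252.
EM_DASH = "\u2014"

def mojibake(text):
    """Encode as UTF-8, then (wrongly) decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

def repair(text):
    """Reverse the mistake: re-encode as Windows-1252, decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")

garbled = mojibake(EM_DASH)
print(EM_DASH.encode("utf-8"))                 # b'\xe2\x80\x94'
print([f"U+{ord(c):04X}" for c in garbled])    # ['U+00E2', 'U+20AC', 'U+201D']
print(repair(garbled) == EM_DASH)              # True
```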

Related

Last string appears at beginning of line in formatted output

Does anyone have any idea why the following would format itself in a weird way? In several years I've had no problem creating simple text output, but this problem has me baffled.
I'm using the line
print "$BC,$Ttl,$FN,$SN,$Finalage,$OurLoc,$OurDT,$FinalPC\n";
Every value is a simple text string on which I've run "chomp" to remove return characters.
I would expect the output to look like the following:
*DD10099999,,Information Services,Guest Ticket 2,41,C G,03/11/2020,NE8 9BB*
$BC is the first item and $FinalPC is the postcode at the end.
Instead I get:
*,NE8 9BB99, ,Information Services,Guest Ticket 2,41,C G,03/11/2020*
The final item has somehow moved to the beginning of the line and overwritten the first item. This is happening consistently on every line of my screen and text file output and I'm completely stumped as to why. The data is read from a text file and compared with database output which is also simple text. There are no occurrences of \b anywhere in my code. Why would a backspace character get into it?

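The string in $OurDT ends with a carriage return, which moves the cursor back to the start of the line, so whatever is printed after it overwrites the beginning of the line. Presumably, the value of $OurDT came from a Windows file (CRLF line endings) read on a Unix-like machine, where chomp removes only the LF.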
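To see why the postcode lands at the start of the line, here is a small Python sketch (not Perl, and a simplification of what a real terminal does) that replays a string the way a terminal draws it: a bare CR moves the cursor back to column 0, and the characters that follow overwrite what is already there. The helper render_terminal_line is hypothetical; the field values come from the question, with the assumed stray CR placed at the end of $OurDT.

```python
def render_terminal_line(s):
    """Approximate how a terminal draws one line containing bare CRs.

    A CR moves the cursor back to column 0; later characters overwrite
    whatever is already on screen at that position.
    """
    screen = []
    col = 0
    for ch in s:
        if ch == "\r":
            col = 0
            continue
        if col < len(screen):
            screen[col] = ch      # overwrite an existing character
        else:
            screen.append(ch)     # extend the line
        col += 1
    return "".join(screen)

# Values from the question, assuming $OurDT kept a trailing CR from a CRLF file.
fields = ["DD10099999", "", "Information Services", "Guest Ticket 2",
          "41", "C G", "03/11/2020\r", "NE8 9BB"]
line = ",".join(fields)
print(render_terminal_line(line))
# ,NE8 9BB99,,Information Services,Guest Ticket 2,41,C G,03/11/2020
```

The eight characters ",NE8 9BB" replace the first eight characters of "DD10099999", which is why "99" from the barcode is still visible.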
One option is to fix the file (e.g. by using the dos2unix utility).
Another is to accept both CRLF and LF as line endings (e.g. by using s/\s+\z// instead of chomp).
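In Python terms, the second option (trimming all trailing whitespace rather than only a trailing newline) could look like the sketch below; the function name chomp_any is hypothetical. It uses re.sub with \s+\Z because Python's \Z, like Perl's \z, matches only at the very end of the string. Like the Perl substitution, it also removes trailing spaces and tabs.

```python
import re

def chomp_any(value):
    r"""Strip all trailing whitespace, so both "\n" and "\r\n" endings go away.

    Python counterpart of Perl's s/\s+\z//.
    """
    return re.sub(r"\s+\Z", "", value)

for raw in ["03/11/2020\n", "03/11/2020\r\n", "03/11/2020"]:
    print(repr(chomp_any(raw)))  # '03/11/2020' each time
```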

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape

I'm using AutoKey 0.95.8 (Python 3 version) on Linux Mint 19.3, and I have a series of keyboard macros that generate Unicode characters. This example works:
# alt+shift+a = á
import sys
char = "\u00E1"
keyboard.send_keys(char)
sys.exit()
But my attempt to type an em dash (—) generates the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape
# alt+shift+- = —
import sys
char = "\u2014"
keyboard.send_keys(char)
sys.exit()
Any idea how to overcome this problem in Autokey is greatly appreciated.

The code you posted above would not generate the error you are getting. "truncated \UXXXXXXXX" requires an uppercase \U, which must be followed by 8 hex digits: if you put char = "\U2014" in the Python source, you will get that error message (and you probably got it while experimenting with the file in this way).
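A quick way to check this is to compile both spellings. The sketch below uses only the standard library and does not involve AutoKey; escape_error is a hypothetical helper, and the "<snippet>" filename passed to compile() is arbitrary.

```python
import unicodedata

# \u takes exactly 4 hex digits; \U takes exactly 8.
SHORT_FORM = "\u2014"
LONG_FORM = "\U00002014"

def escape_error(source):
    """Compile a snippet of Python source; return the SyntaxError message, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None

print(SHORT_FORM == LONG_FORM, unicodedata.name(SHORT_FORM))  # True EM DASH
print(escape_error('char = "\\u2014"'))  # None: four digits after lowercase \u is valid
print(escape_error('char = "\\U2014"'))  # the "truncated \UXXXXXXXX escape" message
```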
The statement char = "\u2014" creates an em dash Unicode character on the Python side, but that does not mean it is possible to send it as a keyboard symbol to the window via AutoKey. That is probably where your program fails, and since there is no programming error, you won't get a Python error message; it just won't work (although AutoKey might be nice enough to print an appropriate error message in this case).
You'd have to look up how to type an arbitrary Unicode character in your OS configuration (on Linux Mint, I guess it would be in the docs for "wayland") and send that character-composing sequence to AutoKey instead. If there is no such sequence, then find a way to copy the desired character to the windowing environment's clipboard, and then send AutoKey the "paste" sequence (usually Ctrl+V, but it can differ by application; terminal emulators use Ctrl+Shift+V, for example).

When you need to emit characters outside US English in AutoKey, you have two choices. The simplest is to put them into the clipboard with clipboard.fill_clipboard(your characters) and paste them into the window using keyboard.send_keys("<ctrl>+v"). This almost always works.
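Packaged as a reusable function, the clipboard approach might look like the sketch below. clipboard.fill_clipboard() and keyboard.send_keys() are the AutoKey calls named above; the helper name paste_text, the short delay, the "<ctrl>+<shift>+v" spelling for terminals, and passing the two AutoKey objects in as parameters (so the function can be exercised outside AutoKey) are illustrative assumptions.

```python
import time

def paste_text(text, clipboard, keyboard, paste_keys="<ctrl>+v", settle=0.1):
    """Insert `text` into the active window by pasting it.

    `clipboard` and `keyboard` are the objects AutoKey provides to scripts.
    The short sleep (an assumption, not an AutoKey requirement) gives the
    clipboard a moment to update before the paste shortcut is sent.
    """
    clipboard.fill_clipboard(text)
    time.sleep(settle)
    keyboard.send_keys(paste_keys)

# In an AutoKey script bound to alt+shift+-:
# paste_text("\u2014", clipboard, keyboard)
# Terminal emulators usually paste with Ctrl+Shift+V (assumed key syntax):
# paste_text("\u2014", clipboard, keyboard, paste_keys="<ctrl>+<shift>+v")
```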
If you need to define a phrase with multibyte characters in it, select the Paste using Clipboard (Ctrl+V) option. (I'm trying to get that to be the default option in a future release.)
The other choice, which I'm still not quite sure of, is to send the Unicode escape sequence directly to the window and let it convert that into the actual Unicode character: something like keyboard.send_keys(r"\u2014"), using a raw string so that the escape text itself is sent. A plain "\u2014" literal, as in the question, creates the actual Unicode character, which that API call can't handle correctly.
The problem is that the underlying code for keyboard.send_keys() wants to send keycodes that actually exist on your keyboard, or that it can assign to an unused key in your layout. Most of the time that doesn't work for anything multibyte.

defining escape character for a csv import

I have a source file that has text columns which end with a "\" and I have specified "^" as the column delimiter.
My file format specifies ESCAPE = 'NONE', but rows containing "\^" cause premature end-of-line errors. I assume Snowflake (SF) is not interpreting "\^" as a column delimiter, so the column count is off.
I have changed the file format to use something else for ESCAPE but get the same message. The offending rows have the right number of columns, and a text column containing a "\" that is not the last character in the column imports correctly.
The values are exported from SQL Server.
Is this an escape character problem or am I overlooking something else? I am new to SF.
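The symptom matches a backslash being treated as an escape character for unenclosed fields, so that "\^" becomes a literal "^" instead of ending the column. The sketch below reproduces that effect with Python's csv module, as an analogy rather than Snowflake's own parser; the sample row and the parse helper are hypothetical. In Snowflake itself, it may be worth checking the file format's ESCAPE_UNENCLOSED_FIELD option, which (as I understand the documentation) defaults to a backslash and governs unenclosed fields, whereas ESCAPE applies only to enclosed ones.

```python
import csv
import io

# Hypothetical row: three "^"-delimited columns; the first one ends in "\".
ROW = "foo\\^bar^baz\n"

def parse(text, escapechar):
    """Parse one "^"-delimited line with the given escape character (or None)."""
    reader = csv.reader(io.StringIO(text), delimiter="^",
                        quoting=csv.QUOTE_NONE, escapechar=escapechar)
    return next(reader)

print(parse(ROW, "\\"))  # ['foo^bar', 'baz']: "\^" swallowed the delimiter
print(parse(ROW, None))  # ['foo\\', 'bar', 'baz']: three columns, as intended
```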

I was seeing this same issue. No matter what I used as an escape character, when it showed up in my file next to a " at the end of a string, it started causing trouble.
I switched my delimiter to \u0001, the "Start of Heading" (SOH) control character, which very rarely shows up in data, especially at the end of strings.
I wouldn't say this was an ideal option for us, but it worked and is something you might want to try.
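If you control the export step, a minimal Python sketch of the same workaround is below: write the rows with U+0001 as the delimiter and no quoting or escaping, then read them back. The sample values are hypothetical; the point is that a trailing backslash no longer sits next to anything the parser treats as special. This only works as long as no field contains U+0001 or a line break.

```python
import csv
import io

SOH = "\x01"  # U+0001 START OF HEADING, used here as the column delimiter

rows = [["C:\\temp\\", "plain text"], ["ends with ^", "also \\"]]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=SOH, quoting=csv.QUOTE_NONE,
                    escapechar=None, lineterminator="\n")
writer.writerows(rows)

buffer.seek(0)
reader = csv.reader(buffer, delimiter=SOH, quoting=csv.QUOTE_NONE)
print(list(reader) == rows)  # True: backslashes and "^" survive unchanged
```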

Identifying hidden characters in text

I have an ETL process that regularly extracts code from an ODBC data source, manipulates it, and inserts it into my postgres database. One of the columns from this data source regularly has odd characters in it.
For the most part I can catch and convert all of the characters appropriately, but I have one character that exists in the ODBC data source, cannot be brought into postgres (all of the text after that character gets truncated), and I'm having a hard time identifying what the character is.
I can't even insert an example of the character directly into this post because it gets stripped out :/ The closest I can get is a screenshot of the character in TextMate (the only application in which I can actually see the character):
The character is the diamond between the 1 and 0. When my data comes in, everything after the 0 is truncated.
Is there a good way of identifying what this character is so I can figure out a way of stripping it out?

Per tripleee's comment on the original question post:
To identify the character, I looked at the hex values of the text to find the value of the offending character.
There are a number of ways to do this, but the quickest for me was to paste the text into a utility application I have called Hex Fiend. Once the text was in and I highlighted the character, it showed the hex value "00".
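The same check can be scripted instead of done in a hex editor. This Python sketch prints the UTF-8 bytes in hex, much as Hex Fiend shows them, and lists every non-printable character with its code point. The helper find_hidden_chars and the sample string are hypothetical; unicodedata has no name for most control characters, so the fallback label is used for them.

```python
import unicodedata

def find_hidden_chars(text):
    """List (index, code point, name) for each non-printable character in text."""
    hits = []
    for index, ch in enumerate(text):
        if not ch.isprintable():
            name = unicodedata.name(ch, "<control character>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits

# Hypothetical value resembling the one in the question: a NUL between "1" and "0".
value = "1\x000 and the rest"
print(value.encode("utf-8").hex(" "))
# 31 00 30 20 61 6e 64 20 74 68 65 20 72 65 73 74
print(find_hidden_chars(value))
# [(1, 'U+0000', '<control character>')]
```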
A bit more investigation pointed to the null character (hex 00) being used as a string terminator in C applications, which explains the truncation (and makes sense given the context of my project).
I added a step to my ETL process that replaces this null character with a newline, and now everything is sunshine and daisies.
Thanks again for the help!
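The cleanup step described in this answer can be as small as one replace call. This Python sketch mirrors the answer's choice of a newline as the replacement; the function name scrub_nul and the sample rows are hypothetical, and whether a newline, a space, or nothing is the right substitute depends on your data. PostgreSQL text values cannot contain the NUL character at all, so the replacement has to happen before the insert.

```python
def scrub_nul(value, replacement="\n"):
    """Replace every NUL (U+0000) character so the value can be stored in PostgreSQL."""
    return value.replace("\x00", replacement)

rows = [("A1", "1\x000 trailing text"), ("A2", "no hidden characters")]
clean_rows = [(key, scrub_nul(text)) for key, text in rows]
print(clean_rows)
# [('A1', '1\n0 trailing text'), ('A2', 'no hidden characters')]
```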

curious about what CR and LF means

I know that line separators differ between operating systems: Windows uses CRLF, Linux uses LF, and classic Mac OS used CR (modern macOS uses LF). But who on earth named those ASCII characters? Are those names (LF, CR, etc.) abbreviations or something else? And does every ASCII character have a name like this?

CR stands for Carriage Return, LF stands for Line Feed. These names come from the age of typewriters. In order to start writing on the next line, you would push your carriage (the moving part of the typewriter) all the way back to the left, then engage the feed lever to pull the paper one line up.
And yes, other "control characters" have names like those too. See here: http://en.wikipedia.org/wiki/ASCII#ASCII_control_characters

CR stands for "carriage return", which means returning the typewriter's head to the start of the line. LF is for "line feed", which advances the sheet of paper in the typewriter to the next line.
On most typewriters, CR and LF could be triggered by a single mechanism, but sometimes there was also a separate line feed key to quickly advance to the next line without moving the head (useful for forms). You could also omit the LF action on CR in order to type on a given line more than once.
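To make the names concrete: CR and LF are the ASCII control codes 13 (0x0D) and 10 (0x0A). The Python sketch below prints both and converts all three line-ending conventions from the question to LF; the sample text and the normalize_newlines helper are hypothetical.

```python
CR, LF = "\r", "\n"
print(ord(CR), hex(ord(CR)), ord(LF), hex(ord(LF)))  # 13 0xd 10 0xa

def normalize_newlines(text):
    """Convert CRLF (Windows) and lone CR (classic Mac OS) line endings to LF."""
    # Replace CRLF first so it becomes one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")

mixed = "windows\r\nunix\nclassic mac\r"
print(normalize_newlines(mixed).split("\n"))  # ['windows', 'unix', 'classic mac', '']
print(mixed.splitlines())                     # ['windows', 'unix', 'classic mac']
```

str.splitlines() already recognizes all three conventions, so normalizing first is only needed when the text must be written back out with a single line ending.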

We Keep Coding
