Papa Parse with backslash escaping - papaparse

I have input that people will probably say is "not really CSV", but I still have to parse it (using Papa Parse).
Comma is the delimiter and backslash is the escape character. Comma, double quote, backslash, r and n (to denote newlines) can all be escaped. There is no "quoting" of strings.
So I see data like:
this is one\, field,1/2\" bolt,this is text with \\ and a new line \r\n embedded
and I want:
[0] this is one\, field
[1] 1/2\" bolt
[2] this is text with \\ and a new line \r\n embedded
But I'm getting:
[0] this is one\
[1] field
[2] 1/2\" bolt
...
I can deal with the other \x sequences in post-processing; I'd just like Papa Parse to handle \, correctly.
I've tried the obvious values of quoteChar and escapeChar with no luck.
oh... and the Donate link is broken on https://www.papaparse.com/ if Matt Holt is listening.
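Papa Parse is not involved here. This is a language-neutral sketch in Python that shows the splitting rule the question describes: a backslash protects the next character, and only an unprotected comma ends a field. It assumes one record per line (safe because real newlines inside fields arrive as the two characters \r\n) and keeps every escape pair verbatim, which gives the exact three fields the question wants. The function name split_backslash_escaped is hypothetical.

```python
def split_backslash_escaped(line: str, delimiter: str = ",", escape: str = "\\") -> list[str]:
    """Split on delimiters that are not preceded by the escape character.

    Escape pairs (backslash plus the next character) are kept verbatim,
    so decoding them is left to a later post-processing step.
    """
    fields: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Copy the escape character and the character it protects as a pair.
            current.append(line[i : i + 2])
            i += 2
            continue
        if ch == delimiter:
            # Unescaped delimiter: close the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))  # the last field has no trailing delimiter
    return fields


sample = r'this is one\, field,1/2\" bolt,this is text with \\ and a new line \r\n embedded'
for n, field in enumerate(split_backslash_escaped(sample)):
    print(f"[{n}] {field}")
```

A trailing lone backslash at the end of a line is simply kept as an ordinary character in this sketch; real input may need a stricter rule.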

const parsed = window.Papa.parse(csvText, {
escapeChar: '\\',
});
It seems the default escape character is ", but it can be overridden in the parameters.
Update: now that I look at it, this does not seem to work for your case. It only fixed an issue I had where 2.5\","Shell was treated as a single value because " was interpreted as the escape character for ,.
I'm starting to get the feeling that the only way to escape a comma is to enclose the field in quotes.
Hope someone will post the right answer eventually...
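If quoting really is the only way to get an embedded comma past the parser, one workaround is to convert the input to RFC 4180-style CSV first and then parse that with default settings (assuming Papa Parse's default quoteChar of " handles doubled quotes, which is its documented default behaviour). A minimal Python sketch of that conversion, assuming the five escapes listed in the question are the only ones; decode_backslash_line and to_rfc4180 are hypothetical names, and unknown escapes are deliberately left as-is.

```python
import csv
import io

# Escapes listed in the question; anything else is kept verbatim (assumption).
ESCAPES = {",": ",", '"': '"', "\\": "\\", "r": "\r", "n": "\n"}


def decode_backslash_line(line: str) -> list[str]:
    """Split on unescaped commas and decode the escapes in a single pass."""
    fields: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            # Known escapes become their literal character; unknown ones stay verbatim.
            current.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
            continue
        if ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def to_rfc4180(rows: list[list[str]]) -> str:
    """Re-emit rows as CSV; fields containing commas, quotes or line breaks get quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # default dialect: QUOTE_MINIMAL, doubled quotes, CRLF
    return buf.getvalue()


line = r'this is one\, field,1/2\" bolt,text with \\ and\r\nbreak'
print(repr(to_rfc4180([decode_backslash_line(line)])))
```

Note that this changes the field contents (escapes are decoded), so it only fits if the decoded values are what you eventually want.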

Related

CSV specification - double quotes at the start and end of fields

Question (because I can't work it out): should ""hello world"" be a valid field value in a CSV file according to the specification?
i.e., should:
1,""hello world"",9.5
be a valid CSV record?
(If so, then the Perl CSV-XS parser I'm using is mildly broken, but if not, then $line =~ s/\342\200\234/""/g; is a really bad idea ;) )
The weird thing is that this code has been running without issue for years, but we've only just hit a record that started with a left double quote and contained no comma (the above is from a CSV pre-parser).
The canonical format definition of CSV is https://www.rfc-editor.org/rfc/rfc4180.txt. It says:
Each field may or may not be enclosed in double quotes (however
some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
Last rule means your line should have been:
1,"""hello world""",9.5
But not all parsers/generators follow this standard perfectly, so you might need for interoperability reasons to relax some rules. It all depends on how much you control the CSV format writing and CSV format parsing parts.
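To see the quote-doubling rule in action, here is a short sketch using Python's standard csv module (chosen only for illustration; it is not the Perl parser from the question). Its default dialect follows the RFC 4180 convention: a field containing a double quote is enclosed in quotes and each inner quote is doubled.

```python
import csv
import io

# Write a field that itself contains double quotes using the default dialect.
buf = io.StringIO()
csv.writer(buf).writerow(["1", '"hello world"', "9.5"])
written = buf.getvalue()
print(repr(written))  # '1,"""hello world""",9.5\r\n'

# Reading it back recovers the original quotes.
row = next(csv.reader(io.StringIO(written)))
print(row)  # ['1', '"hello world"', '9.5']
```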
That depends on the escape character you use. If your escape character is '"' (double quote), your line should look like:
1,"""hello world""",9.5
If your escape character is '\' (backslash), your line should look like:
1,"\"hello world\"",9.5
Check your parser/environment defaults, or explicitly configure your parser with the escape character you need. For example, to use backslash:
my $csv = Text::CSV_XS->new ({ quote_char => '"', escape_char => "\\" });
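For comparison, Python's csv module (used here as an illustration; the question itself uses Text::CSV_XS) exposes a similar switch: escapechar set to a backslash together with doublequote=False makes the reader accept the backslash-escaped form of the same record.

```python
import csv
import io

line = r'1,"\"hello world\"",9.5'  # backslash-escaped quotes inside a quoted field

# Backslash escapes the quote; doubled quotes are no longer the escape mechanism.
reader = csv.reader(io.StringIO(line), quotechar='"', escapechar="\\", doublequote=False)
row = next(reader)
print(row)  # ['1', '"hello world"', '9.5']
```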

PostgreSQL regexp_replace with matched expression

I am using the PostgreSQL regexp_replace function to escape square brackets, parentheses and backslashes in a string so that I can use that string as a regex pattern itself (other manipulations are applied to the string before it is used, but they are outside the scope of this question). The idea is to replace:
[ with \[
] with \]
( with \(
) with \)
\ with \\
Postgres documentation page on regular expressions states the following:
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text.
However regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\1', 'g'); produces abc \ def\.
Further down on that same page, an example is given, which uses \\1 notation - so I tried that.
Yet, regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\\1', 'g'); produces abc \1def\1.
I would guess this is expected, but regexp_replace('abc [def]', '([\[\]\(\)\\])', E'.\\1', 'g'); produces abc .[def.]. That is, escaping works with characters other than the standard backslash.
At this point I don't know how to proceed. What can I do to actually give me the replacement I want?
OK, found the answer. Apparently, I need to double-escape the backslash in the replacement. Also, I need to E-prefix and double-escape backslashes in the search pattern on older versions of postgres (8.3 in my case). The final code looks like this:
regexp_replace('abc [def]', E'([\\[\\]\\(\\)\\\\\?\\|_%])', E'\\\\\\1', 'g')
Yes, it looks horrible, but it works :)
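This is the simplest way (it relies on standard_conforming_strings being on, the default since PostgreSQL 9.1, so that '\\\1' reaches regexp_replace unchanged):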
select regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g')
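The same two layers of escaping show up outside PostgreSQL. As a Python analogue (this is not PostgreSQL syntax), re.sub also reads \\ in the replacement as a literal backslash and \1 as the first captured group; writing the replacement as a raw string removes the string-literal layer, much as standard_conforming_strings does for plain '...' literals in PostgreSQL.

```python
import re

# Character class: [ ] ( ) and backslash, captured as group 1.
pattern = r"([\[\]()\\])"
# In the replacement template, \\ is a literal backslash and \1 is the captured character.
replacement = r"\\\1"

print(re.sub(pattern, replacement, "abc [def]"))  # abc \[def\]
```

Python also has re.escape, but it escapes more characters than the five listed in the question, so the explicit class above is closer to the original intent.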

Set String with quotes to label in objective C programming?

trackLbl2.text = @""Nonstop Bollywood Music"";
I tried this, but it's not working. How do I set text that contains double quotes?
Try this:
trackLbl2.text = @"\"Nonstop Bollywood Music\"";
You have to put \ before any special character to print it out.
The user before me provided a good answer, but I would like to elaborate on your problem a little bit. It's a good thing to know.
In most (all?) programming languages, certain characters serve a special function: single quotes, double quotes, backslashes, and the like. If you want to include those inside a string, the common practice is to "escape" them, i.e. use escape sequences. Escape sequences are special character combinations that get replaced with the actual character you want to see inside the string; they usually start with a backslash (\)*.
Here is a list of common escape characters used in Objective C (source: Wikipedia)
\a - Sound alert
\b - Backspace
\f - Form feed
\n - New line
\r - Carriage return
\t - Horizontal tab
\v - Vertical tab
\\ - Backslash
\" - Double quote (used when placing a double quote into a string declaration)
\' - Single quote (used when placing a single quote into a character literal)
*Fun fact: I actually had to escape that backslash in there by typing "\\".
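Objective-C inherits these escape sequences from C, and Python happens to use the same ones, so here is a small Python sketch (not Objective-C code) showing that each two-character sequence in source code becomes a single character in the string, including the escaped quotes from the question.

```python
# Map each escape as written in source code to the character it produces.
escapes = {
    r"\a": "\a",  # alert (BEL)
    r"\b": "\b",  # backspace
    r"\f": "\f",  # form feed
    r"\n": "\n",  # new line
    r"\r": "\r",  # carriage return
    r"\t": "\t",  # horizontal tab
    r"\v": "\v",  # vertical tab
    r"\\": "\\",  # backslash
    r"\"": "\"",  # double quote
    r"\'": "\'",  # single quote
}
for source, char in escapes.items():
    print(f"{source} -> one character, code point {ord(char)}")

label_text = "\"Nonstop Bollywood Music\""
print(label_text)  # "Nonstop Bollywood Music"
```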

Perl string sub

I want to replace something with a path like C:\foo, so I tried:
s/hello/c:\foo
But that is invalid.
Do I need to escape some chars?
Two problems that I can see.
Your first problem is that your s/// replacement is not terminated:
s/hello/c:\foo # fatal syntax error: "Substitution replacement not terminated"
s/hello/c:\foo/ # syntactically okay
s!hello!c:\foo! # also okay, and more readable with backslashes (IMHO)
Your second problem, the one you asked about, is that the \f is taken as a form feed escape sequence (ASCII 0x0C), just as it would be in double quotes, which is not what you want.
You may either escape the backslash, or let variable interpolation "hide" the problem:
s!hello!c:\\foo! # This will do what you want. Note double backslash.
my $replacement = 'c:\foo'; # N.B.: Using single quotes here, not double quotes
s!hello!$replacement!; # This also works
Take a look at the treatment of Quote and Quote-like Operators in perlop for more information.
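Python has the same trap, which makes a handy cross-check. In this sketch (Python, not Perl), "\f" in an ordinary string literal is also a form feed, and re.sub processes escapes in its replacement template much as Perl processes the replacement of s///. The two fixes mirror the ones above: double the backslash, or supply the replacement from somewhere that is not escape-processed.

```python
import re

text = "hello/there"

# Pitfall: in an ordinary string literal "\f" is a form feed (0x0C), just as in Perl.
assert "c:\foo" == "c:" + "\x0c" + "oo"

# Fix 1: double the backslash. re.sub processes escapes in the replacement
# template, so the raw string r"c:\\foo" yields a single literal backslash.
print(re.sub("hello", r"c:\\foo", text))  # c:\foo/there

# Fix 2: return the replacement from a function; its result is inserted as-is,
# with no escape processing (similar to Perl interpolating $replacement).
replacement = r"c:\foo"
print(re.sub("hello", lambda m: replacement, text))  # c:\foo/there
```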
If I understand what you're asking, then this might be something like what you're after:
$path = "hello/there";
$path =~ s/hello/c:\\foo/;
print "$path\n";
To answer your question, yes you do need to double the backslash because \f is an escape sequence for "form feed" in a Perl string.
The problem is that you are not escaping special characters:
s/hello/c:\\foo/;
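would solve your problem. \ is a special character, so you need to escape it. In the pattern, {}[]()^$.|*+?\ are metacharacters that need escaping; the replacement is interpolated like a double-quoted string, so there the characters to watch are \, $ and @.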
Additional reference: http://perldoc.perl.org/perlretut.html

What's the difference between single and double quotes in Perl?

I am just beginning to learn Perl. I looked at the Beginning Perl page and started working.
It says:
The difference between single quotes and double quotes is that single quotes mean that their contents should be taken literally, while double quotes mean that their contents should be interpreted
When I run this program:
#!/usr/local/bin/perl
print "This string \n shows up on two lines.";
print 'This string \n shows up on only one.';
It outputs:
This string
shows up on two lines.
This string
shows up on only one.
Am I wrong somewhere?
Here is my Perl version:
perl -v
This is perl, v5.8.5 built for aix
Copyright 1987-2004, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
I am inclined to say something is up with your shell/terminal, and whatever you are outputting to is interpreting the \n as a newline and that the problem is not with Perl.
To confirm: This Shouldn't Happen(TM) - in the first case I would expect to see a new line inserted, but with single quotes it ought to output literally the characters \n and not a new line.
In Perl, single-quoted strings do not expand backslash-escapes like \n or \t. The reason you're seeing them expanded is probably due to the nature of the shell that you're using, which is munging your output for some reason.
Everything you need to know about quoting and quote-like operators is in perlop.
To answer your specific question, double-quotes can turn certain sequences of literal characters into other characters. In your example, the double quotes turn the sequence of characters \ and n into the single character that represents a newline. In a single quoted string, that same literal sequence is just the literal \ and n characters.
By "interpreted", they mean that variable names and such will not be printed, but their values instead. \n is an escape sequence, so I'd think it would not be interpreted.
In addition to your O'Reilly link, a reference no less authoritative than the 'Programming Perl' book by Larry Wall states that backslash interpolation does not occur in single-quoted strings.
... much like Unix shell quotes: double quoted string literals are subject to
backslash and variable interpolation; single quoted strings are not
(except for \' and \\, so that you may ...)
Programming Perl, 2nd ed., 1996, page 16
So it would be interesting to see what your Perl does with
print 'Double backslash n: \\n';
As above, please show us the output from 'perl -v'.
If you use double quotes, \n is interpreted as a newline.
But if you use single quotes, \n is not interpreted as a newline.
For me it is working correctly.
File content:
print "This string \n shows up on two lines.";
print 'This string \n shows up on only one.'