Defining the escape character for a CSV import

I have a source file that has text columns which end with a "\" and I have specified "^" as the column delimiter.
The file format for this load is specified with ESCAPE = 'NONE', but rows containing "\^" cause premature end-of-line errors. I assume SF is not interpreting the "\^" as a column delimiter, so the column count is off.
I have changed the file format to use something else for ESCAPE but get the same message. The offending rows have the right number of columns, and a text column containing a "\" that is not the last character in the column imports correctly.
The values are exported from SQL Server.
Is this an escape character problem or am I overlooking something else? I am new to SF.

I was seeing this same issue. No matter what I used as an escape character, when it showed up in my file next to a " at the end of a string it started causing trouble.
I switched my delimiter to \u0001, the "start of heading" (SOH) control character, which very rarely shows up in data, especially at the end of strings.
I wouldn't say this was an ideal option for us, but it worked and is something you might want to try.
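
For illustration, here is a minimal Python sketch of the same idea (it is not Snowflake-specific, and the rows and helper names are made up for the example). It writes fields that end in a backslash using U+0001 as the delimiter, with no quoting and no escape character, and reads them back unchanged.

```python
import csv
import io

SOH = "\x01"  # U+0001, the "start of heading" control character

# Hypothetical sample rows: the second field of row 1 ends with a backslash.
rows = [
    ["1", "C:\\temp\\", "ok"],
    ["2", "plain", "a^b"],  # a caret in the data is harmless with this delimiter
]

def write_soh(rows):
    """Serialize rows with SOH as the delimiter, no quoting, no escape char."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=SOH, quoting=csv.QUOTE_NONE,
                        lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def read_soh(text):
    """Parse text produced by write_soh back into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter=SOH,
                        quoting=csv.QUOTE_NONE)
    return list(reader)

text = write_soh(rows)
round_trip = read_soh(text)
```

The trade-off is the one mentioned above: the file is no longer pleasant to read in a text editor, and the load will still break if the data ever contains U+0001.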

Trying to work around the error DF-CSVWriter-InvalidEscapeSetting

So I have a dataset which I want to export to csv with pipe as separator and no escape character.
That dataset contains in fact 4 source columns, 3 regular ones (just text) and one variable one.
That last column holds another subset of values that are also separated with a pipe.
The goal is for the export to look like this, where the VAL entries come from my 4th field:
COL1|COL2|COL3|VAL1|VAL2|VAL3|....
The number of values can differ from record to record.
When I set the csv export separator to ";", I get this result which is expected
COL1;COL2;COL3;VAL1|VAL2|VAL3|....
However setting it to "|", it throws the error DF-CSVWriter-InvalidEscapeSetting.
Most likely it detects the separator character in my 4th field and then insists that an escape character be set.
That is logical in most cases, but in my case I would like it to ignore this and just export the data as-is.
Any way how I can work around this, perhaps with a different approach or some additional settings?
Split & flatten produces extra rows but that's not what I want.
Regards,
Sven Peeters
Because your column value contains the same character as your delimiter, and no escape character is set, the export throws an error.
You have to either change the delimiter to a different character, or set both the Quote character and the Escape character to a double quote (").
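
The same rule can be seen outside the data flow. The following is a small Python sketch using the standard csv module, not the data flow's own writer; the row and the helper name write_row are invented for the example. Without quoting or an escape character the writer refuses a field that contains the delimiter, and with a quote character it wraps that field instead.

```python
import csv
import io

# Hypothetical record: the 4th field already contains the pipe delimiter.
row = ["COL1", "COL2", "COL3", "VAL1|VAL2|VAL3"]

def write_row(row, **fmt):
    """Write one pipe-delimited row with the given csv formatting options."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="|", lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

# No quoting and no escape character: Python's writer raises csv.Error,
# which is comparable in spirit to the error reported in the question.
try:
    write_row(row, quoting=csv.QUOTE_NONE)
    refused = False
except csv.Error:
    refused = True

# With a quote character, only the field containing the delimiter is quoted.
quoted = write_row(row, quoting=csv.QUOTE_MINIMAL, quotechar='"')
```

Note that the quoted form keeps the 4th field as a single column; it does not produce the flattened COL1|COL2|COL3|VAL1|VAL2|VAL3 layout asked for in the question.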

Trouble rendering CSV data as an interactive table in GitHub

When viewed, any .csv file committed to a GitHub repository automatically renders as an interactive table, complete with headers and row numbering. By default, the first row is your header row. The tables were supposed to look nice as below:
However, there's an error happening in my tabular data, and despite indicating the error, I can't fix it:
I'm using a .csv file with a semicolon separator. Does anyone have an idea of what's happening?
According to the docs, GitHub can only render .csv (comma-separated) and .tsv (tab-separated) files as tables.
Using a semicolon as a separator isn't supported, at least not officially, and a spurious comma in a semicolon-separated file could well throw the algorithm off.
You could try replacing all semicolons with tabs and see how you fare.
If that doesn't work, try using commas as separators and enclose all text table cell data with quotes, like:
"Liver fibrosis, sclerosis, and cirrhosis","c370800","102922","Cystic fibrosis related cirrhosis","Diagnosis of liver fibrosis, sclerosis, and cirrhosis"
Note: no spaces after the commas. Also, if you have quotes in the text fields, you will have to escape those to "" (two quotes), or the algorithm will get confused.
You may get away with using quotes only for the offending text data, but that could well be more difficult to generate than just putting the quotes around all fields.
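
If you go the comma route, a conversion along these lines may help. This is a rough Python sketch using the standard csv module; the function name and the sample text are made up, and it assumes the semicolon file is well-formed. It re-writes every field in quotes, without spaces after the commas, and doubles any embedded quotes.

```python
import csv
import io

def semicolon_to_comma(text):
    """Re-emit semicolon-separated text as comma-separated, quoting every field."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical input: one field contains commas, another contains quotes.
sample = (
    "name;code;note\n"
    'Liver fibrosis, sclerosis, and cirrhosis;c370800;say "hi"\n'
)
converted = semicolon_to_comma(sample)
```

Quoting every field is the simpler choice here because the writer does not have to decide which cells are risky.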

Identify hidden control character and ignore when scanning csv file

I am trying to use textscan in MATLAB to read in mixed format data from a .csv file. I am currently running into a problem that there are a number of nonvisible characters which are getting read in as a string when I am not expecting them. I believe if I set this character as a delimiter or whitespace it will solve my text scanning issue.
My main problem at the moment is that I don't know which character it is. I have used isstrprop to determine that it is a control character. I guessed that it was the NUL character, so I tried adding \0 to the delimiter set for textscan. Unfortunately MATLAB does not recognize that as a valid escape sequence.
Below is one line of the data file, copied from Notepad. The characters preceding each of the commas are the ones in question. The following line is the command I used in MATLAB to read it.
1 ,T,171215,173201,21.982413N,159.342881W,150 ,0 ,0 ,3D,SPS ,2.7 ,2.5 ,1.0 ,
C = textscan(fid,'%d%s%d%d%s%s%d%d%d%s%s%f%f%f%s','delimiter',',','headerlines',1,'MultipleDelimsAsOne',1)
Also, for what it's worth, using deblank on the string of characters that is read in does remove them. However, I only know how to apply this after the textscan, so the characters still throw off the parsing.
How can I identify this character and set it to be ignored by textscan?
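
One way to find out which character it is, sketched here in Python rather than MATLAB, is to report the code point of every control character in a line. The sample string below uses NUL only as a stand-in, since the actual character in the file is not known, and the function names are invented for the example.

```python
import unicodedata

def find_control_chars(line):
    """Return (index, code point) for every control character in line."""
    hits = []
    for i, ch in enumerate(line):
        # Unicode category "Cc" covers the C0/C1 control characters.
        if unicodedata.category(ch) == "Cc":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits

def strip_control_chars(line):
    """Drop every control character from line."""
    return "".join(ch for ch in line if unicodedata.category(ch) != "Cc")

# Assumed sample: NUL (U+0000) placed before some of the commas.
sample = "1\x00,T,150\x00,0\x00,"
hits = find_control_chars(sample)
cleaned = strip_control_chars(sample)
```

Once the code point is known, the character can be removed from the file, or handled in the reader, before parsing.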

Postgres import a double quote value

I have a large .csv file with 9 million rows. Some of the columns contain text with quotes or other special characters in them, and I would like to import this .csv file into the database. For example, I would like to import this row:
ID BH Units Name Type_building Year_cons
1 4 900.00 schoolgebouw "De Bolster Schoolgebouw 2014-01-01
As you can see there is a double quote in the fourth column. None of the values in the .csv file are quoted, but sometimes a double quote or backslash '\' appears in the text. When I try to upload the data using:
\COPY <tablename> FROM <path to file> WITH CSV DELIMITER ';' NULL '\N';
It gives an error message: ERROR: value too long for type character varying(25).
Apparently it sees the double quote as the start of a string and tries to combine everything after it in the .csv file (including the fifth and sixth columns) into a single cell (so that cell will contain 'De Bolster Schoolgebouw 2014-01-01'), which doesn't fit because the 'Name' column allows at most 25 characters.
I found a similar topic (Is it possible to turn off quote processing in the Postgres COPY command with CSV format?) in which this solution was presented:
\COPY <tablename> FROM <path to file> WITH CSV DELIMITER ';' QUOTE E'\b' NULL '\N';
I think what it does is sets the quote value (default is double quote) to something else, in this case a backspace, so it won't recognize a double quote as a quote anymore. However when I run this I get another error: INVALID input syntax for integer.
What has happened is that every value now is quoted, so ID with value '1' becomes value '"1"' and because ID is defined as an integer it won't accept quotes.
Do you have any idea how to import double quotes and other special characters from a .csv file into a postgres database?
Thanks in advance!!
Based on the error message, I doubt it has anything to do with double quoting or anything of the sort -- had it been so, it would have been a widely reported bug and fixed ages ago.
When it comes to Postgres, the error messages are almost always correct and helpful. As such, consider the very real possibility that there are more characters than meets the eye.
My own guess is that you've some trailing (or leading) spaces in there somewhere, and as such have pieces of data that look 24 characters long when viewed in a spreadsheet while being, in fact, longer.
If you don't, my second guess would be some kind of bizarre character set conflict or side effect. Perhaps you have some double-byte characters, or two single characters behaving as a single one due to a diacritic in there. These look fine in the viewer you're using for your data, but when they get interpreted or viewed as utf8 they end up counting as two distinct characters. Unlikely imo, but possible.
Lastly, and per Frank's suggestion, try removing the length constraint. It is only slowing you down as things stand, because it slows down inserts and is preventing you from moving forward. Once done importing, you'll be able to find the offending rows using the likes of:
select name from <tablename> where length(name) > 25;
... and upon fixing them, you'll be able to re-add your constraint if it serves any purpose. (Hint: it doesn't, or at the very least shouldn't have. There's a real person out there whose name is: "Kim-Jong Sexy Glorious Beast Divine Dick Father Lovely Iron Man Even Unique Poh Un Winn Charlie Ghora Khaos Mehan Hansa Kimmy Humbero Uno Master Over Dance Shake Bouti Bepop Rocksteady Shredder Kung Ulf Road House Gilgamesh Flap Guy Theo Arse Hole Im Yoda Funky Boy Slam Duck Chuck Jorma Jukka Pekka Ryan Super Air Ooy Rusell Salvador Alfons Molgan Akta Papa Long Nameh Ek.")
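
The same check can be run on the file before importing. Below is a minimal Python sketch, assuming a semicolon-delimited file with a header row; the function name and the sample data are invented. It turns quote processing off so that a stray double quote is treated as ordinary text, and it lists the values that exceed the limit.

```python
import csv
import io

def overlong_values(text, column, limit, delimiter=";"):
    """Return (line number, value) for values of column longer than limit."""
    # QUOTE_NONE makes the reader treat quote characters as ordinary text.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    header = next(reader)
    idx = header.index(column)
    return [
        (line_no, row[idx])
        for line_no, row in enumerate(reader, start=2)
        if len(row[idx]) > limit
    ]

# Hypothetical data: row 2 has a stray quote, row 3 has a 30-character name.
sample = (
    "ID;Name;Year_cons\n"
    '1;"De Bolster;2014-01-01\n'
    "2;" + "x" * 30 + ";2015-01-01\n"
)
hits = overlong_values(sample, "Name", 25)
```

With 9 million rows you would stream the real file line by line with open() instead of loading it into a string.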

How to handle new line characters when using COPY in POSTGRESQL

I have text that has the following form in my csv:
'0001'|'text1'|'\ntext2'|'text3'\n
However when I try to import the data into my postgres instance, it keeps breaking by thinking the first newline character is the start of a new line. Is there an easy way to tell postgres to import the newline character into the field?
If the delimiter and the quote character are set explicitly, special characters inside a quoted field are taken literally instead of being interpreted. The parser needs to know how to recognize strings so that it does not treat a newline inside a field as the end of the row.
Here's the documentation:
Backslash characters (\) can be used in the COPY data to quote data
characters that might otherwise be taken as row or column delimiters.
In particular, the following characters must be preceded by a
backslash if they appear as part of a column value: backslash itself,
newline, carriage return, and the current delimiter character.
So, you might have:
COPY data FROM STDIN WITH CSV HEADER DELIMITER E'|' QUOTE E'\'';
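
To see the effect of declaring the quote character without touching the database, here is a small Python sketch using the standard csv module. It is only an analogy for the parser behaviour, not Postgres itself, and it assumes the third field contains a real line break. Once the single quote is declared as the quote character, the line break inside the quoted field stays in the value instead of ending the record.

```python
import csv
import io

# One record whose third field starts with a real line break inside quotes.
data = "'0001'|'text1'|'\ntext2'|'text3'\n"

# Declaring both the delimiter and the quote character lets the parser
# keep the embedded line break as part of the field.
reader = csv.reader(io.StringIO(data), delimiter="|", quotechar="'")
rows = list(reader)
```

If the file instead contains the two literal characters backslash and n, there is no line break to protect, and the field is read as plain text either way.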