PostgreSQL COPY FROM succeeds but copies fewer rows than there are lines in the file

I have several large CSV (tab-delimited) files that I need to copy into one table. Each file has five million rows. The first two files copy fine, with all five million records, but with the third I get missing data.
Only about 3.9 million rows are copied instead of five million. There is no error, and the command runs fine, but fewer rows are copied than exist in the file.
I have reviewed the text files, and there really are five million distinct rows in the file.
After a very manual trial-and-error process I found the last row that was written correctly (annoyingly, the rows that didn't get written were at neither the end nor the beginning). There may be a problem with one particular field. The field ends with the following string: ."' (period, double quote, single quote). I am using tab-delimited format, but is it possible that Postgres is reading this as some kind of special character? I think all of the following rows may be getting written into that field for that row.
To add some more context: the field where the double quotes are throwing things off is an email field. So there is an email with a typo that includes a double quote, and 1.1 million rows later there is another email with a typo that includes a double quote. None of the records between these two double quotes are written correctly.

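That is not surprising if you consider that a logical line in a CSV file can span more than one physical line. In CSV format the double quote starts a quoted value that can contain delimiters and newlines and only ends at the next double quote. A stray double quote therefore absorbs every line that follows, up to the next double quote. In your case, that is the 1.1 million rows between the two bad emails. For example, these three physical lines are two CSV records: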
1,a text,2019-11-24
2,"a text
that contains a newline",2020-04-01
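Here is a minimal Python sketch that illustrates this and helps locate the offending rows. It parses the three sample lines above with the standard csv module, which returns two records. The helper find_odd_quote_lines (a hypothetical name, as is any file path you pass to it) flags physical lines that contain an odd number of double quotes. Such a line opens or closes a quoted value that spans several lines. A legitimately multi-line value is flagged as well, so treat the hits as candidates. In the situation described above you would expect two hits about 1.1 million lines apart. Python's csv module only treats a quote as special at the start of a field, while PostgreSQL's CSV mode also treats a quote in the middle of a field as opening a quoted section. That is why the scanner counts quotes anywhere on the line.

```python
import csv
import io

# The three physical lines from the example above. The second record's
# quoted value contains a newline, so csv yields two logical records.
SAMPLE = '1,a text,2019-11-24\n2,"a text\nthat contains a newline",2020-04-01\n'


def parse_records(text):
    """Return the logical CSV records in text."""
    return list(csv.reader(io.StringIO(text, newline="")))


def find_odd_quote_lines(path, quote='"'):
    """Yield (line_number, line) for physical lines with an odd quote count.

    If quotes toggle a quoted section wherever they appear (as in
    PostgreSQL's CSV mode), such a line leaves a quoted value open, and
    the following lines are read as part of that value.
    """
    with open(path, encoding="utf-8", newline="") as f:
        for lineno, line in enumerate(f, start=1):
            if line.count(quote) % 2 == 1:
                yield lineno, line.rstrip("\r\n")


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
    # "data.tsv" is a placeholder for the file that loads short:
    # for lineno, line in find_odd_quote_lines("data.tsv"):
    #     print(lineno, line)
```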


CopyData recursively copying files to a sink struggles with different column order

I have 20+ delimited files in DataLake2 that are pulled in recursively by a single Copy Data activity running 16 sub-processes. About 5 of them have a slightly different column order: 1 column is in a different position in those 5 files. ADF occasionally seems to struggle with these files, apparently because it assumes that their headers line up with the other files.
Does this sound possible/correct? There are 109 columns, and the column that moves is column 104 in most of the files but column 98 in these 5 files.
The error I get when importing from these files is:
Column 'xxx_date_time' contains an invalid value 'F'
But looking both in Excel and in a text editor, 'xxx_date_time' is blank (as it should be), but only relative to the order of the columns in the specific files. If you were to use the standard column order from the other 15+ files, there is an 'F' there.
I have done some command-line checks to make sure there is an even number of quotes (') and the same number of column delimiters (;) in each line, so I don't think the formatting is off. The line endings are all \r.
In summary, does anyone have ideas why this is happening and why the header order of the individual files is being ignored? Is this a bug or a feature of the Copy Data activity?
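One way to confirm that the column order really differs between files, independently of ADF, is to compare the header rows directly. This is a minimal Python sketch. It assumes each file has a header on its first line, uses ; as the delimiter and ' as the quote character (both taken from the question), and is UTF-8 encoded. The function names, and any folder you point it at, are hypothetical. The sketch groups files by header order and lists the columns whose position differs from a reference file.

```python
import csv
from collections import defaultdict


def read_header(path, delimiter=";", quotechar="'"):
    """Return the column names from the first record of a delimited file."""
    # newline="" lets csv handle \r, \n or \r\n line endings itself.
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
        return tuple(next(reader))


def group_by_header(paths, **fmt):
    """Map each distinct header (as an ordered tuple) to the files using it."""
    groups = defaultdict(list)
    for path in paths:
        groups[read_header(path, **fmt)].append(str(path))
    return dict(groups)


def column_moves(reference, other):
    """Return (name, index_in_reference, index_in_other) for moved columns."""
    position = {name: i for i, name in enumerate(other)}
    return [
        (name, i, position[name])
        for i, name in enumerate(reference)
        if name in position and position[name] != i
    ]
```

For example, group_by_header(Path("landing").glob("*.csv")) (with pathlib.Path, where "landing" is a placeholder folder) should return two groups if five files differ. Indices are zero-based. Moving one column from position 104 to 98 shifts every column in between, so column_moves will usually list several columns, not only the one that moved.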

Import Flat File via SSMS to SQL Server fails

When importing a seemingly valid flat file (csv, text etc) into a SQL Server database using the SSMS Import Flat File option, the following error appears:
Microsoft SQL Server Management Studio
Error inserting data into table. (Microsoft.SqlServer.Import.Wizard)
Error inserting data into table. (Microsoft.SqlServer.Prose.Import)
Object reference not set to an instance of an object. (Microsoft.SqlServer.Prose.Import)
The target table may contain rows that imported just fine. The first row that is not imported appears to have no formatting errors.
What's going wrong?
Check the following:
there are no blank lines at the end of the file (leave the last line's line terminator intact); this seems to be the most common issue
there are no unexpected blank columns
there are no badly escaped quotes
It looks like the import process loads lines in chunks. This means that the lines following the last successfully loaded chunk may appear to have no errors. You need to look at subsequent lines, that are part of the failing chunk, to find the offending line(s).
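If you want to check a file before handing it to the wizard, here is a minimal Python sketch that covers blank lines, rows whose column count differs from the header, and malformed quotes. It assumes a UTF-8 file with a header row and " as the quote character. The name preflight is hypothetical. The line numbers it reports are physical lines as counted by Python's csv reader, so you can go straight to them instead of hunting through the failing chunk. It stops at the first quoting error, because everything after an unbalanced quote is unreliable.

```python
import csv


def preflight(path, delimiter=","):
    """Return a list of (line_number, problem) pairs for a delimited file."""
    problems = []
    with open(path, encoding="utf-8", newline="") as f:
        # strict=True makes csv raise on malformed quoting instead of guessing.
        reader = csv.reader(f, delimiter=delimiter, strict=True)
        try:
            header = next(reader, None)
            if header is None:
                return [(0, "empty file")]
            for record in reader:
                if not record:  # csv yields [] for an empty line
                    problems.append((reader.line_num, "blank line"))
                elif len(record) != len(header):
                    problems.append(
                        (reader.line_num,
                         f"expected {len(header)} columns, found {len(record)}"))
        except csv.Error as exc:
            problems.append((reader.line_num, f"quoting error: {exc}"))
    return problems
```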
This cost me hours of hair pulling while dealing with large files. Hopefully this saves someone some time.
If the file you're importing is already open, SSMS will throw this error. Close the file and try again.
When you create your flat file, IF any of your columns contain text (varchar) values, DO NOT make the file comma (",") delimited. Instead, use the vertical bar ("|") or another character that you are SURE cannot appear in those values. Commas are super common in nvarchar fields.
I had this issue, and none of the recommendations in the other answers helped me!
I hope this saves someone some time; it took me hours to figure out!!!
None of these other ones worked for me, however this did:
When you import a flat file, SSMS gives you a brief summary of the data types of each column. Whenever you see nvarchar for a column that actually holds int or double values, change it to int or double. And change all nvarchars to nvarchar(max). This worked for me.
I've been working with CSV data for a long time. I ran into similar problems when I first started this job, but as a novice I couldn't pin down the exact fault from the exceptions.
Here are a few things you should check before importing anything:
Your CSV file must not be open in any other software, such as Excel.
Your CSV file's cells should not contain commas or quotation marks.
There are no unnecessary blanks at the end of your data.
No reserved term is used as data.
Open your file in Excel and save it as a new file.
After considering all the suggestions, if anyone is still having issues, check the length of the DataType for your columns. It took hours for me to figure this out but increasing the nvarchar length from (50) to (100) worked for me.
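To choose column sizes from the data rather than by trial and error, you can first measure the longest value in each column. This is a minimal Python sketch that assumes a UTF-8, comma-delimited file with a header row. The name max_field_lengths is hypothetical. Lengths are counted in Python characters. These match nvarchar lengths except for characters outside the Basic Multilingual Plane (many emoji, for example), which take two UTF-16 code units in SQL Server.

```python
import csv


def max_field_lengths(path, delimiter=","):
    """Return {column_name: length of the longest value} for a delimited file."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        longest = [0] * len(header)
        for record in reader:
            # Ignore any surplus fields beyond the header's column count.
            for i, value in enumerate(record[:len(header)]):
                longest[i] = max(longest[i], len(value))
    return dict(zip(header, longest))
```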
One thing that worked for me: you can change the error range to 1 in "Modify Columns".
You then get an error message that points to the specific problem line in your file instead of "ran out of memory".
I fixed these errors by adjusting the data types. For instance, I changed tinyint to smallint and smallint to int, and increased my nvarchar() lengths to reasonable values, otherwise setting them to nvarchar(MAX). Since most real-life data has missing values, I also checked "Allow Nulls" for all columns. Everything then worked, with a warning message.

Keep leading zeros when joining data sources in tableau

I am trying to create a data source in Tableau (10.0) that joins a table from SQL with an Excel file. The join is on a site ID, but when reading the ID from the Excel source, Tableau strips the leading zeros (SQL keeps them). I found an example of adding the leading zeros back as a new calculated field, but the join still drops rows because the ID is not properly formatted at the time of the join.
How do I get the excel data source to read the column with the leading zeros so I can do the join?
Launch Excel and choose to open a new blank workbook.
Click the Data tab and select From Text.
Browse to the saved CSV file and select Import.
Ensure that Delimited is selected and click Next.
Select Comma as the delimiter (Tab is the default) and click Next.
Select the column containing the data with leading zeros and click Text.
Repeat for each column which contains leading zeros.
Click Finish.
Click OK.
I've never heard of or used Tableau, but it sounds as though something (the Jet/ACE database driver used to read the Excel file?) is treating the column as numeric and parsing the data as numbers, which loses the leading zeros.
If your attempts at putting them back are giving you grief, I'd recommend going the other way instead: get SQL Server to convert its strings to numbers. Number matching should be more reliable than string matching, as long as the two systems don't handle rounding differently :)
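The two ways of making the keys comparable can be illustrated outside Tableau. This is a minimal Python sketch, not Tableau code. key_as_number follows the suggestion above and compares numbers. key_as_padded_text restores leading zeros instead, assuming (hypothetically) that every site ID is five characters wide. The sample IDs are made up.

```python
def key_as_number(site_id):
    """Normalize a site ID by parsing it as an integer ("00123" -> 123)."""
    return int(site_id)


def key_as_padded_text(site_id, width=5):
    """Normalize a site ID by restoring leading zeros; width is an assumption."""
    return str(site_id).zfill(width)


def inner_join(left, right, key):
    """Join two lists of (site_id, value) pairs on key(site_id)."""
    index = {key(site_id): value for site_id, value in right}
    return [
        (site_id, value, index[key(site_id)])
        for site_id, value in left
        if key(site_id) in index
    ]


# Made-up rows: SQL keeps text IDs, the Excel source delivered numbers.
sql_rows = [("00123", "north"), ("04560", "south")]
excel_rows = [(123, "A"), (4560, "B")]
```

inner_join(sql_rows, excel_rows, lambda k: k) finds no matches, because "00123" is not equal to 123. Either normalizing key matches both rows.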
If your Excel file was read in from a CSV and the Site ID shows "Number Stored as Text", I think you can solve the problem by telling Tableau, on the Data Source page, that the field is actually a string. In the data source preview, change the "#" (which designates a number) to string, so that the SQL source and the Excel source are both strings before the join.
This typically has to do with the way Excel stores values, as mentioned above. I would play around with the number formatting of the Site ID column in Excel itself, not in Tableau, and change it to "Text" there. You can check whether Tableau will read it properly with the leading 0s by exporting your Excel file to CSV and confirming that the leading 0s are still in the CSV file.

COPY ignore blank columns

Unfortunately I have a huge number of CSV files with a missing separator, as in the following example. Notice that the second row has only one separator and two values. Currently I'm getting a "delimiter not found" error.
I would like to insert NULL into the third column whenever a row has only two values.
1,avc,99
2,xyz
3,timmy,6
Is there any way I can COPY these files into Redshift without modifying the CSV files?
Use the FILLRECORD parameter to load NULLs for the missing columns at the end of a record.
See the Redshift COPY documentation for more details.

PostgreSQL: "Execute query, write results to file" splits the money datatype into two CSV columns because of the comma

After I run "Execute query, write results to file", the money columns in my output file get split into two columns. For example, if my revenue is $500 it is displayed correctly, but if my revenue is $1,500.00 there is a problem: it gets split into two columns, $1 and $500.00.
Can you please help me get my results into a CSV file with the money value in a single column?
What is this command, "execute query write results to file"? Do you mean COPY? If so, have a look at the FORCE QUOTE option: http://www.postgresql.org/docs/current/static/sql-copy.html
E.g.:
COPY yourtable to '/some/path/and/file.csv' CSV HEADER FORCE QUOTE *;
Note: if the application that consumes the CSV files still fails because of the comma, you can change the delimiter from "," to whatever works for you (e.g. "|").
Additionally, if you want TSV rather than CSV, you can omit the CSV, HEADER and FORCE QUOTE keywords (FORCE QUOTE is only allowed in CSV mode). COPY then writes its default text format, which is tab-separated.
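Python's csv module can show why quoting fixes this. This is a minimal sketch, not PostgreSQL code. csv.QUOTE_ALL plays the role of FORCE QUOTE *, and the hypothetical naive_line stands in for a writer that never quotes. The row values are made up.

```python
import csv
import io


def naive_line(row):
    # A writer that never quotes: the comma inside "$1,500.00" becomes
    # an extra delimiter.
    return ",".join(row) + "\n"


def quoted_line(row, delimiter=","):
    # Quote every field, similar in spirit to COPY ... FORCE QUOTE *.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    writer.writerow(row)
    return buf.getvalue()


def read_back(line, delimiter=","):
    # Parse a single line the way a CSV consumer would.
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


ROW = ["acme", "$1,500.00"]
```

read_back(naive_line(ROW)) returns three fields, while read_back(quoted_line(ROW)) returns the original two.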
In some regional settings the list separator is a comma, and in others it is a semicolon. So I think you need to replace the comma when you write the data to CSV.