COPY ignore blank columns - amazon-redshift

Unfortunately I've got a huge number of CSV files with missing separators, as in the following example. Notice that the second record has only one separator and two values. Currently I'm getting a "delimiter not found" error.
If only I could insert NULL into the 3rd column when there are only two values.
1,avc,99
2,xyz
3,timmy,6
Is there any way I can COPY these files into Redshift without modifying the CSV files?

Use the FILLRECORD parameter to load NULLs for the missing trailing columns.
You can check the docs for more details.
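A minimal sketch of such a COPY, assuming a hypothetical table name, S3 prefix, and IAM role (adjust to your own objects):
COPY my_table                 -- hypothetical target table with 3 columns
FROM 's3://my-bucket/data/'   -- hypothetical S3 prefix holding the csv files
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'   -- hypothetical role
DELIMITER ','
FILLRECORD;                   -- missing trailing columns are loaded as NULL
With the sample above, the second record would load as (2, 'xyz', NULL).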

Related

Copy Data recursively copying files to a sink struggles with different column order

I have 20+ delimited files in DataLake2 that are pulled in recursively via a single Copy Data activity that runs 16 sub-processes. About 5 of them have a slightly different column order: one column is moved in those 5 files. ADF seems to struggle occasionally with these files, apparently because it assumes their headers line up with those of the other files.
Does this sound possible/correct? There are 109 columns, and the column that is transposed is column 104 in most of the files but column 98 in these 5 files.
The error I get when importing from these files is:
Column 'xxx_date_time' contains an invalid value 'F'
But looking in both Excel and a text editor, 'xxx_date_time' is blank (as it should be), though only relative to the column order of those specific files. If you apply the standard column order from the other 15+ files, there is an 'F' in that position.
I have done some command-line work to ensure there are an even number of quotes (') and the same number of column delimiters (;) in each line, so I don't think the formatting is off. The line endings are all \r.
In summary, any ideas why this is happening and why the specific header order of each individual file is being ignored? Is this a bug/feature of the Copy Data activity?

Data Conversion Failed SQL

I am using the Import and Export Wizard to import a large CSV file, and I get the following error.
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data
conversion for column "firms" returned status value 2 and status text "The
value could not be converted because of a potential loss of data.".
(SQL Server Import and Export Wizard)
When importing, I use the Advanced tab and make all of the adjustments. For the field in question, I set it to numeric(8,0). I have since gone through this process multiple times and tried 7, 8, 9, 10, and 11, to no avail. I imported the CSV into Excel and looked at the column in question, firms; it shows no entry with more than 5 characters. I thought about making it DT_STR, but I will need to manipulate that column eventually by averaging it. I have also searched for spaces or strange characters and found none.
Any other ideas?
1) Try changing the numeric precision to numeric(30,20) in both the source and the destination table.
2) Change the data type to str/wstr and adjust the output column width while importing; it will then run fine. This happened to me as well while loading a large CSV file of approx. 5 GB. After the load, use the TRY_CONVERT function to convert the column back to numeric and check which values became NULL during conversion; that will show you the root cause.
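For example, a rough sketch assuming the data was staged into a varchar column named firms in a table named staging_firms (both names are made up):
-- list the values that cannot be converted to numeric(8,0)
SELECT firms
FROM staging_firms                              -- hypothetical staging table loaded via str/wstr
WHERE firms IS NOT NULL
  AND TRY_CONVERT(numeric(8,0), firms) IS NULL;
Once the offending values are cleaned up, the later averaging can be done on the converted column.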

Attributes names are not unique in Weka 3.8

I am having trouble importing a CSV file. I get the following error: File "filename.csv" not recognised as an 'CSV data files' file. Reason: Attributes names are not unique! Causes: '2' '1'.
Can anyone tell me how to fix these issues? I am using Weka 3.8 on a Windows 10 64 bit laptop.
Thanks in advance.
Just make sure your column names stay unique with respect to the attribute values. This happened for me when I applied StringToWordVector and got string attributes with the same name as one of my column names. Just give the columns distinctive names :)
WEKA will assume that the first row of data is the names of the columns, but the version of the NSL-KDDCup dataset that I looked at on GitHub did not have column headers. Since the first row had some repeated values, you get this error message. I will suggest two solutions.
1) The above-noted GitHub repository has a Weka-friendly ARFF file with the data.
2) Add column headers to the CSV file. What should the column headers be? They are listed in the ARFF file. :-)
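For illustration, each @attribute line in the ARFF file names one column, and those names become the first line of the CSV (the attribute names below are just illustrative; the real ones are in the ARFF file):
@attribute duration numeric
@attribute protocol_type {tcp,udp,icmp}
@attribute src_bytes numeric
The corresponding CSV header line would then be:
duration,protocol_type,src_bytes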
It happens when the same attribute name appears in more than one column of the spreadsheet. Just rename the columns that share a name; each name should be unique. This worked for me.
I was getting the same error when I loaded a dataset into Weka. When I examined the columns of the dataset, I found a duplicated column name. Once I renamed one of the two different columns named 'fwd header length', the error was fixed.

Read csv file excluding first column and first line

I have a csv file containing 8 lines and 1777 columns.
I need to read all the contents into MATLAB, excluding the first line and the first column. The first line and first column contain strings, and MATLAB can't parse them.
Do you have any idea?
data = csvread(filepath);        % reads all the contents
data = csvread(filepath, 1, 1);  % row/column offsets are zero-based, so this skips the first line and first column
As suggested, csvread with a range will read in the numeric data. If you would like to read in the strings as well (which are presumably column headers), you can use readtable:
t = readtable(filepath);
This will create a table with the column headers in your file as variable names of the columns of the table. This way you can keep the strings associated with the data, if need be.

Postgresql: Execute query write results to csv file - datatype money gets broken into two columns because of comma

After running "Execute query, write results to file", the money columns in my output file get broken into two columns. E.g. if my revenue is $500 it is displayed correctly, but if my revenue is $1,500.00 there is an issue: it gets broken into two columns, $1 and $500.00.
Can you please help me get my results into a single CSV column for the money datatype?
What is this command "execute query write results to file"? Do you mean COPY? If so, have a look at the FORCE QUOTE option http://www.postgresql.org/docs/current/static/sql-copy.html
E.g.
COPY yourtable to '/some/path/and/file.csv' CSV HEADER FORCE QUOTE *;
Note: if the application that is consuming the csv files still fails because of the comma, you can change the delimiter from "," to whatever works for you (eg. "|").
Additionally, if you do not want CSV but would rather have TSV, you can omit the CSV HEADER keywords and the results will be output in tab-separated format.
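For instance, a sketch using the newer COPY option syntax with a pipe delimiter (the table name and path are placeholders):
COPY yourtable TO '/some/path/and/file.csv'
    WITH (FORMAT csv, HEADER, DELIMITER '|', FORCE_QUOTE *);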
In some regions the comma is the computer's list separator, and in other regions the semicolon is, so I think you need to replace the comma (or change the delimiter) when you write the data to CSV.