Azure ADF Copy Activity with Trailing Column Delimiter - azure-data-factory

I have an unusual source CSV file that contains a trailing column delimiter at the end of each record, just before the carriage return/new line.
When ADF previews this data, it displays only the 2 columns and all the data rows without issue. However, the copy activity fails with the following exception.
ErrorCode=DelimitedTextColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The name of column index 3 is empty. Make sure column name is properly specified in the header
I understand that it complains because of the trailing delimiter, but is there a way to deal with this condition? I've tried including the trailing comma in the record delimiter (,\r\n), but then it just pivots the data so that all the columns become rows.
Is there a way to address this condition in copy activity?

When you preview the data in the dataset, it seems correct.
But in the copy activity, the column delimiter "," splits each record into 3 columns, and the third column is empty or NULL. This causes the error.
If you use Data Flow import projection from the source, you can see the third column.
For now, the copy activity doesn't support modifying the data schema. You must use a Data Flow Derived Column to create a new schema for the source.
Mapping the new column/schema to the sink then solves the problem.
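If pre-processing the file before the copy is an option, another route is to remove the trailing delimiter so that the header and rows parse as 2 columns. The following is a minimal Python sketch that runs outside ADF. It assumes a comma delimiter and that no record legitimately ends with an empty last field; the function name strip_trailing_delimiter and the sample data are hypothetical.

```python
import io


def strip_trailing_delimiter(src, dst, delimiter=","):
    """Copy text from src to dst, dropping one trailing delimiter per record."""
    for line in src:
        # Separate the record from its line ending so the ending is preserved.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if body.endswith(delimiter):
            body = body[: -len(delimiter)]
        dst.write(body + ending)


# Hypothetical sample shaped like the file described in the question.
sample = io.StringIO("id,name,\r\n1,alpha,\r\n2,beta,\r\n", newline="")
cleaned = io.StringIO(newline="")
strip_trailing_delimiter(sample, cleaned)
print(repr(cleaned.getvalue()))
```

Only one delimiter is removed per record, so a row that ends in two delimiters would still keep one empty trailing field.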
HTH.

Use a different encoding for your CSV. Saving it as CSV UTF-8 will do the trick.

Related

Add Double-Quote on Header of Blob file

I have a Copy data activity, where the source is SQL Server Query and sink is a blob file.
The blob file is created successfully, but the header is not wrapped in double quotes the way the rows are. Can that be configured in ADF?
Blob file:
Unfortunately, that is not possible in Azure Data Factory. When you declare the first row as the header, ADF takes the first row as column names and won't wrap them in double quotes the way it does the rows. The Quote character and Escape character settings apply only to the rows; you can avoid having quotes in the rows as well.
The one way to get double quotes in the header is to run a second Copy Activity that uses the previous output blob file as its source and another blob as its sink, with First row as header turned off on both the source and sink datasets.
I found a better solution that does not need another Copy Activity. In the Mapping section of the copy activity, just wrap the column name in double quotes (").
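To make the target format concrete (every field quoted, header included), here is a minimal Python sketch that writes such a file outside ADF. It only illustrates what the desired blob content looks like and is not an ADF feature; the column names and rows are made up.

```python
import csv
import io

# Hypothetical rows as they might come back from the SQL Server query.
header = ["CustomerId", "CustomerName"]
rows = [[1, "Contoso"], [2, "Fabrikam, Inc."]]

buffer = io.StringIO(newline="")
# QUOTE_ALL wraps every field in the quote character, including the header row.
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

print(buffer.getvalue())
```

The header is quoted here simply because it is written as an ordinary row, which is the same idea as turning off First row as header in the second Copy Activity.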

Importing issue in postgresql using pgadmin 4

The file does not import after I created the table. The first line of code names the table (COPY), the second gives the path of the file (FROM), and the third is the WITH clause. I am not sure whether a prior line of code is needed for it to succeed, because WITH is not being highlighted in pink. The import should work through either pgAdmin's built-in tool or the SQL syntax, but neither produces the expected output. Here are some screenshots:
So I created another table, this time with a single column, made sure the column name matched in both the table and the file, and it worked. The earlier example had several columns whose names were spelled differently in the table and in the file:
You can try these steps in order:
1. First create the CSV file. The column order in the .csv file matters most.
2. Consider the employee_info.csv file below
and the database table employee_info, which contains (emp_id [numeric], emp_name [character], emp_sal [numeric], emp_loc [character]).
3. Then execute the query below:
COPY employee_info(emp_id,emp_name,emp_sal,emp_loc) FROM 'C:\Users\Zbook\Desktop\employee_info.csv' DELIMITER ',' CSV;
Note: Ensure that no value in any row of the .csv file is null, as below...

Copy text file using postgres with custom delimiter by character size

I need to copy a text file that has a confusing delimiter. I believe the delimiter is a space. However, some of the column values are empty, and because a space carries no information about which column is missing, I cannot tell the columns apart. This makes it hard to load the data into the database. When I try to COPY, the mapping is wrong and I get ERROR: extra data after last expected column
I have tried changing the delimiter to a comma and other characters, but I still get the same error. The code below works when I load some dummy data with a proper delimiter.
COPY usm00070219(HEADREC_ID,YEAR,MONTH,DAY,HOUR,RELTIME,NUMLEV,P_SRC,NP_SRC,LAT,LON) FROM 'D:\....\USM00070219-data.txt' DELIMITER ' ';
This is example data:
It should have 11 columns, but the first row has only 10 values and COPY cannot identify the column with the empty value. The spacing is not helpful at all!
Is there any way to separate the columns by character width instead of a delimiter, and force the data to be split at the given widths?
COPY is not made to handle fixed-width text files. I can think of two options:
Load the file as it is into a table with a single text column using COPY. Then use regexp_split_to_array to split it into its components and insert these into another table.
You can use file_fdw to create a foreign table with a single text column like above and operate on that. That saves loading the file into the database.
There is a foreign data wrapper for fixed-width text files that you can try.
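As an alternative outside the database, the file can be split by character position beforehand and written as a delimited file that COPY accepts. The following is a minimal Python sketch of that idea. The column offsets and the sample lines are hypothetical placeholders, not the real layout of the file in the question, so they would need to be replaced with the offsets from the file format's documentation.

```python
import csv
import io

# Hypothetical (start, end) character offsets, one pair per column.
COLUMN_SLICES = [(0, 4), (5, 9), (10, 12), (13, 18)]


def split_fixed_width(line, slices=COLUMN_SLICES):
    """Cut a line at fixed offsets; blank fields become empty strings."""
    return [line[start:end].strip() for start, end in slices]


def fixed_width_to_csv(src, dst, slices=COLUMN_SLICES):
    """Write each fixed-width line from src to dst as a CSV row."""
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        writer.writerow(split_fixed_width(line.rstrip("\r\n"), slices))


# Made-up sample: the third column is blank in the second row.
sample = io.StringIO("AB01 2020 07 12.50\nAB02 2021    -3.25\n")
out = io.StringIO()
fixed_width_to_csv(sample, out)
print(out.getvalue())
```

The resulting file can then be loaded with COPY ... CSV, where an unquoted empty field is read as NULL by default.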

Column defined in source Dataset could not be found in the actual source

I have an ADF Copy Data flow and I'm getting the following error at runtime:
My source is defined as follows:
In my data set, the column is defined as shown below:
As you can see from the second image, the column IsLiftStation is defined in the source. Any idea why ADF cannot find the column?
I've had the same error. You can solve this by either selecting all columns (*) in the source and then mapping those you want to the sink schema, or by 'clearing' the mapping in which case the ADF Copy component will auto map to columns in the sink schema (best if columns have the same names in source and sink). Either of these approaches works.
Unfortunately, clicking the import schema button in the mapping tab doesn't work. It does produce the correct column mappings based on the columns in the source query but I still get the original error 'the column could not be located in the actual source' after doing this mapping.
Could you check whether there is a column named 'ae_type_id' in your schema? If so, could you remove that column and try again? The columns in the schema must be aligned with the columns in the query.
The issue is caused by an incomplete schema in one of the data sources. My solution is:
Step through the data flow, select the first schema, and click Import projection.
Go to the flow and run Data Preview.
Repeat for each step.
In my case, there were trailing commas in one of the CSV files. They caused auto-generated column names to appear in the import, which showed me what to fix in the data file.
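To find such trailing commas before running the pipeline, the file can be scanned directly. The following is a minimal Python sketch, assuming a comma delimiter; the function name find_trailing_delimiters and the sample data are hypothetical.

```python
import io


def find_trailing_delimiters(lines, delimiter=","):
    """Return 1-based line numbers of records that end with the delimiter."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.rstrip("\r\n").endswith(delimiter):
            hits.append(number)
    return hits


# Made-up sample: the header and the second data row carry a trailing comma.
sample = io.StringIO("id,name,\n1,alpha\n2,beta,\n")
print(find_trailing_delimiters(sample))
```

A record whose last field is legitimately empty is reported too, so the hits are candidates to inspect rather than confirmed errors.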

Getting Redshift error 1214 during copy

I have the following table in redshift:
Column | Type
id | integer
value | varchar(255)
I'm trying to copy data in (using the Data Pipeline RedshiftCopyActivity). The data contains the line 1,maybe as the entry to be added, but I get back error 1214: Delimiter not found, and the raw_field_data value is maybe. Is there something I'm missing in the copy parameters?
The entire CSV is three lines:
1,maybe
2,no
3,yes
You may want to take a look at the similar question Redshift COPY command delimiter not found.
Make sure your RedshiftCopyActivity configuration includes FORMAT AS CSV (see https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv).
Be sure your input data has your configured delimiter between every field, even in the case of nulls.
Be sure you do not have any trailing blank lines.
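The last two checks can be run on the input file before the load. The following is a minimal Python sketch, assuming the file fits in memory and has no quoted fields that span several lines; the function name check_delimited is hypothetical.

```python
import csv


def check_delimited(text, expected_fields, delimiter=","):
    """Return a list of human-readable problems found in the text."""
    problems = []
    lines = text.splitlines()
    # Trailing blank lines: empty or whitespace-only lines at the end.
    trailing_blank = 0
    while lines and not lines[-1].strip():
        lines.pop()
        trailing_blank += 1
    if trailing_blank:
        problems.append(f"{trailing_blank} trailing blank line(s)")
    # csv.reader handles quoted fields that contain the delimiter.
    for number, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if len(row) != expected_fields:
            problems.append(
                f"line {number}: {len(row)} field(s), expected {expected_fields}"
            )
    return problems


# The three-line file from the question, then a made-up broken variant.
print(check_delimited("1,maybe\n2,no\n3,yes\n", expected_fields=2))
print(check_delimited("1,maybe\n2\n3,yes\n\n", expected_fields=2))
```

An empty list means the text passed both checks, which points back at the COPY parameters rather than the data.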
You can run the following SQL (from the linked question) to see more specific details of what row is causing the problem.
-- Most recent load errors first, with the offending column and raw line.
SELECT le.starttime,
       d.query,
       d.line_number,
       d.colname,
       d.value,
       le.raw_line,
       le.err_reason
FROM stl_loaderror_detail d
JOIN stl_load_errors le
  ON d.query = le.query
ORDER BY le.starttime DESC;