OpenCSV: export bean based on annotations

I really like the @CsvBindByName annotation used to tell OpenCSV which properties should be translated when reading a CSV file.
Writing a CSV file with BeanToCsv seems more of a hassle, because you'll always need to specify the column names separately in a MappingStrategy. Is there a way to use OpenCSV to generate CSV output based on all (annotated) String properties of a bean, without a separate mapping / column list?
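One way this can work, assuming OpenCSV 4.x or later: StatefulBeanToCsvBuilder (the replacement for BeanToCsv) detects the bean's @CsvBindByName annotations and derives the header row from them, with no separate mapping or column list. A minimal sketch; the Person bean, its fields, and the file name are invented for illustration:

// Minimal sketch, assuming OpenCSV 4.x+; Person and the file name are hypothetical
import com.opencsv.bean.CsvBindByName;
import com.opencsv.bean.StatefulBeanToCsv;
import com.opencsv.bean.StatefulBeanToCsvBuilder;

import java.io.FileWriter;
import java.io.Writer;
import java.util.List;

public class AnnotatedExport {

    public static class Person {
        @CsvBindByName(column = "NAME")
        private String name;

        @CsvBindByName(column = "EMAIL")
        private String email;

        public Person(String name, String email) {
            this.name = name;
            this.email = email;
        }
        // opencsv reads the annotated fields via reflection,
        // so explicit getters/setters are not required for writing
    }

    public static void main(String[] args) throws Exception {
        List<Person> people = List.of(
                new Person("Alice", "alice@example.com"),
                new Person("Bob", "bob@example.com"));

        try (Writer writer = new FileWriter("people.csv")) {
            // No explicit MappingStrategy: the builder picks one up from the
            // @CsvBindByName annotations and writes the header row from them
            StatefulBeanToCsv<Person> beanToCsv =
                    new StatefulBeanToCsvBuilder<Person>(writer).build();
            beanToCsv.write(people);
        }
    }
}

The header names come straight from the column attributes of the annotations, so adding an annotated property to the bean automatically adds a column to the output.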

Related

Comma within a field of a CSV file, exporting the data on a regular basis

We will be exporting the data through a command-line data loader.
I have a CSV file in which many values contain commas. The commas within the fields mislead the parser, making it seem like a row has more columns than it should:
Name,Amount,Address
Me,20,000,My Home,India
you,23,300,Your Home,Where
What are my options here, given that this will be an automated process?
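One common fix, assuming you control the export format: wrap every field that contains a comma in double quotes, per RFC 4180, so the same rows become unambiguous:

Name,Amount,Address
Me,"20,000","My Home,India"
you,"23,300","Your Home,Where"

Most CSV-aware loaders will then read each quoted field as a single value; if the loader cannot handle quotes, the usual alternative is exporting with a delimiter that never occurs in the data, such as a tab or a pipe.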

Azure Data Factory: reading a Blob folder with random characters in the path

I would like to define a dataset that has a file path of
awsomedata/1992-12-25{random characters}MT/users.json
I am unsure of how to use the expression language fully. I have figured out the following:
@startsWith(pipeline().parameters.filepath(), concat('awsomedata/', formatDateTime(utcnow('d'), 'yyyy-MM-dd')), @pipeline().parameters.filePath)
The dataset will change dynamically, I am trying to tell it to look at the file each trigger to determine the schema.
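If the intent is simply to match the folder despite the random suffix, a wildcard path on the source (the Copy activity and Mapping Data Flows both support wildcards) may be simpler than startsWith. The exact property names depend on the source settings, but the pattern would look like:

awsomedata/@{formatDateTime(utcnow(), 'yyyy-MM-dd')}*MT/users.json

The @{...} interpolation fills in today's date and the * absorbs the random characters. For the schema changing between triggers, enabling 'Allow schema drift' on a Data Flow source lets each run pick up the columns it finds.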

How to validate data for a fixed-length file in Azure Data Factory

I am reading a fixed-width file in a Mapping Data Flow and loading it into a table. I want to validate the fields, data types, and lengths of the fields that I am extracting in the Derived Column transformation using substring.
How can I achieve this in ADF?
Use a Conditional Split and add a condition for each property of the field that you wish to test for. For data-type checking, we literally just landed new isInteger(), isString(), etc. functions today. The docs are still being written, but you'll find the functions in the expression builder. For length, use length().
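For example, a split condition for a field that must be a numeric value of at most 10 characters might look like this (the column name acct_id is hypothetical):

isInteger(acct_id) && length(acct_id) <= 10

Rows that fail the condition flow out of the other branch of the split, where they can be logged or discarded.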

Creating a Spectrum table in Matillion for a CSV file with commas inside quotes

I have a scenario for creating a Spectrum table in Redshift using Matillion.
My CSV file data is like this:
column1,column2,column3
abc,"qwety,pqr",xyz
but in the Spectrum table I am seeing the data as:
column1  column2  column3
abc      qwety    pqr
Matillion is not treating the quoted value as a single field.
Can you please suggest how to achieve this using Matillion's External Table component?
Basically you would like to specify a quote parameter for your CSV data.
Redshift has two ways of specifying external tables (see the Redshift docs for reference):
using the default built-in SerDes and properties like ROW FORMAT DELIMITED, FIELDS TERMINATED BY
explicitly specifying a SerDe with ROW FORMAT SERDE, WITH SERDEPROPERTIES
I don't think it's possible to specify a quote parameter using the built-in SerDes.
It is possible to specify one using org.apache.hadoop.hive.serde2.OpenCSVSerde (see the Hive docs for details on its properties), but beware that there are known problems with it, such as the one described in this SO question.
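For reference, the raw Redshift Spectrum DDL with that SerDe would look roughly like the sketch below; the schema, table, column types, and S3 location are placeholders:

CREATE EXTERNAL TABLE spectrum_schema.my_table (
  column1 VARCHAR(100),
  column2 VARCHAR(100),
  column3 VARCHAR(100)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/my-prefix/';

One of the known problems mentioned above: OpenCSVSerde treats every column as a string, so numeric columns have to be cast after reading.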
Now for Matillion:
I have never used Matillion, but looking at their Redshift External Table documentation page, it looks like it's only possible to specify the FORMAT and the FIELD TERMINATOR, but not a SerDe and its properties; hence it's not possible to specify the quote parameter for the external table, unless there are some undocumented means of specifying a custom SerDe.
Personal note:
We have experienced many problems with ingesting data stored as CSV, and we basically try to avoid it. There's no standard for CSV; each tool implements its own version of support for it, and it's very difficult to convince all your tools to see the data the same way.

Add headers to a CSV file using Azure Data Factory while moving to the sink

How can we add headers to files existing in Blob storage / Azure Data Lake using Azure Data Factory?
I am using a Copy activity to move the headerless files to the sink, but the moved files should have default headers like "Prop_0" or "Column_1". Is there any method available to achieve this?
Usually, if we don't set the first row as the header, Data Factory will use the default headers Prop_0, Prop_1...Prop_N for a headerless CSV file to help us copy the data.
This is to help us do the column mapping, but it won't change the CSV file itself.
In my experience with Data Factory, it doesn't support changing the schema of a CSV file; it's impossible to add headers to the CSV files with a Copy activity alone, at least for now.
Hope this helps.
In ADF, create a new Data Flow. Add your CSV source with a headerless dataset. Then add your sink with a dataset that writes to an ADLS Gen2 folder as a text delimited file WITH headers. In the sink mapping, you can name your columns.
I tried a different solution. I used the 'no delimiter' option to keep everything as one column. Then, in the Derived Column transformation, I split the single column into multiple columns and provided a proper name for each. Now we can map the columns to the target table.
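As a sketch of that approach, if the whole row lands in a single column (call it Column_1; all names here are hypothetical), the Derived Column expressions could be:

Name     split(Column_1, ',')[1]
Amount   split(Column_1, ',')[2]
Address  split(Column_1, ',')[3]

Data Flow arrays are 1-based, so [1] is the first field. Note that splitting on ',' only works when the fields themselves contain no embedded commas.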