I am using the Talend free version.
I have the following requirement:
My source is an MS Access table, SRC_CUST.
SRC_CUST
CUST_ID CUST_NAME
101 ABC
102 LMN
My target is a .csv file, TGT_CUST.
Requirement: I am using the tAccessInput component for the MS Access table and I want to load that table into the .csv file. My columns vary from day to day.
Day 1: SRC_CUST has 2 columns, CUST_ID and CUST_NAME, so I need to load them as-is into the .csv file.
Day 2: SRC_CUST has 3 columns, CUST_ID, CUST_NAME, and CUST_ADD, so on day 2 I need to load these 3 columns without changing any code; that is, I need to handle the column change dynamically.
Note: because I am using the Talend free version, I can use neither a dynamic component nor the dynamic data type. I also cannot pre-define columns in "Edit schema" under the basic settings of the tAccessInput component, because my columns vary.
Please help me with this.
Thanks,
Vaishali Shinde
Related
I have an ADF data flow with many CSV files as a source and a SQL database as a sink. The data in the CSV files is similar, 170-plus columns wide, but not all of the files have the same columns. Additionally, some column names differ from file to file, but each column name starts with the same corresponding 3 digits. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the Parameters tab.
Create a new parameter and choose the string array data type.
For the default value, as per your requirement, enter ['203-student name','203-student grade','203-student-marks'].
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter.
To do this, add 3 rule-based mapping entries in the bottom pane.
For the first column, the matching rule will be position==1 and the name will be $parameter1[1].
Follow the same pattern for columns 2 and 3 (position==2 with $parameter1[2], and position==3 with $parameter1[3]).
Click on the Inspect and Data Preview tabs of the Select transformation to view the new column names.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping
I have 3 questions.
I need to join (merge) 2 columns in a table (there are 2 different columns in the table, and I need to merge them into one); kindly help with the syntax.
I need to map accounts in a table, but there is no unique identifier: the 2 columns are for the same alcohol brand, but the same alcohol has a different name on different websites.
I am trying to move data from Excel to Snowflake, but in a few rows the data is in a format like Meiomi Rosé, so the subsequent rows are not getting loaded. I tried to use REPLACE_INVALID_CHARACTERS 'True' but it shows an error.
I tried multiple syntaxes, such as:
alter table sales_ws merge SUPPLIER_VOLUME and VOLUME_UNITS;
alter table sales_ws 'join' (SUPPLIER_VOLUME, VOLUME_UNITS) AS Quantity;
I also tried to use REPLACE_INVALID_CHARACTERS 'True' within the file format, but it did not work.
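A hedged sketch of syntax that does parse in Snowflake, for comparison with the attempts above (the table and column names are taken from the question; the stage name and the ISO-8859-1 encoding are assumptions):

```sql
-- Merging two columns: Snowflake has no ALTER TABLE ... MERGE for columns.
-- Either concatenate on read, or add a new column and backfill it.
SELECT SUPPLIER_VOLUME || ' ' || VOLUME_UNITS AS QUANTITY
FROM SALES_WS;

ALTER TABLE SALES_WS ADD COLUMN QUANTITY VARCHAR;
UPDATE SALES_WS SET QUANTITY = SUPPLIER_VOLUME || ' ' || VOLUME_UNITS;

-- Loading rows like 'Meiomi Rosé': REPLACE_INVALID_CHARACTERS takes a
-- boolean, not the string 'True', and the é usually points at an encoding
-- mismatch, so declaring the file's encoding may be needed as well.
COPY INTO SALES_WS
FROM @my_stage/sales.csv  -- stage name is illustrative
FILE_FORMAT = (TYPE = CSV
               ENCODING = 'ISO-8859-1'  -- a guess; use the file's real encoding
               REPLACE_INVALID_CHARACTERS = TRUE);
```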
I am using Tableau Prep to load PDFs into SQL Server. The PDF contains several tables. All the columns are created except the one with NULL values: if there are 12 columns in the PDF, I'm seeing only 11 in the output. Is there a way to create a column to hold NULL values? The column is blank initially (e.g., for Jan) and is then populated with float values from Feb onwards. The PDFs need to be loaded daily. I created the column, but it gives an error: 'Error adding Text[columnname]. Expected different text.' The Prep version is 20.3.3.
Do I continue loading the Jan files, and is it then possible to add the column and map it from Feb onwards?
Is there another way to accomplish this?
Thank you in advance.
A little late answering, but you can add a null column as follows:
After connecting to your PDF table, add a Clean step.
Select the data grid view and right-click on a column.
Select Create Calculated Field, then Custom Calculation.
Name your calculation and set the calculation value to NULL.
Reset the data type to Number (decimal).
Add an output step, saving the output to a database table.
I am adept with both SQL and CR, but this is something I've never had to do.
In CR, I load a table that will always contain 1 record. There is a range of columns (like Column1, Column2 ... Column60). (Bad design, I know, but I can't do anything to change that.)
Thanks to this old design, I have to manually add each column to my report like this:
-----------
| TABLE |
-----------
| Column1 |
| Column2 |
| Column3 |
| ... |
-----------
Now I would like to create a subreport, and create a datasource for it, in such a way that [Column1...Column60] becomes a collection [Row1...Row60]. I want to be able to use the detail section of the subreport to dynamically generate the table. That would save me a lot of time.
Is there any way to do that? Maybe a different approach to what I had in mind?
Edit
@Siva: I'll describe it the best way I can. The table consists of 500+ columns and will only ever hold 1 record (never more). Because normalization was never taken into account when these tables were created (in the Objective-C / DBF ages), columns like Brand01, Brand02, Brand03 ... Brand60 should have been placed in a separate table named "Brands".
The document itself is pretty straightforward, considering there's only one record. But some columns have to be pivoted (stacked vertically) and placed in a table layout on the document, which is a lot of work if you have to do it manually. That's why I wanted to feed a range of columns into my subreport, so I can use its detail section to generate the table layout automatically.
OK, got it... I will try to answer to the extent possible.
You need 2 columns in the report: the 1st column shows the 60 column names as 60 rows, and the 2nd column shows the data of those 60 columns. I can think of two ways to do this.
If the columns are static and the report only needs to be developed once then, though it's a tedious manual job, create 120 formulas: 60 for the row names, where you write the column names, and 60 for the data of the respective columns, and place them in the report. Since you have only one record, you will get the correct data. Like below:
Formula 1:
"Column1" // write the column name manually
Formula 2:
{Table.Column1} // the database field that holds the data for Column1
The above will make one row in the report; in this way you will end up with 120 formulas forming 60 rows. You don't need a subreport here; the main report will do the job.
Since you are expecting dynamic behavior (though the columns are static), you can create a view on the database side, or a datatable (please note I have no idea about datatables; use them at your convenience).
Create it in such a way that it has 2 columns, and in the report use a cross-tab; that will give you the dynamic behaviour.
In the cross-tab, column 1 will be the rows part and column 2 will be the data.
Here too I don't see any need for a subreport; you can use the main report directly. If you want a subreport you can use one as well; no harm, since you have only 1 record.
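A minimal sketch of the view idea (the table and column names are assumptions based on the question, and only the first few brands are written out; the pattern continues through Brand60):

```sql
-- Unpivot the one-record-wide table into (name, value) rows,
-- so the report's cross-tab or detail section can iterate over them.
CREATE VIEW BrandRows AS
SELECT 'Brand01' AS BrandName, Brand01 AS BrandValue FROM WideTable
UNION ALL
SELECT 'Brand02', Brand02 FROM WideTable
UNION ALL
SELECT 'Brand03', Brand03 FROM WideTable
-- ... repeat through Brand60
;
```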
I have yearly data in my Excel file in the following format:
Country \ Years 1980 1981 ... 2010
Abkhazia 234 334 ... 456
Afghanistan 466 789 ... 732
...
Here is a picture.
I want to transform my data into 3 different tables and load them into a Postgres database.
The tables should look something like this:
First table - country:
id | name
1 | Abkhazia
2 | Afghanistan
Second table - dates:
id | date
1 | 1980
2 | 1981
And the third is a table where all the data is stored, keyed by country and date:
country_id date_id data
1 1 234
1 2 334
2 1 466
2 2 789
... ... ...
Any ideas how I could achieve my goal?
Assuming the source Excel structure is as below (I have custom-built this):
There are basically 3 parts to your question. I'll break the transformation down into parts for better understanding:
1. Loading Table - Country
This is pretty straightforward based on the data given in the Excel. Simply take an Excel Input >> add a Sequence step (name the sequence Country ID) >> select only the Country Name and Country ID >> load into the Country table using Table Output.
2. Loading Table - Year:
The idea here is to output the year IDs row-wise instead of as the columns given in the Excel source data. PDI version 5 and above provides a very useful step called Metadata Structure. This step allows you to get the structure of your table. In this case, we need the year columns pulled, ignoring the Country column.
Follow the steps as below:
Read the Excel data >> get the metadata structure of your source >> filter out the Country column (which is available in the row at position=1) >> add a Sequence number, naming it YearID >> finally load the Year table.
3. Loading the Final Table - Country and Year along with Data:
The way to bring all the column data values down to row level in PDI is the Row Normalizer step. Use this step to produce a normalized output. Now follow the steps below:
Read the Excel source data >> use the Row Normalizer step to normalize the rows based on the years >> do a Stream Lookup against the Country and Year tables above to fetch the CountryID and YearID respectively >> finally load the necessary column data with a Table Output.
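For readers who would rather do the reshaping in plain SQL once the sheet is bulk-loaded, here is a hedged sketch of the same three loads (the staging table staging_wide and its y1980/y1981/y2010 column names are assumptions; extend the VALUES lists for the remaining years):

```sql
-- Target tables, matching the layout in the question.
CREATE TABLE country (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE dates   (id SERIAL PRIMARY KEY, date INT);
CREATE TABLE data    (country_id INT REFERENCES country(id),
                      date_id    INT REFERENCES dates(id),
                      data       INT);

-- 1. Country table: one row per distinct country.
INSERT INTO country (name) SELECT DISTINCT country FROM staging_wide;

-- 2. Dates table: one row per year column.
INSERT INTO dates (date) VALUES (1980), (1981), (2010);

-- 3. Fact table: unpivot the year columns, then look up both ids.
INSERT INTO data (country_id, date_id, data)
SELECT c.id, d.id, u.value
FROM staging_wide s
CROSS JOIN LATERAL (VALUES (1980, s.y1980),
                           (1981, s.y1981),
                           (2010, s.y2010)) AS u(year, value)
JOIN country c ON c.name = s.country
JOIN dates   d ON d.date = u.year;
```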
Hope it helps :)
I have placed the code in a GitHub repo along with the data file which I used. It's here.
Also, I just realized that I used the wrong naming conventions relative to your question: consider date_id as YearID, and instead of the ids I used countryid and yearid.