Transform CSV File and Load into PostgreSQL - postgresql

My CSV data set has a variable number of columns, because certain groups of fields repeat a different number of times depending on how I query the data (the timeframe).
For example, I have:
Field 1 Field 2 Field 3 [Field 4.1.1 Field 4.1.2 Field 4.1.3 Field 4.2.1 Field 4.2.2 Field 4.2.3]
My new PostgreSQL tables contain:
**Table 1**
Field 1, Field 2, Field 3

**Table 2**
Field 4.X.1, Field 4.X.2, Field 4.X.3

where each X is a new row in Table 2.
Recommendations on ETL or other programs to automate this process?
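As a minimal sketch of the split described above (not a recommendation of any particular ETL tool), the reshaping can be done in plain Python. It assumes the first three columns are fixed and every remaining column belongs to a complete set of three; `row_id` is a hypothetical surrogate key introduced only to link the two tables.

```python
import csv
import io

FIXED = 3  # Field 1..3 go to table 1 (from the question)
GROUP = 3  # each repeated set holds Field 4.X.1, 4.X.2, 4.X.3


def split_row(row_id, row):
    """Return (table1_row, table2_rows) for one CSV record."""
    fixed, rest = row[:FIXED], row[FIXED:]
    if len(rest) % GROUP:
        raise ValueError(f"row {row_id}: trailing fields are not a multiple of {GROUP}")
    table1_row = (row_id, *fixed)
    # One table 2 row per set X; x is the 1-based set number.
    table2_rows = [
        (row_id, x, *rest[i:i + GROUP])
        for x, i in enumerate(range(0, len(rest), GROUP), start=1)
    ]
    return table1_row, table2_rows


def split_csv(text):
    """Split CSV text (with a header line) into rows for the two tables."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header
    table1, table2 = [], []
    for row_id, row in enumerate(reader, start=1):
        t1, t2 = split_row(row_id, row)
        table1.append(t1)
        table2.extend(t2)
    return table1, table2
```

The resulting tuples could then be written to PostgreSQL with whatever driver or bulk-load mechanism is already in use; that part is deliberately left out.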

Related

To read dynamic column changes from source table

I am using the free version of Talend.
I have the following requirement:
My source is an MS Access table, SRC_CUST.
SRC_CUST
CUST_ID CUST_NAME
101 ABC
102 LMN
My target is a .csv file, TGT_CUST.
Requirement: I am using the tAccessInput component to read the MS Access table, and I want to load that table into the .csv file. The columns vary from day to day.
Day 1: SRC_CUST has 2 columns, CUST_ID and CUST_NAME, so I need to load them as they are into the .csv file.
Day 2: SRC_CUST has 3 columns, CUST_ID, CUST_NAME and CUST_ADD, so on day 2 I need to load these 3 columns without changing any code. In other words, I need to handle the column changes dynamically.
Note: I am using the free version of Talend, so I can use neither the dynamic components nor the Dynamic data type. I cannot even add columns under "Edit schema" in the basic settings of the tAccessInput component, because my columns vary.
Please help me with this.
Thanks,
Vaishali Shinde
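
Outside Talend, the same requirement (export whatever columns the table has today, with no schema hard-coded) can be sketched with Python's DB-API, which exposes the column names of a result set at run time through `cursor.description`. This is only an illustration: `sqlite3` stands in for MS Access here, and `dump_table` is a hypothetical helper name.

```python
import csv
import io
import sqlite3


def dump_table(conn, table, out):
    """Write every column of `table` to `out` as CSV, header included."""
    cur = conn.cursor()
    # The table name is assumed to come from trusted configuration.
    cur.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    # Column names are discovered at run time, so a new column needs no code change.
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


# Stand-in source table (sqlite3 instead of MS Access, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SRC_CUST (CUST_ID INTEGER, CUST_NAME TEXT)")
conn.executemany("INSERT INTO SRC_CUST VALUES (?, ?)", [(101, "ABC"), (102, "LMN")])

day1 = io.StringIO()
dump_table(conn, "SRC_CUST", day1)

# Day 2: a column is added; the same function exports it unchanged.
conn.execute("ALTER TABLE SRC_CUST ADD COLUMN CUST_ADD TEXT")
day2 = io.StringIO()
dump_table(conn, "SRC_CUST", day2)
```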

Tableau collate data from multiple columns

I am not sure what the technical term is for what I am trying to do.
Hopefully the raw data and outputs below define the use case clearly.
Raw data:
This is what my raw data looks like
Output 1:
This is what I am trying to extract first
Here I am trying to get a table where the first column has the names of the guests and the second column has the number of times each of them has appeared in the table as a guest.
Output 2:
This is what I am trying to extract next
Here I am trying to map months against names and see how many nights each person has collected in each month.
One way to achieve this would be to create a temp table with 5 columns:
column 1 with the guest names,
column 2 with the count of occurrences in the Guest 1 column of the raw data table,
column 3 with the count of occurrences in the Guest 2 column of the raw data table,
column 4 with the count of occurrences in the Guest 3 column of the raw data table,
column 5 with the total of the previous 3 columns.
But I am trying to find a proper solution in Tableau, if possible, because this approach would not help me achieve Output 2.
Plain-text raw data, if you'd like to work with it:
booking by,Guest 1,Guest 2,Guest 3,stay start,stay end,hotel code
Ram,Seema,Ram,,May 1 2018,May 2 2018,BBST
Karan,Ram,Seema,,May 6 2018,May 7 2018,BRRLY
Mahesh,Mahesh,Seema,Ram,June 2 2018,June 4 2018,BBST
Krishna,Krishna,,,June 2 2018,June 3 2018,BRRLY
Seema,Seema,,,June 7 2018,June 8 2018,BRRLY
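
The reshaping described here is commonly called unpivoting: the three guest columns are turned into rows, one per guest per booking, and both outputs then become simple aggregations. Below is a minimal Python sketch of that idea using the raw data above. It assumes that "nights" means stay end minus stay start and that a stay is attributed to the month in which it starts; the screenshots are not available, so the exact output layout is also an assumption.

```python
import csv
import io
from collections import Counter
from datetime import datetime

RAW = """booking by,Guest 1,Guest 2,Guest 3,stay start,stay end,hotel code
Ram,Seema,Ram,,May 1 2018,May 2 2018,BBST
Karan,Ram,Seema,,May 6 2018,May 7 2018,BRRLY
Mahesh,Mahesh,Seema,Ram,June 2 2018,June 4 2018,BBST
Krishna,Krishna,,,June 2 2018,June 3 2018,BRRLY
Seema,Seema,,,June 7 2018,June 8 2018,BRRLY
"""

GUEST_COLS = ("Guest 1", "Guest 2", "Guest 3")
DATE_FMT = "%B %d %Y"  # e.g. "May 1 2018" (English month names assumed)


def unpivot(text):
    """Yield (guest, month, nights) once per non-empty guest cell."""
    for row in csv.DictReader(io.StringIO(text)):
        start = datetime.strptime(row["stay start"], DATE_FMT)
        end = datetime.strptime(row["stay end"], DATE_FMT)
        nights = (end - start).days  # assumption: nights = end - start
        for col in GUEST_COLS:
            if row[col]:
                yield row[col], start.strftime("%B"), nights


appearances = Counter()      # Output 1: guest -> times listed as a guest
nights_by_month = Counter()  # Output 2: (guest, month) -> nights
for guest, month, nights in unpivot(RAW):
    appearances[guest] += 1
    nights_by_month[guest, month] += nights
```

Once the data is in this long form (one row per guest per stay), a count by guest gives Output 1 and a sum of nights by guest and month gives Output 2.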

Pentaho spoon transformation from excel file

I have yearly data in an Excel file in the following format:
Country \ Years 1980 1981 ... 2010
Abkhazia 234 334 ... 456
Afghanistan 466 789 ... 732
...
I want to transform my data into 3 different tables and load them into a Postgres database.
The tables should look something like this.
First table - country:
id | name
1 | Abkhazia
2 | Afghanistan
Second table - dates:
id | date
1 | 1980
2 | 1981
The third is a table where all the data is stored by country and date:
country_id | date_id | data
1 | 1 | 234
1 | 2 | 334
2 | 1 | 466
2 | 2 | 789
... | ... | ...
Any ideas how I could achieve my goal?
Assuming the source Excel structure is as below (I custom-built this):
There are basically 3 parts to your question. I have broken the transformation down into parts for better understanding:
1. Loading Table - Country
This is pretty straightforward given the data in the Excel file. Simply use:
Excel Input >> Add a Sequence step, naming the sequence Country ID >> select only Country Name and Country ID >> load into the Country table using Table Output.
2. Loading Table - Year:
The idea here is to produce the years as rows instead of the columns they occupy in the Excel source data. PDI version 5 and above provides a very useful step called Metadata Structure, which returns the structure of your input, one row per field. In this case we need the year columns, ignoring the country column.
Follow the steps below:
Read the Excel data >> get the metadata structure of your source >> filter out the Country column (the row at position=1) >> add a sequence number named YearID >> finally, load the Year table.
3. Loading the Final Table - Country and Year along with Data:
In PDI, the way to turn column values into rows is the Row Normalizer step. Use it to produce normalized output, as follows:
Read the Excel source data >> use the Row Normalizer step to normalize the rows by year >> do a Stream Lookup against the Country and Year tables above to fetch CountryID and YearID respectively >> finally, load the necessary columns using Table Output.
Hope it helps :)
I have placed the code in a GitHub repo, along with the data file I used. It's here.
Also, I just realized that my naming conventions do not match your question: read date_id as YearID, and note that I used countryid and yearid instead of id.
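
For readers without PDI at hand, the three parts above can be sketched in a few lines of Python. This is not the transformation from the repo, only an illustration of the same normalization. It assumes the sheet has already been read into a header list and a list of rows, and `normalize` is a hypothetical helper name.

```python
def normalize(header, rows):
    """Split a wide country-by-year sheet into three normalized tables.

    header: ['Country', '1980', '1981', ...]
    rows:   [['Abkhazia', 234, 334, ...], ...]
    """
    years = header[1:]  # ignore the country column
    # Part 1: country table with a sequence as id.
    country = [(i, row[0]) for i, row in enumerate(rows, start=1)]
    # Part 2: year columns turned into rows, with a sequence as id.
    dates = [(i, year) for i, year in enumerate(years, start=1)]
    date_id = {year: i for i, year in dates}
    # Part 3: one fact row per (country, year) cell, with ids looked up.
    facts = [
        (country_id, date_id[year], value)
        for (country_id, _), row in zip(country, rows)
        for year, value in zip(years, row[1:])
    ]
    return country, dates, facts


country, dates, facts = normalize(
    ["Country", "1980", "1981"],
    [["Abkhazia", 234, 334], ["Afghanistan", 466, 789]],
)
```

With the two sample countries and the first two years from the question, `facts` contains the same four rows as the third table shown there.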

Give rank using field values on table in Talend

Right now, I have 8 tables that need to be transformed into 1, and I need to add a Rank field to the output table, based on the Amount Collected field from 1 of the 8 tables.
Sample:
Table A: amount_assignment
Table B: amount_collected
OutputTable: Rank= 1 (based on the highest collected)
How can I fill the Rank field of the output table with 1, 2, 3, ... based on the computed 'amount_collected'?
You can try inputdataflow --> tSortRow --> tMap. In tSortRow, sort the data on the amount column you need (in descending order, so that the highest amount comes first). Then, in tMap, assign a sequence number to every row by using Numeric.sequence("sequencename",1,1) as the expression for the rank column.
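
The sort-then-number logic of that flow can be sketched in Python as follows. This only illustrates the idea and is not Talend code; the row layout and the `add_rank` helper are assumptions.

```python
def add_rank(rows, key="amount_collected"):
    """Sort rows by `key`, highest first, and number them 1, 2, 3, ..."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    # Like a plain sequence, ties still get distinct consecutive numbers.
    return [{**row, "rank": i} for i, row in enumerate(ordered, start=1)]


ranked = add_rank([
    {"name": "A", "amount_collected": 50},
    {"name": "B", "amount_collected": 200},
    {"name": "C", "amount_collected": 120},
])
```

If tied amounts should share a rank, a plain sequence is not enough and the previous row's amount would have to be compared as well.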

Microsoft Access single cell in a Multiple Item form

I have a database that includes a Multiple Items form. It is based on one table and one query: the Budget table and the SumofCost query. The Budget table contains budget codes such as 1, 1.1, 1.2, 1.2.1, 1.2.2 and so on. What I want is the sum of 1.2.1 and 1.2.2 shown in the 1.2 cell, and the sum of 1.1 and 1.2 shown in the 1 cell, because they are subcategories. However, neither the query nor the table has a value for 1.2 or 1, so I have to create a sum for these codes. In the Multiple Items form, the 1 and 1.2 cells are empty, because I built the form from the budget items and Access leaves a field blank when there is no data for it. How can I create a sum for these fields? I tried to split the Multiple Items form and filter by budget code, such as 1.* or 1.2.*, but I couldn't split it. The link contains an example image of what I want to do.
Appreciate any help. Thanks.
With a table [Budget]...
BudgetCode
----------
1
1.1
1.2
1.2.1
1.2.2
...and a query [SumofCost]...
BudgetCode SumOfCost
---------- ---------
1.1 100
1.2.1 200
1.2.2 150
...the query...
SELECT BudgetCode, SumOfCost AS SumOfBudget
FROM SumofCost
UNION ALL
SELECT BudgetCode, DSum("SumOfCost", "SumofCost", "BudgetCode LIKE """ & [BudgetCode] & ".*""")
FROM Budget
WHERE BudgetCode NOT IN (SELECT BudgetCode FROM SumofCost)
ORDER BY 1
...produces...
BudgetCode SumOfBudget
---------- -----------
1 450
1.1 100
1.2 350
1.2.1 200
1.2.2 150
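
The roll-up logic can be checked outside Access with a short Python sketch. It assumes, as in the sample data, that parent codes carry no cost of their own, and it matches descendants on the code followed by a dot, so that code 1 would not also pick up a hypothetical code 10. `rollup` is an illustrative helper, not part of the Access solution.

```python
def rollup(budget_codes, sum_of_cost):
    """Return {code: total}, summing descendants for codes without a cost."""
    result = {}
    for code in budget_codes:
        if code in sum_of_cost:
            # Leaf codes keep the value from SumofCost.
            result[code] = sum_of_cost[code]
        else:
            # Parent codes get the sum of everything below them.
            prefix = code + "."
            result[code] = sum(
                value for c, value in sum_of_cost.items() if c.startswith(prefix)
            )
    return dict(sorted(result.items()))


totals = rollup(
    ["1", "1.1", "1.2", "1.2.1", "1.2.2"],
    {"1.1": 100, "1.2.1": 200, "1.2.2": 150},
)
```

For the sample data this gives 450 for code 1 and 350 for code 1.2, in line with the query output above.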