I have a couple of CSV files. All of them are almost identical, but some columns differ from one file to another. As an example:
CSV files 1, 2, 3 have these columns:
id name post title cdate mdate path
but CSV files 4, 5 have these columns:
id name post title ddate mdate fpath
My output should be like this:
id name post title cdate mdate ddate path fpath
How can I achieve this? Currently I am following this:
But with this procedure I can extract the data from the CSVs, just not in the preferred output.
You need to put each file type in a different folder: say files 1, 2, 3 in folder1 and files 4, 5 in folder2.
Now, insert the files from one folder into your MongoDB using this job:
tFileList --(iterate)--> tFileInputDelimited --(file_schema)--> tMap ---(DB_schema)--> tMongoDBOutput
Here, we use tMap to map the file schema onto the DB schema; the extra columns will remain blank.
Finally, use a second job, which is the same as the first except that tFileList points to the second folder and tMap joins the already-written data with the new set of files on id (the file schema is also different):
tMongoDBInput
|
|
tFileList --(iterate)--> tFileInputDelimited --(file_schema)--> tMap ---(DB_schema)--> tMongoDBOutput
You can use an OnSubjobOk trigger to link the first and the second job.
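For reference, the same union-of-columns merge can be sketched outside Talend in a few lines of pandas (the folder names and the merged.csv output file are assumptions based on the setup above):

import glob
import pandas as pd

# read every csv from both folders; concat aligns on column names, so
# cdate/path and ddate/fpath sit side by side, blank where a file lacks them
frames = [pd.read_csv(p) for p in glob.glob("folder1/*.csv") + glob.glob("folder2/*.csv")]
merged = pd.concat(frames, ignore_index=True)

# enforce the requested column order
merged = merged[["id", "name", "post", "title", "cdate", "mdate", "ddate", "path", "fpath"]]
merged.to_csv("merged.csv", index=False)

If the two groups share ids and the second set should update the first (as the tMap join does), a pd.merge on id would replace the plain concat.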
I have a PostgreSQL query, shown below.
select concat('abcdefghijklmnopqrstuvwyz',concat('abcdefghijklmnopqrstuvwyz','abcdefghijklmnopqrstuvwyz')) as ret3
union all
select concat('abcdefghijklmnopqrstuvwyz',concat('abcdefghijklmnopqrstuvwyz','abcdefghijklmnopqrstuvwyz')) as ret3
union all
select concat('abcdefghijklmnopqrstuvwyz',concat('abcdefghijklmnopqrstuvwyz','abcdefghijklmnopqrstuvwyz')) as ret3
The result of the query, when exported to a txt file, is below.
But I need the data to be exported to the text file on a single line. A sample of the format is below.
The requirement is: how can we export the data warehouse data to a text file, but on a single line? I tried a normal txt file import (please find the screenshot below), but the result is 3 lines. Can somebody help? Is there any way we can export the data warehouse result to a text file with single-line output?
I am using Azure SQL as the source dataset and a delimited file as the sink dataset in the Copy activity.
I tried the Copy activity, but 'First row as header' gives comma-separated headers.
Is there a way to change the header output style?
Please note that the spacing is unequal (h3...h4).
In this repro, I tried to give
1 space between 1st and 2nd column,
2 spaces between 2nd and 3rd column,
3 spaces between 3rd and 4th column.
Also, I tried to give the same column name to column2 and column3. The approach is as follows.
Data is copied from the Azure SQL database to the data lake in comma-delimited format as a staging file.
This staging file is taken as the source in a Data Flow activity.
In the source dataset, 'first row as header' is not checked.
Data preview of Source transformation:
A derived column transformation is added to change the column names of column2 and column3.
In this case, the 'date_col' value in Column_1 is header data. Thus, when Column_1 is 'date_col', the column2 and column3 data is replaced with the same column name:
column_2 = iif(Column_1=='date_col','ECIX',Column_2);
column_3 = iif(Column_1=='date_col','ECIX',Column_3);
Another derived column transformation is added to concatenate all the columns with spaces. The column name is given as concat. The value for this column is:
concat(Column_1,' ',Column_2,'  ',Column_3,'   ',Column_4)
A select transformation is added, and only the concat column is selected here.
In the sink, a new delimited file is added as the sink dataset. In the sink dataset, too, 'first row as header' is not checked.
After the pipeline is run, the target file looks like this (see the output file screenshot).
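For comparison, the same fix-the-header-then-concatenate logic can be sketched in Python (staging.csv and target.txt are assumed file names; the 1/2/3-space gaps and the 'ECIX' column name come from the repro above):

import csv

with open("staging.csv", newline="") as src, open("target.txt", "w") as dst:
    for c1, c2, c3, c4 in csv.reader(src):
        if c1 == "date_col":      # the header row: rename columns 2 and 3 to the same name
            c2 = c3 = "ECIX"
        # unequal spacing: 1, 2 and 3 spaces between the columns
        dst.write(c1 + " " + c2 + "  " + c3 + "   " + c4 + "\n")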
Keeping the source as Azure SQL itself in the data flow, I created a single derived column 'OUTDC' and added all the columns from the source like this:
(h1)+' '+(h2)+' '+(h3)
Then I fed OUTDC to a delimited sink and set the Headers option to a single string, like this:
['h1 h2 h3']
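A minimal pandas equivalent of this single-derived-column approach (reading a local csv instead of Azure SQL; the file names are assumptions):

import pandas as pd

df = pd.read_csv("staging.csv")  # columns h1, h2, h3
outdc = df["h1"].astype(str) + " " + df["h2"].astype(str) + " " + df["h3"].astype(str)
outdc.to_csv("out.txt", index=False, header=["h1 h2 h3"])  # header written as one string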
I have some CSV files that I want to copy to a specific folder in ADLS based on the date column within the file.
For example, the CSV file has a column named "date" that reads "2022-02-23" on all rows. I want to copy that file to a folder with the corresponding year and month, such as "/curated/UK/ProjectABC/2022/02".
I've got a Lookup activity that points at the source CSV file and populates a Set Variable activity with the month, using this dynamic content: @substring(string(activity('Lookup1').output.firstrow.date),5,2)
Would this be the right approach, to use a variable?
I can't use variables in the Directory portion of the sink dataset, as far as I know.
Have you come across this situation before?
Sounds like you're on the right path. You can absolutely use dataset parameters:
Then populate them in your pipeline using a variable (or parameter, or expression):
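For example (the parameter names year and month are hypothetical), the dataset's folder path could be set to
curated/UK/ProjectABC/@{dataset().year}/@{dataset().month}
and the pipeline could populate the two parameters with expressions built like the one above:
@substring(string(activity('Lookup1').output.firstrow.date),0,4)
for the year and
@substring(string(activity('Lookup1').output.firstrow.date),5,2)
for the month.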
Does anyone know how to add a header to a CSV sink? I have a data flow whose source is a database table. I used a derived column and concatenated the columns into one column, with the data split by commas (done in the source via a query). I then selected the concatenated column to be exported to CSV.
Data example:
Matt,Smith,10
Therefore I technically only have one column; however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10
You can add headers to the CSV file.
Select the Data Flow activity.
Select the source and add a select transformation.
Add the column names as shown in the screenshot below.
Finally, add the sink and run the pipeline.
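If you ever need the same fix outside ADF, prepending the header is trivial in Python (the file names are assumptions):

with open("data.csv") as src, open("with_header.csv", "w") as dst:
    dst.write("FirstName,LastName,Age\n")  # the desired header row
    dst.write(src.read())                  # e.g. Matt,Smith,10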
I have two input files:
an .xlsx file that looks like this:
a .csv file that looks like this:
I already have a Talend job that transforms the .xlsx file into an .xml file.
One node in the .xml file contains the stock location code:
<stockLocationCode>SL213</stockLocationCode>
The output .xml file looks like this:
Now I need to replace every occurrence of the stockLocationCode with the second column of the .csv file. In this case the result would be:
My Talend job looks like this:
I use a tMap component to put the columns of the .xlsx file into the right nodes of the output .xml file.
But I do not know how I can replace the stockLocationCode with the actual full stock location using the .csv file. I tried to also map the .csv file with the tMap component.
I would need to build in a method that looks at the current value of the <stockLocationCode> node, loops over the whole .csv file until it finds that value in the first column, and then replaces the <stockLocationCode> content with the content of the second column of the .csv file.
Performance is not important ;)
First, you'll need a lookup in e.g. a tMap or tXMLMap component, where you map your keys and add a new column containing the second column of the .csv file.
The resulting columns would look like this:
Product; Stock Location Code; CSV 2nd column data
Now, in a second map, you can simply remove the stock location code and do the rest of your job.
Voilà, you have exchanged the columns.
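If you want to prototype the same lookup outside Talend, here is a small Python sketch (the file names, the csv delimiter, and the two-column csv layout are assumptions):

import csv
import xml.etree.ElementTree as ET

# build the lookup table: first csv column (code) -> second csv column (full location)
with open("stock_locations.csv", newline="") as f:
    lookup = {row[0]: row[1] for row in csv.reader(f, delimiter=";")}

tree = ET.parse("output.xml")
for node in tree.iter("stockLocationCode"):
    node.text = lookup.get(node.text, node.text)  # keep the code if there is no match
tree.write("replaced.xml")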
You can use tXMLMap, which supports lookups.