Talend - PostgreSQL > tWriteJSON > tRESTClient

I would like some advice about a Talend job (I'm using Talend 7.0.1).
I have to feed a REST API from a PostgreSQL database.
To reach my purpose, I can either:
1) extract my JSON to an Excel file, then send the file with tHttpRequest, or
2) call tRestClient and send all my data directly.
1) Excel solution:
tHttpRequest screenshot
For the record, I have to remove the main root and the "\" escapes if I want to use the Excel solution.
2) tRestClient solution (I need this one)
Here is my job (screenshot).
Here is the result (screenshot).
As you can see in the first tLogRow: "erreur de parametres de la methode" (a method-parameter error).
It means that the JSON "file" is not good.
You can see the data in the second tLogRow.
My problem is that when I use tFileOutputJSON, I get the root and "\" everywhere. But if I remove them both, I can use tHttpRequest.
So the file itself is good (without the root and "\").
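If the root wrapper and the escaping are the only blockers, one workaround is to clean the JSON string in a tJavaRow placed just before the tRestClient call. A minimal sketch, assuming the payload travels in a single String column named jsonBody (the column name and the exact wrapper shape are assumptions):

// tJavaRow sketch: unescape the quotes and keep only the inner JSON array.
String raw = input_row.jsonBody;                // e.g. {"root":"[{\"id\":1}]"}
String unescaped = raw.replace("\\\"", "\"");   // drop the escaping backslashes
int start = unescaped.indexOf('[');             // locate the inner array
int end = unescaped.lastIndexOf(']');
output_row.jsonBody = unescaped.substring(start, end + 1);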
Here is the Excel extraction (it doesn't work like this).
Here is the Excel file without the root and "\" (this one works).
Excel is not the right solution; I don't want to update the file every morning.
What did I do wrong with the tRestClient? Maybe there is another way?
Thank you guys.

Related

psycopg2.DataError: extra data after last expected column CONTEXT: COPY csvfails, line 1:

It's a Django app in which I'm loading a CSV; the table gets created OK, but copying the CSV into PostgreSQL fails with:
psycopg2.DataError: extra data after last expected column
CONTEXT: COPY csvfails, line 1:
Questions already referred to:
"extra data after last expected column" while trying to import a csv file into postgresql
I have tested multiple times with CSVs of different column counts, and am sure now the column count is not the issue; it's the content of the CSV file. When I change the content and upload the same CSV, the table gets created and I don't get this error. The content of the failing CSV file is as seen below. Kindly advise what in this content prompts psycopg2/psql/postgres to give this error.
No, as suggested in the comment, I can't paste even a single row of the CSV file; the imgur image add-in won't allow it. Not sure what to do now?
See below screenshots from the psql CLI: the table had been created with the correct column count, and I still got the error.
EDIT_1: Further, while saving on my Ubuntu machine using LibreOffice, I unchecked Separator Options >> Separated By >> TAB and SEMICOLON. The CSV was then saved with only Separator Options >> COMMA.
The Python lines of code which might be the culprit are:
with open(path_csv_for_psql, 'r') as f:
    next(f)  # Skip the header row.
    csv_up_cursor.copy_from(f, str(new_table_name), sep=',')
conn.commit()
I thought I read somewhere that the separator parameter passed to copy_from (which actually defaults to sep='\t' and is overridden here with sep=',') could be the issue?
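The separator is indeed part of the story, but the deeper issue is that copy_from splits on every occurrence of the separator and does not understand CSV quoting, so a quoted field containing a comma triggers exactly this "extra data after last expected column" error. A hedged sketch of an alternative using psycopg2's copy_expert, which runs COPY ... WITH (FORMAT csv) and therefore honors quoted fields (connection string and placeholder values are assumptions; variable names reuse the question's):

import psycopg2

conn = psycopg2.connect("dbname=mydb")    # connection string is an assumption
csv_up_cursor = conn.cursor()
path_csv_for_psql = "data.csv"            # placeholder path
new_table_name = "csvfails"               # table name taken from the error message

with open(path_csv_for_psql, 'r') as f:
    # COPY ... FROM STDIN WITH (FORMAT csv) parses quotes and embedded commas;
    # HEADER true also replaces the manual next(f) header skip.
    csv_up_cursor.copy_expert(
        "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true)".format(new_table_name),
        f,
    )
conn.commit()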

Copy dynamically generated filename and paste it in another folder

I am held up with a task where I need to create a CSV file with its name generated at run time, then copy that same file and paste it into a different folder. I'm able to create the required file.
Here is what I've done till now:
In SSIS I'm taking a DFT in the control flow, using a view as my OLE DB source, pointing it to a Flat File destination, and creating the file in my desired location, say folder x, held in the variable My_dest_folder. Here are the variables I've created:
My_dest_folder, of type string, with my folder's path as the value.
Filename, of type string, with a name, say cv99351_, as the value.
Timestamp, of type string, with an expression that generates a timestamp in YYYYMMDDHHMISS format.
Archivefolder, of type string, with another path where the generated file is supposed to be copied from My_dest_folder and pasted into the archive folder.
In the connection string of my flat file connection manager, I have given the variables as @[User::My_dest_folder] + @[User::Filename] + @[User::Timestamp] + ".csv", which creates a file with a name like cv99351_<timestamp>.csv in folder x.
After the file is created, I am trying to capture the filename from My_dest_folder, but since the timestamp also contains seconds, I am not able to capture it every time.
Can someone please help me out here? I would really appreciate it.
If someone wants to save their files with SSIS, your description is already nice and could be used as a tutorial :)
But if I understand well, you have a problem at the end of your process, when you try to get the generated filename.
To read it you use the same variable concatenation, but sometimes your Timestamp can change and then you get an error (your file doesn't exist).
If yes, I guess that you use a kind of GETDATE() function in the expression of your variable. It appears that SSIS will evaluate the value of your variable each time you request it.
I tested it:
I ran 3 INSERT statements and waited with the debugger between each.
It gave me 3 different values (screenshot).
I recommend you not to use your GETDATE() function in the variable expression.
You can retrieve it instead with a single Execute SQL Task (with a SELECT GETDATE() SQL query) or with a C# / VB method.
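For the script route, here is a minimal C# Script Task sketch that freezes the timestamp once per run into the Timestamp variable (the variable name matches the question; YYYYMMDDHHMISS is the format described there):

// SSIS Script Task (C#): evaluate the timestamp once and store it, so every
// later read of User::Timestamp returns the same value during the run.
// User::Timestamp must be listed in the task's ReadWriteVariables.
public void Main()
{
    Dts.Variables["User::Timestamp"].Value =
        DateTime.Now.ToString("yyyyMMddHHmmss");
    Dts.TaskResult = (int)ScriptResults.Success;
}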
Does it solve your problem?
Regards,
Arnaud
I had a similar issue.
Step 1: There was a specific named file in the source folder.
@[User::v_Orig_FileName] : @[$Project::v_FilePath] + @[$Project::v_FileName]
Step 2: It was renamed with the timestamp using GETDATE().
@[User::v_Archive_FileName] : @[$Project::v_FilePath] + "\\" + REPLACE(REPLACE(REPLACE(@[$Project::v_FileName], ".csv", SUBSTRING((DT_WSTR,50)GETDATE(),1,19) + ".csv"), ":", ""), " ", "_")
Source variable: @[User::v_Orig_FileName]
Destination variable: @[User::v_Archive_FileName]
Step 3: The file was moved into the archive folder. To get the source file name, I was using exactly the same variable as the destination variable of step 2.
@[User::v_Archive_Folder] : @[$Project::v_FilePath] + "\\Archive"
@[User::v_Archive_ArchivedFileName] : @[User::v_Archive_Folder] + "\\" + REPLACE(REPLACE(REPLACE(@[$Project::v_FileName], ".csv", SUBSTRING((DT_WSTR,50)GETDATE(),1,19) + ".csv"), ":", ""), " ", "_")
Source variable: @[User::v_Archive_FileName]
Destination variable: @[User::v_Archive_ArchivedFileName]
If the timestamps for step 2 and step 3 differ by even a second, there is an error because, as pointed out above, GETDATE() is re-evaluated each time it is requested.
So the solution I came up with was swapping step 2 and step 3:
Step 1: There was a specific named file in the source folder.
@[User::v_Orig_FileName] : @[$Project::v_FilePath] + @[$Project::v_FileName]
Step 2: The file was moved into the archive folder.
@[User::v_Archive_Folder] : @[$Project::v_FilePath] + "\\Archive"
@[User::v_Archive_OrigFileName] : @[User::v_Archive_Folder] + "\\" + @[$Project::v_FileName]
Source variable: @[User::v_Orig_FileName]
Destination variable: @[User::v_Archive_OrigFileName]
Step 3: It was renamed with the timestamp using GETDATE().
@[User::v_Archive_FileName] : @[User::v_Archive_Folder] + "\\" + REPLACE(REPLACE(REPLACE(@[$Project::v_FileName], ".csv", SUBSTRING((DT_WSTR,50)GETDATE(),1,19) + ".csv"), ":", ""), " ", "_")
Source variable: @[User::v_Archive_OrigFileName]
Destination variable: @[User::v_Archive_FileName]
Hope this gives you an idea for a different spin on this issue.

Greenplum : Getting filenames processed via an external table

We are processing multiple files using an external table. Is there any way I can get the name of the file being processed through the external table and store it in a database table?
The only workaround I can find is appending the file name to every record in the flat file, which isn't ideal with huge datasets and multiple files.
Can anyone help with this?
Thanks
No, the file name is simply never passed from the gpfdist daemon back to Greenplum. So you have to append the file name to each line; you can use a gpfdist transformation to do so.
I was struggling with this as well; here's my solution. Please note I'm not an expert in Linux, so there may be a one-liner solution.
I wanted to add a filename column in front of my records.
That can be done in sed. I've created a transform.sh file with the following content:
#!/bin/sh
# gpfdist passes the name of the data file as the first argument.
filename=$1
#echo $filename >> transform.txt   # debug line, kept commented out
# Prefix every line with the file name, delimited by a vertical tab.
sed -e "s|^|$filename\v|" $filename
Please note that I was using a vertical tab, \v, as the delimiter. Also, the filename can contain /, hence the use of | as the sed delimiter. And to have the value of $filename expanded, we have to use double quotes around the sed expression.
Test it; it looks good (the vertical-tab delimiter makes each field render on its own line):
./transform.sh countersamples-2016-03-02--11-51-10.csv
countersamples-2016-03-02--11-51-10.csv
timestamp
machine
category
instance
name
value
countersamples-2016-03-02--11-51-10.csv
2016-03-02 11:51:10.064
DESKTOP-4PLQKVL
Memory
% Committed Bytes In Use
74.8485488891602
This part is done; let's continue with gpfdist. We need a YAML file that can be passed to gpfdist; I named it transform.yaml.
Content:
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  add_filename:
    TYPE: input
    CONTENT: data
    COMMAND: /bin/bash transform.sh %filename%
Please note the %filename% placeholder here. gpfdist prefilters the files that need to be handled and passes them one by one to our transform.
Let's fire up gpfdist:
gpfdist -c transform.yaml -v
Now go into Greenplum and create an external table such as:
CREATE READABLE EXTERNAL TABLE "ext_transform"
(
    "filename" text,
    "timestamp" timestamp without time zone,
    "machine" text,
    "category" text,
    "instance" text,
    "name" text,
    "value" double precision
)
LOCATION ('gpfdist://localhost:8080/*/countersamples*.csv#transform=add_filename')
FORMAT 'TEXT' (HEADER DELIMITER '\013' NULL AS '\\N' ESCAPE AS '\\')
And when we select data from it:
select * from "ext_transform";
we see each row prefixed with the name of the file it came from (result screenshot omitted).
I created 2 folders to see how it reacts when the files are not in the same folder as the transform. This way I can distinguish between the 2 files, even if their data is identical.
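Coming back to the original question (storing the processed file names in a database table): with the filename exposed as a column, this becomes a plain query on the external table. A minimal sketch (the processed_files target table is an assumption):

-- Record which files have been processed, using the new filename column.
CREATE TABLE processed_files (filename text);
INSERT INTO processed_files
SELECT DISTINCT filename FROM "ext_transform";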

How to remove the extra space as a result of double quotes in SAS

I am new to SAS and I am trying to create a batch file through a SAS program. The code is below:
data new;
enddate=date();
getdate=date()+1;
flname1=compress("d:\temp\file"||year(enddate)||put(month(enddate),z2.)||
put(day(enddate),z2.)||".txt");
begdate=enddate-&days;
dtline1=compbl(compress("00:00_"||put(begdate,mmddyy10.))||" "||
compress("00:00_"||put(getdate,mmddyy10.)));
file 'h:\programs\daily_file';
put 'LOGIN abc xyz';
put 'FILE(C:\temp\list.txt) '
dtline1 " script.pl(" flname1 ")";
put 'LOGOUT';
run;
script.pl is a Perl script, and in the resulting batch file there is an extra space after flname1. It prints something like this:
script.pl(d:\temp\file_date )
I don't want this extra space after the date. What can I do?
The easiest way to get that to work properly is simply to put the entire command (script.pl(filename)) into a single variable, then put that variable.
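A minimal sketch of that approach, placed inside the data step above (cmdline is a hypothetical variable name; cats() concatenates its arguments with leading and trailing blanks stripped):

/* Build the whole token in one variable so no column-pointer space can sneak in. */
cmdline = cats('script.pl(', flname1, ')');
put cmdline;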
You can also use +(-1) in put to move the column pointer back one if it's consistently off by one (though most of the time that's not needed).
put "script.pl(" flname1 +(-1) ")";
It really looks like you are working too hard to create string variables before writing to the file. You should be able to write what you want just using the features of the PUT statement. It is not clear from the question what format you want the file to have, but I think this code mimics your program:
%let days=7 ;
data new;
enddate=date();
getdate=date()+1;
begdate=enddate-&days;
file 'h:\programs\daily_file';
put 'LOGIN abc xyz'
/ 'FILE(C:\temp\list.txt) 00:00_' begdate mmddyy10.
' 00:00_' getdate mmddyy10.
' script.pl(d:\temp\file' enddate yymmddn8. '.txt)'
/ 'LOGOUT'
;
run;
and it produces:
LOGIN abc xyz
FILE(C:\temp\list.txt) 00:00_08/14/2015 00:00_08/22/2015 script.pl(d:\temp\file20150821.txt)
LOGOUT

Import password-protected xlsx workbook into R

How can I import a worksheet from a password-protected xlsx workbook into R?
I would like to be able to convert an Excel worksheet into a csv file without having to go through Excel itself.
It is possible for xls workbooks using the Perl-based function xls2csv from the gdata package. I gather the problem is that Spreadsheet::XLSX doesn't support it.
There are a variety of functions and packages for importing non-encrypted xlsx workbooks, but none seems to address this issue.
At present it seems the only alternatives are to go through Excel or to figure out how to write Perl code that can do it.
This looks to be what you need, except it isn't done with the xlsx package:
https://stat.ethz.ch/pipermail/r-help/2011-March/273678.html
library(RDCOMClient)
eApp <- COMCreate("Excel.Application")
wk <- eApp$Workbooks()$Open(Filename="your_file",Password="your_password")
tf <- tempfile()
wk$Sheets(1)$SaveAs(tf, 3)
To build on ed82's answer, there are a few caveats:
You may need to pass another password parameter, WriteResPassword. See the docs here.
I didn't find learning the COM interface appealing after I got used to the xlsx R package. So I would rather immediately save a copy of the protected Excel file without a password, close it, and read it in with another package:
library(RDCOMClient)
library(xlsx)
eApp <- COMCreate("Excel.Application")
# Find out whether you need to pass Password or WriteResPassword
wk <- eApp$Workbooks()$Open(Filename = filename, Password = "somepass", WriteResPassword = "somepass")
# Save a copy, clearing the passwords (otherwise the copy is still password-protected)
wk$SaveAs(Filename = '...somepath...', WriteResPassword = '', Password = '')
# The copied file is still open by COM, so close it
wk$Close(SaveChanges = FALSE)
# Now read into a data.frame using a familiar package, xlsx
my.data <- read.xlsx('...somepath...', sheetIndex = ...)