Using Foreign Data Wrappers in PostgreSQL (variable filename) - postgresql

I'm running PostgreSQL 9.3 and want to import some daily generated csv files into specific tables.
I started playing with FDW (Foreign Data Wrapper) and pointed to a specific csv, where I can query and append/upsert to a table.
But I have two more needs:
- The file generation date and source branch are present in the filename, and only there. I need to extract this information and insert it into the table as well.
- As expected, the file names are not fixed, so the FDW doesn't know where to get the data from.
I thought about solving this with some Unix-style tools (although my Postgres runs on Windows): for each file in a list (from a previously created index), a script would rename the file to the fixed name the FDW expects and pass the branch and date as parameters to a psql.exe command line that performs the import.
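For reference, a minimal sketch of what that per-file import could look like, assuming a foreign table named import_csv bound to the fixed filename and a target table named sales (every name, column, and value below is hypothetical):

-- import_one.sql, invoked by the wrapper script once per file, e.g.:
--   psql.exe -v branch=B01 -v filedate=2015-01-31 -f import_one.sql mydb
-- The script parses branch and date out of the filename before the call;
-- import_csv is the foreign table pointing at the fixed filename.
INSERT INTO sales (branch, file_date, col1, col2)
SELECT :'branch', :'filedate'::date, col1, col2
FROM import_csv;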
This would work, but such a script sounds a bit like a hack and not a very "elegant" solution.
Does anyone have a better suggestion?
Thanks!

Related

Can I use a sql query or script to create format description files for multiple tables in an IBM DB2 for System I database?

I have an AS/400 with an IBM DB2 database, and I need to create a Format Description File (FDF) for each table in the DB. I can create an FDF using the IBM Export tool, but it will only create one file at a time, which would take several days to complete. I have not found a way to create the files programmatically using a tool or query. Is this possible, or should this be done with scripting?
First of all, to correct a misunderstanding...
A Format Description File has nothing at all to do with the format of a Db2 table. It actually describes the format of the data in a stream file that you are uploading into the Db2 table. Sure you can turn on an option during the download from Db2 to create the FDF file, but it's still actually describing the data in the stream file you've just downloaded the data into. You can use the resulting FDF file to upload a modified version of the downloaded data or as the starting point for creating an FDF file that matches the actual data you want to upload.
Which explains why there's no built-in way to create an appropriate FDF file for every table on the system.
I question why you think you actually need to generate an FDF file for every table.
As I recall, the format of the FDF (or its newer variant, FDFX) is pretty simple; it shouldn't be all that difficult to generate if you really wanted to. But I don't have one handy at the moment, and my Google-fu has failed me.

Talend Open Studio Big Data - Iterate and load multiple files in DB

I am new to Talend and need guidance on the scenario below:
We have a set of 10 JSON files with different structures/schemas that need to be loaded into 10 different tables in a Redshift DB.
Is there a way we can write generic script/job which can iterate through each file and load it into database?
For example:
File Name: abc_<date>.json
Table Name: t_abc
File Name: xyz<date>.json
Table Name: t_xyz
and so on..
Thanks in advance
With the Talend Enterprise version you can benefit from dynamic schemas. However, in my experience JSON files are usually somewhat nested structures, so you'd have to figure out how to flatten them first; once that's done it becomes a 1:1 load. With Open Studio this will not work, though, because dynamic schemas are missing.
Basically, what you could do is write some Java code that transforms your JSON into CSV, then load the data either with psql from the command line or, if your Talend ships with a new enough PostgreSQL JDBC driver, through its client-side COPY support. If the file's column order matches the database table's, it should work without needing to specify how many columns you have, so it's dynamic, but the data never "flows" through Talend.
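A minimal sketch of that client-side load for one of the generated CSVs; t_abc and the filename are just the placeholders from the question, and the CSV column order must match the table's:

\copy t_abc FROM 'abc_<date>.csv' WITH (FORMAT csv)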
A less elegant but theoretically possible solution: if Redshift supports JSON (Postgres does), you could create a staging table with two columns, filename and content. Once the whole content is in this staging table, an INSERT ... SELECT could transform the JSON into a tabular format that can be inserted into the final table.
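A sketch of that staging approach in PostgreSQL syntax (Redshift's JSON support differs; the table and column names are hypothetical, and one JSON object per row is assumed):

-- Staging table holding the raw files, one JSON object per row.
CREATE TABLE stage_raw (
    filename text,
    content  json
);

-- Unpack the JSON into the final table once stage_raw is loaded.
INSERT INTO t_abc (id, name, amount)
SELECT (content ->> 'id')::int,
       content ->> 'name',
       (content ->> 'amount')::numeric
FROM stage_raw
WHERE filename LIKE 'abc_%';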
However, with your toolset you probably have no choice other than to load these files with one job per file, and I'd suggest one dedicated job for each file. They would each look for their own files and be triggered/scheduled individually, or be part of a bigger job that scans the folders and triggers the right job for the right file.

How to update PostgreSQL using a CSV file multiple times

I have a CSV file whose data is to be imported into a Postgres database. I did it using the import function in pgAdmin III, but the problem is that my CSV file changes frequently. How can I import the data so that it overwrites the data already in the database?
The basic idea is to wipe the table with TRUNCATE and reimport the data with COPY; as a bonus, PostgreSQL can apply a WAL-logging optimization when a table is truncated and repopulated with COPY inside the same transaction. This doesn't need to be done manually with pgAdmin each time. It can be scripted with something like:
BEGIN;
-- The CSV file is 'mydata.csv' and the table is 'mydata'.
TRUNCATE mydata;
COPY mydata FROM 'mydata.csv' WITH (FORMAT csv);
COMMIT;
Note that the server-side COPY requires superuser access to read the file. The COPY command also takes various options, so you can adjust settings such as NULL handling and headers.
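For instance, a headered file that uses "NA" for missing values could be loaded with (the option values are only examples):

COPY mydata FROM 'mydata.csv' WITH (FORMAT csv, HEADER, NULL 'NA');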
Finally, keeping both statements in the same transaction, as above, is worth it: if the COPY fails, the TRUNCATE is rolled back too, so the table is never left empty, and it is also what enables the WAL optimization mentioned earlier.
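If superuser access is a problem, a variation using psql's client-side \copy (which reads the file on the client instead of the server) can be put in a script; the file and database names here are placeholders:

-- reload_mydata.sql, run e.g. as: psql -d mydb -f reload_mydata.sql
BEGIN;
TRUNCATE mydata;
\copy mydata FROM 'mydata.csv' WITH (FORMAT csv)
COMMIT;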

Can one upload a csv file to postgres via pgadmin, without specifying column names beforehand?

I'm trying to use the pgAdmin III Import tool and want to upload a .csv file. I don't know the column names or the number of columns beforehand, and would like to have them populated on the fly. I do know that the number of columns is consistent across rows.
In the sense of having a table dynamically created for you from the CSV, no, not with PgAdmin-III or psql.
You'll want to write a quick script for that with your preferred scripting language + its PostgreSQL driver interface, or use an ETL tool like CloverETL, Pentaho Kettle, or Talend Studio.

Import to postgreSQL from csv filtering each line

I have the following question (even thorough research couldn't help me):
I want to import data from a (rather large) CSV/TXT file into a PostgreSQL DB, but I want to filter each line (before importing it) based on specific criteria.
What command/solution can I use?
On a side note: if I am not reading from a file but from a data stream, what is the relevant command/procedure?
Thank you all in advance and sorry if this has been in some answer/doc that I have missed!
Petros
To explain the staging table approach, which is what I use myself:
- Create a table (it could be a temporary table) matching your CSV structure.
- Import into that table, doing no filtering.
- Process and import your data into the real tables, using SQL to filter and transform (see the sketch below).
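A sketch of those three steps, with hypothetical table and column names and an arbitrary filter condition:

-- Step 1: a temporary table matching the CSV layout.
CREATE TEMP TABLE staging_readings (
    sensor_id int,
    reading   numeric,
    taken_at  timestamp
);

-- Step 2: bulk load everything, no filtering yet.
COPY staging_readings FROM '/data/readings.csv' WITH (FORMAT csv, HEADER);

-- Step 3: filter and transform on the way into the real table.
INSERT INTO readings (sensor_id, reading, taken_at)
SELECT sensor_id, reading, taken_at
FROM staging_readings
WHERE reading IS NOT NULL
  AND taken_at >= DATE '2014-01-01';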
Now, in PostgreSQL, you could also use file_fdw to give you direct SQL access to CSV files. In general the staging table solution will usually be cleaner, but you can also do this by essentially letting PostgreSQL treat the file as a table and going through a foreign data wrapper.
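A sketch of the file_fdw variant under the same hypothetical names (file_fdw ships as a contrib extension and must be installed first):

CREATE EXTENSION file_fdw;
CREATE SERVER csv_source FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE readings_file (
    sensor_id int,
    reading   numeric,
    taken_at  timestamp
) SERVER csv_source
  OPTIONS (filename '/data/readings.csv', format 'csv', header 'true');

-- The file can now be queried, filtered, and joined like any table.
INSERT INTO readings (sensor_id, reading, taken_at)
SELECT sensor_id, reading, taken_at
FROM readings_file
WHERE reading IS NOT NULL;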