I am faced with a situation where we get a lot of CSV files from different clients, but there is always some issue with the column count and column lengths that our target table is expecting.
What is the best way to handle frequently changing CSV files? My goal is to load these CSV files into a Postgres database.
I checked the \COPY command in Postgres, but it does not have an option to create a table.
You could try creating a pg_dump-compatible file that has the appropriate "create table" section, and use that to load your data instead.
I recommend using an external ETL tool like CloverETL, Talend Studio, or Pentaho Kettle for data loading when you're having to massage different kinds of data.
\copy is really intended for importing well-formed data in a known structure.
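Since \copy expects well-formed input, one option is to check each file against the target table's shape before loading it. Below is a minimal Python sketch of such a pre-check, assuming the files have a header row; the function name `validate_csv` and the `expected_columns` / `max_lengths` parameters are hypothetical and would have to mirror your real table definition.

```python
import csv
import io


def validate_csv(stream, expected_columns, max_lengths):
    """Return a list of (line_number, message) problems found in a CSV stream.

    expected_columns: column names the target table expects, in order.
    max_lengths: dict mapping a column name to its maximum character length.
    """
    problems = []
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]
    if header != expected_columns:
        problems.append((1, f"header {header} does not match {expected_columns}"))
    for row in reader:
        line_no = reader.line_num
        # Column count check: a mismatched row cannot be mapped to columns.
        if len(row) != len(expected_columns):
            problems.append(
                (line_no, f"expected {len(expected_columns)} columns, got {len(row)}")
            )
            continue
        # Column length check against the limits of the target table.
        for name, value in zip(expected_columns, row):
            limit = max_lengths.get(name)
            if limit is not None and len(value) > limit:
                problems.append((line_no, f"{name} is {len(value)} chars, limit {limit}"))
    return problems


if __name__ == "__main__":
    sample = io.StringIO("id,name\n1,Al\n2,Bartholomew\n3\n")
    for line_no, message in validate_csv(sample, ["id", "name"], {"name": 5}):
        print(f"line {line_no}: {message}")
```

Files that come back with an empty problem list can go straight to \copy; the rest can be rejected or routed to manual cleanup.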
Related
I have an AS400 with an IBM DB2 database and I need to create a Format Description File (FDF) for each table in the DB. I can create the FDF file using the IBM Export tool but it will only create one file at a time which will take several days to complete. I have not found a way to create the files systematically using a tool or query. Is this possible or should this be done using scripting?
First of all, to correct a misunderstanding...
A Format Description File has nothing at all to do with the format of a Db2 table. It actually describes the format of the data in a stream file that you are uploading into the Db2 table. Sure you can turn on an option during the download from Db2 to create the FDF file, but it's still actually describing the data in the stream file you've just downloaded the data into. You can use the resulting FDF file to upload a modified version of the downloaded data or as the starting point for creating an FDF file that matches the actual data you want to upload.
Which explains why there's no built-in way to create an appropriate FDF file for every table on the system.
I question why you think you actually need to generate an FDF file for every table.
As I recall, the format of the FDF (or its newer variant, FDFX) is pretty simple; it shouldn't be all that difficult to generate if you really wanted to. But I don't have one handy at the moment, and my Google-fu has failed me.
I have a .fbk file and I want to convert it into Excel or csv file.
I used https://www.rebasedata.com/convert-fbk-to-csv-online , but I have to pay to get all data.
Does anyone know of any free software or another way to convert it?
A .fbk file is usually a Firebird gbak backup file, which means it is a binary logical backup of a Firebird database. You first need to restore it; after the restore, you can query the tables in the database and produce CSV from them using any tool that is capable of querying a Firebird database and producing CSV (e.g. FBExport, and most database query tools, for example DBeaver).
I'm not aware of tools that can directly convert .fbk to CSV files.
I am new to Talend and need guidance on the scenario below:
We have a set of 10 JSON files with different structures/schemas that need to be loaded into 10 different tables in a Redshift database.
Is there a way we can write a generic script/job that iterates through each file and loads it into the database?
For example:
File Name: abc_< date >.json
Table Name: t_abc
File Name: xyz< date >.json
Table Name: t_xyz
and so on..
Thanks in advance
With the Talend Enterprise version, one can benefit from dynamic schema. However, in my experience JSON files are usually somewhat nested structures, so you'd have to figure out how to flatten them; once that's done, it becomes a 1:1 load. With Open Studio this will not work, because dynamic schema is missing.
Basically, what you could do is write some Java code that transforms your JSON into CSV. Then either use psql from the command line or, if your Talend contains a new enough PostgreSQL JDBC driver, invoke the client-side \COPY from it to load the data. If the column order of your file matches the database table, it should work without needing to specify how many columns you have, so it's dynamic, but the data never "flows" through Talend.
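To illustrate the JSON-to-CSV step, here is a rough sketch. It is written in Python rather than Java purely for brevity, and the helper names `flatten` and `json_records_to_csv` are made up for this example. It assumes each file holds a list of JSON objects, that nested objects can be flattened by joining keys with an underscore, and that arrays are acceptable as JSON text in a single column.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with underscore-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            # Arrays have no single obvious tabular form; keep them as JSON text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_records_to_csv(records, columns, out):
    """Write records as CSV, with columns in the target table's order.

    Keys not listed in `columns` are dropped; missing keys become empty fields.
    """
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", restval="")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))


if __name__ == "__main__":
    records = [{"id": 1, "user": {"name": "ann", "tags": ["a", "b"]}}, {"id": 2}]
    buf = io.StringIO()
    json_records_to_csv(records, ["id", "user_name", "user_tags"], buf)
    print(buf.getvalue())
```

Passing `columns` in the same order as the table's columns is what lets the later load work without a per-file column mapping.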
A not very elegant but theoretically possible solution: if Redshift supports JSON (Postgres does), one can create a staging table with two columns: filename and content. Once the whole content is in this staging table, an INSERT ... SELECT statement can be written that transforms the JSON into a tabular format and inserts it into the final table.
However, with your toolset you probably have no other choice than to load these files with one job per file, and I'd suggest one dedicated job for each file. Each job would look for its own files and be triggered/scheduled individually, or be part of a bigger job that scans the folders and triggers the right job for the right file.
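The "right job for the right file" routing can be sketched outside Talend as well. The following Python snippet derives the table name from the file name using the naming scheme in the question; the exact date format is not given there, so the pattern assumes a date made of digits and hyphens, and `table_for_file` and `plan_loads` are hypothetical names.

```python
import re

# Assumed naming scheme: <prefix>[_]<date>.json, e.g. abc_20240131.json.
# The date format is a guess; adjust the pattern to the real file names.
FILE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z]+)_?(?P<date>[0-9-]+)\.json$")


def table_for_file(filename):
    """Return the target table for a file name, or None if it does not match."""
    match = FILE_PATTERN.match(filename)
    if match is None:
        return None
    return f"t_{match.group('prefix')}"


def plan_loads(filenames):
    """Group file names by target table; unmatched files go under None."""
    plan = {}
    for name in filenames:
        plan.setdefault(table_for_file(name), []).append(name)
    return plan


if __name__ == "__main__":
    print(plan_loads(["abc_20240131.json", "xyz20240131.json", "notes.txt"]))
```

A scanning job could use such a mapping to decide which dedicated load job to trigger for each incoming file.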
I'm trying to use the pgAdmin III Import tool to upload a .csv file. I don't know the column names or the number of columns beforehand, and would like them to be populated on the fly. I do know that the number of columns is consistent across rows.
In the sense of having a table dynamically created for you from the CSV, no, not with PgAdmin-III or psql.
You'll want to write a quick script for that with your preferred scripting language + its PostgreSQL driver interface, or use an ETL tool like CloverETL, Pentaho Kettle, or Talend Studio.
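As an example of such a quick script, the Python sketch below builds a CREATE TABLE statement from the first row of the CSV. It assumes the file has a header row and declares every column as text, since nothing is known about the types; `quote_ident` and `create_table_sql` are illustrative helper names, not part of any library.

```python
import csv
import io


def quote_ident(name):
    """Quote an SQL identifier by wrapping it in double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table_name, csv_stream):
    """Build a CREATE TABLE statement from the CSV header row.

    Every column is created as text because the real types are unknown.
    """
    header = next(csv.reader(csv_stream), None)
    if not header:
        raise ValueError("CSV has no header row")
    columns = ",\n".join(f"    {quote_ident(col.strip())} text" for col in header)
    return f"CREATE TABLE {quote_ident(table_name)} (\n{columns}\n);"


if __name__ == "__main__":
    sample = io.StringIO("id,full name\n1,Ann\n")
    print(create_table_sql("clients", sample))
```

After running the generated statement, the data itself could be loaded from psql with something like `\copy "clients" FROM 'clients.csv' WITH (FORMAT csv, HEADER)`, where the table and file names are placeholders.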
We are writing a testing framework from scratch using Perl. Each test case writes a log file and we are planning to archive the resulting log files created by each test case for reporting purposes.
We are using a PostgreSQL database for storing the results. How do I archive the text log files in the PostgreSQL database? I googled and found that the bytea datatype can be used to store files in binary format. If I do so, how do I retrieve it back as text?
Any ideas will be appreciated.
If your log files are text files, then you should use the TEXT datatype to store them. If the log files are binary (or, perhaps, compressed text files), then you'd want to use BYTEA. In either case, you can INSERT and SELECT them just like any other column type when using DBI. If they're really large then you might want to play with the LongReadLen DBI parameter and read the DBI manual section on BLOBs.