Import to PostgreSQL from CSV, filtering each line

I have the following question (even thorough research couldn't help me):
I want to import data from a (rather large) CSV/TXT file into a PostgreSQL DB, but I want to filter each line (before importing it) based on specific criteria.
What command/solution can I use?
As a side note: if I am reading not from a file but from a data stream, what is the relevant command/procedure?
Thank you all in advance and sorry if this has been in some answer/doc that I have missed!
Petros

To explain the staging table approach, which is what I use myself:
Create a table (it could be a temporary table) matching your CSV structure.
Import into that table, doing no filtering.
Process and import your data into the real tables, using SQL to filter and process.
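The three steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in so it runs without a server, and the table names (staging, payments), the columns, and the "amount > 0" filter are all hypothetical. Against PostgreSQL, the load in step 2 would normally be a COPY into the staging table rather than row-by-row inserts.

```python
import csv
import io
import sqlite3

# Hypothetical CSV input; in practice this would be the large file on disk.
CSV_TEXT = "id,name,amount\n1,alice,10\n2,bob,-5\n3,carol,42\n"


def load_via_staging(conn, csv_stream):
    cur = conn.cursor()
    # Step 1: staging table mirroring the CSV columns (plain text, no constraints).
    cur.execute("CREATE TEMP TABLE staging (id TEXT, name TEXT, amount TEXT)")
    # The real target table (hypothetical schema).
    cur.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)"
    )

    # Step 2: import every row into staging, doing no filtering.
    reader = csv.reader(csv_stream)
    next(reader)  # skip the header row
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", reader)

    # Step 3: filter and convert in SQL while moving rows into the real table.
    cur.execute(
        "INSERT INTO payments (id, name, amount) "
        "SELECT CAST(id AS INTEGER), name, CAST(amount AS INTEGER) "
        "FROM staging WHERE CAST(amount AS INTEGER) > 0"
    )
    conn.commit()
    return cur.execute(
        "SELECT id, name, amount FROM payments ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = load_via_staging(conn, io.StringIO(CSV_TEXT))
print(rows)
```

Because the filter lives in the WHERE clause of step 3, changing the criteria means editing one SQL statement, and the raw rows stay in the staging table for inspection.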
Now, in PostgreSQL, you could also use file_fdw to get direct SQL access to CSV files. The staging table solution will usually be cleaner, but file_fdw lets PostgreSQL treat the file as a table by going through a foreign data wrapper, so you can filter it with an ordinary query.

Related

Multiple table import from html page

I am a beginner with databases. I decided on Postgres because I have learnt that it is usable with Python.
I have reports that I receive in the form of HTML files. Each file yields multiple (100+) tables when parsed with the Pandas data frame function; there is no unique ID common to all tables, and each table has unique columns.
Is it possible to import all tables, and merge them as a single table with ALL the columns in it, and have each report be a single entry in this new table with a PostgreSQL built-in feature, or do I have to develop a data pipeline using python and add them in manually?
I hope my question is clear enough.
Thank you.

How to update PostgreSQL from a CSV file multiple times

I have a CSV file whose data is to be imported into a Postgres database. I did it using the import function in pgAdmin III, but the problem is that my CSV file changes frequently. How can I import the data from the CSV file so that it overwrites the data already in the database?
You can save on WAL logging through an optimization that applies when TRUNCATE and COPY run in the same transaction (it takes effect when wal_level is set to minimal). The basic idea is to wipe the database table with TRUNCATE and reimport the data with COPY. This doesn't need to be done manually with pgAdmin each time. It can be scripted with something like:
BEGIN;
-- The CSV file is 'mydata.csv' and the table is 'mydata'.
TRUNCATE mydata;
COPY mydata FROM 'mydata.csv' WITH (FORMAT csv);
COMMIT;
Note that COPY from a file on the server requires superuser access to work (or, in newer PostgreSQL versions, membership in the pg_read_server_files role), and the file must be readable by the server process. The COPY command also takes various options, so you can adjust settings for nulls, headers, etc.
Finally, it should be noted that you ideally want both statements to be in the same transaction, which is what the BEGIN/COMMIT wrapper above does. I'm not going to complicate this example any further, as more care than this isn't needed in many of the real-world cases where one is copying in a CSV file. If you think your situation needs it, it's not too hard to track down.

Import Data to cassandra and create the Primary Key

I've got some CSV data to import into Cassandra. This could work with the COPY command. The problem is that the CSV doesn't provide a unique ID for the data, so I need to create a timeuuid on import.
Is it possible to do this via the COPY command, or do I need to write an external script for importing?
I would write a quick script to do it; the COPY command can really only handle small amounts of data anyway. Try the new Python driver. I find it quite fast to set up loading scripts with it, especially if you need any sort of minor modification of the data before it is loaded.
If you have a really big set of data, bulk-loading is still the way to go.
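As a rough sketch of such a quick script, the snippet below prepends a time-based UUID to every CSV row. It relies on Cassandra's timeuuid being a version 1 UUID, which Python's standard uuid.uuid1() produces. The function name add_timeuuid and the sample rows are hypothetical, and the actual insert through the driver is left out so the example stays self-contained.

```python
import csv
import io
import uuid


def add_timeuuid(in_stream, out_stream):
    """Copy CSV rows, prepending a version 1 (time-based) UUID to each row."""
    reader = csv.reader(in_stream)
    writer = csv.writer(out_stream, lineterminator="\n")
    count = 0
    for row in reader:
        # uuid1() is time-based, which is the UUID variant a timeuuid column expects.
        writer.writerow([str(uuid.uuid1())] + row)
        count += 1
    return count


# Hypothetical sample data standing in for the real CSV file.
source = io.StringIO("sensor-a,21.5\nsensor-b,19.0\n")
target = io.StringIO()
n = add_timeuuid(source, target)
print(target.getvalue())
```

The rewritten rows could then be loaded with the COPY command for small files, or passed to the driver row by row in the same loop instead of being written back out.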

Filemaker Pro Advanced - Scripting import from ODBC with variable target table

I have several tables I'm importing from ODBC using the import script step. Currently, I have an import script for each and every table. This is becoming unwieldy as I now have nearly 200 different tables.
I know I can calculate the SQL statement to say something like "Select * from " & $TableName. However, I can't figure out how to set the target table without specifying it in the script. Please, tell me I'm being dense and there is a good way to do this!
Thanks in advance for your assistance,
Nicole Willson
Integrated Research
Unfortunately, the target table of an import has to be hard coded in FileMaker up through version 12 if you're using the Import Records script step. I can think of a workaround to this, but it's rather convoluted and if you're importing a large number of records, would probably significantly increase the time to import them.
The workaround would be to not use the Import Records script step, but to script the creation of records and the population of data into fields yourself.
First of all, the success of this would depend on how you're using ODBC. As far as I can think, it would only work if you're using ODBC to create shadow tables within FileMaker so that FileMaker can access the ODBC database via other script steps. I'm not an expert with the other ODBC facilities of FileMaker, so I don't know if this workaround would be helpful in other cases.
So, if you have a shadow table into the remote ODBC database, then you can use a script something like the following. The basic idea is to have two sets of layouts, one for the shadow tables that information is coming from and another for the FileMaker tables that the information needs to go to. Loop through this list, pulling information from the shadow table into variables (or something like the dictionary library I wrote which you can find at https://github.com/chivalry/filemaker-dictionary). Then go to the layout linked to the target table, create a record and populate the fields.
This isn't a novice technique, however. In addition to using variables and loops, you're also going to have to use FileMaker's design functions to determine the source and destination of each field and Set Field By Name to put the data in the right place. But as far as I can tell, it's the only way to dynamically target tables for importing data.

Migrating a schema from one database to other

As part of a requirement, I need to migrate a schema from an existing database to a new schema in a different database. Part of it is already done, and now I need to compare the two schemas and change the new schema to close any gaps I find.
I am not using a tool and was trying to get the details from the syscat catalog, but without much success.
Any pointers on the best way to solve this?
Regards,
Ramakant
A tool really is the best way to solve this – IBM Data Studio is free and can compare schemas between databases.
Assuming you are using DB2 for Linux/UNIX/Windows, you can do a rudimentary compare by looking at selected columns in SYSCAT.TABLES and SYSCAT.COLUMNS (for table definitions), and SYSCAT.INDEXES (for indexes). Exporting this data to files and using diff may be the easiest method. However, doing this for more complex structures (tables with range or database partitioning, foreign keys, etc) will become very complex very quickly as this information is spread across a lot of different system catalog tables.
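The export-and-diff idea can be sketched as below, using Python's standard difflib instead of the diff command. This is only an illustration under assumptions: the rows are hypothetical stand-ins for data exported from SYSCAT.COLUMNS (TABNAME, COLNAME, TYPENAME, LENGTH), and the helper name diff_catalog is made up for this example.

```python
import difflib


def diff_catalog(old_rows, new_rows):
    """Return unified-diff lines between two exports of column definitions."""
    # Sort both sides so the comparison does not depend on export order.
    old = sorted(",".join(row) for row in old_rows)
    new = sorted(",".join(row) for row in new_rows)
    return list(
        difflib.unified_diff(old, new, "old_schema", "new_schema", lineterm="")
    )


# Hypothetical exported rows: (TABNAME, COLNAME, TYPENAME, LENGTH)
old_schema = [
    ("ORDERS", "ID", "INTEGER", "4"),
    ("ORDERS", "NOTE", "VARCHAR", "100"),
]
new_schema = [
    ("ORDERS", "ID", "INTEGER", "4"),
    ("ORDERS", "NOTE", "VARCHAR", "50"),
]

# Keep only the added/removed lines, dropping the file headers and context.
changes = [
    line
    for line in diff_catalog(old_schema, new_schema)
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]
print("\n".join(changes))
```

Sorting before diffing matters here because the order in which catalog rows are exported is not guaranteed to match between the two databases.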
An alternative method would be to extract DDL using the db2look utility. However, you can't specify the order that db2look outputs objects (db2look extracts DDL based on the objects' CREATE_TIME), so you can't extract DDL for an entire schema into a file and expect to use diff to compare. You would need to extract DDL into a separate file for each table.
Use SchemaCrawler for IBM DB2, a free open-source tool that produces text output designed to be diffed. You can get very detailed information about your schema, including view and stored procedure definitions. All of the information that you need will be output in a single file, which can be compared very easily using a standard diff tool.
Sualeh Fatehi, SchemaCrawler
Unfortunately, company policy does not allow me to use these tools at this point in time, so I am writing a program using JDBC to get the details and do the comparison.