Perl DBD::SQLite: is there an equivalent to the .import function?

In many of my scripts I use SQLite for reporting, and I first need to load my big table data (millions of CSV rows). In the past I have found that .import was quicker than inserting line by line (even using transactions).
Nowadays my scripts implement a method that does a system call to sqlite3 db '.import ....'. I wonder if it is possible to call .import from DBD::SQLite, or whether it would be better to keep calling .import via system().
PS: The reason for wanting to call .import from inside DBD::SQLite is to remove the sqlite3 dependency when my software is installed elsewhere.

.import is not part of the SQLite library but a command of the sqlite3 command-line shell, so you won't find a DBI method for it, let alone one that is independent of the database driver; while any given database engine almost certainly has equivalent functionality, each implements it differently (e.g. SQLite's .import vs. MySQL's LOAD DATA INFILE, etc.).
If you're looking for true engine independence, you'll need to import your data with INSERT queries, which in the simplest case can be relied upon to work more or less equivalently everywhere. However, if the difference in execution time is significant enough, it may be worth your while to write an engine-agnostic interface to the import functionality: a wrapper around each engine's specific import command, with the choice of wrapper made at runtime based on the currently active database driver (or some other signal, depending on your code).
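DBD::SQLite has no counterpart to the shell's .import command, but most of the speed gap can usually be closed by preparing the INSERT once and wrapping the whole load in a single transaction. A minimal sketch of that portable route - the file, table and column names are placeholders, and Text::CSV is assumed to be available for parsing:

use strict;
use warnings;
use DBI;
use Text::CSV;

my $dbh = DBI->connect('dbi:SQLite:dbname=foo.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

my $csv = Text::CSV->new({ binary => 1 });
open my $fh, '<', 'file.csv' or die "file.csv: $!";

$dbh->begin_work;    # one transaction for the whole load: one fsync instead of one per row
my $sth = $dbh->prepare('INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)');
while (my $row = $csv->getline($fh)) {
    $sth->execute(@$row);
}
$dbh->commit;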

If you are not opposed to "shelling out":
perl -e 'system(qq(sqlite3 foo.db ".import file.dat table")) == 0 or die "sqlite3 .import failed: $?"'

Related

How to improve import speed on SQL Workbench/J

I tried the command below, but it imports terribly slowly, at 3 rows/sec:
WbImport -file=c:/temp/_Cco_.txt
-table=myschema.table1
-filecolumns=warehouse_id,bin_id,cluster_name
---deleteTarget
-batchSize=10000
-commitBatch
WbImport can use the COPY API of the Postgres JDBC driver.
To use it, run:
WbImport -file=c:/temp/_Cco_.txt
-usePgCopy
-table=myschema.table1
-filecolumns=warehouse_id,bin_id,cluster_name
The options -batchSize and -commitBatch are ignored in that case, so you should remove them.
SQL Workbench/J will then essentially use the equivalent of a COPY ... FROM STDIN. That should be massively faster than regular INSERT statements.
This requires that the input file is formatted according to the requirements of the COPY command.
WbImport uses INSERT to load data. This is the worst way to load data into Redshift.
You should be using the COPY command for this as noted in the Redshift documentation:
"We strongly recommend using the COPY command to load large amounts of data. Using individual INSERT statements to populate a table might be prohibitively slow."

Importing many columns from a CSV into Postgres

I am trying to use pgAdmin to import a big table with more than 100 columns. Is there any way to import the table without first creating all 100 columns by hand in pgAdmin? That would be a considerably time-consuming task.
You are not importing data into pgAdmin, you are importing it into Postgres, and using pgAdmin to help you in that task. Graphical tools like pgAdmin are, at heart, just convenience wrappers around the actual functionality of the database, and everything they do can be done in other ways.
In the case of a simple task like creating a table, the relevant SQL syntax is well worth learning. It will work in any database tool, even (with some minor changes) on other SQL databases (e.g. MySQL), can be saved in version control, and manipulated with an editor of your choice.
You could even go so far as to write a script in the language of your choice that generates the SQL for you based on some other data (e.g. the headings of the CSV file) - although make sure you don't run that with third-party data without checking the result or taking extreme care with code injection and other security concerns!
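As an illustration (not a hardened tool), here is a small Perl sketch that reads the header line of a CSV file and prints a CREATE TABLE statement with one text column per heading; the file and table names are placeholders, and the generated SQL should be reviewed before it is run:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 });
open my $fh, '<', 'data.csv' or die "data.csv: $!";

# The first line of the file is assumed to hold the column headings
my $headings = $csv->getline($fh) or die "could not read a header line from data.csv";

# Double-quote each identifier and default every column to text; adjust the types by hand later
my @cols = map { (my $c = $_) =~ s/"/""/g; qq("$c" text) } @$headings;

print "CREATE TABLE my_import (\n    ",
      join(",\n    ", @cols),
      "\n);\n";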
The Postgres manual has an introduction to tables and creating them which would be a good place to start.

Fixed-width data into Postgres

I'm looking for a good way to load fixed-width data into Postgres tables. I normally do this in SAS and Python, not Postgres, and I guess there is no native method. The files are a few GB. The one approach I have seen, which loads each line as one large column and then parses it into tables, does not work on my file for some reason (possibly memory issues). I can use psycopg2, but because of memory issues I would rather not. Any ideas or tools that work? Does pgloader work well, or are there native methods?
http://www.postgresonline.com/journal/index.php?/archives/157-Import-fixed-width-data-into-PostgreSQL-with-just-PSQL.html
Thanks
There's no convenient built-in method to ingest fixed-width tabular data in PostgreSQL. I suggest using a tool like Pentaho Kettle or Talend Studio to do the data-loading, as they're good at consuming many different file formats. I don't remember if pg_bulkload supports fixed-width, but suspect not.
Alternately, you can generally write a simple script with something like Python and the psycopg2 module, loading the fixed-width data row by row and sending that to PostgreSQL. psycopg2's support for the COPY command via copy_from makes this vastly more efficient. I didn't find a convenient fixed-width file reader for Python in a quick search but I'm sure they're out there. You can use whatever language you like anyway - Perl's DBI and DBD::Pg do just as well, and there are millions of fixed-width file reader modules for Perl.
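If you do take the Perl route, a minimal sketch of that approach with DBD::Pg might look like the following; the connection details, column widths, and table layout are invented for the example:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password', { RaiseError => 1 });

# Start a COPY; the default text format is tab-separated with \N for NULL
$dbh->do('COPY mytable (col1, col2, col3) FROM STDIN');

open my $fh, '<', 'fixed_width.dat' or die "fixed_width.dat: $!";
while (my $line = <$fh>) {
    chomp $line;
    # unpack slices each record by column width (10, 6 and 8 characters here);
    # the 'A' format trims trailing spaces from each field
    my @fields = unpack 'A10 A6 A8', $line;
    $dbh->pg_putcopydata(join("\t", @fields) . "\n");
}
$dbh->pg_putcopyend;    # finish the COPY and let the server report any errors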
The Python Pandas library has a function pandas.read_fwf which works great.
The data can be read in with Python and then written to the Postgres database.

Can COPY FROM tolerantly consume bad CSV?

I am trying to load text data into a postgresql database via COPY FROM. Data is definitely not clean CSV.
The input data isn't always consistent: sometimes there are excess fields (separator is part of a field's content) or there are nulls instead of 0's in integer fields.
The result is that PostgreSQL throws an error and stops loading.
Currently I am trying to massage the data into consistency with Perl.
Is there a better strategy?
Can PostgreSQL be asked to be as tolerant as MySQL or SQLite in that respect?
Thanks
PostgreSQL's COPY FROM isn't designed to handle bodgy data and is quite strict. There's little support for tolerance of dodgy data.
I thought there was little interest in adding any until I saw this proposed patch posted just a few days ago for possible inclusion in PostgreSQL 9.3. The patch has been resoundingly rejected, but shows that there's some interest in the idea; read the thread.
It's sometimes possible to COPY FROM into a staging TEMPORARY table that has all text fields and no constraints, then massage the data using SQL from there. That'll only work if the input is at least well-formed and regular, though, and it doesn't sound like yours is.
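If your input does turn out to be regular enough for that, here is a rough sketch of the staging-table route driven from Perl with DBD::Pg; the table layout, file name, and the empty-to-0 fix-up are assumptions for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password', { RaiseError => 1 });

# Staging table: every column is text and unconstrained, so COPY accepts odd values
$dbh->do('CREATE TEMPORARY TABLE staging (id text, qty text, label text)');
$dbh->do('COPY staging FROM STDIN WITH (FORMAT csv)');

open my $fh, '<', 'raw.csv' or die "raw.csv: $!";
$dbh->pg_putcopydata($_) while <$fh>;
$dbh->pg_putcopyend;

# Massage in SQL: missing integer values become 0, then cast into the real table
$dbh->do(q{
    INSERT INTO real_table (id, qty, label)
    SELECT id::int, COALESCE(NULLIF(qty, ''), '0')::int, label
    FROM staging
});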
If the data isn't clean, you need to pre-process it with a script in a suitable scripting language.
Have that script:
Connect to PostgreSQL and INSERT rows;
Connect to PostgreSQL and use the scripting language's Pg APIs to COPY rows in; or
Write out clean CSV that you can COPY FROM
Python's csv module can be handy for this, as is Perl's Text::CSV. You can use any language you like; Perl, Python, PHP, Java, C, whatever. (A Perl sketch of the clean-CSV option follows below.)
If you were enthusiastic you could write it in PL/Perlu or PL/Pythonu, inserting the data as you read it and clean it up. I wouldn't bother.
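For the "write out clean CSV" option, here is a minimal Perl sketch using Text::CSV. The expected column count, the choice of which column is an integer, and the file names are all assumptions, and rows that Text::CSV cannot parse at all would still need separate handling:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
open my $in,  '<', 'dirty.csv' or die "dirty.csv: $!";
open my $out, '>', 'clean.csv' or die "clean.csv: $!";

my $expected = 5;    # number of columns in the target table (assumption)

while (my $row = $csv->getline($in)) {
    # Fold any excess fields (stray separators inside a value) back into the last column
    if (@$row > $expected) {
        my @extra = splice @$row, $expected - 1;
        $row->[$expected - 1] = join ',', @extra;
    }
    # Replace missing values in an integer column (column 2 here) with 0
    $row->[1] = 0 if !defined $row->[1] || $row->[1] eq '';
    $csv->print($out, $row);
}
close $out or die "clean.csv: $!";

The cleaned file can then be loaded with COPY ... FROM ... WITH (FORMAT csv).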

PL/Python or Psycopg2

I have a PostgreSQL 9.1 server and I want to write some functions in Python.
There are two ways: plpy or psycopg2. For me, writing functions in plpy is like a nightmare, with a lot of "prepare" and "execute" methods... psycopg2 is more comfortable to use, but I care about efficiency.
Is it correct to use psycopg2 on the server?
They are two different things. plpy (PL/Python) is a stored procedure language, like PL/pgSQL, and psycopg2 is a client library which allows Python code to access a PostgreSQL database. With PL/Python, your function lives inside the database: you still connect with a client, but you call the function from SQL. With psycopg2, you write Python code that connects to the database and runs queries from outside. Unless you have a specific requirement to write a Python function that runs within PostgreSQL itself, use psycopg2.
For me writing functions in plpy is like nightmare
You've already made your mind up. Even if you use pl/python, any bugs in your code will be attributed to the language or its environment. After a while you will be able to abandon your pl/python code and re-implement it the way you always wanted to*.
The beauty of it is that, as Jim says above, the two interfaces do completely different things - one runs inside the database within a single transaction, the other outside as normal client code. There's almost no overlap between the use cases for the two. Somehow, though, you've missed that even while making up your mind about which is "good" and which is "bad".
*We all have our programming prejudices, but I'm not sure there's much point in opening a question with a bold statement of them.