Low performance of postgres_fdw extension - postgresql

I need to periodically copy data from the TMP database to the remote PROD database, with some data modifications in columns.
When I use the postgres_fdw extension from the PROD database (with the foreign schema mapped), copying a million records takes 6 minutes:
insert into prod.foreign_schema.foreign_table
(select * from tmp.public.table limit 1000000);
However, when I use dblink to copy the same table from the PROD side (the SQL runs on the PROD database, not on TMP), the process takes 20 seconds:
insert into prod.public.table
(select * from dblink('host=192.1... port=5432 dbname=... user=… password=…. connect_timeout=2', 'select * from tmp.production.table limit 1000000') as tab (id integer…..)
);
How can I optimize and shorten the process of copying data from the TMP database?
I have to run the SQL commands on the TMP database.
TMP and PROD are on the same version (PostgreSQL 10).

The first statement effectively runs many small inserts, one per row, albeit with a prepared statement, so you don't pay the planning overhead each time. That means many more round trips between the two servers, which is most likely the reason for the difference.
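If the statement really has to be launched from the TMP side, one workaround is to let TMP instruct PROD to run the fast dblink-based insert itself, so the bulk transfer still happens in a single remote statement. A rough sketch, assuming dblink is installed on both servers; the connection strings and the table names (prod_target, tmp_source) are placeholders:
-- Run on TMP: open a connection to PROD and have PROD execute the dblink-based INSERT.
SELECT dblink_connect('prod_conn', 'host=prod-host port=5432 dbname=prod user=... password=...');
SELECT dblink_exec('prod_conn',
  $$INSERT INTO public.prod_target
    SELECT * FROM dblink('host=tmp-host port=5432 dbname=tmp user=... password=...',
                         'SELECT * FROM public.tmp_source LIMIT 1000000')
         AS tab (id integer /* ... remaining columns ... */)$$);
SELECT dblink_disconnect('prod_conn');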

Related

How to load bulk data to table from table using query as quick as possible? (postgresql)

I have a large table (postgre_a) that has 0.1 billion records with 100 columns. I want to duplicate this data into the same table.
I tried to do this with SQL:
INSERT INTO postgre_a SELECT i1 + 100000000, i2, ... FROM postgre_a;
However, this query has been running for more than 10 hours now, so I want to do it faster. I tried to do it with COPY, but I cannot find a way to use COPY FROM with a query.
Is there any other method that can do this faster?
You cannot directly use a query in COPY FROM, but maybe you can use COPY FROM PROGRAM with a query to do what you want:
COPY postgre_a
FROM PROGRAM '/usr/pgsql-10/bin/psql -d test'
' -c ''copy (SELECT i1+ 100000000, i2, ... FROM postgre_a) TO STDOUT''';
(Of course you have to replace the path to psql and the database name with your values.)
I am not sure if that is faster than using INSERT, but it is worth a try.
You should definitely drop all indexes and constraints before the operation and recreate them afterwards.
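As a minimal sketch of that drop/recreate pattern (the index and constraint names here are hypothetical; use the ones from your schema):
-- Drop constraints and indexes before the bulk load, recreate them afterwards.
ALTER TABLE postgre_a DROP CONSTRAINT postgre_a_pkey;
DROP INDEX IF EXISTS postgre_a_i2_idx;
-- ... run the COPY / INSERT here ...
ALTER TABLE postgre_a ADD PRIMARY KEY (i1);
CREATE INDEX postgre_a_i2_idx ON postgre_a (i2);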

SQL Server 2012 Express Edition Database Size

We have a requirement in our project to store millions of records (~100 million) in the database.
We know that SQL Server 2012 Express Edition can accommodate a maximum of 10 GB of data.
I am using this query to get the actual size of the database - is this right?
use [Bio Lambda8R32S50X]
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'Bio Lambda8R32S50X'
GO
SET NOCOUNT ON
DBCC UPDATEUSAGE(0)
-- Table row counts and sizes.
CREATE TABLE #t
(
[name] NVARCHAR(128),
[rows] CHAR(11),
reserved VARCHAR(18),
data VARCHAR(18),
index_size VARCHAR(18),
unused VARCHAR(18)
)
INSERT #t EXEC sp_msForEachTable 'EXEC sp_spaceused ''?'''
SELECT *
FROM #t
-- # of rows.
SELECT SUM(CAST([rows] AS int)) AS [rows]
FROM #t
DROP TABLE #t
The second question: does this restriction apply only to the size of the primary filegroup, or does it include the log files as well?
If we do a lot of deletes and inserts, or delete and then insert back the same number of records, does the database size vary or remain the same?
This is crucial, since it will decide whether we can go ahead with SQL Server 2012 Express Edition or not.
Thanks and regards,
Subasish
I can see that the first query gets the overall size of the database for the data and the logs, and the second one reports the size of each table, so I would say yes to both.
Based on my experience with databases over 40 GB, and on this link about maximum DB size limits, the limit in SQL Server Express applies to the mdf and ndf files, not the ldf.
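If you only want the figure that counts against the 10 GB limit, you can restrict your first query to the data files; a small variation, assuming the same database name:
-- Size of the data files only (type_desc = 'ROWS'); the log is excluded because
-- the Express limit applies to data files, not to the log.
SELECT SUM(size * 8 / 1024) AS DataSizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'Bio Lambda8R32S50X'
  AND type_desc = 'ROWS';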
You might be safer, however, to go with SQL Server Standard and use CAL licensing in case your database keeps growing.
Good Luck!

PostgreSQL, ODBC and temp table

Could you tell me why this query works in pgAdmin but doesn't work with software using ODBC:
CREATE TEMP TABLE temp296 WITH (OIDS) ON COMMIT DROP AS
SELECT age_group AS a, male AS m, mode AS t, AVG(speed) AS avg_speed
FROM person JOIN info ON person.ppid=info.ppid
WHERE info.mode=2
GROUP BY age_group,male,mode;
SELECT age_group,male,mode,
CASE
WHEN age_group=1 AND male=0 THEN (info_dist_km/(SELECT avg_speed FROM temp296 WHERE a=1 AND m=0))*60
ELSE 0
END AS info_durn_min
FROM person JOIN info ON person.ppid=info.ppid
WHERE info.mode IN (7) AND info.info_dist_km>2;
I got "42P01: ERROR: relation "temp296" does not exist".
I have also tried it with "BEGIN; [...] COMMIT;" - "HY010: The cursor is open".
PostgreSQL 9.0.10, compiled by Visual C++ build 1500, 64-bit
psqlODBC 09.01.0200
Windows 7 x64
I think the reason it did not work for you is that, by default, ODBC works in autocommit mode. If you executed your statements serially, the very first statement
CREATE TEMP TABLE temp296 ON COMMIT DROP ... ;
must have autocommitted after finishing, and thus dropped your temp table.
Unfortunately, ODBC does not support directly using statements like BEGIN TRANSACTION; ... COMMIT; to handle transactions.
Instead, you can disable auto-commit using SQLSetConnectAttr function like this:
SQLSetConnectAttr(hdbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF, 0);
But, after you do that, you must remember to commit any change by using SQLEndTran like this:
SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);
While the WITH approach has worked for you as a workaround, it is worth noting that using transactions appropriately is faster than running in autocommit mode.
For example, if you need to insert many rows into a table (thousands or millions), using transactions can be hundreds or thousands of times faster than autocommit.
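For reference, the WITH workaround amounts to folding the temp table into a CTE, so everything runs as a single statement and autocommit has nothing to drop in between; a rough sketch, reusing the column names from the question:
WITH temp296 AS (
    SELECT age_group AS a, male AS m, mode AS t, AVG(speed) AS avg_speed
    FROM person JOIN info ON person.ppid = info.ppid
    WHERE info.mode = 2
    GROUP BY age_group, male, mode
)
SELECT age_group, male, mode,
       CASE
           WHEN age_group = 1 AND male = 0
               THEN (info_dist_km / (SELECT avg_speed FROM temp296 WHERE a = 1 AND m = 0)) * 60
           ELSE 0
       END AS info_durn_min
FROM person JOIN info ON person.ppid = info.ppid
WHERE info.mode IN (7) AND info.info_dist_km > 2;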
It is not uncommon for temporary tables not to be available via SQLPrepare/SQLExecute in ODBC (i.e., on prepared statements); MS SQL Server behaves this way, for example. The solution is usually to use SQLExecDirect.

Using T-SQL, is it possible to insert records into a table without locking it?

I have a process that inserts millions of records into a table.
While it's being executed, no other process can access that table; they have to wait a few minutes, which is unacceptable for web apps.
So, is there something like BULK INSERT that can be used between two SQL tables?
Thanks!
So, is there something like BULK INSERT that can be used between two
sql tables ?
Yes, there is... it's called BULK INSERT, but it's a two-step process to go from table to table:
Save the data locally:
execute xp_cmdshell 'bcp Northwind.dbo.Orders out c:\temp\Orders.txt -Sgalser01 -T -n'
Then bulk insert the saved file:
select * into Northwind.dbo.Orders2 from Northwind.dbo.Orders where 1=2
bulk insert Northwind.dbo.Orders2 from 'c:\temp\Orders.txt'
with (DATAFILETYPE = 'native')

Creating a connection from Microsoft SQL server to an AS/400

I'm trying to connect from Microsoft SQL Server to an AS/400 so I can pull data from the AS/400 and then flag the data as having been pulled.
I've successfully created an OLE DB "IBMDASQL" connection and am able to pull some data, but I'm running into an issue when I try to pull data from a very large table.
This runs fine, and returns a count of 170 million:
select count(*)
from transactions
This query executed for 15 hours before I gave up on it. (It should return zero, since I haven't flagged anything as 'In process' yet.)
select count(*)
from transactions
where processed = 'In process'
I'm a Microsoft guy, but my AS/400 guy says that there is an index on the 'processed' column and that, locally, the query runs instantaneously.
Any thoughts on what I might be doing wrong? I found a table with only 68 records in it and was able to run this query in about a second:
select count(*)
from smallTable
where RandomColumn = 'randomValue'
So I know that the AS/400 is at least able to understand that type of query.
I have had to fight this battle many times.
There are two ways of approaching this.
1) Stage your data from the AS400 into SQL Server, where you can optimize your indexes.
2) Ask the AS400 folks to create logical views that speed up data retrieval. Your AS400 programmer is correct that an index will help; I forget the term they use for something similar to a SQL Server view, but I believe it is "physical" vs. "logical" files. A logical is what you want.
Thirdly, 170 million is a lot of records, even for a relational database like SQL Server. Have you considered running an SSIS package nightly that stages your data into your own SQL table, to see if it improves performance?
I would suggest the following approach for good performance. I assume you have at least SQL Server 2005; I haven't tested this yet, but here is a tip:
Let the AS400 perform the select natively by creating stored procedures on the AS400.
Open an AS400 session and launch STRSQL.
Create an AS400 stored procedure like this to get the recordset:
CREATE PROCEDURE MYSELECT (IN PARAM CHAR(10))
LANGUAGE SQL
DYNAMIC RESULT SETS 1
BEGIN
DECLARE C1 CURSOR FOR SELECT * FROM MYLIB.MYFILE WHERE MYFIELD=PARAM;
OPEN C1;
RETURN;
END
Then create an AS400 stored procedure to update the recordset:
CREATE PROCEDURE MYUPDATE (IN PARAM CHAR(10))
LANGUAGE SQL
RESULT SETS 0
BEGIN
UPDATE MYLIB.MYFILE SET MYFIELD='newvalue' WHERE MYFIELD=PARAM;
END
Call those AS400 stored procedures from SQL Server:
declare @myParam char(10)
set @myParam = 'In process'
-- get the recordset
EXEC ('CALL NAME_AS400.MYLIB.MYSELECT(?) ', @myParam) AT AS400 -- AS400 = name of the linked server
-- update
EXEC ('CALL NAME_AS400.MYLIB.MYUPDATE(?) ', @myParam) AT AS400
Hope it helps
I recommend following the suggestions in the IBM Redbook SQL Performance Diagnosis on IBM DB2 Universal Database for iSeries to determine what's really happening.
IBM technical support can also be extremely helpful in diagnosing issues like these. Don't be afraid to get in touch with them, as software support is generally included as part of the maintenance contract and there is no charge to talk to them.
I've seen OLE DB connections eat up 100% CPU for hours, while the same query run through Visual Explain (query analyzer) is estimated to take mere seconds to execute.
We found that running the query like this performed as expected:
SELECT *
FROM OpenQuery( LinkedServer,
'select count(*)
from transactions
where processed = ''In process''')
GO
Could this be a collation problem? Your WHERE clause tests a text field, and if the collations of the two servers don't match, this clause will be applied client-side rather than server-side, so you are first pulling all 170 million records down to the client and then applying the WHERE clause there.
Based on past interactions I have had, the query should take about the same amount of time no matter how you access the data. Another thought would be to create a view on the table to get the data you need, or to use a stored procedure.