SAS SQL Pass Through - sql-server-2008-r2

I would like to know what gets executed first in SAS SQL pass-through in this code:
Connect To OLEDB As MYDB ( %DBConnect( Catalog = MYDB ) ) ;
Create table MYDB_extract as
  select put(Parent, $ABC.) as PARENT,
         put(PFX, z2.) as PFX,
         *
  From Connection To MYDB
  ( SELECT Appointment, Parents, Children, Cats, Dogs
    FROM MYDB.dbo.FlatRecord
    WHERE Appointment between '20150801' and '20150831'
      And Children > 2 );
Disconnect from MYDB;
Since MS SQL Server doesn't support the PUT function, will this query cause ALL of the records to be processed locally, or only the resultant records from the DBMS?

The explicit pass-through query will still execute on the server and will return to SAS whatever it returns (however many records that is). Then SAS will perform the PUT operations on the returned rows.
So if 10,000 rows are in the table and 500 rows meet the WHERE criteria, 500 records come back to SAS and are then formatted with PUT; SQL Server handles the 10,000 -> 500 filtering.
If you had written this as implicit pass-through, then it's possible (if not probable) that SAS would have done all of the work.
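For contrast, here is a minimal sketch of an implicit pass-through version using a LIBNAME connection (the connection string and library options are hypothetical; columns and formats are taken from the question). Because PUT has no DBMS equivalent, SAS decides how much of the WHERE clause it can pass to SQL Server and does the formatting, and possibly more, itself:
/* Hypothetical LIBNAME connection - adjust provider/connection options to your site */
libname mydb oledb
        init_string="Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MYDB;Integrated Security=SSPI"
        schema=dbo;
proc sql;
  create table mydb_extract as
  select put(Parent, $ABC.) as PARENT,
         put(PFX, z2.)      as PFX,
         t.*
  from mydb.FlatRecord as t
  /* SAS may or may not pass this WHERE clause down to SQL Server */
  where Appointment between '20150801' and '20150831'
    and Children > 2;
quit;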

First the code in the inline view will be executed on the server:
SELECT Appointment,Parents,Children,Cats,Dogs
FROM MYDB.dbo.FlatRecord
WHERE Appointment between '20150801' and '20150831' And Children > 2
Rows that meet that WHERE clause will be returned by the DBMS to SAS over the OLEDB connection.
Then SAS will (try to) select from that result set, applying any other code, including the PUT functions.
This isn't really any different from how an inline view works in any other DBMS, except that here you have two different database engines, one running the inner query and SAS running the outer query.

Related

How to detect whether a PostgreSQL function utilizes an index on the tables or not

I have created a PL/pgSQL table-returning function that executes a SELECT statement and uses the input parameter in the WHERE clause of the query.
I frame the statement dynamically and execute it like this: EXECUTE sqlStmt USING empID;
sqlStmt is a variable of data type text that has the SELECT query which joins 3 tables.
When I execute and analyze that query in pgAdmin, I can see that index scans on the tables are used as expected. However, when I do EXPLAIN ANALYZE SELECT * from fn_getDetails(12), the output just says "Function Scan".
How do I know whether the table indexes are used? Other SO answers suggesting the auto_explain module did not show the statements inside the function body, and I am unable to use PREPARE inside my function body.
The execution time of the direct SELECT statement is almost the same as that of the function, within a couple of milliseconds, but how can I tell whether the index was used?
auto_explain will certainly provide the requested information.
Set the following parameters:
shared_preload_libraries = 'auto_explain' # requires a restart
auto_explain.log_min_duration = 0 # log all statements
auto_explain.log_nested_statements = on # log statements in functions too
The last parameter is required for tracking SQL statements inside functions.
To activate the module, you need to restart the database.
Of course, testing whether the index is used in a query on a small table won't give you a reliable result. You need about as much test data as you expect to have in reality.
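If a restart is not immediately possible, the module can also be loaded into a single superuser session with LOAD; a minimal sketch, using the function call from the question:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;         -- log every statement
SET auto_explain.log_nested_statements = on;   -- include statements run inside functions
-- The plans for the statements executed inside the function,
-- including any index scans, now appear in the server log.
SELECT * FROM fn_getDetails(12);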

SELECT vs CREATE TABLE AS SELECT execution time

My function should return a TABLE which is created by lots of joins and is relatively "big".
If inside my function I put return query select <complex query goes here>; then it takes ages (more like 10-15 minutes) to run.
However, if instead of returning a TABLE I return VOID and simply create a table within the function body, it finishes in under 1 minute.
The same goes for running this "complex query" as select <complex query goes here> versus create table <table name> as select <complex query goes here> followed by select * from <table_name>.
Why is there such a difference in execution time?
P.S. The select clause of the query has around 35 columns with some logic inside.
P.P.S. The query returns only about 90K rows, so I doubt the time is spent sending the data over the network.
Answer
select differs from create table as select in where the data ends up: the former sends the data to the client, while the latter saves it to disk on the server side.
Why
Possible reasons are a slow link and a "feature" of the client. Given that a local psql running \copy (select * from ...) to 'local_file' took 3 seconds while pgAdmin took ages to display the same data, I assume your version of pgAdmin (or perhaps any version) is not meant to display that amount of data (the 36 MB you mention). So it was not the link, but the client.
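One way to separate the server-side execution time from transfer and client rendering is to run the query from psql in forms that either discard the rows or write them to a file; a rough sketch, where <complex query goes here> stands for the query from the question:
-- Server-side work only: the query is executed, the result set is discarded
explain (analyze, buffers) select <complex query goes here>;
\timing on
-- Server-side materialisation, comparable to the CREATE TABLE AS variant
create temporary table tmp_result as select <complex query goes here>;
-- Includes the transfer to the client, but writes to a file instead of a GUI grid
\copy (select <complex query goes here>) to 'local_file'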

SQL Server 2008 R2, "string or binary data would be truncated" error

In SQL Server 2008 R2, I am trying to insert 30 million records from a source table into a target table. Out of these 30 million records, a few have bad data that exceeds the length of the target field. Because of this bad data, the whole insert is aborted with a "string or binary data would be truncated" error, without loading any rows into the target table, and SQL Server does not specify which row caused the problem. Is there a way to insert the rest of the rows and catch the bad-data rows without a big impact on performance (performance is the main concern in this case)?
You can use the len function in your where condition to filter out long values:
select ...
from ...
where len(yourcolumn) <= 42
gives you the "good" records
select ...
from ...
where len(yourcolumn) > 42
gives you the "bad" records. You can use such where conditions in an insert select syntax as well.
You can also truncate your string as well, like:
select
left(col, 42) col
from yourtable
In the examples I assumed that 42 is your character limit.
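Putting that together, a minimal sketch of loading only the rows that fit and parking the offending ones for review (target_table, source_table and bad_rows are placeholder names, and 42 again stands for the real column limit):
-- Load only the rows that fit the target column
insert into target_table (yourcolumn /*, other columns */)
select yourcolumn /*, other columns */
from source_table
where len(yourcolumn) <= 42;

-- Keep the offending rows for inspection; SELECT ... INTO creates the table
select *
into bad_rows
from source_table
where len(yourcolumn) > 42;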
You didn't mention how you are inserting the data, i.e. BULK INSERT or SSIS.
In this situation I prefer SSIS, which gives you control and a way to solve your issue: you can insert the proper data as @Lajos suggests, and redirect the bad rows to a temporary table for review.
You can build this flow with transformations and error handling; see the links below for more.
https://www.simple-talk.com/sql/reporting-services/using-sql-server-integration-services-to-bulk-load-data/
https://www.mssqltips.com/sqlservertip/2149/capturing-and-logging-data-load-errors-for-an-ssis-package/
http://www.techbrothersit.com/2013/07/ssis-how-to-redirect-invalid-rows-from.html

Postgresql update 2 tables in one update statement

I have two different tables with the same field, e.g.
host.enabled and service.enabled.
How can I update both from one UPDATE statement?
I tried:
UPDATE hosts AS h, services AS s SET h.enabled=0, s.enabled=0
WHERE
ctid IN (
SELECT hst.ctid FROM hosts hst LEFT JOIN services srv ON hst.host_id = srv.host_id
WHERE hst.instance_id=1
);
In MySQL syntax, this query worked like this:
UPDATE hosts LEFT JOIN services ON hosts.host_id=services.host_id SET hosts.enabled=0, services.enabled=0 WHERE hosts.instance_id=1
I didn't really understand your schema. If you could set up a fiddle, that would be great.
In general though to update two tables in a single query the options are:
Trigger
This makes the assumption that you always want to update both together.
Stored procedure/function
So you'll be doing it as multiple queries in the function, but it can be triggered by a single SQL statement from the client.
Data-modifying CTE (a Postgres extension)
Postgres permits common table expressions (WITH clauses) to contain data-modifying statements.
with changed_hosts as (
    update hosts set enabled = true
    where host_id = 2
    returning *
)
update services set enabled = true
where host_id in (select host_id from changed_hosts);
In this case the update in the WITH statement runs and sets the flag on the hosts table, then the main update runs, which updates the records in the services table.
SQL Fiddle for this at http://sqlfiddle.com/#!15/fa4d3/1
In general though, it's probably easiest and most readable just to do two updates wrapped in a transaction, for example:
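A minimal sketch of that, assuming enabled is boolean and using the instance_id/host_id columns from the question:
begin;

update hosts
set enabled = false
where instance_id = 1;

update services
set enabled = false
where host_id in (select host_id from hosts where instance_id = 1);

commit;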

Faster way to transfer table data from linked server

After much fiddling, I've managed to install the right ODBC driver and have successfully created a linked server on SQL Server 2008, by which I can access my PostgreSQL db from SQL Server.
I'm copying all of the data from some of the tables in the PgSQL DB into SQL Server using merge statements that take the following form:
with mbRemote as
(
select
*
from
openquery(someLinkedDb,'select * from someTable')
)
merge into someTable mbLocal
using mbRemote on mbLocal.id=mbRemote.id
when matched
/*edit*/
/*clause below really speeds things up when many rows are unchanged*/
/*can you think of anything else?*/
and not (mbLocal.field1=mbRemote.field1
and mbLocal.field2=mbRemote.field2
and mbLocal.field3=mbRemote.field3
and mbLocal.field4=mbRemote.field4)
/*end edit*/
then
update
set
mbLocal.field1=mbRemote.field1,
mbLocal.field2=mbRemote.field2,
mbLocal.field3=mbRemote.field3,
mbLocal.field4=mbRemote.field4
when not matched then
insert
(
id,
field1,
field2,
field3,
field4
)
values
(
mbRemote.id,
mbRemote.field1,
mbRemote.field2,
mbRemote.field3,
mbRemote.field4
)
WHEN NOT MATCHED BY SOURCE then delete;
After this statement completes, the local (SQL Server) copy is fully in sync with the remote (PgSQL server).
A few questions about this approach:
is it sane?
it strikes me that the update will be run over all fields in local rows that haven't necessarily changed; the only prerequisite is that the local and remote id fields match. Is there a more fine-grained approach, i.e. a way of constraining the merge statement to only update rows that have actually changed?
That looks like a reasonable method if you're not able or wanting to use a tool like SSIS.
You could add in a check on the when matched line to check if changes have occurred, something like:
when matched and mbLocal.field1 <> mbRemote.field1 then
This may be unwieldy if you have more than a couple of columns to check, so you could add a check column (like LastUpdatedDate, for example) to make this easier.
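Spelled out for the four fields in your merge statement, and with explicit handling for nullable columns (a plain <> comparison misses changes where one side is NULL), the matched-row check might look like this fragment of the merge above:
when matched and (
       mbLocal.field1 <> mbRemote.field1
    or mbLocal.field2 <> mbRemote.field2
    or mbLocal.field3 <> mbRemote.field3
    or mbLocal.field4 <> mbRemote.field4
    /* repeat the two lines below for each nullable column */
    or (mbLocal.field1 is null and mbRemote.field1 is not null)
    or (mbLocal.field1 is not null and mbRemote.field1 is null)
) then
    update set
        mbLocal.field1 = mbRemote.field1,
        mbLocal.field2 = mbRemote.field2,
        mbLocal.field3 = mbRemote.field3,
        mbLocal.field4 = mbRemote.field4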