Code reuse with tSQLt / SQL Test - T-SQL

I am just starting to use tSQLt inside Redgate's SQL Test. I have to deal with quite large tables (a large number of columns) in a legacy database. What is the best practice for inserting some fake data into such tables? The 'Script As' insert statements are quite large, so they would make the 'arrange' part of my unit tests literally unreadable. Can I factor out such code? Also, is there a way to not only script the insert statement but also fill in some values automagically? Thanks.

I would agree with your comment that you don't need to fill out all of the columns in your insert statement.
tSQLt.FakeTable removes all non-null constraints from the columns, as well as computed columns and identity columns (although these last two can be reinstated using particular parameters to FakeTable).
Therefore, you only need to populate the columns which are relevant to your code under test, which is usually only a small subset of the columns in the table(s).
I wrote about this in a bit more detail in this article, which also contains a few other 'gotchas' you may want to know about.
Additionally, if you have a number of tests which all need the same table faked and the same data inserted, I'd suggest using a SetUp routine. This is a stored procedure in the test class (schema) called SetUp, which tSQLt runs before each test in that schema. SetUp procedures don't show in Red Gate's SQL Test window as yet (I've suggested it as an improvement), but they will still work. This can make the arrangement slightly harder to see, but it modularises the code and removes identical, repeated blocks from the tests.
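A minimal sketch of what that can look like, assuming a test class called CustomerTests and a wide legacy table dbo.Customer (all names here are made up):
EXEC tSQLt.NewTestClass 'CustomerTests';
GO
CREATE PROCEDURE CustomerTests.SetUp
AS
BEGIN
    -- FakeTable removes the constraints, so only the relevant columns need values
    EXEC tSQLt.FakeTable 'dbo.Customer';

    -- Shared 'arrange' data used by every test in this class
    INSERT INTO dbo.Customer (CustomerId, CustomerName)
    VALUES (1, 'Test customer');
END;
GO
Each test in the CustomerTests schema then starts from this faked, pre-populated table and only adds whatever extra rows it specifically needs.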

Related

T-SQL - Insert set into multiple constrained tables on linked server

So, I have a data set on server "A" and two tables ("t1", "t2") on linked server "B".
I cannot modify server "B" schema.
What I am trying to achieve is: insert the first two (non-identity) columns of the data set into [B].[t1], which should generate an identity value, and then insert the second two columns, also non-identity, into [B].[t2], whose MsgId column should map to the corresponding row in [B].[t1].
A visual version of how it works
It seems that I cannot use OUTPUT or SCOPE_IDENTITY() against a linked server (both return nothing).
My question is: is there a way to add data from server A to server B without modifying the latter?
My current solution involves cursors and creating a stored procedure on the remote server, but I feel like I am doing it completely wrong and reinventing the wheel in the process. Also, modifying the database structure is a painful process involving lots of paperwork.
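For reference, a rough sketch of that current approach (all names are placeholders; the remote procedure wraps the insert into [t1] and returns the generated identity through an output parameter, which requires 'RPC Out' to be enabled for the linked server):
DECLARE @Col1 int, @Col2 int, @Col3 int, @Col4 int, @NewMsgId int;

DECLARE rows_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT Col1, Col2, Col3, Col4 FROM dbo.SourceDataSet;

OPEN rows_cur;
FETCH NEXT FROM rows_cur INTO @Col1, @Col2, @Col3, @Col4;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Remote procedure inserts into [t1] and hands back the new identity value
    EXEC [B].[RemoteDb].[dbo].[usp_InsertT1] @Col1, @Col2, @NewMsgId OUTPUT;

    INSERT INTO [B].[RemoteDb].[dbo].[t2] (MsgId, Col3, Col4)
    VALUES (@NewMsgId, @Col3, @Col4);

    FETCH NEXT FROM rows_cur INTO @Col1, @Col2, @Col3, @Col4;
END
CLOSE rows_cur;
DEALLOCATE rows_cur;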
What you want to do is ETL (extract, transform, load), and using a tool like SSIS helps you do that in a much cleaner and more maintainable way.

Codefluent tablediff.sql bloat

I have been using CodeFluent for some time now and I am really, really happy with it. However, the tablediff that is generated seems to contain unnecessary changes every time. Does anyone know if these changes can be turned off?
All [nvarchar] (max) columns are set to [nvarchar] (max) every time, even if they are already [nvarchar] (max).
All defaults seem to be set every time, even if they are already there. In our diff there are almost 100 _trackLastWriteTime and _trackCreationTime.
Constraints are dropped and added
Why is this a problem?
Since the tablediff is so large, it is hard to see and verify the 'real' changes.
Executing the tablediff takes much longer, especially on a large database.
The dropping of constraints reverses changes we made to primary keys and indexes, sometimes even resulting in creating only a heap.
The goal of the generated diff script is to get to the right database schema after building the model during the development phase. It can contain redundant statements as long as the final database schema is correct.
To upgrade the database in a production environment you should use the SQL Server Pivot Script producer. The Pivot Script producer generates an XML file (or a zip file) that contains the list of tables, stored procedures, functions, etc. of the database. The Pivot Runner reads this file and creates or upgrades the database to the state described in the XML file. Because it generates and executes SQL statements directly (without generating a script file), the Pivot Runner performs fewer operations than the diff script.
If needed, you can have full control on what the pivot runner does: https://stackoverflow.com/a/36426389/2996339
This diagram shows the development and deployment workflow:

SSIS or TSQL for SQL/MySQL table comparison

I am new to SSIS and am after some assistance in creating an SSIS package to do a specific task. My data is stored remotely in a MySQL database and is downloaded to a SQL Server 2014 database. What I want to do is create a package where I can enter two dates that are compared against the created/modified date on each record in a number of tables, giving me a snapshot that compares the MySQL data to the SQL Server data so I can see whether any rows are missing from my local SQL Server database or need to be updated. Some tables have no dates, so for those I just want a record count of anything missing between the two. If this is better achieved through T-SQL I am happy to hear other suggestions, or about sites where similar things have been done.
In relation to your query, Tab:
"Hi Tab, What happens at the moment is our master data is stored in a MySQL Database, the data was then downloaded to a SQL Server Database as a one off. What happens at the moment is I have a SSIS package that uses the MAX ID which can be found on most of the tables to work out which records are new and just downloads them or updates them. What I want to do is run separate checks on the tables to make sure that during the download nothing has been missed and everything is within sync. In an ideal world I would like to pass in to a SSIS package or tsql stored procedure a date range, shall we say calender week, this would then check for any differences between the remote MySQL database tables and the local SQL tables. It does not currently have to do anything but identify issues, correcting them may come later or changes would need to be made to the existing sync package. Hope his makes more sense."
Thanks P
To do this, you need to implement a Type 1 Slowly Changing Dimension type data flow in SSIS. There are a number of ways to do this, including a built in transformation aptly called the Slowly Changing Dimension transformation. Whilst this is easy to set up, it is a pain to maintain and it runs horrendously slowly.
There are numerous ways to set this up using other transformations or even SQL merge statements which are detailed here: https://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/
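For reference, a bare-bones Type 1 merge of staged MySQL rows into the local table could look something like this (table and column names are invented for illustration):
MERGE dbo.LocalCustomer AS target
USING dbo.StagingCustomer AS source
    ON target.CustomerId = source.CustomerId
WHEN MATCHED AND (target.CustomerName <> source.CustomerName
               OR target.ModifiedDate <> source.ModifiedDate) THEN
    UPDATE SET target.CustomerName = source.CustomerName,
               target.ModifiedDate = source.ModifiedDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, CustomerName, ModifiedDate)
    VALUES (source.CustomerId, source.CustomerName, source.ModifiedDate);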
I would recommend that you use Lookup transformations: they perform better than the Slowly Changing Dimension transformation, while offering better diagnostics and error handling than the better-performing SQL MERGE statement.
Before you do this you will need to add a Checksum or Hashbytes column to your SQL data for ease of comparison with the incoming MySQL data.
In short, calculate some sort of repeatable checksum as the data is downloaded into your SQL Server, then use this in an SSIS Lookup, matching on the row key, to check for changes. Where the checksum value is different for the same row it needs updating and where there is no matching row key in your SQL Data you need to insert the new row.
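As a rough sketch of that idea (again with invented names), a persisted hash of the comparable columns on the local table lets you find missing or changed rows with a single join against the staged MySQL data:
-- Repeatable hash of the columns you want to compare
ALTER TABLE dbo.LocalCustomer
    ADD RowHash AS HASHBYTES('SHA2_256',
        CONCAT(CustomerName, '|', CONVERT(varchar(30), ModifiedDate, 126))) PERSISTED;

-- Rows missing locally, or whose hash differs from the staged MySQL row
SELECT s.CustomerId,
       CASE WHEN l.CustomerId IS NULL THEN 'missing' ELSE 'changed' END AS Issue
FROM dbo.StagingCustomer AS s
LEFT JOIN dbo.LocalCustomer AS l
       ON l.CustomerId = s.CustomerId
WHERE l.CustomerId IS NULL
   OR l.RowHash <> HASHBYTES('SHA2_256',
        CONCAT(s.CustomerName, '|', CONVERT(varchar(30), s.ModifiedDate, 126)));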

SSIS for table-to-table inserts vs. (SQL only) INSERT INTO () SELECT FROM approach

I am currently transferring a large amount of records from one table to another, summarizing in the process. So, I have a SQL in this general format:
INSERT INTO TargetTable
(Col1,
Col2,
...
ColX,
Tot
)
SELECT
Col1,
Col2,
...
ColX,
SUM(Tot)
FROM
SourceTable
GROUP BY
Col1,
Col2,
...
ColX
Is there any performance advantage of moving this SQL into an SSIS task when transferring records from one table to another using a SQL SELECT as a source? For example, is logging turned off?
Secondary question: Are there any tactics that I could use to ensure a maximum transfer rate? For example, removing indexes from the Target table before inserting, locking the table, etc?
In my experience (and, bear in mind that it's been a year and change since I've done this), the only advantage you'd get from SSIS is its ability to make use of the bulk insert task. This adds an additional step, requiring you to export your source data to a flat file before you begin the import process.
Alternatively, if you stick with a SQL statement, the section in this article titled Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging provides the following suggestions:
You can use INSERT INTO SELECT FROM to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
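Put together, a minimally logged version of the statement from the question might look like this (the column list is cut down to Col1/Col2/Tot for illustration; it assumes the database is in simple or bulk-logged recovery and that TargetTable is a heap):
INSERT INTO dbo.TargetTable WITH (TABLOCK)
    (Col1, Col2, Tot)
SELECT
    Col1,
    Col2,
    SUM(Tot)
FROM dbo.SourceTable
GROUP BY
    Col1,
    Col2;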
I personally dislike SSIS packages for a particular reason: I have never had a DBA who was dedicated to maintaining them. The data import projects I worked on required a lot of fiddling, as the source data wasn't clean (which I assume won't be a problem for you), so I had many packages that worked just fine in a testing environment with a limited data sample that crashed immediately when deployed into production, which made the process a pain in the neck to deal with.
This is just my opinion, but I would say that unless you or someone else you work with focuses on SSIS packages as a part of database maintenance, then it's easier to maintain and document a process that lives inside a stored procedure.
Set the recovery model to simple. Set the log size high enough to handle the insert. Are others on the system? A TABLOCK hint will help the insert: TargetTable WITH (TABLOCK). If you have a clustered index on TargetTable, order the data that way in the SELECT. If you can accept dirty reads, use SourceTable WITH (NOLOCK). If you are inserting more than 100,000 records you might want to break up the insert using a WHERE clause.
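A rough sketch of that batching idea, splitting on ranges of the leading GROUP BY column so each group stays within a single batch (assumes Col1 is an integer; all names are illustrative):
DECLARE @BatchSize int = 100000,
        @From int, @MaxCol1 int;

SELECT @From = MIN(Col1), @MaxCol1 = MAX(Col1)
FROM dbo.SourceTable WITH (NOLOCK);

WHILE @From <= @MaxCol1
BEGIN
    INSERT INTO dbo.TargetTable WITH (TABLOCK) (Col1, Col2, Tot)
    SELECT Col1, Col2, SUM(Tot)
    FROM dbo.SourceTable WITH (NOLOCK)
    WHERE Col1 >= @From AND Col1 < @From + @BatchSize
    GROUP BY Col1, Col2;

    SET @From = @From + @BatchSize;
END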

Most efficient way of bulk loading unnormalized dataset into PostgreSQL?

I have loaded a huge CSV dataset -- Eclipse's Filtered Usage Data -- using PostgreSQL's COPY, and it's taking a huge amount of space because it's not normalized: three of the TEXT columns would be much more efficiently refactored into separate tables, referenced from the main table with foreign key columns.
My question is: is it faster to refactor the database after loading all the data, or to create the intended tables with all the constraints, and then load the data? The former involves repeatedly scanning a huge table (close to 10^9 rows), while the latter would involve doing multiple queries per CSV row (e.g. has this action type been seen before? If not, add it to the actions table, get its ID, create a row in the main table with the correct action ID, etc.).
Right now each refactoring step is taking roughly a day or so, and the initial loading also takes about the same time.
From my experience, you want to get all the data you care about into a staging table in the database and go from there; after that, do as much set-based logic as you can, most likely via stored procedures. When you load into the staging table, don't have any indexes on the table. Create the indexes after the data is loaded.
Check this link out for some tips http://www.postgresql.org/docs/9.0/interactive/populate.html
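As an illustration of the set-based route (all table and column names here are invented), the 'has this action type been seen before?' step collapses into a couple of statements once the raw rows are in a staging table:
-- Build the lookup table from the distinct values already sitting in staging
CREATE TABLE actions (
    action_id   serial PRIMARY KEY,
    action_name text NOT NULL UNIQUE
);

INSERT INTO actions (action_name)
SELECT DISTINCT action_name
FROM staging_events;

-- Populate the normalized main table in one pass, joining back to the lookup
CREATE TABLE events AS
SELECT s.event_time,
       a.action_id,
       s.other_data
FROM staging_events AS s
JOIN actions AS a ON a.action_name = s.action_name;

-- Indexes and foreign keys go on last, after the bulk load
CREATE INDEX events_action_id_idx ON events (action_id);
ALTER TABLE events
    ADD CONSTRAINT events_action_id_fkey
    FOREIGN KEY (action_id) REFERENCES actions (action_id);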