Most straightforward way to add a row to an SQL Server table in ADO.NET without hardcoded SQL? - ado.net

I am wondering what the best / most efficient / common way is to add a row to an SQL Server table using C# and ADO.NET. I know, of course, that I could just build an SQL statement for that, but first, the destination table schema might vary, so I want to keep this flexible, and second, there are so many columns that I do not want to code and maintain this manually. So I currently use a SqlCommandBuilder that automatically creates the proper insert statement for me, together with a SqlDataAdapter, like this:
var dataAdapter = new SqlDataAdapter("select * from sometable", _databaseConnection);
new SqlCommandBuilder(dataAdapter);
var dataTable = new DataTable();
dataAdapter.Fill(dataTable);
// ... add row to dataTable, fill fields from some external file that
// ... includes column names as well,
//.... add some more field values not from the file, etc. ...
dataAdapter.Update(dataTable);
This seems pretty inefficient, though: it first grabs all the records from the table even though I do not need them for anything (especially considering that there might already be a million records in there). Using a select statement like select * from sometable where 1=2 would work, but it does not seem like a very clean approach. I imagine there is some different solution for this that I am just not aware of.
Thanks,
Timo

I think the best way to insert rows is by using Stored Procedures through the ADO.NET command object.
If you are inserting massive amounts of data and are using SQL Server 2008, you can pass DataTable objects to a stored procedure by using a User-Defined Table Type.
In SQL:
CREATE TYPE SAMPLE_TABLE_TYPE AS TABLE
(
    field1 VARCHAR(255),
    field2 VARCHAR(255)
);
GO

CREATE PROCEDURE insert_data
    @data SAMPLE_TABLE_TYPE READONLY
AS
BEGIN
    INSERT INTO table1 (field1, field2)
    SELECT field1, field2 FROM @data;
END
In .NET:
DataTable myTable = new DataTable();
myTable.Columns.Add(new DataColumn("field1", typeof(string)));
myTable.Columns.Add(new DataColumn("field2", typeof(string)));

SqlCommand command = new SqlCommand("insert_data", conn);
command.CommandType = CommandType.StoredProcedure;
// Table-valued parameters are passed as SqlDbType.Structured.
SqlParameter parameter = command.Parameters.AddWithValue("@data", myTable);
parameter.SqlDbType = SqlDbType.Structured;
command.ExecuteNonQuery();
If your data also contains updates, you can use the MERGE statement introduced in SQL Server 2008 to efficiently perform both inserts and updates in the same procedure.
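For illustration, a rough sketch of such an upsert procedure (the procedure name is made up; it assumes the SAMPLE_TABLE_TYPE above and that field1 identifies a row):
CREATE PROCEDURE upsert_data
    @data SAMPLE_TABLE_TYPE READONLY
AS
BEGIN
    -- MERGE updates existing rows and inserts new ones in a single statement.
    MERGE table1 AS target
    USING @data AS source
        ON target.field1 = source.field1
    WHEN MATCHED THEN
        UPDATE SET field2 = source.field2
    WHEN NOT MATCHED THEN
        INSERT (field1, field2)
        VALUES (source.field1, source.field2);
END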
However, if creating User-Defined Table Types and stored procedures is too much work and you need a completely dynamic solution, I would stick with what you have, with the recommendation of appending
WHERE 1 = 0
to your SQL text.

You can also use a "SELECT TOP(0) * FROM SOMETABLE;" query.

Related

Handling the output of jsonb_populate_record

I'm a real beginner when it comes to SQL and I'm currently trying to build a database using postgres. I have a lot of data I want to put into my database in JSON files, but I have trouble converting it into tables. The JSON is nested and contains many variables, but the behavior of jsonb_populate_record allows me to ignore the structure I don't want to deal with right now. So far I have:
CREATE TABLE raw (records JSONB);
COPY raw from 'home/myuser/mydocuments/mydata/data.txt';
create type jsonb_type as (time text, id numeric);
create table test as (
select jsonb_populate_record(null::jsonb_type, raw.records) from raw);
When running the select statement only (without the create table), the data looks great in the GUI I use (DBeaver). However, it does not seem to be an actual table, as I cannot run select statements like
select time from test;
or similar. The column in my table 'test' is also called 'jsonb_populate_record(jsonb_type)' in the GUI, so something seems to be going wrong there. I do not know how to fix it. I've read about people using lateral joins with json_populate_record, but due to my limited SQL knowledge I can't understand or replicate what they are doing.
jsonb_populate_record() returns a single column (which is a record).
If you want to get multiple columns, you need to expand the record:
create table test
as
select (jsonb_populate_record(null::jsonb_type, raw.records)).*
from raw;
A "record" is a a data type (that's why you need create type to create one) but one that can contain multiple fields. So if you have a column in a table (or a result) that column in turn contains the fields of that record type. The * then expands the fields in that record.

t-sql stored procedure add where clause outside of SP

I have a stored procedure that I cannot modify, but I need to add a WHERE clause to filter its result even more. What would be the best way to do this without inserting the data from the stored procedure into a temp table and then doing a WHERE on that temp table? Is there another way?
The stored procedure is executable, but the filtering happens inside its SELECT statement, so you have to capture the result in a table in order to filter it further.
There is no way around this except temp tables.
A table-valued UDF also returns a table, just like a temp table.
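A minimal sketch of the temp table approach (the procedure name, columns, and filter are made up for illustration):
-- The column list of #results must match the procedure's result set.
CREATE TABLE #results (id INT, name VARCHAR(100));

INSERT INTO #results (id, name)
EXEC dbo.SomeProcedure;

-- Apply the additional filter here.
SELECT id, name
FROM #results
WHERE name LIKE 'A%';

DROP TABLE #results;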

PostgreSQL bulk insert with ActiveRecord

I have a lot of records that originally came from MySQL. I massaged the data so it will be inserted successfully into PostgreSQL using ActiveRecord. I can easily do this with insertion on a row basis, i.e. one row at a time, but this is very slow. I want to do a bulk insert, but this fails if any of the rows contains invalid data. Is there any way I can achieve a bulk insert with only the invalid rows failing instead of the whole bulk?
COPY
When using SQL COPY for bulk insert (or its equivalent \copy in the psql client), failure is not an option. COPY cannot skip illegal lines. You have to match your input format to the table you import to.
If data itself (not decorators) is violating your table definition, there are ways to make this a lot more tolerant though. For instance: create a temporary staging table with all columns of type text. COPY to it, then fix offending rows with SQL commands before converting to the actual data type and inserting into the actual target table.
Consider this related answer:
How to bulk insert only new rows in PostgreSQL
Or this more advanced case:
"ERROR: extra data after last expected column" when using PostgreSQL COPY
If NULL values are offending, remove the NOT NULL constraint from your target table temporarily. Fix the rows after COPY, then reinstate the constraint. Or take the route with the staging table, if you cannot afford to soften your rules temporarily.
Sample code:
ALTER TABLE tbl ALTER COLUMN col DROP NOT NULL;
COPY ...
-- repair, like ..
-- UPDATE tbl SET col = 0 WHERE col IS NULL;
ALTER TABLE tbl ALTER COLUMN col SET NOT NULL;
Or you just fix the source table. COPY tells you the number of the offending line. Use an editor of your preference and fix it, then retry. I like to use vim for that.
INSERT
For an INSERT (as mentioned in the comments), the check for NULL values is trivial:
To skip a row with a NULL value:
INSERT INTO tbl (col1, ...
SELECT col1, ...
WHERE col1 IS NOT NULL
To insert something else instead of a NULL value (an empty string in my example):
INSERT INTO tbl (col1, ...
SELECT COALESCE(col1, ''), ...
A common work-around for this is to import the data into a TEMPORARY or UNLOGGED table with no constraints and, where data in the input is sufficiently bogus, text typed columns.
You can then do INSERT INTO ... SELECT queries against the data to populate the real table with a big query that cleans up the data during import. You can use a lot of CASE statements for this. The idea is to transform the data in one pass.
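A rough sketch of that approach (table names, columns, the cleanup rules, and the file path are all hypothetical):
-- Staging table: everything as text, no constraints.
CREATE UNLOGGED TABLE staging (id text, amount text, created_at text);

COPY staging FROM '/path/to/data.csv' WITH (FORMAT csv);

-- One pass that cleans the data while populating the real table.
INSERT INTO target_table (id, amount, created_at)
SELECT id::int,
       CASE WHEN amount ~ '^[0-9]+(\.[0-9]+)?$' THEN amount::numeric ELSE 0 END,
       NULLIF(created_at, '')::timestamp
FROM   staging
WHERE  id ~ '^[0-9]+$';   -- skip rows with a missing or non-numeric id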
You might be able to do many of the fixes in Ruby as you read the data in, then push the data to PostgreSQL using COPY ... FROM STDIN. This is possible with Ruby's Pg gem, see eg https://bitbucket.org/ged/ruby-pg/src/tip/sample/copyfrom.rb .
For more complicated cases, look at Pentaho Kettle or Talend Studio ETL tools.

JPA - insert into select - copy large amount of records

I would like to copy records with different key values. What is the best way to do so?
In plain sql I would do:
insert into tableX (x1,x2,x3,x4,x5) select 2,T1.x2,T1.x3,T1.x4,T1.x5 from tableX T1
(x1 is my primary key).
I tried writing this SQL inside the entity's @NamedQuery, but I got an org.eclipse.persistence.exceptions.JPQLException, and after searching for a way to write it I understand that this SQL cannot be written inside a NamedQuery. Is that correct?
I also tried looping through the object list representing tableX, and for every object I did em.find() or created a new object and then inserted it with em.persist(), but that seems inefficient (when using find I do a select for each object, so with a list of 2000 records it doesn't make sense to issue 2000 selects and then insert with the new key value).
So my question is: what is the best way to implement copying all the records?
Also, if I get an exception or something goes wrong, I would like to roll back so that I won't end up with only part of the records in my database table.
Thanks in advance.
You can use any SQL in JPA through a native query. SQL would be best for this type of insert.
If you need to do anything in Java on the data before inserting it, then you would query the objects, then insert them. Enable batch writing to improve efficiency.
http://java-persistence-performance.blogspot.com/2013/05/batch-writing-and-dynamic-vs.html

Creating Stored Procedures that can work with different tables

I need to use the same stored procedures against many tables in my DB, all with the same structure. This is data loaded from customers, with one table per customer, and the data needs calculations/checks run before it's loaded into our data warehouse.
So far these are the options and issues I've found and I'm looking for a better pattern/approach.
1. Create a view that points to the table I want to process; the SPs then talk to that view. This works well (especially once I'd worked out how to create views 'automagically' based on their columns), but the view can only be used with one table at a time, forcing the system to deal with one customer at a time.
2. Use dynamic SQL within each SP - makes the SPs much harder to read/debug and for those reasons has been ruled out.
3. Create a partitioned view across all the tables and then use a parameterised table function to return just the data we're interested in - ah, but then I can't update the data, as the function returns a table that can only be used for select.
4. Use dynamic SQL inside a function (can't be done) to create a view (which also can't be done) ... give up.
5. Within the SP, create a temp table over the target table using dynamic SQL - but then the temp table only exists in the session that runs the dynamic SQL, not the 'parent' session that's running the SP ... give up.
6. Create a global temp table using dynamic SQL to avoid the scope issue of 5, then run the SP against the global temp table. Still runs into the single-customer issue.
7. Create the view as in 1 within a transaction, then run all the SPs and commit - works fine for one user, but any others are now blocked trying to create a new view of the same name.
8. Use a temporary view ... can't in T-SQL.
9. Move all the code into .Net - but we have environment issues where T-SQL is much easier to host/run.
I know I'm not the only person who has this problem. Have any of you good people solved it? Please help.
Maybe your approach is wrong. I will go into details in a while, but it seems that your problem can be solved using SSIS.
-- Updated answer:
First, the big picture:
The most affordable way to process the tables dynamically is to use a script instead of a stored procedure. If the target table is chosen at runtime anyway, you will not benefit from the main performance advantage of stored procedures, i.e. cached execution plans. A SQL script can easily be pointed at a different table at runtime by using placeholders and replacing them before executing.
The script can be loaded from the filesystem, a variable, a text column in a table, etc. The loading process consists of reading the script content into a string variable. This step occurs once.
The next step is the preparation stage. This step is executed for each table to be processed. Its main job is to replace the table placeholder with the name of the table currently being processed. It is also possible to set parameter values here, like any parameter you may need to pass into the sp you already wrote.
The last step is the execution of the script. As it is already loaded into a variable and the placeholder has been replaced with the current table name, you can safely call an ExecuteSQLTask with the sql variable as the input. This, of course, happens for each table you want to process.
Ok. Now let's see this in action.
This is a sample database model:
CREATE TABLE [dbo].[t_n](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [varchar](50) NOT NULL,
[start] [datetime] NULL,
CONSTRAINT [PK_t_n] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
where t_n represents any table (t_1, t_2, t_3, etc).
This is your current stored procedure:
CREATE PROCEDURE SpProcessT_n
AS
BEGIN
    SET NOCOUNT ON;
    SELECT * FROM [t_1];
END
GO
Now, transform this stored procedure into a SQL script, placing a placeholder where the table name was:
SET NOCOUNT ON;
SELECT * FROM [$table_name];
I choose to save this in a .sql file in the filesystem to keep the POC as simple as possible.
Next, create an SSIS package like this:
These are the settings I chose to set up the loop:
And this is how you assign the table name to a variable called, appropriately, table_name:
This is the setup of the script task; here you can see that the variable table_name has read-only access, while a new variable called SqlExec has read/write access:
And this is its Main function:
public void Main()
{
    // Read the current table name set by the foreach loop.
    String Table_Name = Dts.Variables["table_name"].Value.ToString();
    String SqlScript;
    Regex reg = new Regex(@"\$table_name", RegexOptions.Compiled);

    // Load the script template from the filesystem.
    using (var f = File.OpenText(@"c:\sqlscript.sql"))
    {
        SqlScript = f.ReadToEnd();
    }

    // Replace the placeholder with the current table name and hand
    // the final SQL to the Execute SQL Task via the SqlExec variable.
    SqlScript = reg.Replace(SqlScript, Table_Name);
    Dts.Variables["SqlExec"].Value = SqlScript;

    Dts.TaskResult = (int)ScriptResults.Success;
}
You can notice that the Dts Variable SqlExec contains the sql script that will be executed. Now you can set the following options in your ExecuteSqlTask:
Successfully tested on MSSQL 2008; if you put an insert inside the script file, you will notice new rows in each table.
Hope this helps!
If your application can afford to have data that is one day stale, then you can have a nightly scheduled job run an SSIS package that consolidates all 150+ tables into one single huge table. Since the results of queries against that huge table will then be up to one day old, this solution will not include any rows that have been loaded recently.
You can actually time the run of this package. If it is still reasonably fast, say within 30 minutes, then you can opt to run it every few hours, for example at the start of the work day, during the lunch break, and at the end of the day. This way you have nearly fresh data to query.
Write a partitioned view including table names?
SELECT 'TableName', t.* FROM TableName t
UNION ALL
SELECT 'TableName2', t.* FROM TableName2 t
Then write a single INSTEAD OF trigger which uses dynamic SQL for writing (less testing is involved with that use of dynamic SQL, because you'd just write the simple CRUD operations once for all tables, I'd think).
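A hedged sketch of what such a trigger could look like (the view name, the TableName discriminator column, and col1/col2 are placeholders, not from the original post):
CREATE TRIGGER trg_CustomerView_Insert
ON dbo.CustomerView
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Dynamic SQL cannot see the inserted pseudo-table,
    -- so stage the new rows in a temp table first.
    SELECT TableName, col1, col2
    INTO   #new_rows
    FROM   inserted;

    DECLARE @tableName sysname,
            @sql       nvarchar(max);

    DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT DISTINCT TableName FROM #new_rows;

    OPEN table_cursor;
    FETCH NEXT FROM table_cursor INTO @tableName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Route each batch of rows to its underlying table.
        SET @sql = N'INSERT INTO ' + QUOTENAME(@tableName) + N' (col1, col2)
                     SELECT col1, col2 FROM #new_rows WHERE TableName = @t;';
        EXEC sp_executesql @sql, N'@t sysname', @t = @tableName;

        FETCH NEXT FROM table_cursor INTO @tableName;
    END

    CLOSE table_cursor;
    DEALLOCATE table_cursor;
END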
I would not do this with SQL. What you are describing sounds like a traditional ETL situation.
Since all of the customer tables are the same, I would create a table in the data warehouse with all the columns from the client table, a surrogate key column, and a type identifier. You have the option to create a "staging" table here that only has data in it during the ETL process, or to just work on a single "live" table. I would create the staging table.
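As a rough illustration (table and column names are placeholders), that staging table could look something like:
CREATE TABLE dbo.Staging_CustomerData (
    StagingID   bigint IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    CustomerID  int          NOT NULL,              -- type/source identifier
    -- the same columns as the per-customer source tables, for example:
    Col1        varchar(255) NULL,
    Col2        varchar(255) NULL,
    LoadedAt    datetime     NOT NULL DEFAULT (GETDATE())
);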
Then within an SSIS package (don't worry, you can still schedule it from SQL Server Agent; it hasn't totally left the DB server), start the ETL process...
E(xtract): copy the data from your source into the staging table in the data warehouse. You most likely want to use a sub-package within a foreach loop, changing the name of the table that you want to process from an external store (most people would say put this in the warehouse, but it's up to you).
T(ransform): run the calculations/checks you were talking about, but do it on the whole set...
L(oad): copy it to your real table within the data warehouse.
There are a couple of things I would NOT do:
1. Modify the data in the source table.
2. Try to do this in T-SQL. It's just not what T-SQL is good at.
If you need more detail on this approach, I would probably ask the question with some Business Intelligence tags. I'll be traveling for the next week or so, but I will try to look at the comments to clear anything up if you need me to.
I am fairly certain that the standard way to solve this is using dynamic SQL in each sp (your option 2), which has already been ruled out.
Your goal is to make generic, multi-table SQL. I don't see how you intend to accomplish that without sacrificing some efficiency and readability.