I keep reading stuff like this:
The text in row option will be removed in a future version of SQL
Server. Avoid using this option in new development work, and plan to
modify applications that currently use text in row. We recommend that
you store large data by using the varchar(max), nvarchar(max), or
varbinary(max) data types. To control in-row and out-of-row behavior
of these data types, use the large value types out of row option.
So what should we do if we have a varchar(max) field that we want to limit to 16 chars in row?
Thanks!
EDIT. When I say "in row", I mean the VARCHAR/TEXT strings are stored directly in the data row, not as a pointer (with the string data stored elsewhere.) Moving the data out of the row will increase table scan performance if the data moved out of the row is not part of the "where" clause.
EDIT. The text I quoted says this:
To control in-row and out-of-row behavior
of these data types, use the large value types out of row option.
Sure enough:
https://msdn.microsoft.com/en-us/library/ms173530.aspx
But on that page it says this:
The text in row feature will be removed in a future version of SQL
Server. To store large value data, we recommend that you use the
varchar(max), nvarchar(max) and varbinary(max) data types.
So the question remains.
EDIT. It appears we will still have the ability to use this table option:
large value types out of row. A value of 1 means varbinary(max), xml
and large user-defined type (UDT) columns in the table are stored out
of row, with a 16-byte pointer to the root. A value of 0 means
varchar(max), nvarchar(max), varbinary(max), xml and large UDT values
are stored directly in the data row, up to a limit of 8000 bytes and
as long as the value can fit in the record. If the value does not fit
in the record, a pointer is stored in-row and the rest is stored out
of row in the LOB storage space. 0 is the default.
However, we seem to be losing the option to keep the data in the row when it is small. It will be either all in or all out. Is there any other way to do this?
You are correct: there is no option that does the same as the old 'text in row' setting.
You either let SQL Server store values of up to 8,000 bytes in the row (when they fit) or force values of any size out of row.
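The closest you get is the table-level option quoted in the question, and it is all or nothing; a minimal sketch (the table name is made up):
EXEC sp_tableoption 'dbo.SomeTable', 'large value types out of row', 0;  -- keep (max) values in row while they fit, up to about 8,000 bytes
EXEC sp_tableoption 'dbo.SomeTable', 'large value types out of row', 1;  -- always push (max) values out of row, leaving a 16-byte in-row pointer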
So what should we do if we have a varchar(max) field that we want to
limit to 16 chars in row?
CREATE TABLE dbo.T1 (SomeCol VARCHAR(MAX) CHECK (LEN(SomeCol) <= 16)); -- caps the length at 16 characters; whether the value sits in row is still governed by the table option above
oid -> creates a pg_largeobject table and stores the data in there
bytea -> if the compressed data would still exceed 2000 bytes, PostgreSQL splits variable-length data types into chunks and stores them out of line in a special "TOAST table", according to https://www.cybertec-postgresql.com/en/binary-data-performance-in-postgresql/
I don't want any other table for large data; I want to store it in a column in my defined table. Is that possible?
It is best to avoid Large Objects.
With bytea you can prevent PostgreSQL from storing data out of line in a TOAST table by changing the column definition like
ALTER TABLE tab ALTER col SET STORAGE MAIN;
Then PostgreSQL will compress that column but keep it in the main table.
Since the block size in PostgreSQL is 8kB, and one row is always stored in a single block, that will limit the size of your table rows to somewhat under 8kB (there is a block header and other overhead).
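If you want to double-check which storage strategy a column ended up with, pg_attribute shows it; a quick query, assuming the same placeholder names tab and col:
SELECT attname, attstorage   -- 'm' = MAIN, 'p' = PLAIN, 'x' = EXTENDED, 'e' = EXTERNAL
FROM   pg_attribute
WHERE  attrelid = 'tab'::regclass
AND    attnum > 0
AND    NOT attisdropped;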
I think that you are trying to solve a non-problem, and your request to not store large data out of line is unreasonable.
I am approaching the 10 GB limit that Express has on the primary database file.
The main problem appears to be some fixed length char(500) columns that are never near that length.
I have two tables with about 2 million rows between them. These two tables add up to about 8 GB of data with the remainder being spread over another 20 tables or so. These two tables each have 2 char(500) columns.
I am testing a way to convert these columns to varchar(500) and recover the trailing spaces.
I tried this:
Alter Table Test_MAILBACKUP_RECIPIENTS
Alter Column SMTP_address varchar(500)
GO
Alter Table Test_MAILBACKUP_RECIPIENTS
Alter Column EXDN_address varchar(500)
This quickly changed the column type but obviously didn’t recover the space.
The only way I can see to do this successfully is to:
Create a new table in tempdb with the varchar(500) columns,
Copy the information into the temp table trimming off the trailing spaces,
Drop the real table,
Recreate the real table with the new varchar(500) columns,
Copy the information back.
I’m open to other ideas here, as I’ll have to take my application offline while this process completes.
Another thing I’m curious about is the primary key identity column.
This table has a Primary Key field set as an identity.
I know I have to use SET IDENTITY_INSERT ON to allow the records to be inserted into the table, and turn it off when I’m finished.
How will recreating the table affect new records being inserted after I’m finished? Or is this just “Microsoft Magic” that I don’t need to worry about?
The problem with your initial approach was that you converted the columns to varchar but didn't trim the existing whitespace (which is kept after the conversion). After changing the data type of the columns, you should do:
update Test_MAILBACKUP_RECIPIENTS set
SMTP_address=rtrim(SMTP_address), EXDN_address=rtrim(EXDN_address)
This will eliminate all trailing spaces from your table, but note that the actual disk size will stay the same: SQL Server doesn't automatically shrink database files, it just marks that space as unused and available for other data.
You can use this script from another question to see the actual space used by data in the DB files:
Get size of all tables in database
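For a quick check of just the two big tables, the system procedure sp_spaceused also reports reserved, data, index and unused space; for example:
EXEC sp_spaceused N'dbo.Test_MAILBACKUP_RECIPIENTS';  -- per-table reserved / data / index_size / unused
EXEC sp_spaceused;                                    -- whole-database size and unallocated space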
Usually shrinking a database is not recommended, but when there is a big gap between the space used and the file size, you can do it with DBCC SHRINKDATABASE:
dbcc shrinkdatabase (YourDatabase, 10) -- leaving 10% of free space for new data
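If you would rather shrink only the data file and leave the log alone, DBCC SHRINKFILE works per file; a sketch, where the logical file name comes from sys.database_files and the 1024 MB target is just an example:
SELECT name, type_desc, size * 8 / 1024 AS size_mb FROM sys.database_files;  -- find the logical name of the data file
DBCC SHRINKFILE (YourDataFile, 1024);  -- hypothetical logical file name; target size in MB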
OK I did a SQL backup, disabled the application and tried my script anyway.
I was shocked that it ran in under 2 minutes on my slow old server.
I re-enabled my application and it still works. (Yay)
Looking at the reported size of the table now, it went from 1.4 GB to 126 MB! So at least that has bought me some time.
(Screenshots of the table properties, with the Data size in KB circled, before and after.)
My next problem is the MailBackup table which also has two char(500) columns.
It is shown as 6.7GB.
I can't use the same approach, as this table contains a FILESTREAM column which holds around 190 GB of data, and tempdb does not support FILESTREAM as far as I know.
Looks like this might be worth a new question.
Is there a constraint available in DB2 such that when a column is restricted to a particular length, values are trimmed to that length before insertion? For example, if a column has been specified to be of length 5, inserting the value 'overflow' would get stored as 'overf'.
Can a CHECK constraint be used here? My understanding of CHECK constraints is that they allow or reject an insert, but cannot modify values to satisfy the condition.
A constraint isn't going to be able to do this.
A before insert trigger is normally the mechanism you'd use to modify data during an insert before it is actually placed in the table.
However, I'm reasonably sure it won't work in this case. You'd get an SQLCODE -404 (SQLSTATE 22001), "The SQL statement specified contains a string that is too long", thrown before the trigger gets fired.
I see two possible options
1) Create a view over the table where the column is cast to a larger size. Then create an INSTEAD OF trigger on the view to substring the data during the write (see the sketch after this list).
2) Create and use a stored procedure that accepts a larger size and substrings the data then inserts it.
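For option 1, something along these lines might work. This is an untested sketch for DB2 for LUW, and every object name in it is made up:
CREATE TABLE t_limited (val VARCHAR(5));

-- Expose a wider column through a view so an over-long value reaches the trigger
-- instead of being rejected with -404:
CREATE VIEW v_limited (val) AS
  SELECT CAST(val AS VARCHAR(100)) FROM t_limited;

-- Truncate on the way in:
CREATE TRIGGER trg_v_limited_ins
  INSTEAD OF INSERT ON v_limited
  REFERENCING NEW AS n
  FOR EACH ROW
  INSERT INTO t_limited (val)
  VALUES (CASE WHEN LENGTH(n.val) > 5 THEN SUBSTR(n.val, 1, 5) ELSE n.val END);

-- INSERT INTO v_limited (val) VALUES ('overflow');  -- would store 'overf'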
In Postgres does the order of columns in a CREATE TABLE statement impact performance? Consider the following two cases:
CREATE TABLE foo (
a TEXT,
B VARCHAR(512),
pkey INTEGER PRIMARY KEY,
bar_fk INTEGER REFERENCES bar(pkey),
C bytea
);
vs.
CREATE TABLE foo2 (
pkey INTEGER PRIMARY KEY,
bar_fk INTEGER REFERENCES bar(pkey),
B VARCHAR(512),
a TEXT,
C bytea
);
Will performance of foo2 be better than foo because of better byte alignment for the columns? When Postgres executes CREATE TABLE does it follow the column order specified or does it re-organize the columns in optimal order for byte alignment or performance?
Question 1
Will the performance of foo2 be better than foo because of better byte
alignment for the columns?
Yes, the order of columns can have a small impact on performance. Type alignment is the more important factor, because it affects the footprint on disk. You can minimize storage size (play "column tetris") and squeeze more rows on a data page - which is the most important factor for speed.
Normally, it's not worth bothering. With an extreme example like in this related answer you get a substantial difference:
Calculating and saving space in PostgreSQL
Type alignment details:
Making sense of Postgres row sizes
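To see the padding effect without even creating a table, you can compare two row values holding the same data in a different order; exact byte counts vary by platform, but the first is consistently larger:
SELECT pg_column_size(ROW(true, 1::bigint, 1::smallint)) AS bytes_padded  -- 1-byte boolean forces 7 padding bytes before the 8-byte bigint
     , pg_column_size(ROW(1::bigint, 1::smallint, true)) AS bytes_packed; -- same data ordered large-to-small, several bytes smaller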
The other factor is that retrieving column values is slightly faster if you have fixed size columns first. I quote the manual here:
To read the data you need to examine each attribute in turn. First
check whether the field is NULL according to the null bitmap. If it
is, go to the next. Then make sure you have the right alignment. If
the field is a fixed width field, then all the bytes are simply
placed. If it's a variable length field (attlen = -1) then it's a bit
more complicated. All variable-length data types share the common
header structure struct varlena, which includes the total length of
the stored value and some flag bits.
There is an open TODO item to allow reordering of column positions in the Postgres Wiki, partly for these reasons.
Question 2
When Postgres executes a CREATE TABLE does it follow the column order
specified or does it re-organize the columns in optimal order for byte
alignment or performance?
Columns are stored in the defined order; the system does not try to optimize.
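A quick way to confirm this is pg_attribute, where attnum follows the declared order (here for the table foo from the question):
SELECT attnum, attname, atttypid::regtype
FROM   pg_attribute
WHERE  attrelid = 'foo'::regclass
AND    attnum > 0
AND    NOT attisdropped
ORDER  BY attnum;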
I fail to see any relevance of column order to TOAST tables like another answer seems to imply.
As far as I understand, PostgreSQL adheres to the order in which you enter the columns when saving records. Whether this affects performance is debatable. PostgreSQL stores all table data in pages, each 8 kB in size. 8 kB is the default, but it can be changed at compile time.
Each row in the table takes up space within a page. Since your table definition contains variable-length columns, a page can hold a variable number of records. What you want to do is make sure you can fit as many records into one page as possible. That is why you will notice performance degradation when a table has a huge number of columns or the column sizes are huge.
This being said, declaring a varchar(8192) does not mean the page will be filled up with one record, but declaring a CHAR(8192) will use up one whole page irrespective of the amount of data in the column.
There is one more thing to consider when declaring TOASTable types such as TEXT columns. These are columns that could exceed the maximum page size. A table that has TOASTable columns will have an associated TOAST table to store the data and only a pointer to the data is stored with the table. This can impact performance, but can be improved with proper indexes on the TOASTable columns.
To conclude, I would have to say that the order of the columns does not play much of a role in the performance of a table. Most queries utilise indexes, which are stored separately, to retrieve records, and therefore column order matters less there. It comes down to how many pages need to be read to retrieve the data.
I have a SQL Server 2008 R2 database. I created a table, and when trying to execute a SELECT statement (with an ORDER BY clause) against it, I receive the error "Cannot create a row of size 8870 which is greater than the allowable maximum row size of 8060."
I am able to select the data without an ORDER BY clause; however, the ORDER BY clause is important and I require it. I have tried the ROBUST PLAN option but still received the same error.
My table has 300+ columns with data type TEXT. I have tried using varchar and nvarchar, but have had no success.
Can someone please provide some insight?
Update:
Thanks for the comments. I agree, 300+ columns in one table is not very good design. What I'm trying to do is bring Excel tabs into the database as data tables. Some tabs have 300+ columns.
I first use a CREATE statement to create a table based on the excel tab so the columns vary. Then I do various SELECT, UPDATE, INSERT, etc statements on the table after the table is created with data.
The structure of the table usually follows this pattern:
fkVersionID, RowNumber(autonumber), Field1, Field2, Field3, etc...
Is there any way to get around the 8,060-byte row size limit?
You mentioned that you tried nvarchar and varchar ... remember that nvarchar doubles the bytes used, but it is the only one of the two to support foreign characters in some cases, such as accent marks.
varchar is a good choice if you can limit its maximum size appropriately.
The 8,060-byte row limit is still real, but since 8,060 / 300 is roughly 26, if on average each varchar column holds no more than about 26 characters, you'll be okay.
You could go riskier and declare the columns as varchar(50) but on average only use 26 characters per column: one column might hold 36 characters and the next only 16, and you are okay again (as long as you never exceed the average of roughly 26 characters per column across the 300 columns).
Obviously, with a dynamic number of fields and the potential to far exceed that limit, the design is doomed by SQL Server's specs.
Your only other alternative is to split the data across multiple tables and, when you access it, join the appropriate records on a shared unique key. In your SELECT statement, use the join, and with multiple tables you can then handle rows of 8000 + 8000 + ...
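A rough sketch of that split; every name here is invented, and Field1/Field2 stand in for roughly half of the real columns each:
CREATE TABLE dbo.ImportWide_A (
    fkVersionID int NOT NULL,
    RowNumber   int NOT NULL,
    Field1      varchar(500) NULL,
    Field2      varchar(500) NULL,
    CONSTRAINT PK_ImportWide_A PRIMARY KEY (fkVersionID, RowNumber)
);

CREATE TABLE dbo.ImportWide_B (
    fkVersionID int NOT NULL,
    RowNumber   int NOT NULL,
    Field151    varchar(500) NULL,
    Field152    varchar(500) NULL,
    CONSTRAINT PK_ImportWide_B PRIMARY KEY (fkVersionID, RowNumber)
);

-- Reassemble the wide row with a join; each physical row stays under the 8,060-byte limit:
SELECT a.Field1, a.Field2, b.Field151, b.Field152
FROM dbo.ImportWide_A AS a
JOIN dbo.ImportWide_B AS b
  ON b.fkVersionID = a.fkVersionID
 AND b.RowNumber   = a.RowNumber;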
So it is doable, but you have to work with SQL rules.
I believe you're running into this limitation:
There is no limit to the number of items in the ORDER BY clause. However, there is a limit of 8,060 bytes for the row size of intermediate worktables needed for sort operations. This limits the total size of columns specified in an ORDER BY clause.
I had a legacy app like this, it was a nightmare.
First, I broke it into multiple tables, all one-to-one. This is bad, but less bad than what you've got.
Then I changed the queries to request only the columns that were actually needed. (I can't tell if you have that option.)
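If you do have that option, the idea is simply to select and sort a narrow projection rather than the full row; a sketch with invented names:
SELECT fkVersionID, RowNumber, Field1, Field2
FROM dbo.ImportWide
ORDER BY Field1;  -- a narrow column list should keep the sort worktable row well under 8,060 bytes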