We store billions of rows in an Infobright table that currently has about 45 columns. We want to add 50 more columns to it. Will adding these columns slow down reads? Is creating a new table for these columns a better option? Or, since Infobright is a column-oriented database, does adding 50 extra columns not matter much?
Thanks!
I think adding these columns will not slow down reads that do not use the added columns.
I think creating a new table for these columns is not a better option.
Because Infobright is a column-oriented database, adding 50 extra columns should have no effect on the performance of queries that do not use them.
The maximum number of columns for Infobright tables is 4096. However, that holds only if they are all TINYINT columns. I would suggest that you do not use more than 1000 columns. The key, though, is to make sure your SQL queries do not use SELECT * FROM. Use SELECT CustomerID, CustomerName FROM instead, naming ONLY the columns you actually need.
Related
I have a table that contains both metadata about the row as well as some numeric information. The metadata is much bigger (URLs and free text versus just a few numbers for the other part).
Most of my queries ignore the metadata, e.g. they are just adding up some subset of the numbers.
If I split metadata into a different table, would that make these queries meaningfully faster? My table has about 30 million rows.
Selecting all the columns of the table will reduce query performance:
SELECT * FROM TABLE1;
Selecting only the required columns does not carry that penalty:
SELECT column1, column2 FROM TABLE1;
Disadvantages of having too many columns in PostgreSQL explained here
I'm recreating a DB and I have a table with 150 columns and it has 700 rows currently (small dataset) - It will likely take 10 more years to get to 1000 rows.
My question:
Most of my data is normalized. About 125 fields contain a single numeric value (hours, currency, decimals, and integers). There are 10 or so columns that can have multiple values.
Do I continue to use the single table with 150 columns?
Or
Do I create cross-reference tables and use a pivot query to turn my rows into columns? Something like this:
**c_FieldNames**
id int identity (PK)
FieldName nvarchar
Decimals nvarchar(2)

**cx_FieldValues**
id int identity(1,1)
fkProjectID int
FieldNameID int (FK to id from c_FieldNames)
FieldValue numeric(16,2)

**Project**
ProjID int (PK)
ProjectName
The decimals would tell me how many decimal places a given field would need - I'd like to incorporate that into my query... Not sure if that's possible.
For each of my 125 fields with numbers, I would create a row in the c_FieldNames table, which would give it an ID. That ID would be used in FieldNameID as a foreign key.
I would then create a view with a pivot that dynamically turns those 125 rows into columns, alongside the 25 or so columns of my standard table, so that the result looks like the table with 150 columns.
I'm pretty sure I will be able to use a pivot table to turn my rows into columns. (Dynamically display rows as columns)
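As a rough sketch of what such a pivot could look like in T-SQL, assuming the three tables described above: the field names Hours and Budget are made up for illustration, and a real version covering all 125 fields would need the IN list generated with dynamic SQL.

```sql
-- Sketch only: [Hours] and [Budget] are hypothetical FieldName values.
SELECT ProjID, ProjectName, [Hours], [Budget]
FROM (
    -- One row per project and field: the "tall" cross-reference layout
    SELECT p.ProjID, p.ProjectName, n.FieldName, v.FieldValue
    FROM Project AS p
    JOIN cx_FieldValues AS v ON v.fkProjectID = p.ProjID
    JOIN c_FieldNames   AS n ON n.id = v.FieldNameID
) AS src
PIVOT (
    -- Each project has at most one value per field, so MAX just picks it
    MAX(FieldValue) FOR FieldName IN ([Hours], [Budget])
) AS pvt;
```

Every field that should appear as a column has to be listed in the IN clause, which is why the list grows with the number of fields.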
Benefits:
I could create a table for reports that would have all the "columns" I need for that report and then filter to them and just pull those fields dynamically.
Reports
ReportID int
FieldID int
The fieldID's would be based on the c_FieldName id's and I could turn all required field names (that are in the rows) into headers and run a vast majority of reports based on dynamic sql generated based on the field names. Same applies to all data structured... [Edit from Author] The more I think about this, I could do this with either table structure, which negates the benefits I saw here, as I am adding complexity for no good reason, as pointed out in the comments.
My thought is that it will save me a lot of development time, as I can use a pivot table to generate reports and pull data on the fly without much trouble. Updating data will be a bit of a chore, but not much more than normal. I am creating a C#.NET website with Visual Studio (hosted on Azure) to allow users to view, update, and run reports on the data. Any major drawbacks in this structure? Is this a good idea? Are 125 columns in a pivot too many? Thanks in advance!
I'm having the following Redshift performance issue:
I have a table with ~ 2 billion rows, which has ~100 varchar columns and one int8 column (intCol). The table is relatively sparse, although there are columns which have values in each row.
The following query:
select colA from tableA where intCol = '111111';
returns approximately 30 rows and runs relatively quickly (~2 mins).
However, the query:
select * from tableA where intCol = '111111';
takes an undetermined amount of time (gave up after 60 mins).
I know pruning the columns in the projection is usually better but this application needs the full row.
Questions:
Is this just a fundamentally bad thing to do in Redshift?
If not, why is this particular query taking so long? Is it related to the structure of the table somehow? Is there some Redshift knob to tweak to make it faster? I haven't yet messed with the distkey and sortkey on the table, but it's not clear that those should matter in this case.
The main reason the first query is faster is that Redshift is a columnar database. A columnar database stores table data per column, writing the values of one column into the same blocks on storage. This is different from a row-based database like MySQL or PostgreSQL. Because the first query selects only colA, Redshift does not need to read the other columns at all, while the second query reads every column, which causes a huge amount of disk access.
To improve the performance of the second query, you may need to set a sort key on intCol, the column used in the WHERE clause. With a sort key on a column, the table's rows are stored in sorted order of that column. This reduces the cost of disk access when fetching records with a condition on that column, because blocks that cannot contain matching rows can be skipped.
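One way to try a sort key is to rebuild the table with CREATE TABLE AS. This is only a sketch: it sorts on the column in the WHERE clause (intCol in the question), tableA_sorted is a made-up name, and copying 2 billion rows is itself an expensive operation.

```sql
-- Sketch only: tableA_sorted is a hypothetical name for the rebuilt table.
-- Rows are stored sorted by intCol, the column the query filters on.
CREATE TABLE tableA_sorted
SORTKEY (intCol)
AS
SELECT * FROM tableA;

-- The full-row lookup then runs against the sorted copy.
SELECT * FROM tableA_sorted WHERE intCol = 111111;
```

Whether this is enough for a 100-column varchar table depends on the data, so it is worth comparing the two query plans with EXPLAIN before switching over.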
I have a SQL Server 2008 R2 database. I created a table, and when I try to execute a select statement (with an order by clause) against it, I receive the error "Cannot create a row of size 8870 which is greater than the allowable maximum row size of 8060."
I am able to select the data without an order by clause, however the order by clause is important and I require it. I have tried a ROBUST PLAN option but I still received the same error.
My table has 300+ columns with data type TEXT. I have tried using varchar and nvarchar, but have had no success.
Can someone please provide some insight?
Update:
Thanks for the comments. I agree. 300+ columns in one table is not very good design. What I'm trying to do is bring Excel tabs into the database as data tables. Some tabs have 300+ columns.
I first use a CREATE statement to create a table based on the Excel tab, so the columns vary. Then I run various SELECT, UPDATE, INSERT, etc. statements on the table once it has been populated with data.
The structure of the table usually follows this pattern:
fkVersionID, RowNumber(autonumber), Field1, Field2, Field3, etc...
Is there any way to get around the 8060 row size limit?
You mentioned that you tried nvarchar and varchar ... remember that nvarchar doubles the bytes used, but it is the only one of the two to support foreign characters in some cases, such as accent marks.
varchar is a good choice if you can limit its maximum size appropriately.
The 8,060-byte row size is still a real limit, but if each varchar column averages no more than 26 characters, you'll be okay (300 × 26 = 7,800 bytes, before any per-row and per-column overhead).
You could go riskier and declare the columns as varchar(50) while using only 26 characters per column on average, meaning one column may hold 36 characters and the next 16. Then you are okay again, as long as you never exceed the average of 26 characters per column across the 300 columns.
Obviously, with a dynamic number of fields and the potential to far exceed the limit, this design is doomed by SQL Server's row-size rules.
Your only other alternative is to create multiple tables that share a unique key, and to join the matching records on that key when you access the data. With a join across multiple tables in your select statement, you can handle rows of 8000 + 8000 + ...
So it is doable, but you have to work with SQL rules.
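A minimal sketch of that multi-table layout in T-SQL follows. The table names SheetPart1 and SheetPart2, the varchar(100) sizes, and the point at which the fields are split are all assumptions; only fkVersionID, RowNumber and the FieldN naming come from the question.

```sql
-- Sketch only: table names, column sizes and the split point are hypothetical.
CREATE TABLE dbo.SheetPart1 (
    fkVersionID int NOT NULL,
    RowNumber   int NOT NULL,
    Field1      varchar(100) NULL,
    Field2      varchar(100) NULL,
    CONSTRAINT PK_SheetPart1 PRIMARY KEY (fkVersionID, RowNumber)
);

CREATE TABLE dbo.SheetPart2 (
    fkVersionID int NOT NULL,
    RowNumber   int NOT NULL,
    Field151    varchar(100) NULL,
    Field152    varchar(100) NULL,
    CONSTRAINT PK_SheetPart2 PRIMARY KEY (fkVersionID, RowNumber)
);

-- Reassemble a logical row by joining on the shared key, and select
-- only the columns the query needs so the sort worktable stays small.
SELECT p1.fkVersionID, p1.RowNumber, p1.Field1, p2.Field151
FROM dbo.SheetPart1 AS p1
JOIN dbo.SheetPart2 AS p2
  ON p2.fkVersionID = p1.fkVersionID
 AND p2.RowNumber   = p1.RowNumber
ORDER BY p1.Field1;
```

If RowNumber is generated as an autonumber, it would be generated in one table only and copied into the others, so that the keys line up.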
I believe you're running into this limitation:
There is no limit to the number of items in the ORDER BY clause. However, there is a limit of 8,060 bytes for the row size of intermediate worktables needed for sort operations. This limits the total size of columns specified in an ORDER BY clause.
I had a legacy app like this; it was a nightmare.
First, I broke it into multiple tables, all one-to-one. This is bad, but less bad than what you've got.
Then I changed the queries to request only the columns that were actually needed. (I can't tell if you have that option.)
I am migrating a large quantity of mostly empty tables into SQL Server 2008.
The tables are vertical partitions of one big logical table.
Problem is this logical table has more than 1024 columns.
Given that most of the fields are null, I plan to use a sparse table.
For all of my tables so far I have been using SELECT...INTO, which has been working really well.
However, now I have "CREATE TABLE failed because column 'xyz' in table 'MyBigTable' exceeds the maximum of 1024 columns."
Is there any way I can do SELECT...INTO so that it creates the new table with sparse support?
What you probably want to do is create the table manually and populate it with an INSERT ... SELECT statement.
To create the table, I would recommend scripting the different component tables and merging their definitions, making them all SPARSE as necessary. Then just run your single CREATE TABLE statement.
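A rough sketch of that approach in T-SQL, with only two source partitions and two data columns shown: Part1, Part2, RecordID, ColA, ColB and AllSparse are made-up names, and the real statement would list every merged column. The column set is included because, as far as I know, a SQL Server table needs one to go past 1024 columns.

```sql
-- Sketch only: all table and column names except MyBigTable are hypothetical.
CREATE TABLE dbo.MyBigTable (
    RecordID  int NOT NULL PRIMARY KEY,
    ColA      varchar(50) SPARSE NULL,
    ColB      int         SPARSE NULL,
    -- A column set turns this into a "wide" table, which may exceed 1024 columns
    AllSparse xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Populate it from the vertical partitions instead of using SELECT...INTO.
INSERT INTO dbo.MyBigTable (RecordID, ColA, ColB)
SELECT p1.RecordID, p1.ColA, p2.ColB
FROM dbo.Part1 AS p1
LEFT JOIN dbo.Part2 AS p2
       ON p2.RecordID = p1.RecordID;
```

Note that once a column set exists, SELECT * returns the sparse columns combined into the XML column rather than individually, so queries should name the columns they need.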
You cannot (and probably don't want to anyway). See INTO Clause (TSQL) for the MSDN documentation.
The problem is that sparse tables are a physical storage characteristic and not a logical characteristic, so there is no way the DBMS engine would know to copy over that characteristic. Moreover, it is a table-wide property and the SELECT can have multiple underlying source tables. See the Remarks section of the page I linked where it discusses how you can only use default organization details.