I have a table with the columns appid, logmessage and date. None of appid, logmessage or date is unique or a primary key.
I am confused about how to use indexes. The table might have millions of rows, so data retrieval needs to be as efficient as possible.
Can anyone suggest a good design for this table using clustered and non-clustered indexes?
You should place the index on the columns used to filter the data.
In this case, that could be appid and date.
You must be able to predict or detect the SQL run against this table to decide which indexes are required.
If it mostly filters on appid and date, create one index that includes both columns.
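As a minimal sketch of that advice, the snippet below builds a small table, adds one composite index on (appid, date), and asks the engine which access path it picks. It uses Python's built-in sqlite3 module purely as a stand-in for whatever DBMS the table really lives in; the table name app_log and the index name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the columns are the ones named in the question.
conn.execute("CREATE TABLE app_log (appid INTEGER, logmessage TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO app_log VALUES (?, ?, ?)",
    [(i % 50, f"message {i}", f"2024-01-{i % 28 + 1:02d}") for i in range(5000)],
)

# One composite index covering both filter columns.
conn.execute("CREATE INDEX idx_app_log_appid_date ON app_log (appid, date)")

# Ask the planner how it would run a query that filters on both columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT logmessage FROM app_log WHERE appid = ? AND date >= ?",
    (7, "2024-01-15"),
).fetchall()
plan_text = " ".join(row[3] for row in plan)  # column 3 holds the plan description
print(plan_text)
```

The printed plan should name idx_app_log_appid_date rather than a full table scan. Column order matters: with (appid, date) the index also serves queries that filter on appid alone, but generally not queries that filter only on date.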
I am trying to index existing columns in a table that holds over 5 million rows. My question is: if I add an index with a migration, will the existing data be indexed as well? Or do I need to re-index the existing data, and if so, how?
This is my migration
add_index :data_prods, :date_field
add_index :data_prods, :entity_id
Thank you.
Edit
I am using PostgreSQL dbms.
Adding an index builds it over the table's entire existing contents, so no separate re-indexing step is needed. On a table with 5 million rows this may take some time. I suggest testing in a staging environment (with a similar amount of data) to see how long the migration takes and how it affects the application.
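A small sketch of that behaviour, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained): the rows are inserted first, the index is created afterwards, and a lookup on the pre-existing rows already goes through the index. The index name follows the Rails default naming pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_prods ("
    "id INTEGER PRIMARY KEY, entity_id INTEGER, date_field TEXT)"
)
# Rows that exist BEFORE any index is added.
conn.executemany(
    "INSERT INTO data_prods (entity_id, date_field) VALUES (?, ?)",
    [(i % 100, f"2024-02-{i % 28 + 1:02d}") for i in range(10000)],
)

# Equivalent in spirit to: add_index :data_prods, :entity_id
conn.execute("CREATE INDEX index_data_prods_on_entity_id ON data_prods (entity_id)")

query = "SELECT COUNT(*) FROM data_prods WHERE entity_id = 42"
plan_text = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
count = conn.execute(query).fetchone()[0]
print(plan_text)
print(count)
```

In PostgreSQL a plain CREATE INDEX blocks writes to the table while it builds. Rails can build the index concurrently instead with add_index ..., algorithm: :concurrently, together with disable_ddl_transaction! in the migration, which is worth trying in the same staging test.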
Re: your comment about improving query times
Indexes make queries faster when the indexed columns are commonly referenced in "where" clauses. In your case, any query that filters by date_field OR entity_id will be faster, but other queries will not be improved. Note that a query typically uses only one index per table (PostgreSQL can sometimes combine several with a bitmap scan, but a single well-chosen index is usually cheaper). If the majority of your queries use both date_field AND entity_id at the same time to filter data, you might be better off with a composite index. I'd check out this post for further reading on composite indexes:
Index on multiple columns in Ruby on Rails
I have multiple tables (MySQL) and I want to have a single index for them.
Each table has an auto-increment int primary key.
The structure of the collected data is the same for each table (so no conflict), but the IDs collide, so it seems that I have to query each index separately (unless you can give me a hint on how to avoid the ID collision).
Question: if I query each index separately, are the weights of the returned results comparable between indexes?
unless you can give me a hint of how to avoid ID collision
See for example
http://sphinxsearch.com/forum/view.html?id=13078
You can just arrange for the ids to be offset differently. The 'sphinx document id' doesn't have to match the real primary key, but having a simple mapping makes the application simpler.
You have a choice between one index with one source (a single SQL query that unions all the tables together), one index with many sources (a source per table, all feeding one index), or many indexes (one index per table, each with its own source). Whichever way you choose will give the same query results.
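One possible mapping, sketched in Python: multiply the real primary key by the number of tables and add a per-table number, so ids from different tables can never collide and the original key is easy to recover. The constant TABLE_COUNT and both function names are hypothetical; this is not something Sphinx provides.

```python
from typing import Tuple

TABLE_COUNT = 4  # assumed number of source tables; must not shrink later


def to_document_id(table_number: int, primary_key: int) -> int:
    """Map (table, primary key) to a document id unique across all tables."""
    if not 0 <= table_number < TABLE_COUNT:
        raise ValueError("table_number out of range")
    return primary_key * TABLE_COUNT + table_number


def from_document_id(document_id: int) -> Tuple[int, int]:
    """Recover (table_number, primary_key) from a document id."""
    primary_key, table_number = divmod(document_id, TABLE_COUNT)
    return table_number, primary_key


print(to_document_id(1, 10))   # 41
print(from_document_id(41))    # (1, 10)
```

The same arithmetic can be done directly in each source query, for example by selecting id * 4 + 1 as the document id for the second table.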
If I query each index separately does it mean that the weights of the returned results are comparable between indexes?
Pretty much. The difference should be negligible, so it doesn't matter which way round you do it.
I am new to Postgres and am experimenting with the hstore extension. Looking for some guidance. I need to support basic reporting on time-series data for various products that we sell. I have a large amount of data in the format "Timestamp, Value" for each product. This data is available in a CSV file for each product.
I am thinking of using hstore to store this data in key-value format, assuming that all the time-series data for a single product can be stored in a single hstore object. I need to be able to query this data by specific times, e.g. what was the value of a product at a given time? I also need to run simple queries such as retrieving the times when the product cost more than $100.
I'm planning to have a table with a product id column and an hstore column. But I am not very clear on how to make this work:
The hstore column needs to be loaded from thousands of timestamp,value records that exist in a csv. The hstore should be appended whenever we get a new csv.
The table needs to store the productId and corresponding Timeseries data.
Can you please advise whether using hstore would be helpful? If yes, how can I load data from the CSV files as explained above? Also, if there could be any impact on insert/update performance in the hstore as the data grows, please share your experiences.
I do think you should start with a simple, normalised schema first, especially since you are new to PostgreSQL. Something like:
CREATE TABLE product_data
(
product TEXT, -- I'm making an assumption about the types of your columns
time TIMESTAMP,
value DOUBLE PRECISION,
PRIMARY KEY (product, time)
);
I would definitely keep hstore and similar options in mind, if and when your data becomes large enough that efficiency is more important than simplicity. But note that all options have an efficiency tradeoff.
Do you know how much data you're going to support? Number of products, number of distinct timestamps for each product?
What other queries do you want to run? A query for the times where a single product cost more than $100 would benefit from an index on (product, value), if the product has many distinct timestamps.
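To make the normalised approach concrete, here is a minimal sketch that loads "Timestamp, Value" CSV text into such a table and runs the two queries from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the column types differ slightly; the helper load_csv, the index name and the sample data are assumptions for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_data ("
    "product TEXT, time TEXT, value REAL, PRIMARY KEY (product, time))"
)
# Supports "times where a product cost more than X".
conn.execute(
    "CREATE INDEX idx_product_data_product_value ON product_data (product, value)"
)


def load_csv(product, csv_text):
    """Insert the rows of one CSV file; rows for a repeated timestamp are overwritten."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    conn.executemany(
        "INSERT OR REPLACE INTO product_data VALUES (?, ?, ?)",
        [(product, ts, float(val)) for ts, val in rows],
    )


load_csv("widget", "2024-01-01 00:00:00,95.0\n2024-01-01 01:00:00,101.5\n"
                   "2024-01-01 02:00:00,120.0\n")
load_csv("widget", "2024-01-01 03:00:00,99.0\n")  # a later file simply adds rows

value_at = conn.execute(
    "SELECT value FROM product_data WHERE product = ? AND time = ?",
    ("widget", "2024-01-01 01:00:00"),
).fetchone()[0]
expensive_times = [
    row[0]
    for row in conn.execute(
        "SELECT time FROM product_data WHERE product = ? AND value > 100 ORDER BY time",
        ("widget",),
    )
]
print(value_at, expensive_times)
```

Appending a new CSV is just another batch of inserts, which is one reason the plain table is easier to start with. In PostgreSQL itself, COPY is the usual way to bulk-load CSV files.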
Other options
hstore is most useful if you want to store a set of arbitrary key-value pairs in a row. You could use it here, with a row for each product, and each distinct timestamp for that product being a key in the product's hstore. The downsides are that keys and values in hstore are text, whereas your keys are timestamps and your values are numbers of some kind. So there will be a certain reduction in type checking, and a certain increase in type casting cost. Another possible downside is that some queries on the hstore might not use indexes very efficiently. The above table can use simple btree indexes for range queries (say you want to pull out the values between two dates for a product). But hstore indexes are much more limited; you can use a gist or gin index on an hstore column to find all the rows that feature a certain key.
Another option (which I've played with and use experimentally for some of my databases) is arrays. Basically, each product will have an array of values, and each timestamp maps to an index in the array. This is easy if the timestamps are perfectly regular. For example, if all your products had a value every hour for every day, you could use a table like this:
CREATE TABLE product_data
(
product TEXT,
day DATE,
"values" DOUBLE PRECISION[], -- An array from 0 to 23 (one slot per hour); quoted because VALUES is a reserved word.
PRIMARY KEY (product, day)
);
You can construct views and indexes to make querying this table moderately easy. (I wrote a blog post on this technique at http://ejrh.wordpress.com/2011/03/20/vector-denormalisation-in-postgresql/.)
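The core of the array layout is the mapping from a timestamp to a (day, slot) pair. The sketch below shows that mapping in plain Python with an in-memory dictionary standing in for the table; slot_for, put, get and store are hypothetical names, and the hourly regularity is the same assumption the schema above makes.

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

# Stand-in for the table: one 24-slot list per (product, day) row.
store: Dict[Tuple[str, date], List[Optional[float]]] = {}


def slot_for(ts: datetime) -> Tuple[date, int]:
    """Map a timestamp to its row key (the day) and array slot (the hour, 0 to 23)."""
    return ts.date(), ts.hour


def put(product: str, ts: datetime, value: float) -> None:
    day, hour = slot_for(ts)
    row = store.setdefault((product, day), [None] * 24)
    row[hour] = value


def get(product: str, ts: datetime) -> Optional[float]:
    day, hour = slot_for(ts)
    row = store.get((product, day))
    return None if row is None else row[hour]


put("widget", datetime(2024, 1, 1, 13), 99.5)
print(get("widget", datetime(2024, 1, 1, 13)))  # 99.5
print(get("widget", datetime(2024, 1, 1, 14)))  # None
```

Any sub-hour part of the timestamp is discarded here, which is exactly why this layout only suits perfectly regular series.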
But my advice is still: start with a simple table, then explore ways to improve efficiency when you know you're going to need them.
So I have a large table that I query (select only) quite frequently. The table is around 12,000 rows long. Since the advent of iOS, the time that it is taking to run these select queries has gone up 4-5x.
I was told that I need to add an index to my table. The query that I am using looks like this:
SELECT * FROM book_content WHERE book_id = ? AND chapter = ? ORDER BY verse ASC
How can I create an index for this table? Is it a command I just run once? What exactly is the index going to do? I didn't learn about these in school so they still seem like some sort of magic to me at this point, so I was hoping to get a little instruction.
Thanks!
You want an index on book_id and chapter. Without an index, a server would do a table scan and essentially load the entire table into memory to do its search. Do a quick search on the CREATE INDEX command for the RDBMS that you are using. You create the index once and every time you do an INSERT or DELETE or UPDATE, the server will update the index automatically. An index can be UNIQUE and it can be on multiple fields (in your case, book_id and chapter). If you make it UNIQUE, the database will not allow you to insert a second row with the same key (in this case, book_id and chapter). On most servers, having one index on two fields is different from having two individual indexes on single fields each.
A MySQL example would be:
CREATE INDEX id_chapter_idx ON book_content (book_id,chapter);
If you want only one record for each book_id, chapter combination, use this command:
CREATE UNIQUE INDEX id_chapter_idx ON book_content (book_id,chapter);
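To see what the UNIQUE variant does in practice, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere). The verse and body columns are guesses at the rest of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed shape of the table; only book_id, chapter and verse appear in the question.
conn.execute(
    "CREATE TABLE book_content (book_id INTEGER, chapter INTEGER, verse INTEGER, body TEXT)"
)
conn.execute("CREATE UNIQUE INDEX id_chapter_idx ON book_content (book_id, chapter)")

conn.execute("INSERT INTO book_content VALUES (1, 1, 1, 'first verse')")
try:
    # Same (book_id, chapter) pair again: the unique index refuses it.
    conn.execute("INSERT INTO book_content VALUES (1, 1, 2, 'second verse')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Since the query in the question orders by verse, the table presumably holds many rows per book_id and chapter, so the non-unique form is probably the one that fits. A unique index would then need to include verse as well.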
A PRIMARY KEY is a special index that is UNIQUE and NOT NULL. Each table can have only one primary key. In fact, each table should have one to ensure table integrity, especially during joins.
You don't have to think of indexes as "magic".
An index on an SQL table is much like the index in a printed book - it lets you find what you're looking for without reading the entire book cover-to-cover.
For example, say you have a cookbook, and you're looking for recipes that involve chicken. The index in the back of the book might say something like:
chicken: 30,34,72,84
letting you know that you will find chicken recipes on those 4 pages. It's much faster to find this information in the index than by reading through the whole book, because the index is shorter, and (more importantly) it's in alphabetical order, so you can quickly find the right place in the index.
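The cookbook analogy can be sketched in a few lines of Python: the "book" is an unordered list of pages, the "index" is a sorted list of (keyword, page) pairs, and a binary search jumps straight to the right place. The sample data and names are invented for the illustration; real database indexes are usually B-trees, which rely on the same ordering idea.

```python
import bisect

# The "book": pages in no useful order for searching by ingredient.
book = [(30, "chicken"), (12, "beef"), (72, "chicken"), (45, "lamb"),
        (34, "chicken"), (84, "chicken"), (51, "beef")]

# The "index": built once, kept in alphabetical order.
index = sorted((ingredient, page) for page, ingredient in book)


def lookup(keyword):
    """Find all pages for a keyword via binary search instead of reading every page."""
    pos = bisect.bisect_left(index, (keyword, 0))  # page numbers start at 1
    pages = []
    while pos < len(index) and index[pos][0] == keyword:
        pages.append(index[pos][1])
        pos += 1
    return pages


print(lookup("chicken"))  # [30, 34, 72, 84]
```

The lookup touches only a handful of index entries, however long the book is, whereas scanning the book list would read every page.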
So, in general you want to create indexes on columns that you will regularly need to query (book_id and chapter, in your example).
When you declare a column as a primary key, the database automatically generates an index on that column. Since you mostly run selects, an index is ideal: indexes speed up selection queries at the cost of slower inserts. With a read-heavy table like yours, you can create the indexes you need without worrying much about that write cost.
Indexes are a very sensitive subject. If you consider using them, you need to be very careful how many you make. The primary key, or id, of each table should have a clustered index. For all the rest, it depends on how you plan to use them. I'm very fuzzy on the subject of indexes, and have actually never worked with them, but from a seminar I watched just yesterday, you don't want too many indexes, because they can actually slow things down when you don't need them.
Let's say you put an index on 5 out of 8 fields on a table. Each index is designated for a particular query somewhere in your software. When 1 query runs, it uses that 1 index and doesn't need the other 4, yet all 5 still have to be stored and updated on every insert, update or delete. If you need an index, be sure that it is useful in many places, not just 1 place.
I am trying to create an index on one of my tables with an accurate label. Here is how I am trying it, expecting "sysname" to resolve to the column or table name. But after I run this command and view it in the Object Explorer, it is listed as
"[<Name of Missing Index, sysname + '_prod',>]".
How do you define index names in a more descriptive fashion? (I am trying to add the suffix "_prod" to the index name, since an index named index_name already exists.)
USE [AMDMetrics]
GO
CREATE NONCLUSTERED INDEX
[<Name of Missing Index, sysname + '_prod',>]
ON [SMARTSOLVE].[V_CXP_CUSTOMER_PXP] ([QXP_UDF_STRING_8], [QXP_REPORT_DATE],
[QXP_XRS_DESCRIPTION])
INCLUDE ([QXP_ID], [QXP_EXCEPTION_NO], [QXP_BASE_EXCEPTION], [QXP_CATEGORY],
[QXP_OCCURENCE_DATE], [QXP_COORD_ID], [QXP_SHORT_DESC], [QXP_ROOT_CAUSE],
[QXP_DESCRIPTION], [QXP_QEI_ID], [PXP_LOT_NUMBER], [CXP_ID], [CXP_AWARE_DATE],
[QXP_XSV_CODE], [QXP_COORD_NAME], [PXP_PRODUCT_CODE], [PXP_PRODUCT_NAME],
[QXP_ORU_NAME], [QXP_RESOLUTION_DESC], [QXP_CLOSED_DATE], [CXP_CLIENT_CODE],
[CXP_CLIENT_NAME])
I'm not 100% sure what you are trying to do, but it seems like you are trying to find a way to properly name your index (or find a good naming convention). Conventions are best when they are easy to follow, and make sense to people without having to explain it to them. A lot of different conventions fit this MO, but the one that is most common is this:
Index Type                          Prefix   Complete Index Name
-------------------------------------------------------------------
Index (not unique, non clustered)   IDX_     IDX_<name>_<column>
Index (unique, non clustered)       UDX_     UDX_<name>_<column>
Index (not unique, clustered)       CIX_     CIX_<name>_<column>
Index (unique, clustered)           CUX_     CUX_<name>_<column>
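If index names are generated by a script, the convention in the table can be captured in a tiny helper. The Python sketch below is only an illustration; the function name and its parameters are hypothetical. Note that the text inside <...> in the question's script is an SSMS template placeholder, so it has to be replaced with a literal name such as the one this helper produces before the statement is run.

```python
# Prefix lookup keyed by (unique, clustered), following the table above.
PREFIXES = {
    (False, False): "IDX",
    (True, False): "UDX",
    (False, True): "CIX",
    (True, True): "CUX",
}


def index_name(table, columns, unique=False, clustered=False):
    """Build a name such as IDX_<name>_<column> from the table and its key columns."""
    return "_".join([PREFIXES[(unique, clustered)], table, *columns])


print(index_name("V_CXP_CUSTOMER_PXP", ["QXP_REPORT_DATE"]))
# IDX_V_CXP_CUSTOMER_PXP_QXP_REPORT_DATE
```

Only key columns are worth putting in the name; listing every INCLUDE column would make it unreadable.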
On a different note, I have to question why you have so many columns in your INCLUDE list. Without knowing the size of those columns, I can only point out that there are some drawbacks to adding so many columns:
Avoid adding unnecessary columns. Adding too many index columns,
key or nonkey, can have the following performance implications:
- Fewer index rows will fit on a page. This could create I/O increases
and reduced cache efficiency.
- More disk space will be required to store the index. In particular,
adding varchar(max), nvarchar(max), varbinary(max), or xml data types
as nonkey index columns may significantly increase disk space requirements.
This is because the column values are copied into the index leaf level.
Therefore, they reside in both the index and the base table.
- Index maintenance may increase the time that it takes to perform modifications,
inserts, updates, or deletes, to the underlying table or indexed view.
You will have to determine whether the gains in query performance outweigh
the affect to performance during data modification and in additional disk
space requirements.
From here: http://msdn.microsoft.com/en-us/library/ms190806.aspx