How to use a new RT index with an existing large MySQL table in Sphinx / Manticore?

I have a large (200 GB) MySQL table that constantly grows with new rows. Is it possible to create an RT index in Manticore and fill it with the existing data from this table? Or is it possible to alter an existing RT index with a new charset_table so that searches cover all of the table's data, not only the rows added after the index was altered?

I've found the solution: attaching a plain index to an RT index.
First, create a plain index with a source that reads the MySQL table, then attach it to the RT index and populate the RT index with new incoming data. In my case, plain indexing took nearly 2 hours and attaching took less than one second.
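A minimal SphinxQL sketch of the attach step, assuming two hypothetical indexes: products_plain (a plain index built by running indexer against the MySQL source) and products_rt (an RT index declared with the same fields, attributes and charset_table). The column names are also hypothetical. Depending on the Sphinx/Manticore version, the target RT index may have to be empty, so check the ATTACH documentation for your release.

```sql
-- Run over SphinxQL (the MySQL-protocol listener, often port 9306).
-- products_plain / products_rt are hypothetical index names; their schemas
-- and tokenization settings (e.g. charset_table) must match.
ATTACH INDEX products_plain TO RTINDEX products_rt;

-- Newer Manticore releases spell the same operation as:
-- ATTACH TABLE products_plain TO TABLE products_rt;

-- From now on, feed rows arriving in MySQL into the RT index.
-- Rows written to MySQL while indexer was running must be inserted too.
INSERT INTO products_rt (id, title, content)
VALUES (200000001, 'New product', 'Description text');

-- Searches now see both the attached data and the new rows
SELECT id FROM products_rt WHERE MATCH('product') LIMIT 10;
```

The expensive full build happens once, offline, in indexer; the RT index then only absorbs the incremental writes.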

Related

Postgres timescale hypertable: is separate index necessary?

I want to create a hypertable in PostgreSQL with TimescaleDB.
What I do is CREATE TABLE, then CREATE INDEX, and finally SELECT create_hypertable.
My question: is CREATE INDEX necessary, helpful or harmful for hypertable performance?
In short: you don't need to create any index, because TimescaleDB creates an index on the time dimension by default. Depending on your usage, you might want to create indexes to speed up SELECT queries, and it is better to create them after creating the hypertable.
In more detail:
Calling the create_hypertable function turns the original PostgreSQL table into a hypertable. Thus it is better to create the hypertable first and then create the indexes. Creating an index first and then calling create_hypertable also works; in that case the existing indexes are recreated on the hypertable. Remember that unique indexes and primary keys must include the time dimension column. Also note that create_hypertable creates an index on the time dimension column by default.
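A minimal sketch of the recommended order (table first, then hypertable, then any extra indexes), assuming a hypothetical conditions table with a device_id column. It uses the classic create_hypertable(table, time_column) call, which still works in TimescaleDB 2.x.

```sql
-- Hypothetical time-series table
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    device_id   INTEGER          NOT NULL,
    temperature DOUBLE PRECISION
);

-- Converts the table; by default this also creates an index on "time"
SELECT create_hypertable('conditions', 'time');

-- Optional composite index for queries like
-- WHERE device_id = ? AND time > now() - interval '1 day'
CREATE INDEX conditions_device_time_idx ON conditions (device_id, time DESC);

-- Unique indexes must include the time column:
CREATE UNIQUE INDEX conditions_device_time_uidx ON conditions (device_id, time);
-- This one would be rejected on a hypertable because it omits "time":
-- CREATE UNIQUE INDEX ON conditions (device_id);
```

Each extra index adds work on every insert, so only add the composite index if your queries actually filter on device_id.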
In general, the considerations for creating indexes are similar to those in plain PostgreSQL: indexes involve tradeoffs. They add overhead during data ingestion but can improve SELECT queries significantly. I suggest checking the TimescaleDB best practices for indexes and the blog post about using composite indexes for time-series queries.

Incremental update of a Postgres index

I have inserted a lot of data (more than 2 million documents) into a table and created a full-text search index using GIN, and it works great. I can query the database and retrieve the appropriate documents rapidly.
Regularly, I collect new data that I can insert into the database. What I would like to do is update my index with the new data only, but I have failed so far. I don't want to drop the index and recreate it, because recreating it takes ages. I basically would like to do an incremental update of the index. I can let the index update on the fly while the data is being inserted, but that is very, very slow. I read that creating an index on already-inserted data is faster than maintaining it during the inserts (true), so I guessed that updating an index with only the new data could be done as well. But I haven't managed it so far.
I use PostgreSQL 12.
Can anybody help me, please?
There is no way to suspend adding values to the index while you load data.
But GIN indexes already have a feature to optimize that: the GIN fast update technique.
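Set the gin_pending_list_limit storage parameter of the index to a high value: new entries then accumulate in an unsorted pending list instead of being inserted into the main index one by one. Once you are done with the bulk load, VACUUM the table to merge the pending list into the main index.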
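A minimal sketch of the fast-update approach, assuming a hypothetical documents table with a tsvector column body_tsv and an existing GIN index documents_fts_idx; the file path is a placeholder. gin_pending_list_limit is given in kilobytes.

```sql
-- fastupdate is already on by default for GIN; raise the pending-list limit
ALTER INDEX documents_fts_idx SET (fastupdate = on, gin_pending_list_limit = 1048576);  -- 1 GB

-- Bulk load: new index entries go to the pending list, not the main GIN tree
COPY documents (id, body_tsv) FROM '/path/to/new_batch.csv' WITH (FORMAT csv);

-- Merge the pending list into the main index
VACUUM documents;
-- or, for this index only:
-- SELECT gin_clean_pending_list('documents_fts_idx');

-- Optionally go back to the default limit afterwards
ALTER INDEX documents_fts_idx RESET (gin_pending_list_limit);
```

Until the pending list is merged, searches have to scan it linearly, so queries running during the load may be slower.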
An alternative approach is to use partitioning and load each new batch into its own partition in one go. Then create the index on that partition and attach the partition to the partitioned table.
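A minimal sketch of the partitioning approach for PostgreSQL 12, assuming a hypothetical documents table range-partitioned by a loaded_on date column; the partition name, date range and file path are placeholders. When a partition is attached, PostgreSQL adopts a matching existing index on it as a partition of the parent index instead of building a new one.

```sql
-- Parent table, partitioned by load date; index defined once on the parent
CREATE TABLE documents (
    id        bigint,
    loaded_on date NOT NULL,
    body_tsv  tsvector
) PARTITION BY RANGE (loaded_on);
CREATE INDEX documents_fts_idx ON documents USING GIN (body_tsv);

-- 1. Load the new batch into a standalone table (no index yet, so COPY is fast)
CREATE TABLE documents_2020_06 (LIKE documents INCLUDING DEFAULTS);
COPY documents_2020_06 (id, loaded_on, body_tsv)
    FROM '/path/to/batch.csv' WITH (FORMAT csv);

-- 2. Build the GIN index on this batch only
CREATE INDEX documents_2020_06_fts_idx ON documents_2020_06 USING GIN (body_tsv);

-- 3. A matching CHECK constraint lets ATTACH skip its validation scan
ALTER TABLE documents_2020_06 ADD CONSTRAINT documents_2020_06_range
    CHECK (loaded_on >= '2020-06-01' AND loaded_on < '2020-07-01');

-- 4. Attach; documents_2020_06_fts_idx becomes part of documents_fts_idx
ALTER TABLE documents ATTACH PARTITION documents_2020_06
    FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');
```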

Postgres alter index vs drop index and create index

I have to write a migration command to remove a column from an index. Let's say table1 currently has an index on col1 and col2.
I want to remove col1 from the index. I am looking at https://www.postgresql.org/docs/9.4/static/sql-alterindex.html, but it doesn't seem that I can just remove a column.
If I can, is it better to remove the column in place (and how), or to:
Create a new index
Drop the old index
I also need to do the reverse for a downgrade, so I'm wondering how to achieve this.
ALTER INDEX can rename an index or change things such as its tablespace or storage parameters, but it cannot change the indexed columns. By default, Postgres builds B-tree indexes, whose entries are sorted by the full key; removing a column changes the key and invalidates that ordering, so the B-tree has to be built from scratch.
If you want some more details on how indices work under the hood, this is a good article: Postgres Indices Under the Hood
You’re right: you’ll have to create a new single-column index and then drop the old two-column index.
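A sketch of the upgrade and downgrade steps, assuming hypothetical index names table1_col1_col2_idx (old) and table1_col2_idx (new). CONCURRENTLY avoids blocking writes, but it cannot run inside a transaction block, so the migration tool must not wrap these statements in a transaction.

```sql
-- Upgrade: build the narrower index first, then drop the old one,
-- so queries always have an index available
CREATE INDEX CONCURRENTLY table1_col2_idx ON table1 (col2);
DROP INDEX CONCURRENTLY table1_col1_col2_idx;

-- Downgrade: the same steps in reverse
CREATE INDEX CONCURRENTLY table1_col1_col2_idx ON table1 (col1, col2);
DROP INDEX CONCURRENTLY table1_col2_idx;
```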

Will postgresql generate index automatically?

Is there an automatic index in PostgreSQL, or do users need to create indexes explicitly? If there is an automatic index, how can I view it? Thanks.
Indexes on the primary key and on unique constraints are created automatically. Use CREATE INDEX to create additional indexes. To view the existing table structure, including its indexes, use \d table_name in psql.
A quick example of generating an index would be:
CREATE INDEX my_index_name ON my_table (my_column);
You can create an index on multiple columns:
CREATE INDEX my_index_name ON my_table (column1, column2, column3);
Or a partial index, which only covers the rows that satisfy a condition:
CREATE INDEX my_index_name ON my_table (my_column) WHERE my_column > 0;
There is a lot more you can do with indexes, but that is for the PostgreSQL documentation to tell you. Also, if you create an index on a production database, use CREATE INDEX CONCURRENTLY: it takes longer, but it does not lock out writes to the table while it runs. Let me know if you have any other questions.
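A short sketch of the concurrent build, using the hypothetical names my_table, my_column and my_table_my_column_idx. If a concurrent build fails partway, PostgreSQL leaves an INVALID index behind, which is worth checking for.

```sql
-- Build without blocking writes (not allowed inside a transaction block)
CREATE INDEX CONCURRENTLY my_table_my_column_idx ON my_table (my_column);

-- A failed concurrent build leaves an invalid index; list any such leftovers
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the leftover and retry the build
DROP INDEX CONCURRENTLY my_table_my_column_idx;
```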
Update:
If you want to view indexes with pure SQL, query the pg_catalog.pg_indexes view:
SELECT *
FROM pg_catalog.pg_indexes
WHERE schemaname='public'
AND tablename='table';
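To see the automatic indexes in practice, here is a small example with a hypothetical accounts table. PostgreSQL names the implicit indexes after the table and constraint, so the query should list accounts_pkey and accounts_email_key.

```sql
CREATE TABLE accounts (
    id    serial PRIMARY KEY,  -- implicit unique B-tree index accounts_pkey
    email text   UNIQUE        -- implicit unique B-tree index accounts_email_key
);

SELECT indexname, indexdef
FROM pg_catalog.pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'accounts';
```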

Confused about clustered vs. nonclustered indexes (5 questions)

Are clustered and nonclustered indexes both B-trees? I read that clustered indexes affect how the data is physically stored in the table, whereas with nonclustered indexes a separate copy of the column is created and stored in sorted order. Also, SQL Server creates a clustered index on the primary key by default.
My questions:
1) Do nonclustered indexes occupy more space than clustered indexes, since a separate copy of the column is stored?
2) How do clustered and nonclustered indexes work when the primary key has two columns, say (StudentName, Marks)?
3) Are there only two types of indexes? If so, what are bitmap indexes? I can't find such an index type in SQL Server Management Studio, but my data-warehousing book mentions all these types.
4) Is it more efficient to create a clustered or a nonclustered index on the primary key?
5) Suppose we create a clustered index on name, i.e. the data is physically stored sorted by name, and then a new record is inserted. How does the new record find its place in the table?
Thanks in advance :)
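In SQL Server, both clustered and nonclustered indexes are B-trees. The difference is where the rows live: the leaf level of a clustered index is the table's data pages themselves, whereas a nonclustered index is stored separately and its entries point to the data rows.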
A clustered index sorts and stores the table's data pages according to the columns defined for the index. An index key can span several columns (and since SQL Server 2005 a nonclustered index can also carry extra non-key columns via INCLUDE), so a composite primary key is not a problem: with (StudentName, Marks) the rows are simply sorted by StudentName and then by Marks. You can think of a clustered index as a set of filing cabinets with folders. The first drawer holds documents starting with A, and the first folder in that drawer may hold documents from AA to AC, and so on. To find "Spider", you can go straight to the S drawer, look for the folder containing "SP", and quickly find what you are looking for. But if you physically sort all the documents by one index, you obviously cannot also physically sort the same documents by another. Hence, there can be only one clustered index per table.
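A T-SQL sketch of question 2, using a hypothetical dbo.Results table built around the (StudentName, Marks) key from the question; the Subject column and index names are also hypothetical. In a table that has a clustered index, each nonclustered index row locates its data row through the clustering key.

```sql
CREATE TABLE dbo.Results (
    StudentName nvarchar(100) NOT NULL,
    Marks       int           NOT NULL,
    Subject     nvarchar(50)  NULL,
    -- Composite primary key as the clustered index: rows are stored
    -- ordered by StudentName, then by Marks
    CONSTRAINT PK_Results PRIMARY KEY CLUSTERED (StudentName, Marks)
);

-- Separate B-tree on Subject; its leaf rows carry (StudentName, Marks)
-- as the pointer back to the data row
CREATE NONCLUSTERED INDEX IX_Results_Subject
    ON dbo.Results (Subject);
```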
A nonclustered index is like the table of contents or the index at the back of a book. Regarding your specific questions:
1) Yes, a nonclustered index occupies extra space, though usually much less than the table itself, because it stores only the indexed (and any included) columns plus a row locator. That is why you must choose your indexes carefully. There is also a small performance cost for inserts, updates and deletes, since every index has to be maintained.
3) Your book covers all the theoretical index types. Bitmap indexes are useful in data-warehousing applications and for columns with only a few distinct values, such as days of the week, so they are not generally used in a typical transactional RDBMS. SQL Server has no bitmap index type that you can create; Oracle does implement them, but I don't know much about that.
4) The efficiency of an index depends on how the column is used. If most of the lookups in your table are done on the primary key, then an index on the primary key makes sense. You usually add indexes to columns that appear in the WHERE clause or in the join conditions of your queries.
5) On insert, the index has to be maintained: SQL Server finds the page where the new row belongs in clustered-key order and writes it there, and if that page is full it splits the page. That is the little extra work the system does to rearrange things.
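To check point 1 on your own table, a query like the following compares the space used by each index. It assumes the hypothetical dbo.Results table; pages are 8 KB, and the clustered index row (index_id 1) includes the table data itself. Reading sys.dm_db_partition_stats requires VIEW DATABASE STATE permission.

```sql
SELECT i.name      AS index_name,
       i.type_desc,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.Results')
GROUP BY i.name, i.type_desc;
```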