Is index necessary in temp-table? - progress-4gl

I am joining two existing database tables by way of a temp-table. Is it necessary to create an index when creating temp-tables in Progress?

It's not necessary but it is a good idea!
If you care about performance you should add NO-UNDO as well as the correct indexes to your temp-tables. This has been stated in numerous "best practices" documents.
It can be hard to know beforehand whether an added index will actually buy you anything, but the cost of adding one is usually very low. Just add indexes that match how you query the temp-table!
Since version 11.6 the XREF compile option also reports temp-table index usage.
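A minimal ABL sketch of that advice, using a hypothetical ttCustomer temp-table whose only query predicate is custNum:

/* Hypothetical temp-table: NO-UNDO plus an index that matches the query. */
DEFINE TEMP-TABLE ttCustomer NO-UNDO
    FIELD custNum  AS INTEGER
    FIELD custName AS CHARACTER
    INDEX idxCustNum IS PRIMARY UNIQUE custNum.

/* This FIND can be resolved via idxCustNum instead of scanning the table. */
FIND FIRST ttCustomer WHERE ttCustomer.custNum = 42 NO-ERROR.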

Related

Choosing the right database index type

I have a very simple Mongo database for a personal nodejs project. It's basically just records of registered users.
My most important field is an alpha-numeric string (let's call it user_id and assume it can't be only numeric) of about 15 to 20 characters.
Now the most important operation is checking whether the user exists at all. I do this by querying db.collection.find({"user_id": "testuser-123"}).
If no record is returned, I save the user along with some other, less important data like first name, last name and signup date.
Now I obviously want to make user_id an index.
I read the Indexing Tutorials on the official MongoDB Manual.
First I tried setting a text index because I thought that would fit the alpha-numeric field. I also tried setting language: none. But it turned out that my query returned in ~12ms with the text index instead of ~6ms without any index.
Then I tried just setting an ordered index like {user_id: 1}, but I didn't see any difference (does it only work for numeric values?).
Can anyone recommend the best type of index for this case, or the quickest query to check whether the user exists? Or is MongoDB maybe not the best match for this?
Some random thoughts first:
A text index is used to support full-text search. Given your description this is not what is needed here, as, if I understand it well, you need an exact match of the whole field.
Without any index, MongoDB will use a linear search. Using big O notation, this is an O(n) operation. With an (ordered) index, the search is performed in O(log(n)). That means an index will dramatically speed up queries once you have many documents. But you will not necessarily see any improvement if you have a small number of documents; in that case, the O(n) scan can even beat the O(log(n)) index lookup. Some database management systems don't even bother using the index if the optimizer estimates that it will not provide enough benefit. I don't know if MongoDB does that, though.
Given your use case, I think the proper index is a unique index. This is an ordered index that also prevents the insertion of two documents with the same key.
In your application, do not test before you insert. In a real application this could lead to a race condition when you have concurrent inserts. If you use a unique index, just try to insert -- and be prepared to gracefully handle the error caused by a duplicate key.
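Something like this in the mongo shell (the users collection name and the extra fields are just placeholders):

// Hypothetical collection; the unique index both speeds up the lookup
// and rejects a second document with the same user_id.
db.users.createIndex({ user_id: 1 }, { unique: true });

try {
    db.users.insertOne({ user_id: "testuser-123", first_name: "Test" });
    print("user created");
} catch (e) {
    // Duplicate-key violations typically surface as error code 11000.
    if (e.code === 11000) {
        print("user already exists");
    } else {
        throw e;
    }
}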

How to find implicitly generated indexes on postgresql

Is there any way to detect whether an index was generated automatically by the DBMS when adding, e.g., a primary key or a unique constraint?
At the moment I fetch all the indexes of a table using JDBC metadata, but the result also contains the implicitly generated indexes. I now need a way to detect whether a given index was auto-generated or not.
I've already tried to get this information from catalogs like pg_class or pg_index, but with no success.
I don't think there is any way to distinguish those indexes - after all there is no difference between an index created automatically and one that was created manually.
The only way I can think of is to stick to some naming convention for your "manual" indexes. Then you could filter out all those that do not comply with that naming convention.
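For example, something along these lines (the "ix_" prefix and the table name are only an assumed convention):

-- List only indexes that follow a hypothetical "ix_" naming convention;
-- anything else is treated as implicitly generated.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'my_table'
AND    indexname LIKE 'ix\_%';   -- backslash escapes the underscore wildcard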

How to determine what type of index to use in Postgres?

I have a Postgres database that has 2 columns that are not primary keys (nor can they be), but are searched on a lot and are compared for equality to 2 columns in other tables.
I believe this is a perfect case for adding an index to my tables. I have never used indexing on a database before so I am trying to learn the proper way of doing this.
I have learned that there are multiple types of indexing I can pick from. How do I determine what method will be the most efficient for my database? Also would the proper method be to create a single index that covers both columns?
Postgres supports the B-tree, R-tree, Hash, GiST and GIN index types. B-tree indexing is the most common and fits the most common scenarios. This is the syntax:
CREATE INDEX index_name ON table_name USING btree(column1, column2);
See the CREATE INDEX documentation for the full syntax, and the chapter on index types in the PostgreSQL manual for more info on the different kinds of indexes.
What type of index you should use depends on what types of operations you want to perform. If you simply want equality checks then a hash index is the best fit. For the most common operations (e.g. comparison, pattern matching) B-tree should be used. I have personally never used GiST or GIN indexing. Any guru out there?
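For example, an equality-only hash index on a hypothetical users.email column (ranges and sorting would still need btree):

-- Hash indexes only support the = operator.
CREATE INDEX idx_users_email_hash ON users USING hash (email);

-- This lookup can use the hash index:
SELECT * FROM users WHERE email = 'someone@example.com';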
The documentation describes all these types. They can help you better than me :)
Hope this helps.
Try to understand the query planner as well, because this part of PostgreSQL has to work with your indexes. EXPLAIN ANALYZE will be essential to optimise your queries.
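A quick sketch of that check, using a hypothetical orders table and the two-column index idea from the question:

-- Hypothetical two-column btree index covering both searched columns.
CREATE INDEX idx_orders_customer_region
    ON orders USING btree (customer_code, region_code);

-- EXPLAIN ANALYZE shows whether the planner picks an Index Scan or
-- falls back to a Seq Scan for your actual query.
EXPLAIN ANALYZE
SELECT *
FROM   orders
WHERE  customer_code = 'ABC-123'
AND    region_code   = 'EU';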

What's the difference between B-Tree and GiST index methods (in PostgreSQL)?

I have been working on optimizing my Postgres databases recently, and traditionally I've only ever used B-Tree indexes. However, I saw in the Postgres 8.3 documentation that GiST indexes support non-unique, multicolumn indexes.
I couldn't, however, see what the actual difference between them is. I was hoping that my fellow coders might be able to explain what the pros and cons of each are, and more importantly, the reasons why I would use one over the other?
In a nutshell: B-Tree indexes perform better, but GiST indexes are more flexible. Usually, you want B-Tree indexes if they'll work for your data type. There was a recent post on the PG lists about a huge performance hit for using GiST indexes; they're expected to be slower than B-Trees (such is the price of flexibility), but not that much slower... work is, as you might expect, ongoing.
From a post by Tom Lane, a core PostgreSQL developer:
The main point of GIST is to be able to index queries that simply are not indexable in btree. ... One would fully expect btree to beat out GIST for btree-indexable cases. I think the significant point here is that it's winning by a factor of a couple hundred; that's pretty awful, and might point to some implementation problem.
Basically everybody's right - btree is the default index because it performs very well. GiST is a somewhat different beast - it's more of a "framework to write index types" than an index type on its own. You have to add custom code (in the server) to use it, but on the other hand it is very flexible.
Generally you don't use GiST unless the datatype you're using tells you to do so. Examples of datatypes that use GiST: ltree (from contrib), tsvector (contrib/tsearch till 8.2, in core since 8.3), and others.
There is a well-known and pretty fast geographic extension to PostgreSQL - PostGIS (http://postgis.refractions.net/) - which uses GiST for its purposes.
GiST indexes are more general. You can use them for broader purposes than the ones you would use B-Tree for, including the ability to build a B-Tree-equivalent index with GiST.
I.e.: you can use GiST to index geographical points or geographical areas, something you won't be able to do with B-Tree indexes, since the only thing that matters in a B-Tree is the key (or keys) you are indexing on.
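A small sketch of that idea, using a hypothetical places table and the built-in point type:

-- GiST supports geometric operators that btree cannot index.
CREATE TABLE places (name text, location point);
CREATE INDEX idx_places_location ON places USING gist (location);

-- "Is this point inside the box?" can be answered via the GiST index.
SELECT name
FROM   places
WHERE  location <@ box '((0,0),(10,10))';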
GiST indexes are lossy to an extent, meaning that the DBMS has to deal with false positives/negatives, i.e.:
GiST indexes are lossy because each document is represented in the index by a fixed-length signature. The signature is generated by hashing each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct.
b-trees do not have this behavior, so depending on the data being indexed, there may be some performance difference between the two.
See http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html for text-search behaviour and http://www.postgresql.org/docs/8.3/static/indexes-types.html for a general-purpose comparison.

PostgreSQL: GIN or GiST indexes?

From what information I could find, they both solve the same problems - more esoteric operations like array containment and intersection (&&, @>, <@, etc.). However, I would be interested in advice about when to use one or the other (or neither, possibly).
The PostgreSQL documentation has some information about this:
GIN index lookups are about three times faster than GiST
GIN indexes take about three times longer to build than GiST
GIN indexes are about ten times slower to update than GiST
GIN indexes are two-to-three times larger than GiST
However, I would be particularly interested to know if there is a performance impact when the memory-to-index-size ratio starts getting small (i.e. the index size becomes much bigger than the available memory). I've been told on the #postgresql IRC channel that GIN needs to keep the whole index in memory, otherwise it won't be effective, because, unlike B-Tree, it doesn't know which part to read in from disk for a particular query. The question is: is this true (because I've also been told the opposite)? Does GiST have the same restrictions? Are there other restrictions I should be aware of while using one of these indexing algorithms?
First of all, do you need to use them for text-search indexing? GIN and GiST are indexes specialized for certain data types. If you need to index simple char or integer values then a normal B-Tree index is the best.
Anyway, the PostgreSQL documentation has a chapter on GiST and one on GIN, where you can find more info.
And, last but not least, the best way to find out which is best is to generate sample data (as much as you need for a realistic scenario), then create a GiST index and measure how much time is needed to create the index, insert a new value, and execute a sample query. Then drop the index and do the same with a GIN index. Compare the values and you will have the answer you need, based on your data.
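A rough psql sketch of that comparison, assuming a hypothetical documents table with a tsvector column body_tsv (turn on \timing to capture the build times):

-- GiST first: build it, check its size, time a query, then drop it.
CREATE INDEX idx_docs_gist ON documents USING gist (body_tsv);
SELECT pg_size_pretty(pg_relation_size('idx_docs_gist'));
EXPLAIN ANALYZE
SELECT count(*) FROM documents WHERE body_tsv @@ to_tsquery('english', 'example');
DROP INDEX idx_docs_gist;

-- Same exercise with GIN; compare build time, index size and query time.
CREATE INDEX idx_docs_gin ON documents USING gin (body_tsv);
SELECT pg_size_pretty(pg_relation_size('idx_docs_gin'));
EXPLAIN ANALYZE
SELECT count(*) FROM documents WHERE body_tsv @@ to_tsquery('english', 'example');
DROP INDEX idx_docs_gin;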