GIST Index Expression based on Geography Type Column Problems - postgresql

I have a question about how PostgreSQL uses indexes. I am having problems with a GiST expression index based on a geography-type column in a PostGIS-enabled PostgreSQL database.
I have the following table:
CREATE TABLE place
(
id serial NOT NULL,
name character varying(40) NOT NULL,
location geography(Point,4326),
CONSTRAINT place_pkey PRIMARY KEY (id )
)
Then I created a GiST expression index based on the "location" column:
CREATE INDEX place_buffer_5000m ON place
USING GIST (ST_BUFFER(location, 5000));
Now suppose that table route has a column shape containing a LineString object, and I want to check which of the 5000m polygons (around the locations) the line crosses.
In my opinion the query below should use the "place_buffer_5000m" index, but it does not:
SELECT place.name
FROM place, route
WHERE
route.id=1 AND
ST_CROSSES(route.shape::geometry, ST_BUFFER(place.location, 5000)::geometry);
Table place has about 76,000 rows. ANALYZE and VACUUM were run on this table and the "place_buffer_5000m" index was recreated, but the index is still not used for the above query.
What is funny is that when I add another column named "area_5000m" (geography type) to table place and populate it like this:
UPDATE place SET area_5000m=ST_BUFFER(location, 5000)
And then create a GiST index on this column:
CREATE INDEX place_area_5000m ON place USING GIST (area_5000m)
Then using the query:
SELECT place.name
FROM place, route
WHERE
route.id=1 AND
ST_CROSSES(route.shape::geometry, place.area_5000m::geometry);
The index "place_area_5000m" is used.
The question is: why is the expression index calculated from the location column not used?

Did you try adding a cast to your "functional index"?
This could help PostgreSQL determine the data type.
It should work with geometry, and probably also with geography, like this:
CREATE INDEX place_buffer_5000m ON place
USING GIST(ST_BUFFER(location, 5000)::geometry);
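Note that PostgreSQL only considers an expression index when the query contains the same expression that was indexed, so the WHERE clause has to spell out ST_BUFFER(location, 5000)::geometry exactly as in the index definition. A sketch of how you could verify this, assuming the cast index above has been created:
EXPLAIN
SELECT place.name
FROM place, route
WHERE route.id = 1
AND ST_CROSSES(route.shape::geometry, ST_BUFFER(place.location, 5000)::geometry);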

Ultimately, you want to know what routes are within 5 km of places, which is a really simple and common type of query. However, you are falling into a common trap: don't use ST_Buffer to filter! It is expensive!
Use ST_DWithin, which will use a regular GiST index (if available):
SELECT place.name
FROM place, route
WHERE route.id = 1 AND ST_DWithin(route.shape::geography, place.location, 5000);
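For ST_DWithin to use an index on the place side, a plain GiST index on the geography column itself is all that's needed; a minimal sketch (the index name is just illustrative):
CREATE INDEX place_location_gist ON place USING GIST (location);
With that index in place, the distance search can be answered from the index instead of buffering every row.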

Related

fuzzy finding through database - prisma

I am trying to build a storage manager where users can store their lab samples/data. Unfortunately, this means that the tables will end up being quite dynamic, as each sample might have different data associated with it. I will still require users to define a schema, so I can display the data properly, however, I think this schema will have to be represented as a JSON field in the underlying database.
I was wondering, in Prisma, is there a way to fuzzy-search through collections? Could I type something like help and then return all rows that match this expression ANYWHERE in their columns (including the JSON fields)? Could I do something like this at all with PostgreSQL? Or with MongoDB?
thank you
You can easily do that with jsonb in PostgreSQL.
If you have a table defined like
CREATE TABLE userdata (
id bigint PRIMARY KEY,
important_col1 text,
important_col2 integer,
other_cols jsonb
);
You can create an index like this
CREATE INDEX ON userdata USING gin (other_cols);
and search efficiently with
SELECT id FROM userdata WHERE other_cols @> '{"attribute": "value"}';
Here, @> is the jsonb containment operator in PostgreSQL.
Yes, in PostgreSQL you surely can do this. It's quite straightforward. Here is an example.
Let your table be called the_table, aliased as tht. Cast an entire table row to text with tht::text and use the case-insensitive regular expression match operator ~* to find rows that contain help in this text. You can use more elaborate and powerful regular expressions for searching, too.
Please note that since the ~* operator will defeat any index, this query will result in a sequential scan.
select * -- or whatever list of expressions you need
from the_table as tht
where tht::text ~* 'help';

Indexing PostgreSQL JSONB Array Elements

Like the title says, how can I index a JSONB array?
The contents look like...
["some_value", "another_value"]
I can easily access the elements like...
SELECT * FROM table WHERE data->>0 = 'some_value';
I created an index like so...
CREATE INDEX table_data_idx ON table USING gin ((data) jsonb_path_ops);
When I run EXPLAIN, I still see it sequentially scanning...
What am I missing on indexing an array of text elements?
If you want to support that exact query with an index, the index would have to look like this:
CREATE INDEX ON "table" ((data->>0));
If you want to use the index you have, you cannot limit the search to just a specific array element (in your case, the first). You can speed up a search for some_value anywhere in the array:
SELECT * FROM "table"
WHERE data #> '["some_value"]'::jsonb;
I ended up taking a different approach. I was still having problems getting the search to work with the JSONB type, so I switched my column to a varchar ARRAY:
CREATE TABLE table (
data varchar ARRAY NOT NULL
);
CREATE INDEX table_data_idx ON table USING GIN (data);
SELECT * FROM table WHERE data @> '{some_value}';
This works and is using the index.
I think the problem with my JSONB approach is that the element is actually nested much deeper and is being treated as text,
i.e. data->'some_key'->>'array_key'->>0
and every time I try to search I get all sorts of invalid token errors and other such things.
You may want to create a materialized view that has the primary key (or other unique index of your table) and expands the array field into a text column with the jsonb_array_elements_text function:
CREATE MATERIALIZED VIEW table_mv
AS
SELECT DISTINCT table.id, jsonb_array_elements_text(data) AS array_elem FROM table;
You can then create a unique index on this materialized view (primary keys are not supported on materialized views):
CREATE UNIQUE INDEX table_array_idx ON table_mv(id, array_elem);
Then query with a join to the original table on its primary key:
SELECT * FROM table INNER JOIN table_mv ON table.id = table_mv.id WHERE table_mv.array_elem = 'some_value';
This query should use the unique index and then look up the primary key of the original table, both very fast.
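Keep in mind that a materialized view is not updated automatically; after changes to the base table you have to refresh it, and with the unique index above in place you can do that without blocking readers:
REFRESH MATERIALIZED VIEW CONCURRENTLY table_mv;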

jsonb data type lookup cost in postgres

This might be an obvious and simple question.
I read through the jsonb data type documentation, but nowhere does it mention the lookup cost of a key in jsonb data.
For example, let's say I have a table with following schema:
CREATE TABLE A (id character varying (20),
info jsonb);
I want to know how Postgres would handle a WHERE query like the one below:
SELECT * FROM A WHERE info->>'city' = 'portland';
When going through the jsonb field of a row, is the key lookup constant time (O(1)) or linear time (checking each key one by one in the row's jsonb dictionary)?
My intuition is that it must be constant time (else what's the point of a dictionary style data?) but I can't see it in the official documentation to convince my team.
Any help would be great!
Thanks!
As with any WHERE condition in SQL: if there is no index, the database has to go through all rows of the table to find those that satisfy your condition.
You can either index a specific expression, or you can index the whole json value using a GIN index, which then enables Postgres to use the index whenever one of the supported operators is used.
If you always check for the city, you can create a regular B-Tree index:
create index on a ( (info->>'city') );
If you don't know what you will be looking for, a GIN index might be a better choice:
create index on a using gin (info);
But you will need to change your query to use one of the operators supported by a GIN index, e.g. the containment operator @>:
select *
from a
where info #> '{"city": "portland"}::jsonb;
Note that an index lookup is not always the most efficient solution. Sometimes it's faster to simply go through all rows, sometimes the index lookup is faster.
If you want to learn more about indexes in relational databases, go through the material here: http://use-the-index-luke.com/

How to properly structure a Multicolumn Index with a partial field search

What is the best way to set up a multicolumn index using the full_name column and the state column? The search will use the exact state with a partial search on the full_name column. The query will look like this:
WHERE full_name ~* 'jones' AND state = 'CA';
Searching roughly 20 million records.
Thanks!
John
The state seems straightforward enough -- a normal index should suffice. As for the full name search, this is a lot of work, but with 20 million records, I think the dividends will speak for themselves.
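For the state part, a plain B-tree index should be enough; a minimal sketch (the index name is just an example):
create index <blah>_state_ix on <blah> (state);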
Create a new field in your table as a tsvector, and call it full_name_search for the sake of this example:
alter table <blah> add column full_name_search tsvector;
Do an initial population of the column:
update <blah>
set full_name_search = to_tsvector (full_name);
If possible, make the field non-nullable.
Create a trigger that will now automatically populate this field whenever a row is inserted or updated:
create trigger <blah>_insert_update
before insert or update on <blah>
for each row execute procedure
tsvector_update_trigger(full_name_search,'pg_catalog.english',full_name);
Add an index on the new field:
create index <blah>_ix1 on <blah>
using gin(full_name_search);
From here, restructure the query to search on the tsvector field instead of the text field:
WHERE full_name_search @@ to_tsquery('jones') AND state = 'CA';
You can take shortcuts on some of these steps (for example, skip the extra field and use an expression index instead), and that will still get you improved performance, just not as much as the full approach.
One caveat -- I think to_tsvector will split the contents into vector components based on logical breaks, so this:
Catherine Jones Is a Nice Lady
will work fine, but this:
I've been Jonesing all day
Probably won't.
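If in doubt, you can inspect the tokens the parser actually produces for a given string (results depend on the text search configuration in use):
SELECT to_tsvector('english', 'Catherine Jones Is a Nice Lady');
SELECT to_tsvector('english', 'I''ve been Jonesing all day');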

sqlite3 database help in improving performance and design

I have a sqlite3 database with this schema:
CREATE TABLE [dict] (
[Entry] [CHAR(209)],
[Definition] [CHAR(924975)]);
CREATE INDEX [i_dict_entry] ON [dict] ([Entry]);
It's a kind of dictionary with 260,000 records and nearly 1 GB in size; I have created an index on the Entry column to improve performance.
A sample of a row's Entry column looks like this:
|love|lovingly|loves|loved|loving|
All the words separated with | refer to the same definition. (I put all of them in one string, separated with |, to prevent duplicating data in the Definition column.)
and this is the command that I use to retrieve the results:
SELECT * FROM dict WHERE Entry like '%|loves|%'
execution time: ~1.7s
If I use the = operator instead of the LIKE operator, the execution is nearly instantaneous:
SELECT * FROM dict WHERE Entry='|love|lovingly|loves|loved|loving|'
but this way I can't search for words like love, loves, ... (separately, I mean).
My questions:
Although I have created an index on the Entry column, is indexing really effective when the LIKE pattern contains % wildcards?
What about the idea of creating a separate row for each part of the composite Entry column (one for love, another for loves, ... all with the same definition) and then using the = operator? If so, is there any way of referencing the data? I mean, rather than repeating the same Definition for each entry, create it once and have all the other rows point to it; is that possible?
Thanks in advance for any tips and suggestions.
Every entry should have a separate row in the database:
CREATE TABLE Definitions (
DefinitionID INTEGER PRIMARY KEY,
Definition TEXT
);
CREATE TABLE Entries (
EntryID INTEGER PRIMARY KEY,
DefinitionID INTEGER REFERENCES Definitions(DefinitionID),
Entry TEXT
);
CREATE INDEX i_entry ON Entries(Entry);
You can then query the definition by joining the two tables:
SELECT Definition
FROM Entries
JOIN Definitions USING (DefinitionID)
WHERE Entry = 'loves'
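A minimal sketch of how one of the original rows would be split under this schema (the definition text is just a placeholder):
INSERT INTO Definitions (DefinitionID, Definition) VALUES (1, '...definition text...');
INSERT INTO Entries (DefinitionID, Entry) VALUES (1, 'love'), (1, 'lovingly'), (1, 'loves'), (1, 'loved'), (1, 'loving');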
Also see Database normalization.