sqlite3 database help in improving performance and design - select

I have a sqlite3 database with this schema:
CREATE TABLE [dict] (
[Entry] [CHAR(209)],
[Definition] [CHAR(924975)]);
CREATE INDEX [i_dict_entry] ON [dict] ([Entry]);
it's a kind of dictionary with 260000 records and nearly 1GB of size; I have created an index for the Entry column to improve performance;
a sample of a row's entry column is like this:
|love|lovingly|loves|loved|loving|
All the words which are separated with | are referring to the same definition;(I put all of them in one string, separated with | to prevent duplication of data in Definition column)
and this is the command that I use to retrieve the results:
SELECT * FROM dict WHERE Entry like '%|loves|%'
execution time: ~1.7s
if I use = operator instead of LIKE operator, the execution is nearly instantaneous;
SELECT * FROM dict WHERE Entry='|love|lovingly|loves|loved|loving|'
but this way I can't search for words like: love,loves...(separately I mean)
My questions:
Although I have created an index for the Entry column, is indexing really effective while we are using LIKE operator with % in it?
what about the idea that I create different rows for each part of composite Entry columns(one for love another for loves...then all will have the same definition) and then use = operator? if yes; is there anyway of referencing of data? I mean rather than repeating the same Definition for each entry, create one and all others point to it; is it possible?
thanks in advance for any tip and suggestion;

Every entry should have a separate row in the database:
CREATE TABLE Definitions (
DefinitionID INTEGER PRIMARY KEY,
Definition TEXT
);
CREATE TABLE Entries (
EntryID INTEGER PRIMARY KEY,
DefinitionID INTEGER REFERENCES Definitions(DefinitionID),
Entry TEXT
);
CREATE INDEX i_entry ON Entries(Entry);
You can then query the definition by joiing the two tables:
SELECT Definition
FROM Entries
JOIN Definitions USING (DefinitionID)
WHERE Entry = 'loves'
Also see Database normalization.

Related

Indexing on Timescaledb

I am testing some queries on Postgresql extension Timescaledb.
The table is called timestampdb and i run some queries on that seems like this
select id13 from timestampdb where timestamp1 >='2010-01-01 00:05:00' and timestamp1<='2011-01-01 00:05:00',
select avg(id13)::numeric(10,2) from timestasmpdb where timestamp1>='2015-01-01 00:05:00' and timestamp1<='2015-01-01 10:30:00'
When i create a hypertable i do this.
create hyper_table('timestampdb','timestamp1')
The thing is that now i want to create an index on id13.
should i try something like this?:
create hyper_table('timestampdb','timestamp1') ,import data of the table and then create index on timestampdb(id13)
or something like this:
create table timestampdb,then create hypertable('timestampdb',timestamp1') ,import the data and then CREATE INDEX ON timestampdb (timestamp1,id13)
What is the correct way to do this?
You can create an index without time dimension column, since you don't require it to be unique. Including time dimension column into an index is needed if an index contains UNIQUE or is PRIMARY KEY, since TimescaleDB partitions a hypertable into chunks on the time dimension column, which is timestamp1 in the question. If partitioning key will include space dimension columns in addition to time, they will need to be included too.
So in your case the following should be sufficient after the migration to hypertable:
create index on timestampdb(id13);
The question contains two queries and none of them need index on id13. It will be valuable to create the index on id13 if you expect different queries than in the question, which will contain condition or join on id13 column.

I have two columns, I want the second column to have the same values as the first column

I have two columns, I want the second column to have the same values as the first column always, in PostgreSQL.
The columns are landmark_id (integer) and name (varchar), I want the name column to always have the same values (id's) from landmark_id.
landmark_id (integer) | name (varchar)
1 1
2 2
3 3
I don't understand why you would want to do that, but I can think of two ways to accomplish your request. One is by using a generated column
CREATE TABLE g (
landmark_id int,
name varchar(100) GENERATED ALWAYS AS (landmark_id::varchar) STORED
)
and the other is by enforcing a constraint
CREATE TABLE c (
landmark_id int,
name varchar(100),
CONSTRAINT equality_cc CHECK (landmark_id = name::varchar)
)
Both approaches will cause the name column to occupy disk space. The first approach will not allow you to specify the name column in INSERT or UPDATE statements. In the latter case, you will be forced to specify both columns when inserting.
You could also have used a trigger to update the second column.
Late edit: Others suggested using a view. I agree that it's a better idea than what I wrote.
Create a view, as suggested by #jarlh in comments. This automatically generates column name for you on the fly. This is usually preferred to storing essentially the same data multiple times as in an actual table, where the data occupies more disk space and also can get out of sync. For example:
CREATE VIEW landmarks_names AS
SELECT landmark_id,
landmark_id::text AS name
FROM landmarks;

Indexing PostgreSQL JSONB Array Elements

Like the title says, how can I index a JSONB array?
The contents look like...
["some_value", "another_value"]
I can easily access the elements like...
SELECT * FROM table WHERE data->>0 = 'some_value';
I created an index like so...
CREATE INDEX table_data_idx ON table USING gin ((data) jsonb_path_ops);
When I run EXPLAIN, I still see it sequentially scanning...
What am I missing on indexing an array of text elements?
If you want to support that exact query with an index, the index would have to look like this:
CREATE INDEX ON "table" ((data->>0));
If you want to use the index you have, you cannot limit the search to just a specific array element (in your case, the first). You can speed up a search for some_value anywhere in the array:
SELECT * FROM "table"
WHERE data #> '["some_value"]'::jsonb;
I ended up taking a different approach. I am still having problems getting the search to work using a JSONB Type, so I ended up switching my column to a varchar ARRAY
CREATE TABLE table (
data varchar ARRAY NOT NULL
);
CREATE INDEX table_data_idx ON table USING GIN (data);
SELECT * FROM table WHERE data #> '{some_value}';
This works and is using the index.
I think my problem with my JSONB approach is because the element is actually nested much further and being treated as text.
i.e. data->'some_key'->>'array_key'->>0
And everytime I try to search I get all sorts of invalid token errors and other such things.
You may want to create a materialized view that has the primary key (or other unique index of your table) and expands the array field into a text column with the jsonb_array_elements_text function:
CREATE MATERIALIZED VIEW table_mv
AS
SELECT DISTINCT table.id, jsonb_array_elements_text(data->0) AS array_elem FROM table;
You can then create a unique index on this materialized view (primary keys are not supported on materialized views):
CREATE UNIQUE INDEX table_array_idx ON table_mv(id, array_elem);
Then query with a join to the original table on its primary key:
SELECT * FROM table INNER JOIN table_mv ON table.id = table_mv.id WHERE table_mv.array_elem = 'some_value';
This query should use the unique index and then look up the primary key of the original table, both very fast.

Compute shared hstore key names in Postgresql

If I have a table with an HSTORE column:
CREATE TABLE thing (properties hstore);
How could I query that table to find the hstore key names that exist in every row.
For example, if the table above had the following data:
properties
-------------------------------------------------
"width"=>"b", "height"=>"a"
"width"=>"b", "height"=>"a", "surface"=>"black"
"width"=>"c"
How would I write a query that returned 'width', as that is the only key that occurs in each row?
skeys() will give me all the property keys, but I'm not sure how to aggregate them so I only have the ones that occur in each row.
The manual gets us most of the way there, but not all the way... way down at the bottom of http://www.postgresql.org/docs/8.3/static/hstore.html under the heading "Statistics", they describe a way to count keys in an hstore.
If we adapt that to your sample table above, you can compare the counts to the # of rows in the table.
SELECT key
FROM (SELECT (each(properties)).key FROM thing1) AS stat
GROUP BY key
HAVING count(*) = (select count(*) from thing1)
ORDER BY key;
If you want to find the opposite (all those keys that are not in every row of your table), just change the = to < and you're in business!

GIST Index Expression based on Geography Type Column Problems

I have question about how postgresql use indexes. I have problems with Gist Index Expression based on Geography Type Column in Postgresql with Postgis enabled database.
I have the following table:
CREATE TABLE place
(
id serial NOT NULL,
name character varying(40) NOT NULL,
location geography(Point,4326),
CONSTRAINT place_pkey PRIMARY KEY (id )
)
Then I created Gist Index Expression based on column "location"
CREATE INDEX place_buffer_5000m ON place
USING GIST (ST_BUFFER(location, 5000));
Now suppose that in table route I have column shape with Linestring object and I want to check which 5000m polygons (around the location) the line crosses.
The query below in my opinion shoud use the "place_buffer_5000m" index but does not use it.
SELECT place.name
FROM place, route
WHERE
route.id=1 AND
ST_CROSSES(route.shape::geometry, ST_BUFFER(place.location, 5000)::geometry))
Table place have about 76000 rows. Analyze and Vacuum was run on this table with recreating "place_buffer_5000m" index but the index is not used during the above query.
What is funny when I create another column in table place named "area_5000m" (geograpthy type) and update the table like:
UPDATE place SET area_5000m=ST_BUFFER(location, 5000)
And then create gist index for this column like this:
CREATE INDEX place_area_5000m ON place USING GIST (area_5000m)
Then using the query:
SELECT place.name
FROM place, route
WHERE
route.id=1 AND
ST_CROSSES(route.shape::geometry, place.area_5000m::geometry))
The index "place_area_5000m" is used.
The question is why the Index expression that is calculated based on location column is not used?
Did you try to add a cast to your "functional index"?
This could help to determine the data type.
It should work with geometry and probably also for geography, like this:
CREATE INDEX place_buffer_5000m ON place
USING GIST(ST_BUFFER(location, 5000)::geometry);
Ultimately, you want to know what routes are within 5 km of places, which is a really simple and common type of query. However, you are falling into a common trap: don't use ST_Buffer to filter! It is expensive!
Use ST_DWithin, which will use a regular GiST index (if available):
SELECT place.name
FROM place, route
WHERE route.id = 1 AND ST_DWithin(route.shape::geography, place.location, 5000);