Size of a GIN index in PostgreSQL

I have created a model in Django.
class MyModel(models.Model):
    features = models.TextField(blank=True, default='')
There are several possible ways the data may be stored in the features field. Some examples below.
feature1;feature2
feature1, feature2
feature1,feature2
And so on. I created a GIN index for that field using migrations.RunSQL() (thanks to the following answer). The PostgreSQL command looks as follows:
CREATE INDEX features_search_idx ON "mymodel" USING gin (regexp_split_to_array("mymodel"."features", '[,;\\s]+'));
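For reference, the character class [,;\s]+ in that index expression splits on runs of commas, semicolons, and whitespace. The equivalent tokenization can be sketched in Python (the sample strings are just the formats listed above):

```python
import re

# Same character class as in the index expression:
# split on one or more commas, semicolons, or whitespace characters.
pattern = r'[,;\s]+'

samples = [
    'feature1;feature2',
    'feature1, feature2',
    'feature1,feature2',
]

for s in samples:
    print(re.split(pattern, s))  # -> ['feature1', 'feature2'] for each sample
```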
Now I need to check the size of the created index in my database. I tried to do it with the following commands
SELECT pg_size_pretty (pg_indexes_size("mymodel"."features_search_idx"));
SELECT pg_size_pretty(pg_indexes_size("features_search_idx")) FROM "mymodel";
The latter one failed with ERROR: column "features_search_idx" does not exist and the former one failed with ERROR: missing FROM-clause entry for table "mymodel".
How can I check the index size?

pg_indexes_size takes an argument of type regclass, that is, an object ID, which can also be written as a single-quoted string containing the object name. So if you don't supply an object ID, you have to supply a string (single quotes) with the name:
SELECT pg_size_pretty(pg_indexes_size('mymodel.features_search_idx'));
Note that pg_indexes_size returns the total size of all indexes attached to a table; to measure a single index, use pg_relation_size('mymodel.features_search_idx') instead.

Related

NUMBER is automatically getting converted to DECFLOAT

I am new to DB2. I am trying to run an ALTER query on an existing table.
Suppose the EMP table already exists in the database with the columns below:
id int
name varchar(50)
Now I am trying to add a new column SALARY; for that I am running the query below.
ALTER TABLE EMP ADD SALARY NUMBER
The above query ran successfully. After that I described the EMP table and it gave me the result below:
ID INTEGER
NAME VARCHAR
SALARY DECFLOAT
Since I am trying to add a column with the NUMBER data type, I don't understand how NUMBER is getting converted to DECFLOAT.
It would be helpful if anybody could explain this.
Db2 version details are as follows:
Service_Level: DB2 v11.1.2.2
Fixpack_num : 2
For Db2 for Linux/Unix/Windows, with the NUMBER data type enabled (it is not the default), this is the documented behaviour. Specifically:
The effects of setting the number_compat database configuration parameter to ON are as follows. When the NUMBER data type is explicitly encountered in SQL statements, the data type is implicitly mapped as follows:
If you specify NUMBER without precision and scale attributes, it is mapped to DECFLOAT(16).
If you specify NUMBER(p), it is mapped to DECIMAL(p).
If you specify NUMBER(p,s), it is mapped to DECIMAL(p,s).

Signature of the Redshift internal "identity" function

While working on a legacy Redshift database, I discovered an unfamiliar pattern for default identity values for an autoincrement column. E.g.:
create table sometable (row_id bigint default "identity"(24078855, 0, '1,1'::text), ...
And surprisingly I wasn't able to find any docs about that identity function. The only thing I was able to dig up is the following:
select * from pg_proc proc
join pg_language lang on proc.prolang = lang.oid
where proc.proname = 'identity';
So I've found out that the function is internal, and its prosrc column is just ff_identity_int64 (not googleable, unfortunately).
Could someone please provide me with some info about its first and second arguments? I mean 24078855 and 0 from that example "identity"(24078855, 0, '1,1'::text). (In '1,1'::text, the first 1 is the start value and the second 1 is the increment step.) But 24078855 and 0 are still mysterious to me.
"identity"(24078855, 0, '1,1'::text) takes, in order:
the table OID
the 0-based column index
the text representation of the parameters provided with the IDENTITY clause
For reference, look at the pg_attrdef table.
The IDENTITY clause is documented in the CREATE TABLE docs here: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html#identity-clause
IDENTITY(seed, step)
Clause that specifies that the column is an IDENTITY column. An IDENTITY column contains unique auto-generated values. The data type for an IDENTITY column must be either INT or BIGINT. When you add rows using an INSERT statement, these values start with the value specified as seed and increment by the number specified as step. When you load the table using a COPY statement, an IDENTITY column might not be useful. With a COPY operation, the data is loaded in parallel and distributed to the node slices. To be sure that the identity values are unique, Amazon Redshift skips a number of values when creating the identity values. As a result, identity values are unique and sequential, but not consecutive, and the order might not match the order in the source files.
I've found that the first argument is the OID of the table. So, in your example (without the schema specified):
select oid from pg_class where relname = 'sometable'

sqlite3 database help in improving performance and design

I have a sqlite3 database with this schema:
CREATE TABLE [dict] (
[Entry] [CHAR(209)],
[Definition] [CHAR(924975)]);
CREATE INDEX [i_dict_entry] ON [dict] ([Entry]);
It's a kind of dictionary with 260,000 records and nearly 1 GB in size; I created an index on the Entry column to improve performance.
A sample of a row's Entry column looks like this:
|love|lovingly|loves|loved|loving|
All the words separated with | refer to the same definition. (I put them all in one string, separated with |, to prevent duplicating the data in the Definition column.)
This is the command that I use to retrieve the results:
SELECT * FROM dict WHERE Entry like '%|loves|%'
execution time: ~1.7s
If I use the = operator instead of LIKE, the execution is nearly instantaneous:
SELECT * FROM dict WHERE Entry='|love|lovingly|loves|loved|loving|'
But this way I can't search for words like love or loves individually.
My questions:
Although I have created an index for the Entry column, is the index actually effective when the LIKE pattern starts with %?
What about the idea of creating a separate row for each part of the composite Entry column (one for love, another for loves, ...), all sharing the same definition, and then using the = operator? If so, is there any way of referencing the data? I mean, rather than repeating the same Definition for each entry, store it once and have all the others point to it; is that possible?
Thanks in advance for any tips and suggestions.
Every entry should have a separate row in the database:
CREATE TABLE Definitions (
DefinitionID INTEGER PRIMARY KEY,
Definition TEXT
);
CREATE TABLE Entries (
EntryID INTEGER PRIMARY KEY,
DefinitionID INTEGER REFERENCES Definitions(DefinitionID),
Entry TEXT
);
CREATE INDEX i_entry ON Entries(Entry);
You can then query the definition by joining the two tables:
SELECT Definition
FROM Entries
JOIN Definitions USING (DefinitionID)
WHERE Entry = 'loves'
Also see Database normalization.
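A minimal sketch of this normalized design, using Python's sqlite3 module with made-up sample data (the definition text is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE Definitions (
    DefinitionID INTEGER PRIMARY KEY,
    Definition TEXT
);
CREATE TABLE Entries (
    EntryID INTEGER PRIMARY KEY,
    DefinitionID INTEGER REFERENCES Definitions(DefinitionID),
    Entry TEXT
);
CREATE INDEX i_entry ON Entries(Entry);
""")

# One definition shared by several entry words (sample data):
# the definition row is stored once, and each word points to it.
conn.execute("INSERT INTO Definitions VALUES (1, 'a deep feeling of affection')")
words = ['love', 'lovingly', 'loves', 'loved', 'loving']
conn.executemany(
    "INSERT INTO Entries (DefinitionID, Entry) VALUES (1, ?)",
    [(w,) for w in words],
)

# An equality lookup like this can use the i_entry index.
row = conn.execute("""
    SELECT Definition
    FROM Entries
    JOIN Definitions USING (DefinitionID)
    WHERE Entry = 'loves'
""").fetchone()
print(row[0])  # -> a deep feeling of affection
```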

GIST Index Expression based on Geography Type Column Problems

I have a question about how PostgreSQL uses indexes. I am having problems with a GiST expression index based on a geography-type column in a PostGIS-enabled PostgreSQL database.
I have the following table:
CREATE TABLE place
(
id serial NOT NULL,
name character varying(40) NOT NULL,
location geography(Point,4326),
CONSTRAINT place_pkey PRIMARY KEY (id )
)
Then I created a GiST expression index based on the "location" column:
CREATE INDEX place_buffer_5000m ON place
USING GIST (ST_BUFFER(location, 5000));
Now suppose that in the route table I have a shape column with a LineString object, and I want to check which 5000 m polygons (around each location) the line crosses.
The query below should, in my opinion, use the "place_buffer_5000m" index, but it does not:
SELECT place.name
FROM place, route
WHERE
route.id=1 AND
ST_CROSSES(route.shape::geometry, ST_BUFFER(place.location, 5000)::geometry);
The place table has about 76,000 rows. ANALYZE and VACUUM were run on this table and the "place_buffer_5000m" index was recreated, but the index is still not used by the above query.
What is funny is that when I create another column in the place table named "area_5000m" (geography type) and update the table like:
UPDATE place SET area_5000m=ST_BUFFER(location, 5000)
and then create a GiST index on this column:
CREATE INDEX place_area_5000m ON place USING GIST (area_5000m)
Then using the query:
SELECT place.name
FROM place, route
WHERE
route.id=1 AND
ST_CROSSES(route.shape::geometry, place.area_5000m::geometry);
the "place_area_5000m" index is used.
The question is: why is the expression index calculated from the location column not used?
Did you try adding a cast to your expression index?
This can help the planner determine the data type.
It should work with geometry, and probably also with geography, like this:
CREATE INDEX place_buffer_5000m ON place
USING GIST(ST_BUFFER(location, 5000)::geometry);
Ultimately, you want to know what routes are within 5 km of places, which is a really simple and common type of query. However, you are falling into a common trap: don't use ST_Buffer to filter! It is expensive!
Use ST_DWithin, which will use a regular GiST index (if available):
SELECT place.name
FROM place, route
WHERE route.id = 1 AND ST_DWithin(route.shape::geography, place.location, 5000);

auto-increment column in PostgreSQL on the fly?

I was wondering if it is possible to add an auto-increment integer field on the fly, i.e. without defining it in a CREATE TABLE statement?
For example, I have a statement:
SELECT 1 AS id, t.type FROM t;
and can I change this to
SELECT some_nextval_magic AS id, t.type FROM t;
I need to create the auto-increment field on the fly in the some_nextval_magic part because the result relation is a temporary one during the construction of a bigger SQL statement. And the value of id field is not really important as long as it is unique.
I searched around here, and the answers to related questions (e.g. PostgreSQL Autoincrement) mostly involve specifying SERIAL or using nextval in CREATE TABLE. But I don't necessarily want to use CREATE TABLE or CREATE VIEW (unless I have to). There are also some discussions of generate_series(), but I am not sure whether it applies here.
-- Update --
My motivation is illustrated in this GIS.SE answer regarding the PostGIS extension. The original query was:
CREATE VIEW buffer40units AS
SELECT
g.path[1] as gid,
g.geom::geometry(Polygon, 31492) as geom
FROM
(SELECT
(ST_Dump(ST_UNION(ST_Buffer(geom, 40)))).*
FROM point
) as g;
where g.path[1] as gid is an id field "required for visualization in QGIS". I believe the only requirement is that it be an integer and unique across the table. I encountered some errors when running the above query when the g.path[] array is empty.
While trying to fix the array in the above query, this thought came to me:
Since the gid value does not matter anyways, is there an auto-increment function that can be used here instead?
If you wish to have an id field that assigns a unique integer to each row in the output, then use the row_number() window function:
select
    row_number() over () as id,
    t.type
from t;
The generated id will only be unique within each execution of the query. Multiple executions will not generate new unique values for id.
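A quick illustration of this behavior using Python's sqlite3 module (window functions require SQLite 3.25+; the table and its contents are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE t (type TEXT)")
conn.executemany("INSERT INTO t (type) VALUES (?)",
                 [('a',), ('b',), ('c',)])

# row_number() over () assigns 1, 2, 3, ... to rows
# in the order they appear in the query's output.
rows = conn.execute(
    "SELECT row_number() OVER () AS id, t.type FROM t"
).fetchall()
print(rows)  # -> [(1, 'a'), (2, 'b'), (3, 'c')]
```

Re-running the same query starts again from 1, which matches the caveat above: the ids are unique only within a single execution.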