Optimizing PostgreSQL query with slow ORDER BY

I'm trying to get this query to run faster. It seems like sorting by the quality field is what really slows it down (the table has about 5 million rows) - maybe there is an index I can use for that?
Query:
SELECT "connectr_twitterpassage"."id", "connectr_twitterpassage"."third_party_id", "connectr_twitterpassage"."third_party_created", "connectr_twitterpassage"."source", "connectr_twitterpassage"."text", "connectr_twitterpassage"."author", "connectr_twitterpassage"."raw_data", "connectr_twitterpassage"."retweet_count", "connectr_twitterpassage"."favorited_count", "connectr_twitterpassage"."lang", "connectr_twitterpassage"."location", "connectr_twitterpassage"."author_followers_count", "connectr_twitterpassage"."is_retweet", "connectr_twitterpassage"."url", "connectr_twitterpassage"."author_fk_id", "connectr_twitterpassage"."quality", "connectr_twitterpassage"."is_top_tweet", "connectr_twitterpassage"."created", "connectr_twitterpassage"."modified"
FROM "connectr_twitterpassage"
INNER JOIN "connectr_twitterpassage_words" ON ("connectr_twitterpassage"."id" = "connectr_twitterpassage_words"."twitterpassage_id")
WHERE "connectr_twitterpassage_words"."word_id" = 18974807
ORDER BY "connectr_twitterpassage"."quality"
DESC LIMIT 100
Here is the EXPLAIN ANALYZE: http://explain.depesz.com/s/7zb
And the table definitions:
\d connectr_twitterpassage
Column | Type | Modifiers
------------------------+--------------------------+----------------------------------------------------------------------
id | integer | not null default nextval('connectr_twitterpassage_id_seq'::regclass)
third_party_id | character varying(10000) | not null
source | character varying(10000) | not null
text | character varying(10000) | not null
author | character varying(10000) | not null
raw_data | character varying(10000) | not null
created | timestamp with time zone | not null
modified | timestamp with time zone | not null
third_party_created | timestamp with time zone |
retweet_count | integer | not null
favorited_count | integer | not null
lang | character varying(10000) | not null
location | character varying(10000) | not null
author_followers_count | integer | not null
is_retweet | boolean | not null
url | character varying(10000) | not null
author_fk_id | integer |
quality | bigint |
is_top_tweet | boolean | not null
Indexes:
"connectr_passage_pkey" PRIMARY KEY, btree (id)
"connectr_twitterpassage_third_party_id_uniq" UNIQUE CONSTRAINT, btree (third_party_id)
"connectr_passage_author_followers_count" btree (author_followers_count)
"connectr_passage_favorited_count" btree (favorited_count)
"connectr_passage_retweet_count" btree (retweet_count)
"connectr_passage_source" btree (source)
"connectr_passage_source_like" btree (source varchar_pattern_ops)
"connectr_passage_third_party_id" btree (third_party_id)
"connectr_passage_third_party_id_like" btree (third_party_id varchar_pattern_ops)
"connectr_twitterpassage_author_fk_id" btree (author_fk_id)
"connectr_twitterpassage_created" btree (created)
"connectr_twitterpassage_is_top_tweet" btree (is_top_tweet)
"connectr_twitterpassage_quality" btree (quality)
"connectr_twitterpassage_third_party_created" btree (third_party_created)
"id_to_quality_sorted" btree (id, quality DESC NULLS LAST)
Foreign-key constraints:
"author_fk_id_refs_id_074720a5" FOREIGN KEY (author_fk_id) REFERENCES connectr_twitteruser(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "connectr_passageviewevent" CONSTRAINT "passage_id_refs_id_892b36a6" FOREIGN KEY (passage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
TABLE "connectr_connection" CONSTRAINT "twitter_from_id_refs_id_8adbab24" FOREIGN KEY (twitter_from_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
TABLE "connectr_connection" CONSTRAINT "twitter_to_id_refs_id_8adbab24" FOREIGN KEY (twitter_to_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
TABLE "connectr_twitterpassage_words" CONSTRAINT "twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
connectr=# \d connectr_twitterpassage_words
Table "public.connectr_twitterpassage_words"
Column | Type | Modifiers
-------------------+---------+----------------------------------------------------------------------------
id | integer | not null default nextval('connectr_twitterpassage_words_id_seq'::regclass)
twitterpassage_id | integer | not null
word_id | integer | not null
Indexes:
"connectr_twitterpassage_words_pkey" PRIMARY KEY, btree (id)
"connectr_twitterpassage_twitterpassage_id_613c80271f09fba8_uniq" UNIQUE CONSTRAINT, btree (twitterpassage_id, word_id)
"connectr_twitterpassage_words_twitterpassage_id" btree (twitterpassage_id)
"connectr_twitterpassage_words_word_id" btree (word_id)
"word_to_twitterpassage_id" btree (word_id, twitterpassage_id)
Foreign-key constraints:
"twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
"word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
connectr=# \d connectr_word
Table "public.connectr_word"
Column | Type | Modifiers
---------------------+--------------------------+------------------------------------------------------------
id | integer | not null default nextval('connectr_word_id_seq'::regclass)
word | character varying(10000) | not null
created | timestamp with time zone | not null
modified | timestamp with time zone | not null
frequency | double precision |
is_username | boolean | not null
is_hashtag | boolean | not null
cloud_eligible | boolean | not null
passage_count | integer |
avg_quality | double precision |
last_twitter_search | timestamp with time zone |
cloud_approved | boolean | not null
display_word | character varying(10000) | not null
is_trend | boolean | not null
Indexes:
"connectr_word_pkey" PRIMARY KEY, btree (id)
"connectr_word_word_uniq" UNIQUE CONSTRAINT, btree (word)
"connectr_word_avg_quality" btree (avg_quality)
"connectr_word_cloud_eligible" btree (cloud_eligible)
"connectr_word_last_twitter_search" btree (last_twitter_search)
"connectr_word_passage_count" btree (passage_count)
"connectr_word_word" btree (word)
Referenced by:
TABLE "connectr_passageviewevent" CONSTRAINT "source_word_id_refs_id_178d46eb" FOREIGN KEY (source_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
TABLE "connectr_wordmatchrewardevent" CONSTRAINT "tapped_word_id_refs_id_c2ffb369" FOREIGN KEY (tapped_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
TABLE "connectr_connection" CONSTRAINT "word_id_refs_id_00cccde2" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
TABLE "connectr_twitterpassage_words" CONSTRAINT "word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED

Looking at the EXPLAIN output, the sort itself takes very little of the time; it is gathering the rows to be sorted that is expensive.
You must be spending a fair amount of time hitting the disk. If you could get your data better cached, the same query should run a lot faster.
Otherwise, your best bet may be to denormalize: copy the quality column into the connectr_twitterpassage_words table and put an index on (word_id, quality, ...).
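A sketch of that denormalization, assuming the application (or a trigger) keeps the copied column in sync; the index name here is illustrative:
-- Copy quality onto the join table.
ALTER TABLE connectr_twitterpassage_words ADD COLUMN quality bigint;

UPDATE connectr_twitterpassage_words w
SET quality = p.quality
FROM connectr_twitterpassage p
WHERE w.twitterpassage_id = p.id;

-- An index matching both the filter and the sort, so the top 100 rows
-- can be read straight off the index.
CREATE INDEX words_word_id_quality
ON connectr_twitterpassage_words (word_id, quality DESC);

-- The query then sorts on the join table's copy of quality:
SELECT p.*
FROM connectr_twitterpassage p
INNER JOIN connectr_twitterpassage_words w ON p.id = w.twitterpassage_id
WHERE w.word_id = 18974807
ORDER BY w.quality DESC
LIMIT 100;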

Related

postgresql: search by id returns no result

DB: PostgreSQL 9.5
The query returns a result when searching by the name field, but returns nothing when searching by id.
somedb=# select id from users where name='abc';
id
------
123
(1 row)
somedb=# select id from users where id=123;
id
----
(0 rows)
bigquant_jupyterhub=# \d users;
Table "public.users"
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
name | character varying(1023) |
_server_id | integer |
admin | boolean |
last_activity | timestamp without time zone |
cookie_id | character varying(1023) |
state | text |
auth_state | text |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"users__server_id_fkey" FOREIGN KEY (_server_id) REFERENCES servers(id)
Referenced by:
TABLE "api_tokens" CONSTRAINT "api_tokens_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)
TABLE "user_group_map" CONSTRAINT "user_group_map_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)
Thanks for all your help.
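One common cause of this exact symptom is a corrupted index on users_pkey: the lookup by name can use a sequential scan while the lookup by id goes through the broken primary-key index. A sketch of how to test that hypothesis (this is an assumption, not a confirmed diagnosis):
-- Force the planner away from index scans; if the row now appears,
-- the index is the likely culprit.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT id FROM users WHERE id = 123;
RESET enable_indexscan;
RESET enable_bitmapscan;

-- If the sequential scan does find the row, rebuild the index.
REINDEX INDEX users_pkey;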

How do I update 1.3 billion rows in this table more efficiently?

I have 1.3 billion rows in a PostgreSQL table sku_comparison that looks like this:
id1 (INTEGER) | id2 (INTEGER) | (10 SMALLINT columns) | length1 (SMALLINT) | length2 (SMALLINT) | length_difference (SMALLINT)
The id1 and id2 columns reference the id column of a table called sku, which contains about 300,000 rows; each sku row has an associated varchar(25) value in a column, code.
There are btree indexes on id1 and id2, and a compound index on (id1, id2), in sku_comparison, as well as a btree index on the id column of sku.
My goal is to update the length1 and length2 columns with the lengths of the corresponding code column from the sku table. However, I ran the following code for over 20 hours, and it did not complete the update:
UPDATE sku_comparison SET length1=length(sku.code) FROM sku
WHERE sku_comparison.id1=sku.id;
All of the data is stored on a single hard disk on a local computer, and the processor is fairly modern. Constructing this table, which required much more complicated string comparisons in Python, only took about 30 hours or so, so I am not sure why something like this would take as long.
edit: here are formatted table definitions:
Table "public.sku"
Column | Type | Modifiers
------------+-----------------------+--------------------------------------------------
id | integer | not null default nextval('sku_id_seq'::regclass)
sku | character varying(25) |
pattern | character varying(25) |
pattern_an | character varying(25) |
firsttwo | character(2) | default ' '::bpchar
reference | character varying(25) |
Indexes:
"sku_pkey" PRIMARY KEY, btree (id)
"sku_sku_idx" UNIQUE, btree (sku)
"sku_firstwo_idx" btree (firsttwo)
Referenced by:
TABLE "sku_comparison" CONSTRAINT "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
TABLE "sku_comparison" CONSTRAINT "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)
Table "public.sku_comparison"
Column | Type | Modifiers
---------------------------+----------+-------------------------
id1 | integer | not null
id2 | integer | not null
consec_charmatch | smallint |
consec_groupmatch | smallint |
consec_fieldtypematch | smallint |
consec_groupmatch_an | smallint |
consec_fieldtypematch_an | smallint |
general_charmatch | smallint |
general_groupmatch | smallint |
general_fieldtypematch | smallint |
general_groupmatch_an | smallint |
general_fieldtypematch_an | smallint |
length1 | smallint | default 0
length2 | smallint | default 0
length_difference | smallint | default '-999'::integer
Indexes:
"sku_comparison_pkey" PRIMARY KEY, btree (id1, id2)
"ssd_id1_idx" btree (id1)
"ssd_id2_idx" btree (id2)
Foreign-key constraints:
"sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
"sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)
Would you consider using an anonymous code block?
In PL/pgSQL, something like:
DO $$
DECLARE
    r record;
BEGIN
    -- One small UPDATE per sku row, computing length(code) once per sku
    -- instead of once per sku_comparison row.
    FOR r IN SELECT id, length(code) AS code_length FROM sku LOOP
        UPDATE sku_comparison
        SET length1 = r.code_length
        WHERE id1 = r.id;
    END LOOP;
END$$;
This avoids evaluating length(sku.code) for every one of the 1.3 billion target rows. Note, though, that a DO block still runs as a single transaction; to actually commit in smaller batches, the loop has to be driven from the client.
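If committing in smaller batches matters, a sketch of driving the same update in id ranges from the client, one transaction per statement (the range bounds are illustrative; note the question's prose calls the column code while the \d output names it sku):
UPDATE sku_comparison sc
SET length1 = length(s.code)
FROM sku s
WHERE sc.id1 = s.id
AND sc.id1 BETWEEN 1 AND 10000;
-- ...then BETWEEN 10001 AND 20000, and so on up to max(id1).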

In Postgres, how can I delete a row from table B when a row from table A is deleted?

I’m using Postgres 9.5.0. I have the following table
myproject=> \d my_objects;
Table "public.my_objects"
Column | Type | Modifiers
---------------------+-----------------------------+-------------------------------------
name | character varying |
day | date |
distance | double precision |
user_id | integer |
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
distance_unit_id | integer |
import_completed | boolean |
id | character varying | not null default uuid_generate_v4()
linked_my_object_time_id | character varying |
web_crawler_id | integer |
address_id | character varying |
Indexes:
"my_objects_pkey" PRIMARY KEY, btree (id)
"index_my_objects_on_user_id_and_day_and_name" UNIQUE, btree (user_id, day, name)
"index_my_objects_on_user_id" btree (user_id)
"index_my_objects_on_web_crawler_id" btree (web_crawler_id)
Foreign-key constraints:
"fk_rails_5287d445c0" FOREIGN KEY (address_id) REFERENCES addresses(id) ON DELETE CASCADE
"fk_rails_970b2325bf" FOREIGN KEY (distance_unit_id) REFERENCES distance_units(id)
"fk_rails_dda3297b57" FOREIGN KEY (linked_my_object_time_id) REFERENCES my_object_times(id) ON DELETE CASCADE
"fk_rails_ebd32625bc" FOREIGN KEY (web_crawler_id) REFERENCES web_crawlers(id)
"fk_rails_fa07601dff" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
Right now, each my_object has an address_id field. What I would like is for the corresponding addresses row to be deleted when I delete the my_object. Without moving the address_id column out of the my_objects table, is it possible to set something up so that when I delete a row from the my_objects table, any corresponding address data is deleted as well? Obviously, the foreign key I have set up will not get the job done.
You can do this with a trigger:
CREATE OR REPLACE FUNCTION remove_address() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
DELETE FROM public.addresses WHERE id = OLD.address_id;
RETURN OLD;
END;$$;
CREATE TRIGGER remove_address
AFTER DELETE ON public.my_objects FOR EACH ROW
EXECUTE PROCEDURE remove_address();
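A quick way to sanity-check the trigger (the id values here are placeholders):
-- Deleting an object should now also remove the addresses row it
-- pointed at via address_id.
DELETE FROM public.my_objects WHERE id = 'some-object-uuid';
SELECT count(*) FROM public.addresses WHERE id = 'that-objects-address-id'; -- expect 0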

Is a trigger or alias returning data in Postgres?

Postgres 9.2.2, Django 1.6.5, Python 2.7.5
Column | Type | Modifiers
---------------+------------------------+------------------------------------
id | integer | not null default nextval('employee_id_seq'::regclass)
first_name | character varying(75) | not null
surname | character varying(75) | not null
mname | character varying(75) |
date_of_birth | date |
staff_id | character varying(20) |
img1 | character varying(100) |
slug | character varying(50) |
created | date | not null
modified | date | not null
ppsn | character varying(20) |
Indexes:
"employee_pkey" PRIMARY KEY, btree (id)
"employee_slug_key" UNIQUE, btree (slug)
"employee_slug_like" btree (slug varchar_pattern_ops)
Referenced by:
TABLE "employeedynamic" CONSTRAINT "employee_id_refs_id_71e22023" FOREIGN KEY (employee_id) REFERENCES employee(id) DEFERRABLE INITIAL
LY DEFERRED
TABLE "drvliclicence" CONSTRAINT "employee_id_refs_id_afc65012" FOREIGN KEY (employee_id) REFERENCES employee(id) DEFERRABLE INITIALLY
DEFERRED
TABLE "coursedetail_attendance" CONSTRAINT "employee_id_refs_id_c8466b5f" FOREIGN KEY (employee_id) REFERENCES employee(id) DEFERRABLE
INITIALLY DEFERRED
.
=# select a.name from employee a where a.id = 366;
(366,Tommy,Gibbons,"",1800-08-21,1002180,images/GibbonsT2010_1.
(1 row)
Problem: How is a.name returning these details?
I've tried looking up aliases and triggers, but I cannot figure this out.
Check trigger:
=# \dft
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
(0 rows)
or
=# SELECT tgname FROM pg_trigger, pg_class WHERE tgrelid=pg_class.oid and relname = 'employee';
RI_ConstraintTrigger_101722
RI_ConstraintTrigger_101723
RI_ConstraintTrigger_101732
RI_ConstraintTrigger_101733
RI_ConstraintTrigger_101737
RI_ConstraintTrigger_101738
(6 rows)
Any idea how I can find what is causing a.name to return data?
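A plausible explanation, offered as an assumption based on a known PostgreSQL parsing rule rather than a confirmed diagnosis: since employee has no name column, a.name can be resolved via functional notation as name(a), i.e. a cast of the whole row to the name type, whose 63-character limit would explain the truncated row literal above. To test that hypothesis:
-- If this returns the same truncated row literal, a.name is being
-- parsed as name(a) rather than as a column reference.
SELECT name(a) FROM employee a WHERE a.id = 366;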

Why doesn't evolution work in the Play Framework?

I'm using Play 2.3 and trying to generate a relational database schema via evolutions for PostgreSQL 9.4.
I have the following statements in my conf/evolutions/default/1.sql script:
ALTER TABLE ONLY round
ADD CONSTRAINT round_event_id_fkey FOREIGN KEY (event_id) REFERENCES event(id);
ALTER TABLE ONLY round
ADD CONSTRAINT round_event_id UNIQUE (event_id);
Following is my event table description:
Table "public.event"
Column | Type | Modifiers
-------------------------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('event_id_seq'::regclass)
related_event_hash | character varying(45) |
start_time | timestamp without time zone |
end_time | timestamp without time zone |
name | character varying(45) |
status | character varying(45) | not null
owner_id | bigint | not null
venue_id | bigint |
participation_hash | character varying(45) |
number_of_participants | integer |
number_of_backup_participants | integer |
created | timestamp without time zone | not null
updated | timestamp without time zone | not null
Indexes:
"event_pkey" PRIMARY KEY, btree (id)
"index_event_name" btree (name)
"index_event_status" btree (status)
"index_start_time" btree (start_time)
Foreign-key constraints:
"event_owner_id_fkey" FOREIGN KEY (owner_id) REFERENCES person(id)
"event_venue_id_fkey" FOREIGN KEY (venue_id) REFERENCES venue(id)
Referenced by:
TABLE "anonymous_person" CONSTRAINT "anonymous_person_event_id_fkey" FOREIGN KEY (event_id) REFERENCES event(id)
TABLE "mix_game" CONSTRAINT "mix_game_event_id_fkey" FOREIGN KEY (event_id) REFERENCES event(id)
TABLE "participant" CONSTRAINT "participant_event_id_fkey" FOREIGN KEY (event_id) REFERENCES event(id)
When I start the application in a browser I get this error:
Database 'default' is in an inconsistent state!
While trying to run this SQL script, we got the following error:
ERROR: there is no unique constraint matching given keys for referenced table "round" [ERROR:0, SQLSTATE:42830]
What could be wrong? How can I fix this error and add the foreign key constraints?
Note that it generates the round table as follows, without the foreign key constraints.
Table "public.round"
Column | Type | Modifiers
------------------+-----------------------+----------------------------------------------------
id | integer | not null default nextval('round_id_seq'::regclass)
round_no | integer | not null
event_id | bigint | not null
state | character varying(20) | not null
team_composition | character(12) | not null
result | character varying(20) |
description | character varying(45) |
play_time | integer | not null
shift_time | integer |
change_time | integer |
Indexes:
"round_pkey" PRIMARY KEY, btree (id)
"round_event_id" UNIQUE CONSTRAINT, btree (event_id)
Take a look at the documentation.
As you can see, you have to delimit both the Ups and Downs sections with comments in your SQL script, as sketched below.
Also, do not edit the 1.sql file, because it is updated by the evolutions mechanism. Start your own evolutions at 2.sql.
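A minimal sketch of such a script (say, conf/evolutions/default/2.sql), reusing the statements from the question:
# --- !Ups
ALTER TABLE round
ADD CONSTRAINT round_event_id_fkey FOREIGN KEY (event_id) REFERENCES event(id);
ALTER TABLE round
ADD CONSTRAINT round_event_id UNIQUE (event_id);

# --- !Downs
ALTER TABLE round DROP CONSTRAINT round_event_id;
ALTER TABLE round DROP CONSTRAINT round_event_id_fkey;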