How to solve code duplication in the following PostgreSQL query? - postgresql

I have a table Inputs and a derived table Parameters
CREATE TABLE Configurables
(
id SERIAL PRIMARY KEY
);
CREATE TABLE Inputs
(
configurable integer REFERENCES Configurables( id ),
name text,
time timestamp,
PRIMARY KEY( configurable, name, time )
);
CREATE TABLE Parameters
(
configurable integer,
name text,
time timestamp,
value text,
FOREIGN KEY( configurable, name, time ) REFERENCES Inputs( configurable, name, time )
);
The following query checks whether a parameter has been changed, or is not present yet, and inserts the parameter with a new value.
QString PostgreSQLQueryEngine::saveParameter( int configurable, const QString& name, const QString& value )
{
return QString( "\
INSERT INTO Inputs( configurable, name, time ) \
WITH MyParameter AS \
( \
SELECT configurable, name, time, value \
FROM \
( \
SELECT configurable, name, time, value \
FROM Parameters \
WHERE (configurable = %1) AND (name = '%2') AND time = \
( \
SELECT max( time ) \
FROM Parameters \
WHERE (configurable = %1) AND (name = '%2') \
) \
UNION \
SELECT %1 AS configurable, '%2' AS name, '-infinity' AS time, NULL AS value \
)AS foo \
) \
SELECT %1 AS configurable, '%2' AS name, 'now' AS time FROM MyParameter \
WHERE time = (SELECT max(time) FROM MyParameter) AND (value <> '%3' OR value IS NULL); \
\
INSERT INTO Parameters( configurable, name, time, value ) \
WITH MyParameter AS \
( \
SELECT configurable, name, time, value \
FROM \
( \
SELECT configurable, name, time, value \
FROM Parameters \
WHERE (configurable = %1) AND (name = '%2') AND time = \
( \
SELECT max( time ) \
FROM Parameters \
WHERE (configurable = %1) AND (name = '%2') \
) \
UNION \
SELECT %1 AS configurable, '%2' AS name, '-infinity' AS time, NULL AS value \
)AS foo \
) \
SELECT %1 AS configurable, '%2' AS name, 'now' AS time, '%3' AS value FROM MyParameter \
WHERE time = (SELECT max(time) FROM MyParameter) AND (value <> '%3' OR value IS NULL); \
" ).arg( configurable ).arg( name ).arg( value );
}
How should I best remove the duplication of the two MyParameter subqueries?
Any other tips on cleaning up a query like this?

You should avoid denormalized tables. Use a view to get an easy overview of the Parameters table; it would be much, much easier.
Use a denormalized summary table only if the view isn't fast enough. Even then, any denormalized table should be maintained by triggers, otherwise you risk the tables going out of sync.
For this you can create a trigger on Parameters that upserts into Inputs on insert. If you ever delete or update these columns in Parameters, maintaining Inputs becomes complicated: you'd have to delete rows from Inputs when no corresponding row remains in Parameters, which means maintaining counts in Inputs. Concurrent insert/update/delete performance will also suffer, because any change in Parameters has to lock a row in Inputs. This is all ugly and bad; a view is a much better solution.
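As a rough sketch of the view-based approach: it assumes Parameters becomes the only stored table (that is, the foreign key to Inputs is dropped, which the schema in the question does not do). The view name CurrentParameters and the literal values 1, 'speed' and 'fast' are made up for illustration.

```sql
-- Hypothetical view: the latest row for each (configurable, name) pair.
CREATE VIEW CurrentParameters AS
SELECT DISTINCT ON (configurable, name)
       configurable, name, time, value
FROM Parameters
ORDER BY configurable, name, time DESC;

-- Insert only when the parameter is new or its value has changed.
-- The latest-value lookup is written once, inside the view.
INSERT INTO Parameters (configurable, name, time, value)
SELECT 1, 'speed', now(), 'fast'
WHERE NOT EXISTS (
    SELECT 1
    FROM CurrentParameters
    WHERE configurable = 1
      AND name = 'speed'
      AND value IS NOT DISTINCT FROM 'fast'
);
```

In application code the three values would more safely be bound as query parameters than spliced into the SQL string.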

Related

pg_dump excluded functions

I created a pg_dump with the following command -
pg_dump -U postgres -d db -n public \
--exclude-table-data 'exclude_table_*' \
--exclude-table-data 'another_set_of_tables_to_exclude*' > dump.sql
This excluded the tables I needed it to exclude, but it didn't dump any functions that were in the public schema. Why did it not dump the functions and how do I get it to dump them?
UPDATE
This is the definition of a materialized view -
CREATE MATERIALIZED VIEW public.attending AS
SELECT (split_part((ct.id)::text, '-'::text, 1))::bigint AS
attending_physician,
split_part((ct.id)::text, '-'::text, 2) AS business,
(split_part((ct.id)::text, '-'::text, 3))::bigint AS organization,
split_part((ct.id)::text, '-'::text, 4) AS county,
ct.id,
ct."qtr-0",
ct."qtr-1",
ct."qtr-2",
ct."qtr-3",
ct."qtr-4",
ct."qtr-5",
ct."qtr-6",
ct."qtr-7",
ct."qtr-8"
FROM crosstab('SELECT attending_practitioner || ''-'' || business || ''-'' || organization || ''-'' || county AS id, period, COALESCE(admits, 0)
FROM calc ORDER BY 1, 2 DESC'::text, 'SELECT year || ''q'' || quarter FROM calc_trend ORDER BY 1 DESC limit 9'::text) ct(id character varying(32), "qtr-0" integer, "qtr-1" integer, "qtr-2" integer, "qtr-3" integer, "qtr-4" integer, "qtr-5" integer, "qtr-6" integer, "qtr-7" integer, "qtr-8" integer);
It should dump functions (and all other objects) in the public schema.
The functions that are not dumped are those that belong to an extension, like crosstab in your case, which is provided by the tablefunc extension. Such objects are not dumped individually; they are covered by CREATE EXTENSION.
Unfortunately, extensions are not included in a dump that is restricted to a schema with -n, because they belong to the database rather than to a schema.
You should create the extension manually on the destination database before restoring the dump:
CREATE EXTENSION tablefunc;
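If it is unclear which extension a missing function belongs to, the system catalogs can be queried on the source database. This is only a sketch, assuming a PostgreSQL version with extension support (9.1 or later); the function name is the one from the question.

```sql
-- List the extension(s) that own any function named crosstab.
-- deptype 'e' marks an object as a member of an extension.
SELECT DISTINCT e.extname
FROM pg_proc p
JOIN pg_depend d
  ON d.classid = 'pg_proc'::regclass
 AND d.objid = p.oid
 AND d.deptype = 'e'
JOIN pg_extension e
  ON d.refclassid = 'pg_extension'::regclass
 AND d.refobjid = e.oid
WHERE p.proname = 'crosstab';
```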

Rebuild sphinx index fail

We have 4 sphinx indexes built using data from one table. All indexes have the same source settings except that they take different documents. We have checks like this mod(id, 4) = <index number> to distribute documents and document attributes between indexes.
Question: One of the four indexes (always the same one) fails to rebuild almost every time we rebuild the indexes. The other indexes never have this issue and are rebuilt correctly.
We have partitioned the documents and attribute tables. For example this is how documents table is partitioned:
PARTITION BY HASH(mod(id, 4))(
PARTITION `p0` COMMENT '',
PARTITION `p1` COMMENT '',
PARTITION `p2` COMMENT '',
PARTITION `p3` COMMENT ''
);
We think that the indexer hangs after it has received all documents but before it starts receiving attributes. We can see this when we check the sessions on the MySQL server.
The index that fails to rebuild uses the mod(id, 4) = 0 condition.
We use Sphinx 2.0.4-release on Ubuntu 64bit 12.04.02 LTS.
Data source config
source ble_job_2 : ble_job
{
sql_query = select job_notice.id as id, \
body, title, source, company, \
UNIX_TIMESTAMP(insertDate) as date, \
substring(company, 1, 1) as companyletter, \
job_notice.locationCountry as country, \
location_us_state.stateName as state, \
0 as expired, \
clusterId, \
groupCity, \
groupCityAttr, \
job_notice.cityLat as citylat, \
job_notice.cityLng as citylng, \
job_notice.zipLat as ziplat, \
job_notice.zipLng as ziplng, \
feedId, job_notice.rating as rating, \
job_notice.cityId as cityid \
from job_notice \
left join location_us_state on job_notice.locationState = location_us_state.stateCode \
where job_notice.status != 'expired' \
and mod(job_notice.id, 4) = 1
sql_attr_multi = uint attr from query; \
select noticeId, attributeId as attr from job_notice_attribute where mod(noticeId, 4) = 1
} # source ble_job_2
Index config
index ble_job_2
{
type = plain
source = ble_job_2
path = /var/lib/sphinxsearch/data/ble_job_2
docinfo = extern
mlock = 0
morphology = none
stopwords = /etc/sphinxsearch/stopwords/blockwords.txt
min_word_len = 1
charset_type = utf-8
enable_star = 0
html_strip = 0
} # index_ble_job_2
Any help would be greatly appreciated.
Warm regards.
Luckily, we have fixed the issue.
We applied the ranged query setup, and this made the index rebuild stable. I think this is because Sphinx then runs several queries, each returning a relatively small result set. This allows MySQL to complete each query normally and send all the results back to Sphinx.
The same issue is described on the Sphinx forum in "Indexer Hangs & MySQL Query Sleeps".
The changes in the config for data source are
sql_query_range = SELECT MIN(id),MAX(id) FROM job_notice where mod(job_notice.id, 4) = 1
sql_range_step = 200000
sql_query = select job_notice.id as id, \
...
and mod(job_notice.id, 4) = 1 and job_notice.id >= $start AND job_notice.id <= $end
Please note that no ranges should be applied to the sql_attr_multi query (see "Bad query in Sphinx MVA").

Sphinx weird behavior

I have a weird problem creating an index on Sphinx 2.0.5-id64-release (r3308).
/etc/sphinx/sphinx.conf
source keywords
{
// ..
sql_query = \
SELECT keywords.lid, keywords.keyword FROM keywords_sites \
LEFT JOIN keywords ON keywords_sites.kid = keywords.kid \
GROUP BY keywords_sites.kid \
sql_attr_uint = lid
sql_field_string = keyword
// ...
}
I get this warning:
WARNING: attribute 'lid' not found - IGNORING
But when I change the query to:
sql_query = \
SELECT 1, keywords.lid, keywords.keyword FROM keywords_sites \
LEFT JOIN keywords ON keywords_sites.kid = keywords.kid \
GROUP BY keywords_sites.kid \
I don't get any warnings. Why does this happen?
The first column of the sql_query is ALWAYS used as the document_id.
The document_id cannot be declared as an attribute.
If you want to store the primary key in an attribute as well, you need to include it twice in the query.
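A minimal sketch of including the key twice, based on the query from the question. The alias lid_attr is a hypothetical name; the attribute declaration in the source block would then refer to it (sql_attr_uint = lid_attr) instead of lid.

```sql
-- First column: used by Sphinx as the document ID.
-- Second column: the same key again under a hypothetical alias,
-- so that it can be declared as an attribute.
SELECT keywords.lid,
       keywords.lid AS lid_attr,
       keywords.keyword
FROM keywords_sites
LEFT JOIN keywords ON keywords_sites.kid = keywords.kid
GROUP BY keywords_sites.kid
```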

Sphinx + Postgres + uuid issues

I have a sql_query for a source defined like so:
sql_query = SELECT \
criteria.item_uuid, \
criteria.user_id, \
criteria.color, \
criteria.selection, \
criteria.item_id, \
home.state, \
item.* \
FROM criteria \
INNER JOIN item USING (item_uuid) \
INNER JOIN user_info home USING (user_id) \
WHERE criteria.item_uuid IS NOT NULL
And then an index:
index csearch {
source = csearch
path = /usr/local/sphinx/var/data/csearch
docinfo = extern
enable_star = 1
min_prefix_len = 0
min_infix_len = 0
morphology = stem_en
}
But when I run indexer --rotate csearch I get:
indexing index 'csearch'...
WARNING: zero/NULL document_id, skipping
The idea is that the item_uuid column is the identifier I want, based on some combination of the other columns. The item_uuid column is a uuid type in postgres: perhaps sphinx does not support this? Anyway, any ideas here would be greatly appreciated.
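According to the docs, document IDs must be unique, unsigned, non-zero integers, so a uuid column cannot serve as the document_id.
http://www.sphx.org/docs/manual-1.10.html#data-restrictions
You could try generating an integer ID instead, for example SELECT row_number() OVER (), uuid, etc...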
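A sketch of that idea applied to the query from the question, assuming PostgreSQL 8.4 or later (where window functions are available) and assuming item_uuid is unique per result row. The alias doc_id is made up for illustration, and only some of the original columns are repeated here.

```sql
-- row_number() is a window function, so it needs an OVER clause.
-- Ordering by item_uuid makes the numbering deterministic for a given
-- set of rows, but the numbers shift whenever rows are added or removed.
SELECT
    row_number() OVER (ORDER BY criteria.item_uuid) AS doc_id,
    criteria.item_uuid::text AS item_uuid,
    criteria.user_id,
    criteria.color,
    criteria.selection,
    criteria.item_id,
    home.state
FROM criteria
INNER JOIN item USING (item_uuid)
INNER JOIN user_info home USING (user_id)
WHERE criteria.item_uuid IS NOT NULL
```

Because the generated IDs are not stable across rebuilds, the application would have to map search results back through the item_uuid column rather than through the document ID.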

How to design tables for navigating hierarchical regions with diamond structures

Our solution needs us to work in hierarchies of regions which are as follows.
        STATE
          |
       DISTRICT
          |
        TALUK
        /   \
       /     \
   HOBLI   PANCHAYAT
       \     /
        \   /
         \ /
       VILLAGE
There are 2 ways to navigate to a village from a Taluk. Either through HOBLI OR through PANCHAYAT.
We need a PK(non-business KEY) and a SERIAL_NUMBER/ID for each STATE, DISTRICT, TALUK, HOBLI, PANCHAYAT, VILLAGE; However, each village has 8 additional attributes.
How do I design this structure in PostgreSQL 8.4 ?
My previous experience was on Oracle so I'm wondering how to navigate hierarchical structures in PostgreSQL 8.4 ? If at all, the solution should be friendly for READ/navigation speed.
================================================================
Quassnoi : Here is a sample hierarchy
                KARNATAKA
                    |
                    |
             TUMKUR (District)
                    |
                    |
                    |
             KUNIGAL (Taluk)
                 /     \
                /       \
               /         \
HULIYUR DURGA (Hobli)   CHOWDANAKUPPE (Panchayat)
               \         /
                \       /
                 \     /
                  \   /
                   \ /
         Voddarakempapura (Village)
         Ankanahalli (Village)
         Chowdanakuppe (Village)
         Yedehalli (Village)
NAVIGATE : For now, I will be presenting 2 separate UI screens each having separate navigable hierarchies
#1 using HOBLI and
So, for #1, I will need the entire tree starting from STATE, DISTRICT(s), TALUK(s), HOBLI(s), VILLAGE(s). Using the above tree, I will need
KARNATAKA (State)
|
|
|---TUMKUR (District)
    |
    |
    |-----KUNIGAL (Taluk)
          |
          |
          |----**HULIYUR DURGA (Hobli)**
               |
               |
               |---VODDARAKEMPAPURA (Village)
               |
               |---Yedehalli (Village)
               |
               |---Ankanahalli (Village)
#2 using PANCHAYAT.
So, for #2, I will need the entire tree starting from STATE, DISTRICT(s), TALUK(s), PANCHAYAT(s), VILLAGE(s)
KARNATAKA (State)
|
|
|---TUMKUR (District)
    |
    |
    |-----KUNIGAL (Taluk)
          |
          |
          |----**CHOWDANAKUPPE (Panchayat)**
               |
               |
               |---VODDARAKEMPAPURA (Village)
               |
               |---Ankanahalli (Village)
               |
               |---Chowdanakuppe (Village)
ResultSet
Should be able to create above Trees with the following details.
We need a PK(non-business KEY) and a SERIAL_NUMBER/ID for each STATE, DISTRICT, TALUK, HOBLI, PANCHAYAT, VILLAGE along with a Name and LEVEL of the relationship(similar to ORACLE'S LEVEL).
For now, getting the above ResultSet is OK. But in the future, we will need an ability to do reporting(some aggregation) at a HOBLI/PANCHAYAT/TALUK level.
=====================================
@Quassnoi #2,
Thank you very much,
"If you are planning to add some more hierarchy axes, it may be worth creating a separate table to store the hierarchies (with the axis field added) rather than adding the fields to the table."
Actually, I simplified the existing requirement so as NOT to confuse anyone. The actual hierarchy is like this
        STATE
          |
       DISTRICT
          |
        TALUK
        /   \
       /     \
   HOBLI   PANCHAYAT
       \     /
        \   /
         \ /
   REVENUE VILLAGE
          |
          |
      HABITATION
Sample data for such a hierarchy is like below
                  KARNATAKA
                      |
               TUMKUR (District)
                      |
               KUNIGAL (Taluk)
                   /     \
                  /       \
HULIYUR DURGA (Hobli)   CHOWDANAKUPPE (Panchayat)
                  \       /
                   \     /
          Thavarekere (Revenue Village)
                   /     \
Bommanahalli (Habitation)   Tavarekere (Habitation)
Will anything in your solution below change by the above modification ?
Also, would you recommend that I create another Table like below to store the 7 properties of the Habitats ? Is there a better way to store such info ?
CREATE TABLE habitatDetails
(
id BIGINT NOT NULL PRIMARY KEY,
serialNumber BIGINT NOT NULL,
habitatid BIGINT NOT NULL, -- we will add these details only for habitats
prop1 VARCHAR(128),
prop2 VARCHAR(128),
prop3 VARCHAR(128),
prop4 VARCHAR(128),
prop5 VARCHAR(128),
prop6 VARCHAR(128),
prop7 VARCHAR(128),
CONSTRAINT "habitatdetails_fk" FOREIGN KEY ("habitatid")
REFERENCES "public"."t_hierarchy"("id")
);
Thank you,
CREATE TABLE t_hierarchy
(
id BIGINT NOT NULL PRIMARY KEY,
type VARCHAR(128) NOT NULL,
name VARCHAR(128) NOT NULL,
tax_parent BIGINT,
gov_parent BIGINT,
CHECK (NOT (tax_parent IS NULL AND gov_parent IS NULL))
);
CREATE INDEX ix_hierarchy_taxparent ON t_hierarchy (tax_parent);
CREATE INDEX ix_hierarchy_govparent ON t_hierarchy (gov_parent);
INSERT
INTO t_hierarchy
VALUES (1, 'State', 'Karnataka', 0, 0),
(2, 'District', 'Tumkur', 1, 1),
(3, 'Taluk', 'Kunigal', 2, 2),
(4, 'Hobli', 'Huliyur Durga', 3, NULL),
(5, 'Panchayat', 'Chowdanakuppe', NULL, 3),
(6, 'Village', 'Voddarakempapura', 4, 5),
(7, 'Village', 'Ankanahalli', 4, 5),
(8, 'Village', 'Chowdanakuppe', 4, 5),
(9, 'Village', 'Yedehalli', 4, 5);
CREATE OR REPLACE FUNCTION fn_hierarchy_tax(level INT, start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT $1, h
FROM t_hierarchy h
WHERE h.id = $2
UNION ALL
SELECT (f).*
FROM (
SELECT fn_hierarchy_tax($1 + 1, h.id) f
FROM t_hierarchy h
WHERE h.tax_parent = $2
) q;
$$
LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION fn_hierarchy_tax(start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT fn_hierarchy_tax(1, $1);
$$
LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION fn_hierarchy_gov(level INT, start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT $1, h
FROM t_hierarchy h
WHERE h.id = $2
UNION ALL
SELECT (f).*
FROM (
SELECT fn_hierarchy_gov($1 + 1, h.id) f
FROM t_hierarchy h
WHERE h.gov_parent = $2
) q;
$$
LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION fn_hierarchy_gov(start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT fn_hierarchy_gov(1, $1);
$$
LANGUAGE 'sql';
SELECT ht.level, (ht.h).*
FROM fn_hierarchy_tax(1) ht;
SELECT ht.level, (ht.h).*
FROM fn_hierarchy_gov(1) ht;
The main idea is to keep the two parents in two different fields and to walk each hierarchy with a CONNECT BY emulation (recursive functions rather than a recursive CTE), which preserves the order of the rows.
If you are planning to add some more hierarchy axes, it may be worth creating a separate table to store the hierarchies (with the axis field added) rather than adding the fields to the table.
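For readers who would rather use a recursive CTE (available from PostgreSQL 8.4 on), one possible way to keep the parent-before-children order is to carry a path array and sort by it. This is only a sketch against the t_hierarchy table above, not the method used in this answer; the names tree and path are made up, and it follows the tax_parent axis starting from the row with id 1.

```sql
WITH RECURSIVE tree AS (
    -- Anchor: the starting node, at level 1.
    SELECT 1 AS level, h.id, h.type, h.name, ARRAY[h.id] AS path
    FROM t_hierarchy h
    WHERE h.id = 1
    UNION ALL
    -- Recursive step: children along the tax_parent axis.
    SELECT t.level + 1, c.id, c.type, c.name, t.path || c.id
    FROM tree t
    JOIN t_hierarchy c ON c.tax_parent = t.id
)
SELECT level, id, type, name
FROM tree
ORDER BY path;  -- each parent sorts directly before its own subtree
```

Replacing tax_parent with gov_parent in the join would walk the other axis.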
Update:
"Will anything in your solution below change by the above modification?"
No, it will work all right.
By "axes" I mean hierarchy chains. Currently, you have two axes: the tax hierarchy (through hoblis) and the government hierarchy (through panchayats), matching the tax_parent and gov_parent fields above. If you are planning to add some more axes (which is of course improbable), you may consider storing the hierarchies in another table and adding an "axis" field to that table. Again, it's very improbable that you will want to do this; I just mentioned the possibility for other readers who may have a similar problem.
"Also, would you recommend that I create another Table like below to store the 7 properties of the Habitats? Is there a better way to store such info?"
Yes, keeping them in a separate table is a good idea.