How to use a subquery in Sphinx's multi-valued attributes (MVA) with PostgreSQL

My database is PostgreSQL, and I want to use the Sphinx search engine to index my data.
How can I use sql_attr_multi to fetch the relationship data?
The table schemas in PostgreSQL are:
crm=# \d orders
                            Table "orders"
    Column    |            Type             |               Modifiers
--------------+-----------------------------+-----------------------------------------
 id           | bigint                      | not null
 trade_id     | bigint                      | not null
 item_id      | bigint                      | not null
 price        | numeric(10,2)               | not null
 total_amount | numeric(10,2)               | not null
 subject      | character varying(255)     | not null default ''::character varying
 status       | smallint                    | not null default 0
 created_at   | timestamp without time zone | not null default now()
 updated_at   | timestamp without time zone | not null default now()
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
    "orders_trade_id_idx" btree (trade_id)
crm=# \d trades
                            Table "trades"
     Column      |            Type             |        Modifiers
-----------------+-----------------------------+-------------------------
 id              | bigint                      | not null
 operator_id     | bigint                      | not null
 customer_id     | bigint                      | not null
 category_ids    | bigint[]                    | not null
 total_amount    | numeric(10,2)               | not null
 discount_amount | numeric(10,2)               | not null
 created_at      | timestamp without time zone | not null default now()
 updated_at      | timestamp without time zone | not null default now()
Indexes:
    "trades_pkey" PRIMARY KEY, btree (id)
The Sphinx config is:
source trades_src
{
    type     = pgsql
    sql_host = 10.10.10.10
    sql_user = ******
    sql_pass = ******
    sql_db   = crm
    sql_port = 5432
    sql_query = \
        SELECT id, operator_id, customer_id, category_ids, total_amount, discount_amount, \
            date_part('epoch', created_at) AS created_at, \
            date_part('epoch', updated_at) AS updated_at \
        FROM public.trades;

    # attributes
    sql_attr_bigint = operator_id
    sql_attr_bigint = customer_id
    sql_attr_float  = total_amount
    sql_attr_float  = discount_amount
    sql_attr_multi  = bigint category_ids from field category_ids
    #sql_attr_multi = bigint order_ids from query; SELECT id FROM orders
    # How can I add a WHERE condition to the query for orders? e.g. WHERE trade_id = ?
    sql_attr_timestamp = created_at
    sql_attr_timestamp = updated_at
}
I used an MVA (multi-valued attribute) for the category_ids field, which is an ARRAY type in PostgreSQL.
But I do not know how to define an MVA for order_ids. Should it be done through a subquery?

Copied from the Sphinx forum:
sql_attr_multi = bigint order_ids from query; SELECT trade_id,id FROM orders ORDER BY trade_id
The first column of the query is the Sphinx document ID (i.e. the id in the main sql_query).
The second column is the value to insert into the MVA array for that document.
(The ORDER BY might not strictly be needed, but IIRC Sphinx is much quicker at processing the data if it is ordered by document ID.)
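Applied to the source above, the commented-out attribute becomes something like this (a sketch based on that forum answer; the ranged-query variant is my addition for when orders is large):

    sql_attr_multi = bigint order_ids from query; \
        SELECT trade_id, id FROM orders ORDER BY trade_id

    # For a large orders table, the ranged-query form fetches the MVA in
    # chunks; $start and $end are substituted by the indexer:
    #sql_attr_multi = bigint order_ids from ranged-query; \
    #    SELECT trade_id, id FROM orders WHERE trade_id >= $start AND trade_id <= $end; \
    #    SELECT MIN(trade_id), MAX(trade_id) FROM orders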

Related

Optimizing Postgres Count For Aggregated Select

I have a query that is intended to retrieve the counts of each grouped product, like so:
SELECT product_name,
       product_color,
       (array_agg("product_distributor"))[1] AS "product_distributor",
       (array_agg("product_release"))[1] AS "product_release",
       COUNT(*) AS "count"
FROM product
WHERE product.id IN (
        SELECT id
        FROM product
        WHERE (product_name ILIKE '%red%' OR product_color ILIKE '%red%')
          AND product_type = 1)
GROUP BY product_name, product_color
LIMIT 1000
OFFSET 0
This query is run on the following table
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 product_type        | integer                  |           | not null |
 id                  | integer                  |           | not null |
 product_name        | citext                   |           | not null |
 product_color       | character varying(255)   |           |          |
 product_distributor | integer                  |           |          |
 product_release     | timestamp with time zone |           |          |
 created_at          | timestamp with time zone |           | not null |
 updated_at          | timestamp with time zone |           | not null |
Indexes:
    "product_pkey" PRIMARY KEY, btree (id)
    "product_distributer_index" btree (product_distributor)
    "product_product_type_name_color" UNIQUE, btree (product_type, product_name, product_color)
    "product_product_type_index" btree (product_type)
    "product_name_color_index" btree (product_name, product_color)
Foreign-key constraints:
    "product_product_type_fkey" FOREIGN KEY (product_type) REFERENCES product_type(id) ON UPDATE CASCADE ON DELETE CASCADE
    "product_product_distributor_id" FOREIGN KEY (product_distributor) REFERENCES product_distributor(id)
How can I improve the performance of this query, specifically the COUNT(*) portion? Removing it speeds the query up considerably, but it is required.
You may try using an INNER JOIN in place of a WHERE ... IN clause.
WITH selected_products AS (
    SELECT id
    FROM product
    WHERE (product_name ILIKE '%red%' OR product_color ILIKE '%red%')
      AND product_type = 1
)
SELECT product_name,
       product_color,
       (ARRAY_AGG("product_distributor"))[1] AS "product_distributor",
       (ARRAY_AGG("product_release"))[1] AS "product_release",
       COUNT(*) AS "count"
FROM product p
INNER JOIN selected_products sp
        ON p.id = sp.id
GROUP BY product_name,
         product_color
LIMIT 1000
OFFSET 0
Then create an index on the "product.id" field as follows:
CREATE INDEX product_ids_idx ON product USING HASH (id);
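One more idea (my addition, not part of the original answer): a btree index cannot help ILIKE '%red%' because of the leading wildcard, so if you can install extensions, a trigram index from pg_trgm may speed up the inner filter:

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    -- product_name is citext; indexing an explicit text cast, which may also
    -- require a matching cast in the query (product_name::text ILIKE '%red%')
    CREATE INDEX product_name_trgm_idx ON product USING gin ((product_name::text) gin_trgm_ops);
    CREATE INDEX product_color_trgm_idx ON product USING gin (product_color gin_trgm_ops);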

How to get the standard price field value in Odoo?

I have a problem.
standard_price is a computed field and is not stored in the product_template or product_product tables. How can I get the standard price field value in an Odoo XLSX report?
The error is:
Record does not exist or has been deleted.: None
Any solution or idea would help.
Check the cost field of the product_price_history table. I think that is what you are looking for. This table is related to the product_product table through the field product_id:
base=# \dS product_price_history
                        Table "public.product_price_history"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+---------------------------------------------------------------------
 id          | integer                     | not null default nextval('product_price_history_id_seq'::regclass)
 create_uid  | integer                     |
 product_id  | integer                     | not null
 company_id  | integer                     | not null
 datetime    | timestamp without time zone |
 cost        | numeric                     |
 write_date  | timestamp without time zone |
 create_date | timestamp without time zone |
 write_uid   | integer                     |
Indexes:
    "product_price_history_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "product_price_history_company_id_fkey" FOREIGN KEY (company_id) REFERENCES res_company(id) ON DELETE SET NULL
    "product_price_history_create_uid_fkey" FOREIGN KEY (create_uid) REFERENCES res_users(id) ON DELETE SET NULL
    "product_price_history_product_id_fkey" FOREIGN KEY (product_id) REFERENCES product_product(id) ON DELETE CASCADE
    "product_price_history_write_uid_fkey" FOREIGN KEY (write_uid) REFERENCES res_users(id) ON DELETE SET NULL

Data type for a SELECT in a plpgsql function, and accessing its fields

I have the following tables in a Postgres 9.5 database:
product
     Column      |            Type             |                      Modifiers
-----------------+-----------------------------+------------------------------------------------------
 id              | integer                     | not null default nextval('product_id_seq'::regclass)
 name            | character varying(100)      |
 number_of_items | integer                     |
 created_at      | timestamp without time zone | default now()
 updated_at      | timestamp without time zone | default now()
 total_number    | integer                     |
 provider_id     | integer                     |
Indexes:
    "pk_product" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fk_product_provider" FOREIGN KEY (provider_id) REFERENCES provider(id)
And we also have
provider
   Column    |            Type             |                       Modifiers
-------------+-----------------------------+--------------------------------------------------------
 id          | integer                     | not null default nextval('property_id_seq'::regclass)
 name        | text                        |
 description | text                        |
 created_at  | timestamp without time zone | default now()
 updated_at  | timestamp without time zone | default now()
Indexes:
    "pk_provider" PRIMARY KEY, btree (id)
I am implementing a plpgsql function that is supposed to find specific products of a provider and loop through them:
products = select u_id, number_of_items from product
           where provider_id = p_id and total_number > limit;
loop
    -- here I need to loop through the products
end loop;
Question
What data type should I declare for the products variable so that the queried products can be stored in it? And how can I later access its columns, such as id or number_of_items?
In PostgreSQL, creating a table also defines a composite data type with the same name as the table.
You can either use a variable of that type:
DECLARE
    p product;
BEGIN
    FOR p IN SELECT * FROM product WHERE ...
    LOOP
        [do something with "p.id" and "p.number_of_items"]
    END LOOP;
END;
Or you can use several variables for the individual fields you need (probably better):
DECLARE
    v_id integer;
    v_number_of_items integer;
BEGIN
    FOR v_id, v_number_of_items IN SELECT id, number_of_items FROM product WHERE ...
    LOOP
        [do something with "v_id" and "v_number_of_items"]
    END LOOP;
END;
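Putting the second variant together for the tables above (my sketch; process_products, p_id, and p_limit are names I made up, and note that limit itself is a reserved word, so it cannot be used as an unquoted parameter name):

    CREATE OR REPLACE FUNCTION process_products(p_id integer, p_limit integer)
    RETURNS void AS $$
    DECLARE
        v_id integer;
        v_number_of_items integer;
    BEGIN
        FOR v_id, v_number_of_items IN
            SELECT id, number_of_items
            FROM product
            WHERE provider_id = p_id
              AND total_number > p_limit
        LOOP
            -- do something with each product
            RAISE NOTICE 'product % has % items', v_id, v_number_of_items;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;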

How do I update 1.3 billion rows in this table more efficiently?

I have 1.3 billion rows in a PostgreSQL table sku_comparison that looks like this:
id1 (INTEGER) | id2 (INTEGER) | (10 SMALLINT columns) | length1 (SMALLINT) | length2 (SMALLINT) | length_difference (SMALLINT)
The id1 and id2 columns are referenced in a table called sku, which contains about 300,000 rows, and have an associated varchar(25) value in each row from a column, code.
There is a btree index built on id1 and id2, and a compound index of id1 and id2 in sku_comparison. There is a btree index on the id column of sku, as well.
My goal is to update the length1 and length2 columns with the lengths of the corresponding code column from the sku table. However, I ran the following code for over 20 hours, and it did not complete the update:
UPDATE sku_comparison SET length1=length(sku.code) FROM sku
WHERE sku_comparison.id1=sku.id;
All of the data is stored on a single hard disk on a local computer, and the processor is fairly modern. Constructing this table, which required much more complicated string comparisons in Python, only took about 30 hours or so, so I am not sure why something like this would take as long.
edit: here are formatted table definitions:
Table "public.sku"
Column | Type | Modifiers
------------+-----------------------+--------------------------------------------------
id | integer | not null default nextval('sku_id_seq'::regclass)
sku | character varying(25) |
pattern | character varying(25) |
pattern_an | character varying(25) |
firsttwo | character(2) | default ' '::bpchar
reference | character varying(25) |
Indexes:
"sku_pkey" PRIMARY KEY, btree (id)
"sku_sku_idx" UNIQUE, btree (sku)
"sku_firstwo_idx" btree (firsttwo)
Referenced by:
TABLE "sku_comparison" CONSTRAINT "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
TABLE "sku_comparison" CONSTRAINT "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)
Table "public.sku_comparison"
Column | Type | Modifiers
---------------------------+----------+-------------------------
id1 | integer | not null
id2 | integer | not null
consec_charmatch | smallint |
consec_groupmatch | smallint |
consec_fieldtypematch | smallint |
consec_groupmatch_an | smallint |
consec_fieldtypematch_an | smallint |
general_charmatch | smallint |
general_groupmatch | smallint |
general_fieldtypematch | smallint |
general_groupmatch_an | smallint |
general_fieldtypematch_an | smallint |
length1 | smallint | default 0
length2 | smallint | default 0
length_difference | smallint | default '-999'::integer
Indexes:
"sku_comparison_pkey" PRIMARY KEY, btree (id1, id2)
"ssd_id1_idx" btree (id1)
"ssd_id2_idx" btree (id2)
Foreign-key constraints:
"sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
"sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)
Would you consider using an anonymous code block?
Using real plpgsql in an anonymous DO block, the idea looks like this:
DO $$
DECLARE
    v_skuid integer;
    v_skulength smallint;
BEGIN
    FOR v_skuid, v_skulength IN SELECT id, length(code) FROM sku
    LOOP
        UPDATE sku_comparison
        SET length1 = v_skulength
        WHERE id1 = v_skuid;
    END LOOP;
END $$;
This avoids evaluating length(sku.code) once per row of sku_comparison, and if you instead drive the per-sku UPDATEs from a client script, committing as you go, it also breaks the whole thing into smaller transactions.
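Another common approach (my addition, not from the original answer) is to batch the join-update by ranges of id1, so each statement touches a bounded slice of the 1.3 billion rows and can be committed separately:

    UPDATE sku_comparison sc
    SET length1 = length(s.code)
    FROM sku s
    WHERE sc.id1 = s.id
      AND sc.id1 BETWEEN 1 AND 10000;  -- repeat for successive id1 ranges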

Peewee Python default is not reflected in DateTimeField

I am not able to set a default timestamp, despite using DateTimeField(default=datetime.datetime.now).
The column is created as NOT NULL, but no default value is set.
I have this model:
import datetime
from peewee import (Model, PostgresqlDatabase, PrimaryKeyField,
                    CharField, DateTimeField)

database = PostgresqlDatabase(dbname, user=username, password=password, host=host)

class BaseModel(Model):
    class Meta:
        database = database

class UserInfo(BaseModel):
    id = PrimaryKeyField()
    username = CharField(unique=True)
    password = CharField()
    email = CharField(null=True)
    created_date = DateTimeField(default=datetime.datetime.now)
When I create the table using this model and the code below,
database.connect()
database.create_tables([UserInfo])
I get the following table:
Table "public.userinfo"
Column | Type | Modifiers
--------------+-----------------------------+------------------------- ------------------------------
id | integer | not null default nextval('userinfo_id_seq'::regclass)
username | character varying(255) | not null
password | character varying(255) | not null
email | character varying(255) |
created_date | timestamp without time zone | not null
Indexes:
"userinfo_pkey" PRIMARY KEY, btree (id)
"userinfo_username" UNIQUE, btree (username)
As the table shows, created_date does not get any database-side default.
When using the default parameter, the values are set by Peewee rather than being part of the actual table and column definition.
Try created_date = DateTimeField(constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')]) instead.
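For completeness, a minimal sketch of the adjusted model (reusing the imports and BaseModel from the question; note that SQL must be imported from peewee):

    from peewee import SQL

    class UserInfo(BaseModel):
        id = PrimaryKeyField()
        username = CharField(unique=True)
        password = CharField()
        email = CharField(null=True)
        # DEFAULT CURRENT_TIMESTAMP becomes part of the column definition in PostgreSQL
        created_date = DateTimeField(constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')])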