Sphinx duplicate document id pairs found

I have a small problem with duplicate document ids in my index.
sql_query = SELECT id, title, file_id as table_id, "0" as description, "0" as content, "file" as type FROM language_files UNION ALL \
SELECT id, title, id as table_id, "0" as description, "0" as content, "list" as type FROM file_lists UNION ALL \
SELECT id, title, id as table_id, description, "0" as content, "tip" as type FROM tips UNION ALL \
SELECT id, title, id as table_id, "0" as description, content, "question" as type FROM questions
The query above is from my Sphinx config. If questions.id and files.id are the same, Sphinx returns only the question instance. Any ideas?

You should use this query:
sql_query = SELECT ((id<<3)+1) as id, title, file_id as table_id, "0" as description, "0" as content, "file" as type FROM language_files UNION ALL \
SELECT ((id<<3)+2) as id, title, id as table_id, "0" as description, "0" as content, "list" as type FROM file_lists UNION ALL \
SELECT ((id<<3)+3) as id, title, id as table_id, description, "0" as content, "tip" as type FROM tips UNION ALL \
SELECT ((id<<3)+4) as id, title, id as table_id, "0" as description, content, "question" as type FROM questions
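It generates a different id for each table: the row id is shifted left by three bits and the low bits hold a per-table tag (1 to 4), so ids coming from different tables can no longer collide.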
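To make the arithmetic concrete, here is a minimal Python sketch of the same id scheme. The 3-bit shift and the tags 1 to 4 are taken from the query; the function and constant names are hypothetical and only for illustration.

```python
# Sketch of the id scheme used in the query: (id << 3) + tag.
# TAG_BITS and the tag values 1..4 come from the query; all names are hypothetical.
TAG_BITS = 3
TABLE_TAGS = {"file": 1, "list": 2, "tip": 3, "question": 4}
TAG_NAMES = {tag: name for name, tag in TABLE_TAGS.items()}


def encode_doc_id(row_id, doc_type):
    """Combine a table-local row id with its table tag into one document id."""
    return (row_id << TAG_BITS) + TABLE_TAGS[doc_type]


def decode_doc_id(doc_id):
    """Split a document id back into (row id, table type)."""
    tag = doc_id & ((1 << TAG_BITS) - 1)  # low three bits hold the tag
    return doc_id >> TAG_BITS, TAG_NAMES[tag]


# Row 5 exists in both language_files and questions, but the ids differ.
file_doc = encode_doc_id(5, "file")
question_doc = encode_doc_id(5, "question")
print(file_doc, question_doc, decode_doc_id(question_doc))
```

Three bits leave room for up to seven non-zero tags, so the scheme holds as long as no more than seven tables are indexed together. The query also keeps the original id in table_id, so decoding is only needed if that attribute is not fetched.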

Related

How to create a comparison chart with feature count in PostgreSQL?

I have OpenStreetMap data loaded into a PostgreSQL table. An hstore column contains all of the tags. I would like to make a comparison chart to see how many records have the name, name:en, and name:bg tags, for example. The result I would like to see is something like this:
I can achieve this manually using this query:
SELECT 1 AS id, '+' AS name, NULL AS "name:en", NULL AS "name:bg", count(*) FROM public.ways WHERE exist(tags,'name') UNION
SELECT 2 AS id, NULL AS name, '+' AS "name:en", NULL AS "name:bg", count(*) FROM public.ways WHERE exist(tags,'name:en') UNION
SELECT 3 AS id, NULL AS name, NULL AS "name:en", '+' AS "name:bg", count(*) FROM public.ways WHERE exist(tags,'name:bg') UNION
SELECT 4 AS id, '+' AS name, '+' AS "name:en", NULL AS "name:bg", count(*) FROM public.ways WHERE exist(tags,'name') AND exist(tags,'name:en') UNION
SELECT 5 AS id, '+' AS name, NULL AS "name:en", '+' AS "name:bg", count(*) FROM public.ways WHERE exist(tags,'name') AND exist(tags,'name:bg') UNION
SELECT 6 AS id, '+' AS name, '-' AS "name:en", NULL AS "name:bg", count(*) FROM public.ways WHERE exist(tags,'name') AND NOT exist(tags,'name:en') UNION
SELECT 7 AS id, '-' AS name, '+' AS "name:en", NULL AS "name:bg", count(*) FROM public.ways WHERE NOT exist(tags,'name') AND exist(tags,'name:en')
ORDER BY id
I consider this unnecessarily long and overcomplicated, plus I have to do it manually. I know there are some possibilities using the crosstab function, but I couldn't get it working. Based on the answer to this question I was able to create something like this:
SELECT * FROM crosstab(
'SELECT tags::text~''"name"=>".*"'' as a, tags::text~''"name:en"=>".*"'' as b, tags::text~''"name_int"=>".*"'' as c FROM public.ways')
AS ct (name boolean,"name:en" boolean, "name:bg" boolean)
GROUP BY name,"name:en","name:bg"
My problem is that I cannot seem to add a count column to this, and that it does not include the cases where only one of the three conditions is taken into account.
Any idea how I could solve this problem, or where I should start?
Example data lines:
1 "name"=>"dm"
2 "name"=>"Ешекчи дере", "name:en"=>"Khatak Dere River"
3 "name:en"=>"Sushitsa"
4 "name"=>"Слънчева", "name:bg"=>"Слънчева", "name:en"=>"Slantcheva"

Hello, see if this works for you. It is possible to join against a derived table built from a SELECT in order to group the values:
SELECT row_number() OVER() AS id ,COUNT(*) AS count , COALESCE(a.tags , '')||COALESCE(b.tags,'')||COALESCE(c.tags ,'') AS tagcombination,
CASE WHEN COALESCE(a.tags , '')||COALESCE(b.tags,'')||COALESCE(c.tags ,'')="name:en" THEN '+'
WHEN COALESCE(a.tags , '')||COALESCE(b.tags,'')||COALESCE(c.tags ,'') = 'name:en' THEN '+' END AS name
FROM public.ways AS a
LEFT JOIN (SELECT DISTINCT tags FROM public.ways WHERE tags = 'name' ) AS b ON a.tags = b.tags
LEFT JOIN (SELECT DISTINCT tags FROM public.ways WHERE tags IN('name:en', 'name:bg' ) ) AS c ON a.tags = c.tags
JOIN (SELECT generate_series )
GROUP BY tagcombination
--WHERE a.tags IS NOT NULL
--ORDER BY name
The name column could be translated into numbers from the tagcombination and even be ordered later if that suits your report better.
You will need to test it, and add a predicate to filter the rows if the table contains more tag values than the ones you want to count.
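The grouping idea itself (classify each row by which tags are present, then count per combination) can be sketched in Python on the four example rows from the question. This is not a translation of the SQL above; the dicts stand in for hstore values and the function name is hypothetical.

```python
from collections import Counter

# The four example rows from the question, as plain dicts standing in for hstore.
ways = [
    {"name": "dm"},
    {"name": "Ешекчи дере", "name:en": "Khatak Dere River"},
    {"name:en": "Sushitsa"},
    {"name": "Слънчева", "name:bg": "Слънчева", "name:en": "Slantcheva"},
]
KEYS = ("name", "name:en", "name:bg")


def presence_counts(rows, keys=KEYS):
    """Count rows per combination of present (True) and absent (False) keys."""
    return Counter(tuple(key in row for key in keys) for row in rows)


counts = presence_counts(ways)
for combo, n in sorted(counts.items(), reverse=True):
    marks = ["+" if present else "-" for present in combo]
    print(dict(zip(KEYS, marks)), n)
```

This counts exact combinations only. A chart row that ignores one of the tags (the NULL cells in the manual query) would be the sum of all combinations that agree on the remaining tags.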

RedShift: troubles with regexp_substr

I have this JSON in Redshift: {"skippable": true, "unit": true}
I want to get only the words between the double quotes (the JSON keys), for example "skippable", "unit", etc.
I use this query:
SELECT regexp_substr(REPLACE(REPLACE(attributes, '{', ''), '}', '')::VARCHAR, '\S+:') AS regexp, JSON_PARSE(attributes) AS attributes_super
FROM source.table
WHERE prompttype != 'input'
But the "regexp" column comes back empty.

The solution is:
SELECT
n::int
INTO TEMP numbers
FROM
(SELECT
row_number() over (order by true) as n
FROM table limit 30)
CROSS JOIN
(SELECT
max(regexp_count(attributes, '[,]')) as max_num
FROM table limit 30)
WHERE
n <= max_num + 1;
WITH all_values AS (
SELECT c.id, c.attributes, c.attributes_super.prompt, c.attributes_super.description,
c.attributes_super.topic, c.attributes_super.context,
c.attributes_super.use_case, c.attributes_super.subtitle, c.attributes_super.txValues, c.attributes_super.flashmode,
c.attributes_super.skippable, c.attributes_super.videoMaxDuration, c.attributes_super.defaultCameraFacing, c.attributes_super.locationRequired
FROM (
SELECT *, JSON_PARSE(attributes) AS attributes_super
FROM table
WHERE prompttype != 'input'
) AS c
ORDER BY created DESC
limit 1
), list_of_attr AS (
SELECT *, regexp_substr(split_part(attributes,',',n), '\"[0-9a-zA-Z]+\"') as others_attrs
FROM
all_values
CROSS JOIN
numbers
WHERE
split_part(attributes,',',n) is not null
AND split_part(attributes,',',n) != ''
), combine_attrs AS (
SELECT id, attributes, prompt, description,
topic, context, use_case, subtitle, txvalues, flashmode,
skippable, videomaxduration, defaultcamerafacing, locationrequired, LISTAGG(others_attrs, ',') AS others_attrs
FROM list_of_attr
GROUP BY id, attributes, prompt, description, topic,
context, use_case, subtitle, txvalues, flashmode,
skippable, videomaxduration, defaultcamerafacing, locationrequired)
SELECT * FROM combine_attrs;
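The core of the list_of_attr and combine_attrs steps (split on commas, take the first quoted alphanumeric token of each part, join the results) can be sketched outside Redshift in Python. The pattern is the one from the query; the function name is hypothetical.

```python
import re

# Same pattern as in the regexp_substr call: a quoted run of letters and digits.
KEY_PATTERN = re.compile(r'"[0-9a-zA-Z]+"')


def extract_keys(attributes):
    """Mimic split_part + regexp_substr + LISTAGG on a flat JSON string."""
    found = []
    for part in attributes.split(","):
        match = KEY_PATTERN.search(part)
        if match:  # skip parts that contain no quoted token
            found.append(match.group(0))
    return ",".join(found)


print(extract_keys('{"skippable": true, "unit": true}'))
```

Like the SQL, this takes the first quoted token in each comma-separated part, so a key containing other characters (an underscore, for instance) is skipped and a quoted string value may be picked up instead. Commas inside values also split a part in two.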

Join multiple tables of Amazon Redshift in a single one obtains error: column X is of type boolean but expression is of type character varying Hint:

I am trying to join multiple tables of Amazon Redshift in a single one.
One of the initial tables is this one:
create table order_customers(
id int,
email varchar(254),
phone varchar(50),
customer_id int,
order_id int NOT NULL,
ip text,
geoip_location varchar(1024),
logged_in boolean,
PRIMARY KEY (id),
FOREIGN KEY (order_id) REFERENCES orders (id)
);
I am using the command to insert the data into the large table:
INSERT INTO orders_large ( id, showid, created_at, status, status_enum, currency, tax_orders, shipping, discount_orders,
subtotal, total, store_id, payment_method_id, shipping_method_name, shipping_method_id, additional_information,
payment_information, locale, shipping_required_orders, payment_method_type, coupons, payment_notification_id,
recover_token, updated_at, external, shipping_tax, shipping_discount, shipping_discount_decimal,
completed_at, payment_name, shipping_service_id, app_id, fulfillment_status, date_traffic_sources,
landing_url, referral_url, referral_code, utm_campaign, utm_source, utm_term, utm_medium, utm_content,
user_agent, subscription_id_traffic_sources, email, phone, customer_id_order_customers,
ip, geoip_location, logged_in, name, surname, company, address, street_number, city, postal, country, region,
type, taxid, default_, region_format, municipality, latitude, longitude, subscription_id_addresses,
customer_id_addresses, pickup_point_id, taxid_type, sku, qty, price, product_id, weight,
product_option_property_id, discount_order_products, shipping_required_orders_products, brand,
tax_order_products, width, height, length, volume, diameter, package_format)
SELECT o.id, o.showid, o.created_at, o.status, o.status_enum, o.currency, o.tax, o.shipping, o.discount,
o.subtotal, o.total, o.store_id, o.payment_method_id, o.shipping_method_name, o.shipping_method_id, o.additional_information,
o.payment_information, o.locale, o.shipping_required, payment_method_type, coupons, payment_notification_id,
recover_token, o.updated_at, o.external, shipping_tax, o.shipping_discount, shipping_discount_decimal,
completed_at, payment_name, shipping_service_id, o.app_id, o.fulfillment_status, t.date,
t.landing_url, t.referral_url, t.referral_code, t.utm_campaign, t.utm_source, t.utm_term, t.utm_medium, t.utm_content,
t.user_agent, t.subscription_id, oc.email, oc.phone, oc.customer_id, oc.order_id, oc.ip,
oc.geoip_location, oc.logged_in, a.name, a.surname, a.company, a.address, a.street_number, a.city, a.postal,
a.country, a.region, a.type, a.taxid, a.default_, a.region_format, a.municipality, a.latitude, a.longitude, a.order_id,
a.subscription_id, a.customer_id, a.pickup_point_id, a.taxid_type, op.sku, op.qty, op.price, op.product_id,
op.order_id, op.weight, op.product_option_property_id, op.discount, op.shipping_required, op.brand,
op.tax, op.width, op.height, op.length, op.volume, op.diameter, op.package_format
FROM orders o
INNER JOIN traffic_sources t ON o.id = t.order_id
INNER JOIN order_customers oc ON o.id = oc.order_id
INNER JOIN addresses a ON o.id = a.order_id
INNER JOIN order_products op ON o.id = op.order_id;
And I get this error message:
ERROR: column "logged_in" is of type boolean but expression is of type character varying Hint: You will need to rewrite or cast the expression.
I tried using DECODE(oc.logged_in, 'false', '0', 'true', '1')::varchar::bool for the oc.logged_in field, but another error message appears:
ERROR: cannot cast type character varying to boolean

The problem was the correspondence between the fields. It worked after removing the "order_id" fields. To cast in Redshift there are two options:
CONVERT ( type, expression )
CAST ( expression AS type ) or expression :: type
Source: https://docs.aws.amazon.com/redshift/latest/dg/r_CAST_function.html#convert-function

As the message says - column "logged_in" is of type boolean
So in your DECODE you need to compare it to boolean values, not strings. Try:
DECODE(oc.logged_in, true, 'true', 'false')
The code above matches my understanding of your issue. Below is test SQL that runs fine on Redshift.
create table oc as (select 1=1 as logged_in union all select 1=0);
select * from oc;
select DECODE(oc.logged_in, true, 'true string', 'false string') as test from oc;
I now expect that the issue is not in using oc.logged_in but rather with orders_large.logged_in and what you are putting in it. What data type is logged_in defined as in orders_large? Boolean, I assume, which should take a boolean value just fine without casting.
Looking at your SQL, I see that the number of elements in the INSERT clause doesn't match the number of elements in the SELECT clause. This mismatch is causing your SQL to try to put a different (text) value into orders_large.logged_in. Here's a "diff" between the two lists (INSERT on the left / SELECT on the right):
id id
showid showid
created_at created_at
status status
status_enum status_enum
currency currency
tax_orders | tax
shipping shipping
discount_orders | discount
subtotal subtotal
total total
store_id store_id
payment_method_id payment_method_id
shipping_method_name shipping_method_name
shipping_method_id shipping_method_id
additional_information additional_information
payment_information payment_information
locale locale
shipping_required_orders | shipping_required
payment_method_type payment_method_type
coupons coupons
payment_notification_id payment_notification_id
recover_token recover_token
updated_at updated_at
external external
shipping_tax shipping_tax
shipping_discount shipping_discount
shipping_discount_decimal shipping_discount_decimal
completed_at completed_at
payment_name payment_name
shipping_service_id shipping_service_id
app_id app_id
fulfillment_status fulfillment_status
date_traffic_sources | date
landing_url landing_url
referral_url referral_url
referral_code referral_code
utm_campaign utm_campaign
utm_source utm_source
utm_term utm_term
utm_medium utm_medium
utm_content utm_content
user_agent user_agent
subscription_id_traffic_sources | subscription_id
email email
phone phone
customer_id_order_customers | customer_id
> order_id
ip ip
geoip_location geoip_location
logged_in logged_in
name name
surname surname
company company
address address
street_number street_number
city city
postal postal
country country
region region
type type
taxid taxid
default_ default_
region_format region_format
municipality municipality
latitude latitude
longitude longitude
subscription_id_addresses | order_id
customer_id_addresses | subscription_id
> customer_id
pickup_point_id pickup_point_id
taxid_type taxid_type
sku sku
qty qty
price price
product_id product_id
> order_id
weight weight
product_option_property_id product_option_property_id
discount_order_products | discount
shipping_required_orders_products | shipping_required
brand brand
tax_order_products | tax
width width
height height
length length
volume volume
diameter diameter
package_format package_format
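As you can see, there is an unmatched "order_id" in the SELECT list (oc.order_id) just a couple of columns before logged_in, so every value after it shifts by one position and oc.geoip_location, a varchar, lands in logged_in. The same applies to a.order_id and op.order_id further down. You need to get the column alignment fixed.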
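A quick way to see the shift is to pair the two lists by position, which is how INSERT ... SELECT matches them. The sketch below uses only the order_customers part of the lists from the question; the function name is hypothetical.

```python
# Slice of the INSERT column list and the SELECT expression list from the question.
insert_cols = [
    "email", "phone", "customer_id_order_customers",
    "ip", "geoip_location", "logged_in",
]
select_exprs = [
    "oc.email", "oc.phone", "oc.customer_id", "oc.order_id",
    "oc.ip", "oc.geoip_location", "oc.logged_in",
]


def pair_by_position(cols, exprs):
    """Pair target columns with source expressions by position, not by name."""
    return dict(zip(cols, exprs))


pairing = pair_by_position(insert_cols, select_exprs)
for col, expr in pairing.items():
    print(f"{col:30} <- {expr}")

# Expressions beyond the last target column have nowhere to go.
leftover = select_exprs[len(insert_cols):]
print("unpaired:", leftover)
```

With the extra oc.order_id in place, logged_in is paired with oc.geoip_location, which is consistent with the "expression is of type character varying" error.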

how to get the name field of a record when given the id

I have a query:
SELECT Id, recordtypeid FROM myObject__c
But instead of returning the record type id, I want to return the record type name. How do I do that?

SELECT Id, RecordType.Name FROM myObject__c

Selecting MAX from two tables in select statement

I have tables for a forum system. I am trying to get the following data to show on the forum page:
Subject, Description, Last Posting Date (either post or comment), and the username of whoever made the last post (either post or comment).
Here are my tables:
ForumSubject[
Id,
Subject,
Description
]
ForumPost[
id,
Subject,
Title,
Body,
UserId,
Date
]
ForumComment[
id,
PostId,
UserId,
Date,
Comment
]
User[
id,
Name
]
Here is what I have so far:
SELECT
subject.Id,
subject.Description,
subject.Subject
FROM dbo.ForumSubject subject
How can I now get the MAX Date of either a post or a comment, whichever is latest, and the user name for that post?
Thank you!

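You can do it like this:
SELECT s.Id, s.Subject, s.Description, t2.LastDate
FROM dbo.ForumSubject s
INNER JOIN (
    SELECT SubjectId, MAX(Date) AS LastDate
    FROM (
        -- assumes ForumPost.Subject stores ForumSubject.Id
        SELECT p.Subject AS SubjectId, p.Date
        FROM dbo.ForumPost p
        UNION ALL
        -- a comment belongs to the subject of its post
        SELECT p.Subject AS SubjectId, c.Date
        FROM dbo.ForumComment c
        INNER JOIN dbo.ForumPost p ON p.Id = c.PostId
    ) t1
    GROUP BY SubjectId
) t2 ON t2.SubjectId = s.Id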
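The query returns the last date but not the user name. The logic for getting both can be sketched in Python: merge posts and comments into one list of events per subject and keep the newest. The sample rows are hypothetical, and the sketch assumes that a post carries the id of its subject and a comment the id of its post.

```python
from datetime import date

# Hypothetical sample rows; subject_id stands for ForumPost.Subject.
posts = [
    {"id": 10, "subject_id": 1, "user_id": 1, "date": date(2020, 1, 5)},
    {"id": 11, "subject_id": 2, "user_id": 1, "date": date(2020, 2, 1)},
]
comments = [
    {"id": 100, "post_id": 10, "user_id": 2, "date": date(2020, 1, 9)},
    {"id": 101, "post_id": 11, "user_id": 2, "date": date(2020, 1, 20)},
]
users = {1: "alice", 2: "bob"}


def last_activity(posts, comments, users):
    """Return {subject_id: (last date, user name)} over posts and comments."""
    post_subject = {p["id"]: p["subject_id"] for p in posts}
    # One event per post and per comment, each tagged with its subject.
    events = [(p["subject_id"], p["date"], p["user_id"]) for p in posts]
    events += [
        (post_subject[c["post_id"]], c["date"], c["user_id"]) for c in comments
    ]
    latest = {}
    for subject_id, when, user_id in events:
        if subject_id not in latest or when > latest[subject_id][0]:
            latest[subject_id] = (when, users[user_id])
    return latest


print(last_activity(posts, comments, users))
```

In SQL the same effect would need the user id carried through the UNION ALL and a "newest row per group" step instead of a plain MAX, since MAX(Date) alone loses track of which row it came from.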

We Keep Coding
