Sphinx 3.1.1 not returning correct snippet

I have a Sphinx 3.1.1 installation where I want to show snippets for the found results using DocStore. However, the snippet returned is just the beginning of the content of the document.
The query I use:
SELECT id, SNIPPET(content, QUERY()) AS snippet FROM test_index WHERE MATCH('test');
This returns results like:
+----+--------------------------------------------------+
| id | snippet                                          |
+----+--------------------------------------------------+
|  1 | this is a test document to test Sphinx 3.1.1 ... |
|  2 | another test document to test Sphinx 3.1.1 ...   |
+----+--------------------------------------------------+
Please note that the returned snippets have no highlighting <b> tags around the search word test, and the returned snippet is just the opening string of the document. If I search for test2 instead, the results are the same: the documents contain test2 further into the content, but the snippet still shows only the first x words of the content, without any highlighting.
The configuration of my index is:
index test_index
{
type = rt
path = /mtn/data001/test_index
rt_field = content
stored_fields = content
}
What am I doing wrong and why does my snippet not contain highlight tags?

Hmm, I just tried copy/pasting your test_index into a config file and starting a sphinx3 instance...
barry@tea:~/sphinx-3.1.1$ bin/searchd --config test.conf
Sphinx 3.1.1 (commit 612d99f)
Copyright (c) 2001-2018, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'test.conf'...
listening on all interfaces, port=10312
listening on all interfaces, port=10306
precaching index 'test_index'
precached 1 indexes in 0.001 sec
barry@tea:~/sphinx-3.1.1$ mysql --protocol=tcp -P10306 --prompt='sphinxQL3>' --default-character-set=utf8
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 3.1.1 (commit 612d99f)
sphinxQL3>SELECT id, DOCUMENT() as doc, DOCUMENT({content}) FROM test_index WHERE MATCH('test');
Empty set (0.00 sec)
sphinxQL3>insert into test_index values (1,'this is a test');
Query OK, 1 row affected (0.00 sec)
sphinxQL3>insert into test_index values (2,'this is a test more');
Query OK, 1 row affected (0.00 sec)
sphinxQL3>SELECT id, SNIPPET(DOCUMENT({content}), QUERY()) AS snippet FROM test_index WHERE MATCH('test');
+------+----------------------------+
| id   | snippet                    |
+------+----------------------------+
|    1 | this is a <b>test</b>      |
|    2 | this is a <b>test</b> more |
+------+----------------------------+
2 rows in set (0.00 sec)
sphinxQL3>SELECT id, SNIPPET(content, QUERY()) AS snippet FROM test_index WHERE MATCH('test');
+------+----------------------------+
| id   | snippet                    |
+------+----------------------------+
|    1 | this is a <b>test</b>      |
|    2 | this is a <b>test</b> more |
+------+----------------------------+
2 rows in set (0.00 sec)
sphinxQL3>SELECT id, SNIPPET(content, QUERY()) AS snippet FROM test_index WHERE MATCH('more');
+------+----------------------------+
| id   | snippet                    |
+------+----------------------------+
|    2 | this is a test <b>more</b> |
+------+----------------------------+
1 row in set (0.00 sec)
sphinxQL3>insert into test_index values (3,'this is a test document to test Sphinx 3.1.1 Technically, Sphinx is a standalone software package provides fast and relevant full-text search functionality to client applications. It was specially designed to integrate well with SQL databases storing the data, and to be easily accessed by scripting languages. However, Sphinx does not depend on nor require any specific database to function. ');
Query OK, 1 row affected (0.00 sec)
sphinxQL3>SELECT id, SNIPPET(content, QUERY()) AS snippet FROM test_index WHERE MATCH('test');
+------+-----------------------------------------------------------------------------+
| id   | snippet                                                                     |
+------+-----------------------------------------------------------------------------+
|    1 | this is a <b>test</b>                                                       |
|    2 | this is a <b>test</b> more                                                  |
|    3 | this is a <b>test</b> document to <b>test</b> Sphinx 3.1.1 Technically, ... |
+------+-----------------------------------------------------------------------------+
3 rows in set (0.00 sec)
sphinxQL3>SELECT id, SNIPPET(content, QUERY()) AS snippet FROM test_index WHERE MATCH('scripting');
+------+---------------------------------------------------------------------------------------+
| id   | snippet                                                                               |
+------+---------------------------------------------------------------------------------------+
|    3 | ... to be easily accessed by <b>scripting</b> languages. However, Sphinx does not ... |
+------+---------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
So it seems that 3.1.1 does work as such, but something odd is going on with your configuration.
Maybe try deleting the test_index files (while searchd is shut down) and trying again. You may have somehow corrupted your index files (e.g. by changing the config since creating the index); this is quite easy to do during experimentation.
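If you want to confirm that DocStore is actually storing the field after rebuilding, a quick sanity check (reusing the DOCUMENT() call from the transcript above) is to pull the stored text back directly; an empty doc column would mean stored_fields did not take effect:
SELECT id, DOCUMENT({content}) AS doc FROM test_index WHERE MATCH('test');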


What do the various columns in the schema_version_history table of Flyway represent?

I'm new to Flyway and have been going through its documentation, but I couldn't find a doc that describes what each column in schema_version_history (or whatever you have configured the Flyway table to be named) means. I'm specifically intrigued by the column named "type". So far, the possible values for this column that I've observed in a legacy project at work are SQL and DELETE.
But I have no clue what these mean in terms of Flyway migrations.
Below are some sample rows from the table. Note that for installed ranks 54 and 56, the same migration file is present with the same checksum, but one has type SQL and the other has type DELETE.
-[ RECORD 53 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 54
version | 2022.11.18.11.35.49.65
description | add column seqence in attribute table
type | SQL
script | V2022_11_18_11_35_49_65__add_column_seqence_in_attribute_table.sql
checksum | 408921517
installed_by | postgres
installed_on | 2022-11-18 12:04:47.652058
execution_time | 345
success | t
-[ RECORD 54 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 55
version | 2022.11.15.14.17.44.36
description | update address column in attribute table
type | DELETE
script | V2022_11_15_14_17_44_36__update_address_column_in_attribute_table.sql
checksum | 1347853326
installed_by | postgres
installed_on | 2022-11-18 14:52:09.265902
execution_time | 0
success | t
-[ RECORD 55 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 56
version | 2022.11.18.11.35.49.65
description | add column seqence in attribute table
type | DELETE
script | V2022_11_18_11_35_49_65__add_column_seqence_in_attribute_table.sql
checksum | 408921517
installed_by | postgres
installed_on | 2022-11-18 14:52:09.265902
execution_time | 0
success | t
-[ RECORD 56 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 58
version | 2022.11.18.11.35.49.65
description | add column seqence in attribute table
type | SQL
script | V2022_11_18_11_35_49_65__add_column_seqence_in_attribute_table.sql
checksum | 408921517
installed_by | postgres
installed_on | 2022-12-09 14:01:59.352589
execution_time | 174
success | t
Great question. This is as close as I got to documentation on that table:
https://www.red-gate.com/hub/product-learning/flyway/exploring-the-flyway-schema-history-table
That article doesn't really describe the type column well at all: it suggests the column only has two possible values, and I've seen at least three: DELETE, SQL and JDBC. I'm not sure what else it may contain.
EDIT: I've since also confirmed two more values: BASELINE and UNDO_SQL.
It's actually marked as intentionally not documented since it's not a part of the public API:
https://flywaydb.org/documentation/learnmore/faq#case-sensitive
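If you want to see which type values occur in your own history table, a simple aggregate works; a sketch assuming the default table name flyway_schema_history (substitute whatever name you configured):
SELECT type, count(*) AS migrations
FROM flyway_schema_history
GROUP BY type
ORDER BY migrations DESC;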

flyway sessions locking itself

I was trying to run the following migration through Flyway:
create index concurrently if not exists api_client_system_role_idx2 on profile.api_client_system_role (api_client_id);
create index concurrently if not exists api_client_system_role_idx3 on profile.api_client_system_role (role_type_id);
create index concurrently if not exists api_key_idx2 on profile.api_key (api_client_id);
However, the Flyway sessions were blocking each other, and the script is stuck in the "Pending" state:
+-----------+---------+----------------------------------------------+--------+---------------------+---------+
| Category  | Version | Description                                  | Type   | Installed On        | State   |
+-----------+---------+----------------------------------------------+--------+---------------------+---------+
| Versioned | 20.1    | add email verification table                 | SQL    | 2021-11-01 21:55:52 | Success |
| Versioned | 21.1    | create role for doc api                      | SQL    | 2021-11-01 21:55:52 | Success |
| Versioned | 22      | create indexes for profile                   | SQL    | 2022-10-21 10:23:41 | Success |
| Versioned | 23      | test flyway                                  | SQL    |                     | Pending |
+-----------+---------+----------------------------------------------+--------+---------------------+---------+
Flyway: Flyway Community Edition 9.3.1 by Redgate
Database: Postgresql 14.4
Can you please advise how to properly create indexes concurrently in PostgreSQL through Flyway?
I've tried simply killing the blocking session and letting the script continue, but then the migration failed and the script stayed in the "Pending" status.
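For what it's worth, CREATE INDEX CONCURRENTLY cannot run inside a transaction block, while Flyway wraps each migration in a transaction by default and (on recent versions) also holds its advisory lock inside a transaction, which is how a session can end up blocking itself. A sketch of one common workaround, assuming your Flyway edition supports script-level config files and the flyway.postgresql.transactional.lock setting (verify both against the docs for 9.3.1):
-- V23__create_indexes_for_profile.sql (hypothetical file name)
-- Each CREATE INDEX CONCURRENTLY must run outside a transaction block.
create index concurrently if not exists api_client_system_role_idx2
    on profile.api_client_system_role (api_client_id);

-- V23__create_indexes_for_profile.sql.conf (script config file, same directory):
--     executeInTransaction=false
--
-- flyway.conf (keeps Flyway's own advisory lock out of a transaction):
--     flyway.postgresql.transactional.lock=false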

Visualizing time-series from a SQL Database (Postgres)

I am building an app that applies a data-science model to a SQL database of sensor metrics. For this purpose I chose PipelineDB (based on Postgres), which lets me build a continuous view on my metrics and apply the model to each new row (see the sketch after the table below).
For now, I just want to observe the metrics I collect from the sensor on a dashboard. The "metrics" table looks like this:
+---------------------+--------+---------+------+-----+
| timestamp           | T (°C) | P (bar) | n    | ... |
+---------------------+--------+---------+------+-----+
| 2015-12-12 20:00:00 | 20     | 1.13    | 0.9  |     |
| 2015-12-13 20:00:00 | 20     | 1.132   | 0.9  |     |
| 2015-12-14 20:00:00 | 40     | 1.131   | 0.96 |     |
+---------------------+--------+---------+------+-----+
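For context, a continuous view in PipelineDB is a continuously updated aggregation over a stream. A minimal sketch, with hypothetical names (sensor_stream, metrics_minutely) and the classic CREATE CONTINUOUS VIEW syntax of PipelineDB 0.9-era releases:
CREATE STREAM sensor_stream (ts timestamptz, temp_c float, p_bar float, n float);

-- Continuously maintained per-minute aggregates over the stream
CREATE CONTINUOUS VIEW metrics_minutely AS
    SELECT date_trunc('minute', ts) AS minute,
           avg(temp_c) AS avg_temp_c,
           avg(p_bar)  AS avg_p_bar,
           avg(n)      AS avg_n
    FROM sensor_stream
    GROUP BY minute;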
I'd like to build a dashboard where I can see all my metrics evolving through time, and ideally be able to choose which columns to display.
I found a few tools that could match my need: Grafana, and Chronograf for InfluxDB.
But neither of them lets me connect directly to Postgres and query my table to generate the metric-formatted data these tools require.
Do you have any advice on how to use such dashboards with such data?
A bit late here, but Grafana now supports PostgreSQL data sources directly: https://grafana.com/docs/features/datasources/postgres. I've used it in several projects and it has been really easy to set up and use.
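As a rough illustration of what a Grafana panel query against such a table could look like (a sketch: temp_c and p_bar are hypothetical stand-ins for the columns in the question, and $__timeFilter() is Grafana's time-range macro for SQL data sources):
SELECT
    "timestamp" AS "time",          -- Grafana uses the time column for the x-axis
    temp_c,
    p_bar
FROM metrics
WHERE $__timeFilter("timestamp")    -- expands to a WHERE clause for the selected range
ORDER BY 1;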

How to debug "Sugar CRM X Files May Only Be Used With A Sugar CRM Y Database."

Sometimes one gets a message like:
Sugar CRM 6.4.5 Files May Only Be Used With A Sugar CRM 6.4.5 Database.
I am wondering how Sugar determines what version of the database it is using. In the above case, I get the following output:
select * from config where name='sugar_version';
+----------+---------------+-------+
| category | name          | value |
+----------+---------------+-------+
| info     | sugar_version | 6.4.5 |
+----------+---------------+-------+
1 row in set (0.00 sec)
cat config.php |grep sugar_version
'sugar_version' => '6.4.5',
Given the above output, I am wondering how to debug the message "Sugar CRM 6.4.5 Files May Only Be Used With A Sugar CRM 6.4.5 Database.": Sugar seems to think the files are not of version 6.4.5, even though sugar_version is 6.4.5 in config.php. Where should I look next?
Two options for the issue:
Option 1: Update your database to the latest version.
Option 2: Follow the steps below and change the SugarCRM config version.
mysql> select * from config where name ='sugar_version';
+----------+---------------+---------+----------+
| category | name          | value   | platform |
+----------+---------------+---------+----------+
| info     | sugar_version | 7.7.0.0 | NULL     |
+----------+---------------+---------+----------+
1 row in set (0.00 sec)
Update your SugarCRM version to the appropriate value:
mysql> update config set value='7.7.1.1' where name ='sugar_version';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
The above commands seem to be correct. Sugar seems to check that config.php and the config table in the database contain the same version. In my case I was making the mistake of using the wrong database, so if you're like me and tend to get your databases mixed up, double-check in config.php that 'dbconfig' is indeed pointing to the right database.
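A quick way to check both sides of that comparison from the MySQL prompt (a sketch, assuming the standard config table; the connection should be to the database that 'dbconfig' in config.php points to):
SELECT DATABASE();  -- confirm you are connected to the database config.php points to
SELECT value FROM config WHERE category = 'info' AND name = 'sugar_version';
-- the value returned should match 'sugar_version' in config.php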

Google Cloud SQL VERY SLOW

I am thinking of migrating my website to Google Cloud SQL, and I signed up for a free account (D32).
Upon testing on a table with 23k records, performance was very poor, and I read that if I moved from the free account to a full paid account I would have access to faster CPU and HDD... so I did.
Performance is still VERY POOR.
I have been running my own MySQL server for years now, upgrading as needed to handle more and more connections and to gain raw speed (needed because of a legacy application). I heavily optimize tables and configuration and make extensive use of the query cache.
A few pages of our legacy system run over 1.5k queries per page. I was able to push the total MySQL query time (execution plus data retrieval) down to 3.6 seconds for all those queries, meaning MySQL takes about 0.0024 seconds per query to execute and return the values: not the greatest, but acceptable for those pages.
I uploaded a table involved in those many queries to Google Cloud SQL. I noticed that an INSERT already takes SECONDS to execute instead of milliseconds, but I thought it might be the sync vs. async setting. I changed it to async, and the execution time for the insert didn't noticeably change. Not a big problem for now; I am only testing queries at this point.
I ran a simple SELECT * FROM <table> and noticed that it took over 6 seconds. Thinking the query cache might need to warm up, I tried again, and this time it took 4 seconds (excluding network traffic). I ran the same query on my backup server after a restart, with no connections at all, and it took less than 1 second; running it again, 0.06 seconds.
Maybe the problem is the cache, with too big a result set... let's try a smaller subset:
select * from <table> limit 5;
on my server: 0.00 seconds
Google Cloud SQL: 0.04
So I decided to try a trivial SELECT on an empty table, no records at all, just created with only one field:
on my server: 0.00 seconds
Google Cloud SQL: 0.03
Profiling doesn't give any insight, except that the query cache is not running on Google Cloud SQL and that query execution itself seems faster... but overall it is not:
My Server:
mysql> show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000225 |
| Waiting for query cache lock   | 0.000116 |
| init                           | 0.000115 |
| checking query cache for query | 0.000131 |
| checking permissions           | 0.000117 |
| Opening tables                 | 0.000124 |
| init                           | 0.000129 |
| System lock                    | 0.000124 |
| Waiting for query cache lock   | 0.000114 |
| System lock                    | 0.000126 |
| optimizing                     | 0.000117 |
| statistics                     | 0.000127 |
| executing                      | 0.000129 |
| end                            | 0.000117 |
| query end                      | 0.000116 |
| closing tables                 | 0.000120 |
| freeing items                  | 0.000120 |
| Waiting for query cache lock   | 0.000140 |
| freeing items                  | 0.000228 |
| Waiting for query cache lock   | 0.000120 |
| freeing items                  | 0.000121 |
| storing result in query cache  | 0.000116 |
| cleaning up                    | 0.000124 |
+--------------------------------+----------+
23 rows in set, 1 warning (0.00 sec)
Google Cloud SQL:
mysql> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000061 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000115 |
| System lock          | 0.000019 |
| init                 | 0.000023 |
| optimizing           | 0.000008 |
| statistics           | 0.000012 |
| preparing            | 0.000005 |
| executing            | 0.000021 |
| end                  | 0.000024 |
| query end            | 0.000007 |
| closing tables       | 0.000030 |
| freeing items        | 0.000018 |
| logging slow query   | 0.000006 |
| cleaning up          | 0.000005 |
+----------------------+----------+
15 rows in set (0.03 sec)
Keep in mind that I connect to both servers remotely from a machine located in VA, while my server is located in Texas (even if that should not matter much).
What am I doing wrong? Why do simple queries take this long? Am I missing or not understanding something here?
As of right now I won't be able to use Google Cloud SQL, because a page with 1500 queries would take way too long (circa 45 seconds).
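One way to separate network round-trip time from server-side work (a sanity check, not a fix) is to time a no-op statement, since it pays the full round trip while doing essentially nothing on the server:
-- SHOW PROFILE above reports well under 0.001s of server-side work per query,
-- while the client-side timings are ~0.03s; if SELECT 1 shows the same ~0.03s
-- floor, the gap is network latency rather than execution time.
SELECT 1;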
I know this question is old, but...
Cloud SQL has poor support for MyISAM tables; it's recommended to use InnoDB.
We had poor performance when migrating a legacy app; after reading through the docs and contacting the paid support, we had to migrate the tables to InnoDB. The lack of a query cache was also a killer.
You may also find later on that you need to tweak the MySQL configuration via the 'flags' in the Google console. One example: 'wait_timeout' is set too high by default (IMO).
Hope this helps someone :)
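In case it helps, a sketch of the conversion itself; the first statement lists the remaining MyISAM tables in the current schema, and <table> is a placeholder for each name it returns:
-- Find tables still on MyISAM in the current schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = DATABASE() AND engine = 'MyISAM';

-- Convert one table (repeat per table; this rebuilds the table)
ALTER TABLE <table> ENGINE = InnoDB;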
The query cache is not yet a feature of Cloud SQL, which may explain the results. However, I recommend closing this question, as it is quite broad and doesn't fit the format of a neat and tidy Q&A: there are just too many unmentioned variables at play, and it isn't clear what a decisive "answer" to such a general optimization question would look like.