How to debug sphinx search using the indextool

I ran indextool on an index that crashes sphinx when I use indexer on it.
The output of indextool shows many failures such as:
FAILED, string offset out of bounds (row=18, stringattr=3, docid=3317, index=896070)
Can someone help me understand what the parameters (row, stringattr, docid, index) relate to, so I can inspect the index CSV file and try to see what's causing the failure?

Those are offsets within the generated index, not in the original source dataset.
Also, as far as I know, indextool only inspects existing indexes, while running indexer tries to create a new version of the index from the 'source' data. So if indexer is 'crashing', a proper index is NOT being built.
That means indextool is inspecting some previous version of the index, not the partly built one that existed when indexer crashed, and that earlier version was already corrupted.
In short, using indextool here is a non-starter; you need to debug with indexer instead.
Maybe try the --dump-rows and/or --verbose options to indexer; they may reveal something useful just before the crash happens.
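For example (the config path, dump file, and index name below are placeholders to adapt; --dump-rows and --verbose are the stock indexer options):

```shell
# Re-run indexer by hand with extra diagnostics. --dump-rows writes every
# row fetched from the source to a file, so after a crash the tail of that
# file points at the data being processed when indexer died.
indexer --config /etc/sphinxsearch/sphinx.conf \
        --verbose \
        --dump-rows /tmp/rows_dump.sql \
        myindex

# After the crash, look at the last rows that made it into the dump:
tail -n 5 /tmp/rows_dump.sql
```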


Sphinx Error: (type='index') already exists

I have my sphinx configuration broken out into 3 files as I have several different indexes that each use common files.
So I have a master sphinx.conf that defines paths for where source and index are.
Then each index has a source file (sql select, fields) and an index file with all the wordform paths and any rules for that specific index of data.
So I've been making changes to one of the indexes but all of a sudden am getting this error:
ERROR: section 'MyIndex_0' (type='index') already exists in /etc/sphinxsearch/sphinx.conf line 17863 col 19
Yet I did not touch the sphinx.conf file.
Update: As a test I reverted to an old version of the index file I am trying to rotate, and I still get the error. So it is not caused by the changes I made to this file (once in a while when making changes I do get an error, which is always due to some typo or another).
Can some file have gotten corrupted?
I did stop and start sphinx to no avail.
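One quick check (a generic shell sketch, not from the thread; the config path and section name come straight from the error message, so adjust to match yours) is to look for the section being defined twice, e.g. via two overlapping included files:

```shell
# Path and section name taken from the error message -- adjust as needed.
CONF=/etc/sphinxsearch/sphinx.conf

# Show every definition of the offending section, with line numbers:
grep -nE "^index[[:space:]]+MyIndex_0" "$CONF"

# List every index/source section name that appears more than once:
grep -E "^(index|source)[[:space:]]" "$CONF" | sort | uniq -d
```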

Can I debug a PostgreSQL query sent from an external source, that I can't edit?

I see how to debug queries stored as Functions in the database. But my problem is with an external QGIS plugin that connects to my Postgres 10.4 via network and does a complex query and calculations, and stores the results back into PostGIS tables:
FOR r IN c LOOP
  SELECT
    (1 - ST_LineLocatePoint(path.geom, ST_Intersection(r.geom, path.geom))) * ST_Length(path.geom)
  INTO
    station
(continues ...)
When it errors, it just reports that line number as the failing location, with no clue where it was in the loop through hundreds of features. (And any features it has already processed are not stored to the output tables when it fails.) I don't know enough about the plugin or about SQL to hack the external query, and I suspect that if it were a reasonable task the plugin author would have included more revealing debug messages.
So is there some way I could use pgAdmin4 (or anything) from the server side to watch the query process? Even being able to see if it fails the first time through the loop or later would help immensely. Knowing the loop count at failure would point me to the exact problem feature. Being able to see "station" or "r.geom" would make it even easier.
Perfectly fine if the process is miserably slow or interferes with other queries, I'm the only user on this server.
This is not actually a way to watch the RiverGIS query in action, but it is the best I have found. It extracts the failing ST_Intersects() call from the RiverGIS code and runs it under your control, where you can display any clues you want.
When you're totally mystified where the RiverGIS problem might be, run this SQL query:
SELECT
xs."XsecID" AS "XsecID",
xs."ReachID" AS "ReachID",
xs."Station" AS "Station",
xs."RiverCode" AS "RiverCode",
xs."ReachCode" AS "ReachCode",
ST_Intersection(xs.geom, riv.geom) AS "Fraction"
FROM
"<your project name>"."StreamCenterlines" AS riv,
"<your project name>"."XSCutLines" AS xs
WHERE
ST_Intersects(xs.geom, riv.geom)
ORDER BY xs."ReachID" ASC, xs."Station" DESC
Obviously replace <your project name> with the QGIS project name.
Also works for the BankLines step if you replace "StreamCenterlines" with "BankLines". Probably could be adapted to other situations where ST_Intersects() fails without a clue.
You'll get a listing with shorter geometry strings for good cross sections and double-length strings for bad ones. Probably need to widen your display column a lot to see this.
Works for me in pgAdmin4, or in QGIS3 -> Database -> DB Manager -> (click the wrench icon). You could select only the bad lines, but I find the background info helpful.
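To select only the bad lines, a variant of the query above (same schema-name assumptions) can filter on the intersection type: a cross section that crosses the centerline more than once yields a MULTIPOINT rather than a POINT, which is what produces those double-length geometry strings:

```sql
-- Sketch: keep only the suspect rows, where the intersection is not a
-- single point (e.g. ST_MultiPoint from a double crossing).
SELECT
    xs."XsecID",
    xs."Station",
    ST_GeometryType(ST_Intersection(xs.geom, riv.geom)) AS itype
FROM
    "<your project name>"."StreamCenterlines" AS riv,
    "<your project name>"."XSCutLines" AS xs
WHERE
    ST_Intersects(xs.geom, riv.geom)
    AND ST_GeometryType(ST_Intersection(xs.geom, riv.geom)) <> 'ST_Point'
ORDER BY xs."ReachID" ASC, xs."Station" DESC;
```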

Using each plugin in Nutch separately

I'm using the extractor plugin with Nutch-1.15. The plugin makes use of parsed data.
The plugin works fine when used as a whole. The problem arises when a few changes are made to the custom-extractors.xml file.
The entire crawling process needs to be restarted even if there is a small change in the custom-extractors.xml file.
Is there a way that single plugin can be used separately on parsed data?
Since this plugin is a Parser filter, it must be used as part of the Parse step, and is not stand-alone.
However, there are a number of things you can do.
If you are looking to change the configuration on the fly (only affecting newly parsed documents), you can use the extractor.file property to specify any location on HDFS and replace this file as needed; it will be read by each task.
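For example (a sketch; the HDFS path is an assumption, and extractor.file is the property named above), in nutch-site.xml:

```xml
<!-- Point extractor.file at a replaceable HDFS location; swapping the
     file there changes the rules for subsequently parsed documents. -->
<property>
  <name>extractor.file</name>
  <value>hdfs://namenode:8020/nutch/conf/custom-extractors.xml</value>
</property>
```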
If you want to reapply the changes to previously parsed documents, the answer depends on the specifics of your crawl, but you may be able to run the parse step again using nutch parse on old segments (you will need to delete the existing parse folders in the segments).
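The re-parse route might look like this (a sketch; the segment path is an assumption, and the parse output directories are removed first per the note above about deleting existing parse folders):

```shell
# Segment to re-parse with the updated custom-extractors.xml (adjust path).
SEGMENT=crawl/segments/20190101000000

# Remove the old parse output so the segment can be parsed again.
hadoop fs -rm -r "$SEGMENT/crawl_parse" "$SEGMENT/parse_data" "$SEGMENT/parse_text"

# Re-run only the parse step on that segment.
bin/nutch parse "$SEGMENT"
```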

Jmeter - Can I change a variable halfway through runtime?

I am fairly new to JMeter so please bear with me.
I need to understand whether, whilst running a JMeter script, I can change the variable holding the details of "DB1" so it then points at "DB2".
Reason for this is I want to throw load at a MongoDB and then switch to another DB at a certain time. (hotdb/colddb)
The easiest way is just defining 2 MongoDB Source Config elements pointing to separate database instances and giving them 2 different MongoDB Source names.
Then in your script you will be able to manipulate the MongoDB Source parameter value in the MongoDB Script test element or in JSR223 Samplers, so your queries will hit either hotdb or colddb.
See the How to Load Test MongoDB with JMeter article for detailed information.
How about reading the value from a file in a Beanshell/JavaScript sampler each iteration and storing it in a variable, then editing/saving the file when you want to switch? It's ugly, but it would work.

Solr AutoCommit not working with Postgresql

I am using Solr 4.10.0 with PostgreSQL 9.3. I am able to configure my Solr core properly using data-config.xml and search across the different database tables. However, I am not able to set up the autoCommit feature. Whenever any row gets added to a table, I expect it to start appearing in the results after the maxTime (1 minute), but that doesn't happen. I have to explicitly rebuild the index by doing a full data-import, and then everything works fine.
My solrconfig.xml is:
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>
Is there something extra that needs to be done to use autoCommit here? I checked my log files as well, but there is no error / exception. What am I missing?
Please see the link below:
SOLR: What does an autoSoftCommit maxtime of -1 mean?
I think this is what is happening in your case.
First off, you can see the expression ${solr.autoSoftCommit.maxTime:-1} within the tag. This allows you to make use of Solr's variable substitution; that feature is described in detail in the reference guide. If the variable has not been substituted by any of those means, -1 is taken as the value for that configuration.
Setting that autoSoftCommit maxTime to -1 effectively turns soft autocommit off.
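Following that reasoning, one fix (a sketch; the 60-second value simply mirrors your autoCommit maxTime) is to give autoSoftCommit an explicit interval instead of letting it default to -1, either by editing solrconfig.xml or by supplying the substitution variable at startup (e.g. -Dsolr.autoSoftCommit.maxTime=60000):

```xml
<!-- solrconfig.xml sketch: an explicit soft-commit interval instead of
     the -1 fallback, so new documents become searchable within ~60s. -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
```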