Can Sphinx query be tested command line w/o an index? - command-line

If I just want to test that my query will work vs a term inside a sentence "and am selling an Iphone 3gs", is it possible to use command line to test this? This way I don't need to keep adding to and rotating an index but can simply tweak my query and the data I'd plan on storing. Mainly I am trying to tweak various query parameters like SENTENCE and PROXIMITY vs wordforms/stopwords/ignore_char and would like to be able to work fast and test different query structures vs test words/patterns.

In theory can use BuildExcerpts, to run an arbitrary query, against a block of text (make sure use query_mode=true).
http://sphinxsearch.com/docs/current.html#api-func-buildexcerpts
But even then not totally sure it will completely honour the query, not sure SENTENCE etc will truely work.
... but if you wanting to play with ignore_char etc, you are going to be modifying the index config file. So surely just quickly running indexer to rebuild the index, to see the results is not htat difficult.

Related

Compare tables to ensure non regression in postgresql

Here is my issue: I often need to compare the same postgresql tables (or views that depend on it) between some ETL code refactoring to check for non regressions in my developments.
Let's say I have an ETL code I want to refactor, which regularly uploads data in a table. Currently, once my modifs are done, I often download my data from postgresql as a .csv file as a first step, then empty it, fill it again using my refactored code, and download the data again. Then, I compare the .csv files using for instance Python in a Jupyter Notebook.
That does not seem like the way to go at all. That notably supposes I am the only one to use that table during the operation, and so many other things I can't list them all here.
Is there a better way to go ?
It sounds to me like you have the correct approach. There's no magic to the CSV export operation: whatever tool you use runs a query and formats its resultset into the file. Any other before-and-after comparison operation would have to run the same query.
If you're doing this sort of regression test on an active database, it's probably wise to put some sort of distinctive tag on your test records, maybe prepend ETLTEST- to your customer names, so it's ETLTEST-John Bull. Then you can make your queries handle only your test records. And make sure you do something reliable for ORDER BY.
Juptyer seems a complex way to diff your csv files. Most operating systems have lightweight fast difftools.

Time of first occurrence of the query from pg_stat_statements. Possible to get?

I'm debugging a DB performance issue. There's a lead that the issue was introduced after a certain deploy, e.g. when DB started to serve some new queries.
I'm looking to correlate deployment time with the performance issues, and would like to identify the queries that are causing this.
Using pg_stat_statements has been very handy so far. Unfortunately it does not store the time stamp of the first occurrence of each query.
Is there any auxiliary tables I could look into to see the time of first occurrence of queries?
Ideally, if this information would have been available in pg_stat_statements, I'd make a query like this:
select queryid from where date(first_run) = '2020-04-01';
Additionally, it'd be cool to see last_run as well, so to filter out some old queries that no longer execute at all, but remain in pg_stat_statements. That's more of a nice thing that's a necessity though.
This information is not stored anywhere, and indeed it would not be very useful. If the problem statement is a new one, you can easily identify it in your application code. If it is not a new statement, but something made the query slower, the first time the query was executed won't help you.
Is your source code not under version control?

Unit tests producing different results when using PostgreSQL

Been working on a module that is working pretty well when using MySQL, but when I try and run the unit tests I get an error when testing under PostgreSQL (using Travis).
The module itself is here: https://github.com/silvercommerce/taxable-currency
An example failed build is here: https://travis-ci.org/silvercommerce/taxable-currency/jobs/546838724
I don't have a huge amount of experience using PostgreSQL, but I am not really sure why this might be happening? The only thing I could think that might cause this is that I am trying to manually set the ID's in my fixtures file and maybe PostgreSQL not support this?
If this is not the case, does anyone have an idea what might be causing this issue?
Edit: I have looked again into this and the errors appear to be because of this assertion, which should be finding the Tax Rate vat but instead finds the Tax Rate reduced
I am guessing there is an issue in my logic that is causing the incorrect rate to be returned, though I am unsure why...
In the end it appears that Postgres has different default sorting to MySQL (https://www.postgresql.org/docs/9.1/queries-order.html). The line of interest is:
The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on
In the end I didn't really need to test a list with multiple items, so instead I just removed the additional items.
If you are working on something that needs to support MySQL and Postgres though, you might need to consider defining a consistent sort order as part of your query.

Performance of repeatedly executing the same javascript on MongoDb?

I'm looking at using some JavaScript in a MongoDb query. I have a couple of choices:
db.system.js.save the function in the db then execute it
db.myCollection.find with a $where clause and send the JS each time
exec_js in MongoEngine (which I imagine uses one of the above)
I plan to use the JavaScript in a regularly used query that's executed as part of a request to a site or API (i.e. not a batch administrative jobs) so it's important that the query executes with reasonable speed.
I'm looking at a 30ish line function.
Is the Javascript interpreted fresh each time? Will the performance be ok? Is it a sensible basis upon which to build queries?
Is the Javascript interpreted fresh each time?
Pretty much. MongoDB only has one "javascript instance" per running instance of MongoDB. You'll notice this if you try to run two different Map/Reduces at the same time.
Will the performance be ok?
Obviously, there are different definitions of "OK" here. The $where clause can not use indexes. You can combine that clause with another indexed query. In either case each object will need to be pushed from BSON over to the Javascript run-time and then acted on inside the run-time.
The process is definitely not what you would call "performant". Of course, by that measure Map/Reduce is also not very performant and people use that on production systems.
Is it a sensible basis upon which to build queries?
The real barrier here isn't the number of lines in the code, it's the number of possible documents this code will interpret. Even though it's "server-side" javascript, it's still a bunch of work that the server has to do. (in one thread, in an interpreted environment)
If you can test it and scope it correctly, it may well work out. Just don't expect miracles.
What is your point here? Write a JS script and call it regularly through cron. What should be the problem with that?

Full Text Searching in Apple's Core Data Framework

I would like to implement a full text search in an iPhone application. I have data stored in an sqlite database that I access via the Core Data framework. Just using predicates and ORing a bunch of "contains[cd]" phrases for every search word and column does not work well at all.
What have you done that seems to work well?
We have FTS3 working very nicely on 150,000+ records. We are getting subsecond query times returning over 200 results on a single keyword query.
Presently the only way to get Sqlite FTS3 working on the iPhone is to compile your own binary and link it to your project. To my knowledge, the binary included in your own project will not work with Core Data. Perhaps Apple will turn on the FTS3 compiler option in a future release?
You can still link in your own Sqlite FTS3 binary and use it just for full text searches. This would be very similar to the way Sphinx or Lucene are used in Web App environments. Note you will still have to update the search index at some point to keep synchronicity with the Core Data stores.
Good luck !!
I assume that by "does not work well" you mean 'performs badly'. Full-text search is always relatively slow, especially in memory or space constrained environments. You may be able to speed things up by making sure the attributes you're searching against are indexed and using BEGINSWITH[cd] instead of CONTAINS[cd]. My recollection (can't find the cocoa-dev post at this time) is that SQLite will use the index for prefix matching, but falls back to linear search for infix searches.
I use contains[cd] in my predicate and this works fine. Perhaps you could post your predicate and we could see if there's an obvious fault.
Sqlite has its own full text indexing module: http://sqlite.org/fts3.html
You have to have full control of the SQL you send to the db (I don't know how Core Data works), but using the full text indexing module is key to speed of execution and simplicity in your SQL SELECT statements that do full text searching.
Using CONTAINS is fine if you don't need fast execution, but selects made with it can't make use of regular indexes so are destined to be slow, and the larger the database the slower it will be. Using real full text indexing allows same sort of searches as you can do with 'CONTAINS', but things are indexed for fast results even with large db's.
I've been working on this same problem and just got around to following up on my post about this from a few weeks ago. Instead of using CONTAINS, I created a separate entity with an instance for each canonicalized word. I added an index on the words (in XCode model builder) and can then use a BEGINSWITH operator to exploit the index. Nevertheless, as I just posted a few minutes ago, query time is still very slow for even small data sets.
There must be a better way! After all, we see this sort of full text search in lots of apps!