Apparently max_matches is deprecated in the current Sphinx, so it has to be commented out in the config file. I am using SphinxQL in a MySQL editor.
When I use LIMIT 10000 I still only get 1,000 rows.
When I use OPTION max_matches = 20000 I get 20 rows.
When I use them both together, LIMIT 20000 OPTION max_matches = 20000, I get 20,000. Do I have to use both together, and if so, does that affect performance?
It's only the 'server' variable max_matches that is deprecated. It is still available as a run-time (per-query) option.
Set it as big as you need it. No bigger.
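For example, a minimal SphinxQL sketch (the index name and keyword are placeholders) that raises both the per-query result window and the returned page together:

SELECT * FROM myindex WHERE MATCH('keyword')
LIMIT 0, 20000
OPTION max_matches=20000;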
I am trying to create a filter for a field that contains over 5000 unique values. However, the filter's query is automatically setting a limit of 1000 rows, meaning that the majority of the values do not get displayed in the filter dropdown.
I updated the config.py file inside the 'anaconda3/lib/python3.7/site-packages' directory by increasing DEFAULT_SQLLAB_LIMIT and QUERY_SEARCH_LIMIT to 6000; however, this did not work.
Is there any other config that I need to update?
P.S. The snippet below shows the JSON representation of the filter query where the issue seems to be coming from.
"query": "SELECT casenumber AS casenumber\nFROM pa_permits_2019\nGROUP BY casenumber\nORDER BY COUNT(*) DESC\nLIMIT 1000\nOFFSET 0"
After using the grep command to find all files containing the text '1000', I found out that the filter limit can be configured through the filter_row_limit in viz.py.
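Once that limit is raised (to 6000, say), the generated filter query shown above should pick up the new cap, i.e. something like:

SELECT casenumber AS casenumber
FROM pa_permits_2019
GROUP BY casenumber
ORDER BY COUNT(*) DESC
LIMIT 6000
OFFSET 0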
I am running into an issue where, if I have more than 15 LIKE operators within a CASE statement, I get a java.lang.StackOverflowError.
Here is an example of what I am doing against a table with 60 million rows:
SELECT
CASE WHEN field LIKE '%value%' THEN 'result'
WHEN field LIKE '%value2%' THEN 'result2'
.... 14 more of those
END
I haven't seen this limitation documented anywhere. Any ideas how to get around this?
It sounds like it's an out-of-memory error.
I think you have some options:
use an intermediate table before doing the LIKE processing, or use intermediate tables to process subsets of your initial data (see the sketch after this list)
bump up the number of queue slots that you're using for this query to have more memory available https://docs.aws.amazon.com/redshift/latest/dg/r_wlm_query_slot_count.html
take a look at the explain output to see if it gives you clues about what's going wrong
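For the intermediate-table option, a rough sketch (table and column names here are placeholders for your 60-million-row table):

CREATE TEMP TABLE matched_subset AS
SELECT id, field                    -- keep only the columns you need
FROM big_table
WHERE field LIKE '%value%'
   OR field LIKE '%value2%';        -- ...plus the remaining patterns

SELECT CASE WHEN field LIKE '%value%'  THEN 'result'
            WHEN field LIKE '%value2%' THEN 'result2'
            -- ...remaining branches
       END AS mapped_result
FROM matched_subset;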
You could create a scalar Python user-defined function (UDF) to replace the LIKE comparisons.
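For instance, a rough sketch of such a UDF (the function name f_myfunc, the patterns, and the labels are placeholders; it mirrors the first-match-wins order of the original CASE expression):

CREATE OR REPLACE FUNCTION f_myfunc (field VARCHAR)
RETURNS VARCHAR
IMMUTABLE
AS $$
    # Same first-match-wins logic as the CASE expression
    if field is None:
        return None
    if 'value' in field:
        return 'result'
    if 'value2' in field:
        return 'result2'
    # ...remaining patterns
    return None
$$ LANGUAGE plpythonu;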
Then, just use:
SELECT f_myfunc(field)
This turned out to be a driver issue. I was initially using 1.2.16.1027 and upgraded to 1.2.20.1043 and I am no longer receiving the error.
I have a simple table:
id: primary
name: varchar fulltext index
Here is my Sphinx config:
https://justpaste.it/1okop
The indexer warns about docinfo:
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'words'...
WARNING: Attribute count is 0: switching to none docinfo
collected 10000 docs, 0.1 MB
sorted 0.0 Mhits, 100.0% done
total 10000 docs, 79566 bytes
total 0.065 sec, 1210829 bytes/sec, 152179.20 docs/sec
total 3 reads, 0.000 sec, 94.6 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 47.5 kb/call avg, 0.0 msec/call avg
But it's said here (Sphinx: WARNING: Attribute count is 0: switching to none docinfo) that it's nothing serious.
OK, starting the service and searching for part of a word:
SELECT *
FROM test_words
where match (name) AGAINST ('lema')
No rows.
The same with
SELECT *
FROM test_words
where match (name) AGAINST ('*lema*')
No rows.
And at the same time there are results for the query
SELECT *
FROM `test_words`
where position('lema' in name)>0
So, as far as I can see, Sphinx is not searching by part of a word.
Why and how to fix it?
And if I uncomment
min_infix_len = 3
infix_fields = name
I get
WARNING: index 'words': prefix_fields and infix_fields has no effect with dict=keywords, ignoring
And one more thing: SHOW ENGINES shows no Sphinx engine. Is that normal now? The mysql service was restarted.
All SQL queries were run through Adminer, logged in to localhost:3312.
There are no functions like position() and no 'match(name) against (...)' syntax in Sphinx. According to your config your index is 'words', while your requests go against 'test_words', which is the source table you build the index from. So it seems to me you're connecting not to Sphinx, but to MySQL.
If you're looking for closer integration between MySQL and Sphinx, try SphinxSE (http://sphinxsearch.com/docs/devel.html#sphinxse); it will then show up in SHOW ENGINES. If you don't want to deal with compiling MySQL with SphinxSE enabled, you might want to try Manticore Search (a fork of Sphinx), since it integrates with the FEDERATED engine (https://docs.manticoresearch.com/2.6.4/html/federated_storage_engine.html), which is compiled into MySQL by default; you just need to start it properly to enable it.
If you want to use Sphinx the traditional way, just make sure you connect to it, not to MySQL.
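For example, connect the mysql client to searchd's SphinxQL listener (port 9306 by default, or whatever the 'listen = ...:mysql41' line in your config says) and query the index itself rather than the source table:

SELECT * FROM words WHERE MATCH('lema') LIMIT 10;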
In Sphinx this is called the extended query syntax; you specify the field with @field. Try this:
SELECT * FROM test_words
WHERE MATCH('@name lema')
http://sphinxsearch.com/docs/current.html#extended-syntax
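Also, for the part-of-word matching you were attempting, the index needs min_infix_len set (with dict=keywords you can drop infix_fields entirely; wildcards then work on all fields). After rebuilding the index and querying the actual index ('words' per your config), something like this should match:

SELECT * FROM words WHERE MATCH('@name *lema*');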
So I have a complex, almost 200-line-long stored procedure in PostgreSQL that I would like to analyze quickly, but unfortunately pgAdmin's built-in explain analyze does not handle the nested statements and does not let me look under the hood, so I updated my postgresql.conf file with the following:
auto_explain.log_analyze = true
auto_explain.log_timing = true
auto_explain.log_verbose = true
auto_explain.log_min_duration = '0ms'
auto_explain.log_nested_statements = true
auto_explain.log_buffers = true
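(For the auto_explain.* settings to take effect, the module itself also has to be loaded, e.g.:)

session_preload_libraries = 'auto_explain'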
So I can see the detailed logs in my pg_log folder, but it generates an almost 300-line-long log and it's not easy to analyze.
Is there a better, more elegant way to do this? Maybe there is a UI tool for it on Windows?
While explain.depesz.com is very useful, you can also analyze your procedure with https://github.com/bigsql/plprofiler. You can combine both tools.
As @a_horse_with_no_name suggested in the comments, the explain.depesz.com site is very useful. You just have to copy-paste your EXPLAIN ANALYZE plan and look at the output. You can click on the column headers to tell it which parameter is most important to you: exclusive node time, inclusive node time, or rowcount mis-estimate.
User.find(:all, :order => "RANDOM()", :limit => 10) was the way I did it in Rails 3.
User.all(:order => "RANDOM()", :limit => 10) is how I thought Rails 4 would do it, but this is still giving me a Deprecation warning:
DEPRECATION WARNING: Relation#all is deprecated. If you want to eager-load a relation, you can call #load (e.g. `Post.where(published: true).load`). If you want to get an array of records from a relation, you can call #to_a (e.g. `Post.where(published: true).to_a`).
You'll want to use the order and limit methods instead. You can get rid of the all.
For PostgreSQL and SQLite:
User.order("RANDOM()").limit(10)
Or for MySQL:
User.order("RAND()").limit(10)
As the random function differs between databases, I would recommend using the following code:
User.offset(rand(User.count)).first
Of course, this is useful only if you're looking for only one record.
If you want to get more than one, you could do something like:
User.offset(rand(User.count) - 10).limit(10)
The - 10 is to ensure you get 10 records in case rand returns a number greater than count - 10.
Keep in mind you'll always get 10 consecutive records.
I think the best solution is really ordering randomly in database.
But if you need to avoid the database-specific random function, you can use the pluck-and-shuffle approach.
For one record:
User.find(User.pluck(:id).shuffle.first)
For more than one record:
User.where(id: User.pluck(:id).sample(10))
I would suggest making this a scope as you can then chain it:
class User < ActiveRecord::Base
scope :random, -> { order(Arel::Nodes::NamedFunction.new('RANDOM', [])) }
end
User.random.limit(10)
User.active.random.limit(10)
While not the fastest solution, I like the brevity of:
User.ids.sample(10)
The .ids method yields an array of User IDs and .sample(10) picks 10 random values from this array.
I strongly recommend this gem for random records; it is specially designed for tables with lots of data rows:
https://github.com/haopingfan/quick_random_records
All other answers perform badly with a large database, except this gem:
quick_random_records costs only 4.6 ms in total.
The accepted answer's User.order('RAND()').limit(10) costs 733.0 ms.
The offset approach costs 245.4 ms in total.
The User.all.sample(10) approach costs 573.4 ms.
Note: my table only has 120,000 users. The more records you have, the bigger the performance difference will be.
UPDATE:
Performance on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) costs 1384.0 ms.
The quick_random_records gem costs only 6.4 ms in total.
For MySQL this worked for me:
User.order("RAND()").limit(10)
You could call .sample on the records, like: User.all.sample(10)
The answer of @maurimiranda, User.offset(rand(User.count)).first, is not good when we need 10 random records, because User.offset(rand(User.count) - 10).limit(10) returns a sequence of 10 records starting from a random position; they are not totally random. We would need to call that function 10 times to get 10 truly random records.
Besides that, offset is also not good if the random function returns a high value. If your query looks like offset: 10000 and limit: 20, it generates 10,020 rows and throws away the first 10,000 of them, which is very expensive. So calling offset/limit 10 times is not efficient.
So I think that if we just want to get one random user, User.offset(rand(User.count)).first may be better (at least we can improve it by caching User.count).
But if we want 10 random users or more, then User.order("RAND()").limit(10) should be better.
Here's a quick solution. I'm currently using it with over 1.5 million records and getting decent performance. The best solution would be to cache one or more random record sets and then refresh them with a background worker at a desired interval.
I created a random_records_helper.rb file:
module RandomRecordsHelper
  # Returns n random ids; assumes user ids run roughly contiguously from 1 to
  # User.count (gaps from deleted rows mean some ids may not match a record).
  def random_user_ids(n)
    user_ids = []
    user_count = User.count
    n.times { user_ids << rand(1..user_count) }
    user_ids
  end
end
In the controller:
@users = User.where(id: random_user_ids(10))
This is much quicker than the .order("RANDOM()").limit(10) method: I went from a 13-second load time down to 500 ms.