I need your help to solve an issue in Thinking Sphinx.
I am using 'sphinx-2.0.5-win32' and the following gems:
gem 'thinking-sphinx', '2.0.13' and gem 'riddle', '1.5.3'
sphinx.yml contains:
development:
  min_infix_len: 3
  charset_table: "0..9, A..Z->a..z, _, a..z, -, U+410..U+42F->U+430..U+44F, U+430..U+44F, ., %, #, #, &, *, $"
  binlog_path: '#'
My model file:
class Rm
  define_index do
    set_property :delta => true
    indexes :code, :as => :rm_code, :sortable => true
    has id
  end
end
I am searching like this:
Rm.search Riddle.escape('"rm0001"'), :page => params[:page], :per_page => 25, :match_mode => :extended
This returns 2 results:
Code
rm0001
rm0001N
I want only 'rm0001' in the search results.
Please help me.
Thanks in advance.
Praveen
You've set min_infix_len, which enables partial-word matches.
But you have NOT set enable_star=true, which means that queries are automatically treated as partial-word matches.
So either remove min_infix_len to disable partial-word matches, OR set enable_star so that partial-word matches only happen when you include * at the start/end of words.
I have a large array like the one below:
$data= [
'user_name' => 's',
'user_place' => 'a',
'address_list_code' => 's',
'block_number' => 3,
];
I want to convert the key strings to all uppercase. I know how to convert selected text to uppercase using the VS Code shortcut Ctrl+Alt+U, and it works.
But I want to select only the keys between single quotes and make them uppercase, so the expected output is:
[
'USER_NAME' => 's',
'USER_PLACE' => 'a',
'ADDRESS_LIST_CODE' => 's',
'BLOCK_NUMBER' => 3,
];
I even tried this extension, but did not succeed in selecting all the text between single quotes:
https://marketplace.visualstudio.com/items?itemName=dbankier.vscode-quick-select&ssr=false#version-history
Check out this fiddle : https://jsfiddle.net/m82ycfxw/
I wrote some plain JS code to convert the desired text to uppercase.
This might be a temporary thing but I hope this will work for you.
let str = `
$data= [
'user_name' => 's',
'user_place' => 'a',
'address_list_code' => 's',
'block_number' => 3,
]
`;
// Match each single-quoted lowercase key together with the " =>" that follows it.
const regex = /'[a-z_]+' =>/g;
str = str.replace(regex, foundText => {
  return foundText.toUpperCase();
});
console.log(str);
Just change the str variable: put your complete data object inside backticks (` `) and run the code.
The extension Select By could help.
Place cursors at the start of the lines where you want to uppercase text
with MoveBy: Move cursors based on regex, move to the next '
with SelectBy: Mark positions of cursors
with MoveBy: Move cursors based on regex, move to the next '
with SelectBy: Mark positions of cursors (create selections)
execute: Transform to Uppercase
press Esc to exit multi-cursor mode
Sure, you can do this natively:
In settings.json, temporarily remove _ from the word separators (e.g. "editor.wordSeparators": "!##$%^&*()-=+[{]}\\|;:'\",.<>/?") (notice no _).
Place cursors at beginnings of all desired lines.
Right arrow to move all cursors within the quotes.
Execute command expand selection. Since you turned off _ as a word delimiter, this will expand to fill the quotes; otherwise, all the keys would need to have the same number of words for this to work.
Execute upper case.
In settings.json re-add _ to word separators.
Easy with Find and Replace. See regex101 demo
Find: (^\s+')([^']*)'
Replace: $1\U$2'
The \U will uppercase the following capture group $2.
Starting the find at the beginning of the line with ^ makes it easy to target just the "keys" (the first '-delimited strings) and not the strings that follow.
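For anyone who would rather script this than use the editor's Find and Replace, the same pattern can be sketched in JavaScript. This is only an illustration under a few assumptions: JavaScript's replace has no \U, so a callback uppercases the second capture group instead; the m flag is added so that ^ matches at the start of every line; and the helper name uppercaseKeys and the sample text are made up for the example. Like the original Find pattern, it expects the key lines to be indented.

```javascript
// Hypothetical helper: uppercase the first single-quoted string on each indented line.
function uppercaseKeys(text) {
  // Same pattern as the Find field; "g" and "m" let ^ match at every line start.
  const keyPattern = /(^\s+')([^']*)'/gm;
  // JavaScript has no \U, so the callback uppercases capture group 2 itself.
  return text.replace(keyPattern, (match, prefix, key) => prefix + key.toUpperCase() + "'");
}

// Sample input (assumed indentation of four spaces).
const sample = [
  "$data= [",
  "    'user_name' => 's',",
  "    'block_number' => 3,",
  "];",
].join("\n");

console.log(uppercaseKeys(sample));
```

Only the first quoted string on each line is touched, because a second match on the same line would again have to start at ^.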
I am trying to parse all the files and check whether any file's content contains the strings TESTDIR or TEST_DIR.
File contents might look something like:
TESTDIR = foo
include $(TESTDIR)/chop.mk
...
TEST_DIR := goldimage
MAKE_TESTDIR = var_make
NEW_TEST_DIR = tesing_var
Actually, I am only interested in TESTDIR, $(TESTDIR), and TEST_DIR, so in my case the last two lines should be ignored. I am new to Perl. Can anyone help me out with the regex?
/\bTEST_?DIR\b/
\b means a "word boundary", i.e. the place between a word character and a non-word character. "Word" here has the Perl meaning: word characters are letters, digits, and underscores.
_? means "nothing or an underscore".
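To see why the underscore matters, here is a small sketch that runs the pattern over the lines from the question. It is written in JavaScript rather than Perl purely for illustration; for ASCII input like this, \b behaves the same way in both. The variable names are made up for the example.

```javascript
// Lines taken from the question's sample file.
const lines = [
  "TESTDIR = foo",
  "include $(TESTDIR)/chop.mk",
  "TEST_DIR := goldimage",
  "MAKE_TESTDIR = var_make",
  "NEW_TEST_DIR = tesing_var",
];

// \b needs a word/non-word transition, and "_" counts as a word character,
// so "MAKE_TESTDIR" and "NEW_TEST_DIR" have no boundary right before "TEST".
const testDirPattern = /\bTEST_?DIR\b/;

const matching = lines.filter((line) => testDirPattern.test(line));
console.log(matching);
```

The first three lines are kept and the last two are filtered out.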
Look at character sets (character classes).
Only a space allowed on either side:
/^(.* )?TEST_?DIR /
^ is the beginning of the line.
(.* )? means there may be some content (.*) before the name, but if there is, it must be followed by a space.
The space at the end says that whitespace must follow. Otherwise use ( .*)?$ at the end.
One of a given character set is allowed:
If characters other than a space should be possible, you can use a character class []:
/^(.*[ \t(])?TEST_?DIR[) :=]/
(.*[ \t(])? means that in front of TEST_?DIR there may be a space, a \t (tab), or a (, or nothing if the line starts with the name itself.
Afterwards there must be one of space, :, =, or ), followed by anything (the "=" of ":=" belongs to that "anything").
One of a given group is allowed:
For this you need groups within (), with each possible group separated by a |:
/^(.*( |\t))?TEST_?DIR( | := | = )/
In this case, the beginning is equivalent to [ \t], because each alternative holds only one character (a space or \t).
At the end, there must be a single space, := (':=' surrounded by spaces), or = ('=' surrounded by spaces), followed by anything.
You can use any combination:
/^(.*[ \t(])?TEST_?DIR([) =:]| :=| =|)/
Test it on Debuggex.com. (Use 'PCRE')
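As a rough comparison, the sketch below runs the word-boundary pattern /\bTEST_?DIR\b/ and the character-class pattern from this answer over a few lines. Two of the lines ("foo-TESTDIR = 1" and "echo TESTDIR") are invented edge cases, not from the question, and JavaScript is used only for illustration; both patterns use syntax that works the same in Perl.

```javascript
// Word-boundary variant versus the stricter character-class variant.
const boundaryPattern = /\bTEST_?DIR\b/;
const classPattern = /^(.*[ \t(])?TEST_?DIR[) :=]/;

// The last two lines are hypothetical edge cases added for this comparison.
const edgeCases = [
  "include $(TESTDIR)/chop.mk",
  "MAKE_TESTDIR = var_make",
  "foo-TESTDIR = 1",
  "echo TESTDIR",
];

const comparison = edgeCases.map((line) => ({
  line,
  boundary: boundaryPattern.test(line),
  charClass: classPattern.test(line),
}));

console.table(comparison);
```

Both patterns accept the first line and reject the second. They differ on the last two: the character-class pattern rejects a name preceded by "-" and a name at the very end of the line, because it only allows the listed characters before and requires one of them after.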
There is a table, which contains site URLs.
I want to sort Sphinx results this way: "the closer the keyword is to the beginning of the string, the more relevant".
"foobar.com, barfoo.com, barbarfoo.com" is the correct result order for the keyword "foo".
I have tried:
$s = new SphinxClient;
$s->setServer("localhost", 9312);
$s->SetMatchMode(SPH_MATCH_ALL);
$s->SetSortMode(SPH_SORT_RELEVANCE);
$s->SetFieldWeights(array(
'id' => 0,
'url' => 1000,
));
$result = $s->query("foo");
Unfortunately, I get results that are sorted by id.
Hmm, I don't think Sphinx can do that directly. There are various ranking factors, but they are all based on words.
You can match partial words using 'min_prefix_len', but you can't get where in the word the match happens in order to rank by it.
The only way I can think of to get this to work with Sphinx would be to use wordbreaker
http://sphinxsearch.com/blog/2013/01/29/a-new-tool-in-the-trunk-wordbreaker/
to index your domain names as separate words, hoping that your domains would be split correctly into "foo bar com", "bar foo com", "bar bar foo com", which could then be ranked by word position, e.g. min_hit_pos:
http://sphinxsearch.com/docs/current.html#field-factors
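If the matched set is small, another option is to leave Sphinx's ranking alone and re-order the returned URLs in application code. The sketch below is not a Sphinx feature; it only illustrates the ordering the question asks for. It is shown in JavaScript, the function name sortByKeywordPosition is hypothetical, and it assumes every URL in the list actually contains the keyword.

```javascript
// Hypothetical post-processing step: order already-matched URLs by where
// the keyword first occurs (earlier occurrence = more relevant).
function sortByKeywordPosition(urls, keyword) {
  // Copy first so the caller's array is not mutated.
  // Assumes every URL contains the keyword; indexOf returns -1 otherwise.
  return [...urls].sort((a, b) => a.indexOf(keyword) - b.indexOf(keyword));
}

const urls = ["barbarfoo.com", "foobar.com", "barfoo.com"];
console.log(sortByKeywordPosition(urls, "foo"));
```

For the keyword "foo" this yields foobar.com, barfoo.com, barbarfoo.com, the order given in the question.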
I'm trying to set up a grammar that requires that [\w] characters cannot appear directly adjacent to each other if they are not in the same lexeme. That is, words must be separated from each other by a space or punctuation.
Consider the following grammar:
use Marpa::R2; use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
:start ::= Rule
Rule ::= '9' 'september'
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');
This parses successfully. Now I want to change the grammar to force a separation between 9 and september. I thought of doing this by introducing an unused lexeme that matches [\w]+:
use Marpa::R2; use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
:start ::= Rule
Rule ::= '9' 'september'
:discard ~ whitespace
whitespace ~ [\s]+
word ~ [\w]+ ### <== Add unused lexeme to match joined keywords
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');
Unfortunately, this grammar fails with:
A lexeme is not accessible from the start symbol: word
Marpa::R2 exception at marpa.pl line 3.
Although this can be resolved by using a lexeme default statement:
use Marpa::R2; use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
lexeme default = action => [value] ### <== Fix exception by adding lexeme default statement
:start ::= Rule
Rule ::= '9' 'september'
:discard ~ whitespace
whitespace ~ [\s]+
word ~ [\w]+
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');
This results in the following output:
Inaccessible symbol: word
Error in SLIF parse: No lexemes accepted at line 1, column 1
* String before error:
* The error was at line 1, column 1, and at character 0x0039 '9', ...
* here: 9september
Marpa::R2 exception at marpa.pl line 16.
That is, the parse has failed due to the fact that there is no gap between 9 and september which is exactly what I want to happen. The only fly in the ointment is that there is an annoying Inaccessible symbol: word message on STDERR because the word lexeme is not used in the actual grammar.
I see that in Marpa::R2::Grammar I could have declared word as inaccessible_ok in the constructor options but I can't do that in Marpa::R2::Scanless.
I also could have done something like the following:
Rule ::= nine september
nine ~ word
september ~ word
then used a pause to use custom code to examine the actual lexeme value and return the appropriate lexeme depending on the value.
What is the best way to construct a grammar that uses keywords or numbers and words but will disallow adjacent lexemes to be run together without white space or punctuation separating them?
Well, the obvious solution is to require some whitespace in between (on the G1 level). When we use the following grammar
:default ::= action => ::array
:start ::= Rule
Rule ::= '9' (Ws) 'september'
Ws ::= [\s]+
:discard ~ whitespace
whitespace ~ [\s]+
then 9september fails, but 9 september is parsed. Important points to note:
Lexemes can be both discarded and required, when they are both a longest token. This is why the :discard and Ws rule don't interfere with each other. Marpa doesn't mind this kind of “ambiguity”.
The Ws rule is enclosed in parens, which discards the value – to keep the resulting parse tree clean.
You do not usually want to use tricks like phantom lexemes to misguide the parser. That way lies breakage.
When every bit of whitespace is important, you might want to get rid of :discard ~ whitespace. This is meant to be used e.g. for C-like languages where whitespace traditionally does not matter.
This may be a very simple task for many but I could not find anything appropriate for me.
I have a file name: filenm_A006.2011.269.10.47.G25_2010
I want to split it into its parts (separated by . and _) so I can use them separately. How can I do this with simple MATLAB commands?
Kind Regards,
Mushi
I recommend regexp:
fname = 'filenm_A006.2011.269.10.47.G25_2010';
parts = regexp(fname, '[^_.]+', 'match');
parts =
'filenm' 'A006' '2011' '269' '10' '47' 'G25' '2010'
You can now refer to parts{1} through parts{8} for the pieces. Explanation: the regexp pattern [^_.] means all characters not equal to _ or ., and the + means you want groups of at least 1 character. Then 'match' asks the regexp function to return a cell array of the strings of all the matches of that pattern. There are other regexp modes; for example, the indices of each piece of the file.
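The character-class pattern itself is not MATLAB-specific. As a cross-check only (this is JavaScript, not MATLAB code, and the variable names simply mirror the ones above), the same pattern splits the file name into the same eight pieces:

```javascript
const fname = "filenm_A006.2011.269.10.47.G25_2010";

// [^_.]+ matches each maximal run of characters that are neither "_" nor ".".
const parts = fname.match(/[^_.]+/g);

console.log(parts);
```

Note that JavaScript arrays are zero-based, so the pieces here are parts[0] through parts[7] rather than parts{1} through parts{8}.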
Use the command strsplit:
cellArrayOfParts = strsplit(fileName,{'.' '_'});
You can use strsplit to split it:
strsplit('filenm_A006.2011.269.10.47.G25_2010',{'_','.'})
ans =
'filenm' 'A006' '2011' '269' '10' '47' 'G25' '2010'
Another option is to use regexp, like Peter suggested.