Lucene Search doesn't find keyword indexed fields - zend-framework

I save my fields with this code:
class Places_Search_Document extends Zend_Search_Lucene_Document
{
    public function __construct($class, $key, $title, $contents, $summary, $createdBy, $dateCreated)
    {
        $this->addField(Zend_Search_Lucene_Field::Keyword('docRef', "$class:$key"));
        $this->addField(Zend_Search_Lucene_Field::UnIndexed('class', $class));
        $this->addField(Zend_Search_Lucene_Field::UnIndexed('key', $key));
        $this->addField(Zend_Search_Lucene_Field::Keyword('title', $title, 'UTF-8'));
        $this->addField(Zend_Search_Lucene_Field::unStored('contents', $contents, 'UTF-8'));
        $this->addField(Zend_Search_Lucene_Field::text('summary', $summary, 'UTF-8'));
        //$this->addField(Zend_Search_Lucene_Field::UnIndexed('createdBy', $createdBy));
        $this->addField(Zend_Search_Lucene_Field::Keyword('dateCreated', $dateCreated));
    }
}
I search for the word with this code:
$index = Places_Search_Lucene::open(SearchIndexer::getIndexDirectory());
$term = new Zend_Search_Lucene_Index_Term($q);
$query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$results = $index->find($query);
It works perfectly for the unStored and Text fields, but it doesn't find anything in the Keyword fields!

Are you sure you really want those fields to be keyword analyzed? The keyword analyzer puts the whole text of the field as one token, which you rarely want.
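To make the difference concrete, here is a toy model in Python. It is not Zend_Search_Lucene itself: the tokenizer, the sample title and the `index_field` / `wildcard_match` helpers are made up for illustration. The point it sketches is that a Keyword-style field contributes one term (the whole value), so a wildcard pattern has to match the complete string, while a Text-style field contributes one lowercased term per word.

```python
import fnmatch
import re


def index_field(value, tokenized):
    """Return the terms a field contributes to a toy index."""
    if tokenized:
        # Text-like field: one lowercased term per run of letters/digits.
        return [t.lower() for t in re.findall(r"[^\W_]+", value)]
    # Keyword-like field: the whole value is a single, unmodified term.
    return [value]


def wildcard_match(pattern, terms):
    """True if any term matches the wildcard pattern (case-sensitive)."""
    return any(fnmatch.fnmatchcase(term, pattern) for term in terms)


title = "Grand Hotel Vienna"
text_terms = index_field(title, tokenized=True)
keyword_terms = index_field(title, tokenized=False)

print(wildcard_match("hot*", text_terms))           # matches the term "hotel"
print(wildcard_match("hot*", keyword_terms))        # the only term starts with "Grand"
print(wildcard_match("Grand Hot*", keyword_terms))  # pattern covers the whole value
```

If the model holds for your index, a title stored as a Keyword can only be found by a pattern that starts at the first character of the title and uses the same case.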

How to correctly use the snippet() function?

My first Sphinx app almost works!
I successfully save path, title and content as attributes in the index.
But I decided to move from the API to SphinxQL via PDO.
I found a snippets() example (thanks to barryhunter again), but I don't see how to use it.
This is my working code, except for the snippets() part:
$conn = new PDO('mysql:host=ununtu;port=9306;charset=utf8', '', '');
if(isset($_GET['query']) and strlen($_GET['query']) > 1)
{
$query = $_GET['query'];
$sql= "SELECT * FROM `test1` WHERE MATCH('$query')";
foreach ($conn->query($sql) as $info) {
//snippet. don't works
$docs = array();
foreach () {
$docs[] = "'".mysql_real_escape_string(strip_tags($info['content']))."'";
}
$result = mysql_query("CALL SNIPPETS((".implode(',',$docs)."),'test1','" . mysql_real_escape_string($query) . "')",$conn);
$reply = array();
while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {
$reply[] = $row['snippet'];
}
// path, title out. works
$path = rawurlencode($info["path"]); $title = $info["title"];
$output = '<a href=' . $path . '>' . $title . '</a>'; $output = str_replace('%2F', '/', $output);
print( $output . "<br><br>");
}
}
I get this structure from the Sphinx index:
Array
(
[0] => Array
(
[id] => 244
[path] => DOC7000/zdorovie1.doc
[title] => zdorovie1.doc
[content] => Stuff content
I am a little bit confused by the array of docs.
Also, I don't see how to apply this advice: "So it should be MUCH more efficient to compile the documents and call buildExcerpts just once.
But even more interesting: as you are sourcing the text from a Sphinx attribute, you can use the SNIPPET() Sphinx function (in setSelect()!) in the main query. So you don't have to receive the full text just to send it back to Sphinx, i.e. Sphinx will fetch the text from the attribute internally. Even more efficient!"
Please tell me how I should change the code to call snippets() once for the docs array, but still output the path (link) and title for every doc.
Well, because your data comes from Sphinx, you can just use the SNIPPET() function (not CALL SNIPPETS()!):
$query = $conn->quote($_GET['query']);
$sql= "SELECT *,SNIPPET(content,$query) AS `snippet` FROM `test1` WHERE MATCH($query)";
foreach ($conn->query($sql) as $info) {
$path = rawurlencode($info["path"]); $title = $info["title"];
$output = '<a href=' . $path . '>' . $title . '</a>'; $output = str_replace('%2F', '/', $output);
print("$output<br>{$info['snippet']}<br><br>");
}
The highlighted text is right there in the main query; you don't need to mess around with bundling the data back up to send to Sphinx.
It also shows that you should be escaping the raw query from the user.
(The example you found does that because the full text comes from MySQL, not Sphinx, so it has no option but to send data back and forth!)
Just for completeness, if you REALLY want to use CALL SNIPPETS(), it would be something like:
<?php
$query =$conn->quote($_GET['query']);
//make query request
$sql= "SELECT * FROM `test1` WHERE MATCH($query)";
$result = $conn->query($sql);
$rows = $result->fetchAll(PDO::FETCH_ASSOC);
//build list of docs to send
$docs = array();
foreach ($rows as $info) {
$docs[] = $conn->quote(strip_tags($info['content']));
}
//make snippet request
$sql = "CALL SNIPPETS((".implode(',',$docs)."),'test1',$query)";
//decode reply
$reply = array();
foreach ($conn->query($sql) as $row) {
$reply[] = $row['snippet'];
}
//output results using $rows, and cross referencing with $reply
foreach ($rows as $idx => $info) {
// path, title out. works
$path = rawurlencode($info["path"]); $title = $info["title"];
$output = '<a href=' . $path . '>' . $title . '</a>'; $output = str_replace('%2F', '/', $output);
$snippet = $reply[$idx];
print("$output<br>$snippet<br><br>");
}
This puts the rows into an array because you need to loop through the data TWICE: once to 'bundle' up the docs array to send, then again to actually display the results, when both $rows AND $reply are available.

Why Laravel Request object is replacing spaces with underscores on my form names?

I have a Form posting variables containing spaces in their names
e.g.
I perform my AJAX request and I can see in the Chrome inspector that the name is passed correctly (with the blank space).
In my api.php:
Route::post('/user', 'UserController@get');
UserController:
function get(Request $request)
{
dd($request->input('Name Surname')); //display null
dd($request->all()); //I notice the key's changed to Name_Surname
}
Given that I can't change the names because they have to contain spaces (bad practice? OK, but it has to be like that):
how can I prevent the spaces from being replaced?
(Ideally without having to manipulate the array keys returned by $request->all() by hand.)
Short answer: I don't believe there is such a way.
You can map the response with a bit of string replace though:
$data = $request->all()->mapWithKeys(function($item, $key) {
return [str_replace("_", " ", $key) => $item];
});
If it's something you want to apply across the board, you could possibly rig up some middleware to apply it to all requests.
If the previous answer does not work for you ($request->all() returns a plain array, not a collection), wrap it in collect():
$data = collect($request->all())->mapWithKeys(function($item, $key) {
return [str_replace("_", " ", $key) => $item];
})->toArray();
You may also normalize the input name if it is known:
$field_name = 'FIELD NAME WITH SPACES';
$value = request( str_replace( ' ', '_', $field_name ) );
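For context: the underscore comes from PHP itself rather than from Laravel. PHP converts spaces and dots in incoming variable names to underscores before any framework sees the request. The mapping is lossy, so replacing every "_" with a space also rewrites keys that really did contain an underscore. A more careful variant restores only the names you know. The sketch below models that idea in Python to stay language-neutral; `mangle`, `restore_keys` and the field names are hypothetical, and `mangle` only approximates PHP's behaviour for spaces and dots.

```python
def mangle(name):
    """Approximate PHP's rewriting of incoming field names (spaces and dots only)."""
    return name.replace(" ", "_").replace(".", "_")


def restore_keys(data, known_names):
    """Map mangled keys back to their original names where the original is known."""
    lookup = {mangle(name): name for name in known_names}
    return {lookup.get(key, key): value for key, value in data.items()}


# "user_id" genuinely contains an underscore and must stay untouched.
received = {"Name_Surname": "Ada Lovelace", "user_id": "7"}
restored = restore_keys(received, known_names=["Name Surname"])
print(restored)
```

The same lookup-table approach carries over to the mapWithKeys() versions above: build the map from the list of expected field names instead of calling str_replace() on every key.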

Yii Lucene Encoding

Can't find a solution here or anywhere else, so I'm asking another question about Zend Lucene. Everyone talks about setting some encoding for Lucene. Where should I set this encoding?
When I use the search (Polish language) I get:
oprĂłcz wystÄ…pi reprezentacja Rosji. Mistrzowie
olimpijscy z Londynu powalczÄ…
This Ăł should be "ó" in Polish, Ä… (umlaut?) is "ą" and so on...
It works great with English of course.
Again searchController.php (actions create + search):
public function actionCreate()
{
$_indexFiles = 'runtime.search';
$index = Zend_Search_Lucene::create($_indexFiles);
$index = new Zend_Search_Lucene(Yii::getPathOfAlias('application.' . $this->_indexFiles), true);
$posts = News::model()->with('comment')->findAll();
foreach($posts as $news){
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title',CHtml::encode($news->name), 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('link',CHtml::encode($news->url), 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('content',CHtml::encode($news->description), ' utf-8 '));
$index->addDocument($doc);
}
setlocale(LC_CTYPE, 'pl_PL.utf-8');
$index->commit();
echo 'Lucene index created';
}
public function actionSearch()
{
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive ());
$this->layout='column2';
if (($term = Yii::app()->getRequest()->getParam('q', null)) !== null) {
$index = new Zend_Search_Lucene(Yii::getPathOfAlias('application.' . $this->_indexFiles));
$results = $index->find($term);
$query = Zend_Search_Lucene_Search_QueryParser::parse($term);
$this->render('search', compact('results', 'term', 'query'));
}
}
Welcome to Zend_Lucene. When you get tired of it, you can start using a native search engine like Solr or Sphinx.
"Learn from the mistakes of others. You can't live long enough to make them all yourself."
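A note on the symptom itself: the garbled output in the question is what you get when UTF-8 bytes are decoded as Windows-1250 (the usual single-byte code page for Polish). That suggests the text is still valid UTF-8 somewhere and is only being read or displayed with the wrong charset, though which step does that in this setup is not visible from the code shown. A small Python sketch reproduces the pattern; the helper names `misdecode` and `repair` are made up for illustration.

```python
def misdecode(text, wrong_encoding="cp1250"):
    """Simulate UTF-8 bytes being read as a single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="cp1250"):
    """Undo the mistake; only works if no bytes were lost along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "oprócz wystąpi"
garbled = misdecode(original)
print(garbled)          # same pattern as the search output in the question
print(repair(garbled))  # back to the original text
```

If your own output shows the same pattern, it is worth checking the charset of the page and of every conversion step (including CHtml::encode and the encoding argument passed to each field) before blaming the index.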

SeleniumRC/Perl dynamic XPath selector not working

This is more a question about XPath syntax than anything else.
I have multiple product pages on a site, with multiple products on each page. Each product has a unique ID for its add-to-cart button. I'm trying to return all of the unique IDs so that I can add a couple of the products to the bag. Searching with XPath seems to be the right approach. I have the following code for querying the HTML with XPath and returning the unique IDs:
$XPATH_COUNT = $sel->get_xpath_count("//div[\@class='quick-info-link']/a");
@my_array;
$my_array[0] = $sel->get_attribute("//div[\@class='quick-info-link']/a/\@id");
print $my_array[0];
$count = 0;
while( $count < $XPATH_COUNT )
{
    $arrayCount=0;
    $a = "//";
    foreach( @my_array )
    {
        $tmp = "a[\@id!='" . $my_array[$arrayCount] . "' and ";
        $b .= $tmp;
        $d .= "]";
        $arrayCount++;
    }
    $c = "img[\@alt='Quick Shop']";
    $e = $c . $d . "/\@id";
    $xpath_query = $a . $b . $e;
    $my_array[$count] = $sel->get_attribute($xpath_query);
    $count++;
}
The first run of this produces an XPath query that looks like this:
//a[@id!='quickview-link-PROD7029' and img[@alt='Quick Shop']]/@id
Which correctly returns quickview-link-PROD6945. The second run produces this:
//a[@id!='quickview-link-PROD7029' and a[@id!='quickview-link-PROD6945' and img[@alt='Quick Shop']]]/@id
Which throws an error in my SeleniumRC terminal window: ERROR: Element [..xpath query..] not found on session.
I am aware of the possible use of indexes (i.e. adding an [i] to the end of the XPath query) to access elements on the page, however this isn't something that has worked for me in Selenium.
Any help would be great. Thanks for your time,
Steve
//a[@id!='quickview-link-PROD7029'
    and a[@id!='quickview-link-PROD6945' and
          img[@alt='Quick Shop']
         ]
   ]/@id
Which throws an error in my SeleniumRC
terminal window of ERROR: Element
[..xpath query..] not found on session
It would greatly help if you provide the XML document on which the XPath expression is applied and explain which node(s) you want to select.
Without this necessary information:
The most obvious reason for this problem is that the above expression is looking for a elements that have an a child with some property.
Usually an a element doesn't have any a children.
What you really want is something like:
//a[@id != 'quickview-link-PROD7029'
    and
    @id != 'quickview-link-PROD6945' and img[@alt='Quick Shop']
   ]/@id
This can be simplified a bit:
//a[img[@alt='Quick Shop']]/@id
   [not(. = 'quickview-link-PROD7029'
        or
        . = 'quickview-link-PROD6945'
       )
   ]
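To show how the query could be generated for any number of already-seen IDs, here is a minimal sketch in Python (the question uses Perl; the string-building logic carries over directly). It keeps a single flat predicate, with `@` marking attributes, instead of nesting one a[...] inside another. The function name `build_xpath` is made up, and the sketch assumes the IDs contain no single quotes.

```python
def build_xpath(excluded_ids):
    """Build one flat predicate instead of nesting a[...] inside a[...]."""
    conditions = ["img[@alt='Quick Shop']"]
    conditions += [f"@id != '{element_id}'" for element_id in excluded_ids]
    return "//a[" + " and ".join(conditions) + "]/@id"


print(build_xpath(["quickview-link-PROD7029"]))
print(build_xpath(["quickview-link-PROD7029", "quickview-link-PROD6945"]))
```

Each pass of the loop in the question would then append the newly found ID to the list and rebuild the query, rather than concatenating another "a[" fragment.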

Zend_Search_Lucene: handling queries

I am trying to implement a search engine on my site. I am using Zend_Search_Lucene for this.
The index is created like this :
public function create($config, $create = true)
{
$this->_config = $config;
// create a new index
if ($create) {
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()
);
$this->_index = Zend_Search_Lucene::create(APPLICATION_PATH . $this->_config->index->path);
} else {
$this->_index = Zend_Search_Lucene::open(APPLICATION_PATH . $this->_config->index->path);
}
}
public function addToIndex($data)
{
$i = 0;
foreach ($data as $val) {
$scriptObj = new Sl_Model_Script();
$scriptObj->title = $val['title'];
$scriptObj->description = $val['description'];
$scriptObj->link = $val['link'];
$scriptObj->tutorials = $val['tutorials'];
$scriptObj->screenshot = $val['screenshot'];
$scriptObj->download = $val['download'];
$scriptObj->tags = $val['tags'];
$scriptObj->version = $val['version'];
$this->_dao->add($scriptObj);
$i++;
}
return $i;
}
/**
* Add to Index
*
* @param Sl_Interface_Model $scriptObj
*/
public function add(Sl_Interface_Model $scriptObj)
{
// UTF-8 for INDEX
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::text('title', $scriptObj->title, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('tags', $scriptObj->tags, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('version', $scriptObj->version, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('download', $scriptObj->download, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('link', $scriptObj->link));
$doc->addField(Zend_Search_Lucene_Field::text('description', $scriptObj->description, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('tutorials', $scriptObj->tutorials, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('screenshot', $scriptObj->screenshot));
$this->_index->addDocument($doc);
}
But when I try to query the index with:
$index->find('Wordpress 2.8.1' . '*');
I get the following error:
"non-wildcard characters are required at the beginning of pattern."
Any ideas how to query for a string like mine? A query for "wordpress" works as expected.
Lucene cannot handle leading wildcards, only trailing ones. That is, it does not support queries like 'tell me everyone whose name ends with 'att'', which would be something like
first_name: *att
It only supports trailing wildcards, such as 'tell me everyone whose name starts with 'ma'':
first_name: ma*
See this Lucene FAQ entry:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695
There IS a workaround for Lucene 2.1 but the developers say it can be "expensive".
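As a rough intuition for why the leading-wildcard case is called expensive, here is a toy model in Python. It assumes the index keeps its terms in sorted order, which is how Lucene-style term dictionaries are usually described; the function names and the sample terms are made up, and real engines are considerably more elaborate.

```python
import bisect


def prefix_search(sorted_terms, prefix):
    """Trailing wildcard (prefix*): binary-search to the first candidate, then scan."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    return matches


def suffix_search(sorted_terms, suffix):
    """Leading wildcard (*suffix): nothing to seek to, so every term is checked."""
    return [term for term in sorted_terms if term.endswith(suffix)]


terms = sorted(["mary", "matt", "mark", "pratt", "wyatt", "zoe"])
print(prefix_search(terms, "ma"))
print(suffix_search(terms, "att"))
```

With a prefix the lookup touches only the neighbouring terms; without one it has to walk the whole dictionary, which is the cost the Lucene developers warn about.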