Yii Lucene Encoding - zend-framework

Can't find a solution here or anywhere else, so I'm asking another question about Zend Lucene. Everyone tells about some encoding of Lucene. Where should I switch this encoding?
When I use search (PL language) I'm getting
oprĂłcz wystÄ…pi reprezentacja Rosji. Mistrzowie
olimpijscy z Londynu powalczÄ…
This Ăł should be "ó" in Polish, Ä… (umlaut?) is "ą" and so on...
It works great with English of course.
Again searchController.php (actions create + search):
public function actionCreate()
{
$_indexFiles = 'runtime.search';
$index = Zend_Search_Lucene::create($_indexFiles);
$index = new Zend_Search_Lucene(Yii::getPathOfAlias('application.' . $this->_indexFiles), true);
$posts = News::model()->with('comment')->findAll();
foreach($posts as $news){
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title',CHtml::encode($news->name), 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('link',CHtml::encode($news->url), 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('content',CHtml::encode($news->description), ' utf-8 '));
$index->addDocument($doc);
}
setlocale(LC_CTYPE, 'pl_PL.utf-8');
$index->commit();
echo 'Lucene index created';
}
public function actionSearch()
{
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive ());
$this->layout='column2';
if (($term = Yii::app()->getRequest()->getParam('q', null)) !== null) {
$index = new Zend_Search_Lucene(Yii::getPathOfAlias('application.' . $this->_indexFiles));
$results = $index->find($term);
$query = Zend_Search_Lucene_Search_QueryParser::parse($term);
$this->render('search', compact('results', 'term', 'query'));
}
}

Wellcome to Zend_Lucene, when you get tired of it you can start using a native search engine like Solr, or Sphinx
"Learn from the mistakes of others. You can't live long enough to make them all yourself."

Related

How correctly to use the snippet() function?

My first Sphinx app almost works!
I successfully save path,title,content as attributes in index!
But I decided go to SphinxQL PDO from AP:
I found snippets() example thanks to barryhunter again but don't see how use it.
This is my working code, except snippets():
$conn = new PDO('mysql:host=ununtu;port=9306;charset=utf8', '', '');
if(isset($_GET['query']) and strlen($_GET['query']) > 1)
{
$query = $_GET['query'];
$sql= "SELECT * FROM `test1` WHERE MATCH('$query')";
foreach ($conn->query($sql) as $info) {
//snippet. don't works
$docs = array();
foreach () {
$docs[] = "'".mysql_real_escape_string(strip_tags($info['content']))."'";
}
$result = mysql_query("CALL SNIPPETS((".implode(',',$docs)."),'test1','" . mysql_real_escape_string($query) . "')",$conn);
$reply = array();
while ($row = mysql_fetch_array($result,MYSQL_ASSOC)) {
$reply[] = $row['snippet'];
}
// path, title out. works
$path = rawurlencode($info["path"]); $title = $info["title"];
$output = '<a href=' . $path . '>' . $title . '</a>'; $output = str_replace('%2F', '/', $output);
print( $output . "<br><br>");
}
}
I have got such structure from Sphinx index:
Array
(
[0] => Array
(
[id] => 244
[path] => DOC7000/zdorovie1.doc
[title] => zdorovie1.doc
[content] => Stuff content
I little bit confused with array of docs.
Also I don't see advice: "So its should be MUCH more efficient, to compile the documents and call buildExcepts just once.
But even more interesting, is as you sourcing the the text from a sphinx attribute, can use the SNIPPETS() sphinx function (in setSelect()!) in the main query. SO you dont have to receive the full text, just to send back to sphinx. ie sphinx will fetch the text from attribute internally. even more efficient!
"
Tell me please how I should change code for calling snippet() once for docs array, but output path (link), title for every doc.
Well because your data comes from sphinx, you can just use the SNIPPET() function (not CALL SNIPPETS()!)
$query = $conn->quote($_GET['query']);
$sql= "SELECT *,SNIPPET(content,$query) AS `snippet` FROM `test1` WHERE MATCH($query)";
foreach ($conn->query($sql) as $info) {
$path = rawurlencode($info["path"]); $title = $info["title"];
$output = '<a href=' . $path . '>' . $title . '</a>'; $output = str_replace('%2F', '/', $output);
print("$output<br>{$info['snippet']}<br><br>");
}
the highlighted text is right there in the main query, dont need to mess around with bundling the data back up to send to sphinx.
Also shows you should be escaping the raw query from user.
(the example you found does that, because the full text comes fom MySQL - not sphinx - so it has no option but to mess around sending data back and forth!)
Just for completeness, if REALLY want to use CALL SNIPPETS() would be something like
<?php
$query =$conn->quote($_GET['query']);
//make query request
$sql= "SELECT * FROM `test1` WHERE MATCH($query)";
$result = $conn->query($sql);
$rows = $result->fetchAll(PDO::FETCH_ASSOC);
//build list of docs to send
$docs = array();
foreach ($rows as $info) {
$docs[] = $conn->quote(strip_tags($info['content']));
}
//make snippet reqest
$sql = "CALL SNIPPETS((".implode(',',$docs)."),'test1',$query)";
//decode reply
$reply = array();
foreach ($conn->query($sql) as $row) {
$reply[] = $row['snippet'];
}
//output results using $rows, and cross referencing with $reply
foreach ($rows as $idx => $info) {
// path, title out. works
$path = rawurlencode($info["path"]); $title = $info["title"];
$output = '<a href=' . $path . '>' . $title . '</a>'; $output = str_replace('%2F', '/', $output);
$snippet = $reply[$idx];
print("$output<br>$snippet<br><br>");
}
Shows putting the rows into an array, because need to lopp though the data TWICE. Once to 'bundle' up the docs array to send. Then again to acully display rules, when have $rows AND $reply both available.

Joomla: get all users in a usergroup

I am working on a component where I want to show all users of a specific usergroup. Right now I found two solutions for this but I'm not feeling comfortable with both of them.
Solution 1
$usersID = JAccess::getUsersByGroup(3);
$users = array();
foreach($usersID as $cUserID)
{
$users[] = JFactory::getUser($cUserID);
}
This one seems to produce two database queries every time JFactory::getUser($cUserID) is called. I really don't want this.
Solution 2
function inside model
function getUsers()
{
if(!isset($this->users))
{
$groupID = 3;
$db = JFactory::getDbo();
$query = $db->getQuery(true);
$select = array( 'users.id', 'users.name');
$where = $db->quoteName('map.group_id') . ' = ' . $groupID;
$query
->select($select)
->from( $db->quoteName('#__user_usergroup_map', 'map') )
->leftJoin( $db->quoteName('#__users', 'users') . ' ON (map.user_id = users.id)' )
->where($where);
$db->setQuery($query);
$this->users = $db->loadObjectList();
}
return $this->users;
}
This one works like a charm but I feel there should be a "more Joomla! way" of doing this. I don't like working on their tables.
Right now I'm going with solution 2 but i really wonder if there is some better way to do it.

How to prevent SQL injection in PhalconPHP when using sql in model?

Let's say I am building a search that finds all the teacher and got an input where the user can put in the search term. I tried reading the phalcon documentation but I only see things like binding parameters. I read the other thread about needing prepare statements do I need that in Phalcon as well?
And my function in the model would be something like this:
public function findTeachers($q, $userId, $isUser, $page, $limit, $sort)
{
$sql = 'SELECT id FROM tags WHERE name LIKE "%' . $q . '%"';
$result = new Resultset(null, $this,
$this->getReadConnection()->query($sql, array()));
$tagResult = $result->toArray();
$tagList = array();
foreach ($tagResult as $key => $value) {
$tagList[] = $value['id'];
....
}
}
My question is for the Phalcon framework is there any settings or formats I should code for this line $sql = 'SELECT id FROM tags WHERE name LIKE "%' . $q . '%"';
And also any general recommendation for preventing SQL Injection in PhalconPHP controllers and index would be appreciated.
For reference:
My controller:
public function searchAction()
{
$this->view->disable();
$q = $this->request->get("q");
$sort = $this->request->get("sort");
$searchUserModel = new SearchUsers();
$loginUser = $this->component->user->getSessionUser();
if (!$loginUser) {
$loginUser = new stdClass;
$loginUser->id = '';
}
$page = $this->request->get("page");
$limit = 2;
if (!$page){
$page = 1;
}
$list = $searchUserModel->findTeachers($q, $loginUser->id, ($loginUser->id)?true:false, $page, $limit, $sort);
if ($list){
$list['status'] = true;
}
echo json_encode($list);
}
My Ajax:
function(cb){
$.ajax({
url: '/search/search?q=' + mapObject.q + '&sort=<?php echo $sort;?>' + '&page=' + mapObject.page,
data:{},
success: function(res) {
//console.log(res);
var result = JSON.parse(res);
if (!result.status){
return cb(null, result.list);
}else{
return cb(null, []);
}
},
error: function(xhr, ajaxOptions, thrownError) {
cb(null, []);
}
});
with q being the user's search term.
You should bind the query parameter to avoid an SQL injection. From what I can remember Phalcon can be a bit funny with putting the '%' wildcard in the conditions value so I put them in the bind.
This would be better than just filtering the query.
$tags = Tags::find([
'conditions' => 'name LIKE :name:',
'bind' => [
'name' => "%" . $q . "%"
]
])
Phalcon\Filter is helpful when interacting with the database.
In your controller you can say, remove everything except letters and numbers from $q.
$q = $this->request->get("q");
$q = $this->filter->sanitize($q, 'alphanum');
The shortest way for requests:
$q = $this->request->get('q', 'alphanum');

Lucene Search doesn't find keyword indexed fields

i save my fieldes with this code:
class Places_Search_Document extends Zend_Search_Lucene_Document{
public function __construct($class, $key, $title,$contents, $summary, $createdBy, $dateCreated)
{
$this->addField(Zend_Search_Lucene_Field::Keyword('docRef', "$class:$key"));
$this->addField(Zend_Search_Lucene_Field::UnIndexed('class', $class));
$this->addField(Zend_Search_Lucene_Field::UnIndexed('key', $key));
$this->addField(Zend_Search_Lucene_Field::Keyword('title', $title ,'UTF-8'));
$this->addField(Zend_Search_Lucene_Field::unStored('contents', $contents , 'UTF-8'));
$this->addField(Zend_Search_Lucene_Field::text('summary', $summary , 'UTF-8'));
//$this->addField(Zend_Search_Lucene_Field::UnIndexed('createdBy', $createdBy));
$this->addField(Zend_Search_Lucene_Field::Keyword('dateCreated', $dateCreated));
}
}
i search the word with this code:
$index = Places_Search_Lucene::open(SearchIndexer::getIndexDirectory());
$term = new Zend_Search_Lucene_Index_Term($q);
$query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$results = $index->find($query);
now it work perfect for unsorted and text fields , but it doesn`t search for keyword !!
Are you sure you really want those fields to be keyword analyzed? The keyword analyzer puts the whole text of the field as one token, which you rarely want.

Zend_Search_Luncene handle Querys

iam trying to implement an Searchmachine into my site. Iam using Zend_Search_Lucene for this.
The index is created like this :
public function create($config, $create = true)
{
$this->_config = $config;
// create a new index
if ($create) {
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()
);
$this->_index = Zend_Search_Lucene::create(APPLICATION_PATH . $this->_config->index->path);
} else {
$this->_index = Zend_Search_Lucene::open(APPLICATION_PATH . $this->_config->index->path);
}
}
{
public function addToIndex($data)
$i = 0;
foreach ($data as $val) {
$scriptObj = new Sl_Model_Script();
$scriptObj->title = $val['title'];
$scriptObj->description = $val['description'];
$scriptObj->link = $val['link'];
$scriptObj->tutorials = $val['tutorials'];
$scriptObj->screenshot = $val['screenshot'];
$scriptObj->download = $val['download'];
$scriptObj->tags = $val['tags'];
$scriptObj->version = $val['version'];
$this->_dao->add($scriptObj);
$i++;
}
return $i;
}
/**
* Add to Index
*
* #param Sl_Interface_Model $scriptObj
*/
public function add(Sl_Interface_Model $scriptObj)
{
// UTF-8 for INDEX
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::text('title', $scriptObj->title, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('tags', $scriptObj->tags, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('version', $scriptObj->version, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('download', $scriptObj->download, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('link', $scriptObj->link));
$doc->addField(Zend_Search_Lucene_Field::text('description', $scriptObj->description, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('tutorials', $scriptObj->tutorials, 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::text('screenshot', $scriptObj->screenshot));
$this->_index->addDocument($doc);
}
But when i try to query the index with :
$index->find('Wordpress 2.8.1' . '*');
im getting the following error :
"non-wildcard characters are required at the beginning of pattern."
any ideas how to query for a string like mine ? an query for "wordpress" works like excepted.
Lucene cannot handle leading wildcards, only trailing ones. That is, it does not support queries like 'tell me everyone whose name ends with 'att'' which would be something like
first_name: *att
It only supports trailing wildcards. Tell me everyone whose names end that start with 'ma'
first_name: ma*
See this Lucene FAQ entry:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695
There IS a workaround for Lucene 2.1 but the developers say it can be "expensive".