pg_search_scope: chaining scopes seems impossible - pg-search

I have a search form for "documents" with about a dozen search criteria, including "entire_text", "keywords" and "description".
I'm using pg_search_scope, but I have 2 different scopes.
This is in my document.rb:
pg_search_scope :search_entire_text,
  :against => :entire_text,
  :using => {
    :tsearch => {
      :prefix => true,
      :dictionary => "french"
    }
  }
pg_search_scope :search_keywords,
  :associated_against => {
    :keywords => [:keyword]
  },
  :using => {
    :tsearch => {
      :any_word => true
    }
  }
Each separately works fine. But I can't do this:
@resultats = Document.search_keywords(params[:ch_document][:keywords]).search_entire_text(params[:ch_document][:entire_text])
Is there any way to work around this?
Thanks

I've never used pg_search_scope, but it looks like you indeed can't combine two pg_search_scopes.
What you could do is run :search_entire_text as a pg_search_scope and feed the resulting ids into Document.where(id: [1, 2, 3]); that way you can use standard Rails scopes for the remaining keyword search.
Example:
# If pluck doesn't work you can also use map(&:id)
txt_res_ids = Document.search_entire_text(params[:ch_document][:entire_text]).pluck(:id)
final_results = Document.where(id: txt_res_ids).some_keyword_scope.all
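The two-step idea can be sketched in plain Ruby, with no database involved. Everything here (the Doc struct, the sample rows and the two search_* helpers) is a hypothetical stand-in for the ActiveRecord model and its scopes; only the control flow mirrors the answer: collect the ids from the first search, then restrict the second search to those ids.

```ruby
# Hypothetical in-memory stand-in for the Document model.
Doc = Struct.new(:id, :entire_text, :keywords)

DOCS = [
  Doc.new(1, "contrat de vente", ["vente", "maison"]),
  Doc.new(2, "contrat de bail", ["location"]),
  Doc.new(3, "testament olographe", ["vente"])
].freeze

# Stand-in for search_entire_text: prefix match on any word of the text.
def search_entire_text(docs, query)
  docs.select { |d| d.entire_text.split.any? { |w| w.start_with?(query) } }
end

# Stand-in for the keyword scope: a document matches if it has any of the words.
def search_keywords(docs, query)
  wanted = query.split
  docs.select { |d| (d.keywords & wanted).any? }
end

# Step 1: run the first search and keep only the ids (like pluck(:id)).
text_ids = search_entire_text(DOCS, "contr").map(&:id)

# Step 2: restrict the candidates to those ids (like where(id: text_ids)),
# then apply the second search to what is left.
candidates = DOCS.select { |d| text_ids.include?(d.id) }
final_results = search_keywords(candidates, "vente bail")

p text_ids
p final_results.map(&:id)
```

Document 3 has a matching keyword but is dropped because its id never made it out of the first step, which is the behaviour you want from a combined search.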

It works. Here's the entire code, in case it helps someone:
Acte.rb (I didn't translate the names to English; the comments map them to the question above)
pg_search_scope :cherche_texte_complet, # i.e. find entire text
  :against => :texte_complet,
  :using => {
    :tsearch => {
      :prefix => true,
      :dictionary => "french"
    }
  }
pg_search_scope :cherche_mots_clefs, # find keywords
  :associated_against => {
    :motclefs => [:motcle]
  },
  :using => {
    :tsearch => {
      :any_word => true
    }
  }
def self.cherche_date(debut, fin) # find date between
  where("acte_date BETWEEN :date_debut AND :date_fin", {date_debut: debut, date_fin: fin})
end
def self.cherche_mots(mots)
  if mots.present? # the if ... else is necessary, see controller.rb
    cherche_mots_clefs(mots)
  else
    order("id DESC")
  end
end
def self.ids_texte_compl(ids)
  if ids.any?
    where("id = any (array #{ids})")
  else
    where("id IS NOT NULL")
  end
end
and actes_controller.rb
ids = Acte.cherche_texte_complet(params[:ch_acte][:texte_complet]).pluck(:id)
@resultats = Acte.cherche_date(params[:ch_acte][:date_debut], params[:ch_acte][:date_fin])
                 .ids_texte_compl(ids)
                 .cherche_mots(params[:ch_acte][:mots])
Thanks !
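A note on ids_texte_compl: interpolating a Ruby array into the SQL string works only because Array#to_s happens to look like a PostgreSQL ARRAY[...] expression when the elements are integers. The sketch below is plain Ruby with made-up id values; the any_array_clause helper is hypothetical and simply rebuilds the same fragment so you can see what is sent in three cases.

```ruby
# Rebuilds the SQL fragment used in ids_texte_compl, without touching a database.
def any_array_clause(ids)
  "id = any (array #{ids})"
end

# Integer ids render as something PostgreSQL can read as ARRAY[3, 7, 12].
puts any_array_clause([3, 7, 12])

# An empty array renders as "array []"; PostgreSQL typically rejects an untyped
# empty array, which is why the method needs the ids.any? guard.
puts any_array_clause([])

# String elements get Ruby's double quotes, not SQL single quotes, so this
# trick should not be reused for non-integer columns.
puts any_array_clause(["a", "b"])
```

For the non-empty branch, where(id: ids) lets ActiveRecord build the id list itself and avoids the string interpolation.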

Chaining works in pg_search 2.3.2 at least:
SomeModel.pg_search_based_scope("abc").pg_search_based_scope("xyz")

Related

Elasticsearch searching with perl client

I'm attempting to do something that should be simple, but I cannot get it to work. I've searched all over for detailed documentation on Perl's Search::Elasticsearch. I can only find the CPAN documentation, and it barely covers searching. I've searched here and cannot find a duplicate question.
I have elasticsearch and filebeat. Filebeat is sending syslog to elasticsearch. I just want to search for messages with matching text and date range. I can find the messages but when I try to add date range the query fails. Here is the query from kibana dev tools.
GET _search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "message": "metrics" }},
        { "range": { "timestamp": { "gte": "now-15m" }}}
      ]
    }
  }
}
I don't get exactly what I'm looking for but there isn't an error.
Here is my attempt with perl
my $results = $e->search(
    body => {
        query => {
            bool => {
                filter => {
                    term  => { message => 'metrics' },
                    range => { timestamp => { 'gte' => 'now-15m' }}
                }
            }
        }
    }
);
This is the error.
[Request] ** [http://x.x.x.x:9200]-[400]
[parsing_exception]
[range] malformed query, expected [END_OBJECT] but found [FIELD_NAME],
with: {"col":69,"line":1}, called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__
at ./elasticsearchTest.pl line 15.
With vars: {'body' => {'status' => 400,'error' => {
'root_cause' => [{'col' => 69,'reason' => '[range]
malformed query, expected [END_OBJECT] but found [FIELD_NAME]',
'type' => 'parsing_exception','line' => 1}],'col' => 69,
'reason' => '[range] malformed query, expected [END_OBJECT] but found [FIELD_NAME]',
'type' => 'parsing_exception','line' => 1}},'request' => {'serialize' => 'std',
'path' => '/_search','ignore' => [],'mime_type' => 'application/json',
'body' => {
'query' => {
'bool' =>
{'filter' => {'range' => {'timestamp' => {'gte' => 'now-15m'}},
'term' => {'message' => 'metrics'}}}}},
'qs' => {},'method' => 'GET'},'status_code' => 400}
Can someone help me figure out how to search with the Search::Elasticsearch Perl module?
Multiple filter clauses must be passed as separate JSON objects within an array (like in your initial JSON query), not multiple filters in the same JSON object. This maps to how you must create the Perl data structure.
filter => [
    { term  => { message => 'metrics' } },
    { range => { timestamp => { 'gte' => 'now-15m' } } }
]
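To see why the hash form fails, it helps to look at the JSON each structure serializes to. The sketch below uses Ruby and its standard json library rather than Perl, purely as an illustration; Perl hash and array references map to JSON objects and arrays in the same way. The variable names are made up for the example.

```ruby
require "json"

# One hash holding both clauses: this becomes a single JSON object with two
# keys, which is the shape Elasticsearch rejected in the question.
as_object = {
  term: { message: "metrics" },
  range: { timestamp: { gte: "now-15m" } }
}

# An array of one-clause hashes: this becomes a JSON array of objects,
# matching the query that worked in Kibana.
as_array = [
  { term: { message: "metrics" } },
  { range: { timestamp: { gte: "now-15m" } } }
]

object_json = JSON.generate({ filter: as_object })
array_json = JSON.generate({ filter: as_array })

puts object_json
puts array_json
```

The first line printed starts with {"filter":{ and the second with {"filter":[, which is the difference the parser is complaining about.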

Add a field if match

I'm trying to monitor an IRC server, and I'm looking for a way to create a new numeric field (for example, Alert_level) only if a message contains a specific word.
Example: Message: ABC | Alert_level: 1 ; Message: ZYX | Alert_level: 3.
This is the running configuration:
input {
  irc {
    channels => "#xyz"
    host => "a.b.c"
    nick => "myusername"
    catch_all => true
    get_stats => true
  }
}
output {
  stdout { codec => "rubydebug" }
  elasticsearch {
    hosts => "localhost"
    index => "logstash-irc-%{+YYYY.MM.dd}"
  }
}
Thank you!
As @Val suggested above, you might need to use the grok filter in order to match something from the input. For example, your filter could look something like this:
filter {
  grok {
    match => { "message" => "%{GREEDYDATA:somedata}" }
  }
  # change your condition accordingly
  if "ZYX" in [message] {
    mutate {
      # Alert_level is the field name
      add_field => { "Alert_level" => "12345" }
    }
    mutate {
      # do the conversion
      convert => { "Alert_level" => "integer" }
    }
  }
}
NOTE that add_field always creates a string, so you have to do the conversion in order to get a numeric field through Logstash; you can't create one directly. The convert sits in a second mutate block because add_field is only applied once the first mutate has finished. The above is just a sample to get you started. Change the grok match and the condition to suit your requirements. Hope it helps!

Need help with HTML::TagFilter

I wrote a filter like this in Perl:
my $tf = HTML::TagFilter->new(
    allow => {
        img => { src => [] },
        b   => { all => [] },
        i   => { all => [] },
        em  => { all => [] },
        u   => { all => [] },
        s   => { all => [] }
    }
);
$message_body = $tf->filter($message_body);
What I need this filter to do is allow the given tags and, for img, allow the src attribute. The code gives great results, except that for a tag like <img src="cid:img.png" alt="Smiley face"> it returns just <img> instead of <img src="cid:img.png">, which is what I want. Does anyone know why?
The reason your src attribute isn't being passed through is because of the module's cross-site scripting protection. The value cid:img.png is rejected as an invalid URL, and so the attribute is removed.
The tidiest way to get around this is to extend the list of valid protocols to include cid, like this:
my @protocols = $tf->xss_permitted_protocols;
push @protocols, 'cid';
$tf->xss_permitted_protocols(@protocols);
$message_body = $tf->filter($message_body);
If you set log_rejects => 1 when you create the HTML::TagFilter object then you can examine the values returned by $tf->report to see the module's reasons for rejecting each component of the HTML.
You need to set skip_xss_protection to 1:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TagFilter;
my $tf = HTML::TagFilter->new(
    allow => {
        img => { src => [] },
        b   => { all => [] },
        i   => { all => [] },
        em  => { all => [] },
        u   => { all => [] },
        s   => { all => [] }
    },
    skip_xss_protection => 1,
);
my $html = qq{<img src="cid:img.png" alt="Smiley face">};
$html = $tf->filter($html);
print $html;
prints:
<img src="cid:img.png">

Build and Access a Complex Data Structure Using Perl

I have a large one-dimensional hash with lots of data that I need to structure in such a way that I can sort it easily into a format that is the same each time the code executes.
Original Hash Data:
{
    'datetime'    => 'datetime value',
    'param_name'  => 'param name',
    'param_value' => 'param value',
    'category'    => 'category name'
}
Current Data Structure:
{
    'datetime value' => {
        'category' => {
            'param_name'  => 'param name',
            'param_value' => 'param value'
        }
    }
}
I can almost build this structure in code, except for every category, there could be multiple param_names and param_values with the same key name.
The problem I have is that if there are multiple param names/values, only the last pair are saved in the new data structure.
I know that keys have to be unique, so I'm not quite sure how to resolve this as of yet.
Once the structure is built, I then need to understand how to sort the data based on datetime, then param_name so that the order is always the same in the output.
Looking at the difference between your first and second example, I think you have your structure a bit off. I think this matches more of what you want:
{
    DATE_TIME => date_time_value,
    PARAMETERS => {
        param_name1 => parameter_value1,
        param_name2 => parameter_value2
    }
}
This way, the structure with data may look like this:
{
    DATE_TIME => "10/31/2031 12:00am",
    PARAMETERS => {
        COLOR => "red",
        SIZE  => "Really big",
        NAME  => "Herman",
    }
}
Usually, you think of objects having fields which contain values. Think of a row of a SQL table or a spreadsheet. You have columns with headings, and rows that contain the value.
Let's take an employee. They have a name, age, job, and a phone number:
{
    NAME  => "Bob Smith",
    AGE   => "None of your business",
    JOB   => "Making your life miserable",
    PHONE => "555-1212"
}
Unlike a table, each entry could contain another structure. For example, people usually have more than one phone number, and we might want to store the last name separately from the first name:
{
    NAME => {
        FIRST => "Bob",
        LAST  => "Smith"
    },
    AGE   => "None of your business",
    JOB   => "Making your life miserable",
    PHONE => {
        CELL => "555.1234",
        WORK => "555.1212"
    }
}
Then we have the people who have multiple phones of the same type. For example, Bob has two cell phones. In this case, we'll make each phone type field an array of values:
{
    NAME => {
        FIRST => "Bob",
        LAST  => "Smith",
    },
    AGE   => "None of your business",
    JOB   => "Making your life miserable",
    PHONE => {
        CELL => ["555.1234", "555.4321"],
        WORK => ["555.1212"]
    }
}
And to initialize it:
my $person = {};
$person->{NAME}->{FIRST} = "Bob";
$person->{NAME}->{LAST} = "Smith";
$person->{AGE} = "None of your business";
$person->{JOB} = "Making your life miserable";
$person->{PHONE}->{CELL}->[0] = "555.1234";
$person->{PHONE}->{CELL}->[1] = "555.4321";
$person->{PHONE}->{WORK}->[0] = "555.1212";
I think it makes sense to have a params hash where the keys are the names and the values are the actual values. It seems like that is what you want.
my %hash = (
    'datetime value' => {
        'category' => {
            'params' => {
                'param-name1' => 'param-value1',
                'param-name2' => 'param-value2',
                'param-name3' => 'param-value3',
                # etc.
            }
        }
    }
);
After this restructuring it should be pretty easy to sort however you like.
Alphabetically by key:
my $params = $hash{'datetime value'}{'category'}{'params'};
my @alphabetic_keys = sort { $a cmp $b } keys %$params;
By key length:
my @by_length_keys = sort { length($a) <=> length($b) } keys %$params;
Assuming category names are unique, I would suggest the following data structure:
{
    'datetime value 1' => {
        'category name 1' => {
            'param name 1' => ['param value 1', 'param value 2', ...],
            'param name 2' => ['param value 3', 'param value 4', ...],
            etc...
        },
        'category name 2' => {
            'param...' => [ value... ]
        },
    },
    'datetime value 2' => {
        etc...
    }
}
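A build-and-sort pass over that last layout might look like the following. It is written in Ruby rather than Perl only to keep the added examples on this page in one language; the shape (a hash of hashes of arrays) carries over directly. The sample records and variable names are invented for illustration.

```ruby
# Flat input records, shaped like the "Original Hash Data" in the question.
records = [
  { datetime: "2013-01-02 10:00", category: "net",  param_name: "speed",  param_value: "100" },
  { datetime: "2013-01-01 09:00", category: "disk", param_name: "size",   param_value: "500" },
  { datetime: "2013-01-02 10:00", category: "net",  param_name: "speed",  param_value: "1000" },
  { datetime: "2013-01-02 10:00", category: "net",  param_name: "duplex", param_value: "full" }
]

# datetime => category => param_name => [values]. The default blocks create
# each level on first access, so repeated names append instead of overwriting.
tree = Hash.new do |by_date, date|
  by_date[date] = Hash.new do |by_cat, cat|
    by_cat[cat] = Hash.new { |by_name, name| by_name[name] = [] }
  end
end

records.each do |r|
  tree[r[:datetime]][r[:category]][r[:param_name]] << r[:param_value]
end

# Walk the structure in a stable order: datetime, then category, then name.
# Sorting the datetime strings only works because this sample format sorts
# chronologically as text; other formats would need parsing first.
lines = []
tree.keys.sort.each do |date|
  tree[date].keys.sort.each do |cat|
    tree[date][cat].keys.sort.each do |name|
      lines << "#{date} #{cat} #{name}=#{tree[date][cat][name].join(',')}"
    end
  end
end

puts lines
```

Because the leaves are arrays, the two "speed" values both survive, and sorting the keys at every level gives the same output order on every run.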

OR operator in Mongo 'update' $criteria

I want to check for an existing record by matching either title or url fields. If either one matches, update that record. Otherwise, insert.
How do I write the following properly (using Mongoid in Ruby):
articles.update(
{ **:title => story.title OR :url => story.url** },
{ :title => story.title, :url => story.url, :source => story.source, :last_updated => Time.now },
{ :upsert => true } )
Thanks!
You need to build the criteria first and then call update on it, like this:
articles.any_of({:title => xxx}, {:url => yyyy}).update(:foo => 'bar')
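For reference, any_of corresponds to a MongoDB $or selector. The plain-Ruby sketch below builds such a selector and mimics "update the first match, otherwise insert" against an in-memory array. The story values and the upsert! helper are hypothetical, and no Mongoid or driver call is made.

```ruby
# Hypothetical incoming story.
story = { title: "Launch day", url: "http://example.com/launch", source: "wire" }

# The selector that an OR match on title/url boils down to.
selector = { "$or" => [{ title: story[:title] }, { url: story[:url] }] }

# Minimal in-memory imitation of an upsert driven by a $or selector.
def upsert!(articles, selector, attrs)
  match = articles.find do |doc|
    selector["$or"].any? { |cond| cond.all? { |key, value| doc[key] == value } }
  end
  if match
    match.merge!(attrs)   # a record matched on title or url: update it
  else
    articles << attrs.dup # nothing matched: insert a new record
  end
  articles
end

articles = [{ title: "Old title", url: "http://example.com/launch", source: "blog" }]

# The url matches the existing record, so it is updated in place.
upsert!(articles, selector, story)

# Neither title nor url matches, so a second record is inserted.
other = { title: "Other", url: "http://example.com/x", source: "rss" }
upsert!(articles, { "$or" => [{ title: other[:title] }, { url: other[:url] }] }, other)

p articles.size
p articles.first[:title]
```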