Filter CloudWatch Logs Insights based on the availability of two or more strings in the log and make it case-insensitive - aws-cloudwatch-log-insights

How can I check, case-insensitively, whether two or more predefined words appear in a log, and filter to only those logs? I am currently using
FILTER log like /(?i)(alter table)/
This matches only if the words appear in the same order as in the filter, but it is case-insensitive.
Is there a way to do something like
FILTER any(['alter', 'create', 'drop if exists']) in log
along with case insensitivity while filtering?

You can use or and and statements in filter:
| filter @message =~ "alter" or @message =~ "create"
You can also use regex here as well:
fields @timestamp, message, id, some_other_var
| filter !isempty(some_other_var) and id like /\w*D\b/ # ends in D
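To combine several terms case-insensitively in a single filter, you can use regex alternation together with the (?i) flag from the question (a sketch, assuming the raw log line is in @message):
fields @timestamp, @message
| filter @message like /(?i)(alter|create|drop if exists)/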

Related

Postgresql Query - Return all matching search terms for each result row when using an ANY query and LIKE

Essentially what I'm trying to figure out is if there is a way to return all matching search terms in addition to the matched row when running a query that looks up a list of items using ANY or IN. In most cases the search term will exactly match the returned column value but in cases such as text search or with certain extensions like IP4r this is not always the case. In addition, you can have multiple search terms match on a single row.
To make this concrete suppose this is my query:
SELECT id, item_name, description FROM items WHERE description LIKE ANY('{%gaming%, %computer%, %socks%, %men%}');
and it returns the following two rows:
id, item_name, description
1, 'computer', 'super fast gaming computer that will help you win'
5, 'socks', 'These socks are sure to please the men in your family'
What I'd like to know is which original search terms map to the result row that was returned. In other words, I'd like the returned rows to look like this:
id, search_terms, item_name, description
1, '{%gaming%, %computer%}', 'computer', 'super fast gaming computer that will help you win'
5, '{%socks%, %men%}', 'socks', 'These socks are sure to please the men in your family'
Is there a way to efficiently do this in PostgreSQL? In the example above we're using LIKE with strings but in my real-world scenario I'm using the IP4r extension to do IP lookups against CIDR ranges where you can have multiple IP addresses in the same returned CIDR range.
I previously asked this question: PostgreSQL 9.5: Return matching search terms in each result row when using LIKE which used a CASE statement to almost solve the problem I'm describing here.
The added complexity in the scenario above is that you can have multiple search terms match a single row (e.g., gaming and computer both match the description 'super fast gaming computer that will help you win'). If you use a CASE statement then only the first match in the CASE statement gets set as the search term and you miss any other matching search terms.
Thank you for your help!
This would be a way using VALUES:
SELECT i.id, i.item_name, i.description, m.pat
FROM items AS i
JOIN (VALUES ('%gaming%'), ('%computer%'), ('%socks%'), ('%men%')) AS m(pat)
ON i.description LIKE m.pat;
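If you want one row per item with all of its matching patterns collected into an array, as in the desired output above, you can aggregate the joined patterns (a sketch building on the same VALUES list):
SELECT i.id, array_agg(m.pat) AS search_terms, i.item_name, i.description
FROM items AS i
JOIN (VALUES ('%gaming%'), ('%computer%'), ('%socks%'), ('%men%')) AS m(pat)
ON i.description LIKE m.pat
GROUP BY i.id, i.item_name, i.description;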

Is it possible to Group By a field extracted from JSON input in a Siddhi Query?

I currently have a stream with one string attribute that contains a JSON event.
This stream receives different events, to which I want to apply JSON path expressions so I can use those attributes in filters and functions.
JsonPath extractors work like a charm in filters and selectors; unfortunately, I am not able to use them in the 'Group By' part.
I am actually doing this in an embedded Siddhi app with the siddhi-execution-json extension added manually, but for the discussion, so everybody can easily check and test it, I will paste an example app that works on WSO2 Stream Processor.
The objective looks like the following App:
@App:name("Group_by_json_attribute")
define stream JsonStream(json string);
@sink(type='log')
define stream LogStream(myField string, count long);
@info(name='query1')
from JsonStream#window.time(10 sec)
select json:getString(json, '$.myField') as myField, count() as count
group by myField having count > 1
insert into LogStream;
and it can accept the following events:
{"myField": "my_value"}
However, this query will raise the error:
Cannot find attribute type as 'myField' does not exist in 'JsonStream'; define stream JsonStream(json string)
I have also tried to use directly the Json extractor at 'Group by':
group by json:getString(json, '$.myField') as myField having count > 1
However the error now is:
mismatched input ':' expecting {',', ORDER, LIMIT, OFFSET, HAVING, INSERT, DELETE, UPDATE, RETURN, OUTPUT}
which suggests that an extension function is not expected here.
I am just wondering if it is possible to group by attributes not directly defined in the input stream. In this case it is a field extracted from a JSON object, but it could be any other function that generates another attribute.
I am using versions from the Maven Central repository:
Siddhi: io.siddhi:siddhi-core:5.0.1
siddhi-execution-json: io.siddhi.extension.execution.json:siddhi-execution-json:2.0.1
(Edit) Clarification
The objective is to use attributes not directly defined in the stream in the Group By.
The reason is that I currently have an embedded app which defines the whole set of input streams coming from external sources formatted as JSON, and there is also a set of output streams to inform external components when a query matches.
This app allows users to create custom queries on this set of predefined streams, but they are not able to create streams of their own.
Many thanks!
It seems we expect the group by fields to come from the query's input stream, in this case JsonStream. Use another query before this one for the extraction, and do the aggregation and filtering in the following query:
@App:name("Group_by_json_attribute")
define stream JsonStream(json string);
@sink(type='log')
define stream LogStream(myField string, count long);
@info(name='extract_stream')
from JsonStream
select json:getString(json, '$.myField') as myField
insert into ExtractedStream;
@info(name='query1')
from ExtractedStream#window.time(10 sec)
select myField, count() as count
group by myField
having count > 1
insert into LogStream;
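With this split, myField is a real attribute of ExtractedStream, so the group by in query1 resolves normally against its input stream.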

why is this postgresql full text search query returning ts_rank of 0?

Before I invest in using solr or lucene or sphinx, I wanted to try to implement a search capability on my system using postgresql full text search.
I have a national list of businesses in a table that I want to search. I created a tsvector that combines the business name and city so that I can do a search like "outback atlanta".
I am also implementing an auto-completion function by using the wildcard capability of the search: I append ":*" to the search pattern and insert " & " between keywords, so the search pattern "outback atl" turns into "outback & atl:*" before getting converted into a query using to_tsquery().
Here's the problem that I am running into currently.
if the search pattern is entered as "ou", many "Outback Steakhouse" records are returned.
if the search pattern is entered as "out", no results are returned.
if the search pattern is entered as "outb", many "Outback Steakhouse" records are returned.
doing a little debugging, I came up with this:
select ts_rank(to_tsvector('Outback Steakhouse'),to_tsquery('ou:*')) as "ou",
ts_rank(to_tsvector('Outback Steakhouse'),to_tsquery('out:*')) as "out",
ts_rank(to_tsvector('Outback Steakhouse'),to_tsquery('outb:*')) as "outb"
which results in this:
    ou     | out |   outb
-----------+-----+-----------
 0.0607927 |   0 | 0.0607927
What am I doing wrong?
Is this a limitation of pg full text search?
Is there something that I can do with my dictionary or configuration to get around this anomaly?
UPDATE:
I think that "out" may be a stop word.
When I run this debug query, I don't get any lexemes for "out":
SELECT * FROM ts_debug('english','out back outback');
   alias   |   description   |  token  |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+---------+----------------+--------------+-----------
 asciiword | Word, all ASCII | out     | {english_stem} | english_stem | {}
 blank     | Space symbols   |         | {}             |              |
 asciiword | Word, all ASCII | back    | {english_stem} | english_stem | {back}
 blank     | Space symbols   |         | {}             |              |
 asciiword | Word, all ASCII | outback | {english_stem} | english_stem | {outback}
So now I ask how do I modify the stop word list to remove a word?
UPDATE:
here is the query that I am currently using:
select id,name,address,city,state,likes
from view_business_favorite_count
where textsearchable_index_col @@ to_tsquery('simple',$1)
ORDER BY ts_rank(textsearchable_index_col, to_tsquery('simple',$1)) DESC
When I execute the query (I'm using Strongloop Loopback + Express + Node), I pass the pattern in to replace the $1 param. The pattern (as stated above) will look something like "keyword:*" or "keyword1 & keyword2 & ... & keywordN:*".
thanks
The problem here is that you are searching against business names, and as @Daniel correctly pointed out, the 'english' dictionary will not help you find a "fuzzy" match for NON-dictionary words like "Outback Steakhouse" etc.;
'simple' dictionary
The 'simple' dictionary on its own will not help you either; in your case business names will work only for exact matches, as all words are unstemmed.
'simple' dictionary + pg_trgm
But if you use the 'simple' dictionary together with the pg_trgm module, it will be exactly what you need. In particular:
for to_tsvector('simple','<business name>') you don't need to worry about the stop words "hack"; you will get all the lexemes unstemmed;
using similarity() from pg_trgm you will get the highest "rank" for the best match.
Look at this:
WITH pg_trgm_test(business_name,search_pattern) AS ( VALUES
('Outback Steakhouse','ou'),
('Outback Steakhouse','out'),
('Outback Steakhouse','outb')
)
SELECT business_name,search_pattern,similarity(business_name,search_pattern)
FROM pg_trgm_test;
result:
business_name | search_pattern | similarity
--------------------+----------------+------------
Outback Steakhouse | ou | 0.1
Outback Steakhouse | out | 0.15
Outback Steakhouse | outb | 0.2
(3 rows)
Ordering by similarity DESC, you will be able to get what you need.
UPDATE
For your situation there are 2 possible options.
Option #1.
Just create a trgm index on the name column in the view_business_favorite_count table; the index definition may be the following:
CREATE INDEX name_trgm_idx ON view_business_favorite_count USING gin (name gin_trgm_ops);
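Note that pg_trgm is an extension, so it must be enabled once per database before this index can be created:
CREATE EXTENSION IF NOT EXISTS pg_trgm;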
The query will look something like this:
SELECT
id,
name,
address,
city,
state,
likes,
similarity(name,$1) AS trgm_rank -- similarity score
FROM
view_business_favorite_count
WHERE
name % $1 -- trgm search
ORDER BY trgm_rank DESC;
Option #2.
With full text search, you need to:
create a separate table, for example unnested_business_names, with 2 columns: the 1st column will keep all lexemes from the to_tsvector('simple',name) function, the 2nd will hold vbfc_id (an FK for id from the view_business_favorite_count table);
add a trgm index on the column which contains the lexemes;
add a trigger which will insert OR update OR delete rows in unnested_business_names whenever view_business_favorite_count changes, to keep all words up to date; see the sketch below.
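A minimal sketch of steps 1 and 2 (names are illustrative; unnest(tsvector) requires PostgreSQL 9.6+, and the trigger from step 3 is omitted):
CREATE TABLE unnested_business_names (
    lexeme  text,
    vbfc_id integer  -- id from view_business_favorite_count
);
-- one row per unstemmed lexeme of each business name
INSERT INTO unnested_business_names (lexeme, vbfc_id)
SELECT l.lexeme, v.id
FROM view_business_favorite_count AS v,
     unnest(to_tsvector('simple', v.name)) AS l;
CREATE INDEX lexeme_trgm_idx ON unnested_business_names USING gin (lexeme gin_trgm_ops);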

How to filter out data with special characters in vectorwise

There is a column named Prod_code in my product table.
I need to select only valid prod_code values from the product table and load them into another table.
Valid prod_codes are the codes which don't have any special characters in them.
VALID prod_code: WER1234, ASD1345
INVALID prod_code: ABC$123, LPS????, $$$ (which I need to check and filter out).
How can I achieve this?
Src list of prod code
WER1234
ASD1345
ABC$123
LPS????
$$$
target list of prod code
WER1234
ASD1345
This is pretty hideous, but it does the job. It uses the LIKE predicate to reject unacceptable strings. You can probably also use the SIMILAR TO predicate to specify acceptable strings, but I would guess that LIKE is faster.
select prod_code from products
-- the hex ranges 00-2F, 3A-40 and 5D-7F cover every ASCII character
-- except digits, upper-case letters, '[' and '\'
where prod_code not like '%\['+x'00'+'-'+x'2f'+','+
                          x'3a'+'-'+x'40'+','+
                          x'5d'+'-'+x'7f'+'\]%' escape '\'
  and prod_code not like '%\%'
  and prod_code not like '%[%'
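For comparison, the SIMILAR TO approach mentioned above would whitelist acceptable strings instead of rejecting bad ones; a sketch, assuming valid codes consist only of upper-case letters and digits:
select prod_code from products
where prod_code similar to '[A-Z0-9]*'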

Filtering with P6SPY

Is there a way to set the filter in p6spy, such that it only logs "insert/delete/update" and NOT "select" SQL statements?
Documentation of p6spy mentions:
"P6Spy allows you to monitor specific tables or specific statement types"
An example they gave was the following:
An example showing capture of all select statements, except the orders table follows:
filter = true
# comma separated list of tables to include
include = select
# comma separated list of tables to exclude
exclude = orders
So I thought there must be a way to include inserts, deletes, and updates and exclude selects... hence, I prepared my properties file like so:
filter = true
# comma separated list of tables to include
include = insert,update,delete
# comma separated list of tables to exclude
exclude = select
but that does not seem to work. Does anyone have any suggestions?
The key to the answer is in the comments
# comma separated list of tables to include
include = select
Here, select is the name of a table, not the type of a statement.
It seems impossible to filter by statement type (at least by select/update/delete) easily. You can, however, do it by using:
# sql expression to evaluate if using regex filtering
sqlexpression=
#allows you to use a regex engine or your own matching engine to determine
#which statements to log
stringmatcher=com.p6spy.engine.common.GnuRegexMatcher
The definition has changed. In short, you can use what the OP used, i.e.:
filter=true
# comma separated list of *******STRINGS******* to include
include=insert,update,delete
# comma separated list of *******STRINGS******* to exclude
exclude=select
Or if you want to have more control, you can use sqlexpression like so:
filter=true
sqlexpression=^(.*(from\\scustomers).*)$
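As an untested sketch, assuming the string matcher applies Java-style regular expressions (which support the (?i) flag and negative lookahead), excluding selects could also be expressed as a single pattern:
filter=true
sqlexpression=^(?i)(?!\s*select).*$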