How to filter a field using a regex containing characters and digits in AWS Cloudwatch? - aws-cloudwatch-log-insights

Assuming the log stream contains a message like hello something 1234
The following CloudWatch Logs Insights query doesn't return any results:
fields @timestamp, @message
| filter @message like /something 1234/
| sort @timestamp desc
| limit 100
However, using the terms separately returns results:
fields @timestamp, @message
| filter @message like /1234/
| sort @timestamp desc
| limit 100
Or
fields @timestamp, @message
| filter @message like /something/
| sort @timestamp desc
| limit 100
I'm unable to understand why the regex isn't working as expected.

So, this is not an issue with the Insights query itself, but with how the CloudWatch console displays log messages. Since I took the displayed message as the source of truth, I ended up with this issue.
This is an issue with how HTML handles consecutive spaces: by default, consecutive spaces are collapsed within an HTML tag, in this case a <span>. When we expand the line, the console uses the CSS white-space property to change the behaviour so that consecutive spaces are no longer collapsed.
Reference: https://www.w3.org/TR/CSS2/text.html#white-space-prop
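In other words, the stored message presumably contained two spaces between something and 1234, which the collapsed HTML display hid. A whitespace-tolerant pattern such as /something\s+1234/ avoids depending on the displayed text; here is a minimal Python sketch of the idea (the exact message text is an assumption):

```python
import re

# Assumed: the stored log line has two spaces between the words,
# while the console displayed it with one.
stored_message = "hello something  1234"
displayed_message = "hello something 1234"

# The literal pattern copied from the display misses the stored text.
assert re.search(r"something 1234", stored_message) is None

# \s+ tolerates any run of whitespace, so it matches either form.
assert re.search(r"something\s+1234", stored_message) is not None
assert re.search(r"something\s+1234", displayed_message) is not None
```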

Related

Grafana, how to choose which column for axis in Timechart?

I am using Grafana to perform a Log to Metric query using an Azure Data Explorer datasource, which gives me a result like this as a table:
This comes from this query:
Log
| where $__timeFilter(TIMESTAMP)
| where eventId == 666
| summarize count() by bin(TIMESTAMP, 15m), Region
| order by TIMESTAMP asc
When rendered as a timechart in AppInsights, it renders perfectly, like this:
However, in Grafana this perplexingly renders by the Count_ column, not using the obvious regional breakout field:
My goal is to get an AppInsights-like timechart with multiple data series within Grafana.
I found my answer! It turns out I was rendering the data as a Table, using the Grafana query wizard here.
Once I changed that to TimeSeries, it all just worked!

CloudWatch Logs Insights get average of count returned

I'm querying CloudWatch logs via Logs Insights queries and am trying to get the average of the counts returned. For example, my current query that gets the counts is (only relevant parts shown):
fields fieldA as A
| stats count(*) as countForEachA by A
returns this:
A countForEachA
______ _____________
a123 22
a124 22
a125 16
I'm trying to get the average for field countForEachA. Therefore, for the example above, I want the average: 20 (sum of countForEachA divided by total results). Here's what I've tried:
fields fieldA as A
| stats avg(count(*)) as average by A
The query above returns this:
A average
______ _____________
a123 2.3
a124 1.4
a125 2.9
while I expected a single value representing the average. Any help will be appreciated.
I got it: to get the average for countForEachA, we first need to get the count value and then simply call the avg() function, like so:
fields fieldA as A
| stats count(*) as countForEachA, avg(countForEachA)
Hope this helps others.
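Outside Logs Insights, the same two-step aggregation (count per key, then average the counts) can be sketched in Python; the record values are illustrative, shaped like the example table above:

```python
from collections import Counter
from statistics import mean

# Illustrative records: one fieldA value per matching log line.
records = ["a123"] * 22 + ["a124"] * 22 + ["a125"] * 16

# Step 1: stats count(*) as countForEachA by A
count_for_each_a = Counter(records)

# Step 2: average over the per-key counts
average = mean(count_for_each_a.values())
print(average)  # (22 + 22 + 16) / 3 = 20
```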

How to use a substring in a google query to group results?

I'm a recovering engineer trying to remember how to do things like this in the Google Sheets query language, and I've completed just about everything I need. There is one more query to do and I'm stuck.
I know how to split up timestamps, pivot on date info, find 'contains' results, and count occurrences, but I'm stumped on this one. How do you group by a substring? I've tried LEFT, regex extract, and about anything else I can think of, but no luck.
I have a google spreadsheet with 5 pages, each with 4 columns. Each page is created from a form that a user is entering data into. I've added a date filter in B1 and B2 on my results sheet that works fine, too.
Here is an example of a query I'd like to see work.
=query({User1!A1:F; User2!A1:F; User3!A1:F; User4!A1:F; User5!A1:F}, "SELECT Col4, COUNT(Col4) Where Col1>= datetime '"&TEXT(B1,"yyyy-mm-dd HH:mm:ss")&"' AND Col1 <= datetime '"&TEXT(B2+1,"yyyy-mm-dd HH:mm:ss")&"' AND (Col4 is not null AND Col4 = 'Submits') group by **Left(Col2,4) AND** Col4 pivot month(Col1)+1, Day(Col1), Year(Col1)",1)
That bold bit seems to be the problem area. The rest of this works.
Here are the contents of the fields I'm working within each of the pages:
Col 1 - Timestamp
Col 2 - Opportunity # string; I want to use the left 4 chars as a group by - strings look like O347-183XXXX, so I want to use just O347 in this case
Col 3 - Opportunity Name (not needed in result)
Col 4 - Strings that I want to count the occurrences of (7 different strings total), so I want the result to be a count of the occurrences of 'submits', for example
I want to output a table with the Col2 substring as the group-by in the left column, Col4 count values for each substring, and a pivot by date.
It ought to look like this when I get the results
I've seen some things here and other places that lead me partway there but I just hope there is an easy way to do this.
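For what it's worth, the QUERY language's scalar functions don't include a LEFT/substring function, which is likely why the bold clause fails; the usual workaround is to compute the prefix outside the query (for example, a helper column built with ARRAYFORMULA) and then group on that column by its id. The grouping itself is just "count by 4-character prefix"; a Python sketch with made-up opportunity numbers:

```python
from collections import Counter

# Hypothetical opportunity numbers, shaped like the Col 2 values
opportunities = ["O347-1830001", "O347-1830002", "O512-1830003"]

# Group by the left 4 characters, as LEFT(Col2, 4) would
prefix_counts = Counter(opp[:4] for opp in opportunities)
print(prefix_counts)  # Counter({'O347': 2, 'O512': 1})
```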

Is there a way to use order by to order similar fields together?

Is there a way to make similar values equivalent in an order by?
Say the data is:
name | number
John. | 9
John | 1
John. | 2
Smith | 4
John | 3
I'd want to order by name and then number so that the output looks like this, but ORDER BY name, number will put all John entries ahead of John. entries.
name | number
John | 1
John. | 2
John | 3
John. | 9
Smith | 4
There's Fuzzy String Matching and, beyond that, Pattern Matching.
You need some more advanced treatment of the name field, then.
This topic will help you strip non-alphabetic characters from your string before ordering:
How to strip all non-alphabetic characters from string in SQL Server?
But the fact that you need such a complex function makes me question the design of your database: if "John" and "John." are the same person, they should have the same name. So if the "." is important, that means you need another field to store the information it represents.
Use a regex replace function to strip out all special characters in your data, replacing them with a space, then wrap that in a TRIM function to remove the spaces:
TRIM(CASE
       WHEN name LIKE '%.%'
         OR name LIKE '%\_%'  -- underscore is a LIKE wildcard, so escape it
         OR name ~ '\d'       -- ~ is a regex match; no % wildcards needed
       THEN REGEXP_REPLACE(name, '[_.\d]', ' ', 'g')  -- 'g' replaces every match
       ELSE name              -- keep names that need no cleaning
     END) AS name_processed
The character class in brackets means: replace an underscore, a period, or a digit with whatever is after the comma, which here is a space.
Now you can order by name_processed and number as well:
ORDER BY name_processed, number
But you can always keep the original name in a SELECT afterwards if you write a subquery first through WITH. Basically the syntax would be:
WITH processed_names AS (
  SELECT
    name,
    TRIM(CASE
           WHEN name LIKE '%.%'
             OR name LIKE '%\_%'
             OR name ~ '\d'
           THEN REGEXP_REPLACE(name, '[_.\d]', ' ', 'g')
           ELSE name
         END) AS name_processed,
    number
  FROM names
)
SELECT name, number
FROM processed_names
ORDER BY name_processed, number;
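The normalize-then-sort idea can be checked outside the database too; a small Python sketch using the sample data from the question (the regex mirrors the REGEXP_REPLACE pattern above):

```python
import re

# Rows from the question: (name, number)
rows = [("John.", 9), ("John", 1), ("John.", 2), ("Smith", 4), ("John", 3)]

def normalize(name: str) -> str:
    # Strip underscores, periods and digits, then trim leftover spaces.
    return re.sub(r"[_.\d]", " ", name).strip()

# ORDER BY name_processed, number
ordered = sorted(rows, key=lambda row: (normalize(row[0]), row[1]))
print(ordered)
# [('John', 1), ('John.', 2), ('John', 3), ('John.', 9), ('Smith', 4)]
```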

merging rows in postgres with matching fields

I have a table that appears in this type of format:
email | interest | major | employed |inserttime
jake@example.com | soccer | | true | 12:00
jake@example.com | | CS | true | 12:01
Essentially, this is a survey application and users sometimes hit the back button to add new fields. I later changed the INSERT logic to an UPSERT so it just updates the row where email = currentUsersEmail; however, for the data inserted prior to this code change there are many duplicate entries for single users. I have tried some GROUP BYs with no luck, as it continually says:
ID column must appear in the GROUP BY clause or be used in an
aggregate function.
Certainly there will be edge cases where there may be clashing data; for example, a user may have entered true for the employed column and then false the second time. For now I am not going to take this into account.
I simply want to merge or flatten these values into a single row, in this case it would look like:
email | interest | major | employed |inserttime
jake@example.com | soccer | CS | true | 12:01
I am guessing I would take the most recent inserttime. I have been writing the web application in Scala/Play, but for this task I think using a language like Python might be easier if I cannot do it directly through psql.
You can GROUP BY and flatten using MAX():
SELECT email, MAX(interest) AS interest,
MAX(major) AS major,MAX(employed) AS employed,
MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email
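As a quick sanity check of the MAX() flattening (SQLite's MAX() skips NULLs the same way PostgreSQL's does; table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE survey
    (email TEXT, interest TEXT, major TEXT, employed TEXT, inserttime TEXT)""")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?, ?, ?)",
    [("jake@example.com", "soccer", None, "true", "12:00"),
     ("jake@example.com", None, "CS", "true", "12:01")])

# MAX() ignores NULLs, so each column keeps its non-null value,
# and the latest inserttime wins.
row = conn.execute("""
    SELECT email, MAX(interest), MAX(major), MAX(employed), MAX(inserttime)
    FROM survey
    GROUP BY email""").fetchone()
print(row)  # ('jake@example.com', 'soccer', 'CS', 'true', '12:01')
```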