Is it possible to convert string values to numbers in LogQL? - grafana

I am following the documentation, and thanks to | line_format and regexReplaceAll I was able to extract a substring from a line.
Let's say I now have these columns:
With that, I want to perform an aggregation, e.g. a sum, or an aggregation grouped by some label that takes the total.
It is not working, and I suspect those values are not numbers but only strings.
Is it possible to convert them to numbers?
I tried using unwrap, but it didn't work:
|="text expression"
| json
| line_format `{{ regexReplaceAll "text expression to remove from (\\d+)" .label_id "${1}" | trim }}`
| unwrap label_id [1m]
It ends up with:
pipeline error: 'SampleExtractionErr' for series:
When I filter out errors, there are no results.
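For reference, the pipeline has to do three things: pull out the (\d+) capture group, turn it into a number, and aggregate it. In Loki, unwrap reads a label's value, while line_format only rewrites the log line, so the label_id that unwrap sees is presumably still the original text, which would be consistent with the SampleExtractionErr; label_format, which can rewrite a label using the same template syntax, is likely the stage to try instead. The plain-Python sketch below is not LogQL; it only illustrates the extract, convert, and sum steps, and the pattern, group keys, and sample rows are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical pattern mirroring the question's regexReplaceAll; the literal
# prefix is the question's placeholder, and (\d+) captures the number.
PATTERN = re.compile(r"text expression to remove from (\d+)")


def to_number(label_value: str) -> Optional[float]:
    """Keep only the captured digits and convert them; None if there is no match."""
    m = PATTERN.search(label_value.strip())
    return float(m.group(1)) if m else None


def sum_by(rows: Iterable[Tuple[str, str]]) -> dict:
    """Sum the numeric part of each raw value, grouped by key."""
    totals = defaultdict(float)
    for key, raw in rows:
        n = to_number(raw)
        if n is None:
            # Comparable to dropping lines that would produce a conversion error.
            continue
        totals[key] += n
    return dict(totals)


rows = [
    ("a", "text expression to remove from 3"),
    ("a", "text expression to remove from 4"),
    ("b", "no number here"),
]
print(sum_by(rows))  # {'a': 7.0}
```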


Change prefix in an integer column in pyspark

I want to convert the prefix from 222.. to 999.. in PySpark.
Expected: a new column new_id with the prefix changed to 999..
I will be using this column for an inner merge between 2 PySpark dataframes.
You can achieve it with something like this:
from pyspark.sql import functions as F
# First count the leading "2"s up to the first other character, e.g. '2223' should give 3 as the length
# Use that calculated value to repeat "9" that many times
# Replace the leading "2"s with the calculated "9" string
# Finally drop the helper columns
df.withColumn("len_2", F.length(F.regexp_extract(F.col("value"), r"^2*(?!2)", 0)).cast('int'))\
.withColumn("to_replace_with", F.expr("repeat('9', len_2)"))\
.withColumn("new_value", F.expr("regexp_replace(value, '^2*(?!2)', to_replace_with)"))\
.drop("len_2", "to_replace_with")\
.show(truncate=False)
|value |new_value |
|222222579844 |999999579844 |
|222225701296 |999995701296 |
|22222955099 |99999955099 |
|22222955099 |99999955099 |
|22222955099 |99999955099 |
|222285678 |999985678 |
I have used value as the column name; you would have to substitute it with id.
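The same prefix rule (count the leading 2s, emit that many 9s, keep the rest) can be checked in plain Python against the values in the table above. swap_prefix is a hypothetical helper, not part of PySpark; inside Spark, the built-in column functions are the way to do this.

```python
import re

# One or more "2"s anchored at the start of the string.
LEADING_TWOS = re.compile(r"^2+")


def swap_prefix(id_str: str) -> str:
    """Replace the run of leading '2's with the same number of '9's."""
    return LEADING_TWOS.sub(lambda m: "9" * len(m.group(0)), id_str)


for value in ["222222579844", "222225701296", "22222955099", "222285678"]:
    print(value, swap_prefix(value))
```

An id without leading 2s is returned unchanged, which matches the regexp_replace behaviour in the answer above.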
You can try the following:
from pyspark.sql import functions as F
df = (df.withColumn("tempcol1", F.regexp_extract("id", "^2*", 0))  # leading run of 2s
        .withColumn("tempcol2", F.split(F.regexp_replace("id", "^2*", "_"), "_")[1])  # rest of the id
        .withColumn("new_id", F.concat(F.regexp_replace("tempcol1", "2", "9"), "tempcol2"))
        .drop("tempcol1", "tempcol2"))
The id column is split into two temp columns, one having the prefix and the other the rest of the string. The prefix column values are replaced and concatenated back with the second temp column.

Extract dictionary from a string

I have a string that represents a query. It begins with a function name, and the argument is a dictionary.
"runQuery `syms`columns`fastQuery`exchange!((`AAPL`MSFT`GOOG`AMD);(`sym`price`date);1b;`nasdaq)"
How can I extract the dictionary from the string, and save it in kdb as a dictionary type?
Parse the string to get the parse tree, then take its last element (the dictionary expression) and evaluate it:
q)eval last parse "runQuery `syms`columns`fastQuery`exchange!((`AAPL`MSFT`GOOG`AMD);(`sym`price`date);1b;`nasdaq)"
syms     | `AAPL`MSFT`GOOG`AMD
columns  | `sym`price`date
fastQuery| 1b
exchange | `nasdaq
For this example, value can also have the desired effect once everything up to the first space (the function name) is dropped:
q)myDict:value {(first where x=" ")_x}"runQuery `syms`columns`fastQuery`exchange!((`AAPL`MSFT`GOOG`AMD);(`sym`price`date);1b;`nasdaq)"
q)myDict
syms     | `AAPL`MSFT`GOOG`AMD
columns  | `sym`price`date
fastQuery| 1b
exchange | `nasdaq
If you have free rein to (re)define the function, then you could just do:
q)runQuery:{x}  / for example, a runQuery that returns its argument unchanged
q)value"runQuery `syms`columns`fastQuery`exchange!((`AAPL`MSFT`GOOG`AMD);(`sym`price`date);1b;`nasdaq)"
syms     | `AAPL`MSFT`GOOG`AMD
columns  | `sym`price`date
fastQuery| 1b
exchange | `nasdaq
This could be quite useful if you're replaying a tickerplant-style log using -11!
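To make the structure explicit outside q: the string is a function name, a space, and then a keys!values expression, which is why dropping everything up to the first space and calling value works. The Python sketch below mimics those two steps for this one string. parse_q_dict and its helpers are hypothetical and only understand symbols, symbol lists, and booleans; they are nothing like a general q parser.

```python
QUERY = "runQuery `syms`columns`fastQuery`exchange!((`AAPL`MSFT`GOOG`AMD);(`sym`price`date);1b;`nasdaq)"


def split_top_level(s: str, sep: str = ";") -> list:
    """Split s on sep, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def parse_q_atom(tok: str):
    """Convert a tiny subset of q literals: 1b/0b, `sym atoms and `a`b symbol lists."""
    tok = tok.strip()
    if tok.startswith("(") and tok.endswith(")"):
        tok = tok[1:-1].strip()
    if tok in ("1b", "0b"):
        return tok == "1b"
    if tok.startswith("`"):
        syms = tok[1:].split("`")
        return syms[0] if len(syms) == 1 else syms
    raise ValueError(f"unsupported q literal: {tok!r}")


def parse_q_dict(query: str) -> dict:
    # Like {(first where x=" ")_x}: drop the function name before the first space.
    expr = query[query.index(" "):].strip()
    keys_part, values_part = expr.split("!", 1)
    keys = keys_part.strip().lstrip("`").split("`")
    values_part = values_part.strip()
    # The value list is wrapped in one outer pair of parentheses.
    values = [parse_q_atom(v) for v in split_top_level(values_part[1:-1])]
    return dict(zip(keys, values))


print(parse_q_dict(QUERY))
```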

Add 2 fields in LogQL to use within an aggregate function

I have log lines that contain a few timestamp fields. Here is an example of a log line I am filtering in order to process it:
{
  "time": "2022-06-22T10:33:08.710037238Z",
  "#version": "1",
  "message": "Duration at response processing ends",
  "logger_name": "",
  "thread_name": "reactor-http-epoll-1",
  "level": "INFO",
  "level_value": 20000,
  "rqst_id_ts": "b65c37d9284584e71b1dcd84b6a74075",
  "rqst_end_ts": "1655893988698",
  "rqst_start_ts": "1655893988698",
  "rsp_start_ts": "1655893988709",
  "rsp_end_ts": "1655893988709"
}
What I would like to do is calculate a value that would represent the duration between 2 timestamps in the log line so that I can then put the obtained range into quantile_over_time() or any other aggregate function.
For instance, the following works:
quantile_over_time(0.99, {app="myapp"} |~ "rsp_end_ts"
| json
| __error__ = ""
| unwrap rsp_end_ts | __error__="" [5m]) by (tsNs)
However this is not what I want to do since calculating the quantile of epoch timestamps makes no sense. What I want to calculate is the p99 of (rsp_end_ts - rqst_start_ts).
I tried the following which of course doesn't work, but gives an idea as to what I am attempting to do:
quantile_over_time(0.99, {app="myapp"} |~ "rsp_end_ts"
| json
| __error__ = ""
| unwrap (rsp_end_ts - rqst_start_ts) | __error__="" [5m]) by (tsNs)
If there were some way to create a new label like rqst_duration=(rsp_end_ts - rqst_start_ts), then the following would be what I am looking for:
quantile_over_time(0.99, {app="myapp"} |~ "rsp_end_ts"
| json
| __error__ = ""
| unwrap rqst_duration | __error__="" [5m]) by (tsNs)
I couldn't find any documentation about this, which is very surprising; I would have thought (though it seems I might be wrong) that this is a common use case. Any help would be greatly appreciated :).
You can use template functions for that. Here's a sample query on the Grafana playground that you can take inspiration from.
So your query will look something like:
{app="myapp"} |~ "rsp_end_ts"
| json
| label_format result=`{{sub .rsp_end_ts .rqst_start_ts}}` | line_format "{{ .result }}"
You can then use the new result label, for example with | unwrap result inside quantile_over_time(), as in the query from the question.
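To sanity-check what the p99 of rsp_end_ts - rqst_start_ts should come out to, the same computation can be reproduced offline. The timestamps appear to be epoch milliseconds (1655893988709 corresponds to 10:33:08.709 UTC, matching the "time" field). This is a hedged Python sketch, not LogQL; the linear-interpolation quantile is an assumption and may not match Loki's quantile_over_time exactly.

```python
import json
import math

# Sample log line from the question, trimmed to the two fields used here.
SAMPLE = '{"rqst_start_ts": "1655893988698", "rsp_end_ts": "1655893988709"}'


def duration_ms(line: str) -> int:
    """rsp_end_ts - rqst_start_ts for one JSON log line; both are epoch-millisecond strings."""
    rec = json.loads(line)
    return int(rec["rsp_end_ts"]) - int(rec["rqst_start_ts"])


def quantile(q: float, values: list) -> float:
    """Quantile with linear interpolation between the two closest ranks."""
    s = sorted(values)
    if not s:
        raise ValueError("no samples")
    rank = q * (len(s) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


print(duration_ms(SAMPLE))  # 11
```

Collecting duration_ms over all lines in a 5-minute window and passing the list to quantile(0.99, ...) gives the figure the query is meant to produce.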

How to parse month-year string using Presto

I have a column that contains a Month-Year string that I would like to convert to an actual date representing the first day of the Month and Year combination. For example:
| Original | Desired |
| Aug-19 | 08/01/2019 |
| Sep-20 | 09/01/2020 |
| May-22 | 05/01/2022 |
I have tried breaking apart the Month-Year string using split_part, but when I try to pass the month as a parameter into date_parse it throws an error for that input (INVALID_FUNCTION_ARGUMENT). I could break the Month-Year apart into strings and then recombine them, hard-coding the 01; however, the problem seems to be that a three-letter month cannot be parsed into an actual month by Presto. I would also like to avoid a 12-line CASE WHEN statement to parse the month if possible.
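The two digits after the month are the year, so parse them with %y (two-digit year) rather than %d (day of month); the day is not in the string, so it defaults to the 1st. The query will be like this:
select date_format(date_parse('May-22', '%b-%y'), '%m/%d/%Y')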
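The format string can be sanity-checked outside Presto because Python's datetime.strptime uses the same %b (abbreviated month name) and %y (two-digit year) directives, and it also fills in day 1 when no day is given. This is only a hedged cross-check in Python, not Presto itself; month_year_to_first_day is a hypothetical helper, and %b assumes the default C/English locale.

```python
from datetime import datetime


def month_year_to_first_day(s: str) -> str:
    """'Aug-19' -> '08/01/2019': %b = abbreviated month, %y = two-digit year, day defaults to 1."""
    return datetime.strptime(s, "%b-%y").strftime("%m/%d/%Y")


for s in ["Aug-19", "Sep-20", "May-22"]:
    print(s, month_year_to_first_day(s))
```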

Decimal: Integral number too large in Redshift COPY

I'm getting the following error from Redshift.
Decimal: Integral number too large
This happens when inserting the following CSV line:
The error is thrown by the value 1.4.
The definition of that column is this:
schemaName | tablename | column | type | encoding | distkey | sortkey | notnull
public | partners | revenue_partner | numeric(7,7) | none | false | 0 | false
This copy worked fine when the type was numeric(7,2), but I need to change it to fix a rounding error.
numeric(7,7) means the total number of digits allowed is 7 and all 7 are allocated to the fractional part. If you want 7 digits before the decimal point and 7 after it, you need numeric(14,7).
Reading the docs, it looks like a numeric(7,7) data type can only store values whose magnitude is less than 1, with 7 digits after the decimal point. The second number (the scale) is the number of digits after the decimal point, and the first number (the precision) minus the second is the number of digits allowed before it.
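The precision/scale rule can be checked with Python's decimal module: a value fits numeric(p, s) when, after rounding to s fractional digits, its integer part has at most p - s digits. fits_numeric is a hypothetical helper, and it assumes extra fractional digits are rounded half-up, which may not match Redshift's exact rounding behaviour.

```python
from decimal import ROUND_DOWN, ROUND_HALF_UP, Decimal


def fits_numeric(value: str, precision: int, scale: int) -> bool:
    """Rough check: does value fit a NUMERIC(precision, scale) column?"""
    # Assumption: extra fractional digits are rounded half-up to `scale` places.
    q = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    integral = int(abs(q).to_integral_value(rounding=ROUND_DOWN))
    integral_digits = len(str(integral)) if integral else 0
    # Only precision - scale digits are left for the part before the decimal point.
    return integral_digits <= precision - scale


print(fits_numeric("1.4", 7, 7))   # no room before the decimal point
print(fits_numeric("1.4", 14, 7))  # 7 digits before, 7 after
```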