How to debug large Polars pl.select - narrow down offending row - python-polars

When I encounter a panic in a select query, how do I narrow down which expression(s) are the offending ones?
Edit:
Ideally just by looking at the logs or by setting a parameter, without having to rewrite the code into a sequential or binary-search loop.
pl.select([
expr1, expr2, expr3 ....
])

Where you have
pl.select([
expr1, expr2, expr3 ....
])
isolate the list of expressions:
exprlist = [expr1, expr2, expr3, ...]
Then do something like:
for i, expr in enumerate(exprlist):
    try:
        pl.select(expr)
    except Exception as e:
        print(f"expression {i} failed: {e}")
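The loop above can also be wrapped in a small reusable helper that collects every failing index together with its error message. A minimal sketch, using plain callables as stand-ins for Polars expressions so it runs without Polars installed (the names are made up for the demo):

```python
def find_bad_exprs(exprs, run):
    """Run each expression in isolation; return [(index, error_message)] for failures."""
    bad = []
    for i, expr in enumerate(exprs):
        try:
            run(expr)  # with Polars this would be e.g. lambda e: pl.select(e)
        except Exception as exc:
            bad.append((i, str(exc)))
    return bad

# Demo with plain callables standing in for Polars expressions:
exprs = [lambda: 1 + 1, lambda: 1 / 0, lambda: int("not a number")]
failures = find_bad_exprs(exprs, lambda f: f())
for i, msg in failures:
    print(f"expression {i} failed: {msg}")
```

Running all good expressions once up front this way costs one extra pass, but pinpoints every offender in a single run instead of stopping at the first panic.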

Related

mismatched input 'as'. Expecting: ',', <expression>

My query in PRESTO returns this error
Query failed (#20220506_153121_03035_vycq3): line 6:41: mismatched input 'as'. Expecting: ',', <expression>
I don't know why; could anybody spot the issue?
select
droh.created_by as deliverymen_email,
count(distinct o.order_number) as deliveries,
sum(case when o.last_status = 10 then 1 else 0 end) quantity_canceled,
cast(sum(quantity_canceled as decimal))/cast(count(deliveries as decimal)) as
delivery_cancellation_fee
from sensitive_raw_courier_api.deliveryman_route_order_history droh
left join raw_courier_api."order" o
on droh.order_number = o.order_number and droh.state = 'DM_PICKED_UP'
where 1=1
and o.created_date >= {{date_begin}}
and droh.created_at >= {{date_end}}
and o.customer_email = {{costumer_email}}
group by 1
order by 2 desc
There is an error in the placement of two pairs of brackets: you need to count(...) first and then cast(count(...)) the result.
cast(sum(quantity_canceled as decimal))/cast(count(deliveries as decimal)) as
should be
cast(sum(quantity_canceled) as decimal)/cast(count(deliveries) as decimal) as
Without further context and information it's not certain that this is the only issue in the query, but you can't "mix" a cast with a sum or a count the way you do. You need to do the cast first and then sum or count the values (or vice versa). As an example, this syntax in your query is incorrect:
CAST(SUM(quantity_canceled AS DECIMAL))
It should be this one instead:
SUM(CAST(quantity_canceled AS DECIMAL))
Or this one:
CAST(SUM(quantity_canceled) AS DECIMAL)
You must fix all occurrences of this mistake and then check whether the query is correct or contains further problems.
One last note: a division can always raise a division-by-zero error if you don't prevent it, so you should guard against that as well.
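The corrected cast placement and a division-by-zero guard can be tried out on any SQL engine. A minimal sketch using Python's built-in sqlite3 (table name and data are invented for the demo; NULLIF turns a zero divisor into NULL instead of raising an error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deliveries (order_number TEXT, last_status INTEGER)")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?)",
    [("A", 10), ("B", 5), ("C", 10), ("D", 5)],
)

# Aggregate first, then cast: CAST(SUM(...) AS ...), not CAST(SUM(... AS ...)).
# NULLIF(count, 0) makes the division yield NULL rather than fail on zero rows.
row = conn.execute("""
    SELECT CAST(SUM(CASE WHEN last_status = 10 THEN 1 ELSE 0 END) AS REAL)
           / NULLIF(COUNT(DISTINCT order_number), 0) AS cancellation_fee
    FROM deliveries
""").fetchone()
print(row[0])  # 2 cancelled out of 4 distinct orders -> 0.5
```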

Approaches to execute PostgreSQL's concat() instead of || in JOOQ?

The ||-operator and the concat(...)-function in PostgreSQL behave differently.
select 'ABC'||NULL||'def';
-- Result: NULL
select concat('ABC', NULL, 'def');
-- Result: 'ABCdef'
concat(...) ignores NULL values, but a NULL within a || expression makes the whole result become NULL.
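This difference is easy to reproduce outside PostgreSQL. A small sketch with Python's built-in sqlite3, where || propagates NULL the same way, and wrapping each operand in coalesce(..., '') mimics concat()'s NULL-ignoring behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ||: a single NULL operand makes the whole result NULL.
null_result = conn.execute("SELECT 'ABC' || NULL || 'def'").fetchone()[0]
print(null_result)  # None

# Coalescing each operand to '' gives concat()-like NULL handling.
concat_like = conn.execute(
    "SELECT coalesce('ABC','') || coalesce(NULL,'') || coalesce('def','')"
).fetchone()[0]
print(concat_like)  # ABCdef
```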
In JOOQ, the DSL.concat() in the PostgreSQL dialect renders expressions using the ||-operator:
Java: dsl.select(
DSL.concat(
DSL.inline("ABC"),
DSL.inline(null, SQLDataType.VARCHAR),
DSL.inline("def"))
).execute();
SQL: select ('ABC' || null || 'def')
Result: NULL
I am looking for (elegant?) ways to invoke the concat(...)-function instead of the ||-operator via JOOQ in PostgreSQL:
Java: dsl.select(???).execute();
SQL: select concat('ABC', null, 'def')
Result: 'ABCdef'
I found two ways to achieve the posed objective.
Approach #1:
dsl.select(
field(
"concat({0})",
SQLDataType.VARCHAR,
list(
inline("ABC"),
inline(null, SQLDataType.VARCHAR),
inline("def")
)
)
).execute();
This has the intended behavior, but necessitates the (in my eyes ugly) template string "concat({0})". A more elegant approach from my point of view is:
Approach #2:
dsl.select(
function(
"concat",
SQLDataType.VARCHAR,
inline("ABC"),
inline(null, SQLDataType.VARCHAR),
inline("def")
)
).execute();
This solution does not involve inline SQL with placeholders the way approach #1 does. Why jOOQ generates || instead of concat(...) in the first place still remains to be explained, though.

Ecto fragment without parentheses

My Postgres table structure:
id | stuff
--------+------------------------------------------------------------
123 | {"type1": {"ref": "ref_1", "...": "..."}, "type2": {"ref": "ref_1", "...": "..."}}
I'd like to query by ref in each type of stuff, I have a working SQL query for this:
SELECT * FROM "stuff" AS c0 CROSS JOIN jsonb_each(c0."stuff") AS f1 WHERE value->>'ref' = 'ref_1';
But using this Ecto query:
(from c in Stuff,
join: fragment("jsonb_each(?)", c.stuff),
where: fragment("value->>'ref' = ?", ^ref)
)
|> Repo.all
I get a Postgres syntax error in the CROSS JOIN statement:
** (Postgrex.Error) ERROR 42601 (syntax_error): syntax error at or near ")"
Inspecting the generated query:
[debug] QUERY ERROR source="stuff" db=0.3ms
SELECT ... FROM "stuff" AS c0 CROSS JOIN (jsonb_each(c0."stuff")) AS f1 WHERE (value->>'ref' = $1) ["ref_1"]
The above works when I remove the outer parentheses around (jsonb_each(c0."stuff")).
Is there a way to have the fragment generate the query without these parentheses or do I have to redesign the query?
Thanks
It seems that Ecto always wraps the join clause in parentheses, which is usually fine. The times when it's not unfortunately include certain calls like the jsonb_each above. There is a wiki here for such cases: The parentheses rules of PostgreSQL, is there a summarized guide?
That question has a much less upvoted answer with a raw SQL example that seems to work well, both for making this query and for getting back the expected struct.
sql = ~s(SELECT * FROM "stuff" AS c0 CROSS JOIN jsonb_each(c0."stuff") AS f1 WHERE value->>'ref' = 'ref_1';)
result = JsonbTest.Repo.query!(sql)
Enum.map(result.rows, &JsonbTest.Repo.load(StuffStruct, {result.columns, &1}))
This was a bug in Ecto; it has since been fixed: https://github.com/elixir-ecto/ecto/issues/2537

Ignore special characters before match conditions

How can I write the equivalent of the following in Mongo? I need to ignore some characters (spaces, hyphens) in a particular column before the conditions are checked. For the sake of a MySQL example, I am just removing spaces:
select * from TABLE
where REPLACE('name', ' ', '') = 'TEST'
So if the name column contains " T E S T", that should match.
You can try with $where operator in your query:
{$where: "this.name.replace(/[ -]/g,'') == 'TEST'"}
or:
{$where: "this.name.match(/T[ -]*E[ -]*S[ -]*T/)"}
or directly a $regex:
{name: /T[ -]*E[ -]*S[ -]*T/}
More info about the $where and $regex operators.
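Both approaches boil down to the same idea; a quick sketch of the replace-then-compare and the separator-tolerant regex, in plain Python's re module:

```python
import re

name = " T E S T"

# Approach 1: strip spaces and hyphens, then compare (mirrors the $where replace).
stripped = re.sub(r"[ -]", "", name)
print(stripped == "TEST")  # True

# Approach 2: match directly with separators allowed between letters (mirrors $regex).
pattern = re.compile(r"T[ -]*E[ -]*S[ -]*T")
print(bool(pattern.search(name)))  # True
```

Note that the regex approach can use an index-backed anchored pattern in MongoDB, while $where with a JavaScript replace forces a full collection scan.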

PostgreSQL change part of a string to uppercase

I have a field named rspec in a table trace.
So for now the field is like "Vol3/data/20070204_191426_FXBS.v3a".
All I need is a query to change it to the format "Vol3/data/20070204_191426_FXBS.V3A".
Assuming the current version:
select left(rspec, -3)||upper(right(rspec, 3))
from trace
For older versions:
select substr(rspec, 1, length(rspec) - 3)||upper(substring(rspec from '...$'))
from trace
Or, to cover all possibilities like
file extensions of variable length: abc123.jpeg
no file extension at all: abc123
dot as last character: abc123.
multiple dots: abc.123.jpg
SELECT CASE WHEN rspec ~~ '%.%'
THEN substring(rspec, E'^.*\\.')
|| upper(substring(rspec , E'([^.]*)$'))
ELSE rspec
END AS rspec
FROM (VALUES
('abc123.jpeg')
, ('abc123')
, ('abc123.')
, ('abc.123.jpg')
) AS x(rspec); -- test cases
Explanation:
If the string has no dot, use the string.
Else, take everything up to and including the last dot in the string.
Append everything after the last dot in upper case.
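The CASE logic above translates directly into a few lines of Python, which makes the edge cases easy to check. A sketch using str.rpartition (the function name is mine):

```python
def upper_extension(rspec: str) -> str:
    """Uppercase everything after the last dot; leave dotless strings unchanged."""
    head, dot, ext = rspec.rpartition(".")
    if not dot:  # no dot at all: return the string as-is
        return rspec
    return head + dot + ext.upper()

for s in ["Vol3/data/20070204_191426_FXBS.v3a", "abc123.jpeg",
          "abc123", "abc123.", "abc.123.jpg"]:
    print(upper_extension(s))
```

Like the SQL CASE, only the part after the last dot is uppercased, a missing extension is left alone, and a trailing dot is preserved.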