Building a dynamic WHERE filter in PySpark (Databricks) - pyspark

I'm trying to dynamically load a set of SQL Server tables from data in Databricks (the company's lakehouse) using Python / PySpark. I want the process to be as data-driven as possible, so I'm trying to build a dynamic WHERE condition to filter a dataframe with. Because each pull from the lakehouse filters on a different date column, I need to use variables both for the column to filter on and for the dates in question.
I'm trying to do something like this:
where_condition = "((" + check_column + " > '" + start_date_str + "') & (" + check_column + " < '" + end_date_str + "'))"
filtered_df = df.where(where_condition)
But I get back the following error:
AnalysisException: cannot resolve '((`l0_createTime_` > CAST('2022-11-01' AS TIMESTAMP)) & (`l0_createTime_` < CAST('2022-11-02' AS TIMESTAMP)))' due to data type mismatch: '((`l0_createTime_` > CAST('2022-11-01' AS TIMESTAMP)) & (`l0_createTime_` < CAST('2022-11-02' AS TIMESTAMP)))' requires integral type, not boolean; line 1 pos 1;
I feel like I'm missing something (obviously)... I've tried multiple ways of building the where statement, but it's not seeing it as such.
Any suggestions on how to build something dynamic like this, containing both dynamic columns from the dataframe, as well as dynamic values to compare to those dynamic columns?

The & operator, in Spark SQL, is for bitwise AND and requires integral operands, as the error says.
What you want to use here is the logical AND operator, that is AND:
where_condition = "((" + check_column + " > '" + start_date_str + "') AND (" + check_column + " < '" + end_date_str + "'))"
A cleaner way to build your dynamic condition is to use the PySpark Column API:
from pyspark.sql import functions as F
where_condition = (F.col(check_column) > start_date_str) & (F.col(check_column) < end_date_str)
filtered_df = df.where(where_condition)
In this case the & operator is used, which the PySpark Column class overloads to mean logical AND between two column expressions.
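If you prefer to stay with the string form, a small helper can keep the concatenation in one place. This is only a sketch: build_date_filter and the identifier check are hypothetical and not part of PySpark, and the backtick quoting assumes Spark SQL identifier syntax.

```python
import re
from datetime import date

# Assumption: column names are plain identifiers (letters, digits, underscores).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_date_filter(check_column: str, start: date, end: date) -> str:
    """Return a Spark SQL condition string using the logical AND keyword."""
    if not _IDENTIFIER.match(check_column):
        # Refuse anything that does not look like a simple column name.
        raise ValueError(f"unexpected column name: {check_column!r}")
    return (
        f"(`{check_column}` > '{start.isoformat()}') "
        f"AND (`{check_column}` < '{end.isoformat()}')"
    )


where_condition = build_date_filter(
    "l0_createTime_", date(2022, 11, 1), date(2022, 11, 2)
)
print(where_condition)
```

The resulting string can be passed to df.where(where_condition) in the same way as the hand-built string above.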

Related

How to modify this code in Scala by using Brackets

I have a spark dataframe in Databricks, with an ID and 200 other columns (like a pivot view of data). I would like to unpivot these data to make a tall object with half of the columns, where I'll end up with 100 rows per id. I'm using the Stack function and using specific column names.
Question is this: I'm new to Scala and similar languages, and unfamiliar with best practices on how to use brackets when literals are spread over multiple lines as below. Can I replace the double quotes and + with something else?
%scala
val unPivotDF = hiveDF.select($"id",
expr("stack(100, " +
"'cat1', cat1, " +
"'cat2', cat2, " +
"'cat3', cat3, " +
//...
"'cat99', cat99, " +
"'cat100', cat100) as (Category,Value)"))
.where("Value is not null")
You can use """ to define multiline strings like:
"""
some string
over multiple lines
"""
In your case this will only work assuming that the string you're writing tolerates new lines.
Considering how repetitive it is, you could also generate the string with something like:
(1 to 100)
.map(i => s"'cat$i', cat$i")
.mkString(",")
(To be adapted by the reader to exact needs)
Edit: and to answer your initial question: brackets won't help in any way here.
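For readers working in a Python cell rather than %scala, the same generation idea could look roughly like this. It assumes the columns really are named cat1 through cat100, as in the question; the variable names are only illustrative.

```python
# Number of (label, column) pairs to unpivot; taken from the question.
n = 100

# Build "'cat1', cat1, 'cat2', cat2, ..." instead of typing it by hand.
pairs = ", ".join(f"'cat{i}', cat{i}" for i in range(1, n + 1))

# Full expression text for the stack() call.
stack_expr = f"stack({n}, {pairs}) as (Category, Value)"

print(stack_expr[:60])
```

The string would then go into expr(...) exactly where the hand-written one was.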

In Azure Data Factory, how do I create a string expression made up of a parameter, string literal and function, in the web UI

I would like to have an expression equal to MyParameter + '_' + utcnow()
My current attempt is: @{pipeline().parameters.Col}_{utcnow()}
but it fails.
Use the concat function:
@concat(pipeline().parameters.Col,'_',utcNow())

How to format LaTeX formulas with double dollar signs (`$$`) in PostgreSQL query?

I am building a PostgreSQL query through a script that returns formatted LaTeX formulas surrounded by double dollar signs, such as the following one:
$$6 x^{14} + \frac{7 x^{13}}{5} + \frac{13 x^{8}}{7} + \frac{5 x^{5}}{6}$$
Moreover, these formulas belong to an array, so that the complete INSERT query would be something like this:
INSERT INTO table("array")
VALUES (
'{"$$6 x^{14} + \frac{7 x^{13}}{5} + \frac{13 x^{8}}{7} + \frac{5 x^{5}}{6}$$",
"$$\frac{9 x^{11}}{13} + \frac{13 x^{9}}{7} + x^{8} + \frac{x^{6}}{3}$$",
"$$2 x^{13} + \frac{52 x^{12}}{3} + \frac{65 x^{4}}{9} + \frac{3}{2}$$"}'
)
However, following the INSERT, the backslash (\) that precedes frac disappears in the database (I get frac instead of \frac). Consequently, my formulas do not render well in my application.
Here's the content of the cell:
{"$$6 x^{14} + frac{7 x^{13}}{5} + frac{13 x^{8}}{7} + frac{5 x^{5}}{6}$$",
"$$frac{9 x^{11}}{13} + frac{13 x^{9}}{7} + x^{8} + frac{x^{6}}{3}$$",
"$$2 x^{13} + frac{52 x^{12}}{3} + frac{65 x^{4}}{9} + frac{3}{2}$$"}
I use the sympy module in Python to automatically generate the formulas, so manually doubling the backslashes before each frac is not an option.
What should I do to prevent this behavior from happening?
Backslash is an escape character in strings that represent an array of strings:
SELECT ('{a,"b\"c\\d"}'::text[])[2];
text
-------
b"c\d
(1 row)
A backslash that does not precede a character with a special meaning is simply dropped, which is why \frac is stored as frac.
Double all the backslashes inside the string representation of the array to get what you want.
If such backslashes are the only ones occurring in your string constant, you could proceed as follows:
SELECT replace(
'{"$$6 x^{14} + \frac{7 x^{13}}{5} + \frac{13 x^{8}}{7} + \frac{5 x^{5}}{6}$$",
"$$\frac{9 x^{11}}{13} + \frac{13 x^{9}}{7} + x^{8} + \frac{x^{6}}{3}$$",
"$$2 x^{13} + \frac{52 x^{12}}{3} + \frac{65 x^{4}}{9} + \frac{3}{2}$$"}',
'\',
'\\'
)::text[];

DoCmd.BrowseTo acBrowseToForm MULTIPLE WHERE conditions

I am new to Access 2010 VBA but have a solid SQL background. I am trying to open/browse a form from a toggle button based on a complex filter.
The form is called: FormSuivi
In SQL, the filter would be like this:
WHERE Randomise = 'Y' AND ActualSxDate is not null
AND datediff('d', Date(),ActualSxDate) > 140 AND DCD = 0;
In this Access database, the field types are:
Randomise: text
ActualSxDate: Date
DCD: Yes/no -> integer (-1/0)
For now, all I managed to do is to implement one condition at a time:
Private Sub Toggle25_Click()
DoCmd.BrowseTo acBrowseToForm, "FormSuivi", , "Randomise = """ & "Y" & """"
End Sub
How can all the conditions listed in SQL be squeezed into a VBA command line?
The WhereCondition parameter can be a full WHERE clause without the WHERE keyword, including ANDs, parentheses, etc.
Single quotes ' help to keep the string readable (as opposed to """ constructs).
Variables need to be concatenated, e.g.
Dim S As String
S = "Randomise = '" & strRandomise & "' AND ActualSxDate is not null " & _
"AND datediff('d', Date(),ActualSxDate) > 140 AND DCD = " & bDCD
DoCmd.BrowseTo acBrowseToForm, "FormSuivi", , S

Crystal Reports formula: IsNull + Iif

There are hints of the answer to this question here and there on this site, but I'm asking a slightly different question.
Where does Crystal Reports document that this syntax does not work?
Trim({PatientProfile.First}) + " "
+ Trim(Iif(
IsNull({PatientProfile.Middle})
, Trim({PatientProfile.Middle}) + " "
, " "
)
)
+ Trim({PatientProfile.Last})
I know the solution is
If IsNull({PatientProfile.Middle}) Then
Trim({PatientProfile.First})
+ " " + Trim({PatientProfile.Last})
Else
Trim({PatientProfile.First})
+ " " + Trim({PatientProfile.Middle})
+ " " + Trim({PatientProfile.Last})
but how are we supposed to figure out we can't use the first version?
The documentation for IsNull says
Evaluates the field specified in the current record and returns TRUE if the field contains a null value
and Iif gives
[Returns] truePart if expression is True and falsePart if expression is False. The type of the returned value is the same as the type of truePart and falsePart.
I suppose if you stare at that line about "type of the return value" you can get it, but...
Where does Crystal Reports document that this syntax does not work?
I doubt there is anyplace large enough in the entire universe to document everything that does not work in Crystal Reports...
I know I'm years late on this one, but I came upon this question while trying to figure out the same thing. Funny enough, I couldn't even find the answer in Crystal Reports documentation, but instead in a link to IBM.
Basically, if you're using Crystal Reports 8.x or 10.x, ISNULL and IIF don't work together. From the site:
Cause
There is a defect in Crystal Reports 8.x and 10.x that prevents the above formula from working correctly. The 'IIF' and 'IsNull' commands cannot function together, and that includes attempting to use "Not" to modify the IsNull command; for example, IIF(Not IsNull ()).
Resolving the problem
The workaround is to use an "If-Then-Else" statement. For example,
If IsNull({~CRPT_TMP0001_ttx.install_date}) Then "TBD" Else "In Progress"
So if you're using CR 8.x or 10.x (which we are), you're out of luck. It makes it REAL fun when you are concatenating multiple fields together and one of them might be NULL.
I think CR evaluates both the true and false parts of IIF. Because you have the "Trim({PatientProfile.Middle})" part there, which will be evaluated against a null value, the CR formula evaluator seems to just fail.
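The suspected behaviour can be illustrated outside Crystal Reports. The sketch below is a Python analogy only, not Crystal syntax: iif is a hypothetical function standing in for a function-style conditional whose arguments are all evaluated before the call, and None stands in for a null field.

```python
def iif(condition, true_part, false_part):
    """Function-style conditional: the caller evaluates both parts first."""
    return true_part if condition else false_part


middle = None  # stands in for a NULL {PatientProfile.Middle}

# Function form: middle.strip() runs before iif() is even called, so it fails.
try:
    iif(middle is None, " ", middle.strip() + " ")
    eager_failed = False
except AttributeError:
    eager_failed = True

# Statement-style form: only the selected branch is evaluated.
separator = " " if middle is None else middle.strip() + " "

print(eager_failed)
print(repr(separator))
```

If Crystal's Iif behaves like the function form here, that would be consistent with the If-Then-Else workaround succeeding where Iif does not.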
Try this:
currencyvar tt;
currencyvar dect;
tt :={ship.comm_amount};
dect := tt - Truncate(tt);
tt := truncate(tt);
dect := dect * 100;
if dect = 0 then
UPPERCASE('$ ' + ToWords (tt,0 )) + ' ONLY'
else
UPPERCASE('$ ' + ToWords (tt,0) + ' And ' + ToWords(dect,0)) + ' ONLY ';