Using column as a parameter in macro dbt jinja - macros

Hello everyone,
I have 28 tables (models) to create, and I need to apply some transformations to them. I decided to use a macro so that I don't repeat myself, and because there will be more transformations in the future.
I want to remove the '%' and '$' signs from a column and cast it to float. When the value contains '%', it should also be divided by 100.
I must mention that I used quoting:
identifier: true
I've created this model:
SELECT
{{ clean_values('"Data"') }} AS "Data"
FROM
{{ source('mml_staging_eastor', 'DATAXIS_Development_indicators') }}
I also created macro:
{% macro clean_values(value_column) -%}
{% do log(node, info=true) %}
{# {% set column_value = 'wme%mt' %} #}
{% set column_value = value_column %}
{% set col_val_list = value_column | list %}
{% if '%' in col_val_list %}
'{{ column_value | replace('%', '') }}'
{% elif '$' in col_val_list %}
'{{ column_value | replace('$', '') }}'
{% else %}
{{ col_val_list }}
{% endif %}
{%- endmacro %}
I found this explanation in another answer:
Macros are compiled (templated) before the query is run. That means that the data doesn't run through the jinja templater. When you {% set column_value = value_column %} you're just passing a string with the value value_column into jinja, not the data from the field with that name.
It's possible to use the run_query macro to pull data into the jinja context, but this is slow and error-prone.
That is true, and the result I get in my model looks like: [ """, "D", "a", "t", "a", """ ].
If I use the commented-out line instead, assigning the string 'wme%mt' to the variable, it works fine.
How can I clean this column in a macro?
Thank you in advance!

I see you found my other answer that attempts to explain the difference between the compilation step and the query execution step. But I think you're making the same mistake as that other person, and you're using jinja to operate on the string that represents the name of the column, instead of using SQL to operate on the data in the column.
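To make the distinction concrete, here is a minimal Python sketch of what happens at compile time. It is only an illustration: Jinja's list filter behaves like Python's list() on a string, and the render_sql helper is a hypothetical stand-in for a macro body, not part of dbt.

```python
# The model passes the quoted identifier itself, not any value stored in that column.
column_arg = '"Data"'

# Equivalent of `value_column | list`: the identifier is split into characters.
chars = list(column_arg)
print(chars)  # ['"', 'D', 'a', 't', 'a', '"']

# The compile-time check therefore inspects the column *name*, never the row data.
has_percent = "%" in chars
print(has_percent)  # False


def render_sql(column_name: str) -> str:
    """Hypothetical stand-in for a macro: it emits SQL text for the database to run."""
    return f"replace({column_name}, '%', '')"


print(render_sql(column_arg))  # replace("Data", '%', '')
```

The templating step only ever sees the six characters of the identifier. Any test on the actual values has to be written as SQL text, as render_sql does, so that the database evaluates it row by row.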
Your macro needs to write SQL that can be executed in the database. I can't quite tell from your question if the $ and % are in the data or the column names, but assuming the former (and that you're on Snowflake or similar dialect), your macro would look like this (note that % is a SQL wildcard character meaning "match any number of any char" and must be escaped):
{% macro clean_values(column_name) -%}
case
when {{ column_name }} like '%^%' escape '^' -- number ending with %
then replace({{ column_name }}, '%', '')::float / 100.0
when {{ column_name }} like '$%' -- number starting with $
then replace({{ column_name }}, '$', '')::float
else {{ column_name }}::float
end
{% endmacro %}
I'm using the simple cast operator, :: here, which will throw an exception if it can't cast the expression to a float. You may prefer to use TRY_TO_DOUBLE, which will return NULL instead of throwing an exception, if the column contains any values Snowflake can't cast.
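If it helps to reason about the three branches outside the warehouse, they can be mirrored in a small Python sketch. The function name clean_value and the strict flag are made up for this illustration, and Python's float() does not accept exactly the same inputs as a Snowflake cast, so treat it as an approximation of the logic rather than of Snowflake's parsing.

```python
from typing import Optional


def clean_value(raw: Optional[str], strict: bool = True) -> Optional[float]:
    """Mirror the CASE expression for a single value.

    strict=True behaves like the :: cast (raises on bad input);
    strict=False behaves like TRY_TO_DOUBLE (returns None instead).
    """
    if raw is None:
        return None  # NULL in, NULL out
    try:
        if raw.endswith("%"):  # like '%^%' escape '^'
            return float(raw.replace("%", "")) / 100.0
        if raw.startswith("$"):  # like '$%'
            return float(raw.replace("$", ""))
        return float(raw)  # else branch
    except ValueError:
        if strict:
            raise
        return None


print(clean_value("12.5%"))  # 0.125
print(clean_value("$40"))  # 40.0
print(clean_value("n/a", strict=False))  # None
```

Running a handful of sample values through a helper like this before deploying the macro can show whether the strict or the lenient cast is the better choice for your data.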

Related

Filter rows based on a boolean column

I am dynamically creating SQL for each table via a macro. The column and table names are stored in a seed file.
select table,
columns,
flag
from {{ file }}
I am using run_query to execute the above query and assigning it to a list.
{%- set results = run_query(query) -%}
{%- if execute -%}
{%- set table_names,columns,flag = [results.columns[0],results.columns[1],results.columns[2]] -%}
{%- endif -%}
Depending on the column flag, I have to group the CTEs and create two union CTEs separately:
flag_true as ( select * from tab1
union
select * from tab2) ,
flag_false as ( select * from tab3
union
select * from tab4)
I tried the code below, but because the table_names loop contains both true and false flag values, an extra union appears at the end.
I cannot figure out how to restrict the loop to only the tables where flag == true.
flag_true as (
{% for tab_name in table_names %}
{%- if flag[loop.index-1] %}
select * from {{tab_name}} {% if not loop.last %}union{% endif %}
{% endif %}
{% endfor %}
),
By creating your three lists table_names, columns and flag, based on your table columns, you are complicating the matter, as you are later treating them back as rows.
A better approach would be to filter your rows based on an attribute value, and this could be achieved with the selectattr filter or its opposite, rejectattr filter.
I believe you can use those filters on agate.Table.rows, and since run_query returns a Table object, on results.rows, in your code.
So, something like:
{%- set results = run_query(query) -%}
{%- if execute -%}
{%- set rows = results.rows -%}
{%- else -%}
{#- nothing is queried during parsing, so fall back to an empty list -#}
{%- set rows = [] -%}
{%- endif -%}
flag_true as (
{% for row in rows | selectattr('flag') %}
select * from {{ row.table }} {% if not loop.last %}union{% endif %}
{% endfor %}
),
flag_false as (
{% for row in rows | rejectattr('flag') %}
select * from {{ row.table }} {% if not loop.last %}union{% endif %}
{% endfor %}
)
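The reason this removes the trailing union is that loop.last now refers to the last row of the already filtered sequence. The same filter-then-join idea can be sketched in Python, with made-up rows standing in for the seed file (the union_cte helper is hypothetical):

```python
# Made-up rows standing in for the seed file contents.
rows = [
    {"table": "tab1", "flag": True},
    {"table": "tab3", "flag": False},
    {"table": "tab2", "flag": True},
    {"table": "tab4", "flag": False},
]


def union_cte(name, tables):
    """Build one CTE; 'union' is placed only between selects, never after the last."""
    body = "\nunion\n".join(f"select * from {t}" for t in tables)
    return f"{name} as (\n{body}\n)"


# Filter first (the role of selectattr / rejectattr), then join.
true_tables = [r["table"] for r in rows if r["flag"]]
false_tables = [r["table"] for r in rows if not r["flag"]]

sql = ",\n".join(
    [union_cte("flag_true", true_tables), union_cte("flag_false", false_tables)]
)
print(sql)
```

Because each list contains only the tables that belong in its CTE, the separator can never land after the final select.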

PostgreSQL OR condition not executing if one of them is true

My second 'and' is throwing an error. I'm checking whether either: 1.) dateRange1.value.start does not exist, OR 2.) job_start is >= dateRange1.value.start. #2 works when dateRange1.value.start has a value, but when it's empty (i.e., !dateRange1.value.start is true), I get an "invalid input syntax for type date" error. The same type of statement works directly above it, where I check jobNameFilter.value. Why does this cause issues for dateRange1.value.start?
SELECT
id,
user_id,
job_name,
job_start,
job_end
from jobs
where ({{ !jobNameFilter.value }} or lower(job_name) like {{ '%' + jobNameFilter.value.toLowerCase() + '%' }})
and ({{ !dateRange1.value.start }} or date(job_start) >= {{ dateRange1.value.start }})
and user_id = {{loginSuccessID.value}}
order by job_start;

Helm upgrade - preserve value of an environment variable when if condition is not met

I am doing helm upgrade --install with the --reuse-values option. I want to conditionally update the value of an environment variable if a certain condition is met and if the condition is not met preserve the already existing value. I've tried to achieve that this way:
env:
- name: RELEASE_DATE
value: "{{ if eq .Values.config.myCondition "testValue" }}{{ date "2006-01-02T15:04:05" .Release.Time }}{{ end }}"
or
env:
- name: RELEASE_DATE
value: "{{ if eq .Values.config.myCondition "testValue" }}{{ date "2006-01-02T15:04:05" .Release.Time }}{{ else }}{{ .Values.someOtherValue }}{{ end }}"
where .Values.someOtherValue never actually exists, or even this way:
env:
{{- if eq .Values.config.deployedService "ftmFrontend" }}
- name: RELEASE_DATE
value: "{{ if eq .Values.config.myCondition "testValue" }}{{ date "2006-01-02T15:04:05" .Release.Time }}{{ end }}"
{{- end }}
I expected that with the --reuse-values option the existing value would be preserved when the condition is not met, but instead it is always deleted and left empty.
The option does work, however, when there is no condition and the template only references the value from the chart. It fails only when the if statement is used.
How can I use an if statement and still preserve the value when the condition is not met?

DBT - for loop issue with number as variable

I am trying to write a for loop in dbt over CTEs that are named after a number; I use the same number as a variable in a macro. The problem is in the union part: dbt tries to build the union from table names that are plain numbers, which is not possible in PostgreSQL.
{{ config(materialized='table') }}
{%- set dpids = ["123","1234"] -%}
WITH
{% for dpid in dpids %}
{{ dpid }} AS (
SELECT *
FROM {{ ref('table_name') }} b
WHERE b.create_ts > ({{ get_last_load_timestamp('table_name', dpid) }})
AND b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{{ union_all(dpids) }}
select * from 123 union all select * from 1234 --fail
So I tried adding t_ before the number, but in the union part dbt then iterates over every character of the string.
with
{% for 't_'~dpid in dpids %}
{{ dpid }} AS (
SELECT *
FROM {{ ref('table_name') }} b
WHERE b.create_ts > ({{ get_last_load_timestamp('table_name', dpid) }})
AND b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{{ union_all('t_'~dpids) }}
select * from t
union all
select * from _
union all
select * from 1
union all
select * from 2
...and so on
I don't know whether this is simply not possible or whether I am missing something.
If anyone has any ideas, I would be grateful.
Thanks
Welcome! You've asked a great question for a newcomer!
Essentially, I think you're asking:
How can I prefix each element of a list of strings in jinja?
How can I do this in a "clean" dbt-esque way?
I'm no postgres expert, but there may be a way to get away with tables named with just integers (casting to string? or quoting them?)
My answer might not be the cleanest, but it should work.
{{ config(materialized='table') }}
{%- set dpids = ["123","1234"] -%}
{%- set cte_names = [] %}
WITH
{% for dpid in dpids %}
{%- set cte_name = 't_' ~ dpid %}
{% do cte_names.append(cte_name) %}
{{ cte_name }} AS (
SELECT *
FROM {{ ref('table_name') }} b
WHERE b.create_ts > ({{ get_last_load_timestamp('table_name', dpid) }})
AND b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{{ union_all(cte_names) }}
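To see why prefixing the whole list produced one CTE name per character, here is a minimal Python sketch. It assumes that Jinja's ~ operator turns both operands into strings the way str() does here, and that union_all simply loops over whatever it is given; neither detail is confirmed by the question.

```python
dpids = ["123", "1234"]

# Prefixing the list as a whole first turns it into one string ...
wrong = "t_" + str(dpids)
print(wrong)  # t_['123', '1234']

# ... and looping over a string yields single characters.
print(list(wrong)[:4])  # ['t', '_', '[', "'"]

# Prefixing each element keeps a list of usable CTE names.
cte_names = ["t_" + dpid for dpid in dpids]
print(cte_names)  # ['t_123', 't_1234']
```

Building cte_names inside the loop, as in the model above, is the Jinja counterpart of the list comprehension on the last lines.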

Create entry in helm template only if an entry exists in a map

Assuming this map exists in my values.yaml
something:
somethingElse:
variable1: value1
variable2: value2
variable3: value3
I want to create a Helm template for a Kubernetes Secret resource (although the resource type is not of primary importance) if and only if, say, the key-value pair variable2: value2 exists. (I am only interested in whether variable2 exists, not in its value.)
I know how to range to include all entries
{{- range $name, $value := .Values.something.somethingElse }}
{{indent 4 $name }}: {{ $value }}
{{- end }}
but in pseudocode, what I want is
if variable2 in .Values.something.somethingElse
variable2: value2
Is this somehow feasible using the helm templating language?
You can do:
{{if (index .Values.something.somethingElse "variable2")}}
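One caveat, sketched in Python purely as an analogy (Go templates and Python both treat empty strings, zero and false as falsy): the if above tests the value stored under variable2, so a key that exists with an empty value would be skipped. If only the key's existence matters, the hasKey function available in Helm templates, used as hasKey .Values.something.somethingElse "variable2", may be a closer fit. The map contents below are made up.

```python
# Made-up map: variable2 exists but holds an empty value.
something_else = {"variable1": "value1", "variable2": "", "variable3": "value3"}

# Truthiness of the looked-up value: the empty value counts as false.
value_is_truthy = bool(something_else.get("variable2"))
print(value_is_truthy)  # False

# Pure existence check: true whenever the key is present, whatever its value.
key_exists = "variable2" in something_else
print(key_exists)  # True
```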
Helm provides flow control, so you can create if / else conditional blocks. For your case, this example may help.
In your values.yaml
foo:
enabled: true
So, in your template:
{{- if .Values.foo.enabled }}
--- TEMPLATE CODE --
{{- end }}
Ref: https://helm.sh/docs/chart_template_guide/control_structures/