DBT - for loop issue with number as variable - postgresql

I am trying to use a for loop in dbt to build CTEs whose names come from a numeric variable; the same variable is also used in a macro. The issue is in the union part: dbt ends up creating a union over table names that are just numbers, which is not valid in PostgreSQL.
{{ config(materialized='table') }}
{%- set dpids = ["123","1234"] -%}
WITH
{% for dpid in dpids %}
{{ dpid }} AS (
SELECT *
FROM {{ ref('table_name') }} b
WHERE b.create_ts > ({{ get_last_load_timestamp('table_name', dpid) }})
AND b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{{ union_all(dpids) }}
select * from 123 union all select * from 1234 --fail
So I tried to change it by adding t_ before the number, but in the union part dbt iterates over every character of the string.
with
{% for 't_'~dpid in dpids %}
{{ dpid }} AS (
SELECT *
FROM {{ ref('table_name') }} b
WHERE b.create_ts > ({{ get_last_load_timestamp('table_name', dpid) }})
AND b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{{ union_all('t_'~dpids) }}
select * from t
union all
select * from _
union all
select * from 1
union all
select * from 2
...and so on
I don't know whether this is just not possible or whether I am missing something.
If someone has any ideas I would be grateful.
Thanks

Welcome! You've asked a great question for a newcomer!
Essentially, I think you're asking two things:
How can I prefix each element of a list of strings in jinja?
How can I do this in a "clean", dbt-esque way?
I'm no Postgres expert, but there may be a way to get away with tables named with just integers by double-quoting the identifiers (see the sketch at the end of this answer).
My answer might not be the cleanest, but it should work.
{{ config(materialized='table') }}
{%- set dpids = ["123","1234"] -%}
{%- set cte_names = [] %}
WITH
{% for dpid in dpids %}
{%- set cte_name = 't_' ~ dpid %}
{% do cte_names.append(cte_name) %}
{{ cte_name }} AS (
SELECT *
FROM {{ ref('table_name') }} b
WHERE b.create_ts > ({{ get_last_load_timestamp('table_name', dpid) }})
AND b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{{ union_all(cte_names) }}
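As an aside on the Postgres point above: purely numeric names are legal as long as they are double-quoted, so a variant that quotes the CTE names should also compile. This is only a sketch (untested, reusing the dpids list from above, and writing the unions inline instead of calling union_all):
WITH
{% for dpid in dpids %}
"{{ dpid }}" AS (
    SELECT *
    FROM {{ ref('table_name') }} b
    WHERE b.dpid = '{{ dpid }}'
){% if not loop.last %},{% endif %}
{% endfor %}
{% for dpid in dpids %}
SELECT * FROM "{{ dpid }}"{% if not loop.last %} UNION ALL{% endif %}
{% endfor %}
The t_ prefix approach above is probably still the cleaner option.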

Related

Filter rows based on a boolean column

I am dynamically creating SQL for each table via a macro. The column details and table names are stored in a seed file.
select table,
columns,
flag
from {{ file }}
I am using run_query to execute the above query and assigning it to a list.
{%- set results = run_query(query) -%}
{%- if execute -%}
{%- set table_names,columns,flag = [results.columns[0],results.columns[1],results.columns[2]] -%}
{%- endif -%}
Depending on the column flag, I have to group the CTEs and create two union CTEs separately:
flag_true as ( select * from tab1
union
select * from tab2) ,
flag_false as ( select * from tab3
union
select * from tab4)
I tried the code below, but since my table_names loop covers rows with both true and false flags, an extra union ends up at the end. I am unable to figure out a way to restrict the loop to only the tables having flag == true.
flag_true as (
{% for tab_name in table_names %}
{%- if flag[loop.index-1] %}
select * from {{tab_name}} {% if not loop.last %}union{% endif %}
{% endif %}
{% endfor %}
),
By creating your three lists table_names, columns and flag, based on your table columns, you are complicating the matter, as you are later treating them back as rows.
A better approach would be to filter your rows based on an attribute value, and this could be achieved with the selectattr filter or its opposite, rejectattr filter.
I believe you can use those filters on agate.Table.rows, and since run_query returns a Table object, on results.rows, in your code.
So, something like:
{%- set results = run_query(query) -%}
{%- if execute -%}
{%- set rows = results.rows -%}
{%- endif -%}
flag_true as (
{% for row in rows | selectattr('flag') %}
select * from {{ row.table }} {% if not loop.last %}union{% endif %}
{% endfor %}
),
flag_false as (
{% for row in rows | rejectattr('flag') %}
select * from {{ row.table }} {% if not loop.last %}union{% endif %}
{% endfor %}
)
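With the example rows above (tab1 and tab2 flagged true, tab3 and tab4 false), that should compile to roughly the SQL you were after:
flag_true as (
select * from tab1 union
select * from tab2
),
flag_false as (
select * from tab3 union
select * from tab4
)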

Using column as a parameter in macro dbt jinja

Hello everyone,
I have 28 tables (models) to create and need to apply some transformations to them, so I decided to use a macro to avoid repeating myself; there will also be more transformations in the future.
I want to remove the '%' and '$' signs from the column values and cast them to float. When a value contains '%', divide it by 100.
I must mention that I used quoting:
identifier: true
I've created this model:
SELECT
{{ clean_values('"Data"') }} AS "Data"
FROM
{{ source('mml_staging_eastor', 'DATAXIS_Development_indicators') }}
I also created macro:
{% macro clean_values(value_column) -%}
{% do log(node, info=true) %}
{# {% set column_value = 'wme%mt' %} #}
{% set column_value = value_column %}
{% set col_val_list = value_column | list %}
{% if '%' in col_val_list %}
'{{ column_value | replace('%', '') }}'
{% elif '$' in col_val_list %}
'{{ column_value | replace('$', '') }}'
{% else %}
{{ col_val_list }}
{% endif %}
{%- endmacro %}
Macros are compiled (templated) before the query is run. That means that the data doesn't run through the jinja templater. When you {% set column_value = value_column %} you're just passing a string with the value value_column into jinja, not the data from the field with that name.
Which is true, and I get a result in my model like: [ '"', 'D', 'a', 't', 'a', '"' ].
It's possible to use the run_query macro to pull data into the jinja context, but this is slow and error-prone.
If I use the commented line instead, i.e. assign the string 'wme%mt' to the variable, it works fine.
How can I handle this so the macro cleans the column?
Thank you in advance!
I see you found my other answer that attempts to explain the difference between the compilation step and the query execution step. But I think you're making the same mistake as that other person, and you're using jinja to operate on the string that represents the name of the column, instead of using SQL to operate on the data in the column.
Your macro needs to write SQL that can be executed in the database. I can't quite tell from your question if the $ and % are in the data or the column names, but assuming the former (and that you're on Snowflake or similar dialect), your macro would look like this (note that % is a SQL wildcard character meaning "match any number of any char" and must be escaped):
{% macro clean_values(column_name) -%}
case
when {{ column_name }} like '%^%' escape '^' -- number ending with %
then replace({{ column_name }}, '%', '')::float / 100.0
when {{ column_name }} like '$%' -- number starting with $
then replace({{ column_name }}, '$', '')::float
else {{ column_name }}::float
end
{% endmacro %}
I'm using the simple cast operator, :: here, which will throw an exception if it can't cast the expression to a float. You may prefer to use TRY_TO_DOUBLE, which will return NULL instead of throwing an exception, if the column contains any values Snowflake can't cast.
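If you prefer the NULL behaviour, a variant of that macro might look like the sketch below (an assumption on my part, still Snowflake syntax; the name clean_values_safe is just for illustration):
{% macro clean_values_safe(column_name) -%}
case
when {{ column_name }} like '%^%' escape '^' -- number ending with %
then try_to_double(replace({{ column_name }}, '%', '')) / 100.0
when {{ column_name }} like '$%' -- number starting with $
then try_to_double(replace({{ column_name }}, '$', ''))
else try_to_double({{ column_name }})
end
{%- endmacro %}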

Helm upgrade - preserve value of an environment variable when if condition is not met

I am doing helm upgrade --install with the --reuse-values option. I want to conditionally update the value of an environment variable when a certain condition is met, and preserve the already existing value when it is not. I've tried to achieve that this way:
env:
  - name: RELEASE_DATE
    value: "{{ if eq .Values.config.myCondition "testValue" }}{{ date "2006-01-02T15:04:05" .Release.Time }}{{ end }}"
or
env:
  - name: RELEASE_DATE
    value: "{{ if eq .Values.config.myCondition "testValue" }}{{ date "2006-01-02T15:04:05" .Release.Time }}{{ else }}{{ .Values.someOtherValue }}{{ end }}"
where .Values.someOtherValue never actually exists, or even doing it this way:
env:
  {{- if eq .Values.config.deployedService "ftmFrontend" }}
  - name: RELEASE_DATE
    value: "{{ if eq .Values.config.myCondition "testValue" }}{{ date "2006-01-02T15:04:05" .Release.Time }}{{ end }}"
  {{- end }}
I was expecting that with the --reuse-values option the value would be kept/preserved when the condition is not met, but instead it is always deleted and left empty.
The option does work, however, when there is no condition clause and only the value from the chart is referenced; it only fails when the if statement is used.
How can I use if statement and still preserve the value if the condition is not met?
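One possible workaround, sketched here only as an assumption (note that --reuse-values only reuses values; the template is always re-rendered), is to read the value currently deployed in the cluster with Helm 3's lookup function and fall back to it. The deployment name my-deployment and the container index 0 are placeholders:
{{- $existing := "" }}
{{- $deploy := lookup "apps/v1" "Deployment" .Release.Namespace "my-deployment" }}
{{- if $deploy }}
{{- range ((index $deploy.spec.template.spec.containers 0).env | default list) }}
{{- if eq .name "RELEASE_DATE" }}{{- $existing = .value }}{{- end }}
{{- end }}
{{- end }}
env:
  - name: RELEASE_DATE
    # now replaces .Release.Time, which is no longer available in Helm 3
    value: "{{ if eq .Values.config.myCondition "testValue" }}{{ now | date "2006-01-02T15:04:05" }}{{ else }}{{ $existing }}{{ end }}"
Note that lookup returns an empty map during helm template and --dry-run, so $existing stays empty there.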

Is it possible to neatly flatten a list of maps from values.yaml by selecting a key present in each map?

Suppose I have a values.yaml that contains the following arbitrary-length list of maps:
values.yaml
-----------
people:
  - name: "alice"
    age: 56
  - name: "bob"
    age: 42
One of the containers in my stack will need access to a list of just the names. (NAMES="alice,bob")
So far, I have come up with the following solution:
templates/_helpers.tpl
----------------------
{{- define "list.listNames" -}}
{{ $listStarted := false }}
{{- range .Values.people }}
{{- if $listStarted }},{{- end }}{{ $listStarted = true }}{{ .name }}
{{- end }}
{{- end }}
templates/configmap.yaml
------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-configmap
data:
  NAMES: '{{- include "list.listNames" . }}'
This solution works, but it feels inelegant. Looking through the Helm documentation, it seems like this would be the perfect use case for pluck in combination with join, but I haven't found a way to use .Values.people as an argument to that function.
Is there a cleaner way to do this?
There's a join function.
I have two different solutions that make use of join:
Solution 1
I can't collect the names in a list. Therefore I use a dict and extract the keys later:
{{- define "list.listNames" -}}
{{- $map := dict }}
{{- range .Values.people }}
{{- $_ := set $map .name "" }}
{{- end }}
{{- keys $map | join "," }}
{{- end }}
This solution might change the order. Perhaps sortAlpha could fix this.
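For instance, the last line of the helper could become (untested):
{{- keys $map | sortAlpha | join "," }}
which makes the order deterministic, though alphabetical rather than the order in values.yaml.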
Solution 2
Because fromYaml has to produce a map, I need to add a key (items); otherwise you can't convert the rendered output to a dict with fromYaml:
templates/_helpers.tpl
----------------------
{{- define "list.listNames" -}}
items:
{{- range .Values.people }}
- {{ .name }}
{{- end }}
{{- end }}
Then I convert to a dict object by fromYaml and join the list:
templates/configmap.yaml
------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-configmap
data:
  {{- $items := include "list.listNames" . | fromYaml }}
  NAMES: {{ $items.items | join "," | quote }}
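As a further variant (a sketch, assuming a Helm version that supports template variable reassignment, i.e. Helm 2.11+ / Helm 3), the names can also be collected directly into a list with append and then joined, which keeps the original order from values.yaml; it could serve as a drop-in replacement for the helper above:
{{- define "list.listNames" -}}
{{- $names := list }}
{{- range .Values.people }}
{{- $names = append $names .name }}
{{- end }}
{{- $names | join "," }}
{{- end }}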

How to get values dynamically based on parameters in Helm Chart

I wrote a Helm chart helper file as follows:
{{- define "logging.log_path" -}}
{{- if and .Values.log_path (eq .Values.app_type "sales") }}
{{- join "," .Values.log_path.sales }}
{{ else if and .Values.log_path (eq .Values.app_type "inventory") }}
{{- join "," .Values.log_path.inventory }}
{{ else if and .Values.log_path (eq .Values.app_type "order") }}
{{- join "," .Values.log_path.order }}
{{ else if and .Values.log_path (eq .Values.app_type "warehouse") }}
{{- join "," .Values.log_path.warehouse }}
{{ else }}
{{- join "," .Values.log_path.sales }}
{{- end }}
{{- end }}
The problem is that whenever a new app_type is required, I need to add it to that file manually. I think that is difficult to maintain and time consuming.
Is there any way I could do something like .Values.log_path[.Values.app_type], or any similar solution? Thanks.
Helm includes a dictionary helper, get:
get $myDict "key1"
{{ define "logging.log_path" }}
{{- $path := get .Values.log_path .Values.app_type | default .Values.log_path.sales -}}
{{- join "," $path -}}
{{ end }}
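For example, with hypothetical values like these (the paths are placeholders):
values.yaml
-----------
app_type: "order"
log_path:
  sales:
    - "/var/log/app/sales.log"
  order:
    - "/var/log/app/order-api.log"
    - "/var/log/app/order-worker.log"
{{ include "logging.log_path" . }} should render /var/log/app/order-api.log,/var/log/app/order-worker.log, and any app_type without a matching key under log_path falls back to the sales paths via default.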