Copy tables from one database to another using macros in dbt

What I want to achieve is to copy tables from one database to another using macros in dbt.
I created a macro that lists my table names from a schema, as below:
{%- macro get_dimension_names(database = target.database, schema = 'dbo', dry_run = True) -%}
{% set get_tables_query %}
select distinct table_name
--- '{{ database | upper }}.' || table_schema ||'.'|| table_name
from {{database}}.information_schema.tables
where table_schema = upper('{{schema}}')
and table_schema ='DBO'
and table_type='BASE TABLE'
{% endset %}
{{ log('\nGenerating dimensions list...\n', info=True) }}
{% set get_dimension_names = run_query(get_tables_query).columns[0].values() %}
-- iterates through the statements generated and executes them if dry_run is false; otherwise it just logs them
{% for query in get_dimension_names %}
{% if dry_run %}
{{ log(query, info=True) }}
{% else %}
{{ log('Generating tables: ' ~ query, info=True) }}
{% do run_query(query) %}
{% endif %}
{% endfor %}
{%- endmacro -%}
I have another macro that calls the above macro and uses the resulting list to generate select * statements:
{%- macro statement_list() -%} ------database = target.database, schema = 'dbo', dry_run = True) -%}
{% set tables = get_dimension_names() %} --- this is from macro which lists table names
{{ log('\nGenerating sql statements list...\n', info=True) }}
{% set query %}
select *
from tables
-- {% if tables -%}
-- where lower(table_name) in {% for t in tables -%} {{ t }} {%- endfor -%}
--- {%- endif -%}
{% endset %}
{{ log('\nGenerating query...\n', info=True) }}
{% set results = run_query(query).columns[0].values() %}
-- iterates through the statements generated and executes them if dry_run is false; otherwise it just logs them
{% for query in results %}
{% if dry_run %}
{{ log(query, info=True) }}
{% else %}
{{ log('Generating tables: ' ~ query, info=True) }}
{% do run_query(query) %}
{% endif %}
{% endfor %}
{%- endmacro -%}
I am getting the below error:
17:58:25 Encountered an error while running operation: Database Error
002003 (42S02): SQL compilation error:
Object 'TABLES' does not exist or not authorized.
I changed my second macro as below:
{%- macro generate_dimension_names(database = target.database, schema = 'dbo') -%}
{% set get_tables_list %}
-- select table names that have the __IS_ROW_CURRENT column, since not all tables in dbo have it
select distinct table_name
from {{database}}.information_schema.columns
where
column_name = '__IS_ROW_CURRENT'
AND table_schema ='DBO'
-- and table_name in ( 'POSTCODE')
{% endset %}
{% set get_tables = run_query(get_tables_list).columns[0].values() %}
{{return(get_tables)}}
{%- endmacro -%}
Even though the macro generates create statements, it still throws an error:
Encountered an error while running operation: Database Error
001003 (42000): SQL compilation error:

There are quite a few issues with your code.
The one generating the error:
Your statement_list macro contains the following:
{% set query %}
select *
from tables
...
{% endset %}
Normally you don't nest curly braces in Jinja, but in a set block, if you want to template a variable, you do need to include them:
{% set query %}
select *
from {{ table }}
...
{% endset %}
In the statement_list macro, you're not iterating over the list of tables properly. You need something like the following, which will compile to a single string containing all of your select statements separated by semicolons:
{% set query %}
{% for table in tables %}
create or replace table destination_schema.{{table}} as (
select *
from {{ table }}
...
)
;
{% endfor %}
{% endset %}
Then once you have query defined, you just run that single statement (gated by dry_run if you want).
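Putting those pieces together, a corrected statement_list might look like this (a sketch only; dry_run is made an explicit argument here, and destination_schema is a placeholder for wherever you want the copies to land):

```sql
{%- macro statement_list(dry_run=True) -%}
    {% set tables = get_dimension_names() %}
    {% set query %}
        {% for table in tables %}
        create or replace table destination_schema.{{ table }} as (
            select * from {{ table }}
        );
        {% endfor %}
    {% endset %}
    {% if dry_run %}
        {{ log(query, info=True) }}
    {% else %}
        {{ log('Running: ' ~ query, info=True) }}
        {% do run_query(query) %}
    {% endif %}
{%- endmacro -%}
```

Depending on your adapter, run_query may not accept multiple semicolon-separated statements in one call, in which case you would loop over the tables and run each create statement separately.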
You don't return anything from your get_dimension_names macro. You need to add:
{% set dimension_names = run_query(get_tables_query).columns[0].values() %}
{{ return(dimension_names) }}
You need to remove this final block from your get_dimension_names, which seems like it belongs in your other macro:
-- iterates through the statements generated and executes them if dry_run is false; otherwise it just logs them
{% for query in get_dimension_names %}
{% if dry_run %}
{{ log(query, info=True) }}
{% else %}
{{ log('Generating tables: ' ~ query, info=True) }}
{% do run_query(query) %}
{% endif %}
{% endfor %}
{%- endmacro -%}
Finally, I don't think this is a great approach. If you want to persist a model into a new schema, you can use dbt's custom schema model config. If you want the tables in two places, you should probably use a view (which you can manage as an additional dbt model with a custom schema config) or, if you're on Snowflake, a zero-copy clone.
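For reference, a custom schema is just a model config; a minimal sketch (the model and schema names here are made up):

```sql
-- models/dim_copy.sql
{{ config(materialized='view', schema='copy_schema') }}

select * from {{ ref('dim_original') }}
```

By default dbt concatenates this with your target schema (e.g. target_schema_copy_schema); you can override the generate_schema_name macro if you want the literal schema name instead.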

Related

Postgres Update Statement: how to set a column's value only if the column name contains a certain substring?

Each of the following tables ['table1', 'table2'] is part of the public schema. Each table may contain multiple columns whose names contain the substring 'substring'. For example:
Table_1 (xyz, xyz_substring, ...some_other_columns..., abc, abc_substring)
Table_2 (xyz, xyz_substring, ...some_other_columns..., abc, abc_substring)
I am coming at this from a Pythonic way of thinking, but basically: how could one execute such a statement without knowing exactly what to set, since the columns we need to target have to meet certain criteria?
I had the idea of adding another loop over the column names of the current table, checking whether each name meets the criteria, and then executing the query, but it feels far from optimal.
DO $$
declare
t text;
tablenames TEXT ARRAY DEFAULT ARRAY['table_1', 'table_2'];
BEGIN
FOREACH t IN ARRAY tablenames
LOOP
raise notice 'table(%)', t;
-- update : for all the column that contain 'substring' in their names set a value
END LOOP;
END$$;
EDIT:
Thanks for the answer @Stefanov.sm; I followed your thought exactly and was able to build on your logic to end up with just one statement:
DO $$
declare
t text;
tablenames TEXT ARRAY DEFAULT ARRAY['table_1', 'table_2'];
dynsql text;
colname text;
BEGIN
FOREACH t IN ARRAY tablenames LOOP
raise notice 'table (%)', t;
dynsql := format('update %I set', t);
for colname in select column_name from information_schema.columns where table_schema = 'public' and table_name = t
loop
if colname like '%substring%' then
dynsql := concat(dynsql,format(' %I = ....whatever expression here (Make sure to check if you should use Literal formatter %L if needed) ....,',colname,...whateverargs...));
end if;
end loop;
dynsql := concat(dynsql,';'); -- not sure if required.
raise notice 'SQL to execute (%)', dynsql;
execute dynsql;
END LOOP;
END;
$$;
Extract the list of columns of each table and then format/execute dynamic SQL. Something like
DO $$
declare
t text;
tablenames text[] DEFAULT ARRAY['table_1', 'table_2'];
dynsql text;
colname text;
BEGIN
FOREACH t IN ARRAY tablenames LOOP
raise notice 'table (%)', t;
for colname in select column_name
from information_schema.columns
where table_schema = 'public' and table_name = t loop
if colname ~ '__substring$' then
dynsql := format('update %I set %I = ...expression... ...other clauses if any...', t, colname);
raise notice 'SQL to execute (%)', dynsql;
execute dynsql;
end if;
end loop;
END LOOP;
END;
$$;
This will cause excessive bloat, so do not forget to vacuum your tables. If your tables' schema is not public, then edit the select from information_schema accordingly. You may also use the pg_catalog views instead of information_schema.

How to handle special characters in PostgreSQL JSON's like_regex statement

I have some trouble using jsonb's like_regex statement.
I usually use it in my jsonb query statements, that is:
jsonb @? jsonpath
The query condition is transferred from the front end as a JSON document, like:
{
"fname=":"Tiger",
"flocation~*":"Shenzhen, China, (86)"
}
I wrote a function to translate the JSON filter into a jsonpath expression, like:
jfilter:='{"fname=":"Tiger","flocation~*":"Shenzhen, China, (86)"}'::jsonb;
cur refcursor;
vkey text;
vval text;
vrule text;
open cur for select * from jsonb_each(jfilter);
loop
fetch cur into vkey,vval;
if found then
if strpos(vkey,'=')>0 then
...
vrule:=concat('@.',vkey,vval::text);
...
elseif strpos(vkey,'~*')>0 then
...
vrule:=concat('@.',vkey,' like_regex ',vval::text,' flag "i"');
...
end if;
else
exit;
end if;
end loop;
close cur;
And then I get a jsonpath like:
'$[*] ? (@.fname=="Tiger" && @.flocation like_regex "Shenzhen, China, (86)" flag "i")'
When the value contains '(' or ')', the match fails. In fact, for me, '(' or ')' is just a normal part of the condition. I tried to replace '(' with '\(', but it doesn't work. I know why the statement fails, but I don't know how to handle this kind of problem.
Please give me some advice, thanks very much.
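A possible way out (an untested sketch): '(' and ')' are regex metacharacters, so to match them literally in like_regex they must be escaped with a backslash, and because the backslash is itself an escape character inside a jsonpath string literal, it has to be doubled:

```sql
-- literal parentheses in the regex are \( and \),
-- written as \\( and \\) inside the jsonpath string literal
select '{"fname": "Tiger", "flocation": "Shenzhen, China, (86)"}'::jsonb
       @? '$ ? (@.flocation like_regex "Shenzhen, China, \\(86\\)" flag "i")';
```

So in the function above, the fix would be to escape regex metacharacters in vval (with doubled backslashes) before concatenating it into vrule.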

FOREACH expression must not be null

At declaration level I have:
sqlDel text := 'DELETE FROM %s WHERE %s IS NULL';
fields text[];
field_name text;
ptable text := 'myTable';
Somewhere further down I fill in fields so it contains 3 items; I checked, and it's fine. Nevertheless, down below I have this FOREACH loop, which worked fine until I added this line:
EXECUTE format(sqlDel, ptable, field_name);
The error says:
ERROR: FOREACH expression must not be null
Foreach loop:
FOREACH field_name IN ARRAY fields
LOOP
EXECUTE format(sqlDel, ptable, field_name);
raise notice 'Primary key column: %', field_name;
END LOOP;
The error message is clear: the variable fields is null. You should set it first.
fields = ARRAY['id'];
FOREACH ...
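If fields can legitimately be null at that point, you can also guard the loop itself; a sketch (note too that format's %I specifier is generally safer than %s for identifiers):

```sql
-- an empty array when fields is null means the loop body simply never runs
FOREACH field_name IN ARRAY coalesce(fields, '{}'::text[])
LOOP
    EXECUTE format('DELETE FROM %I WHERE %I IS NULL', ptable, field_name);
    raise notice 'Primary key column: %', field_name;
END LOOP;
```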

Pass entire where condition as argument in postgresql

I want to pass an entire WHERE condition as a parameter in PostgreSQL.
Is it possible? I have created a function like:
CREATE OR REPLACE FUNCTION public.pro_select_all_item_query_builder_data(IN cond character varying)
RETURNS TABLE(id integer, name character varying) AS
$BODY$
BEGIN
FOR id, name IN
SELECT product.id, product.name FROM product WHERE cond
.............
and call it like:
SELECT *
FROM pro_select_all_item_query_builder_data('product.status_id = 1')
It shows: ERROR: argument of WHERE must be type boolean, not type character varying
Can you please support me to solve this issue?
Make sure cond contains a valid condition, otherwise it will throw a syntax error.
FOR id, name IN
EXECUTE ('SELECT product.id, product.name FROM product WHERE ' || cond)
LOOP
....
END LOOP;
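If the function only needs to hand the rows back, RETURN QUERY EXECUTE is a more compact alternative to the explicit FOR loop; a sketch (beware that concatenating cond directly makes the function open to SQL injection, so only pass trusted input):

```sql
CREATE OR REPLACE FUNCTION public.pro_select_all_item_query_builder_data(IN cond character varying)
RETURNS TABLE(id integer, name character varying) AS
$BODY$
BEGIN
    -- build and run the dynamic query, returning its rows directly
    RETURN QUERY EXECUTE
        'SELECT product.id, product.name FROM product WHERE ' || cond;
END;
$BODY$ LANGUAGE plpgsql;
```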

PostgreSQL CASE usage in functions

Can't we use a CASE condition outside SQL SELECT statements?
E.g.:
CASE
WHEN old.applies_to = 'admin' THEN _applies_to = 'My Self'
ELSE _applies_to = initcap(old.applies_to)
END
_summary = _summary || '<li>Apply To: ' || _applies_to || '</li>';
I get the following error:
ERROR: syntax error at or near "_summary"
LINE 86: _summary = _summary || '<li>Apply To: ' || _applies ...
This concerns the conditional control structure CASE of the procedural language PL/pgSQL, to be used in plpgsql functions or procedures or DO statements.
Not to be confused with CASE expression of SQL. Different language! Subtly different syntax rules.
While SQL CASE can be embedded in SQL expressions inside PL/pgSQL code (which is mostly just glue for SQL commands), you cannot have stand-alone SQL CASE expressions (would be nonsense).
-- inside a plpgsql code block:
CASE
WHEN old.applies_to = 'admin' THEN
_applies_to := 'My Self';
ELSE
_applies_to := initcap(old.applies_to);
END CASE;
You have to use complete statements, each terminated with a semicolon (;), and END CASE to close the block.
Answer to additional question in comment
According to the documentation, the ELSE keyword of a CASE statement is not optional. I quote from the link above:
If no match is found, the ELSE statements are executed;
but if ELSE is not present, then a CASE_NOT_FOUND exception is raised.
However, you can use an empty ELSE:
CASE
WHEN old.applies_to = 'admin' THEN
_applies_to := 'My Self';
ELSE
-- do nothing
END CASE;
This is different from SQL CASE expressions where ELSE is optional, but if the keyword is present, an expression has to be given, too!