Implementing SCD2 in dbt using AWS Redshift, how do I define conditional natural keys?

Implementing SCD2 in dbt using AWS Redshift, how do I define conditional natural keys? I currently have:
unique_id = ['crm_id', 'curr_recrd_flg', 'actve_flg']
I want to apply conditions like curr_recrd_flg = 'Y' and actve_flg = 'Y'.
Thanks in advance!

dbt can automate the creation of an SCD Type-2 table from a dimension table using its snapshot feature.
First, you define your snapshot in a .sql file inside a directory called snapshots in your project directory:
-- snapshots/my_snapshot.sql
{% snapshot my_snapshot %}
{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='crm_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}
-- QUERY TEXT HERE
{% endsnapshot %}
The query in your snapshot definition can do anything. In your case, it seems like you want to filter on the "current record" fields, so that would look like:
-- snapshots/my_snapshot.sql
{% snapshot my_snapshot %}
{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='crm_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}
select *
from {{ source('my_source', 'my_table') }}
where curr_recrd_flg = 'Y' and actve_flg = 'Y'
{% endsnapshot %}
Then you can execute dbt snapshot at the command line, and dbt will capture the rows returned by your query. Executing dbt snapshot again will execute the query again and update the snapshot table with any changes, in the style of SCD-2.
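As an aside, if your source table has no reliable updated_at column, dbt snapshots also support a check strategy, which compares column values between runs instead of timestamps. A minimal sketch comparing all columns (the snapshot name here is illustrative, not from the question):
-- snapshots/my_check_snapshot.sql
{% snapshot my_check_snapshot %}
{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='crm_id',
        strategy='check',
        check_cols='all',
    )
}}
select *
from {{ source('my_source', 'my_table') }}
where curr_recrd_flg = 'Y' and actve_flg = 'Y'
{% endsnapshot %}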

Related

Grafana Reference DataSet Variable to Translate Legend Values using Postgres Driver

I have a postgres data-source in Grafana that's normalized, which restricts my graph-visualization legend to showing only the ID (hash) of my record. I want to make this human-readable, but the id -> name mapping is in a different datasource/postgres database.
Grafana supports templating variables, which I think could allow me to load my id -> name reference data, but there isn't clear documentation on how to access the label_values as a reference table within the postgres driver's query editor.
Is there a way to configure a template variable to load reference data (id -> name) and leverage it to translate my metric/legend ids within the Grafana postgres driver?
For Example (pseudo-grafana postgres query editor):
SELECT
$__timeGroupAlias(start,$__interval),
animal_names.__value AS metric,
count(dog.chewed_bones) AS "# bones chewed"
FROM animals.dog dog
JOIN $TEMPLATE_VAR_REF_DATA animal_names ON dog.id = animal_names.__text
WHERE $__timeFilter(start_time)
GROUP BY 1,2
ORDER BY 1,2
The closest answer I found is here, but it doesn't get into details:
johnymachine's comment at https://github.com/grafana/grafana/issues/1032
I realized the github comment meant using a jsonb aggregate function as a variable, as in the following solution:
Dashboard Variable (Type Query): select jsonb_object_agg(id,name) from animal_names;
Grafana Postgres Pseudo-Query:
SELECT
$__timeGroupAlias(start,$__interval),
-- $animal_names is the dashboard variable defined above, interpolated as jsonb text
'$animal_names'::jsonb ->> dog.id::text AS metric,
count(dog.chewed_bones) AS "# bones chewed"
FROM animals.dog
WHERE $__timeFilter(start_time)
GROUP BY 1,2
ORDER BY 1,2
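To see why this works: the dashboard variable renders as a single jsonb object literal mapping ids to names, so the metric expression is just a key lookup on it. A hypothetical illustration (the ids and names here are invented):
-- if the variable renders as the text {"a1b2": "Rex", "c3d4": "Fido"}
select '{"a1b2": "Rex", "c3d4": "Fido"}'::jsonb ->> 'a1b2'; -- returns 'Rex'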

Query records before insertion with Ecto (similar to an AR callback)

I'm new to Elixir and Phoenix (less than 10 days in) but very excited about them, and like many others I come from a Rails background.
I understand Ecto is not AR and callbacks have been deprecated or removed, but I need to add a custom validation that should only happen on creation and needs to perform a query.
Here's what my Reservation model basically looks like.
schema "reservations" do
field :ends_at, :utc_datetime
field :name, :string, null: false
field :starts_at, :utc_datetime
field :user_id, :id
end
and then I have another schema Slot, which looks like this:
schema "slots" do
field :ends_at, :utc_datetime
field :name, :string, null: false
field :starts_at, :utc_datetime
field :admin_id, :id
end
Whenever I'm adding a new reservation, I need to query my DB to check whether there are any slots with matching ends_at and starts_at. If there are, I need to prevent the record from being saved and add an error to it (similar to what we accomplish in Rails with throw :abort and errors.add).
Can someone please shed some light on this? What's the Ecto way of doing this?
Best regards
*edit: added examples using separate changesets for creation and update
You can add a custom validation function to your changeset pipeline and perform DB queries in it.
I haven't run this code, but something like this should work:
# separate changeset for creation
def create_changeset(struct, params) do
  struct
  |> cast(params, [...list of fields...])
  |> unique_constraint(:name) # let's say name has to be unique (backed by a unique index)
  |> validate_slots() # custom validation below
end

# separate changeset for updates, no slot check
def update_changeset(struct, params) do
  struct
  |> cast(params, [...list of fields...])
  |> unique_constraint(:name)
end

def validate_slots(changeset) do
  # requires `import Ecto.Query` in the module for the `from` macro
  starts_at = get_field(changeset, :starts_at)
  ends_at = get_field(changeset, :ends_at)
  slots = Repo.all(from s in Slot, where: s.starts_at == ^starts_at and s.ends_at == ^ends_at)

  if Enum.empty?(slots) do
    changeset
  else
    add_error(changeset, :starts_at, "has slot with similar starts_at/ends_at")
  end
end
# ---- using the changesets
# creation
%Reservation{} |> Reservation.create_changeset(params) |> Repo.insert()
# update
%Reservation{} |> Reservation.update_changeset(params) |> Repo.update()
That said, from the look of it, you should probably normalize your starts_at and ends_at into a separate table called booking_time_frame or something and add a unique index to it.
Otherwise you might end up with more types of bookings and then have to check starts_at/ends_at across 3 tables, and so on.
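For example, a minimal migration sketch for that approach (the module, table, and column details are assumptions, not from the original answer):
# priv/repo/migrations/20190101000000_create_booking_time_frames.exs (hypothetical)
defmodule MyApp.Repo.Migrations.CreateBookingTimeFrames do
  use Ecto.Migration

  def change do
    create table(:booking_time_frames) do
      add :starts_at, :utc_datetime
      add :ends_at, :utc_datetime
      timestamps()
    end

    # the database itself now rejects duplicate time frames, even under concurrent inserts
    create unique_index(:booking_time_frames, [:starts_at, :ends_at])
  end
end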

Talend: How to log component-level runtime information

Consider the below image, part of my Talend job.
I am aware of the advanced settings in Talend Studio.
I want to be able to log the runtime values substituted into the query of the CREATE_RULE_TICKET component.
For example lets say the component has the following query
SELECT START_DATE FROM TABLENAME WHERE CIF IN ('" + globalMap.get("cif") + "')
The log should show me the runtime value for CIF
SELECT START_DATE FROM TABLENAME WHERE CIF IN ('HU8909','JKO98')
How do we go about it?
The component exposes a global variable QUERY, which contains the query after it has been constructed, so you can log it from a tJava connected via OnComponentOk:
tHiveRow_1 -- OnComponentOk --> tJava
// tJava body: print the query that tHiveRow_1 actually executed
System.out.println((String) globalMap.get("tHiveRow_1_QUERY"));
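If your job has log4j logging enabled, you could route the same value through the job's logger instead of stdout. A sketch, assuming log4j 1.x jars are on the job's classpath (check your Talend version; newer releases ship log4j2 with a different API):
// tJava body: log the resolved query via log4j instead of System.out
org.apache.log4j.Logger.getLogger("CREATE_RULE_TICKET")
    .info((String) globalMap.get("tHiveRow_1_QUERY"));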

OnComponentOrder flow and tMap connections in Talend

I have the following flow:
1 component that needs to be executed to extract a certain timestamp from MySQL
3 MySQL inputs that need to use that timestamp
1 tMap which needs to receive the 3 MySQL inputs
However, I am not allowed to connect the 3 MySQL inputs to the single tMap because they depend on the first component (through OnComponentOk) but in a different order. How do I orchestrate this sort of situation?
You could execute a query and set a global variable using the tSetGlobalVar component (referencing row1.mydate, for example), then in each of your queries going into tMap, reference the global variable like:
SELECT ...
FROM ...
WHERE mydate >= '" + (String) globalMap.get("myDate") + "';"
Use two subjobs: one for getting the variable and storing it, and another for doing your three queries into tMap, etc.
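A rough sketch of that layout (component names assumed):
-- subjob 1: fetch the timestamp and store it
tMysqlInput_0 --row1--> tSetGlobalVar (key "myDate" = row1.mydate)
      |
  OnSubjobOk
      |
-- subjob 2: the three inputs each reference globalMap.get("myDate") and feed the tMap
tMysqlInput_1 --main--> tMap_1
tMysqlInput_2 --main--> tMap_1
tMysqlInput_3 --main--> tMap_1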

Data Dictionary generators for PostgreSQL to Confluence Wiki

I'm looking for a tool that takes PostgreSQL tables and outputs a Data Dictionary in a wiki format (preferably Confluence). It seems like most tools out there require a lot of manual work or multiple tools to accomplish this task (e.g. SchemaSpy, DB Visual Architect, Confluence plugins to convert exported HTML into Confluence). I'm looking for ONE tool that will scan my Postgres tables and output a wiki-friendly Data Dictionary, allowing seamless maintenance as the DB changes, without having to update both my database and the DB schema in another tool.
There is Bob Swift's Confluence SQL Plugin, which allows you to display data derived from a SQL query in a Confluence page, e.g. as a table; perhaps it is worth a look for you?
Confluence versions 3.1.x - 4.9.x are currently supported...
The plugin is free and can be downloaded from Atlassian's Plugin Exchange: https://plugins.atlassian.com/plugins/org.swift.confluence.sql
Additional information about the plugin can be found here:
https://studio.plugins.atlassian.com/wiki/display/SQL/Confluence+SQL+Plugin
I think you'll have to script this yourself, but it's pretty easy and fun. I'll assume Python here.
I like the Confluence XML-RPC interface. For that, see http://goo.gl/KCt3z. The remote methods you care about are likely login, getPage, setPage and/or updatePage. This skeleton will look like:
import xmlrpclib

# connect to the Confluence XML-RPC endpoint and authenticate
server = xmlrpclib.Server(opts.url)
conn = server.confluence1
token = conn.login(opts.username, opts.password)

# getPage returns a page struct (a dict); append to its content and save it back
page = conn.getPage(token, 'PageSpace', page_title)
page['content'] = page['content'] + table
page = conn.updatePage(token, page, update_options)
table here is the data from PG tables. We'll build that below.
For pulling simple data from PostgreSQL, I use psycopg2 most often (also consider SQLSoup). Regardless of how you fetch data, you'll end up with a list of rows as dictionaries. The database part will probably look like:
import psycopg2, psycopg2.extras
conn = psycopg2.connect("dbname=reece")
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute('SELECT * FROM sometable')
rows = cur.fetchall()
Now you need to format the data. For simple stuff, print statements will work. For more complicated formatting, consider a templating engine like jinja2 (http://jinja.pocoo.org/). The rendering code might look like this:
from jinja2 import Template
template = Template(open(template_path).read())
table = template.render( rows = rows )
The file template_path will contain the formatting template, which might look like:
<table>
<tr>
<th>col header 1</th>
<th>col header 2</th>
</tr>
{% for row in rows|sort -%}
<tr>
<td>{{row.col1}}</td>
<td>{{row.col2}}</td>
</tr>
{% endfor %}
</table>
Note: Confluence no longer uses the wiki markup by default. You should be writing HTML.
Finally, if you want to make a page for all tables, you can look at information_schema, which contains information about the database as tables. For example:
select table_name from information_schema.tables where table_schema = current_schema();
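If you want per-column detail for a proper data dictionary, information_schema.columns has everything you need; for example:
-- one row per column, in table order: name, type, nullability
select table_name, column_name, data_type, is_nullable
from information_schema.columns
where table_schema = current_schema()
order by table_name, ordinal_position;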