How to Select or Filter out values with endsWith() in Mapping data flow - azure-data-factory

I would like to filter out all values that do not end with ":Q_TT".
I have tried the following:
My bronze data has a column named "pnt_name". The value in one of the rows ends with ":Q_TT", so I would expect the Exists activity to pass that row through.
Custom expression in Exists1
endsWith(':Q_TT', pnt_name)
In the future I would like the SourceData dataset to hold the filter values.
Thanks very much

You should use a Filter transformation instead of the Exists transformation for this case.
Here is the data flow, with the two test rows A and B:Q_TT:
Here is the data preview of the Filter transformation, using the expression endsWith(pnt_name, ':Q_TT'). Note that the column comes first and the suffix second, the reverse of the argument order in your attempt. You can see that A is removed and B:Q_TT is kept.
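If you later want the filter values to come from your SourceData dataset, one option is to feed them into the data flow as a parameter. A minimal sketch, assuming a string data flow parameter named suffixList holding the suffixes as a comma-separated list (both the name and the format are assumptions):

contains(split($suffixList, ','), endsWith(pnt_name, #item))

Used as the Filter condition, this keeps a row when pnt_name ends with any of the listed suffixes; #item refers to the current element of the array produced by split().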

Related

How can I alias labels (using a query) in Grafana?

I'm using Grafana v9.3.2.2 on Azure Managed Grafana.
I have a line chart whose labels contain an ID. I also have a SQL table in which the IDs are mapped to simple strings. I want to alias the IDs in the labels to the strings from that SQL table.
I am looking for a transformation to do the conversion.
There is a transformation called "Rename by regex", but that would require me to hardcode each case. Is there something similar with which I don't have to hardcode each case?
There is something similar for variables - https://grafana.com/blog/2019/07/17/ask-us-anything-how-to-alias-dashboard-variables-in-grafana-in-sql/. But I don't see anything for transformations.
Use two queries in the panel: one for the data with IDs and a second one that maps IDs to strings. Then add an Outer join transformation and join the query results into one result on the ID field.
You may also need an Organize fields transformation to rename or hide unwanted fields, so that only the right fields end up in the label.
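A minimal sketch of the mapping query, assuming a table named id_labels with columns id and label (both placeholders for your actual schema):

SELECT id, label
FROM id_labels;

After the Outer join on id, hide the id field with Organize fields so the label strings are what end up in the series labels.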

How to get counts from AlterRow transformation in Azure Data Factory

I have an AlterRow transformation that marks each row with the appropriate CRUD operation in an ADFv2 data flow. I don't see any output variables on this activity that would give me the total inserts, updates, etc. I do, however, see functions in the expression language, such as isInsert() and isUpdate(), that tell me which operation a particular row is marked with.
Would the correct way to get counts be to:
Add another output from the AlterRow transformation
Add a derived column that uses isInsert(), isUpdate(), etc. to set an operation-type column (I, U, D)
Add an aggregate that groups by this column to get the total count for each operation
When creating the aggregate, I don't see any metadata that would let me group by the CRUD operation type, so I assume I have to create this column myself. It seems like it should already be there, though, since that's the purpose of the AlterRow transformation. Am I working too hard to get these counts?
Add an aggregate after your AlterRow with no group-by and use these formulas:
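For example, leave the Group by tab empty and add aggregate columns like these (the column names here are just illustrative):

insertCount = sum(iif(isInsert(), 1, 0))
updateCount = sum(iif(isUpdate(), 1, 0))
deleteCount = sum(iif(isDelete(), 1, 0))
upsertCount = sum(iif(isUpsert(), 1, 0))

With no group-by column, the aggregate emits a single row holding the total count for each operation.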

Querying on multiple LINKMAP items with OrientDB SQL

I have a class that contains a LINKMAP field called links. This class is used recursively to create arbitrary hierarchical groupings (something like the time-series example, but not with the fixed year/month/day structure).
A query like this:
select expand(links['2017'].links['07'].links['15'].links['10'].links) from data where key='AAA'
Returns the actual records contained in the last layer of "links". This works exactly as expected.
But a query like this (note the 10,11 in the second to last layer of "links"):
select expand(links['2017'].links['07'].links['15'].links['10','11'].links) from data where key='AAA'
Returns two rows of the last layer of "links" instead:
{"1000":"#23:0","1001":"#24:0","1002":"#23:1"}
{"1003":"#24:1","1004":"#23:2"}
Using unionAll or intersect (with or without UNWIND) results in this single record:
[{"1000":"#23:0","1001":"#24:0","1002":"#23:1"},{"1003":"#24:1","1004":"#23:2"}]
But nothing I've tried (including various attempts at "compound" SELECTs) will get the expand to work as it does with the original example (i.e. return the actual records represented in the last LINKMAP).
Is there a SQL syntax that will achieve this?
Note: Even this (slightly modified) example from the OrientDB docs does not result in a list of linked records:
select expand(records) from
(select unionAll(years['2017'].links['07'].links['15'].links['10'].links, years['2017'].links['07'].links['15'].links['11'].links) as records from data where key='AAA')
Ref: https://orientdb.com/docs/2.2/Time-series-use-case.html
I'm not sure what you want to achieve, but I think it's worth trying values(), which extracts the values of the link maps so that expand() can resolve them:
select expand(links['2017'].links['07'].links['15'].links['10','11'].links.values()) from data where key='AAA'

How to assign csv field value to SQL query written inside table input step in Pentaho Spoon

I am pretty new to Pentaho, so my question might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I have used this parameter in the PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value present in my CSV file to the parameter I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?
TLDR;
I recommend using a "database join" step like in my third suggestion below.
See the last image for reference
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when launching the transformation. If you need to read the value from a CSV file, you can do it with this approach.
First, read your CSV and make sure your rows are OK.
After that, use a Select values step to keep only the columns to be used as parameters.
In the Table input step, use a placeholder (?) to determine where to place the data, and set the step to run for each row it receives from the source step (see the sketch after this list).
Just keep in mind that the order of the columns received by the Table input step (the columns coming out of Select values) is the order in which they fill the placeholders (?). This is not a problem in your case, which uses only one placeholder, but keep it in mind as you ramp up your use of Pentaho.
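For your case, the Table input SQL would look like this (a sketch; the ?::integer cast assumes the CSV value arrives as a string):

-- the ? placeholder is filled with the first column received from the previous step
select * from person
where EXTRACT(YEAR FROM birthdate) > ?::integer

Set "Insert data from step" to the step that provides the value and check "Execute for each row".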
Second idea, using a Database Lookup
This is another approach. You can't customize the query sent to the database, but you may get better performance because you can set the "Enable cache" flag, and if you don't need to use a function in your WHERE clause, this approach is really recommended.
Third idea, using a Database Join
That is my recommended approach if you need a function in your WHERE clause. It looks a lot like the Table input approach, but you can skip the Select values step and choose which columns to use (even repeating the same column several times), and you can enable the "Outer join" flag, which passes rows through even when the query returns no result. A sketch follows below.
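The SQL in the Database join step looks just like the Table input version; the difference is that you pick which incoming fields fill the placeholders in the step's parameter grid. A sketch, assuming the incoming CSV field is the year value to compare against:

select * from person
where EXTRACT(YEAR FROM birthdate) > ?::integer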
ProTip: If the transformation is running too slowly, try using multiple copies of the step (documentation here) and, obviously, make sure the table has the appropriate indexes in place.
Yes, there is a way of assigning the value directly, without the use of a parameter. Do as follows.
Use a "Block this step until steps finish" step to halt the Table input step until the CSV file input step completes.
Here is how you configure each step.
Note:
The Postgres query should be select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check "Execute for each row" and "Replace variables in script" in the Table input step.
Select only the birthdate column in the CSV file input step.

SSRS: Call a dataset from a textbox with a parameter

I have a dataset whose query has a WHERE clause like this: where field1 like @parameter1. parameter1 is a string defined as a parameter in Dataset1. I have various textboxes that call the dataset with expressions like =First(Fields!field_xx.Value, "Dataset1"). For each textbox I would like to specify a different value for @parameter1 when it calls Dataset1. How can I modify the expression in each textbox so that each one calls Dataset1 with its own hardcoded value for @parameter1?
The query:
SELECT TOP (1) job.job_id, job.originating_server, job.name, job.enabled, job.description, job.start_step_id, job.category_id, job.owner_sid, job.notify_level_eventlog,
job.notify_level_email, job.notify_level_netsend, job.notify_level_page, job.notify_email_operator_id, job.notify_netsend_operator_id, job.notify_page_operator_id,
job.delete_level, job.date_created, job.date_modified, job.version_number, job.originating_server_id, job.master_server, activity.session_id, activity.job_id AS Expr1,
activity.run_requested_date, activity.run_requested_source, activity.queued_date, activity.start_execution_date, activity.last_executed_step_id,
activity.last_executed_step_date, activity.stop_execution_date, activity.job_history_id, activity.next_scheduled_run_date, steps.step_name
FROM sysjobs_view AS job INNER JOIN
sysjobactivity AS activity ON job.job_id = activity.job_id INNER JOIN
sysjobsteps AS steps ON activity.last_executed_step_id = steps.step_id AND activity.job_id = steps.job_id
WHERE (job.name LIKE 'Actual Job Name')
ORDER BY activity.start_execution_date DESC
It is not possible to call a dataset with different parameters in the same report execution. Every execution and rendering of the report fetches each dataset only once.
This means that you have to construct your dataset in a way so that it returns all the data you need, to populate each of your textboxes.
Depending on your data model, you may want to add more columns to your dataset, or return the data in multiple rows. If you have multiple rows, you can use the Lookup function in an expression to pick out the right row in each individual textbox.
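For example, with one row per job in the dataset, a textbox expression along these lines would pick the value for a specific job (the job name and the result field are placeholders):

=Lookup("Actual Job Name", Fields!name.Value, Fields!start_execution_date.Value, "Dataset1")

This searches Dataset1 for the first row whose name field matches and returns that row's start_execution_date.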
Perhaps if you elaborated a little more on what your report should look like and how the data you are fetching is structured, it would be possible to give a better answer on how to solve your problem with a single dataset.