select ID and sql query (count query) from table and write ID and result of count query to target table - talend

My source table has query id and a sql query.
Using Talend I need to run this CUSTOM_SQL query against the database and load a target table with the counts.
source table:
QUERY_ID|CUSTOM_SQL |
--------+----------------------------------------------------------------+
1|select count(1) as ROW_COUNT from SYSTEM_PRIVILEGE_MAP |
2|select count(1) as ROW_COUNT from OGIS_SPATIAL_REFERENCE_SYSTEMS|
3|select count(1) as ROW_COUNT from SDO_COORD_SYS |
4|select count(1) as ROW_COUNT from SDO_COORD_REF_SYS |
5|select count(1) as ROW_COUNT from SDO_PREFERRED_OPS_SYSTEM |
6|select count(1) as ROW_COUNT from SDO_TIN_PC_SYSDATA_TABLE |
expected output in target table:
QUERY_ID|QUERY_RESULT |
--------+-------------+
1|290 |
2|322 |
3|784 |
4|8484 |
5|743 |
I created a job that looks as follows but it is not complete:
tdbInput -> tFlowIterate -> tDBInput -> tMap -> tDBOutput
With the above design I'm able to run the CUSTOM_SQL, capture the result from tDBInput, but unable capture and propagate the QUERY_ID.
How do I propagate both query_id and the query result in one row to the target table. What components should I use?
Please note that each CUSTOM_SQLs always return one row and one column. So this is a very specific usecase.
I simplified my scenario by using some dummy data.
I will appreciate any help on this.
Thank you!

With your first tDBInput component, make sure you extract both QUERY_ID and CUSTOM_SQL (select QUERY_iD,CUSTOM_SQL from source_table) : you should get in global variables of your tFlowToIterate two variables (something like ((String)globalMap.get("row1.QUERY_ID")) , with row1 being the name of the flow between tDBInput and tFlowToIterate). You can also check in "outline" view if those variables appear under tFlowToIterate_1 component.
Then in tMap, you can just access this query_id global variable (with the above syntax) and push it to your target table.

Related

ADF - Dataflow, using Join to send new values

there are two tables
tbl_1 as a source data
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5
6 | A00_6
7 | A00_7
tbl_2 as destination. In this table, Submission_id is unique key.
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
tbl_1 as input value and tbl_2 as destination (sink). Expected result is only A00_5, A00_6 & A00_7 sent to tbl_2. So, this picture below is the Join
for AlterRow,
expected ouput
tbl_2
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5 -->(new)
6 | A00_6 -->(new)
7 | A00_7 -->(new)
But, output result from alterRow are all Submission_id. It should be only not equal comparison that has been stated in the alter row condition,
notEquals(DC__Submission_ID_BigInt, SrcStgDestination#{_Submission_ID}).
How to solve this problem in Azure DataFlow use 'Join' ?
I tried doing the same procedure and got the same result (all rows getting inserted). We were able to perform join in the desired way but couldn’t proceed further to get the required output. You can use the approach given below instead, which is achieved using JOINS.
In general, when we want to get records from table1 which are not present in table2, we execute the following query (in sql server).
select t1.id,t1.submission_id from t1 left outer join t2 on t1.submission_id = t2.submission_id where t2.submission_id is NULL
In the Dataflow, we were able to achieve the join successfully (same procedure as yours). Now instead using alter row transformation, I used filter transformation (to achieve t2.submission_id is NULL condition). I used the following expression (condition) to filter.
isNull(d1#submission_id) && isNull(d1#id)
Now proceed to configure the sink (tbl_2). The preview would show the records as in the below image.
Publish and run the dataflow activity in your pipeline to get the desired results.

Select by id and generate column with relationships in array

Essentially what i want to do is to get by id from "Tracks" but i also want to get the relations it has to other tracks (found in table "Remixes").
I can write a simple query that gets the track i want by id, ex.
SELECT * FROM "Tracks" WHERE id IN ('track-id1');
That gives me:
id | dateModified | channels | userId
-----------+---------------------+-----------------+--------
track-id1 | 2019-07-21 12:15:46 | {"some":"json"} | 1
But this is what i want to get:
id | dateModified | channels | userId | remixes
-----------+---------------------+-----------------+--------+---------
track-id1 | 2019-07-21 12:15:46 | {"some":"json"} | 1 | track-id2, track-id3
So i want to generate a column called "remixes" with ids in an array based on the data that is available in the "Remixes" table by a SELECT query.
Here is example data and database structure:
http://sqlfiddle.com/#!17/ec2e6/3
Don't hesitate to ask questions in case anything is unclear,
Thanks in advance
Left join the remixes and then GROUP BY the track ID and use array_agg() to get an array of the remix IDs.
SELECT t.*,
CASE
WHEN array_agg(r."remixTrackId") = '{NULL}'::varchar(255)[] THEN
'{}'::varchar(255)[]
ELSE
array_agg(r."remixTrackId")
END "remixes"
FROM "Tracks" t
LEFT JOIN "Remixes" r
ON r."originalTrackId" = t."id"
WHERE t."id" = 'track-id1'
GROUP BY t."id";
Note that, if there are no remixes array_agg() will return {NULL}. But I figured you rather want an empty array in such a case. That's what the CASE is for.
BTW, providing a fiddle is a nice move of yours! But please also include the code in the original question. The fiddle site might be down (even permanently) and that renders the question useless because of the missing information.
That's a simple outer join with a string aggregation to get the comma separated list:
SELECT t.*,
string_agg(r."remixTrackId", ', ') as remixes
FROM "Tracks" t
LEFT JOIN "Remixes" r ON r."originalTrackId" = t.id
WHERE t.id = 'track-id1'
GROUP BY t.id;
The above assumes that Tracks.id is the primary key of the Tracks table.

Recursive postgres query to view

I have the following table which models a very simple hierarchical data structure with each element pointing to its parent:
Table "public.device_groups"
Column | Type | Modifiers
--------------+------------------------+---------------------------------------------------------------
dg_id | integer | not null default nextval('device_groups_dg_id_seq'::regclass)
dg_name | character varying(100) |
dg_parent_id | integer |
I want to query the recursive list of subgroups of a specific group.
I constructed the following recursive query which works fine:
WITH RECURSIVE r(dg_parent_id, dg_id, dg_name) AS (
SELECT dg_parent_id, dg_id, dg_name FROM device_groups WHERE dg_id=1
UNION ALL
SELECT dg.dg_parent_id, dg.dg_id, dg.dg_name
FROM r pr, device_groups dg
WHERE dg.dg_parent_id = pr.dg_id
)
SELECT dg_id, dg_name
FROM r;
I now want to turn this into a view where I can choose which group I want to drill down for using a WHERE clause. This means I want to be able to do:
SELECT * FROM device_groups_recursive WHERE dg_id = 1;
And get all the (recursive) subgroups of the group with id 1
I was able to write a function (by wrapping the query from above), but I would like to have a view instead of the function.
Side-Node: I know of the shortcoming of an adjacency list representation, I cannot change it currently.

group by in postgres sql with error must appear in the GROUP BY clause or be used in an aggregate function [duplicate]

I've been migrating some of my MySQL queries to PostgreSQL to use Heroku. Most of my queries work fine, but I keep having a similar recurring error when I use group by:
ERROR: column "XYZ" must appear in the GROUP BY clause or be used in
an aggregate function
Could someone tell me what I'm doing wrong?
MySQL which works 100%:
SELECT `availables`.*
FROM `availables`
INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
GROUP BY availables.bookdate
ORDER BY availables.updated_at
PostgreSQL error:
ActiveRecord::StatementInvalid: PGError: ERROR: column
"availables.id" must appear in the GROUP BY clause or be used in an
aggregate function:
SELECT "availables".* FROM "availables" INNER
JOIN "rooms" ON "rooms".id = "availables".room_id WHERE
(rooms.hotel_id = 5056 AND availables.bookdate BETWEEN E'2009-10-21'
AND E'2009-10-23') GROUP BY availables.bookdate ORDER BY
availables.updated_at
Ruby code generating the SQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => 'availables.bookdate',
:order => 'availables.updated_at')
Expected Output (from working MySQL query):
+-----+-------+-------+------------+---------+---------------+---------------+
| id | price | spots | bookdate | room_id | created_at | updated_at |
+-----+-------+-------+------------+---------+---------------+---------------+
| 414 | 38.0 | 1 | 2009-11-22 | 1762 | 2009-11-20... | 2009-11-20... |
| 415 | 38.0 | 1 | 2009-11-23 | 1762 | 2009-11-20... | 2009-11-20... |
| 416 | 38.0 | 2 | 2009-11-24 | 1762 | 2009-11-20... | 2009-11-20... |
+-----+-------+-------+------------+---------+---------------+---------------+
3 rows in set
MySQL's totally non standards compliant GROUP BY can be emulated by Postgres' DISTINCT ON. Consider this:
MySQL:
SELECT a,b,c,d,e FROM table GROUP BY a
This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.
Postgres:
SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c
This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.
Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.
Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.
PostgreSQL is more SQL compliant than MySQL. All fields - except computed field with aggregation function - in the output must be present in the GROUP BY clause.
MySQL's GROUP BY can be used without an aggregate function (which is contrary to the SQL standard), and returns the first row in the group (I don't know based on what criteria), while PostgreSQL must have an aggregate function (MAX, SUM, etc) on the column, on which the GROUP BY clause is issued.
Correct, the solution to fixing this is to use :select and to select each field that you wish to decorate the resulting object with and group by them.
Nasty - but it is how group by should work as opposed to how MySQL works with it by guessing what you mean if you don't stick fields in your group by.
If I remember correctly, in PostgreSQL you have to add every column you fetch from the table where the GROUP BY clause applies to the GROUP BY clause.
Not the prettiest solution, but changing the group parameter to output every column in model works in PostgreSQL:
expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => Available.column_names.collect{|col| "availables.#{col}"},
:order => 'availables.updated_at')
According to MySQL's "Debuking GROUP BY Myths" http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html. SQL (2003 version of the standard) doesn't requires columns referenced in the SELECT list of a query to also appear in the GROUP BY clause.
For others looking for a way to order by any field, including joined field, in postgresql, use a subquery:
SELECT * FROM(
SELECT DISTINCT ON(availables.bookdate) `availables`.*
FROM `availables` INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056
AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
) AS distinct_selected
ORDER BY availables.updated_at
or arel:
subquery = SomeRecord.select("distinct on(xx.id) xx.*, jointable.order_field")
.where("").joins(")
result = SomeRecord.select("*").from("(#{subquery.to_sql}) AS distinct_selected").order(" xx.order_field ASC, jointable.order_field ASC")
I think that .uniq [1] will solve your problem.
[1] Available.select('...').uniq
Take a look at http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields

Postgresql query results to depend on few rows of same table

I'm working on some application, and we're using postgres as our DB. I don't a lot of experience with SQL at all, and now i encountered a problem, that i can't find answer to.
So here's a problem:
We have privacy settings stored in separate table, and accessibility of each row of data depends on few rows of this privacy table.
Basically structure of privacy table is:
entityId | entityType | privacyId | privacyType | allow | deletedAt
-------------------------------------------------------------------
5 | user | 6 | user | f | //example entry
5 | user | 1 | user_all | t |
In two words, this settings mean, that user id5 allows to have access to his data to everybody except user id6.
So i get available data by query like:
SELECT <some_relevant_fields> FROM <table>
JOIN <join>
WHERE
(privacy."privacyId"=6 AND privacy."privacyType"='user' AND privacy.allow=true)
OR (
(privacy."privacyType"='user_all' AND privacy."deletedAt" IS NOT NULL)
AND
(privacy."privacyType"='user' AND privacy."privacyId"=6 AND privacy.allow!=false)
);
I know that this query is incorrect in this form, but i want you to get idea of what i try to achieve.
So it must check for field with its type/id and allow=true, OR check that user_all is not deleted(deletedAt field is null) and there is no field restricting access with allow=false to this user.
But it seems like postgres is chaining all expressions, so it overrides privacy."privacyType"='user_all' with 'user' at the end of expression, and returns no results, or returns data even if user "blocked", because 'user_all' exist.
Is there a way to write WHERE clause to return result if AND expression is true for 2 different rows, for example in code above: (privacy."privacyType"='user_all' AND privacy."deletedAt" IS NOT NULL) is true for one row AND (privacy."privacyType"='user' AND privacy."privacyId"=6 AND privacy.allow!=false) is true for other, or maybe check for absence of row with this values.
Is this what you want?
select <some_fields> from <table> where
privacyType='user_all' AND deletedAt IS NOT NULL
union
select <some_fields> from <table> where
privacyType='user' AND privacyId=6 AND allow<>'f';
You left join the table with itself and found what element doesnt have a match using the where.
SELECT p1.*
FROM privacy p1
LEFT JOIN privacy p2
ON p1."entityId" = p2."entityId"
AND p1."privacyType" = 'user_all'
AND p1."deletedAt" IS NULL
AND p2."privacyType"='user' AND
AND p2."privacyId"= 6
AND p2.allow!=false
WHERE
p2.privacyId IS NOT NULL