Azure Data Factory LOOKUP Activity issues - azure-data-factory

I have the following pipeline with a range of activities, see image below.
I keep getting the following error from my Lookup activity:
Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Invalid column name 'updated_at'.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Invalid column name 'updated_at'.,Source=.Net SqlClient Data Provider,SqlErrorNumber=207,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=207,State=1,Message=Invalid column name 'updated_at'.,},],'
I think I know what the problem is: the Lookup isn't looping through the individual tables to find the column 'updated_at'.
But I don't understand why.
The Lookup activity 'Lookup New Watermark' has the following query:
SELECT MAX(updated_at) as NewWatermarkvalue FROM @{item().Table_Name}
The ForEach activity 'For Each Table' has the following for Items:
@activity('Find SalesDB Tables').output.value
The Lookup activity 'Find SalesDB Tables' has the following query:
SELECT QUOTENAME(table_schema)+'.'+QUOTENAME(table_name) AS Table_Name FROM information_Schema.tables WHERE table_name not in ('watermarktable', 'database_firewall_rules')
The only thing I can see wrong with the 'Lookup New Watermark' activity is that it's not looping through the tables. Can someone let me know what is needed?
Just to show the column exists I adjusted the connection from
To the following:
With that change, the Lookup was able to find the updated_at column on dbo.Products, but couldn't locate the updated_at column on the other 4 tables.
Therefore, I suspect the problem is that the Lookup activity isn't iterating over the tables automatically.

The error occurs whenever the following query runs against a table that does not have an updated_at column.
SELECT MAX(updated_at) as NewWatermarkvalue FROM @{item().Table_Name}
The Items field of the ForEach activity is set to @activity('Find SalesDB Tables').output.value, which returns the list of table names. Inside the ForEach, the query above is therefore executed once per table:
-- first iteration
SELECT MAX(updated_at) as NewWatermarkvalue FROM <table_1>
-- second iteration
SELECT MAX(updated_at) as NewWatermarkvalue FROM <table_2>
...
Whenever an iteration reaches a table that does not have an updated_at column, the query fails with the error shown in the question. The following demonstrates this.
I created 2 tables (for demonstration) called t1 and t2.
create table t1(id int, updated_at int)
create table t2(id int, up int)
I used look up activity to get the list of table names using the following query:
SELECT QUOTENAME(table_schema)+'.'+QUOTENAME(table_name) AS Table_Name FROM information_Schema.tables WHERE table_name not in ('watermarktable', 'database_firewall_rules','ipv6_database_firewall_rules')
Inside the ForEach activity (looping through @activity('lookup1').output.value), I used the same query as in the question.
SELECT MAX(updated_at) as NewWatermarkvalue FROM @{item().Table_Name}
After debugging the pipeline, we can observe that it produces the same error.
For iteration where the table is t1 (has updated_at column):
For iteration where the table is t2 (does not have updated_at column):
If you publish and run this pipeline, the pipeline will fail giving the same error.
Therefore, try to check if the updated_at column exists or not in the particular table (current for each item). If it does exist, proceed to query it.
Inside the ForEach, use a Lookup with the following query. COL_LENGTH returns the length of the column in bytes if the column exists in the table, and NULL otherwise. Use this result with an If Condition activity.
select COL_LENGTH('@{item().Table_Name}','updated_at') as column_exists
Use the following condition in the If Condition activity. If it evaluates to false, the table contains the updated_at column and we can work with it.
@equals(activity('check for column in table').output.firstRow['column_exists'],null)
The following is the debug output for the same (t1 and t2 tables)
You can continue with other required activities inside the False section of the If condition activity using above process.
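An alternative to checking each table inside the loop is to have the first Lookup return only the tables that contain the column. The following is a minimal sketch, assuming the same exclusion list as in the question; it reads INFORMATION_SCHEMA.COLUMNS and, as an added assumption, restricts the result to base tables so that views are skipped.

```sql
-- Return only tables that have an updated_at column, so that every
-- ForEach iteration can safely run MAX(updated_at).
SELECT QUOTENAME(c.TABLE_SCHEMA) + '.' + QUOTENAME(c.TABLE_NAME) AS Table_Name
FROM INFORMATION_SCHEMA.COLUMNS AS c
JOIN INFORMATION_SCHEMA.TABLES AS t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND t.TABLE_NAME = c.TABLE_NAME
WHERE c.COLUMN_NAME = 'updated_at'
  AND t.TABLE_TYPE = 'BASE TABLE'   -- assumption: views should be ignored
  AND c.TABLE_NAME NOT IN ('watermarktable', 'database_firewall_rules');
```

With this query the If Condition activity is no longer needed, at the cost of silently skipping tables that lack the column.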

Related

getting an error as more than one row returned by a subquery used as an expression when trying to insert more than one rows in table

I am trying to insert multiple values into a table from a different table in PostgreSQL and am encountering this error: [21000]: ERROR: more than one row returned by a subquery used as an expression
INSERT INTO coupon (id,entityid)
values
(select nextval('seq_coupon')),(select entityid from card where country in ('China')));
This query, select entityid from card where country in ('China'), returns multiple rows.
Any help is much appreciated.
If you want to insert rows that come from a SELECT query, don't use the values clause. The SELECT query you use for the second column's value returns more than one row which is not permitted in places where a single value is required.
To include a constant value for all newly inserted rows, just add it to the SELECT list of the source query.
INSERT INTO coupon (id, entityid, coupon_code)
select nextval('seq_coupon'), entityid, 'A-51'
from card
where country in ('China');
As a side note: when using nextval() there is no need to prefix it with a SELECT, even in the values clause, e.g.
insert into coupon (id, entityid)
values (nextval('some_seq'), ...);

Postgres: insert value from another table as part of multi-row insert?

I am working in Postgres 9.6 and would like to insert multiple rows in a single query, using an INSERT INTO query.
I would also like, as one of the values inserted, to select a value from another table.
This is what I've tried:
insert into store_properties (property, store_id)
values
('ice cream', select id from store where postcode='SW1A 1AA'),
('petrol', select id from store where postcode='EC1N 2RN')
;
But I get a syntax error at the first select. What am I doing wrong?
Note that the value is determined per row, i.e. I'm not straightforwardly copying over values from another table.
demo:db<>fiddle
insert into store_properties (property, store_id)
values
('ice cream', (select id from store where postcode='SW1A 1AA')),
('petrol', (select id from store where property='EC1N 2RN'))
Some parentheses were missing. Each row in the VALUES list has to be enclosed in parentheses, and so does each SELECT statement used as a value.
I don't know your table structure but maybe there is another error: The first data set is filtered by a postcode column, the second one by a property column...
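Another way to write this kind of insert is to list the per-row values once and join them to the lookup table. This is only a sketch, assuming that every row looks up store.id by postcode as in the question; the alias v and its column names are made up for the example.

```sql
-- Each VALUES row carries the property and the postcode to look up.
insert into store_properties (property, store_id)
select v.property, s.id
from (values
        ('ice cream', 'SW1A 1AA'),
        ('petrol',    'EC1N 2RN')
     ) as v(property, postcode)
join store s on s.postcode = v.postcode;
```

The behaviour differs slightly from the scalar-subquery form: a postcode with no matching store produces no row here instead of a row with a NULL store_id, and a postcode matching several stores produces several rows instead of an error.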

Syntax error when trying to populate column with count of unique values in another column

I'm trying to count the number of unique pool operators for every permit # in a table but am having trouble putting this value in a new column dedicated to that count.
So I have 2 tables: doh_analysis; doh_pools.
Both of these tables have a "permit" column (TEXT), but doh_analysis has about 1000 rows with duplicates in the permit column but occasional unique values in the operator column (TEXT).
I'm trying to fill a column "operator_count" in the table "doh_pools" with a count of unique values in "pooloperator" for each permit #.
So I tried the following code but am getting a syntax error at or near "(":
update doh_pools
set operator_count = select count(distinct doh_analysis.pooloperator)
from doh_analysis
where doh_analysis.permit ilike doh_pools.permit;
When I remove the "select" from before the "count" I get "SQL Error [42803]: ERROR: aggregate functions are not allowed in UPDATE".
I can successfully query a list of distinct permit-pooloperator pairs using:
select distinct permit, pooloperator
from doh_analysis;
And I can query the # of unique pooloperators per permit 1 at a time using:
select count(distinct pooloperator)
from doh_analysis
where permit ilike '52-60-03054';
But I'm struggling to insert a count of unique pairs for each permit # in the operator_count column.
Is there a way to do this?
There is certainly a better way of doing this, but I accomplished my goal by creating 2 intermediate tables and then updating the target table with values from the 2nd intermediate table, like so:
select distinct permit, pooloperator
into doh_pairs
from doh_analysis;
select permit, count(distinct pooloperator)
into doh_temp
from doh_pairs
group by permit;
select count(distinct permit)
from doh_temp;
update doh_pools
set operator_count = doh_temp.count
from doh_temp
where doh_pools.permit ilike doh_temp.permit
and doh_pools.permit is not NULL
returning count;
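For comparison, the original attempt can probably be made to work in a single statement by wrapping the subquery in parentheses, which PostgreSQL requires when a SELECT is used as the value in SET. This is a sketch that reuses the table and column names from the question; the alias a is only for readability.

```sql
-- Correlated subquery: evaluated once per row of doh_pools.
update doh_pools
set operator_count = (
    select count(distinct a.pooloperator)
    from doh_analysis a
    where a.permit ilike doh_pools.permit
);
```

One difference from the intermediate-table approach: rows in doh_pools with no matching permit are set to 0 here, because count returns 0 for an empty set, whereas the version above leaves them untouched.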

Using results from PostgreSQL query in FROM clause of another query

I have a table that's designed as follows.
master_table
id -> serial
timestamp -> timestamp without time zone
fk_slave_id -> integer
fk_id -> id of the table
fk_table1_id -> foreign key relationship with table1
...
fk_table30_id -> foreign key relationship with table30
Every time a new table is added, this table gets altered to include a new column to link. I've been told it was designed as such to allow for deletes in the tables to cascade in the master.
The issue I'm having is finding a proper solution to linking the master table to the other tables. I can do it programmatically using loops and such, but that would be incredibly inefficient.
Here's the query being used to grab the id of the table and the id of the row within that table.
SELECT fk_slave_id, concat(fk_table1_id,...,fk_table30_id) AS id
FROM master_table
ORDER BY id DESC
LIMIT 100;
The results are.
fk_slave_id | id
-------------+-----
30 | 678
25 | 677
29 | 676
1 | 675
15 | 674
9 | 673
The next step is using this data to formulate the table required to get the required data. For example, data is required from table30 with id 678.
This is where I'm stuck. If I use WITH it doesn't seem to accept the output in the FROM clause.
WITH items AS (
SELECT fk_slave_id, concat(fk_table1_id,...,fk_table30_id) AS id
FROM master_table
ORDER BY id DESC
LIMIT 100
)
SELECT data
FROM concat('table', items.fk_slave_id)
WHERE id = items.id;
This produces the following error.
ERROR: missing FROM-clause entry for table "items"
LINE x: FROM string_agg('table', items.fk_slave_id)
plpgsql is an option to use EXECUTE with format, but then I'd have to loop through each result and process it with EXECUTE.
Is there any way to achieve what I'm after using SQL or is it a matter of needing to do it programmatically?
Apologies on the bad title. I can't think of another way to word this question.
edit 1: Replaced rows with items
edit 2: Based on the responses it doesn't seem like this can be accomplished cleanly. I'll be resorting to creating an additional column and using triggers instead.
I don't think you can reference a dynamically named table like that in your FROM clause:
FROM concat('table', rows.fk_slave_id)
Have you tried building and executing that SQL from a stored procedure or function? You can create the SQL you want to execute as a string and then just EXECUTE it.
Take a look at this one:
PostgreSQL - Writing dynamic sql in stored procedure that returns a result set
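As a rough illustration of that suggestion, the dynamic lookup could be wrapped in a small PL/pgSQL function. Everything here is an assumption made for the sketch: the function name fetch_row is hypothetical, and each tableN is assumed to have an integer id column and a data column that can be cast to text.

```sql
-- Hypothetical helper, not from the original post.
create or replace function fetch_row(p_slave_id integer, p_row_id integer)
returns text
language plpgsql
as $$
declare
    result text;
begin
    -- %I quotes the generated table name as an identifier;
    -- the row id is passed as a parameter rather than concatenated.
    execute format('select data::text from %I where id = $1',
                   'table' || p_slave_id::text)
    into result
    using p_row_id;
    return result;
end;
$$;

-- Example from the question: data from table30 with id 678.
select fetch_row(30, 678);
```

Because it is an ordinary function, it could also be called once per row from a plain SELECT over the master table query, although each call still runs its own dynamic statement.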

Primary key duplicate in a table-valued parameter in stored procedure

I am using the following code to insert data via a table-valued parameter (TVP) in my stored procedure. It works when the TVP contains one record, but when it contains more than one record it raises the following error:
Violation of PRIMARY KEY constraint 'PK_ReceivedCash'. Cannot insert duplicate key in object 'Banking.ReceivedCash'. The statement has been terminated.
insert into banking.receivedcash(ReceivedCashID,Date,Time)
select (select isnull(Max(ReceivedCashID),0)+1 from Banking.ReceivedCash),t.Date,t.Time from @TVPCash as t
Your query is indeed flawed if there is more than one row in @TVPCash. The subquery that retrieves the maximum ReceivedCashID is evaluated as a constant, so the same value is used for every row of @TVPCash inserted into Banking.ReceivedCash.
I strongly suggest finding alternatives rather than doing it this way. Multiple users might run this query and retrieve the same maximum. If you insist on keeping the query as it is, try running the following:
insert into banking.receivedcash(
ReceivedCashID,
Date,
Time
)
select
(select isnull(Max(ReceivedCashID),0) from Banking.ReceivedCash)+
ROW_NUMBER() OVER(ORDER BY t.Date,t.Time),
t.Date,
t.Time
from
@TVPCash as t
This uses ROW_NUMBER to number the rows in @TVPCash and adds that number to the maximum ReceivedCashID of Banking.ReceivedCash.
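One of the alternatives hinted at above is to let the database generate the keys, for example with a sequence (available from SQL Server 2012). The sketch below is an illustration only: the sequence name Banking.ReceivedCashSeq is hypothetical, the start value of 1 is a placeholder that would have to be set above the current maximum ReceivedCashID, and the table-valued parameter is written as @TVPCash.

```sql
-- Run once, outside the stored procedure.
-- Assumption: START WITH must be adjusted to exceed the existing maximum key.
CREATE SEQUENCE Banking.ReceivedCashSeq START WITH 1 INCREMENT BY 1;

-- Inside the stored procedure: each inserted row draws its own value,
-- so concurrent callers do not receive the same key.
INSERT INTO Banking.ReceivedCash (ReceivedCashID, Date, Time)
SELECT NEXT VALUE FOR Banking.ReceivedCashSeq, t.Date, t.Time
FROM @TVPCash AS t;
```

An IDENTITY column on ReceivedCashID would achieve much the same thing, but it requires changing the table definition.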