Talend tUniqRow Condition Specification

I have been working with Talend lately, and I was wondering whether I can specify static conditions on the tUniqRow component. As far as I know, in this component you select the columns to check (normally keys), and the duplicates are determined based on that selection. What I am looking for is a way to, after choosing a column, specify values in that column so that only duplicates of those values go to the duplicates output. For example:
Say I have this list:
ID | Status
 1 | 1
 1 | 2
 1 | 3
 1 | 4
 1 | 5
 1 | 6
 1 | 5
 1 | 6
What I would like is, after choosing Status, to specify the values 1 and 5, so that only duplicates of those two values are routed to the duplicates output.

tFileInputDelimited -----> tUniqRow ---(duplicates)--> tFilterRow ----> tLogRow
The input file has both the ID and Status columns.
In tUniqRow, take the duplicates flow based on the STATUS column.
Then in tFilterRow, use the Equals operator on the STATUS column, give the values "1" and "5", and select OR as the logical operator.
Hope this helps.
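For reference, the job above is roughly equivalent to the following SQL (the table name input_data is hypothetical); the duplicates flow itself would carry only the repeat occurrences:

-- Rough SQL equivalent of the tUniqRow duplicates flow plus the
-- tFilterRow condition; input_data is a hypothetical table name.
SELECT id, status
  FROM input_data
 WHERE status IN (1, 5)                  -- the tFilterRow condition
   AND status IN (SELECT status          -- STATUS values that occur more than once
                    FROM input_data
                   GROUP BY status
                  HAVING COUNT(*) > 1);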

Related

PostgreSQL: More efficient way of joining tables based on multiple address fields

I have a table that lists two connected values, ID and TaxNumber (TIN), and that looks somewhat like this:
IDTINMap
ID         | TIN
-----------+--------
1234567890 | 654321
3456321467 | 986321
8764932312 | 245234
An ID can map to multiple TINs, and a TIN might map to multiple IDs, but there is a Unique constraint on the table for an ID, TIN pair.
This list isn't complete, and the table has about 8000 rows. I have another table, IDListing that contains metadata for about 9 million IDs including name, address, city, state, postalcode, and the ID.
What I'm trying to do is build an expanded ID - TIN map. Currently I'm doing this by first joining the IDTINMap table with IDListing on the ID field, which gives something that looks like this in a CTE that I'll call Step1 right now:
ID         | TIN    | Name      | Address        | City          | State | Zip
-----------+--------+-----------+----------------+---------------+-------+------
1234567890 | 654321 | John Doe  | 123 Easy St    | Seattle       | WA    | 65432
3456321467 | 986321 | Tyler Toe | 874 W 84th Ave | New York      | NY    | 48392
8764932312 | 245234 | Jane Poe  | 984 Oak Street | San Francisco | CA    | 12345
Then I go through again and join the IDListing table again, joining Step1 on address, city, state, zip, and name all being equal. I know I could do something more complicated like fuzzy matching, but for right now we're just looking at exact matches. In the join I preserve the ID in step 1 as 'ReferenceID', keep the TIN, and then have another column of all the matching IDs. I don't keep any of the address/city/state/zip info, just the three numbers.
Then I can go back and insert all the distinct pairs into a final table.
I've tried this with a query, and it works and gives me the desired result. However, the query is slower than desired. I'm used to joining on columns that I've indexed (like ID or TIN), but it's slow to join on all of the address fields. Is there a good way to improve this? Joining on each field individually is faster than joining on a CONCAT() of all the fields (I have tried this). I'm just wondering if there is another way I can optimize this.
Make the final result a materialized view. Refresh it when you need to update the data (every night? every three hours?). Then use this view for your normal operations.
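A minimal sketch of that approach, assuming PostgreSQL; the view and column names are placeholders, and step1 restates the first join described in the question:

CREATE MATERIALIZED VIEW expanded_id_tin_map AS
WITH step1 AS (           -- the first join described above
    SELECT m.id, m.tin, l.name, l.address, l.city, l.state, l.postalcode AS zip
      FROM idtinmap m
      JOIN idlisting l ON l.id = m.id
)
SELECT s.id AS reference_id, s.tin, l2.id AS matched_id
  FROM step1 s
  JOIN idlisting l2
    ON l2.name = s.name
   AND l2.address = s.address
   AND l2.city = s.city
   AND l2.state = s.state
   AND l2.postalcode = s.zip;

-- Refresh on whatever schedule fits the data (nightly, every few hours):
REFRESH MATERIALIZED VIEW expanded_id_tin_map;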

How to show an error message using a trigger in SAP HANA

I'm trying to create a trigger such that whenever I insert a new record into the Sales table, the Product table updates its "Inventory" based on the Sales table's "quantity":
Product table        Sales table
P_ID | QTY           P_ID | QTY
   1 | 10               1 | 5
   2 | 15
Code:
create trigger "KABIL_PRACTICE"."SALES_TRIGGER"
after insert on "KABIL_PRACTICE"."SALES"
referencing new row as newrow for each row
begin
    update "KABIL_PRACTICE"."Inventory"
       set "Inventory" = "Inventory" - :newrow.QTY
     where "P_ID" = :newrow.P_ID;
end;
I get the expected result when I insert a record into the Sales table with P_ID 1 and quantity 5:
Updated Product table   Sales table
P_ID | QTY              P_ID | QTY
   1 | 5                   1 | 5
   2 | 15                  1 | 5
But if I insert a record into the Sales table again with P_ID 1 and quantity 6, the sales quantity is more than the available inventory quantity, and the inventory goes to a negative value...
Updated Product table   Sales table
P_ID | QTY              P_ID | QTY
   1 | -1                  1 | 5
   2 | 15                  1 | 5
                           1 | 6
I just want to be notified when the sales order quantity is higher than the available inventory quantity, and the inventory should not go to negative values... Is there any way to do this?
I tried this code:
create trigger "KABIL_PRACTICE"."SALES_UPDATE_TRIGGER"
before insert on "KABIL_PRACTICE"."SALES"
referencing new row as newrow for each row
begin
    declare available integer;
    -- look up the current stock level for the product being sold
    select "Inventory" into available
      from "KABIL_PRACTICE"."Inventory"
     where "P_ID" = :newrow.P_ID;
    if :available >= :newrow.QTY then
        update "KABIL_PRACTICE"."Inventory"
           set "Inventory" = "Inventory" - :newrow.QTY
         where "P_ID" = :newrow.P_ID;
    else
        -- reject the insert instead of letting the stock go negative
        signal sql_error_code 10001
            set message_text = 'Sales quantity exceeds available inventory';
    end if;
end;
The problem you have here is a classic. Usually the two business processes "SALES" and "ORDER FULFILLMENT" are separated, so the act of selling something would not have an immediate effect on the stock level. Instead, the order fulfilment could actually use other resources (e.g. back ordering from another vendor or producing more). That way the sale would be de-coupled from the current stock levels.
Anyhow, if you want to keep it a simple dependency of "only-sell-whats-available-right-now" then you need to consider the following:
multiple sales could be going on at the same time
what to do with sales that can only be partly fulfilled, e.g. should all available items be sold, or should the whole order be treated as impossible to fulfil?
To address the first point, again, different approaches can be taken. The easiest is probably to set a lock on the inventory records you are interested in while you make the decision(s) whether to process the order (and the inventory transaction) or not.
SELECT QTY FROM "KABIL_PRACTICE"."Inventory" WHERE P_ID = :P_ID FOR UPDATE;
This statement will acquire a lock on the relevant row(s) and return, or wait until the lock becomes available if another session already holds it.
Once the quantity of an item has been retrieved, you can call the further business logic (fulfil the order completely, fulfil it partly, or decline it).
Each of these application paths could be a stored procedure grouping the necessary steps.
By COMMITing the transaction the lock will get released.
As a general remark: this should not be implemented as triggers. Triggers should generally not be involved in application paths that could lead to locking situations in order to avoid system hang situations. Also, triggers don't really allow for a good understanding of the order in which statements get executed, which easily can lead to unwanted side effects.
Rather than triggers, stored procedures can provide an interface for applications to work with your data in a meaningful and safe way.
E.g.:
procedure ProcessOrder
    for each item in order
        check stock and lock entry
        (depending on business logic, either:)
            subtract all available items from stock to match the order as much as possible
            OR: only fulfil order items that can be fully provided, mark the other items as not available, and reduce the sales order SUM
            OR: decline the whole order
    COMMIT;
Your application can then simply call the procedure and retrieve the outcome data (e.g. current order data, status, ...) without having to worry about how the tables need to be updated.
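A minimal SQLScript sketch of the decline-the-whole-order path, assuming the tables from the question; the procedure name and error code are illustrative:

create procedure "KABIL_PRACTICE"."PROCESS_ORDER" (
    in p_id  integer,
    in p_qty integer
)
as
begin
    declare available integer;

    -- lock the inventory row while deciding what to do with the order
    select "Inventory" into available
      from "KABIL_PRACTICE"."Inventory"
     where "P_ID" = :p_id
       for update;

    if :available >= :p_qty then
        update "KABIL_PRACTICE"."Inventory"
           set "Inventory" = "Inventory" - :p_qty
         where "P_ID" = :p_id;
        insert into "KABIL_PRACTICE"."SALES" values (:p_id, :p_qty);
        commit;   -- releases the lock
    else
        rollback; -- releases the lock; the order is declined
        signal sql_error_code 10001
            set message_text = 'Order declined: not enough inventory';
    end if;
end;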

Is it possible to use different forms and create one row of information in a table?

I have been searching for a way to combine two or more rows of one table in a database into one row.
I am currently creating multiple web-based forms that connect to one table in my database. Is there any way to write some MySQL and PHP code that will take separate form submissions and put them into one row of the database instead of multiple rows?
Here is an example of what is going into the database:
This is all in one table with three rows.
Form_ID represents the three different forms that I used to insert the data into the table.
Form_ID | Lot_ID | F_Name | L_Name | Date       | Age
--------+--------+--------+--------+------------+-----
      1 |      1 | John   | Evans  | NULL       | NULL
      2 |   NULL | NULL   | NULL   | 2017-07-06 | NULL
      3 |   NULL | NULL   | NULL   | NULL       | 22
This is an example of three separate forms going into one table. Every time the submit button is hit, the data is simply inserted into the next row.
I need some sort of join or update once the submit button is hit to replace the preceding NULL values.
Here is what I want to do after the submit button is hit:
I want it all combined into one row, still in the same table.
Form_ID still identifies the three separate forms, but now in a single row.
Form_ID | Lot_ID | F_Name | L_Name | Date       | Age
--------+--------+--------+--------+------------+-----
      1 |      1 | John   | Evans  | 2017-07-06 | 22
My goal is that once one form has been submitted, the next, different form submission replaces the NULL values in the row above it, and so on, to create a single row of information.
I found a way to solve this issue. I used UPDATE tablename SET columnname = newValue WHERE Form_ID = newID.
This way, when I want to update rows that have blank spaces, the statement finds the matching IDs.
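For example, a later submission could fill in the Date column of the already-existing row instead of inserting a new one (the table name form_data is hypothetical, following the columns above):

-- Hypothetical example: a later form submission fills in the Date column
-- of the existing row instead of inserting a new row.
UPDATE form_data
   SET `Date` = '2017-07-06'
 WHERE Form_ID = 1;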

Will huge table entries slow down query performance?

Let's say I have a table persons that looks like this:
|id | name | age |
|---|------|-----|
|1 |foo |21 |
|2 |bar |22 |
|3 |baz |23 |
and add a new column history where I store a big JSON blob of, let's say ~4MB.
|id | name | age | history |
|---|------|-----|----------|
|1 |foo |21 |JSON ~ 4MB|
|2 |bar |22 |JSON ~ 4MB|
|3 |baz |23 |JSON ~ 4MB|
Will this negatively impact queries against this table overall?
What about queries like:
SELECT name FROM persons WHERE ... (Guess: This won't impact performance)
SELECT * FROM persons WHERE ... (Guess: This will impact performance as the database needs to read and send the big history entry)
Are there any other side effects like various growing caches etc. that could slow down database performance overall?
The JSON attribute will not be stored in the table itself, but in the TOAST table that belongs to the table, which is where all variable-length entries above a certain size are stored (and compressed).
Queries that do not read the JSON values won't be affected at all, since the TOAST entries won't even be touched. Performance will be affected only if you read the JSON value, mostly because of the additional data read from storage and transmitted to the client; of course, the additional data will also reside in the database cache and compete with other data there.
So your guess is right.
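Incidentally, you can look up the TOAST table that backs persons with a PostgreSQL catalog query:

-- Show the TOAST table backing the persons table (PostgreSQL catalog query)
SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class
 WHERE relname = 'persons';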
Depending on how many transactions, and which types of transactions (create, read, update, delete), use this table, there could be performance issues.
If you are updating the history a lot, you will be running many update transactions, each of which also has to maintain the table's indexes.
Say the persons table is queried every time a user logs in, and the login also updates that user's history: you are doing a select and an update. If this happens a lot, it causes a great deal of index maintenance and could cause issues when users are logging on while other users are also updating their history.
A better option would be to have a separate table for person updates, with a relation to the persons table, as sketched below.
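A hypothetical sketch of that split (PostgreSQL-style syntax; the names are illustrative). The large history blob lives in its own table and is only joined in when actually needed:

-- Hypothetical schema split: keep the bulky history out of persons
CREATE TABLE person_history (
    person_id integer REFERENCES persons (id),
    history   jsonb
);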

Using filtered results as field for calculated field in Tableau

I have a table that looks like this:
+------------+-----------+---------------+
| Invoice_ID | Charge_ID | Charge_Amount |
+------------+-----------+---------------+
| 1 | A | $10 |
| 1 | B | $20 |
| 2 | A | $10 |
| 2 | B | $20 |
| 2 | C | $30 |
| 3 | C | $30 |
| 3 | D | $40 |
+------------+-----------+---------------+
In Tableau, how can I have a field that SUMs the Charge_Amount for the Charge_IDs B, C and D, where the invoice has a Charge_ID of A? The result would be $70.
My datasource is SQL Server, so I was thinking that I could add a field (called Has_ChargeID_A) to the SQL Server Table that tells if the invoice has a Charge_ID of A, and then in Tableau just do a SUM of all the rows where Has_ChargeID_A is true and Charge_ID is either B, C or D. But I would prefer if I can do this directly in Tableau (not this exactly, but anything that will get me to the same result).
Your intuition is steering you in the right direction. You do want to filter to only Invoices that contain row with a Charge_ID of A, and you can do this directly in Tableau.
First place Invoice_ID on the filter shelf, then select the Condition tab for the filter. Then select the "By formula" option on the condition tab and enter the formula you wish to use to determine which invoice_ids are included by the filter.
Here is a formula for your example:
count(if Charge_ID = 'A' then 'Y' end) > 0
For each data row, it will calculate the value of the expression inside the parentheses, and then only include invoice_ids with at least one non-null value for the internal expression. (The implicit else for the if statement "returns" null.)
The condition tab for a dimension field equates to a HAVING clause in SQL.
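Roughly, the whole approach corresponds to this SQL (the table name charges is hypothetical); for the sample data it returns $70:

-- Sum the B/C/D charges on invoices that also contain an A charge;
-- the HAVING clause plays the role of the condition tab.
SELECT SUM(Charge_Amount)
  FROM charges
 WHERE Charge_ID IN ('B', 'C', 'D')
   AND Invoice_ID IN (SELECT Invoice_ID
                        FROM charges
                       GROUP BY Invoice_ID
                      HAVING COUNT(CASE WHEN Charge_ID = 'A' THEN 1 END) > 0);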
If condition formulas get complex, it's often a good idea to define them with a calculated field -- or a combination of several simpler calculated fields -- just to keep things manageable.
Finally, if you end up working with sets of dimensions like this frequently, you can define them as sets. You can still drop sets on the filter shelf, but you can also reuse them in other ways: testing set membership in a calculated field (like a SQL IN clause), or creating new sets using intersection and union operators. You can think of sets as named filters, such as the set of invoices that contain a type A charge.