Find duplicates in Talend

I have a flow of data like this:
2014-03-04
2014-03-04
2014-03-04
2014-03-04
2014-03-04
2014-03-04
2014-03-04
2014-03-04
2014-03-05
2014-03-05
2014-03-05
2014-03-05
2014-03-05
2014-03-05
2014-03-05
2014-03-05
When I try to find duplicates in these 16 rows, none are detected.
I am using tUniqRow. The flow has only one column, Start_Time, and the dates are declared as String.
After tUniqRow, all 16 dates are passed on to tLogRow, whereas only 2 should pass.

You need to:
Use the "Uniques" output link of tUniqRow.
Tick the "Key attribute" box for the column you want to be unique.
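As a rough illustration of why the key attribute matters, here is a minimal Python sketch (not Talend code; the function name `split_unique` is made up for this example). It assumes the component sends the first row seen for each key value to one output and every later repeat to another, which is how I understand tUniqRow to behave.

```python
def split_unique(rows, key):
    """Split rows into first occurrences and repeats, based on one key column."""
    seen = set()
    uniques, duplicates = [], []
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.append(row)   # key value already seen: a repeat
        else:
            seen.add(value)
            uniques.append(row)      # first time this key value appears
    return uniques, duplicates


# The 16 rows from the question: 8 x 2014-03-04, then 8 x 2014-03-05.
rows = [{"Start_Time": "2014-03-04"}] * 8 + [{"Start_Time": "2014-03-05"}] * 8
uniques, duplicates = split_unique(rows, "Start_Time")
print(len(uniques), len(duplicates))
```

With no key column selected there is nothing to compare rows on, which would explain all 16 rows passing through.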

Related

How to filter rows based on a condition and if the condition isn't met, grab another row in Talend?

It was hard to think of a title for this question, so hopefully that did make sense.
I will explain further. I have a flow of data from an Excel file and each row has one of two words in the last column. It will either contain "Open" or "Current".
So let's say I have an input that looks like this:
NAME | SSN | TYPE
John | 12345| Current
Katy | 99999| Current
Sam | 33333| Current
John | 12345| Open
Cody | 55555| Open
The goal is to keep each person only once. Each person is uniquely identified by their SSN. I want to grab the Open row if both Open and Current exist for that person. If only Current exists, then grab that.
So the final output should look like this:
NAME | SSN | TYPE
Katy | 99999| Current
Sam | 33333| Current
John | 12345| Open
Cody | 55555| Open
NOTE: As you can see, the first entry for John has been removed since he had an Open row.
I have attempted this already, but it is sloppy and I figure there must be a better way. Here is an image of what I have done: [image: Talend flow]
Here's how you can do it:
First sort the data by Name, and by Type descending (this is important so that, for each person, the Open record comes first). Then filter it in the tMap like this:
Numeric.sequence(row2.name, 1, 1) == 1
This only lets a record through the first time we see its name.
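The same "sort, then keep the first row per key" idea can be sketched in plain Python. This is only an illustration under my own assumptions: it keys on SSN rather than name (the question says SSN is the unique id), and the function name `pick_one_per_person` is hypothetical.

```python
rows = [
    {"name": "John", "ssn": "12345", "type": "Current"},
    {"name": "Katy", "ssn": "99999", "type": "Current"},
    {"name": "Sam", "ssn": "33333", "type": "Current"},
    {"name": "John", "ssn": "12345", "type": "Open"},
    {"name": "Cody", "ssn": "55555", "type": "Open"},
]


def pick_one_per_person(rows):
    # Sort by type descending first ("Open" > "Current" alphabetically),
    # then by SSN. Python's sort is stable, so within each SSN the Open
    # row stays ahead of the Current row.
    ordered = sorted(rows, key=lambda r: r["type"], reverse=True)
    ordered = sorted(ordered, key=lambda r: r["ssn"])
    seen = set()
    result = []
    for row in ordered:
        if row["ssn"] not in seen:   # same role as the sequence == 1 test
            seen.add(row["ssn"])
            result.append(row)
    return result


result = pick_one_per_person(rows)
for row in result:
    print(row["name"], row["ssn"], row["type"])
```

The output comes back ordered by SSN rather than in the original row order; add a final sort if the order matters.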

Tableau Calculation with a Filter in place

I'm trying to get the count of calls received by an individual, both the calls they answered initially and any calls transferred to them by another person.
My workbook has a filter based on the individual who first answers the phone. Now I need to add in any calls that were transferred over. These calls are usually answered by another individual.
This is what I would like to see
John: #calls =3 | #calls Transferred to =1
Where the data would show this:
Incoming | Transferred to
John | James
John | NULL
John | NULL
James | John
Does anybody know a way to get the last call attributed to John with the filter in place?
My filter of John would give me the three calls, but would remove the transferred call.
Try creating two calculated fields using the following formulas, and then use them as filters.
Filter - John
IF CONTAINS([Incoming], "John") THEN "John"
ELSEIF CONTAINS([Transferred to], "John") THEN "John"
END
Filter - James
IF CONTAINS([Incoming], "James") THEN "James"
ELSEIF CONTAINS([Transferred to], "James") THEN "James"
END
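To see what such a filter keeps, here is a small Python sketch of the same logic (not Tableau code). It uses exact matching instead of CONTAINS, and the function names are made up for the example. A row is kept when the person appears in either column, and the two counts are then taken from the kept rows.

```python
# (incoming, transferred_to) pairs from the question; None stands for NULL.
calls = [
    ("John", "James"),
    ("John", None),
    ("John", None),
    ("James", "John"),
]


def rows_for(calls, person):
    """Keep a row if the person answered it or had it transferred to them."""
    return [c for c in calls if person in (c[0], c[1])]


def call_counts(calls, person):
    kept = rows_for(calls, person)
    answered = sum(1 for incoming, _ in kept if incoming == person)
    transferred_to = sum(1 for _, target in kept if target == person)
    return answered, transferred_to


answered, transferred_to = call_counts(calls, "John")
print(f"John: #calls = {answered} | #calls transferred to = {transferred_to}")
```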

Crystal Reports Rows in Group

I have a report grouped by Supplier. In the details section for this group, there are customer names with other fields of information. The problem is that a customer may be listed several times, but I only want one record to show per customer.
Supplier
John Doe 10/15 $25.00 Eggs
John Doe 10/15 $29.00 Milk
Susan Weva 10/12 $15.00 Corn
Susan Weva 10/18 $11.00 Bread
What I want is one complete row for John Doe and one for Susan Weva. Any idea as to how I can do this? I tried to suppress each field, but that did not seem to work.
There is one way (which I use in these situations) that works.
So your data looks like this
John Doe 10/15 $25.00 Eggs
John Doe 10/15 $29.00 Milk
Susan Weva 10/12 $15.00 Corn
Susan Weva 10/18 $11.00 Bread
From what I understand, you want your output to look like
John Doe some columns
Susan Weva some columns
To show a customer only once per group, put the fields in the group header or group footer. In your case all the other columns differ from row to row, so you have to "summarize" those fields. For the price field this is easy: summarize by group and you get the amount for each person. The date and the last column are the problem, since they cannot be summed; a Cross-Tab is worth learning for that case. If you only need the sum of the price, put the fields in the group header or group footer and they will be shown only once per group.
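As a rough picture of what "summarize by group" produces, here is a minimal Python sketch (not Crystal Reports syntax; `total_per_customer` is a made-up name). It collapses the detail rows from the question to one total per customer.

```python
from collections import defaultdict

# (customer, date, price, item) detail rows from the question.
details = [
    ("John Doe", "10/15", 25.00, "Eggs"),
    ("John Doe", "10/15", 29.00, "Milk"),
    ("Susan Weva", "10/12", 15.00, "Corn"),
    ("Susan Weva", "10/18", 11.00, "Bread"),
]


def total_per_customer(details):
    """Return one summed price per customer, like a group-level summary."""
    totals = defaultdict(float)
    for customer, _date, price, _item in details:
        totals[customer] += price
    return dict(totals)


totals = total_per_customer(details)
for customer, amount in totals.items():
    print(customer, amount)
```

The date and item columns are dropped here because there is no single value to show per customer, which is the same difficulty described above.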
Hope it helps

Display all orders in which a certain product was purchased - Crystal Reports

I've searched all over many sites today and I am unable to find an answer to this. I am trying to display all orders in which a certain product (scarf) was purchased. For Example:
Order #1
Hat $3.00
Scarf $5.00
Order #2
Puzzel $2.00
Order #3
Scarf $5.00
With this example, I'd like to display records #1 and #3, in which a scarf was purchased, but also include "Hat" that was purchased along with the scarf in Order #1...(while excluding Order #2)
Output should be:
Order #1
Hat $3.00
Scarf $5.00
Order #3
Scarf $5.00
I've tried using InStr functions to filter this information out, as well as looking for various formulas, but I cannot seem to figure this out. I appreciate everyone's time!
John
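Create a SQL Expression field:
// {%order_has_scarf}
// assumes table in main report is `orders`
// assumes the item name is stored in a column called `item` (hypothetical; adjust to your schema)
(
SELECT count(1) total
FROM orders o
WHERE o.order_id = orders.order_id
AND o.item = 'Scarf'
)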
Alter record-selection formula:
AND {%order_has_scarf} > 0
Create a group on orders first, then place the fields (Hat, etc.) in the details section and write a suppress condition for the details and the group, like:
if Order="PUZZEL"
then true
else false
With this condition, if a puzzle is encountered, both the details and the group are suppressed.
Thanks everyone, I actually figured out how to do this. I added another instance of the Order Details table to my Crystal report, filtered that table by the Item ID of the scarf, and then linked it to the original order table. That way, only orders containing this particular Item ID are passed to the original table.
Hope this makes sense!
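For anyone who finds the second-table trick hard to picture, here is a minimal Python sketch of the idea (not Crystal Reports code; it matches on item name instead of Item ID, and `orders_containing` is a made-up name). One pass finds the orders that contain the item, and a second pass keeps every line of those orders.

```python
# (order_id, item, price) lines from the question.
order_details = [
    (1, "Hat", 3.00),
    (1, "Scarf", 5.00),
    (2, "Puzzel", 2.00),
    (3, "Scarf", 5.00),
]


def orders_containing(order_details, item):
    """Return every line of every order that has at least one line for `item`."""
    # Plays the role of the second, filtered instance of the details table.
    matching_ids = {order_id for order_id, name, _ in order_details if name == item}
    # Plays the role of the link back to the original table.
    return [line for line in order_details if line[0] in matching_ids]


kept = orders_containing(order_details, "Scarf")
for order_id, name, price in kept:
    print(f"Order #{order_id}: {name} ${price:.2f}")
```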

MicroStrategy - Dynamic Attribute with join

In our MicroStrategy 9.3 environment, we have a star schema that has multiple date dimensions. For this example, assume we have an order_fact table that has two dates, order_date and ship_date, and an invoice_fact table with two dates, invoice_date and actual_ship_date. We have a date dimension that holds "calendar" related data. We have set up each date with an alias, per the MicroStrategy Advanced Data Warehousing guide, which is MicroStrategy's recommended approach to handling role-playing dimensions.
Now for the problem. The aliased dates allow for users to create reports specific to the date that has been aliased. However, since the dates have been aliased, MicroStrategy won't combine "dates" as they appear to it to be different. Case in point, I can't easily put on a report that shows order quantities and invoice quantities by order_date and invoice_date as it results in a cross join.
The solution we have been talking about internally is creating new attributes called order_fact_date and invoice_fact_date. These dates would be determined at runtime via the pseudocode below:
case when <user picked date> = 'order date'
then order_date
else ship_date end as order_fact_date
case when <user picked date> = 'invoice date'
then invoice_date
else actual_ship_date end as invoice_fact_date
Our thinking was then, we could have a "general" date dimension mapped to both dates which would enable MicroStrategy to leverage the same table in the joins and thereby eliminating the cross join issue.
Clear as mud?
Edit 1: Changed "three dates" to "two dates".
If I have understood your problem correctly, you have created multiple date attributes (with different logical meanings), and they are mapped on different aliases of the calendar table.
As long as users use a single fact table in their reports there is no problem, but when they use metrics/facts from both sales and invoices the results are multiplied, because "Order Date" and "Invoice Date" are different attributes.
Your SQL looks something like:
...
FROM order_fact a11
INNER JOIN invoice_fact a12
INNER JOIN lu_calendar a13
ON a11.order_date = a13.date_id
INNER JOIN lu_calendar a14
ON a12.invoice_date = a14.date_id
...
As usual, there are several possible solutions, not all of them very straightforward.
Option 1 - Single date attribute
You mention this possibility in your question: instead of using "Order Date" and "Invoice Date", just use a single "Date" attribute and teach users to use it. You can call it "Reporting Date" or "Operation Date" if this makes life easier for them.
The SQL you should get is something like:
...
FROM order_fact a11
INNER JOIN invoice_fact a12
ON a11.order_date = a12.invoice_date
INNER JOIN lu_calendar a13 -- Only one join
ON a11.order_date = a13.date_id -- because the date is the same
...
Option 2 - We need to keep the two date attributes!
Map "Order Date" and "Invoice Date" on the same alias of your calendar table. This can usually cause problems in MicroStrategy, because the two attributes will be joined together on the same lookup table [more on this later], but in your case this is exactly what you are looking for.
With this solution you should get an SQL like this:
...
FROM order_fact a11
INNER JOIN invoice_fact a12 -- Hey! this is again a cross join!
INNER JOIN lu_calendar a13
ON a11.order_date = a13.date_id -- Relax man, we got you covered.
AND a12.invoice_date = a13.date_id -- Yes, we do it!
...
This is nice, but it works only if you have description forms coming from the calendar table (this is not always the case with dates, because the ID is usually also the actual value that you show on your reports). If there is no join with the calendar lookup, your SQL will again end up with duplicated results:
...
FROM order_fact a11 -- Notice no join column between the two facts
INNER JOIN invoice_fact a12 -- and no other conditions will help to join them
...
For this reason, if you want to keep the two attributes separate, besides mapping them on the same lookup you should also:
Create a hidden attribute (let's call it "Date_on_fact"), map it on the fact tables and the calendar table, and make it a child of both "Order Date" and "Invoice Date".
Un-map "Order Date" and "Invoice Date" from the fact tables.
The idea here is to force MicroStrategy to always include the calendar lookup table in the SQL:
...
FROM order_fact a11
INNER JOIN invoice_fact a12 -- This is like the previous one
INNER JOIN lu_calendar a13 -- But I'm back to help you
ON a11.order_date = a13.date_id
AND a12.invoice_date = a13.date_id
...
The attribute "Date_on_fact" can actually be hidden and users don't need to put it in their reports, but MicroStrategy will use it to go from the parent attributes to the fact table.
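To make the difference concrete, here is a small Python sketch (not MicroStrategy output; the dates and quantities are invented sample data). It compares an unconstrained pairing of the two fact tables with a pairing that goes through one shared calendar row, as in the SQL above.

```python
from itertools import product

# Invented sample data: (date, quantity) per fact table.
order_fact = [("2014-03-04", 10), ("2014-03-05", 20)]
invoice_fact = [("2014-03-04", 7), ("2014-03-05", 9)]
lu_calendar = ["2014-03-04", "2014-03-05"]

# No join condition between the facts: every order row meets every invoice row.
cross_join = list(product(order_fact, invoice_fact))

# Both facts joined to the same calendar row: only matching dates survive.
via_calendar = [
    (date_id, order_qty, invoice_qty)
    for date_id in lu_calendar
    for order_date, order_qty in order_fact
    if order_date == date_id
    for invoice_date, invoice_qty in invoice_fact
    if invoice_date == date_id
]

print(len(cross_join), "rows from the cross join")
print(len(via_calendar), "rows through the shared calendar")
```

With two rows per fact table the cross join already doubles the row count, and the gap grows with the product of the table sizes.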
Hope this helps you get out of the mud.
We had the same problem.
We had to create a generic time hierarchy for this and connect the two separate invoice and order time hierarchies to the generic one.
It works like a charm!