Remove duplicates based on only 1 column - tableau-api

My data is in the following format:
rep_id user_id other non-duplicated data
1 1 ...
1 2 ...
2 3 ...
3 4 ...
3 5 ...
I want to add a deduped_rep column of 0/1 values such that only the first row for each rep_id (across its associated users) gets a 1 and the rest get 0.
Expected result:
rep_id user_id deduped_rep
1 1 1
1 2 0
2 3 1
3 4 1
3 5 0
For reference, in Excel, I would use the following formula:
IF(SUMPRODUCT(($A$2:$A2=A2)*($A$2:$A2=A2))>1,0,1)
I know there is the FIXED() LoD calculation http://kb.tableau.com/articles/howto/removing-duplicate-data-with-lod-calculations, but the only use cases I have seen deduplicate based on another column, whereas my rows are otherwise distinct.

Define a field first_reg_date_per_rep_id as
{ fixed rep_id : min(registration_date) }
Then define a field is_first_reg_date? as
registration_date = first_reg_date_per_rep_id
You can use that last Boolean field to distinguish the first record for each rep_id from later ones

Try this query:
select
rep_id,
user_id,
case when row_number() over (partition by rep_id order by user_id) = 1 then 1 else 0 end as deduped_rep
from
table
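A 0/1 flag of the kind the question asks for can be sketched with ROW_NUMBER(). The snippet below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the real data source (window functions need SQLite 3.25 or newer), and the table name reps is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real data source; "reps" is a
# hypothetical table name holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reps (rep_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO reps VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5)],
)

# Number the rows inside each rep_id by user_id, then flag row 1 with 1
# and every later row with 0.
rows = conn.execute(
    """
    SELECT rep_id,
           user_id,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY rep_id ORDER BY user_id) = 1
                THEN 1 ELSE 0 END AS deduped_rep
    FROM reps
    ORDER BY rep_id, user_id
    """
).fetchall()

for row in rows:
    print(row)
```

Here "first" means the lowest user_id per rep_id; ordering by a different column would change which row gets the 1.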

PySpark subtract last row from first row in a group

I want to use a window function to partition by ID, subtract the first row of each group from the last row, and put the result in a separate column. What is the cleanest way to achieve that?
ID col1
1 1
1 2
1 4
2 1
2 1
2 6
3 5
3 5
3 7
Desired output:
ID col1 col2
1 1 3
1 2 3
1 4 3
2 1 5
2 1 5
2 6 5
3 5 2
3 5 2
3 7 2
Code below:
from pyspark.sql import Window
from pyspark.sql.functions import first, last
w = Window.partitionBy('ID').orderBy('col1').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df.withColumn('out', last('col1').over(w) - first('col1').over(w)).show()
Sounds like you’re defining the “first” row as the row with the minimum value of col1 in the group, and the “last” row as the row with maximum value of col1 in the group. To compute them, you can use the MIN and MAX window functions:
SELECT
ID,
col1,
(MAX(col1) OVER (PARTITION BY ID)) - (MIN(col1) OVER (PARTITION BY ID)) AS col2
FROM
...
If you’re defining “first” and “last” row somehow differently (e.g., in terms of some timestamp), you can use the more general FIRST_VALUE and LAST_VALUE window functions:
SELECT
ID,
col1,
(LAST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
-
(FIRST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
AS col2
FROM
...
The two snippets above are equivalent, but the latter is more general: you can specify ordering by a different column and/or you can modify the window specification.
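To check that the two queries agree on the sample data, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Spark SQL (window functions need SQLite 3.25 or newer), and the table name t is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the Spark table; "t" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, col1 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 4), (2, 1), (2, 1), (2, 6), (3, 5), (3, 5), (3, 7)],
)

# Variant 1: MAX minus MIN over the whole partition.
min_max = conn.execute(
    """
    SELECT ID, col1,
           MAX(col1) OVER (PARTITION BY ID) - MIN(col1) OVER (PARTITION BY ID) AS col2
    FROM t
    ORDER BY ID, col1
    """
).fetchall()

# Variant 2: LAST_VALUE minus FIRST_VALUE with an explicit full-partition frame.
first_last = conn.execute(
    """
    SELECT ID, col1,
           LAST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
         - FIRST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col2
    FROM t
    ORDER BY ID, col1
    """
).fetchall()

print(min_max == first_last)
```

The explicit ROWS BETWEEN frame matters in the second variant: with the default frame, LAST_VALUE would only look as far as the current row.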

MySQLI, need to merge empty values with same ID into 1 row, and add to same row also if multiple inputs

I am having a hard time figuring this one out, and I apologize if the answer is already out there; I have searched Stack Overflow extensively.
I have an order system where orders are keyed by table ID, and a new row is added to my database whenever someone orders something. My problem is getting the data out on 1 row per table ID, so that everything ordered at a table shows on a single line.
I have 3 tables (Food, Drink, Dessert), each referenced by a foreign key in my order table.
id fk_tableid fk_drinkId fk_foodId fk_dessId amount
1 5 2 0 0 2
2 5 0 1 0 1
3 5 0 2 0 1
4 5 0 0 2 2
11 8 1 0 0 2
21 1 1 0 0 5
22 1 0 1 0 9
23 1 0 0 1 2
With a normal select and left joins, I can get the data out on multiple rows like this (here for tableId 5), also showing the name of each ordered consumable:
id fk_tableId fk_drinkId fk_foodId fk_dessId amount foodName drinkName dessertName
1 5 2 0 0 2 NULL Sodavand NULL
2 5 0 1 0 1 Lasagne NULL NULL
3 5 0 2 0 1 Pizza NULL NULL
4 5 0 0 2 2 NULL NULL Softice
I also tried using group_concat, which put the data on 1 line, but it seems to put everything on 1 line, not just grouped by tableId.
What I want is something like this (the 2x Lasagne, for example, is just how I want it to look on the site, but maybe I need to use 1xLasagne twice instead; it would just look messy with 1x Beer 10 times):
fk_tableId fk_drinkId fk_foodId fk_dessId foodName drinkName dessertName fullPrice
5 2,2 1,2 2,2 Pizza,Lasagne 2xSodavand 2xSoftice 195
I am aware my question might be badly formatted, but I also have a hard time knowing exactly what to google and search for here. I have tried for 2 days, and even asked my teacher, who could not do it either; he was doing something with CASE and SUM(), but it did not work out.
Any help will be much appreciated!
UPDATE:
Added SQL Query:
SELECT
menukort_ordre.id,
fk_tableId,
fk_drinkId,
fk_foodId,
fk_dessId,
amount,
menukort_food.name AS foodName,
menukort_drink.name AS drinkName,
menukort_dessert.name AS dessertName
FROM menukort_ordre
LEFT Join menukort_drink
ON (menukort_ordre.fk_drinkId = menukort_drink.id)
LEFT Join menukort_food
ON (menukort_ordre.fk_foodId = menukort_food.id)
LEFT Join menukort_dessert
ON (menukort_ordre.fk_dessId = menukort_dessert.id)
WHERE fk_tableid = fk_tableid
With GROUP_CONCAT I tried this instead, which put it on 1 row, but due to my WHERE, I get all data on 1 row.
GROUP_CONCAT(menukort_food.name ) AS foodName,
GROUP_CONCAT(menukort_drink.name) AS drinkName,
GROUP_CONCAT(menukort_dessert.name) AS dessertName
UPDATE:
First off, I changed your database design, since there is no need for 3 separate tables like that unless you really want them kept apart. I understand wanting to separate data, but there are times to do so and times not to; my own database project has me breaking everything up. The solution below is based on my design, which is as follows.
Category
Code or ID (PK)
Category
This table is a lookup table just to make sure drink, food, and dessert are spelled consistently. Frankly, you don't need it unless you need that information specifically and want it to be correct.
Next is the table that stores the drinks, desserts, and food:
Items
ID serial
Category
Name
Price
and finally the Orders table, which keeps track of the orders:
Orders
BillID
TableNum
ItemNum (fk)
ID (PK)
This way you can keep track of which table the food goes to and of each check or bill. Your design was fine if you wanted to find out how much each table made in a day, but I'm assuming that, like an actual restaurant, you would want to know the total for each bill. With this you can have multiple orders of a Coke or whatever at the same table on the same bill.
Now on to the solution.
This doesn't include the count. I could work on it if you really need it, but frankly I think a count is pointless unless you are going to ungroup the results and have something like this:
tableNum BillNum ItemNum ItemName
1 1 1 Coke
1 1 2 Steak
1 1 3 Pasta
1 1 1 Coke
then you could end up with something like this
tableNum BillNum ItemNum ItemName TimesBy
1 1 1 Coke 2
1 1 2 Steak 1
1 1 3 Pasta 1
The SQL code below should give you what you need, I believe. I'm using my version of the database, and I think you should too, because it is easier and there is no point in having a separate table for each kind of item.
CREATE TEMPORARY TABLE IF NOT EXISTS table2 AS (
select BillID, tablenum,ItemNum,Items.name,Items.price
from Orders, Items
where Orders.ItemNum=Items.id
);
create TEMPORARY TABLE IF NOT EXISTS table3 AS (
select SUM(price) as total, BillID
from table2
group by BillID
);
select table3.BillID, TableNum, GROUP_CONCAT(ItemNum order by ItemNum ASC) as ItemNum, GROUP_CONCAT(name order by name ASC) as Item, GROUP_CONCAT(price order by name asc) as ItemPrice, total
from table2, table3
where table2.BillID=table3.BillID
group by table3.BillID, TableNum, total;
DROP TABLE IF EXISTS table2;
DROP TABLE IF EXISTS table3;
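The TimesBy count described above, combined into one line per bill in the "2xCoke" style the question asks for, could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in MySQL solution: it runs on Python's built-in sqlite3 module, the prices are invented sample values, and SQLite's || operator stands in for MySQL's CONCAT().

```python
import sqlite3

# In-memory SQLite stands in for MySQL. Table and column names follow the
# redesigned schema above; the prices are made-up sample values.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Items (ID INTEGER PRIMARY KEY, Category TEXT, Name TEXT, Price INTEGER);
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, BillID INTEGER, TableNum INTEGER, ItemNum INTEGER);
    INSERT INTO Items VALUES (1, 'drink', 'Coke', 20), (2, 'food', 'Steak', 150), (3, 'food', 'Pasta', 90);
    INSERT INTO Orders (BillID, TableNum, ItemNum) VALUES (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 1);
    """
)

# Step 1: one row per item per bill, with how many times it was ordered.
per_item = """
    SELECT o.BillID, o.TableNum, i.Name, COUNT(*) AS TimesBy, SUM(i.Price) AS Subtotal
    FROM Orders o
    JOIN Items i ON i.ID = o.ItemNum
    GROUP BY o.BillID, o.TableNum, i.ID, i.Name
"""

# Step 2: collapse those rows into a single line per bill.
bills = conn.execute(
    f"""
    SELECT BillID, TableNum,
           GROUP_CONCAT(TimesBy || 'x' || Name) AS ItemList,
           SUM(Subtotal) AS Total
    FROM ({per_item}) AS counted
    GROUP BY BillID, TableNum
    """
).fetchall()

for bill in bills:
    print(bill)
```

The order of entries inside ItemList is not fixed here; in MySQL, GROUP_CONCAT(... ORDER BY Name) would pin it down.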
A few other options would be to handle this in application code such as PHP, or in a stored procedure.
If you need an explanation, just ask. Also, I'm curious: is this for homework or a project? I just want to know why you're doing this.
Hope it helps.

T-SQL table variable data order

I have a UDF which returns a table variable like
--
--
RETURNS @ElementTable TABLE
(
ElementID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
ElementValue VARCHAR(MAX)
)
AS
--
--
Is the order of data in this table variable guaranteed to be the same as the order in which the data was inserted? E.g., if I issue
INSERT INTO @ElementTable(ElementValue) VALUES ('1')
INSERT INTO @ElementTable(ElementValue) VALUES ('2')
INSERT INTO @ElementTable(ElementValue) VALUES ('3')
I expect the data will always be returned in that order when I say
select ElementValue from @ElementTable --Here I don't use order by
EDIT:
If the order is not guaranteed, then the following query
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross Apply dbo.MyFunc() T2
order by t1.elementid
will not produce the 3x3 matrix (9 rows)
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
consistently.
Is there any possibility that it could be like
1 2
1 1
1 3
2 3
2 2
2 1
3 1
3 2
3 3
How can I get the first ordering reliably using my function above?
No, the order is not guaranteed to be the same.
Unless, of course you are using ORDER BY. Then it is guaranteed to be the same.
Given your update, you obtain it in the obvious way - you ask the system to give you the results in the order you want:
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross join dbo.MyFunc() T2
order by t1.elementid, t2.elementid
You are guaranteed that, if you use (inefficient) single-row inserts within your UDF, the IDENTITY values will match the order in which the individual INSERT statements were specified.
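As a rough illustration of ordering by both identity columns, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption: a plain table replaces the table-valued function dbo.MyFunc(), and AUTOINCREMENT replaces IDENTITY(1,1).

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; a plain table replaces the
# table-valued function, and AUTOINCREMENT replaces IDENTITY(1,1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ElementTable ("
    "ElementID INTEGER PRIMARY KEY AUTOINCREMENT, ElementValue TEXT)"
)
# Single-row inserts, so the generated IDs follow the insert order.
for value in ("1", "2", "3"):
    conn.execute("INSERT INTO ElementTable (ElementValue) VALUES (?)", (value,))

# Ordering by both IDs fixes the row order of the cross join.
pairs = conn.execute(
    """
    SELECT T1.ElementValue, T2.ElementValue
    FROM ElementTable T1
    CROSS JOIN ElementTable T2
    ORDER BY T1.ElementID, T2.ElementID
    """
).fetchall()

for pair in pairs:
    print(pair)
```

Dropping T2.ElementID from the ORDER BY would leave the order within each T1 group up to the engine.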
Order is not guaranteed.
But if all you want is to get your records back in the same order you inserted them, then just order by your primary key. Since you already have that field set up as an auto-increment, it should suffice.
...or use a deterministic function
SELECT TOP 9
M1 = (ROW_NUMBER() OVER(ORDER BY id) + 2) / 3,
M2 = (ROW_NUMBER() OVER(ORDER BY id) + 2) % 3 + 1
FROM
sysobjects
M1 M2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
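The arithmetic behind M1 and M2 can be checked outside the database. This is a minimal Python sketch of the same two expressions; the function name matrix_cell is hypothetical, and Python's // plays the role of T-SQL integer division.

```python
def matrix_cell(rn):
    """Map a 1-based row number to its (M1, M2) pair, as in the query above."""
    m1 = (rn + 2) // 3      # integer division: rows 1-3 -> 1, 4-6 -> 2, 7-9 -> 3
    m2 = (rn + 2) % 3 + 1   # cycles 1, 2, 3 within each group of three
    return m1, m2


cells = [matrix_cell(rn) for rn in range(1, 10)]
for cell in cells:
    print(cell)
```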

T-SQL: SELECT related column data for the max two other columns

I have table data like the following, where order_type is 1 for a quote and 2 for an order. Any given po_num can have 0 to many rows of order_type 1, but should have only 0 or 1 rows of order_type 2, or any combination of the above.
I need to return the max order_num of the max order_type of a given po_num, where the order_num is just an additional (but important) column in my result.
Table data:
order_type po_num order_num
1 E0102 1013200
1 E0102 1013162
1 E0104 1012161
2 E0104 1012150
1 E0104 1011449
2 E0107 1010034
2 E0108 1011994
Desired result:
order_type po_num order_num
1 E0102 1013200
2 E0104 1012150
2 E0107 1010034
2 E0108 1011994
The closest I can get is this, which still includes the max(order_num) for order_type 1 as well as the order_num for order_type 2:
order_type po_num order_num
1 E0102 1013162
1 E0104 1012161
2 E0104 1012150
2 E0107 1010034
2 E0108 1011994
I think you want this:
select order_type
, po_num
, max(order_num)
from orders o1
where order_type = (
select max(order_type)
from orders o2
WHERE o2.po_num = o1.po_num
)
group by po_num,order_type
The inclusion of order_type in the group by is an artifact; it is required because of how the table is designed.
FWIW, the quotes and orders should be split out into two tables. When you get weird SQL like this and difficult or conditional unique constraints, it is a table design issue.
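To see the correlated subquery at work on the sample rows from the question, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, with the table named orders as in the query above.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; rows are the question's table data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_type INTEGER, po_num TEXT, order_num INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "E0102", 1013200),
        (1, "E0102", 1013162),
        (1, "E0104", 1012161),
        (2, "E0104", 1012150),
        (1, "E0104", 1011449),
        (2, "E0107", 1010034),
        (2, "E0108", 1011994),
    ],
)

# Keep only rows whose order_type is the highest for their po_num,
# then take the highest order_num among those rows.
result = conn.execute(
    """
    SELECT order_type, po_num, MAX(order_num) AS order_num
    FROM orders o1
    WHERE order_type = (
        SELECT MAX(order_type) FROM orders o2 WHERE o2.po_num = o1.po_num
    )
    GROUP BY po_num, order_type
    ORDER BY po_num
    """
).fetchall()

for row in result:
    print(row)
```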
I assume you are using a group by clause. Could you add a
having order_type = max(order_type)
to your SQL?
See http://msdn.microsoft.com/en-us/library/ms180199.aspx for more details on the HAVING clause.

tsql sum data and include default values for missing data

I would like a query that will show a sum of columns with a default value for missing data. For example assume I have a table as follows:
type_lookup:
id name
-----------
1 self
2 manager
3 peer
And a table as follows
data:
id type_lookup_id value
--------------------------
1 1 1
2 1 4
3 2 9
4 2 1
5 2 9
6 1 5
7 2 6
8 1 2
9 1 1
After running a query I would like a result set as follows:
type_lookup_id value
----------------------
1 13
2 25
3 0
I would like all rows in type_lookup table to be included in the result set - even if they don't appear in the data table.
It's a bit hard to read your data layout, but something like the following should do the trick:
SELECT tl.id AS type_lookup_id, tl.name, COALESCE(SUM(da.value), 0) AS value
FROM type_lookup tl
LEFT OUTER JOIN data da
ON da.type_lookup_id = tl.id
GROUP BY tl.id, tl.name
ORDER BY tl.id
[EDIT]
...subsequently edited by changing count() to sum().
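A quick way to confirm the expected totals, including the 0 for a lookup row with no data, is the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and sums the value column; COALESCE turns the NULL sum of an unmatched row into 0.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_lookup (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, type_lookup_id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO type_lookup VALUES (?, ?)", [(1, "self"), (2, "manager"), (3, "peer")])
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 4), (3, 2, 9), (4, 2, 1), (5, 2, 9), (6, 1, 5), (7, 2, 6), (8, 1, 2), (9, 1, 1)],
)

# LEFT JOIN keeps every lookup row; COALESCE replaces the NULL sum of an
# unmatched row (here "peer") with 0.
totals = conn.execute(
    """
    SELECT tl.id AS type_lookup_id, tl.name, COALESCE(SUM(da.value), 0) AS value
    FROM type_lookup tl
    LEFT OUTER JOIN data da ON da.type_lookup_id = tl.id
    GROUP BY tl.id, tl.name
    ORDER BY tl.id
    """
).fetchall()

for row in totals:
    print(row)
```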