Tableau calculated field summing up the values - tableau-api

I have a table like this:
ID | Name | Value
-----------------
1  | Bob  | 4
2  | Mary | 3
3  | Bob  | 5
4  | Jane | 3
5  | Jane | 1
Is there a way to write a calculated field that, when the name is "Bob", sums up all the values that have the name "Bob"?
Thanks in advance!

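IF [Name] = "Bob" THEN [Value] END
Tableau aggregates this field with SUM by default when you place it in the view. To do the aggregation inside the calculation itself, wrap it as SUM(IF [Name] = "Bob" THEN [Value] END).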
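To check the expected number outside Tableau, here is a minimal Python sketch of the same conditional sum. The rows list just mirrors the sample table from the question, and sum_for_name is a hypothetical helper name, not a Tableau function.

```python
# Sample rows copied from the question: (ID, Name, Value)
rows = [(1, "Bob", 4), (2, "Mary", 3), (3, "Bob", 5), (4, "Jane", 3), (5, "Jane", 1)]

def sum_for_name(rows, name):
    """Sum Value over rows whose Name matches; other rows contribute nothing."""
    return sum(value for _id, row_name, value in rows if row_name == name)

print(sum_for_name(rows, "Bob"))  # 9
```

Rows that fail the condition are simply skipped, which corresponds to the IF without an ELSE branch returning NULL in Tableau.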

How to count rows after the occurence of a value by group (postgresql)

I have for example the following table:
Name | Day | Healthy
--------------------
Jon  | 1   | No
Jon  | 2   | Yes
Jon  | 3   | Yes
Jon  | 4   | Yes
Jon  | 5   | No
Mary | 1   | Yes
Mary | 2   | No
Mary | 3   | Yes
Mary | 4   | No
Mary | 5   | Yes
I want to add a column which counts, for each day X, the number of days from day X onward (inclusive) on which the person was healthy:
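Name | Day | Healthy | Number of days the person was healthy after day X (incl.)
--------------------------------------------------------------------------------
Jon  | 1   | No      | 3
Jon  | 2   | Yes     | 3
Jon  | 3   | Yes     | 2
Jon  | 4   | Yes     | 1
Jon  | 5   | No      | 0
Mary | 1   | Yes     | 3
Mary | 2   | No      | 2
Mary | 3   | Yes     | 2
Mary | 4   | No      | 1
Mary | 5   | Yes     | 1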
Is it possible to use some sort of window function to create such a column? Thanks a lot for the help!
There are a couple of ways to do this with a window function. One is to order by day descending and use the default window. The other is to specify the window from the current row to the end of the partition.
This example casts the boolean healthy to an int so that it can be summed. If your table has literal Yes and No strings, then you can use sum((healthy = 'Yes')::int) over (...) to achieve the same thing (the string comparison is case-sensitive).
select name, day,
sum(healthy::int)
over (partition by name
order by day
rows between current row
and unbounded following) as num_subsequent_health_days
from my_table;
name | day | num_subsequent_health_days
:--- | --: | -------------------------:
Jon | 1 | 3
Jon | 2 | 3
Jon | 3 | 2
Jon | 4 | 1
Jon | 5 | 0
Mary | 1 | 3
Mary | 2 | 2
Mary | 3 | 2
Mary | 4 | 1
Mary | 5 | 1
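To make the window logic concrete, here is a minimal Python sketch of the same idea: within each name, walk the days from last to first and keep a running count of healthy days. The rows list mirrors the sample data from the question, and healthy_days_from is a hypothetical helper name; this illustrates the technique and is not a replacement for the SQL.

```python
from itertools import groupby

# Sample rows from the question: (name, day, healthy)
rows = [
    ("Jon", 1, False), ("Jon", 2, True), ("Jon", 3, True), ("Jon", 4, True), ("Jon", 5, False),
    ("Mary", 1, True), ("Mary", 2, False), ("Mary", 3, True), ("Mary", 4, False), ("Mary", 5, True),
]

def healthy_days_from(rows):
    """For each (name, day), count healthy days from that day to the end of that person's rows."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for name, group in groupby(ordered, key=lambda r: r[0]):
        running = 0
        counts = []
        # Walk from the last day backwards, accumulating healthy days
        for _name, day, healthy in reversed(list(group)):
            running += int(healthy)
            counts.append((name, day, running))
        # Restore ascending day order for this person
        result.extend(reversed(counts))
    return result

for row in healthy_days_from(rows):
    print(row)
```

Walking backwards with a running total corresponds to the "order by day descending with the default window" variant; the "current row to unbounded following" variant computes the same sums without reversing the order.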
I assume your relation has the following schema:
CREATE TABLE test(name text, day int, healthy boolean);
Then this should produce the desired result:
SELECT name, day,
       sum(mapped) OVER (PARTITION BY name
                         ORDER BY day DESC
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM (SELECT name, day,
             CASE WHEN healthy THEN 1 ELSE 0 END AS mapped
      FROM test) sub
ORDER BY name, day;

Take new columns as output table - KDB

I have a query that runs on a frequent basis and returns a table of results. Each new table also contains the rows from the previous run, but I only want to take the rows that are new in the most recent run and send those as an email. I already have the code for the email and the trade data; I just need a way to:
display the results of the new table to be emailed
save the complete results of the new table to be used in the next run of the query
e.g.
Old results: tbl
| idx | name | age |
| 0 | Tom | 30 |
| 1 | Jerry | 25 |
| 2 | Bob | 30 |
| 3 | Ken | 45 |
New results: tbl
| idx | name | age |
| 0 | Tom | 30 |
| 1 | Jerry | 25 |
| 2 | Bob | 30 |
| 3 | Ken | 45 |
| 4 | Sam | 40 |
output required:
| 4 | Sam | 40 |
and then save the New results to be used in the next run
Thanks! :)
If the only change between runs is that records are appended to the table, you could just keep a variable holding the last index seen and then select only those rows where idx is larger than that.
If the indexes are always increasing, this could be achieved by saving lastidx after each run and filtering the next run's table with it:
lastidx:exec last idx from tbl
select from tbl where idx>lastidx
If the idx values don't always increase monotonically, you could keep a count of the number of rows instead and only select rows at or beyond that position, using the virtual row index i:
lasti:count tbl
select from tbl where i>=lasti
This doesn't require saving the whole table in memory for use in the next iteration.
E.g., to start with, the old table had 4 rows, so lasti = 4:
q)tbl
idx name age
-------------
0 Tom 30
1 Jerry 25
2 Bob 30
3 Ken 45
q)lasti
4
The new table comes in, and running the query selects the new row:
q)tbl
idx name age
-------------
0 Tom 30
1 Jerry 25
2 Bob 30
3 Ken 45
4 Sam 40
q)select from tbl where i>=lasti
idx name age
------------
4 Sam 40
lasti can then be updated to reflect the new count:
q)lasti:count tbl
q)lasti
5
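The bookkeeping above can be sketched in Python for readers less familiar with q. This is only an illustration of the "remember the row count" idea, assuming rows are only ever appended; new_rows and the tuple layout are hypothetical names chosen for the example.

```python
def new_rows(table, last_count):
    """Return the rows appended since the previous run, plus the updated row count."""
    return table[last_count:], len(table)

# Old results: (idx, name, age)
old = [(0, "Tom", 30), (1, "Jerry", 25), (2, "Bob", 30), (3, "Ken", 45)]
last_count = len(old)  # 4 after the first run

# New results contain the old rows plus one appended row
new = old + [(4, "Sam", 40)]
fresh, last_count = new_rows(new, last_count)

print(fresh)       # [(4, 'Sam', 40)]
print(last_count)  # 5
```

Only the integer count has to survive between runs, which is the same saving the q version gets from storing lasti instead of the whole table.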
One way to do this, assuming idx is a unique key:
q)old:([] idx:0 1 2 3; name:`T`J`B`K; age: 30 25 30 45)
q)new:old,enlist `idx`name`age!(4; `S;40) //new output from your query
q)out:()
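q)if[0<count i:new[`idx] except old[`idx] ; out:select from new where idx in i ; old:new] //i holds idx values, so filter on idx rather than indexing rows by position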
q)out
idx name age
------------
4 S 40
Another way, if your new records are always appended after the old records:
q)old:([] idx:0 1 2 3; name:`T`J`B`K; age: 30 25 30 45)
q)i:count old
q)new:old,enlist `idx`name`age!(4; `S;40) //new output from your query
q)out:()
q)if[i<c:count new ; out:(i-c)#new ; old:new; i:c]
q)out
idx name age
------------
4 S 40
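The key-based variant (comparing idx values rather than positions) can be sketched in Python as a set difference. This assumes, as the answer does, that idx is a unique key; rows_with_new_keys is a hypothetical helper name.

```python
def rows_with_new_keys(old, new):
    """Return rows of `new` whose idx (first field) does not appear in `old`."""
    old_keys = {row[0] for row in old}
    return [row for row in new if row[0] not in old_keys]

old = [(0, "T", 30), (1, "J", 25), (2, "B", 30), (3, "K", 45)]
new = old + [(4, "S", 40)]

out = rows_with_new_keys(old, new)
print(out)  # [(4, 'S', 40)]

# Keep the full new table for the next run
old = new
```

Unlike the count-based approach, this does not depend on where in the table the new rows appear, at the cost of keeping the previous keys around.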

Add a key element for n rows in PySpark Dataframe

I have a dataframe like the one shown below.
id | run_id
--------------
4 | 12345
6 | 12567
10 | 12890
13 | 12450
I wish to add a new column, say key, that has the value 1 for the first n rows, 2 for the next n rows, and so on. With n = 2, the result would look like:
id | run_id | key
----------------------
4 | 12345 | 1
6 | 12567 | 1
10 | 12890 | 2
13 | 12450 | 2
Is it possible to do this with PySpark? Thanks in advance for the help.
Here is one way to do it using zipWithIndex:
# sample rdd
rdd=sc.parallelize([[4,12345], [6,12567], [10,12890], [13,12450]])
# group size for key
n=2
# add rownumber and then label in batches of size n
# (tuple-unpacking lambdas are not valid in Python 3, so index into the pair instead)
rdd=rdd.zipWithIndex().map(lambda pair: pair[0]+[pair[1]//n+1])
# convert to dataframe
df=rdd.toDF(schema=['id', 'run_id', 'key'])
df.show(4)
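The labelling arithmetic can be checked without a Spark session. Here is a minimal plain-Python sketch in which enumerate stands in for zipWithIndex; add_batch_key is a hypothetical helper name, and the data is the sample from the question.

```python
def add_batch_key(rows, n):
    """Append a 1-based batch number: rows 0..n-1 get 1, rows n..2n-1 get 2, and so on."""
    return [row + [index // n + 1] for index, row in enumerate(rows)]

rows = [[4, 12345], [6, 12567], [10, 12890], [13, 12450]]
print(add_batch_key(rows, 2))
# [[4, 12345, 1], [6, 12567, 1], [10, 12890, 2], [13, 12450, 2]]
```

Integer division of the zero-based row number by n is what groups the rows into batches; adding 1 makes the keys start at 1.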

Using DECLARE to create column numbers

When running a query I need column numbers to be applied to each row so that when I use the query to create a report in SSRS I can tell the report which data to put in which column. Example:
Case 1 | Jane Doe | Col 1
Case 1 | John Doe | Col 2
Case 2 | Sally Smith | Col 1 (only name in case)
My current query uses:
DECLARE @NumOfCols int=2;
And then this to tell it how to separate the columns:
(row_number() over (partition by case_num order by child_first) + @NumOfCols - 1) % @NumOfCols + 1 as DisplayCol
The problem is, when I run the query, even if a result only has one name (so only one column is needed) my data is getting duplicated. It seems like it is making it a mandatory column 1 and column 2 even if there is no data for a second column. Like this:
Case 1 | Jane Doe | Col 1
Case 1 | John Doe | Col 2
Case 2 | Sally Smith | Col 1 (only name in case)
Case 2 | Sally Smith | Col 2 (duplicating)
I hope this makes sense. Any ideas on how to eliminate duplicating the data?
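As a sanity check on the column-number expression, here is a small Python sketch of the same arithmetic. It assumes row numbers start at 1 within each case, as row_number() does; display_col is a hypothetical name. The expression maps each existing row to exactly one column number and cannot add rows by itself, so the duplicated row presumably comes from another part of the query or the report layout.

```python
def display_col(row_number, num_cols):
    """Mirror of (row_number + NumOfCols - 1) % NumOfCols + 1 from the query."""
    return (row_number + num_cols - 1) % num_cols + 1

# Case 1 has two names, Case 2 has one; row numbers restart per case
cases = {"Case 1": ["Jane Doe", "John Doe"], "Case 2": ["Sally Smith"]}

for case, names in cases.items():
    for row_number, name in enumerate(names, start=1):
        print(case, "|", name, "| Col", display_col(row_number, 2))
```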

Max consecutive years for each customer in Tableau

I am trying to find, for each customer, the maximum number of consecutive years in which they bought something. I tried to create a calculated field, but to no avail.
I created two calculated fields
Consecutive: if max([Count])>0 then previous_value(0)+1+index()-index() else 0 end
max: window_max([Consecutive])
My data looks something like:
Year | Customer | Count
1996 | a | 2
1996 | b | 1
1997 | a | 1
1997 | b | 2
1998 | b | 1
So the result would be
a:2
b:3
Use nested table calcs.
The first calc, call it running_good_years, is a running count of consecutive years with sales.
If count(Sales) = 0 then 0 else previous_value(0) + 1 end
The second just returns the max
Window_max(running_good_years)
With table calcs, defining the partitioning and addressing is critical: partition by Customer and address by Year.
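The two nested calcs can be sketched in Python to verify the expected result on the sample data. This is an illustration of the logic only, assuming a year with no row for a customer counts as zero sales; max_consecutive_years is a hypothetical helper name.

```python
# Sample rows from the question: (year, customer, count)
rows = [(1996, "a", 2), (1996, "b", 1), (1997, "a", 1), (1997, "b", 2), (1998, "b", 1)]

def max_consecutive_years(rows):
    """Per customer, the longest run of consecutive years with a positive count."""
    first = min(year for year, _c, _n in rows)
    last = max(year for year, _c, _n in rows)
    counts = {}
    for year, customer, count in rows:
        counts.setdefault(customer, {})[year] = count
    result = {}
    for customer, by_year in counts.items():  # "partition by Customer"
        running = best = 0
        for year in range(first, last + 1):   # "address by Year"
            # Running count of good years: reset to 0 on a year without sales
            running = running + 1 if by_year.get(year, 0) > 0 else 0
            # Equivalent of the window max over the running count
            best = max(best, running)
        result[customer] = best
    return result

print(max_consecutive_years(rows))  # {'a': 2, 'b': 3}
```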