Tableau mixing aggregate and non-aggregate results error - tableau-api

I have a problem creating a calculated field in Tableau. I have data like so:
ID ... Status Step1 Step2 Step3
1 ... Accepted 1 1 1
2 ... Waiting 1 0 0
3 ... Discard 0 0 0
4 ... Waiting 1 1 0
...
I would like to create a calculated column that will give me the name of the last Step, but only when status is 'Accepted'. Otherwise I want the status. The syntax is quite easy, it looks like this:
IF [Status] = 'Accepted' THEN (
IF [Step3] = 1 THEN 'Step3' ELSEIF [STEP2] = 1 THEN 'Step2' ELSEIF [STEP1] = '1' THEN 'Step1' ELSE 'Step0')
ELSE [Status]
The problem is that the column 'Status' is a Dimension and the 'Step' statuses come from Measure. So they are AGG(Step1), AGG(Step2),...
I guess that is the reason I get this error:
Cannot mix aggregate and non-aggregate comparisons or results in 'IF' expressions.
I am not very familiar with Tableau. Any idea how I can solve this?

Solution:
Wrap the non-aggregate field (Status) in ATTR, which turns it into an aggregate. Both sides of the IF are then aggregates, so they can be combined and the calculation works. Note that the Step fields are compared with the number 1, not the string '1', and that each IF needs a closing END.
IF ATTR([Status]) = 'Accepted' THEN (
IF [Step3] = 1 THEN 'Step3' ELSEIF [Step2] = 1 THEN 'Step2' ELSEIF [Step1] = 1 THEN 'Step1' ELSE 'Step0' END)
ELSE ATTR([Status])
END

Tableau automatically interprets numeric values as measures. In your case, though, they appear to be booleans (0 for false, 1 for true) and really ought to be dimensions.
Convert Step1, Step2, and Step3 to dimensions: highlight the fields, right-click, and choose Convert to Dimension.

Intersecting two columns with different lengths

I have a dataset1 containing 5000 user_ids from Twitter. I want to intersect the user_ids from this dataset with another dataset2 containing other user_ids from Twitter, and at the same time create a new column in dataset1 where each user_id gets the score '1' (if it intersects) or '0' (if it does not). I tried the code below, but the new column 'intersect' only contains some (seemingly random) zeros followed by a lot of NAs.
for(i in 1:ncol(data1)){
#intersect with other data
ids_intersect = intersect(data1$user_id, data2$user_id)
if(length(ids_intersect == 0)){
data1[i, "intersect"] <- 0 # no intersect
} else {
data1[i, "intersect"] <- 1 # intersect
}
}
I also tried another approach, which I find more intuitive, but it fails because the two datasets have different numbers of rows ("replacement has 3172 rows, data has 5181"). As above, the intention is to get the score 1 if there is an intersect and 0/NA if there is not in the new column 'intersect'. However, I'm not sure how to implement that in the following code:
data$intersect <- intersect(data1$user_id, data2$user_id)
Any way of assigning either 1 or 0 to the user_ids in a new column depending on whether there is an intersect/match?
A convenient option is mutate() from the dplyr package together with the base R %in% operator, which tests each user_id for membership and always returns one value per row of data1.
Data
data1 <- data.frame(user_id = c("Test1",
"Test2",
"Test4",
"Test5"))
data2 <- data.frame(user_id = c("Test1",
"Test3",
"Test4"))
Code
library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
data1 %<>%
  mutate(Existence = ifelse(user_id %in% data2$user_id,
                            1,
                            0))
Output
> data1
user_id Existence
1 Test1 1
2 Test2 0
3 Test4 1
4 Test5 0
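The same membership test can be sketched outside R. The snippet below is a minimal Python illustration and not part of the original answer; the sample IDs mirror the toy data above, and the function name flag_matches is hypothetical.

```python
def flag_matches(ids, other_ids):
    """Return 1 for each id that also appears in other_ids, else 0."""
    lookup = set(other_ids)  # a set gives fast membership tests
    return [1 if uid in lookup else 0 for uid in ids]


# Toy data mirroring data1 and data2 from the answer above
data1_ids = ["Test1", "Test2", "Test4", "Test5"]
data2_ids = ["Test1", "Test3", "Test4"]

existence = flag_matches(data1_ids, data2_ids)
print(existence)
```

The result always has one entry per ID in the first list, whatever the length of the second list, which is why this approach avoids the row-count mismatch from the question.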

OpenRefine: Fill down with increasing counter

Is it possible in OpenRefine to fill down blank cells with a counter instead of copying the top non-blank value?
Here is an example as typed text; imagine this as a column from top to bottom:
1
1
blank
1
blank
blank
blank
blank
blank
1
I would like to see the column filled as follows (again, imagine top to bottom):
1
1
2
1
2
3
4
5
6
1
Thanks, help is very much appreciated.
It's not really simple. You have to:
1 Replace the blanks with something else, such as an "x"
2 Create a unique record for the entire dataset
3 Use this Jython script:
import itertools
data = row['record']['cells']['YOUR COLUMN NAME']['value']
x = itertools.count(2)
liste = []
for i, el in enumerate(data):
    if data[i] == "x":
        liste.append(x.next())
    else:
        x = itertools.count(2)
        liste.append(el)
return ",".join([str(x) for x in liste])
4 Use Blank down to clear duplicates
5 Split the first multivalued cell.
Here is a screencast of the operations described above.
If you know a little Python, you can also transform your file using pandas. I do not know what is the most elegant way to do it, but this script should work.
import itertools
import pandas as pd
x = itertools.count(2)
def set_x():
    global x
    x = itertools.count(2)
set_x()
def increase(value):
    if not value:
        return next(x)
    else:
        set_x()
        return value
data = pd.read_csv("your_file.csv", na_values=['nan'], keep_default_na=False)
data['column 1'] = data['column 1'].apply(lambda row: increase(row))
print(data)
data.to_csv("final_file.csv")
Here are two simple solutions using GREL.
Use records
You could move the column to the beginning, telling OpenRefine to use the numbers as records. You might need to transform the column to text to really convince OpenRefine to use it as records.
Then either add a new column or transform the existing one with the following expression.
1 + row.index - row.record.fromRowIndex
Use record markers
In case you don't want to use records or don't have a static number, you can create a similar setup. Imagine you have an incomplete counter like in the following table and want to fill it.
Origin  Desired
1       1
blank   2
1       1
2       2
blank   3
1       1
To fill the missing cells, first add a new column based on your original column using the following expression and name it record_row_index.
if(isNonBlank(value), row.index, "")
After that fill down the original column and the new column record_row_index.
Then create a new column based on the original filled column using the following expression.
value + row.index - cells["record_row_index"].value
Hint: the expression is expecting both columns to be of type number.
If one of them is of type text, you can either transform the column beforehand or use toNumber() in the expression.
The following table shows how these operations are working together.
Origin  Origin filled  row.index  record_row_index  Desired
1       1              0          0                 1 + 0 - 0 = 1
blank   1              1          0                 1 + 1 - 0 = 2
1       1              2          2                 1 + 2 - 2 = 1
2       2              3          3                 2 + 3 - 3 = 2
blank   2              4          3                 2 + 4 - 3 = 3
1       1              5          5                 1 + 5 - 5 = 1
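The arithmetic in the table can be checked with a short Python sketch. This is an illustration only, not GREL and not part of the original answer: blanks are modelled as None, and the function name fill_counter is hypothetical.

```python
def fill_counter(origin):
    """Fill blanks (None) by counting upward from the last non-blank value."""
    filled = []
    last_value = None  # plays the role of the "Origin filled" column
    last_index = None  # plays the role of the "record_row_index" column
    for index, value in enumerate(origin):
        if value is not None:
            last_value, last_index = value, index
        if last_value is None:
            filled.append(None)  # leading blanks have nothing to count from
        else:
            # same formula as: value + row.index - record_row_index
            filled.append(last_value + index - last_index)
    return filled


origin = [1, None, 1, 2, None, 1]
print(fill_counter(origin))
```

For the Origin column above this yields 1, 2, 1, 2, 3, 1, matching the Desired column.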

Reference to non-existent field 'd'

My mat file contains 40,000 rows and two columns. I have to read it line by line
and then get values of last column in a single row.
Following is my code:
for v = 1:40000
firstRowB = data.d(v,:)
if(firstRowB(1,2)==1)
count1=count1+1;
end
if(firstRowB(1,2)==2)
count2=count2+1;
end
end
firstRowB holds the current row; the code checks whether its last column equals 1 or 2 and then increases the respective count by 1.
But I keep getting this error:
Reference to non-existent field 'd'.
You could use vectorization (it is always convenient especially in Matlab). Taking advantage of the fact that true is one and false is zero, if you just want to count you can do :
count1 = sum ( data.d(:, 2) == 1 ) ;
count2 = sum (data.d(:,2) == 2 ) ;
More generally, you could define an anonymous function (note the @ that introduces a function handle):
getNumberOfElementsInLastColEqualTo = @(numb) sum(data.d(:,end) == numb);
counts = arrayfun(getNumberOfElementsInLastColEqualTo, [1 2]);
Hope this helps.

Find values in a matrix and sum when found

I have a matrix X(1e4,20) which takes on values 0:4.
I'm interested in finding (row by row) the number of times values are ~=0, ==1&2&3 and ==3
Why doesn't
eg:
X=randi([0 4],1e4,20)
for ii=1:1e4
onestwosorfours(ii,1)=sum(X(ii,:)==1|2|4)
end
work?
I've ended up doing
sum(X(ii,:)==1)+sum(X(ii,:)==2), etc
This expression is wrong:
sum( X(ii,:)==1|2|4 )
In MATLAB, == has higher precedence than |, so this is evaluated as ((X(ii,:)==1) | 2) | 4. The nonzero constants 2 and 4 count as true, so every element of the result is true and the sum is always 20 (the number of columns), regardless of the data.
Instead, rewrite it as :
sum( X(ii,:)==1 | X(ii,:)==2 | X(ii,:)==4 )
Or, even better, use nnz, which counts the nonzero (true) elements:
nnz( X(ii,:)==1 | X(ii,:)==2 | X(ii,:)==4 )
Which clarifies what you really meant.
You have to have the A == b parts each time for the logical or of the results:
X=randi([0 4],1e4,20);
for ii=1:1e4
onestwosorfours(ii,1)=sum( X(ii,:)==1 | X(ii,:) == 2 | X(ii,:) == 4);
end
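For comparison, the same row-wise count can be sketched in plain Python. This is not MATLAB and not part of the original answers; the small matrix and the name count_matches_per_row are made up for illustration. The point is the same: each element is tested against every candidate value, rather than comparing against the result of combining the candidates.

```python
TARGETS = {1, 2, 4}


def count_matches_per_row(matrix, targets=TARGETS):
    """For each row, count the elements equal to any of the target values."""
    return [sum(1 for value in row if value in targets) for row in matrix]


# Small made-up matrix with values in 0..4
X = [
    [0, 1, 2, 3, 4],
    [3, 3, 0, 0, 3],
    [1, 1, 4, 2, 2],
]
row_counts = count_matches_per_row(X)
print(row_counts)
```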

Problem looking at data between 0 and -1

I'm trying to write a program in MATLAB that cleans data. The program takes the max and min that the data can be and throws out data that is less than the min or greater than the max. There appears to be a small issue with the cleaning part. It ONLY happens when the minimum of the range for the variable being checked is 0. In that case, for one reason or another, the program won't throw away data points that are between 0 and -1. I've been trying to fix this for some time and noticed that this is the only case where it happens. If you run a SQL query selecting data that is < 0, it also leaves out data between 0 and -1, so it is effectively the same error. I'm wondering if anyone recognizes this and knows what the cause could be.
I would write such a function as:
function data = cleanseData(data, limits)
limits = sort(limits);
data = data( limits(1) <= data & data <= limits(2) );
end
an example usage:
a = rand(100,1)*10;
b = cleanseData(a, [-2 5]);
c = cleanseData(a, [0 -1]);
-1 is less than 0, so 0 should be the max value. And if this is the case it will keep points between -1 and 0 by your definition of the cleaning operation:
and throws out data that is less than the min or greater than the max.
If you want to throw away (using the above definition)
data points that are between 0 and -1
then you need to set 0 as the min value and -1 as the max value --- which does not make sense.
Also, I think you mean
and throws out data that is less than the min AND greater than the max.
It may be that the floats are being cast to ints before the comparison. I don't know MATLAB, but in Python int(-0.5)==0, which could explain the extra data points getting in. You can test this by setting the min to -1: if you then also get values between -1 and -2, you'll need to make sure no casting is being done.
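A minimal Python sketch of that hypothesis follows. It only illustrates how truncation toward zero would produce the described symptom; whether the asker's program actually casts anywhere is an assumption, and both function names are hypothetical.

```python
def keep_in_range_truncating(values, minimum, maximum):
    """Buggy on purpose: each value is truncated to int before the check."""
    return [v for v in values if minimum <= int(v) <= maximum]


def keep_in_range(values, minimum, maximum):
    """Compare the original floats directly, with no cast."""
    return [v for v in values if minimum <= v <= maximum]


samples = [-1.5, -0.8, -0.2, 0.0, 0.7, 3.2]

buggy = keep_in_range_truncating(samples, 0, 5)
correct = keep_in_range(samples, 0, 5)
print(buggy)    # values between -1 and 0 slip through
print(correct)  # only values from 0 to 5 remain
```

Because int() truncates toward zero, -0.8 and -0.2 both become 0 and pass a minimum of 0, while -1.5 becomes -1 and is still rejected.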
If I try to mimic your situation with SQL and run the following query against a table that has 1.00, 0.00, -0.20, -0.80, -1.00, -1.20 and -2.00 in the column SomeVal, it correctly returns -0.20 and -0.80, as expected.
SELECT SomeVal
FROM SomeTable
WHERE (SomeVal < 0) AND (SomeVal > - 1)
The same is true for MATLAB. Perhaps there's an error in your code. Check the above statement against your own SELECT statement to see if something's amiss.
I can imagine such a bug if you do something like
minimum = 0
if minimum and value < minimum