Qlik Sense - Create bar chart by summing multiple columns - boxplot

I have several columns of data that are numeric and I want Qlik to create a boxplot by Group which is the sum of each column:
Group Data1 Data2 Data3 Data4 Data5 Data6
1 0 2 0 0 1 0
1 1 3 1 0.5 0 1
2 2 4 0 0 0 0
2 3 5 0 0 0 2
3 3 5 0 0 0 2
If Qlik took the sum of each column, it would look like this:
Obviously, you could choose which Groups to plot. The problem is that I'm trying to plot multiple columns in a single boxplot and I can't find anything similar. I don't know where to begin with this, so any help would be appreciated.
--- update ---
The problem is that boxplots are generally created using a single column and then an aggregate function (sum, count, etc) is applied. So, a typical boxplot would use Group and then count the number of times each group occurs. In my case, I need a single chart with the sum of each Data (Data1 - Data6) column plotted by group. So, my problem is that the single boxplot is derived from several columns, not one.

Related

Extract sub-matrix based on conditions of specific columns in matlab

I want to select a sub-matrix x7992 based on conditions on certain columns of the original data matrix. Specifically, the original matrix is 23166-by-9, and I am following this original Gauss code:
x7992 =selif(data,data[.,col_coh].==0 .and data[.,col_year].<=1992);
I rewrite this in matlab with
x7992 = data(data(:,col_coh)==0 & data(:,col_year)<=1992);
col_coh and col_year are predefined column numbers.
However, rather than giving me a sub-matrix, the line above only gives me a single vector (23166-by-1). It's not what I want (and not the real result based on this condition). So how do I fix it? Thank you.
--- Update -----
The data matrix is like (I omit other columns because only first 3 cols are relevant to selection), the first column is id for individuals
1 1979 0
1 1980 0
1 1981 1
1 1982 0
1 1983 1
2 1990 0
2 1991 0
2 1992 0
2 1993 1
3 1985 0
3 1986 0
3 1987 0
Based on the conditions, what I want is a submatrix from data, which excludes those rows with value>1992 in the second column and value=1 in the third one
You only get a single column vector as output because the 23166x1 logical condition vector is used as a linear index, so it only selects elements from the first column of data.
To get the entire row of values you need to add a colon as a second argument.
I've split the example into two lines to make it more readable.
condIdx = data(:,col_coh)==0 & data(:,col_year)<=1992;
x7992 = data( condIdx, :);
If you want specific columns in your result matrix, just put the column number in a vector instead of the colon operator.
colsInResult = [1 2 3];
x7992 = data( condIdx, colsInResult);
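A small sketch of the difference, using the sample rows from the question's update. The column positions col_year = 2 and col_coh = 3 are assumptions that only hold for this trimmed three-column sample, and asVector / asRows are illustrative names:

```matlab
% Sample rows from the question: [id, year, coh]
data = [1 1979 0; 1 1980 0; 1 1981 1; 1 1982 0; 1 1983 1; ...
        2 1990 0; 2 1991 0; 2 1992 0; 2 1993 1; ...
        3 1985 0; 3 1986 0; 3 1987 0];
col_year = 2;   % assumed position in this trimmed sample
col_coh  = 3;   % assumed position in this trimmed sample

condIdx  = data(:,col_coh)==0 & data(:,col_year)<=1992;
asVector = data(condIdx);      % mask acts as a linear index: elements of column 1 only
asRows   = data(condIdx, :);   % mask acts as a row index: whole rows are kept

disp(size(asVector))   % 9 1
disp(size(asRows))     % 9 3
```

With only the mask, the selected elements all come from the first column; adding the colon keeps every column of the matching rows.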
Based on the example that you have given, the following will do this:
data(data(:,2)<=1992 & data(:,3)~=1,:)
which gives this output:
1 1979 0
1 1980 0
1 1982 0
2 1990 0
2 1991 0
2 1992 0
3 1985 0
3 1986 0
3 1987 0

How to count patterns columnwise in Matlab?

I have a matrix S in Matlab that looks like the following:
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1
I would like to count patterns of values column-wise. I am interested in the frequency of the numbers that follow right after number 3 in any of the columns. For instance, number 3 occurs three times in the first column. The first time we observe it, it is followed by 3, the second time it is followed by 3 again and the third time it is followed by 4. Thus, the frequency for the patterns observed in the first column would look like:
3-3: 66.66%
3-4: 33.33%
3-1: 0%
3-2: 0%
To generate the output, you could use the convenient tabulate function (part of the Statistics and Machine Learning Toolbox):
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1];
idx = find(S(1:end-1,:)==3);
S2 = S(2:end,:);
tabulate(S2(idx))
Value Count Percent
1 0 0.00%
2 0 0.00%
3 4 66.67%
4 2 33.33%
Here's one approach, finding the 3's then looking at the following digits
[i,j]=find(S==3);
k=i+1<=size(S,1);
T=S(sub2ind(size(S),i(k)+1,j(k))) %// the elements of S that are just below a 3
R=arrayfun(@(x) sum(T==x)./sum(k),1:max(S(:))).' %// fraction of the followers that equal each digit 1..max(S(:))
I'm going to restate your problem statement in a way that I can understand and my solution will reflect this new problem statement.
For a particular column, locate the locations that contain the number 3.
Look at the row immediately below these locations and look at the values at these locations
Take these values and tally up the total number of occurrences found.
Repeat these for all of the columns and update the tally, then determine the percentage of occurrences for the values.
We can do this by the following:
A = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// Define your matrix
[row,col] = find(A(1:end-1,:) == 3);
vals = A(sub2ind(size(A), row+1, col));
h = 100*accumarray(vals, 1) / numel(vals)
h =
0
0
66.6667
33.3333
Let's go through the above code slowly. The first few lines define your example matrix A. Next, we take a look at all of the rows except for the last row of your matrix and see where the number 3 is located with find. We skip the last row because we want to be sure we stay within the bounds of your matrix. If there is a number 3 located in the last row, indexing the row below it would go out of bounds and raise an error, because there's nothing there!
Once we do this, we take a look at those values in the matrix that are 1 row beneath those that have the number 3. We use sub2ind to help us facilitate this. Next, we use these values and tally them up using accumarray then normalize them by the total sum of the tallying into percentages.
The result would be a 4 element array that displays the percentages encountered per number.
To double check, if we look at the matrix, we see that a 3 is followed by another 3 a total of 4 times: the 3s at first column, rows 3 and 4; second column, row 2; and third column, row 6. A 3 is followed by a 4 two times: the 3s at first column, row 5, and second column, row 3.
In total, we have 6 numbers we counted, and so dividing by 6 gives us 4/6 or 66.67% for number 3 and 2/6 or 33.33% for number 4.
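As a cross-check of the tally, here is a plain double-loop sketch that counts the same thing without find, sub2ind or accumarray. The names counts and pct are illustrative, not part of the answer above:

```matlab
A = [2 2 1 2
     2 3 1 1
     3 3 1 1
     3 4 1 1
     3 1 2 1
     4 1 3 1
     1 1 3 1];

counts = zeros(max(A(:)), 1);        % one bin per possible value 1..max(A)
for c = 1:size(A,2)
    for r = 1:size(A,1)-1            % skip the last row: nothing lies below it
        if A(r,c) == 3
            v = A(r+1,c);            % value directly below this 3
            counts(v) = counts(v) + 1;
        end
    end
end
pct = 100 * counts / sum(counts);    % counts is [0;0;4;2], so pct is about [0;0;66.67;33.33]
```

It is slower than the vectorized version on large matrices, but it makes the counting rule explicit.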
If I got the problem statement correctly, you could efficiently implement this with MATLAB's logical indexing and an approach that is essentially of two lines -
%// Input 2D matrix
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]
Labels = [1:4]'; %//'# Label array
counts = histc(S([false(1,size(S,2)) ; S(1:end-1,:) == 3]),Labels)
Percentages = 100*counts./sum(counts)
Verify/Present results
The styles for presenting the output results listed next use MATLAB's table to show the data in a human-readable format.
Style #1
>> table(Labels,Percentages)
ans =
Labels Percentages
______ ___________
1 0
2 0
3 66.667
4 33.333
Style #2
You can do some fancy string operations to present the results in a more "representative" manner -
>> Labels_3 = strcat('3-',cellstr(num2str(Labels','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-1' 0
'3-2' 0
'3-3' 66.667
'3-4' 33.333
Style #3
If you want to present them in descending sorted manner based on the percentages as listed in the expected output section of the question, you can do so with an additional step using sort -
>> [Percentages,idx] = sort(Percentages,'descend');
>> Labels_3 = strcat('3-',cellstr(num2str(Labels(idx)','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-3' 66.667
'3-4' 33.333
'3-1' 0
'3-2' 0
Bonus Stuff: Finding frequency (counts) for all cases
Now, let's suppose you would like to repeat this process for, say, 1, 2 and 4 as well, i.e. find occurrences after 1, 2 and 4 respectively. In that case, you can iterate the above steps for all cases, and for that you can use arrayfun -
%// Get counts
C = cell2mat(arrayfun(@(n) histc(S([false(1,size(S,2)) ; S(1:end-1,:) == n]),...
1:4),1:4,'Uni',0))
%// Get percentages
Percentages = 100*bsxfun(@rdivide, C, sum(C,1))
Giving us -
Percentages =
90.9091 20.0000 0 100.0000
9.0909 20.0000 0 0
0 60.0000 66.6667 0
0 0 33.3333 0
Thus, in Percentages, the first column holds the percentages of [1,2,3,4] that occur right after a 1 anywhere in the input matrix. As an example, one can see that column 3 of Percentages is what you had in the sample output when looking for elements right after 3 in the input matrix.
If you want to compute frequencies independently for each column:
S = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// data: matrix
N = 3; %// data: number
r = max(S(:));
[R, C] = size(S);
[ii, jj] = find(S(1:end-1,:)==N); %// step 1
count = full(sparse(S(ii+1+(jj-1)*R), jj, 1, r, C)); %// step 2
result = bsxfun(@rdivide, count, sum(S(1:end-1,:)==N)); %// step 3
This works as follows:
find is first applied to determine row and col indices of occurrences of N in S except its last row.
The values in the entries right below the indices of step 1 are accumulated for each column, in variable count. The very convenient sparse function is used for this purpose. Note that this uses linear indexing into S.
To obtain the frequencies for each column, count is divided (with bsxfun) by the number of occurrences of N in each column.
The result in this example is
result =
0 0 0 NaN
0 0 0 NaN
0.6667 0.5000 1.0000 NaN
0.3333 0.5000 0 NaN
Note that the last column correctly contains NaNs because the frequency of the sought patterns is undefined for that column.

Subsetting rows from Matlab for which specific column has value greater than zero

I want to subset rows from matrix for which the value in third column is greater than zero. For example, I have a matrix :
test =
1 2 3
4 5 0
4 4 1
4 4 0
Now I want to subset it so that I have
subset =
1 2 3
4 4 1
Any quick suggestion on how I can do this in matlab?
Simply make a logical array that is true for every row you want to keep, and pass it as the index to the rows:
subset = test(test(:,3)>0, :)
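Split into two steps, with the matrix from the question (the variable name keep is only illustrative), the same idea looks like this:

```matlab
test = [1 2 3
        4 5 0
        4 4 1
        4 4 0];

keep   = test(:,3) > 0;    % logical column: true for rows 1 and 3
subset = test(keep, :);    % keep those rows, all columns
```

Here subset contains the rows 1 2 3 and 4 4 1, as requested in the question.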

Resorting rows in matrix in non-decreasing order based on the entries of an arbitrary column

I am working with nx2 matrices in Matlab, and what I'm trying to do is fairly simple in principle. I randomly generate a square matrix, I run it through a series of functions and I get an mx2 matrix. I use the unique function on the rows to get rid of repeated rows and I end up with an nx2 matrix. What I'm having trouble with is reducing this matrix further so that, among all rows that share the same entry in the first column, only the row with the highest number in the second column is kept.
I was using a loop to check the ith and (i+1)th entries of the first column and store the rows with the highest value in the second column, but I am trying to avoid for-loops as much as possible.
If anyone has an idea or suggestion please let me know!
Example:
A =
0 0
0 1
0 3
1 0
1 0
0 0
2 1
1 1
2 4
unique(A, 'rows') =
0 0
0 1
0 3
1 0
1 1
2 1
2 4
Wanted result =
0 3
1 1
2 4
What you are looking for is:
[u,~,n] = unique(A(:,1));
B = [u, accumarray(n, A(:,2), [], @max)];
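Applied to the example matrix from the question, a sketch of this approach looks as follows. The matrix A is typed in from the question's example:

```matlab
A = [0 0; 0 1; 0 3; 1 0; 1 0; 0 0; 2 1; 1 1; 2 4];

[u, ~, n] = unique(A(:,1));                 % u: distinct first-column values, n: group index per row
B = [u, accumarray(n, A(:,2), [], @max)];   % largest second-column value within each group
% B is [0 3; 1 1; 2 4]
```

Note that unique(A, 'rows') is not needed beforehand, since duplicate rows do not change the maximum of a group.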
I don't exactly understand your problem description, but it sounds like sortrows() may be of some help to you.
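One way sortrows() could be used here, as a sketch rather than a definitive solution: sort the rows, then keep the last row of each run of equal first-column values. The names As and isLast are illustrative:

```matlab
A = [0 0; 0 1; 0 3; 1 0; 1 0; 0 0; 2 1; 1 1; 2 4];

As     = sortrows(A);                    % ascending by column 1, ties broken by column 2
isLast = [diff(As(:,1)) ~= 0; true];     % true on the last row of each first-column group
B      = As(isLast, :);                  % that row holds the group's largest second entry
% B is [0 3; 1 1; 2 4]
```

Because the second column is sorted within each group, the last row of a group is the one with the highest second entry.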

Pick x smallest elements in Matlab

I have a matrix of integer values; the x axis represents different days and the y axis represents the hour of the day. Each cell holds a number that indicates how many hours of that day correspond to some criteria of the day that is currently going on. That's why I need to calculate it for every hour and not only at the end.
The issue is that I then have to pick the 5 best days, those with the lowest numbers (least corresponding). In the matrix, this means selecting the 5 lowest numbers in a row and remembering the column indices of those minima (I need to know on which day each occurred). Because the 5 days can be different at every hour as time goes on, sorting the whole table would make a mess.
I can make it work in a really ugly way by taking the first 5 numbers and then, whenever I find a smaller one along the way, dropping the biggest of the 5 and remembering the column index of the new one. Yet this solution seems pretty sloppy. There has to be a better way to solve this in Matlab.
Any ideas, functions that can make my life easier?
1 1 0 1 1 1 0 0 1 1
1 2 1 2 2 1 0 1 2 2
For example, in these two rows indexed from 1-10, the first row should return columns 3, 7, 8 and two others (I don't really care which ones).
In the second row it should return columns 7,8,6,1,3.
A = randi(60,100,2);
[min_val,index] = sort(A(:,2),'ascend');
output = [A(index(1:5),1) A(index(1:5),2)];
this should help you (I guess);
Probably one of the simplest (but not most efficient) ways is to use the sort function (which also returns sorted indices):
>> [~,index] = sort([1 1 0 1 1 1 0 0 1 1]);
>> index(1:5)
ans =
3 7 8 1 2
>> [~,index] = sort([1 2 1 2 2 1 0 1 2 2]);
>> index(1:5)
ans =
7 1 3 6 8
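If both rows live in one matrix, the same idea can be applied to every row at once by sorting along the second dimension. A sketch, where M, k, cols and vals are illustrative names:

```matlab
M = [1 1 0 1 1 1 0 0 1 1
     1 2 1 2 2 1 0 1 2 2];
k = 5;                             % how many of the smallest entries to keep per row

[sortedVals, idx] = sort(M, 2);    % sort each row in ascending order
cols = idx(:, 1:k);                % column indices of the k smallest entries in each row
vals = sortedVals(:, 1:k);         % the corresponding values
```

Each row of cols then lists the days (columns) with the k lowest numbers for that hour. Which of several equal values is picked depends on how sort orders ties.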