My input is a data file in which some ID numbers occur more than once (e.g. ID# 123). I want to gather all rows with the same ID number, compare them column by column, and see which columns differ.
After that I move on to the next ID number with multiple occurrences (e.g. ID# 456) and do the same.
I repeat this until I have finished the last ID number with multiple occurrences.
My output should look like this:
(1)The column headers will be the same.
(2)The ID# column will have unique entries. Only the ID numbers which have multiple occurrences will be included in this column.
(3)I will add an extra column whose entry is the number of times the ID number occurred. For example, if it occurred 5 times, the entry is 5.
(4)For the other columns, if the column has the same entry for all occurrences of a given ID number, we write "0", else "1". E.g. if the entries in column "Section" are the same for all occurrences of ID#123, then in the output table the column "Section" will contain "0" for that ID. If there is any difference, the output will be "1".
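The four output rules above can be sketched as follows. This is only an illustration in Python rather than MATLAB, and the column names ("Section", "Grade"), the sample rows, and the helper name summarize_duplicates are all made up for the example.

```python
from collections import defaultdict

def summarize_duplicates(header, rows, id_col="ID"):
    """For each ID that occurs more than once, report the occurrence count
    and a 0/1 flag per remaining column (1 = the values differ)."""
    id_idx = header.index(id_col)
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_idx]].append(row)

    other = [i for i in range(len(header)) if i != id_idx]
    out_header = [id_col, "count"] + [header[i] for i in other]
    out_rows = []
    for key, members in groups.items():  # dicts keep first-seen order
        if len(members) < 2:
            continue  # rule (2): only IDs with multiple occurrences
        # rule (4): 0 if all occurrences agree in this column, else 1
        flags = [int(len({m[i] for m in members}) > 1) for i in other]
        out_rows.append([key, len(members)] + flags)  # rule (3): count column
    return out_header, out_rows

# Hypothetical sample data
header = ["ID", "Section", "Grade"]
rows = [
    [123, "A", 90],
    [456, "B", 70],
    [123, "A", 85],
    [456, "B", 70],
    [789, "C", 60],
]
out_header, out_rows = summarize_duplicates(header, rows)
print(out_header)
for r in out_rows:
    print(r)
```

Here ID 123 gets a 1 under "Grade" because 90 and 85 differ, while ID 789 is dropped because it occurs only once.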
Your question is not very clear, but I think you want to find the unique rows and count the number of times each unique row occurs. The table below might demonstrate this.
+----+---------+-----+----------+--------------------+
| ID | Column1 | ... | Column n | num of occurrences |
+----+---------+-----+----------+--------------------+
This can be done with unique and accumarray.
In the example below, A is the original data and output is your desired output. The first n columns of output are your unique data and the last column contains the number of times this row occurred. The row [1 5] occurred twice, [2 3] once etc.
A = [1 5
1 5
2 3
2 4
3 9];
[k,~,idx]= unique(A,'rows');
n = accumarray(idx(:),1);
output = [k n]
output =
1 5 2
2 3 1
2 4 1
3 9 1
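For comparison, the same unique-row count can be sketched outside MATLAB. The snippet below is a Python illustration of the idea (not a translation of unique/accumarray internals), using the same data as A above.

```python
from collections import Counter

# Same rows as the MATLAB matrix A, stored as tuples so they can be hashed
A = [(1, 5), (1, 5), (2, 3), (2, 4), (3, 9)]

counts = Counter(A)  # maps each distinct row to its number of occurrences

# Sort the distinct rows and append the count as a last column
output = [list(row) + [n] for row, n in sorted(counts.items())]
for line in output:
    print(line)
```

Sorting the distinct rows mirrors the sorted order that unique(A,'rows') returns.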
I have a 29736 x 6 table, which is referred to as table_fault_test_data. It has 6 columns, named wind_direction, wind_speed, air_temperature, air_pressure, density_hubheight and Fault_Condition respectively. What I want to do is to label the data in Fault_Condition (the last table column) with either a 1 or a 0, depending on the values in the other columns.
I would like to do the following checks (for example):
If the wind_direction value (column 1) is below 0.0040 or above 359.9940, label the 6th column entry of that row as 1, else label it as 0.
Do this for the entire table. Similarly, do this check for the other columns, like air_temperature, air_pressure and so on. I know that if-else will be used for these checks, but I am really confused as to how I can do this for the whole table and add the corresponding value to the 6th column (maybe using a loop or something).
Any help in this regard would be highly appreciated. Many thanks!
EDIT:
Further clarification: I have a 29736 x 6 table named table_fault_test_data. I want to add values to the 6th column of the table based on the conditions below:
for i = 1:29736 % Iterating over the whole table row by row
    if (1st column value < x | 1st column value > y)
        % Add 0 to the corresponding element of the 6th column, i.e. table_fault_test_data(i,6)
    elseif (2nd column value < x | 2nd column value > y)
        % Add 0 to the corresponding element of the 6th column, i.e. table_fault_test_data(i,6)
    elseif ... do this for the other cases as well
    else
        % Add 1 to the corresponding element of the 6th column, i.e. table_fault_test_data(i,6)
    end
end
This is the essence of my requirements. I hope this helps in understanding the question better.
You can use logical indexing, which is also supported for tables (for loops should be avoided where possible). For example, suppose you want to implement the first condition, that your x and y are known, and that your table is called t:
logicalIndecesFirstCondition = t{:,1} < x | t{:,1} > y;
You can then refer to the rows that satisfy this condition using logical indexing.
E.g.:
t{logicalIndecesFirstCondition , 6} = t{logicalIndecesFirstCondition , 6} + 1.0;
This adds 1.0 to the 6th column for the rows where the logical condition is true.
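To show how several range checks can be combined into one label per row, here is a rough Python sketch of the same masking idea. It follows the first description in the question (out of range gives 1, otherwise 0). The wind_direction bounds come from the question; the second column's bounds, the sample rows, and the helper name label_rows are assumptions for illustration only.

```python
def label_rows(rows, limits):
    """limits maps a column index to (low, high).
    Returns 1 for a row if any checked column is outside its range, else 0."""
    labels = []
    for row in rows:
        out_of_range = any(
            row[col] < low or row[col] > high
            for col, (low, high) in limits.items()
        )
        labels.append(1 if out_of_range else 0)
    return labels

# Hypothetical rows: [wind_direction, wind_speed]
rows = [
    [0.0010, 5.2],     # wind_direction below its lower bound
    [180.0, 7.9],      # everything in range
    [359.9990, 3.1],   # wind_direction above its upper bound
    [90.0, 40.0],      # wind_speed above its (assumed) upper bound
]
limits = {
    0: (0.0040, 359.9940),  # bounds taken from the question
    1: (0.0, 25.0),         # assumed bounds, for illustration only
}
labels = label_rows(rows, limits)
print(labels)
```

In MATLAB the equivalent would be to OR the per-column logical vectors together and assign the result to the 6th column in one step.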
I am trying to implement an idea in MATLAB.
I consider two columns, A and B.
A=data(:,1)
B=data(:,5)
the data look like:
A B
1 1
2 1
3 1
... ...
100 20
... ...
150 30
151 1
... ...
The values in column A are timepoints.
I start with the first element of column A, A(1,1), and look at the first element of column B, B(1,1). If B(1,1)==1 the result is true; if not, it is false. Then I move to the second row of columns A and B, and so on until the last row of A and B.
How can I construct this loop?
You do not need a loop; you can compare B directly, as follows:
result = (B == 1);
The result has the same size as B, as you want. Now you can get the values of A at the rows where result is true as follows:
valid_times = A(result);
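The same mask-then-select idea can be sketched in Python for readers who want to trace it by hand. The sample values below are loosely based on the rows shown in the question and are only illustrative.

```python
# Time points (column A) and the values to test (column B)
A = [1, 2, 3, 100, 150, 151]
B = [1, 1, 1, 20, 30, 1]

# Element-wise comparison: True where B equals 1
result = [b == 1 for b in B]

# Keep only the time points whose mask entry is True
valid_times = [a for a, keep in zip(A, result) if keep]
print(valid_times)
```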
I have a file which contain multiple rows of item codes as follows. There are 1 million rows similar to these
1. 123,134,256,345,789.....
2. 123,256,345,678,789......
.
.
I would like to find the count of all pairs of words/items per row in the file using q in kdb+, i.e. any two words that occur in the same row can be considered a word pair.
e.g:
(123,134),(123,256),(134,256), (123,345) (123,789), (134,789) are some of the word pairs in row 1
(123,256),(123,345),(123,345),(678,789),(345,789) are some of the word pairs in row 2
word/item pair   count
123,134          1
123,256          2
345,789          2
I am reading the file using read0 and have been able to convert each line into list using vs and using count each group to count the number of words, but now I want to find the count of all the word pairs per row in the file.
Thanks in advance for your help
I'm not 100% sure I understand your definition of a word pair. Perhaps you could expand a little if my logic doesn't match what you were looking for.
In the example below, I've created a 5x5 matrix of symbols for testing, selected the distinct pairs of values from each row, and then counted how many rows each of these pairs appears in.
Please double-check against your own results.
q)test:5 cut`$string 25?5
q)test
2 0 1 0 0
2 4 4 2 0
1 0 0 3 4
2 1 1 4 4
3 0 3 4 0
q)count each group raze {l[where(count'[l:distinct distinct each asc'[x cross x:distinct x]])>1]} each test
0 2| 2
1 2| 2
0 1| 2
2 4| 2
0 4| 3
1 3| 1
1 4| 2
0 3| 2
3 4| 2
To add some other cases to Matthew's answer above, if what you want is to break the list down into pairs in this way:
l:"a,b,c,d,e,f,g"
becomes
"a,b"
"b,c"
"c,d"
"d,e"
"e,f"
"f,g"
so only taking valid pairs, you could use something like this:
f:{count each group b flip 0 1+\:til count[b:","vs x]-1}
q)f l
,"a" ,"b"| 1
,"b" ,"c"| 1
,"c" ,"d"| 1
,"d" ,"e"| 1
,"e" ,"f"| 1
,"f" ,"g"| 1
where we're splitting the input list on ",", then using indexing to get a list of each element and the element directly to its right, then grouping the resultant list of pairs to count the distinct pairs. If you want to split it so l becomes
"a,b"
"c,d"
"e,f"
then you could use this:
g:{count each group b flip 0 1+\:2*til count[b:","vs x]div 2}
q)g l
,"a" ,"b"| 1
,"c" ,"d"| 1
,"e" ,"f"| 1
Which uses a similar approach, starting with the even-positioned elements and getting those to their right, and repeating as above.
You can easily apply these to the rows read with read0:
r:read0`:file.txt
f each r
will output a dictionary of the counts of each pair for each row, and this can be summed to give the total count of each word pair with each method throughout the file.
Hope this helps. It's still not clear what you mean by pairs, so if neither my answer nor Matthew's is of use, you could edit in a more complete explanation of what you'd like and we can help with that.
If you want to consider all possible combinations of 2 items (i.e. all pairs) in each row, then this may be of help. The following function can be used to give distinct combinations, where x is the size of the list and y is the length of the combination:
q)comb:{$[x=y;enlist til x;1=y;flip enlist til x;.z.s[x;y],.z.s[x;y-1],'x-:1]}
q)comb[3;2]
0 1
0 2
1 2
From here we can index into each list to get the pairs, then raze to give a single list of all pairs, group to get the indices where each pair occurs and then count the number of indices in each group:
q)a
123 134 256 345 789
123 256 345 678 789
q)count each group raze{x comb[count x;2]}'[a]
123 134| 1
123 256| 2
134 256| 1
...
345 789| 2
...
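As a cross-check on the all-pairs approach, the counting can be sketched in Python with the standard library. This is an illustration only, not the q implementation; the helper name pair_counts is made up, and it assumes that a repeated item within one row should be counted once, which the question does not specify.

```python
from collections import Counter
from itertools import combinations

def pair_counts(lines):
    """Count, over all rows, every unordered pair of items that share a row."""
    counts = Counter()
    for line in lines:
        # Assumption: duplicates inside a row are collapsed before pairing
        items = sorted(set(line.split(",")))
        counts.update(combinations(items, 2))
    return counts

# The two sample rows from the question
lines = ["123,134,256,345,789", "123,256,345,678,789"]
counts = pair_counts(lines)
print(counts[("123", "134")], counts[("123", "256")], counts[("345", "789")])
```

Sorting the items in each row makes (123,256) and (256,123) land on the same key, which is the same role asc plays in the first q answer.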
I currently have a 4x3500 cell array. Each entry in the first row is a single number, each entry in the 2nd row is a single string, and the 3rd and 4th rows also hold single numbers.
Ex:
1 1 2 3 3 4 5 5 5 6
hi no ya he ........ % you get the idea
28 34 18 0 3 ......
55 2 4 42 24 .....
I would like to be able to select all columns that have a certain value in the first row, i.e. if I wanted '1' as the first row value, it would return
1 1
hi no
28 34
55 2
Then I would like to filter based on the 2nd row's string, i.e. if I wanted 'hi', it would return:
1
hi
28
55
I have attempted to do:
variable = cellArray{:,find(cellArray{1,:} == 1)}
However I keep getting:
Error using find
Too many input arguments.
or
Error using ==
Too many input arguments.
Any help would be much appreciated! :)
{} indexing returns a comma-separated list, which provides multiple outputs. When you pass this to find, it's the same as passing each element of your cell array as a separate input. This is what leads to the error about too many input arguments.
You will want to surround the comma-separated list with [] to create an array of numbers. Also, you don't need find because you can just use logical indexing to grab the columns you want. Additionally, you will want to index using () to grab the relevant rows, again to avoid the comma-separated list.
variable = cellArray(:, [cellArray{1,:}] == 1)
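The two selection steps from the question (first by the number in row 1, then by the string in row 2) can be sketched as follows. This is a Python illustration of the masking logic using the first four columns of the question's example; the helper name select_columns is hypothetical.

```python
# Rows of the example: numbers, strings, numbers, numbers
cell_array = [
    [1, 1, 2, 3],
    ["hi", "no", "ya", "he"],
    [28, 34, 18, 0],
    [55, 2, 4, 42],
]

def select_columns(rows, keep):
    """Keep only the columns whose mask entry is True, in every row."""
    return [[value for value, k in zip(row, keep) if k] for row in rows]

# Step 1: columns whose first-row value is 1
mask = [value == 1 for value in cell_array[0]]
first = select_columns(cell_array, mask)

# Step 2: of those, columns whose second-row string is 'hi'
mask2 = [s == "hi" for s in first[1]]
second = select_columns(first, mask2)

print(first)
print(second)
```

In MATLAB the second step would likely use strcmp on the second row of the already filtered cell array to build the mask.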
So, presume a matrix like so:
20 2
20 2
30 2
30 1
40 1
40 1
I want to count the number of times 1 occurs for each unique value of column 1. I could do this the long way by [sum(x(1:2,2)==1)] for each value, but I think this would be the perfect use for the UNIQUE function. How could I fix it so that I could get an output like this:
20 0
30 1
40 2
Sorry if the solution seems obvious, my grasp of loops is very poor.
Indeed unique is a good option:
u=unique(x(:,1))
res=arrayfun(@(y)length(x(x(:,1)==y & x(:,2)==1)),u)
Taking apart that last line:
arrayfun(fun,array) applies fun to each element in the array, and puts it in a new array, which it returns.
This function is @(y)length(x(x(:,1)==y & x(:,2)==1)), which finds the length of the portion of x where the condition x(:,1)==y & x(:,2)==1 holds (this is called logical indexing). So for each of the unique elements, it counts the rows of x in which the first column is that unique element and the second column is 1.
Try this (as specified in this answer):
>> [c,~,d] = unique(a(a(:,2)==1))
c =
30
40
d =
1
2
2
>> counts = accumarray(d(:),1,[],@sum)
counts =
1
2
>> res = [c,counts]
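To make the expected result easy to verify by hand, here is a small Python sketch of the same count. It is an illustration only, using the matrix from the question; unlike filtering on the second column first, it keeps IDs that never have a 1 (such as 20) with a count of 0, as in the desired output.

```python
# Rows of the example matrix: (ID, value)
x = [(20, 2), (20, 2), (30, 2), (30, 1), (40, 1), (40, 1)]

counts = {}
for key, value in x:
    # Register every ID, and add 1 only when the second column equals 1
    counts[key] = counts.get(key, 0) + int(value == 1)

res = sorted(counts.items())
print(res)
```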
Suppose you have an array of various integers in 'array'.
The tabulate function will sort the unique values and count the occurrences.
table = tabulate(array)
Look for the counts of each unique value in column 2 of table.