I would like to split a string in a column to n rows in Talend.
For example :
column
2aabbccdd
The first number is the "n" which I use to define the row lenght, so the expected result should be :
row 1 = aa
row 2 = bb
row 3 = cc
row 4 = dd
The idea here is to iterate on the string and cut it every 2 characters.
Any idea please ?
I would use a tJavaFlex to split the string, with a trick to have n rows coming out of it.
tJavaFlex's main code:
int n = Integer.parseInt(row1.str.substring(0, 4)); //get n from the first 4 characters
String str2 = row1.str.substring(4); //get the string after n
int nbParts = (str2.length() + 1) / n;
System.out.println("number of parts = " + nbParts);
for (int i = 0; i < nbParts; i++)
{
String part = str2.substring(i * n);
if(part.length() > n)
{
part = part.substring(0, n);
}
row2.str = part;
And tJavaFlex's end code is just a closing brace:
}
The trick is to use a for loop in the main code, but only close it in the end code.
tFixedFlowInput contains just one column holding the input string.
So I have arr = randi([0,20],20,1). I want to show: If there are numbers less than 5, fprintf('Yes\n') only once. Im using a for loop (for i = 1 : length(arr)) and indexing it.
As your description, maybe you need if statement within for loop like below
for i = 1:length(arr)
if arr(i) < 5
fprintf('Yes\n');
break
end
end
If you want to print Yes once, you can try
if any(arr < 5)
fprintf('Yes\n')
endif
If you don't want to use break, the code below might be an option
for i = 1:min(find(arr <5))
if (arr(i) < 5)
fprintf('Yes\n');
end
end
You can use a break statement upon finding the first value under 5 and printing the Yes statement.
Using a break Statement:
arr = randi([0,20],20,1);
for i = 1: length(arr)
if arr(i) < 5
fprintf("Yes\n");
break;
end
end
Extension:
By Using any() Function:
Alternatively, if you'd like to concise it down without the need for a for-loop the any() function can be used to determine if any values within the array meet a condition in this case arr < 5.
arr = randi([0,20],20,1);
if(any(arr < 5))
fprintf("Yes\n");
end
By Using a While Loop:
Check = 0;
arr = randi([0,20],20,1);
i = 1;
while (Check == 0 && i < length(arr))
if arr(i) < 5
fprintf("Yes\n");
Check = 1;
end
i = i + 1;
end
Suppose:
1 A
2 B
3 C
I need to print the value corresponding to 1-> A. I have put each in an array:
d1[1,2,3] and s1[A,B,C]. Now I need to print the value in the form shown in the above:
d1[0] s1[0]
1 A
How can I do this using UnityScript? In the program, I did id printing in this format:
1 A
1 B
1 C
2 A
2 B
2 C
What you have probably done, and I'm guessing without seeing the code is something along that you have a for loop within a for loop which means you're processing the first item in the first array and then all items in the second:
for (var i = 0; i < d1.Length; i++) {
for(var j = 0; j < s1.length; j++ {
Debug.log(d1[i] + " " + s1[j])
}
}
Your options depend on the array sizes and how you want to handle them. For example if you know they're the same size consistently then
for(var i = 0; i < d1.Length; i++) {
Debug.log(d1[i] + " " + s1[i])
}
should work. I would wonder if what you're trying to achieve could be done with a different data structure such as a dictionary, etc.
I am trying to compute and plot the distribution of bigrams frequencies
First I did generate all possible bigrams which gives 1296 bigrams
then i extract the bigrams from a given file and save them in words1
my question is how to compute the frequency of these 1296 bigrams for the file a.txt?
if there are some bigrams did not appear at all in the file, then their frequencies should be zero
a.txt is any text file
clear
clc
%************create bigrams 1296 ***************************************
chars ='1234567890abcdefghijklmonpqrstuvwxyz';
chars1 ='1234567890abcdefghijklmonpqrstuvwxyz';
bigram='';
for i=1:36
for j=1:36
bigram = sprintf('%s%s%s',bigram,chars(i),chars1(j));
end
end
temp1 = regexp(bigram, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp1(1:end-1)', temp1(2:end)','un',0);
bigrams = temp2;
bigrams = unique(bigrams);
bigrams = rot90(bigrams);
bigram = char(bigrams(1:end));
all_bigrams_len = length(bigrams);
clear temp temp1 temp2 i j chars1 chars;
%****** 1. Cleaning Data ******************************
collection = fileread('e:\a.txt');
collection = regexprep(collection,'<.*?>','');
collection = lower(collection);
collection = regexprep(collection,'\W','');
collection = strtrim(regexprep(collection,'\s*',''));
%*******************************************************
temp = regexp(collection, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp(1:end-1)', temp(2:end)','un',0);
words1 = rot90(temp2);
%*******************************************************
words1_len = length(words1);
vocab1 = unique(words1);
vocab_len1 = length(vocab1);
[vocab1,void1,index1] = unique(words1);
frequencies1 = hist(index1,vocab_len1);
I. Character counting problem for a string
bsxfun based solution for counting characters -
counts = sum(bsxfun(#eq,[string1-0]',65:90))
Output -
counts =
2 0 0 0 0 2 0 1 0 0 ....
If you would like to get a tabulate output of counts against each letter -
out = [cellstr(['A':'Z']') num2cell(counts)']
Output -
out =
'A' [2]
'B' [0]
'C' [0]
'D' [0]
'E' [0]
'F' [2]
'G' [0]
'H' [1]
'I' [0]
....
Please note that this was a case-sensitive counting for upper-case letters.
For a lower-case letter counting, use this edit to this earlier code -
counts = sum(bsxfun(#eq,[string1-0]',97:122))
For a case insensitive counting, use this -
counts = sum(bsxfun(#eq,[upper(string1)-0]',65:90))
II. Bigram counting case
Let us suppose that you have all the possible bigrams saved in a 1D cell array bigrams1 and the incoming bigrams from the file are saved into another cell array words1. Let us also assume certain values in them for demonstration -
bigrams1 = {
'ar';
'de';
'c3';
'd1';
'ry';
't1';
'p1'}
words1 = {
'de';
'c3';
'd1';
'r9';
'yy';
'de';
'ry';
'de';
'dd';
'd1'}
Now, you can get the counts of the bigrams from words1 that are present in bigrams1 with this code -
[~,~,ind] = unique(vertcat(bigrams1,words1));
bigrams_lb = ind(1:numel(bigrams1)); %// label bigrams1
words1_lb = ind(numel(bigrams1)+1:end); %// label words1
counts = sum(bsxfun(#eq,bigrams_lb,words1_lb'),2)
out = [bigrams1 num2cell(counts)]
The output on code run is -
out =
'ar' [0]
'de' [3]
'c3' [1]
'd1' [2]
'ry' [1]
't1' [0]
'p1' [0]
The result shows that - First element ar from the list of all possible bigrams has no find in words1 ; second element de has three occurrences in words1 and so on.
Hey similar to Dennis solution you can just use histc()
string1 = 'ASHRAFF'
histc(string1,'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
this checks the number of entries in the bins defined by the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' which is hopefully the alphabet (just wrote it fast so no garantee). The result is:
Columns 1 through 21
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
Columns 22 through 26
0 0 0 0 0
Just a little modification of my solution:
string1 = 'ASHRAFF'
alphabet1='A':'Z'; %%// as stated by Oleg Komarov
data=histc(string1,alphabet1);
results=cell(2,26);
for k=1:26
results{1,k}= alphabet1(k);
results{2,k}= data(k);
end
If you look at results now you can easily check rather it works or not :D
This answer creates all bigrams, loads in the file does a little cleanup, ans then uses a combination of unique and histc to count the rows
Generate all Bigrams
note the order here is important as unique will sort the array so this way it is created presorted so the output matches expectation;
[y,x] = ndgrid(['0':'9','a':'z']);
allBigrams = [x(:),y(:)];
Read The File
this removes capitalisation and just pulls out any 0-9 or a-z character then creates a column vector of these
fileText = lower(fileread('d:\loremipsum.txt'));
cleanText = regexp(fileText,'([a-z0-9])','tokens');
cleanText = cell2mat(vertcat(cleanText{:}));
create bigrams from file by shifting by one and concatenating
fileBigrams = [cleanText(1:end-1),cleanText(2:end)];
Get Counts
the set of all bigrams is added to our set (so the values are created for all possible). Then a value ∈{1,2,...,1296} is assigned to each unique row using unique's 3rd output. Counts are then created with histc with the bins equal to the set of values from unique's output, 1 is subtracted from each bin to remove the complete set bigrams we added
[~,~,c] = unique([fileBigrams;allBigrams],'rows');
counts = histc(c,1:1296)-1;
Display
to view counts against text
[allBigrams, counts+'0']
or for something potentially more useful...
[sortedCounts,sortInd] = sort(counts,'descend');
[allBigrams(sortInd,:), sortedCounts+'0']
ans =
or9
at8
re8
in7
ol7
te7
do6 ...
Did not look into the entire code fragment, but from the example at the top of your question, I think you are looking to make a histogram:
string1 = 'ASHRAFF'
nr = histc(string1,'A':'Z')
Will give you:
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
(Got a working solution with hist, but as #The Minion shows histc is more easy to use here.)
Note that this solution only deals with upper case letters.
You may want to do something like so if you want to put lower case letters in their correct bin:
string1 = 'ASHRAFF'
nr = histc(upper(string1),'A':'Z')
Or if you want them to be shown separately:
string1 = 'ASHRaFf'
nr = histc(upper(string1),['a':'z' 'A':'Z'])
bi_freq1 = zeros(1,all_bigrams_len);
for k=1: vocab_len1
for i=1:all_bigrams_len
if char(vocab1(k)) == char(bigrams(i))
bi_freq1(i) = frequencies1(k);
end
end
end
I'm new to Matlab and now learning the basic grammar.
I've written the file GetBin.m:
function res = GetBin(num_bin, bin, val)
if val >= bin(num_bin - 1)
res = num_bin;
else
for i = (num_bin - 1) : 1
if val < bin(i)
res = i;
end
end
end
and I call it with:
num_bin = 5;
bin = [48.4,96.8,145.2,193.6]; % bin stands for the intermediate borders, so there are 5 bins
fea_val = GetBin(num_bin,bin,fea(1,1)) % fea is a pre-defined 280x4096 matrix
It returns error:
Error in GetBin (line 2)
if val >= bin(num_bin - 1)
Output argument "res" (and maybe others) not assigned during call to
"/Users/mac/Documents/MATLAB/GetBin.m>GetBin".
Could anybody tell me what's wrong here? Thanks.
You need to ensure that every possible path through your code assigns a value to res.
In your case, it looks like that's not the case, because you have a loop:
for i = (num_bins-1) : 1
...
end
That loop will never iterate (so it will never assign a value to res). You need to explicitly specify that it's a decrementing loop:
for i = (num_bins-1) : -1 : 1
...
end
For more info, see the documentation on the colon operator.
for i = (num_bin - 1) : -1 : 1