I understand the whole code and
I just want to know why there has to be a -1 at the end of the range function.
I've been checking it out with pythontutor but I can't make it out.
#Given 2 strings, a and b, return the number of the positions where they
#contain the same length 2 substring. So "xxcaazz" and "xxbaaz" yields 3,
#since the "xx", "aa", and "az" substrings appear in the same place in
#both strings.
def string_match(a, b):
shorter = min(len(a), len(b))
count = 0
for i in range(shorter -1): #<<<<<<<<< This is -1 I don't understand.
a_sub = a[i:i+2]
b_sub = b[i:i+2]
if a_sub == b_sub:
count = count + 1
return count
string_match('xxcaazz', 'xxbaaz')
string_match('abc', 'abc')
string_match('abc', 'axc')
I expect to understand why there has to be a -1 at the end of the range function. I will appreciate your help and explanation!
The value indices of the for loop are counted since 0 so the final value actually would be the (size -1)
Related
So, I have a formula ( =INDEX(Sheet1.A1:F15,RANDBETWEEN(1,15),RANDBETWEEN(1,6)) ) that returns a random number in the sheet. But, how to run the formula until the returned number is less than or equal to 25 ?
I thought of using for..next.. but couldn't get it how to run ...
Welcome!
As #thebusybee pointed out in his comment, a macro for this task is much easier than using built-in functions. As rightly pointed out #tohuwawohu, pre-filtering the values makes things a lot easier. The macro code could be, for example, like this
Option Explicit
Function getRandValue(aValues As Variant, nTypeCriteria As Integer, dCriteriaValue As Variant) As Variant
Rem Params: aValues - array of values,
Rem nTypeCriteria - -2 less then, -1 not more, 0 equal, 1 not less, 2 more than
Rem dCriteriaValue - value to compare
Dim aTemp As Variant
Dim i As Long, j As Long, k As Long
Dim bGoodValue As Boolean
k = UBound(aValues,1)*UBound(aValues,2)
ReDim aTemp(1 To k)
k = 0
For i = 1 To UBound(aValues,1)
For j = 1 To UBound(aValues,2)
bGoodValue = False
Select Case nTypeCriteria
Case -2
bGoodValue = (aValues(i,j) < dCriteriaValue)
Case -1
bGoodValue = (aValues(i,j) <= dCriteriaValue)
Case 0
bGoodValue = (aValues(i,j) = dCriteriaValue)
Case 1
bGoodValue = (aValues(i,j) >= dCriteriaValue)
Case 2
bGoodValue = (aValues(i,j) > dCriteriaValue)
End Select
If bGoodValue Then
k = k+1
aTemp(k) = aValues(i,j)
EndIf
Next j
Next i
If k<1 Then
getRandValue = "No matching values"
ElseIf k=1 Then
getRandValue = aTemp(k)
Else
getRandValue = aTemp(Rnd()*(k-1)+1)
EndIf
End Function
Just put a call to this function in a cell in the form
=GETRANDVALUE(A1:F15;-1;25)
There is an array with indices [[0, n_0], [1, n_1], ..., [n, n_n]]. For each n_i a function is called. It is necessary to reorder the result from the threads by first component after every thread has terminated. As far as I could find a way to do this, I organized that the index is hard-coded by asking if the index is e.g. 0 and then starting the code separately for the hard-coded index 0. So far this a possible way to do it (even though the code looks as if someone didn't understand what a loop is for).
rest = []
tpl.each do |idx, vn|
if idx == 0
pool.post do
res = funk(vn)
p ['idx 0: ', res]
rest += [[0, res]]
end#pool.post
elsif idx == 1
pool.post do
res = funk(vn)
p ['idx 1: ', res]
rest += [[1, res]]
end#pool.post
end;end
But now there is a strange behaviour:
Index 0 and 1 are calculated accurately, but when the result of 1 is added one line later, the result of the former function is added (again).
["idx 1: ", [4]]
["idx 0: ", [16900]]
rest: [[0, [16900]], [1, [16900], ...]
This is not always the case, so it depends on the order of the appearance of the results.
If e.g. the calculation of index 0 is finished after the calculation of index 1, then idx 1 is missing, or wrong. But other cases of confused results also appear: idx 0 before idx 1, but result of idx 0 is the result of idx 1.
?
It looks like if the threads are not really separated. Can that be enforced, or is there a smarter way of keeping indeces?
One option, I found out, is to synchronize the threads, but that would make the algorithm slower again, so a better solution is:
The results don't get mixed up, if the rest-tuple already has the structure to differentiate the results coming in:
rest = [[], []]
tpl.each do |idx, vn|
if idx == 0
pool.post do
res = funk(vn)
p ['idx 0: ', res]
rest[0] << [0, res]
end#pool.post
elsif idx == 1
pool.post do
res = funk(vn)
p ['idx 1: ', res]
rest[1] << [1, res]
end#pool.post
end;end
i got the error of this code which is:
path[index][4] += 1
IndexError: list index out of range
why this happened?how can i remove this error ?
Code:
def stress_centrality(g):
stress = defaultdict(int)
for a in nx.nodes_iter(g):
for b in nx.nodes_iter(g):
if a==b:
continue
pred = nx.predecessor(G,b) # for unweighted graphs
#pred, distance = nx.dijkstra_predecessor_and_distance(g,b) # for weighted graphs
if a not in pred:
return []
path = [[a,0]]
path_length = 1
index = 0
while index >= 0:
n,i = path[index]
if n == b:
for vertex in list(map(lambda x:x[0], path[:index+1]))[1:-1]:
stress[vertex] += 1
if len(pred[n]) >i:
index += 1
if index == path_length:
path.append([pred[n][i],0])
path_length += 1
else:
path[index] = [pred[n][i],0]
else:
index -= 1
if index >= 0:
path[index][4] += 1
return stress
Without the data it's hard to give you anything more than an indicative answer.
This line
path[index][4] += 1
assumes there are 5 elements in path[index] but there are fewer than that. It seems to me that your code only assigns or appends to path lists of length 2. As in
path = [[a,0]]
path.append([pred[n][i],0])
path[index] = [pred[n][i],0]
So it's hard to see how accessing the 5th element of one of those lists could ever be correct.
This is a complete guess, but I think you might have meant
path[index][1] += 4
Have an RDD with List of values, which are mix of positives and negatives.
Need to compute number of cycles from this data.
For example,
val range = List(sampleRange(2020,2030,2040,2050,-1000,-1010,-1020,Starting point,-1030,2040,-1020,2050,2040,2020,end point,-1060,-1030,-1010)
the interval between each value in above list is 1 second. ie., 2020 and 2030 are recorded in 1 second interval and so on.
how many times it turns from negative to positive and stays positive for >= 2 seconds.
If >= 2 seconds it is a cycle.
Number of cycles: Logic
Example 1: List(1,2,3,4,5,6,-15,-66)
No. of cycles is 1.
Reason: As we move from 1st element of list to 6th element, we had 5 intervals which means 5 seconds. So one cycle.
As we move to 6th element of list, it is a negative value. So we start counting from 6th element and move to 7th element. The negative values are only 2 and interval is only 1. So not counted as cycle.
Example 2:
List(11,22,33,-25,-36,-43,20,25,28)
No. of cycles is 3.
Reason: As we move from 1st element of list to 3rd element, we had 2 intervals which means 2 seconds. So one cycle As we move to 4th element of list, it is a negative value. So we start counting from 4th element and move to 5th, 6th element. we had 2 intervals which means 2 seconds. So one cycle As we move to 7th element of list, it is a positive value. So we start counting from 7th element and move to 8th, 9th element. we had 2 intervals which means 2 seconds. So one cycle.
range is a RDD in the use case. It looks like
scala> range
range: Seq[com.Range] = List(XtreamRange(858,890,899,920,StartEngage,-758,-790,-890,-720,920,940,950))
You can encode this "how many times it turns from negative to positive and stays positive for >= 2 seconds. If >= 2 seconds it is a cycle." pretty much directly into a pattern match with a guard. The expression if(h < 0 && ht > 0 && hht > 0) checks for a cycle and adds one to the result then continues with the rest of the list.
def countCycles(xs: List[Int]): Int = xs match {
case Nil => 0
case h::ht::hht::t if(h < 0 && ht > 0 && hht > 0) => 1 + countCycles(t)
case h::t => countCycles(t)
}
scala> countCycles(range)
res7: Int = 1
A one liner
range.sliding(3).count{case f::s::t::Nil => f < 0 && s > 0 && t > 0}
This generates all sub-sequences of length 3 and counts how many are -ve, +ve, +ve
Generalising cycle length
def countCycles(n:Int, xs:List[Int]) = xs.sliding(n+1)
.count(ys => ys.head < 0 && ys.tail.forall(_ > 0))
The below code would help you resolve you query.
object CycleCheck {
def main(args: Array[String]) {
var data3 = List(1, 4, 82, -2, -12, "startingpoint", -9, 32, 76,45, -98, 76, "Endpoint", -24)
var data2 = data3.map(x => getInteger(x)).filter(_ != "unknown").map(_.toString.toInt)
println(data2)
var nCycle = findNCycle(data2)
println(nCycle)
}
def getInteger(obj: Any) = obj match {
case n: Int => obj
case _ => "unknown"
}
def findNCycle(obj: List[Int]) : Int = {
var cycleCount =0
var sign = ""
var signCheck="+"
var size = obj.size - 1
var numberOfCycles=0
var i=0
for( x <- obj){
if (x < 0){
sign="-"
}
else if (x > 0){
sign="+"
}
if(signCheck.equals(sign))
cycleCount=cycleCount+1
if(!signCheck.equals(sign) && cycleCount>1){
cycleCount = 1
numberOfCycles=numberOfCycles+1
}
if(size==i && cycleCount>1)
numberOfCycles= numberOfCycles+1
if(cycleCount==1)
signCheck = sign;
i=i+1
}
return numberOfCycles
}
}
I am trying to compute and plot the distribution of bigrams frequencies
First I did generate all possible bigrams which gives 1296 bigrams
then i extract the bigrams from a given file and save them in words1
my question is how to compute the frequency of these 1296 bigrams for the file a.txt?
if there are some bigrams did not appear at all in the file, then their frequencies should be zero
a.txt is any text file
clear
clc
%************create bigrams 1296 ***************************************
chars ='1234567890abcdefghijklmonpqrstuvwxyz';
chars1 ='1234567890abcdefghijklmonpqrstuvwxyz';
bigram='';
for i=1:36
for j=1:36
bigram = sprintf('%s%s%s',bigram,chars(i),chars1(j));
end
end
temp1 = regexp(bigram, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp1(1:end-1)', temp1(2:end)','un',0);
bigrams = temp2;
bigrams = unique(bigrams);
bigrams = rot90(bigrams);
bigram = char(bigrams(1:end));
all_bigrams_len = length(bigrams);
clear temp temp1 temp2 i j chars1 chars;
%****** 1. Cleaning Data ******************************
collection = fileread('e:\a.txt');
collection = regexprep(collection,'<.*?>','');
collection = lower(collection);
collection = regexprep(collection,'\W','');
collection = strtrim(regexprep(collection,'\s*',''));
%*******************************************************
temp = regexp(collection, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(#(x,y) [x '' y],temp(1:end-1)', temp(2:end)','un',0);
words1 = rot90(temp2);
%*******************************************************
words1_len = length(words1);
vocab1 = unique(words1);
vocab_len1 = length(vocab1);
[vocab1,void1,index1] = unique(words1);
frequencies1 = hist(index1,vocab_len1);
I. Character counting problem for a string
bsxfun based solution for counting characters -
counts = sum(bsxfun(#eq,[string1-0]',65:90))
Output -
counts =
2 0 0 0 0 2 0 1 0 0 ....
If you would like to get a tabulate output of counts against each letter -
out = [cellstr(['A':'Z']') num2cell(counts)']
Output -
out =
'A' [2]
'B' [0]
'C' [0]
'D' [0]
'E' [0]
'F' [2]
'G' [0]
'H' [1]
'I' [0]
....
Please note that this was a case-sensitive counting for upper-case letters.
For a lower-case letter counting, use this edit to this earlier code -
counts = sum(bsxfun(#eq,[string1-0]',97:122))
For a case insensitive counting, use this -
counts = sum(bsxfun(#eq,[upper(string1)-0]',65:90))
II. Bigram counting case
Let us suppose that you have all the possible bigrams saved in a 1D cell array bigrams1 and the incoming bigrams from the file are saved into another cell array words1. Let us also assume certain values in them for demonstration -
bigrams1 = {
'ar';
'de';
'c3';
'd1';
'ry';
't1';
'p1'}
words1 = {
'de';
'c3';
'd1';
'r9';
'yy';
'de';
'ry';
'de';
'dd';
'd1'}
Now, you can get the counts of the bigrams from words1 that are present in bigrams1 with this code -
[~,~,ind] = unique(vertcat(bigrams1,words1));
bigrams_lb = ind(1:numel(bigrams1)); %// label bigrams1
words1_lb = ind(numel(bigrams1)+1:end); %// label words1
counts = sum(bsxfun(#eq,bigrams_lb,words1_lb'),2)
out = [bigrams1 num2cell(counts)]
The output on code run is -
out =
'ar' [0]
'de' [3]
'c3' [1]
'd1' [2]
'ry' [1]
't1' [0]
'p1' [0]
The result shows that - First element ar from the list of all possible bigrams has no find in words1 ; second element de has three occurrences in words1 and so on.
Hey similar to Dennis solution you can just use histc()
string1 = 'ASHRAFF'
histc(string1,'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
this checks the number of entries in the bins defined by the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' which is hopefully the alphabet (just wrote it fast so no garantee). The result is:
Columns 1 through 21
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
Columns 22 through 26
0 0 0 0 0
Just a little modification of my solution:
string1 = 'ASHRAFF'
alphabet1='A':'Z'; %%// as stated by Oleg Komarov
data=histc(string1,alphabet1);
results=cell(2,26);
for k=1:26
results{1,k}= alphabet1(k);
results{2,k}= data(k);
end
If you look at results now you can easily check rather it works or not :D
This answer creates all bigrams, loads in the file does a little cleanup, ans then uses a combination of unique and histc to count the rows
Generate all Bigrams
note the order here is important as unique will sort the array so this way it is created presorted so the output matches expectation;
[y,x] = ndgrid(['0':'9','a':'z']);
allBigrams = [x(:),y(:)];
Read The File
this removes capitalisation and just pulls out any 0-9 or a-z character then creates a column vector of these
fileText = lower(fileread('d:\loremipsum.txt'));
cleanText = regexp(fileText,'([a-z0-9])','tokens');
cleanText = cell2mat(vertcat(cleanText{:}));
create bigrams from file by shifting by one and concatenating
fileBigrams = [cleanText(1:end-1),cleanText(2:end)];
Get Counts
the set of all bigrams is added to our set (so the values are created for all possible). Then a value ∈{1,2,...,1296} is assigned to each unique row using unique's 3rd output. Counts are then created with histc with the bins equal to the set of values from unique's output, 1 is subtracted from each bin to remove the complete set bigrams we added
[~,~,c] = unique([fileBigrams;allBigrams],'rows');
counts = histc(c,1:1296)-1;
Display
to view counts against text
[allBigrams, counts+'0']
or for something potentially more useful...
[sortedCounts,sortInd] = sort(counts,'descend');
[allBigrams(sortInd,:), sortedCounts+'0']
ans =
or9
at8
re8
in7
ol7
te7
do6 ...
Did not look into the entire code fragment, but from the example at the top of your question, I think you are looking to make a histogram:
string1 = 'ASHRAFF'
nr = histc(string1,'A':'Z')
Will give you:
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
(Got a working solution with hist, but as #The Minion shows histc is more easy to use here.)
Note that this solution only deals with upper case letters.
You may want to do something like so if you want to put lower case letters in their correct bin:
string1 = 'ASHRAFF'
nr = histc(upper(string1),'A':'Z')
Or if you want them to be shown separately:
string1 = 'ASHRaFf'
nr = histc(upper(string1),['a':'z' 'A':'Z'])
bi_freq1 = zeros(1,all_bigrams_len);
for k=1: vocab_len1
for i=1:all_bigrams_len
if char(vocab1(k)) == char(bigrams(i))
bi_freq1(i) = frequencies1(k);
end
end
end