What is wrong with this Scala loop when reading files? - scala

I am using Scala to read data from 2 CSV files. For each line of the first file, I want to scan all lines of the second CSV file to do some calculations.
This is my code:
object CSVProcess extends App {
  val dataMatlab = io.Source.fromFile("./data/data_matlab1.csv")
  val matchDataMatlab = io.Source.fromFile("./data/match_data_matlab1.csv")
  for ((line, count) <- dataMatlab.getLines.zipWithIndex) {
    for ((line1, count1) <- matchDataMatlab.getLines.zipWithIndex) {
      println(s"count count1 ${count} ${count1}")
    }
  }
  dataMatlab.close
  matchDataMatlab.close
}
However, the output is not what I expect: the loop stops after the first line of the first CSV file has been scanned against all lines of the second one.
For example, CSV 1 has 3 lines:
1,1
2,2
3,3
CSV 2 also has 3 lines:
1,1,1
2,2,2
3,3,3
But the output is
count count1 0 0
count count1 0 1
count count1 0 2
The output should be
count count1 0 0
count count1 0 1
count count1 0 2
count count1 1 0
count count1 1 1
count count1 1 2
count count1 2 0
count count1 2 1
count count1 2 2
Could someone spot the problem in my code?

The problem is that io.Source.fromFile("path").getLines gives you an iterator, and iterators are like socket buffers: once you read data out of one, no data is left in it. The inner getLines is therefore exhausted after the first pass of the outer loop, and every later pass finds it empty.
The official Scala documentation explains:
An iterator is not a collection, but rather a way to access the elements of a collection one by one. The two basic operations on an iterator it are next and hasNext. A call to it.next() will return the next element of the iterator and advance the state of the iterator. Calling next again on the same iterator will then yield the element one beyond the one returned previously...
The solution is to convert the iterators into a collection that can be traversed more than once. Here I have converted them to List so that the lines persist.
val dataMatlab = io.Source.fromFile("./data/data_matlab1.csv").getLines().toList
val matchDataMatlab = io.Source.fromFile("./data/match_data_matlab1.csv").getLines().toList
for ((line, count) <- dataMatlab.zipWithIndex) {
  for ((line1, count1) <- matchDataMatlab.zipWithIndex) {
    println(s"count count1 ${count} ${count1}")
  }
}
Now you should get the expected output.
I hope the explanation is clear and helpful.
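The one-pass behaviour is not specific to Scala. As a rough illustration in Python, with the question's sample lines held in memory rather than read from disk (the helper name pair_indices is made up for this sketch), an exhausted iterator yields nothing on later passes of the outer loop, while a list can be traversed again:

```python
def pair_indices(outer, inner):
    """Collect (outer_index, inner_index) pairs from a nested loop."""
    pairs = []
    for i, _ in enumerate(outer):
        for j, _ in enumerate(inner):
            pairs.append((i, j))
    return pairs


# Sample lines from the question, kept in memory for the sketch.
csv1 = ["1,1", "2,2", "3,3"]
csv2 = ["1,1,1", "2,2,2", "3,3,3"]

# iter(csv2) is consumed during the first outer pass, much like getLines.
one_pass = pair_indices(csv1, iter(csv2))

# A list can be walked again for every outer line.
materialized = pair_indices(csv1, list(csv2))

print(len(one_pass))      # 3
print(len(materialized))  # 9
```

The trade-off is memory: materializing the inner file keeps all of its lines in memory at once, which is usually fine for small CSV files.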

Related

SML Uncaught exception Empty homework1

Question: Write a function number_before_reaching_sum that takes an int called sum, which you can assume
is positive, and an int list, which you can assume contains all positive numbers, and returns an int.
You should return an int n such that the first n elements of the list add to less than sum, but the first
n + 1 elements of the list add to sum or more. Assume the entire list sums to more than the passed in
value; it is okay for an exception to occur if this is not the case.
I am quite new to SML and couldn't find anything wrong with this simple expression. The error message is "uncaught exception Empty". Please help me debug the code below:
fun number_before_reaching_sum (sum:int, xl: int list) =
  if hd xl = sum
  then 0
  else
    (hd xl) + number_before_reaching_sum(sum, (tl xl))
Try a couple of steps of your solution on a short list:
number_before_reaching_sum (6, [2,3,4])
--> if 2 = 6
then 0
else 2 + number_before_reaching_sum(6, [3,4])
--> 2 + if 3 = 6
then 0
else 3 + number_before_reaching_sum(6, [4])
--> ...
and you see pretty clearly that this is wrong - the elements of the list should not be added up, and you can't keep looking for the same sum in every tail.
You should return an int n such that the first n elements of the list add to less than sum, but the first n + 1 elements of the list add to sum or more.
This means that the result is 0 if the head is greater than or equal to the sum,
if hd xl >= sum
then 0
Otherwise, the index is one more, not hd xl more, than the index in the tail.
Also the "tail sum" you're looking for isn't the original sum, but the sum without hd xl.
else 1 + number_before_reaching_sum(sum - hd xl, tl xl)
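For comparison, the same recursion can be sketched in Python. This is only an illustration of the logic described above, not the SML solution itself; the parameter names target and xs are chosen for this sketch.

```python
def number_before_reaching_sum(target, xs):
    """Return n such that the first n items sum to less than target,
    but the first n + 1 items sum to target or more."""
    head, tail = xs[0], xs[1:]  # raises IndexError if the list runs out
    if head >= target:
        return 0
    # One more index, looking for the remaining sum in the tail.
    return 1 + number_before_reaching_sum(target - head, tail)


print(number_before_reaching_sum(6, [2, 3, 4]))  # 2
```

With the list from the trace above, 2 + 3 = 5 is less than 6 and 2 + 3 + 4 = 9 is not, so the result is 2.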

count number of elements with a specific value in a field of a structure in Matlab

I have a structure myS with several fields, including myField, which in turn includes several other fields such as BB. I need to count how many times 'R_value' appears in BB.
I have tried:
sum(myS.myField.BB = 'R_value')
and this:
count = 0;
for i = 1:numel(myS.myField)
number_of_element = numel(myS.myField(i).BB)=='R_value'
count = count+number_of_element;
end
but it doesn't work. Any suggestion?
If you are just checking if BB is that literal string, then your loop is just:
count = 0;
for i = 1:numel(myS.myField)
count = count+strcmp(myS.myField(i).BB,'R_value')
end
numel counts how many elements there are. Zero is an element, and so is false. Instead, just sum the logical array:
count = 0;
for i = 1:numel(myS.myField)
number_of_element = sum(myS.myField(i).BB==R_value)
count = count+number_of_element;
end
Also note that you had the parentheses wrong, so you were counting how many elements BB had in total and then comparing that number to R_value. I am assuming R_value is a number.
e.g.:
myS.myField(1).BB=[1 2 3 4 1 1 1]
myS.myField(2).BB=[4 5 65 1]
R_value=1
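As a cross-check of the counting logic, here is a small Python sketch using the example values above. Modelling the struct array as a list of dictionaries is an assumption made for this illustration.

```python
# Each dictionary stands in for one element of myS.myField.
records = [
    {"BB": [1, 2, 3, 4, 1, 1, 1]},
    {"BB": [4, 5, 65, 1]},
]
r_value = 1

# True counts as 1 and False as 0, so summing the comparisons counts matches.
count = sum(value == r_value for record in records for value in record["BB"])

print(count)  # 5
```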

Intersecting two columns with different lengths

I have a dataset1 containing 5000 user_ids from Twitter. I want to intersect the user_ids from this dataset with another dataset2 containing other user_ids from Twitter, and at the same time create a new column in dataset1 where each user_id gets the score '1' (if it is in the intersection) or '0' (if it is not). I tried the code below, but the new column 'intersect' just contains some (seemingly random) zeros and then a lot of NAs.
for(i in 1:ncol(data1)){
#intersect with other data
ids_intersect = intersect(data1$user_id, data2$user_id)
if(length(ids_intersect == 0)){
data1[i, "intersect"] <- 0 # no intersect
} else {
data1[i, "intersect"] <- 1 # intersect
}
}
I also tried another piece of code, which I find more intuitive, but it won't work because the two datasets have different numbers of rows ("replacement has 3172 rows, data has 5181"). As above, the intention is to get the score 1 if there is a match or 0/NA if there is none in the new column 'intersect'. However, I'm not sure how to implement that in the following code:
data$intersect <- intersect(data1$user_id, data2$user_id)
Any way of assigning either 1 or 0 to the user_ids in a new column depending on whether there is an intersect/match?
A convenient option is to use mutate() from the dplyr package together with the base R %in% operator, as follows.
Data
data1 <- data.frame(user_id = c("Test1",
"Test2",
"Test4",
"Test5"))
data2 <- data.frame(user_id = c("Test1",
"Test3",
"Test4"))
Code
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
data1 %<>%
  mutate(Existence = ifelse(user_id %in% data2$user_id, 1, 0))
Output
> data1
user_id Existence
1 Test1 1
2 Test2 0
3 Test4 1
4 Test5 0
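The same membership test can be sketched in Python with the sample IDs above. Plain lists stand in for the data frame columns here, which is a simplification for illustration.

```python
data1_ids = ["Test1", "Test2", "Test4", "Test5"]
data2_ids = ["Test1", "Test3", "Test4"]

# A set gives fast membership checks, similar in spirit to %in%.
known_ids = set(data2_ids)

# One flag per row of data1, so the lengths always match.
existence = [1 if user_id in known_ids else 0 for user_id in data1_ids]

print(existence)  # [1, 0, 1, 0]
```

Because the flag is computed once per row of the first dataset, the differing lengths of the two datasets do not matter.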

Reference to non-existent field 'd'

My mat file contains 40,000 rows and two columns. I have to read it line by line
and get the value of the last column in each row.
Here is my code:
for v = 1:40000
firstRowB = data.d(v,:)
if(firstRowB(1,2)==1)
count1=count1+1;
end
if(firstRowB(1,2)==2)
count2=count2+1;
end
end
firstRowB gets the row; the code then checks whether the last column equals 1 or 2 and increases the respective count by 1.
But I keep getting this error:
Reference to non-existent field 'd'.
You could use vectorization (it is always convenient, especially in Matlab). Taking advantage of the fact that true is one and false is zero, if you just want to count you can do:
count1 = sum ( data.d(:, 2) == 1 ) ;
count2 = sum (data.d(:,2) == 2 ) ;
In fact, in general you could define:
getNumberOfElementsInLastColEqualTo = @(numb) sum(data.d(:,end) == numb);
counts = arrayfun(getNumberOfElementsInLastColEqualTo, [1 2]);
Hope this helps.
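The counting idea can also be sketched in Python. The rows below are made-up sample data and count_last_column is a hypothetical helper name; neither comes from the question's mat file.

```python
# Made-up two-column rows standing in for data.d.
rows = [[10, 1], [11, 2], [12, 1], [13, 3], [14, 2], [15, 1]]


def count_last_column(rows, value):
    """Count the rows whose last column equals value."""
    # Each comparison is True (1) or False (0), so the sum is a count.
    return sum(row[-1] == value for row in rows)


counts = [count_last_column(rows, v) for v in (1, 2)]

print(counts)  # [3, 2]
```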

Using counter to count one index in a string for Python

I imported Counter and saw that it counts all the letters in the string, but I want it to count only one letter and ignore the others. Is that possible?
run = 'Mississippi'
count = 0
for letter_s in run:
    if letter_s == 's':
        count = count + 1
print count
What you're doing should be totally possible. You can iterate over most things in Python, and the way you're doing it should work. I reformatted yours a bit and it works just fine.
run = 'Mississippi'
count = 0
for letter_s in run:
    if letter_s == 's':
        count += 1
print(count)
The output is 4.
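Since the question mentions Counter, here is a short sketch of two standard-library alternatives to the explicit loop. Both give the count for a single letter only.

```python
from collections import Counter

run = "Mississippi"

# Counter tallies every letter; indexing picks out just the one you want.
letter_counts = Counter(run)
print(letter_counts["s"])  # 4

# str.count does the same for a single character without building a tally.
print(run.count("s"))  # 4
```

Counter is more useful when several letters are needed later; for a single letter, str.count is the shorter option.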