Scala for loop yield

I'm new to Scala, so I'm experimenting with an example from Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition:
// Returns a row as a sequence
def makeRowSeq(row: Int) =
  for (col <- 1 to 10) yield {
    val prod = (row * col).toString
    val padding = " " * (4 - prod.length)
    padding + prod
  }
// Returns a row as a string
def makeRow(row: Int) = makeRowSeq(row).mkString
// Returns table as a string with one row per line
def multiTable() = {
  val tableSeq = // a sequence of row strings
    for (row <- 1 to 10)
    yield makeRow(row)
  tableSeq.mkString("\n")
}
When calling multiTable() the above code outputs:
   1   2   3   4   5   6   7   8   9  10
   2   4   6   8  10  12  14  16  18  20
   3   6   9  12  15  18  21  24  27  30
   4   8  12  16  20  24  28  32  36  40
   5  10  15  20  25  30  35  40  45  50
   6  12  18  24  30  36  42  48  54  60
   7  14  21  28  35  42  49  56  63  70
   8  16  24  32  40  48  56  64  72  80
   9  18  27  36  45  54  63  72  81  90
  10  20  30  40  50  60  70  80  90 100
This makes sense, but if I change the code in multiTable() to something like:
def multiTable() = {
  val tableSeq = // a sequence of row strings
    for (row <- 1 to 10)
    yield makeRow(row) {
      2
    }
  tableSeq.mkString("\n")
}
The 2 changes the output, but I'm not sure where it is being used to manipulate the output, and I can't find a similar example searching here or on Google. Any input would be appreciated!

makeRow(row) {2}
and
makeRow(row)(2)
and
makeRow(row).apply(2)
are all equivalent.
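makeRow(row) returns a single String representing one row (makeRowSeq(row) is the IndexedSeq[String] of padded cells, and mkString joins them). Applying (2) to a String returns the Char at index 2, so you are effectively picking the character at index 2 from each row. Every cell is right-aligned in a 4-character field, so that character is a space for rows 1 to 9 and '1' for row 10 ("  10  20 ..."). That is why you are seeing nine spaces and one 1 in your output.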
def multiTable() = {
  val tableSeq = // a sequence of row strings
    for (row <- 1 to 10)
    yield makeRow(row) {2}
  tableSeq.mkString("\n")
}
is equivalent to mapping over the rows and taking index 2 of each:
def multiTable() = {
  val tableSeq = // a sequence of row strings
    for (row <- 1 to 10)
    yield makeRow(row)
  tableSeq.map(_(2)).mkString("\n")
}
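To make the padding visible, here is a rough Python transcription of makeRow. The name make_row and the Python translation are illustrative assumptions, not the book's code. Indexing a Python string with [2] plays the role of Scala's makeRow(row)(2).

```python
def make_row(row: int) -> str:
    """Python stand-in for the Scala makeRow: ten products, each right-aligned in 4 characters."""
    cells = []
    for col in range(1, 11):
        prod = str(row * col)
        padding = " " * (4 - len(prod))
        cells.append(padding + prod)
    return "".join(cells)  # like mkString with no separator


# Equivalent of `yield makeRow(row) {2}`: the character at index 2 of every row string
picked = [make_row(row)[2] for row in range(1, 11)]
print("\n".join(picked))  # nine blank-looking lines, then "1"
```

Only row 10 has a two-digit first cell ("  10"), so it is the only row where index 2 lands on a digit.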

Replace values of one pyspark dataframe with another

I have a PySpark DataFrame df2:
ID  Total_Count  Final_A  Final_B  Final_C  Final_D
11  80           36       30       8        6
4   80           36       30       8        6
13  65           30       24       6        5
12  56           26       21       5        4
2   65           30       24       6        5
1   56           26       21       5        4
I have another DataFrame df1:
ID  Total_Count  A   B   C   D
4   80           0   0   3   0
11  80           0   0   0   0
13  65           0   0   0   0
12  56           0   4   0   0
2   65           0   0   0   0
1   56           0   0   0   0
10  34           10  10  10  4
I want to replace the values in df1 with those from df2 for each matching ID (the primary key).
Expected df1:
ID  Total_Count  A   B   C   D
11  80           36  30  8   6
4   80           36  30  8   6
13  65           30  24  6   5
12  56           26  21  5   4
2   65           30  24  6   5
1   56           26  21  5   4
10  34           10  10  10  4
from pyspark.sql.functions import when

df1 = spark.read.option("header", "True").option("inferSchema", "True").csv("df1.csv")
df2 = spark.read.option("header", "True").option("inferSchema", "True").csv("df2.csv")
df2 = df2.withColumnRenamed("ID", "df2_ID").withColumnRenamed("Total_Count", "df2_Total_Count")
final_df = df1.join(df2, (df1.ID == df2.df2_ID) & (df1.Total_Count == df2.df2_Total_Count), "left")
for i in ('A', 'B', 'C', 'D'):
    # where the df1 value is 0, take Final_<col> from df2; other values are kept
    final_df = final_df.withColumn(i, when(final_df[i] == 0, final_df["Final_{}".format(i)]).otherwise(final_df[i]))
cols = df2.columns
final_df = final_df.drop(*cols)
from pyspark.sql.functions import coalesce

df = df1.join(df2.select('ID', 'Final_A', 'Final_B', 'Final_C', 'Final_D'), ['ID'], 'left')
df = df.withColumn('A', coalesce(df['Final_A'], df['A'])).\
    withColumn('B', coalesce(df['Final_B'], df['B'])).\
    withColumn('C', coalesce(df['Final_C'], df['C'])).\
    withColumn('D', coalesce(df['Final_D'], df['D']))
df1 = df.select('ID', 'Total_Count', 'A', 'B', 'C', 'D')
df1.show()
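The coalesce answer amounts to a left join on ID in which df2's Final_ value wins whenever a match exists. Here is a plain-Python sketch of that rule applied to the sample rows. Lists of dicts stand in for the DataFrames and the coalesce helper is a hypothetical re-implementation, so this is not PySpark code.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL/Spark coalesce."""
    for value in values:
        if value is not None:
            return value
    return None


# Sample data from the question (hypothetical stand-ins for df1 and df2)
df1_rows = [(4, 80, 0, 0, 3, 0), (11, 80, 0, 0, 0, 0), (13, 65, 0, 0, 0, 0),
            (12, 56, 0, 4, 0, 0), (2, 65, 0, 0, 0, 0), (1, 56, 0, 0, 0, 0),
            (10, 34, 10, 10, 10, 4)]
df2_rows = [(11, 80, 36, 30, 8, 6), (4, 80, 36, 30, 8, 6), (13, 65, 30, 24, 6, 5),
            (12, 56, 26, 21, 5, 4), (2, 65, 30, 24, 6, 5), (1, 56, 26, 21, 5, 4)]
df1 = [dict(zip(("ID", "Total_Count", "A", "B", "C", "D"), r)) for r in df1_rows]
df2 = [dict(zip(("ID", "Total_Count", "Final_A", "Final_B", "Final_C", "Final_D"), r))
       for r in df2_rows]

# Left join on ID: an ID missing from df2 yields an empty match, i.e. all Final_ values are None
df2_by_id = {r["ID"]: r for r in df2}
result = []
for row in df1:
    match = df2_by_id.get(row["ID"], {})
    new_row = dict(row)
    for col in ("A", "B", "C", "D"):
        new_row[col] = coalesce(match.get("Final_" + col), row[col])
    result.append(new_row)

for r in result:
    print(r)
```

ID 10 has no partner in df2, so its original values survive. Every other row takes df2's values, including the non-zero C=3 of ID 4.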

How do I delete rows that contain the same combination of numbers from a matrix and keep only one of them?

for a=1:50; %numbers 1 through 50
    for b=1:50;
        c=sqrt(a^2+b^2);
        if c<=50&c(rem(c,1)==0); %display only if c<=50 and c has a remainder of 0 when divided by 1
            pyth=[a,b,c]; %pythagorean matrix
            disp(pyth)
        else c(rem(c,1)~=0); %if remainder doesn't equal 0, omit output
        end
    end
end
answer=
3 4 5
4 3 5
5 12 13
6 8 10
7 24 25
8 6 10
8 15 17
9 12 15
9 40 41
10 24 26
12 5 13
12 9 15
12 16 20
12 35 37
14 48 50
15 8 17
15 20 25
15 36 39
16 12 20
16 30 34
18 24 30
20 15 25
20 21 29
21 20 29
21 28 35
24 7 25
24 10 26
24 18 30
24 32 40
27 36 45
28 21 35
30 16 34
30 40 50
32 24 40
35 12 37
36 15 39
36 27 45
40 9 41
40 30 50
48 14 50
This problem involves the Pythagorean theorem, but we cannot use the built-in function, so I had to write one myself. The problem is that, for example, columns 1 and 2 of the first two rows contain the same numbers. How do I code it so that it deletes one of the rows when columns 1 and 2 contain the same combination of numbers? I've tried the unique function, but it doesn't remove the combinations. I have read previous posts about deleting duplicates, but they confused me even more. Any help with this problem would be greatly appreciated!
Thank you
Welcome to StackOverflow.
The problem in your code seems to be that pyth only contains 3 values, [a, b, c]. The unique() function used in the next line has no effect in that case, because pyth contains only one row. Another issue is that the values idx and out are calculated in every loop iteration; this should be done after the loops. Example code could look like this:
pyth = zeros(0,3);
for a=1:50
    for b=1:50
        c = sqrt(a^2 + b^2);
        if c<=50 && rem(c,1)==0
            abc_sorted = sort([a,b,c]); % sort so that [4 3 5] and [3 4 5] become identical rows
            pyth = [pyth; abc_sorted];
        end
    end
end
% remove duplicate rows once, outside of the loop
[~,idx] = unique(pyth, 'rows', 'stable');
out = pyth(idx,:);
disp(out)
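Here is the same sort-then-deduplicate idea as a Python sketch, not MATLAB. It assumes Python 3.8+ for math.isqrt, which replaces the floating-point sqrt/rem test with an exact integer check. dict.fromkeys keeps the first occurrence of each row in order, which is roughly what unique(..., 'rows', 'stable') does.

```python
import math

pyth = []
for a in range(1, 51):
    for b in range(1, 51):
        s = a * a + b * b
        c = math.isqrt(s)           # integer square root, rounded down
        if c <= 50 and c * c == s:  # c is a whole number no larger than 50
            pyth.append(tuple(sorted((a, b, c))))  # (4, 3, 5) becomes (3, 4, 5)

# Keep the first occurrence of each row, preserving order (like 'rows', 'stable')
out = list(dict.fromkeys(pyth))
for row in out:
    print(*row)
```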
A few other tips for writing MATLAB code:
You do not need to end for or if/else statements with a semicolon.
else statements cover every case not handled before, so they do not take a condition.
Some performance recommendations:
Due to the symmetry of a and b (a^2 + b^2 = b^2 + a^2), the b loop could be constrained to for b=1:a, which would save you roughly half of the loop cycles.
If you use && to combine scalar conditions, the second part is not evaluated when the first part already fails (source).
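The symmetry tip can be sketched in Python (again a transcription, not MATLAB). Limiting the inner loop to b <= a visits each unordered pair once, so no duplicate rows are produced and no deduplication step is needed. Python's `and` short-circuits the same way MATLAB's && does.

```python
import math

triples = []
for a in range(1, 51):
    for b in range(1, a + 1):  # b <= a: each unordered pair {a, b} is visited once
        s = a * a + b * b
        c = math.isqrt(s)
        if c <= 50 and c * c == s:  # the second test only runs when c <= 50
            triples.append((b, a, c))  # b <= a, so the legs are already in sorted order

print(len(triples), "triples")
```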
Regards,
Chris
You can also vectorize your algorithm (it is still brute force):
[X,Y] = meshgrid(1:50,1:50); %generate all the combination
C = (X(:).^2+Y(:).^2).^0.5; %sums of two square for every combination
ind = find(rem(C,1)==0 & C<=50); %get the index
res = unique([sort([X(ind),Y(ind)],2),C(ind)],'rows'); %check for uniqueness
You could optimize your algorithm much further using math; you should read this question. It will be useful if n >> 50.

How to remove zero columns from array

I have an array which looks similar to:
0 2 3 4 0 0 7 8 0 10
0 32 44 47 0 0 37 54 0 36
I wish to remove all columns that are [0; 0] from this to get:
2 3 4 7 8 10
32 44 47 37 54 36
I've tried x(x == 0) = []
but I get:
x =
2 32 3 44 4 47 7 37 8 54 10 36
How can I remove all zero columns?
Here is a possible solution:
x(:,all(x==0))=[]
You had the right approach with x(x == 0) = [];. Because every zero here sits in an all-zero column, this removes exactly the right number of elements for the remainder to still form a 2-row matrix; the result is simply a vector of the non-zero values in column-major order. All you have to do is reshape it back to its original form with 2 rows:
x(x == 0) = [];
y = reshape(x, 2, [])
y =
2 3 4 7 8 10
32 44 47 37 54 36
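Here is a small Python sketch, not MATLAB, of why the reshape trick works. MATLAB's x(x == 0) = [] walks the matrix column by column, and reshape refills it column by column. The list-of-rows layout is an assumption made for illustration.

```python
x = [[0, 2, 3, 4, 0, 0, 7, 8, 0, 10],
     [0, 32, 44, 47, 0, 0, 37, 54, 0, 36]]
rows, cols = len(x), len(x[0])

# Column-major flattening, like MATLAB's x(:)
flat = [x[i][j] for j in range(cols) for i in range(rows)]
nonzero = [v for v in flat if v != 0]  # what x(x == 0) = [] leaves behind

# reshape(x, 2, []) refills column by column; this only lines up because
# every zero sat in an all-zero column
n_cols = len(nonzero) // rows
y = [[nonzero[j * rows + i] for j in range(n_cols)] for i in range(rows)]
print(y)
```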
Another way is with any:
y = x(:,any(x,1));
In this case, we look for columns that contain at least one non-zero element and use these locations to index into x, extracting those columns.
Result:
y =
2 3 4 7 8 10
32 44 47 37 54 36
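The any(x,1) logic looks like this in a Python sketch. A column is kept when at least one of its entries is non-zero. The list-of-rows representation is an assumption; the indices below are 0-based.

```python
x = [[0, 2, 3, 4, 0, 0, 7, 8, 0, 10],
     [0, 32, 44, 47, 0, 0, 37, 54, 0, 36]]

# Column j survives if any entry in it is non-zero (MATLAB: any(x, 1))
keep = [j for j in range(len(x[0])) if any(row[j] != 0 for row in x)]
y = [[row[j] for j in keep] for row in x]
print(y)
```

Unlike the reshape approach, this does not depend on zeros appearing only in all-zero columns.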
Another way which is more for academic purposes is to use unique. Assuming that your matrix has all positive values:
[~,~,id] = unique(x.', 'rows');
y = x(:, id ~= 1)
y =
2 3 4 7 8 10
32 44 47 37 54 36
We transpose x so that each column becomes a row, and we look for all unique rows. The matrix needs to have all positive values because the third output of unique assigns an ID to each unique row in sorted order. Therefore, if all values are positive, a row of all zeros is assigned an ID of 1. Using this array, we search for IDs that are not 1 and use those to index into x to extract the necessary columns.
You could also use sum.
Sum over each column; any column that contains only zeros will sum to zero (this assumes the entries are non-negative, so that a non-zero column cannot sum to zero).
sum(x,1)
ans =
0 34 47 51 0 0 44 62 0 46
x(:,sum(x,1)>0)
ans =
2 3 4 7 8 10
32 44 47 37 54 36
Also by reshaping nonzeros(x) as follows:
reshape(nonzeros(x), size(x,1), [])
ans =
2 3 4 7 8 10
32 44 47 37 54 36

Matlab: how can I implement this algorithm for manipulating matrices?

(In my actual problem, A is a 4x500000 matrix and the values of A(4,k) vary between 1 and 200.)
Here is an example where A is 4x16 and A(4,k) varies between 1 and 10.
First, I want to assign a name to each value from 1 to 5 (= 10/2):
1 = XXY;
2 = ABC;
3 = EFG;
4 = TXG;
5 = ZPF;
My goal is to find, for a vector X, a matrix M from the matrix A:
A = [20 52 70 20 52 20 52 20 20 10 52 20 11 1 52 20
32 24 91 44 60 32 24 32 32 12 11 32 2 5 24 32
40 37 24 30 11 40 37 40 40 5 10 40 40 3 37 40
2 4 1 3 4 5 2 1 3 3 8 6 7 9 6 10]
A(4,k) takes all values between 1 and 10. These values can repeat, and they all appear in the 4th row.
X = [20; 32; 40] = A(1:3,1) = A(1:3,6) = A(1:3,8) = A(1:3,9) = A(1:3,12) = A(1:3,16)
A(4,1) = 2;
A(4,6) = 5;
A(4,8) = 1;
A(4,9) = 3;
A(4,12) = 6;
A(4,16) = 10;
For each A(4,k) whose column corresponds to X, I associate 2 if A(4,k) <= 5 and 1 if A(4,k) > 5. For the remaining values of A(4,k), which do not correspond to X, I associate 0:
ZX = [ 1 2 3 4 5      %% values of the 4th row of A between 1 and 5
       2 2 2 0 2
       6 7 8 9 10     %% values of the 4th row of A between 6 and 10
       1 0 0 0 1
       2 2 2 0 2 ]    %% = max(ZX(2,k),ZX(4,k))
The ultimate goal is to find the matrix M:
M = [ 1 2 3 4 5
XXY ABC EFG TXG ZPF
2 2 2 0 2 ] %% M(3,:)=ZX(5,:)
Code -
%// Assuming A, X and names to be given to the solution
A = [20 52 70 20 52 20 52 20 20 10 52 20 11 1 52 20
32 24 91 44 60 32 24 32 32 12 11 32 2 5 24 32
40 37 24 30 11 40 37 40 40 5 10 40 40 3 37 40
2 4 1 3 4 5 2 1 3 3 8 6 7 9 6 10];
X = [20 ; 32 ; 40];
names = {'XXY','ABC','EFG','TXG','ZPF'};
limit = 10; %// The maximum limit of A(4,:). Edit this to 200 for your actual case
%// Find matching 4th row elements
matches = A(4,ismember(A(1:3,:)',X','rows'));
%// Matches are compared against all possible numbers between 1 and limit
matches_pos = ismember(1:limit,matches);
%// Finally get the line 3 results of M
vals = max(2*matches_pos(1:limit/2),matches_pos( (limit/2)+1:end ));
Output -
vals =
2 2 2 0 2
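Here is a Python sketch of the same matching logic as the ismember answer above, not MATLAB. It assumes limit is even, as in the question (10 here, 200 in the real case). A set gives constant-time membership tests, which should stay cheap for the 4x500000 case.

```python
A = [[20, 52, 70, 20, 52, 20, 52, 20, 20, 10, 52, 20, 11, 1, 52, 20],
     [32, 24, 91, 44, 60, 32, 24, 32, 32, 12, 11, 32, 2, 5, 24, 32],
     [40, 37, 24, 30, 11, 40, 37, 40, 40, 5, 10, 40, 40, 3, 37, 40],
     [2, 4, 1, 3, 4, 5, 2, 1, 3, 3, 8, 6, 7, 9, 6, 10]]
X = [20, 32, 40]
names = ["XXY", "ABC", "EFG", "TXG", "ZPF"]
limit = 10  # maximum value in the 4th row (200 in the real problem)
half = limit // 2

# 4th-row values of every column whose first three entries equal X
matches = {A[3][k] for k in range(len(A[0])) if [A[0][k], A[1][k], A[2][k]] == X}

# For each v in 1..half: 2 if v matched, 1 if v + half matched, else 0; keep the larger
vals = [max(2 if v in matches else 0, 1 if v + half in matches else 0)
        for v in range(1, half + 1)]

M = list(zip(range(1, half + 1), names, vals))
for idx, name, val in M:
    print(idx, name, val)
```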
For a better way to present the results, you can use a struct -
M_struct = cell2struct(num2cell(vals),names,2)
Output -
M_struct =
XXY: 2
ABC: 2
EFG: 2
TXG: 0
ZPF: 2
For writing the results to a text file -
output_file = 'results.txt'; %// Edit if needed to be saved to a different path
fid = fopen(output_file, 'w+');
for ii=1:numel(names)
fprintf(fid, '%d %s %d\n',ii, names{ii},vals(ii));
end
fclose(fid);
Text contents of the text file would be -
1 XXY 2
2 ABC 2
3 EFG 2
4 TXG 0
5 ZPF 2
A bsxfun() based approach.
Suppose your inputs are (where N can be set to 200):
A = [20 52 70 20 52 20 52 20 20 10 52 20 11 1 52 20
32 24 91 44 60 32 24 32 32 12 11 32 2 5 24 32
40 37 24 30 11 40 37 40 40 5 10 40 40 3 37 40
2 4 1 3 4 5 2 1 3 3 8 6 7 9 6 10]
X = [20; 32; 40]
N = 10;
% Match first 3 rows and return 4th
idxA = all(bsxfun(@eq, X, A(1:3,:)));
Amatch = A(4,idxA);
% Match [1:5; 6:10] to 4th row
idxZX = ismember([1:N/2; N/2+1:N], Amatch)
idxZX =
1 1 1 0 1
1 0 0 0 1
% Return M3
M3 = max(bsxfun(@times, idxZX, [2;1]))
M3 =
2 2 2 0 2

Matrix division & permutation to achieve Baker map

I'm trying to implement the Baker map.
Is there a function that would allow one to divide an 8 x 8 matrix by providing, for example, a sequence of divisors 2, 4, 2, and to rearrange the pixels in the order shown in the matrices below?
X = reshape(1:64,8,8);
After applying divisors 2,4,2 to the matrix X one should get a matrix like A shown below.
A=[31 23 15 7 32 24 16 8;
63 55 47 39 64 56 48 40;
11 3 12 4 13 5 14 6;
27 19 28 20 29 21 30 22;
43 35 44 36 45 37 46 38;
59 51 60 52 61 53 62 54;
25 17 9 1 26 18 10 2;
57 49 41 33 58 50 42 34]
The link to the document which I am working on is:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.5132&rep=rep1&type=pdf
This is what I want to achieve:
Edit: a little more generic solution:
%function Z = bakermap(X,divisors)
function Z = bakermap()
X = reshape(1:64,8,8)'
divisors = [ 2 4 2 ];
[x,y] = size(X);
offsets = sum(divisors)-fliplr(cumsum(fliplr(divisors)));
if any(mod(y,divisors)) || ~(sum(divisors) == y)
    disp('invalid divisor vector')
    return
end
blocks = @(div) cell2mat( cellfun(@mtimes, repmat({ones(x/div,div)},div,1),...
                          num2cell(1:div)',...
                          'UniformOutput',false) );
%create index matrix
I = [];
for ii = 1:numel(divisors);
    I = [I, blocks(divisors(ii))+offsets(ii)];
end
%create Baker map
Y = flipud(X);
Z = [];
for jj=1:I(end)
    Z = [Z; Y(I==jj)'];
end
Z = flipud(Z);
end
returns:
index matrix:
I =
1 1 3 3 3 3 7 7
1 1 3 3 3 3 7 7
1 1 4 4 4 4 7 7
1 1 4 4 4 4 7 7
2 2 5 5 5 5 8 8
2 2 5 5 5 5 8 8
2 2 6 6 6 6 8 8
2 2 6 6 6 6 8 8
Baker map:
Z =
31 23 15 7 32 24 16 8
63 55 47 39 64 56 48 40
11 3 12 4 13 5 14 6
27 19 28 20 29 21 30 22
43 35 44 36 45 37 46 38
59 51 60 52 61 53 62 54
25 17 9 1 26 18 10 2
57 49 41 33 58 50 42 34
But have a look at the if-condition: it only checks for these cases, and I don't know whether that is enough. I also tried divisors = [ 1 4 1 2 ], and it worked. As long as the divisors sum to the row length and each divisor divides it evenly, there shouldn't be problems.
Explanation:
% definition of anonymous function with input parameter div: a single divisor
blocks = @(div) cell2mat( ...          % converts the final result into a matrix
    cellfun(@mtimes, ...               % multiplies the next two inputs A,B
        repmat( ...                    % A...
            {ones(x/div,div)}, ...     % cell with a matrix of ones the size of one subblock, e.g. [1,1,1,1;1,1,1,1]
            div,1), ...                % which is replicated div times, according to the divisor currently processed
        num2cell(1:div)', ...          % B: cell vector {1;2;3;...;div}, so every block A gets an increasing factor
        'UniformOutput',false ...      % necessary because each cellfun result is a matrix
    ));
Also have a look at this revision for a simpler view of what is happening. You requested a generic solution, which is the one above; the linked revision requires more manual input.
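The answer's two steps can be followed in a Python sketch, not MATLAB: build the index matrix I of region labels, then read each region of the flipped matrix column by column into one row. This is a transcription under assumptions. It uses 0-based lists, and its validity check (divisors sum to the number of columns and each divides the number of rows) is a swapped-in reading of the answer's if-condition. It reproduces the 8 x 8 example only.

```python
def bakermap(X, divisors):
    """Sketch of the MATLAB answer's Baker-map permutation for a square matrix X."""
    n_rows, n_cols = len(X), len(X[0])
    # divisors must fill the width exactly, and each block height n_rows // d must be whole
    if sum(divisors) != n_cols or any(n_rows % d for d in divisors):
        raise ValueError("invalid divisor vector")

    # Index matrix I: 1-based region labels, like blocks(divisors(ii)) + offsets(ii)
    I = [[0] * n_cols for _ in range(n_rows)]
    first_col = 0  # left edge of the current vertical strip
    offset = 0     # labels already used by strips to the left
    for d in divisors:
        height = n_rows // d
        for k in range(d):  # k-th block from the top inside this strip
            for r in range(k * height, (k + 1) * height):
                for c in range(first_col, first_col + d):
                    I[r][c] = offset + k + 1
        first_col += d
        offset += d

    Y = X[::-1]  # flipud(X)
    Z = []
    for label in range(1, offset + 1):
        # Y(I == label) scans column by column; the values become one row of Z
        Z.append([Y[r][c] for c in range(n_cols) for r in range(n_rows) if I[r][c] == label])
    return Z[::-1], I  # flipud(Z)


X = [[8 * i + j + 1 for j in range(8)] for i in range(8)]  # reshape(1:64,8,8)'
Z, I = bakermap(X, [2, 4, 2])
for row in Z:
    print(*row)
```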