Create features (long vector) with scala

I have a big CSV file (~2 GB) that contains a parameter X with around 1000 records for each day.
What I want to do is transform this column into a set of feature vectors of length 1000 (one per day).
For example:
==> Day 1 Day P1
1 1
1 2
1 5
1 9
1 .
1 .
1 .
1 6
==> Day 2 1 4
2 1
2 2
2 5
2 7
2 .
2 .
2 .
2 8
Will be transformed to:
d1 1 2 5 9 . . . 6
d2 4 1 2 5 . . . 8
.
.
.
dn
How can I do that in Scala?
I know memory will be an issue, so I'll try to store the result in multiple steps.
Here is what I've tried so far:
df_data.map(x => (x(1),x(3))).filter(x=> x._1== 1).zipWithIndex.map(x=> (x._1._1,(x._2,x._1._2))).groupByKey()
Now I get something like:
(1, (0,val1),(1,val2),(2,val3),...,(n,valn))
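A minimal sketch of the per-day grouping on plain Scala collections (not Spark), assuming each CSV row has already been parsed into a (day, value) pair and that values must keep their input order within a day. Like the attempt above, it tags records with zipWithIndex, then groups by day and sorts each group by that index. The names DayFeatures, toDayVectors and toLine are hypothetical.

```scala
// Sketch on plain Scala collections (not Spark). Assumes each record is a
// (day, value) pair already parsed from the CSV; all names are hypothetical.
object DayFeatures {

  // Groups values by day, keeps their input order within each day,
  // and returns one (day, Vector[value]) pair per day, ordered by day.
  def toDayVectors(records: Seq[(Int, Double)]): Seq[(Int, Vector[Double])] =
    records.zipWithIndex                               // remember each record's input position
      .groupBy { case ((day, _), _) => day }           // day -> all records of that day
      .toSeq
      .sortBy { case (day, _) => day }                 // one output row per day, in day order
      .map { case (day, rows) =>
        val values = rows
          .sortBy { case (_, position) => position }   // restore input order within the day
          .map { case ((_, value), _) => value }
        (day, values.toVector)
      }

  // Renders one feature vector in the "d1 v1 v2 ..." layout from the question.
  def toLine(day: Int, values: Seq[Double]): String =
    (s"d$day" +: values.map(_.toString)).mkString(" ")
}
```

On a Spark RDD, the analogous steps would be keying records as (day, (position, value)), calling groupByKey, and sorting each group by position; with around 1000 values per day, each group should be small enough to sort in memory.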

Related

How can I use printf or disp in MATLAB to print some special format of my data set?

I have a data set with 5 columns and 668 rows. I need to use this data in AMPL, which requires the following special format:
1 3 4 5 7
5 4 3 2 1
4 5 6 4 3
4 5 3 4 2
[*,*,1]: 1 2 3 4:=
4 3 2 1 5
4 5 6 7 4
3 4 5 6 7
3 4 2 3 1
[*,*,2]: 1 2 3 4:=
4 5 6 2
4 3 2 1
4 5 3 2
1 2 7 1
[*,*,3]: 1 2 3 4:=
.
.
.
In other words, I have to print 4 rows, then the line [*,*,i]: 1 2 3 4:=, then another 4 rows and that line again, and so on. I think it can be done with a simple for loop, but I don't know how since I don't work with MATLAB.

You can display a string with disp, combined with a for loop.
num2str is used to convert a number to a string.
For example, with a matrix containing 100 rows:
D = rand(100,4);
for i = 1 : 4 : size( D,1 )
disp( D( i : i + 3,: ) )
disp(['[*,*,' num2str((i + 3)/4) ']: 1 2 3 4:='])
end
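For readers not working in MATLAB, the same block-and-header loop can be sketched in Scala. It is only an illustration: AmplFormat and amplBlocks are hypothetical names, the header text is copied verbatim from the question, and like the MATLAB loop it assumes the row count is a multiple of the block size (668 rows is 167 blocks of 4).

```scala
// Sketch: emit rows in blocks of `blockSize`, each block followed by an
// AMPL-style "[*,*,k]: 1 2 3 4:=" line, mirroring the MATLAB loop above.
// AmplFormat/amplBlocks are hypothetical names.
object AmplFormat {
  def amplBlocks(rows: Seq[Seq[Int]], blockSize: Int = 4): Seq[String] = {
    require(rows.size % blockSize == 0, "row count must be a multiple of blockSize")
    rows.grouped(blockSize).zipWithIndex.flatMap { case (block, i) =>
      // the data rows of this block, then the header line with a 1-based block index
      block.map(_.mkString(" ")) :+ s"[*,*,${i + 1}]: 1 2 3 4:="
    }.toList
  }
}
```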

MATLAB - Frequency of an array element with a condition

I need some help, please. I have an array, shown below, with 6 rows and 5 columns. No element repeats within a row, and all elements are single-digit numbers.
For the rows in which a given number appears (say 1), I want to keep track of how often each of the other numbers appears in those same rows. For example, 1 shows up 3 times, in rows one, three and five. In those rows, 2 shows up one time, 3 shows up two times, 4 two times, 5 one time, 6 two times, 7 one time, 8 three times, and 9 zero times. I want to store this information in a vector that looks like V = [3,1,2,2,1,2,1,3,0], starting from a vector like N = [1,2,3,4,5,6,7,8,9].
ARRAY =
1 5 8 2 6
2 3 4 6 7
3 1 8 7 4
6 5 7 9 4
1 4 3 8 6
5 7 8 9 6
The code below does not give the result I am looking for. Can someone help, please? Thanks.
for i=1:length(ARRAY)
for j=1:length(N)
ARRAY(i,:)==j
V(j) = sum(j)
end
end

Use the values in A as column indices to create a 6-by-9 zero-one matrix whose (i,j)-th element is 1 if the i-th row of A contains j.
Then multiply the transpose of that matrix by the matrix itself to get the desired result:
A =[...
1 5 8 2 6
2 3 4 6 7
3 1 8 7 4
6 5 7 9 4
1 4 3 8 6
5 7 8 9 6]
% create a matrix the size of A in which each element holds its row number
rowidx = repmat((1 : size(A,1)).' , 1 , size(A , 2))
% z_o: a 6-by-9 zero-one matrix whose (i,j)-th element is 1 if the i-th row of A contains j
z_o = full(sparse(rowidx , A, 1))
% multiply its transpose by it to get the desired result; column j holds the counts for N(j)
out = z_o.' * z_o
Result (column j holds the counts for N(j)):
3 1 2 2 1 2 1 3 0
1 2 1 1 1 2 1 1 0
2 1 3 3 0 2 2 2 0
2 1 3 4 1 3 3 2 1
1 1 0 1 3 3 2 2 2
2 2 2 3 3 5 3 3 2
1 1 2 3 2 3 4 2 2
3 1 2 2 2 3 2 4 1
0 0 0 1 2 2 2 1 2
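The zero-one matrix trick can be sketched in Scala with plain arrays. This is an illustration under the question's assumptions (integer values in 1..9, no value repeated within a row); CoOccurrence, incidence and counts are hypothetical names. Entry (p, q) of the product counts the rows that contain both p+1 and q+1, which is why column 1 reproduces V.

```scala
// Sketch of the incidence-matrix approach with plain arrays.
// Assumes values are integers in 1..maxValue and rows have no repeated values.
object CoOccurrence {
  // z(i)(j - 1) == 1 if row i of `a` contains value j (the MATLAB z_o matrix).
  def incidence(a: Array[Array[Int]], maxValue: Int): Array[Array[Int]] =
    a.map { row =>
      val z = Array.fill(maxValue)(0)
      row.foreach(v => z(v - 1) = 1)
      z
    }

  // out = z' * z: out(p)(q) is the number of rows containing both p+1 and q+1.
  // The result is symmetric, so row p and column p both give V for N = p+1.
  def counts(a: Array[Array[Int]], maxValue: Int): Array[Array[Int]] = {
    val z = incidence(a, maxValue)
    Array.tabulate(maxValue, maxValue) { (p, q) =>
      z.map(r => r(p) * r(q)).sum
    }
  }
}
```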

I don't understand how you are approaching the problem with your sample code, but here is something that should work. It uses find, any and accumarray, and each iteration of the loop produces the V corresponding to the i-th element of N:
for i=1:length(N)
rowIdx = find(any(A == N(i),2)); % Find all the rows that contain N(i)
A_red = A(rowIdx,:); % Keep only those rows
V = [accumarray(A_red(:),1)]'; % Count occurrences of each of the 9 numbers
V(end+1:9) = 0; % Pad with zeros for numbers above the largest one present
% V is overwritten on the next iteration, so use or store it here (e.g. Vall(i,:) = V)
end
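The filter-then-count idea of this loop can also be sketched in Scala. Here RowCounts and countsFor are hypothetical names, and values are assumed to be integers in 1..maxValue (9 by default, as in the question).

```scala
// Sketch mirroring the find/any/accumarray loop for a single target number n.
// Assumes every value lies in 1..maxValue; names are hypothetical.
object RowCounts {
  def countsFor(a: Seq[Seq[Int]], n: Int, maxValue: Int = 9): Vector[Int] = {
    val selected = a.filter(_.contains(n))              // like find(any(A == n, 2))
    val counts = Array.fill(maxValue)(0)                // values never seen keep a count of 0
    for (row <- selected; v <- row) counts(v - 1) += 1  // like accumarray(A_red(:), 1)
    counts.toVector
  }
}
```

Calling countsFor once per element of N gives the same numbers as the rows of the matrix-product answer.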

Matlab (textscan), read characters from specified column and row

I have a number of text files with data, and want to read a specific part of each file (time information), which is always located at the end of the first row of each file. Here's an example:
%termo2, 30-Jan-2016 12:27:20
I.e. I would like to get "12:27:20".
I've tried using textscan, which I have used before for similar problems. I figured this row has 3 columns, separated by single spaces.
I first tried to specify these as strings (%s):
fid = fopen(fname);
time = textscan(fid,'%s %s %s');
I also tried to specify the date and time using datetime format:
time = textscan(fid,'%s %{dd-MMM-yyyy}D %{HH:mm:ss}D')
Both of these just produce a blank cell. (I've also tried a number of variations, such as defining the delimiter as ' ', with the same result)
Thanks for any help!
Here's the entire file (not sure pasting it here is the right way to do this; I'm new to both MATLAB and Stack Overflow):
%termo2, 30-Jan-2016 12:27:20
%
%102
%
%stimkod stimtyp
% 1 Next:Pain
% 2 Next:Brush
% vaskod text
% 1 Obeh -> Beh
% 2 Inte alls intensiv -> Mycket intensiv
% stimnr starttid stimkod vaskod VASstart VASmark VAS
1 78.470 2 1 96.470 100.708 6.912
1 78.470 2 2 96.470 104.739 2.763
2 138.822 1 2 156.821 162.619 7.615
2 138.822 1 1 156.821 166.659 2.496
3 199.117 2 2 217.116 222.978 2.897
3 199.117 2 1 217.116 224.795 5.773
4 258.612 2 1 276.612 280.419 5.395
4 258.612 2 2 276.612 284.145 4.622
5 320.068 1 1 338.068 340.689 4.396
5 320.068 1 2 338.068 346.090 2.722
6 377.348 1 2 395.347 398.809 6.336
6 377.348 1 1 395.347 404.465 3.391
7 443.707 2 1 461.707 464.840 6.604
7 443.707 2 2 461.707 473.703 3.652
8 503.122 1 2 521.122 526.009 4.285
8 503.122 1 1 521.122 529.808 3.646
9 568.546 2 2 586.546 586.546 5.000
9 568.546 2 1 586.546 595.496 6.412
10 629.953 2 1 647.953 650.304 7.034
10 629.953 2 2 647.953 655.600 6.615
11 694.305 1 1 712.305 714.416 4.669
11 694.305 1 2 712.305 721.079 2.478
12 751.537 2 2 769.537 773.511 7.307
12 751.537 2 1 769.537 777.423 8.225
13 813.944 1 2 831.944 834.958 7.731
13 813.944 1 1 831.944 839.255 1.363
14 872.448 2 1 890.448 893.829 6.813
14 872.448 2 2 890.448 899.439 2.600
15 939.880 1 2 957.880 963.811 4.332
15 939.880 1 1 957.880 966.603 2.786
16 998.328 2 1 1016.327 1020.707 5.837
16 998.328 2 2 1016.327 1025.275 2.664
17 1062.911 1 2 1080.910 1082.967 2.792
17 1062.911 1 1 1080.910 1088.674 4.094
18 1125.182 1 1 1143.182 1144.379 0.619
18 1125.182 1 2 1143.182 1151.786 8.992

If you're not reading in the entire file, you could just read the first line using fgetl, split it on whitespace (using regexp), and then grab the last element.
parts = regexp(fgetl(fid), '\s+', 'split');
last = parts{end};
That said, there doesn't seem to be anything wrong with the way you're using textscan if your file really looks the way you describe. Alternatively, you could do something like:
parts = textscan(fid, '%s', 3); % one cell, holding a 3x1 cell array of strings
last = parts{1}{end}
Update
Also, be sure to rewind the file pointer using frewind before trying to parse the file to ensure that it starts at the top of the file.
frewind(fid)
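Outside MATLAB, the same first-line approach takes only a few lines of Scala. A sketch, assuming the time is always the last whitespace-separated token of the first line; FirstLineTime, lastToken and firstLineLastToken are hypothetical names.

```scala
import scala.io.Source

// Sketch: take the last whitespace-separated token of a file's first line.
// All names here are hypothetical.
object FirstLineTime {
  // Splits on runs of whitespace, like regexp(line, '\s+', 'split') in MATLAB.
  def lastToken(line: String): String =
    line.trim.split("\\s+").last

  // Reads only the first line of the file and closes the file afterwards.
  def firstLineLastToken(path: String): Option[String] = {
    val src = Source.fromFile(path)
    try src.getLines().take(1).toList.headOption.map(lastToken)
    finally src.close()
  }
}
```

For the example header line, lastToken("%termo2, 30-Jan-2016 12:27:20") gives "12:27:20".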

need some help in matlab displaying

This is my code. It has a matrix and adds the sum of each row to the sum of the matching column.
For example:
sum of row 1 = 10
sum of col 1 = 4
my number will be 14
So far this part works correctly: number holds the result for every row/column pair and displays:
number =
14 18 22 26
My loop, which is where I went wrong, is supposed to print each value with a counter, like:
number 1 has 14
number 2 has 18
number 3 has 22
number 4 has 26
It works for the first one, then goes into an infinite loop and does not display the rest of the numbers. Can you tell me how to fix this and where I went wrong? Thank you.
Output of the first loop iteration:
matrix =
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 has 14 number
12 has 22 number
Code:
matrix=[1 2 3 4;1 2 3 4;1 2 3 4;1 2 3 4]
number= sum(matrix)+sum(matrix');
number
len= length(number);
x=1;
y=1;
number(1,y) ; %
while x<=len
fprintf('%x has %d number \n',x,number)
x+1;
y+1;
number
end
Desired output:
matrix =
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
number =
14 18 22 26
1 has 14
2 has 18
3 has 22
4 has 26

There are multiple errors in your loop:
y is unused; you can delete it.
number on a line by itself prints the full array on each iteration, which is not what you want.
x+1 calculates x+1 but does nothing with the result. Use x=x+1; instead.
In fprintf, you pass the full array number instead of the single element number(x) as an argument.
%x formats its argument as hexadecimal, so values above 9 come out wrong (18 is printed as 12 in your output); use %d for decimal integers.
matrix=[1 2 3 4;1 2 3 4;1 2 3 4;1 2 3 4]
number= sum(matrix)+sum(matrix'); % column sums plus row sums
number
len= length(number);
x=1;
while x<=len
fprintf('%d has %d\n',x,number(x)) % print one counter/value pair
x=x+1; % advance the counter so the loop terminates
end
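For comparison, a Scala sketch of the corrected program, assuming a square matrix as in the question (number(x) is the sum of column x plus the sum of row x). RowColSums, numbers and report are hypothetical names; the manual counter is replaced by zipWithIndex, so there is nothing to forget to increment.

```scala
// Sketch: column sum plus row sum for each index of a square matrix,
// then one "x has value" line per entry. Names are hypothetical.
object RowColSums {
  def numbers(m: Vector[Vector[Int]]): Vector[Int] =
    m.indices.toVector.map { x =>
      val colSum = m.map(row => row(x)).sum   // like sum(matrix) in MATLAB
      val rowSum = m(x).sum                   // like sum(matrix') in MATLAB
      colSum + rowSum
    }

  // 1-based counter formatted with %d (decimal), one line per value.
  def report(values: Seq[Int]): Seq[String] =
    values.zipWithIndex.map { case (v, i) => f"${i + 1}%d has $v%d" }
}
```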

Octave generate combination subsets

Given a number N, I would like to create a matrix with x columns containing every x-element combination of the numbers 1 to N. For example, if N is 16 and x is 3, I should get a matrix of 560 rows, where each row has 3 columns and contains a unique combination of the numbers 1 to 16.
Is there a function like zzz(N,x) for this?
I will be generating a lot of them with different N and x values so a for loop will slow things down.

Just use the nchoosek function:
N = 16;
x = 3;
nchoosek(1:N, x)
returns 560 rows like this:
. . .
. . .
. . .
1 2 13
1 2 14
1 2 15
1 2 16
1 3 4
1 3 5
1 3 6
1 3 7
. . .
. . .
. . .
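Scala's standard library has an equivalent of nchoosek: the combinations method on sequences, which yields each x-element combination exactly once. A small sketch (Combos and combos are hypothetical names); for large N and x, keeping the result as an Iterator instead of converting it to a Vector avoids materializing every row at once.

```scala
// Sketch: all x-element combinations of 1..n, one Vector per row,
// as a counterpart to nchoosek(1:N, x). Names are hypothetical.
object Combos {
  def combos(n: Int, x: Int): Vector[Vector[Int]] =
    // combinations(x) yields every x-element subset of 1..n exactly once,
    // keeping the input order within each row
    (1 to n).combinations(x).map(_.toVector).toVector
}
```

For n = 16 and x = 3 this gives 560 rows, starting with 1 2 3 as in the nchoosek output.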
