Transpose in Scala - scala

I am new to Scala. I am currently working on a jupyter notebook Apache Toree - Scala.
I have the following:
dt_analysis  metric_name  mean_value
2021-08-13   FID          337.07522229312184
2021-08-13   CLS          0.4778479558664849
2021-08-13   LCP          27853.39253624655
2021-08-14   LCP          1503.4752384264077
2021-08-14   CLS          0.4933521946102963
2021-08-14   FID          119.07927683323628
2021-08-15   LCP          1654.1969758061045
which is a "'void' class".
I wish to transpose it so that the metric_name values become columns.
Note: there is one value per metric per dt_analysis.
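To make the requested reshape concrete, here is a minimal sketch in plain Python of the long-to-wide pivot, using the rows from the question. The function name `pivot` is hypothetical. If the data is actually a Spark DataFrame (the question does not say), the usual Scala equivalent would be along the lines of `df.groupBy("dt_analysis").pivot("metric_name").agg(first("mean_value"))`.

```python
from collections import defaultdict

# Rows copied from the question: (dt_analysis, metric_name, mean_value).
rows = [
    ("2021-08-13", "FID", 337.07522229312184),
    ("2021-08-13", "CLS", 0.4778479558664849),
    ("2021-08-13", "LCP", 27853.39253624655),
    ("2021-08-14", "LCP", 1503.4752384264077),
    ("2021-08-14", "CLS", 0.4933521946102963),
    ("2021-08-14", "FID", 119.07927683323628),
    ("2021-08-15", "LCP", 1654.1969758061045),
]


def pivot(long_rows):
    """Group long (date, metric, value) rows into one dict per date, keyed by metric."""
    wide = defaultdict(dict)
    for dt, metric, value in long_rows:
        # One value per metric per date is assumed, as stated in the question.
        wide[dt][metric] = value
    return dict(wide)


wide = pivot(rows)
metrics = sorted({metric for _, metric, _ in rows})

# Print one line per date; metrics missing for a date show up as None.
print("dt_analysis", *metrics)
for dt in sorted(wide):
    print(dt, *(wide[dt].get(metric) for metric in metrics))
```

Dates that lack a metric (2021-08-15 only has LCP) simply get an empty cell, which is also how a pivot typically behaves.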

Related

SPSS Merging Data with duplicate Keys

I am currently attempting to join 2 datasets using SPSS syntax but am struggling as I have duplicate values on the keys. I would like for the joined data to be duplicated for each instance of the key on the source dataset (or other way round as it doesn't matter which is the source).
The datasets are like the following -
Data1 (3rd column placeholder)
batch  run  date
A      1    1
A      2    1
A      3    1
B      1    1
C      1    1
C      2    1
D      1    1
E      1    1
Data2
batch  Value1  Value2
A      1       21
A      2       22
A      3       23
A      4       24
B      5       25
B      6       26
B      7       27
B      8       28
C      9       29
C      10      30
C      11      31
C      12      32
D      13      33
D      14      34
D      15      35
D      16      36
E      17      37
E      18      38
E      19      39
E      20      40
Current attempt
What I have just now is a method where I run CASESTOVARS on Data1 before matching it onto Data2, and then VARSTOCASES to expand it out. This works perfectly with my test data but, unfortunately, it requires that I know exactly how many 'runs' there will be. That will not be known in production. It could be 1 or more.
Is there a method to join these datasets while expanding the joined data into the multiple cases in the source?
I am open to using macros but am not able to utilise Python solutions for this (which would probably be easier!).
edit - Unfortunately, extensions are also not possible for me to use.
CASESTOVARS
/ID = batch .
DATASET ACTIVATE data2 .
MATCH FILES
/FILE = *
/TABLE = data1
/BY batch .
EXECUTE .
VARSTOCASES
/MAKE run FROM BATCH_RUN_ID.1 TO BATCH_RUN_ID.3 .
EXECUTE .
If Python and the extension commands that depend on it are not available, here's an idea for handling the dynamic list length in the VARSTOCASES phase.
What you'll do is basically create a new dataset with the maximum possible number of runs, append your original dataset to it, and then set VARSTOCASES to go up to that maximum number of runs (blank rows are dropped automatically):
dataset name orig.
data list free/throwthisrow (f1) BATCH_RUN_ID.1 to BATCH_RUN_ID.50 (50F8.2) .
begin data
1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
end data.
add files /file=* /file=orig .
EXECUTE.
select if missing(throwthisrow).
VARSTOCASES
/MAKE run FROM BATCH_RUN_ID.1 TO BATCH_RUN_ID.50 /drop throwthisrow.
EXECUTE .
To complete your present approach you can use the spssinc select variables extension command (see examples of use here and here and here). You will use it to automatically create a list of the variables you want to name in your VARSTOCASES command, so that the syntax adapts itself to the number of runs in the data.
So after CASESTOVARS and MATCH FILES:
spssinc select variables macroname="!from" /properties pattern = "BATCH_RUN_ID".
VARSTOCASES /MAKE run FROM !from .
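To make the target of the join concrete, here is a minimal Python sketch of the expected result, for illustration only since Python is not an option inside SPSS for the asker. The function name `expand_join` is hypothetical; the data reproduces the Data1 and Data2 tables from the question.

```python
from collections import defaultdict

# Data1 from the question: (batch, run, date).
data1 = [
    ("A", 1, 1), ("A", 2, 1), ("A", 3, 1), ("B", 1, 1),
    ("C", 1, 1), ("C", 2, 1), ("D", 1, 1), ("E", 1, 1),
]

# Data2 from the question: (batch, Value1, Value2); four rows per batch,
# Value1 running 1..20 and Value2 = Value1 + 20.
data2 = [
    (batch, value, value + 20)
    for index, batch in enumerate("ABCDE")
    for value in range(4 * index + 1, 4 * index + 5)
]


def expand_join(left, right):
    """Emit one output row per (left row, matching right row) pair on batch."""
    runs_by_batch = defaultdict(list)
    for batch, run, date in right:
        runs_by_batch[batch].append((run, date))
    return [
        (batch, value1, value2, run, date)
        for batch, value1, value2 in left
        for run, date in runs_by_batch.get(batch, [])
    ]


joined = expand_join(data2, data1)
for row in joined[:3]:
    print(row)
```

Each Data2 row is repeated once per run of its batch, so the number of runs never has to be known in advance. This is the same outcome the CASESTOVARS / MATCH FILES / VARSTOCASES sequence aims for.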

What is neuroph GUI import file format?

Just starting with Neuroph NN GUI. Trying to create a dataset by importing a .csv file. What's the file format supposed to be?
I have 3 inputs and 1 output so I assumed the format of the import file would be ..
1,2,3,4
6,7,8,9
But I get error 9, or 4 or 10 depending on what combination I try of newlines, commas etc.
Any help out there ?
many thanks,
john.
That's because you aren't counting the output column. The last columns are for the output.
So, for example, if you have 10 inputs and 1 output, your file will need to have 11 columns.
I came here because Neuroph can't import CSVs with a title line. Example of a data file that works for me:
1.0,1.0,1.0
1.0,2.0,2.0
1.0,3.0,3.0
1.0,4.0,4.0
1.0,5.0,5.0
1.0,6.0,6.0
1.0,7.0,7.0
1.0,8.0,8.0
1.0,9.0,9.0
1.0,10.0,10.0
2.0,1.0,2.0
2.0,2.0,4.0
2.0,3.0,6.0
2.0,4.0,8.0
2.0,5.0,10.0
2.0,6.0,12.0
2.0,7.0,14.0
2.0,8.0,16.0
2.0,9.0,18.0
2.0,10.0,20.0
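The working file above has two input columns followed by one output column (their product) and no header line. A minimal Python sketch that generates the same layout, assuming that format is what the importer expects as described in the answers; the output file name is hypothetical:

```python
import os
import tempfile

# Two inputs (a, b) followed by one output (a * b), matching the sample file.
rows = [(float(a), float(b), float(a * b)) for a in (1, 2) for b in range(1, 11)]

# One comma-separated line per training row; no title/header line is written.
lines = [",".join(str(value) for value in row) for row in rows]

# Hypothetical file name, written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "neuroph_training.csv")
with open(path, "w") as handle:
    handle.write("\n".join(lines) + "\n")

print(lines[0])
print(lines[-1])
```

For the 3-input, 1-output case in the question, each line would carry four values in the same way: the three inputs first, then the output.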

Warning: Could not start Excel server for import, 'basic' mode will be used. Refer to HELP XLSREAD for more information.

The data looks like this:
'name' 'date' value1 value2
'A' 'E' 350 25
'B' 'F' 204 22
'C' 'G' 1022 117
'D' 'H' 369 53
I want to store the last two columns into a matrix. I tried
column=xlsread('fileName.xlsx', 'D:D')
and
[num,txt,raw] = xlsread('fileName.xlsx')
and I got errors like the following:
Warning: Could not start Excel server for import, 'basic' mode will be used. Refer to HELP XLSREAD for more information.
I found the following related post: "How do you use xlsread with MATLAB and OS X?". But the MATLAB I am using is R2014b. It should be able to work, shouldn't it? Many thanks.

What am I doing wrong while importing the following data into sas

I am trying to import certain data into my SAS dataset using this piece of code:
Data Names_And_More;
Infile 'C:\Users\Admin\Desktop\Torrent Downloads\SAS 9.1.3 Portable\Names_and_More.txt';
Input Name & $20.
Phone : $20.
Height & $10.
Mixed & $10.;
run;
The data in the file is as below:
Roger Cody (908)782-1234 5ft. 10in. 50 1/8
Thomas Jefferson (315)848-8484 6ft. 1in. 23 1/2
Marco Polo (800)123-4567 5Ft. 6in. 40
Brian Watson (518)355-1766 5ft. 10in 89 3/4
Michael DeMarco (445)232-2233 6ft. 76 1/3
I have been trying to learn SAS, and while going through Ron Cody's book Learning SAS by Example, I found that to import the kind of data above we can use 'the ampersand (&) informat modifier. The ampersand, like the colon, says to use the supplied informat, but the delimiter is now two or more blanks instead of just one.' (Ron's words, not mine). However, on import the resulting dataset is as follows:
Name Phone Height Mixed
Roger Cody (908)782- Thomas Jefferson Marco Polo
Also, for further details the SAS log is as follows:
419 Data Names_And_More;
420 Infile 'C:\Users\Admin\Desktop\Torrent Downloads\SAS 9.1.3 Portable\Names_and_More.txt';
421 Input Name & $20.
422 Phone : $20.
423 Height & $10.
424 Mixed & $10.
425 ;run;
NOTE:
The infile 'C:\Users\Admin\Desktop\Torrent Downloads\SAS 9.1.3 Portable\Names_and_More.txt' is:
File Name=C:\Users\Admin\Desktop\Torrent Downloads\SAS 9.1.3 Portable\Names_and_More.txt,
RECFM=V,LRECL=256
NOTE:
LOST CARD.
Name=Brian Watson (518)35 Phone=Michael Height=DeMarco (4 Mixed= ERROR=1 N=2
NOTE: 5 records were read from the infile 'C:\Users\Admin\Desktop\Torrent Downloads\SAS 9.1.3
Portable\Names_and_More.txt'.
The minimum record length was 37.
The maximum record length was 47.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.NAMES_AND_MORE has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.17 seconds
cpu time 0.14 seconds
I am looking for some help with this one. It'd be great if someone can explain what exactly is happening, what am I doing wrong and how to correct this error.
Thanks
The answer is in the explanation in Ron Cody's book. & means you need two spaces to separate variables, so you need a second space after the name (and after the other fields read with &).
Wrong:
Roger Cody (908)782-1234 5ft. 10in. 50 1/8
Right:
Roger Cody  (908)782-1234 5ft. 10in.  50 1/8
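The difference between the two lines is easiest to see with a simplified model of the delimiter rule. The Python sketch below is not SAS's actual implementation; `read_fields` and the spec tuples are hypothetical names. It only mimics how a field read with & ends at two consecutive blanks while a field read with : ends at a single blank, with each value truncated to the informat width. It does not model SAS moving on to the next record when a line runs out.

```python
def read_fields(line, spec):
    """Parse one record; spec is a list of (name, modifier, width) tuples.

    '&' ends a field at two consecutive blanks, ':' at a single blank.
    """
    values = {}
    pos = 0
    for name, modifier, width in spec:
        # Skip blanks left over from the previous delimiter.
        while pos < len(line) and line[pos] == " ":
            pos += 1
        delimiter = "  " if modifier == "&" else " "
        end = line.find(delimiter, pos)
        if end == -1:
            end = len(line)
        # Truncate to the informat width, e.g. $20. keeps 20 characters.
        values[name] = line[pos:end][:width]
        pos = end
    return values


# Mirrors the INPUT statement from the question.
spec = [("Name", "&", 20), ("Phone", ":", 20), ("Height", "&", 10), ("Mixed", "&", 10)]

single_spaced = "Roger Cody (908)782-1234 5ft. 10in. 50 1/8"
double_spaced = "Roger Cody  (908)782-1234 5ft. 10in.  50 1/8"

print(read_fields(single_spaced, spec))
print(read_fields(double_spaced, spec))
```

With single spaces the Name field never meets its two-blank delimiter, so it swallows the whole line and is cut to 20 characters, which matches the "Roger Cody (908)782-" value seen in the asker's output. In SAS the remaining variables are then read from the following records, which is why the log reports going to a new line and a LOST CARD.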

Can you capture the output of ipython's magic methods? (timeit)

I want to capture and plot the results from 5 or so timeit calls with logarithmically increasing sizes of N to show how methodX() scales with input.
So far I have tried:
output = %timeit -r 10 results = methodX(N)
It does not work...
Can't find info in the docs either. I feel like you should be able to at least intercept the string that is printed. After that I can parse it to extract my info.
Has anyone done this or tried?
PS: this is in an ipython notebook if that makes a diff.
This duplicate question Capture the result of an IPython magic function has an answer demonstrating that this has since been implemented.
Calling the %timeit magic with the -o option like:
%timeit -o <statement>
returns a TimeitResult object which is a simple object with all information about the %timeit run as attributes. For example:
In [1]: result = %timeit -o 1 + 2
Out[1]: 10000000 loops, best of 3: 23.2 ns per loop
In [2]: result.best
Out[2]: 2.3192405700683594e-08
PS: this is in an ipython notebook if that makes a diff.
No it does not.
On dev there is the %%capture cell magic.
The other way would be to modify the timeit magic to return value instead of printing, or use the timeit module itself. Patches welcomed.
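As a minimal sketch of the last suggestion, the standard-library timeit module returns its timings as plain numbers, so they can be collected and plotted directly. Here `method_x` is a hypothetical stand-in for the asker's methodX, and the sizes and repeat counts are arbitrary choices:

```python
import timeit


def method_x(n):
    """Hypothetical stand-in for methodX from the question."""
    return sum(i * i for i in range(n))


# Logarithmically increasing input sizes: 10, 100, ..., 100000.
sizes = [10 ** exponent for exponent in range(1, 6)]

calls_per_run = 5
best_times = {}
for n in sizes:
    # repeat=10 plays the role of "-r 10"; each run makes calls_per_run calls.
    timings = timeit.repeat(lambda: method_x(n), repeat=10, number=calls_per_run)
    # Keep the best run, expressed as seconds per single call.
    best_times[n] = min(timings) / calls_per_run

for n, seconds in best_times.items():
    print(n, seconds)
```

The resulting `best_times` dictionary maps each N to its best per-call time in seconds and can be passed straight to a plotting library with logarithmic axes.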