Selecting variables in a dataset according to values in another dataset - macros

I want to create a subset of a dataset which has around 100 variables, and I wish to KEEP only those variables that are present as values of a variable in another dataset. Can someone please help me with the SPSS syntax?
This is what it should look like:
DATASET ACTIVATE basedataset.
SAVE OUTFILE ='Newdata.sav'
/KEEP Var1.
Var1 is the variable in the other dataset which contains all the values on which I want to base the subsetting. I am not sure if VECTOR should be involved or if there is an easier way to do this.

The following will create a macro containing the list of variables you require, to use in your analysis or in subsetting the data.
First I'll create some sample data to demonstrate on:
data list free /v1 to v10 (10f3).
begin data
1,2,3,2,4,7,77,777,66,55
end data.
dataset name basedataset.
data list free/var1 (a4).
begin data
"v3", "v5", "v6", "v9"
end data.
dataset name varnames.
Now to create the list:
dataset activate varnames.
write out="yourpath\var1 selection.sps"
/"VARIABLE ATTRIBUTE VARIABLES= ", var1, " ATTRIBUTE=selectVars('yes')." .
exe.
dataset activate basedataset.
VARIABLE ATTRIBUTE VARIABLES=all ATTRIBUTE=selectVars('no').
insert file="yourpath\var1 selection.sps".
SPSSINC SELECT VARIABLES MACRONAME="!varlist" /ATTRVALUES NAME=selectVars VALUE = yes .
The list is now ready and can be called using the macro name !varlist in any command, for example:
freq !varlist.
or
SAVE OUTFILE ='Newdata.sav' /KEEP !varlist.

Related

Modelica (Dymola) : get a particular value of a timetable?

I have a model using a timeTable which represents a variable evolution. I would like to initialize a subcomponent's parameter with the first value of the table (time = 0 second).
The table's values are read from a .txt file. The idea would be to have a command as follows:
parameter Real InitialValue = timeTable.y[2](for timeTable.y[1] = 0)
Is there a command to do so?
In some cases another option is to initialize the parameter to the output value of the table component when starting the simulation:
model Demo
  Modelica.Blocks.Sources.TimeTable timeTable(table=[0,1; 2,3]);
  parameter Real initialValue(fixed=false);
initial equation
  initialValue = timeTable.y;
end Demo;
This works for all variants in the same way, but only for the initial value. It is triggered by having fixed=false for a parameter and then giving an initial equation for it.
The solution depends on the block you are using and how the data is defined. Note that there is no easy solution for .txt files, so I recommend using .mat files instead.
1. Data from model
If you don't read from a file, it is quite easy.
The data is stored as a matrix in the parameter table, and we can use array indexing to access it:
model Demo
  Modelica.Blocks.Sources.TimeTable timeTable(table=[0,1; 2,3]);
  parameter Real initialValue = timeTable.table[1, 2];
end Demo;
This works for both the Modelica.Blocks.Sources.TimeTable and the CombiTimeTable found in the same package.
2. Data from .mat file
The MSL provides functions to access .mat files. You have to get the table size before you can read the data.
See the code below for how this can be done.
model Demo2
  import Modelica.Utilities.Streams.{readMatrixSize, readRealMatrix};
  parameter String fileName = "C:/tmp/table.mat";
  parameter String tableName = "tab1";
  parameter Real initialValue = (readRealMatrix(fileName=fileName, matrixName=tableName, nrow=matrixSize[1], ncol=matrixSize[2]))[1, 2];
  Modelica.Blocks.Sources.CombiTimeTable combiTimeTable(
    tableOnFile=true,
    tableName=tableName,
    fileName=fileName)
    annotation (Placement(transformation(extent={{-10,-10},{10,10}})));
protected
  final parameter Integer matrixSize[2] = readMatrixSize(fileName, tableName);
end Demo2;
Note that we don't store the whole table in a variable. Instead, we read it and access the element of interest with [1, 2]. This requires putting brackets around the function call.

Is there a difference in performance between set/save when saving columns to tables?

I have a small utility that checks for new columns for an intraday hdb and adds new columns.
At the moment I am using:
.[set;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
where pth is:
`:path_to_hdb/2022.03.31/table01/newDummyThree
and
?[data;();();cls] // just an exec statement
Would it make any difference to use save instead:
.[save;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
Yes. If you are adding entire columns to a table then you might want to store it splayed, i.e. as a directory of column files rather than as a single table file. This means using set rather than save.
https://code.kx.com/q/kb/splayed-tables/
But test with your actual example updates to be sure.
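As a rough sketch of what splayed storage with set looks like (the paths below are placeholders, reusing the one from the question):
/ set splays a table when the target path ends in a slash
t:([] a:1 2 3; b:1.1 2.2 3.3)
`:/tmp/flat set t        / single table file
`:/tmp/splayed/ set t    / directory with one file per column
/ when adding a single column file to an existing splayed/partitioned table,
/ remember to register the new name in the .d file as well
dfile:`:path_to_hdb/2022.03.31/table01/.d
dfile set distinct (get dfile),`newDummyThree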
As mentioned in the documentation for save, use set instead to save a variable to a file of a different name, or to save local data.
So set has the advantage of not requiring a global, and you can give the file a different name from that of your in-memory global variable.
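A minimal sketch of the difference (file paths and names below are placeholders):
t:([] a:1 2 3)
`:/tmp/anyname set t    / set: takes the data itself and any target path
save `:/tmp/t           / save: takes a symbol; the file name must match the global variable (here t)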
There is no difference in how they serialise/write the data. In fact, save uses set under the covers anyway:
q)save
k){$[1=#p:`\:*|`\:x:-1!x;set[x;. *p]; x 0:.h.tx[p 1]#.*p]}'
By the way, you can't use save in the way that you've suggested in your post. save takes a symbol as input, and this symbol is the name of the global variable containing the data you want to write.

Synchronize timetables stored in a structure

I am dynamically storing data from different data recorders in timetables, nested in a structure DATA, such as DATA.Motor (timetable with motor data), DATA.Actuators (timetable with actuators data) and so on.
My objective is to have a function that synchronizes and merges these timetables so I can work with one big timetable.
I am trying to use synchronize to merge and synchronize those timetables:
fields = fieldnames(DATA);
TT = synchronize(DATA.(fields{1:end}));
but get the following error:
Expected one output from a curly brace or dot indexing expression, but there were 3 results.
This confuses me because DATA.(fields{1}) returns the timetable for the first field name of the DATA structure.
Any thoughts on how I can solve this are greatly appreciated.
The problem here is that fields{1:end} is returning a "comma-separated list", and you're not allowed to use one of those as a struct dot-index expression. I.e. it's as if you tried the following, which is not legal:
DATA.('Motor','Actuators')
One way to fix this is to pull out the values from DATA into a cell array, and then you can use {:} indexing to generate the comma-separated list as input to synchronize, like this:
DATA = struct('Motor', timetable(datetime, rand), ...
'Actuators', timetable(datetime, rand));
DATA_c = struct2cell(DATA);
TT = synchronize(DATA_c{:});

BIRT - using report variable to pass data from outer to inner nested data set

Hoping someone can tell me what is wrong with this BIRT report. I am trying to use a nested scripted data set, where the outer data set passes data to the inner data set through a report variable.
I find that the report isn't acting as I thought it would. It seems as though the report variable is outputting the last value it has for every row. For the below report I am seeing output such as:
key0
value[9][0]
value[9][1]
value[9][2]
value[9][3]
value[9][4]
key1
value[9][0]
value[9][1]
value[9][2]
value[9][3]
value[9][4]
....
key9
value[9][0]
value[9][1]
value[9][2]
value[9][3]
value[9][4]
Whereas I would expect to see this:
key0
value[0][0]
value[0][1]
value[0][2]
value[0][3]
value[0][4]
key1
value[1][0]
value[1][1]
value[1][2]
value[1][3]
value[1][4]
....
key9
value[9][0]
value[9][1]
value[9][2]
value[9][3]
value[9][4]
My (fully self contained) example report is here: click to see report xml in pastebin.
The key idea is that in the outer data set's fetch, I set the report variable:
vars["values"] = value;
And the inner data set grabs it:
values = vars["values"].iterator();
and its fetch takes data from the report variable:
row["value"] = values.next();
You should be able to use dataSet parameters to do this. In your example, you'd set up an output parameter in the outer data set's dataset editor and set its value to the values you're passing to the other dataSet.
In the inner dataSet, you'd create an input parameter to take the values. In your layout, you'd need to refresh the bindings on the outer list so that the output parameter is a binding. Then you'd go to the binding tab for the inner list and choose to pass the outer list's output parameter binding to your input parameter.
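As a rough sketch of how the scripted pieces could fit together (all names here are hypothetical, not taken from the pastebin report; "valuesOut"/"valuesIn" stand for the output and input parameters described above):
// outer scripted data set, fetch script
// assumes keys/values arrays and a currentKey counter initialized in the open script
if (currentKey < keys.length) {
    row["key"] = keys[currentKey];
    outputParams["valuesOut"] = values[currentKey];  // hand this row's values to the inner data set
    currentKey++;
    return true;
}
return false;

// inner scripted data set, open script
// "valuesIn" is the input parameter bound in the layout to the outer list's output parameter
valueIter = inputParams["valuesIn"].iterator();

// inner scripted data set, fetch script
if (valueIter.hasNext()) {
    row["value"] = valueIter.next();
    return true;
}
return false;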
Hope this helps.

Dynamically reference temp-table column values in Progress

I am using Progress 4GL
I have a spreadsheet of data containing several columns called data1 to data50.
I have created a temp table which holds all the values.
Now I would like to loop through the temp table columns and do various calculations
So I need something like this:
for each record loop thru cols_in_temp_table .
if col_value = "XYZ" then
do calcs and stuff
end.
So how do I reference the temp-table columns?
OK, I didn't resolve the original query, but I found a workaround: split the data up and put it into separate tables. Long-winded, but it does the trick.
Depending on your version, this is one way to do it:
DEFINE VARIABLE h-cols AS HANDLE NO-UNDO.
h-cols = BUFFER tt-cols:HANDLE.

FOR EACH tt-cols NO-LOCK:
    IF h-cols::col-name = "some value" THEN
        RUN do-something.
END.
For versions that can't do the "::" operator, do this:
FOR EACH tt-cols NO-LOCK:
    IF h-cols:BUFFER-FIELD("col-name"):BUFFER-VALUE = "some value" THEN
        RUN do-something.
END.
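If you actually need to loop over every column of the temp-table (as the question asks) rather than test one named field, something along these lines should work; this is a sketch assuming the data1 to data50 columns are character fields, and tt-cols/do-something are the names used above:

DEFINE VARIABLE h-buf AS HANDLE  NO-UNDO.
DEFINE VARIABLE i     AS INTEGER NO-UNDO.

h-buf = BUFFER tt-cols:HANDLE.

FOR EACH tt-cols NO-LOCK:
    /* walk every field of the current record by position */
    DO i = 1 TO h-buf:NUM-FIELDS:
        IF h-buf:BUFFER-FIELD(i):BUFFER-VALUE = "XYZ" THEN
            RUN do-something.
    END.
END.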