Aggregate Data/ Create a New Variable in SPSS - aggregate

I have looked at the aggregate-data questions on this forum and elsewhere, but I don't see an answer that helps me. That may be a lack of understanding on my part; I apologize. I have a huge amount of raw test data and I want to aggregate certain scores into one mean score. I can do this by creating a new variable with Transform > Compute Variable, but I cannot add 330+ variables (1 + 2 + 3 ...) by hand. My question is: how can I aggregate hundreds of scores and calculate a mean score as a new variable quickly? For example, I have 339 latency measures for 50 participants. I want to calculate ONE overall latency score as a new variable. Thanks! I am desperate for direction.

If your variables are contiguous in the file, you can use the SUM function with the TO keyword to specify the variable range in simple syntax (use MEAN in place of SUM if you want the mean rather than the total).
COMPUTE SumX = SUM(X1 to X330).
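For readers who want to check the result outside SPSS, here is a minimal Python sketch of the same row-wise calculation. The data values and the helper name row_mean are made up for illustration; the function averages a contiguous range of columns per participant and skips missing values, which is my understanding of how the SPSS MEAN function treats them.

```python
from statistics import mean

def row_mean(row, first, last):
    """Mean of the contiguous columns first..last (inclusive), skipping missing values."""
    values = [v for v in row[first:last + 1] if v is not None]
    return mean(values) if values else None

# Hypothetical data: 2 participants, 4 latency measures each (None = missing).
rows = [
    [310.0, 290.0, None, 300.0],
    [450.0, 470.0, 460.0, 480.0],
]

# One overall latency score per participant.
overall = [row_mean(r, 0, 3) for r in rows]
print(overall)
```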

Related

Modeler question: Is there a function in SPSS for multiple 'if' statements? Forecasting dates

I am trying to build a forecast for interest expense for floating debt in my company.
I have been given a set of ResetDates which help me match a given rate based on when the ResetDate is.
I have been successful in forecasting one period, but I need a much longer set of periods to satisfy my requirements.
I've tried derive nodes and nested if statements as well as filler nodes.
I am given this data to work with, and I can only look one ResetDate ahead.
Here you will find the data I used: Columns A/B/C/D are what I'm given, and Column E (the 5th column from left to right) is what I want to derive as my output.
I want to use 'InterestPayDate' and derive:
if it's more than 'NextReset', then add 90 days to 'NextReset' to create 'NextReset2'
That is as far as I can get. My problem is that I then want to look at 'NextReset2' and derive:
if 'InterestPayDate' is more than 'NextReset2', then add 90 days to 'NextReset2'; if it's less than 'NextReset2', keep the current value of 'NextReset2'
Output should look like Column E here
I'm not sure if I need to dig deeper into the logical functions. In all honesty, I've just picked up SPSS and I am really trying to learn. Hopefully, you can point me in the right direction.
Thank you.
After computing the first NextReset2, you need to use a Filler node like the one below to change the value of the field.
You might need more than one identical node like this: one for each potential 90-day period by which you want to extend the NextReset2 date. In your sample data, you will need at least two Filler nodes to get the correct value of NextReset2 for the last of the records.
There might be a more elegant way to do it, but this will work and it's easy enough to make copies of a node and string them together like this.
Please also see a sample IBM SPSS Modeler stream here that shows this approach using your sample data.
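The logic of the chained Filler nodes can be sketched outside Modeler as a loop, where each pass plays the role of one Filler node. This is plain Python, not Modeler expression syntax; the function name roll_reset, the max_steps guard, and the dates are assumptions made for illustration.

```python
from datetime import date, timedelta

def roll_reset(interest_pay_date, next_reset, step_days=90, max_steps=100):
    """Push next_reset forward in 90-day steps until it is no longer
    earlier than interest_pay_date; otherwise keep it unchanged."""
    reset = next_reset
    steps = 0
    while interest_pay_date > reset:
        if steps >= max_steps:
            raise ValueError("too many 90-day steps; check the input dates")
        reset += timedelta(days=step_days)  # one step = one Filler node
        steps += 1
    return reset

# Hypothetical dates: three 90-day steps are needed here.
print(roll_reset(date(2018, 9, 30), date(2018, 1, 15)))
```

With a loop the number of 90-day periods does not have to be known in advance, which is the part the fixed chain of nodes cannot express.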

Tableau Time Series Prediction using Python Integration

I need help with time series in Tableau. So far, here is what I can do:
Connect to TabPY
Call / Run scripts on TabPy
My current issue is that Tableau doesn't seem to allow more output elements than input elements. Say I want to use the last 100 data points to predict the coming 10 points. Passing the data to Python isn't a problem. The problem comes when I want to return a list with 110 elements. I've also tried returning just the 10 elements, and it complains that it expects a list of 100 elements.
Thanks for reading
I've found a workaround. You can see the post here for more information. Basically, you shift the original values by the prediction amount and then have the prediction return the same number of elements as the shifted original.
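The Python side of that workaround might look roughly like the sketch below. It is plain Python and is not tied to any specific TabPy call; the function names are hypothetical and the drift-based forecast is only a placeholder for a real model. The point is that the function receives N shifted values and returns exactly N values: the originals with the first `horizon` dropped, followed by `horizon` predictions.

```python
def naive_forecast(history, horizon):
    """Placeholder model (an assumption): continue the average step of the history."""
    if len(history) < 2:
        return [history[-1]] * horizon
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (k + 1) for k in range(horizon)]

def predict_same_length(shifted_values, horizon):
    """Return a list as long as the input, so the output size matches the input size."""
    if not 0 < horizon <= len(shifted_values):
        raise ValueError("horizon must be between 1 and the number of input values")
    forecast = naive_forecast(shifted_values, horizon)
    # Drop the oldest `horizon` points and append the predictions.
    return shifted_values[horizon:] + forecast

result = predict_same_length([1.0, 2.0, 3.0, 4.0, 5.0], horizon=2)
print(result)
```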

Generate subset of data with known mean

I have a dataset of n observations (an nx1 vector) and would like to create a subset of this data whose mean is known in advance, by selecting only n/3 observations at random (or within some constraint, i.e. where the mean of the subset is within a range around the known mean).
Can someone please help me with the code to do this in MATLAB?
Note, I don't want to use the rand function to create random data as I already have my data collected.
For example on a smaller scale: If I had the following dataset of 12 observations:
data = [8;7;4;6;9;6;4;7;3;2;1;1];
but then wanted to randomly select a subset of this data containing only 4 observations with a mean of 4 (or with a mean between 3.5-4.5 for example):
Then the answer might be datasubset=[7;3;2;4] but the answer could also be datasubset=[6;4;2;4] or datasubset=[6;4;3;4].
It doesn't matter if there are several possible solutions, I just need one of them, though I'd like to know the alternative solutions also.
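One possible approach is rejection sampling: draw random subsets of the required size until one has a mean inside the target range. The question asks for MATLAB; the sketch below shows the same idea in Python, and the function names and the max_tries limit are assumptions. For a small dataset like the 12-observation example, the alternative solutions can also be listed exhaustively.

```python
import random
from itertools import combinations
from statistics import mean

def sample_subset(data, k, low, high, max_tries=10000, rng=None):
    """Draw k observations without replacement until their mean lies in [low, high]."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        subset = rng.sample(data, k)
        if low <= mean(subset) <= high:
            return subset
    return None  # no qualifying subset found within max_tries

def all_subsets(data, k, low, high):
    """Enumerate every qualifying subset; only feasible for small datasets."""
    return [c for c in combinations(data, k) if low <= mean(c) <= high]

data = [8, 7, 4, 6, 9, 6, 4, 7, 3, 2, 1, 1]
subset = sample_subset(data, 4, 3.5, 4.5, rng=random.Random(0))
alternatives = all_subsets(data, 4, 3.5, 4.5)
print(subset, len(alternatives))
```

Enumeration grows combinatorially with n, so for a large dataset only the random search is practical.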

Defining locality in pseudocode

I'm trying to figure out which locality (spatial/temporal) is used in the following pseudo code and how?
for i = 0, i < 10, i++
sum = sum + array[i]
I hope my question is clear and somebody could help me, thanks in advance!
Steven
Generally, given only a code snippet, one can't easily determine spatial locality unless the whole code is given.
Temporal locality refers to the reuse of specific data and/or resources within a relatively short period of time.
Spatial locality, in contrast, refers to the use of data elements stored in relatively close locations.
In this snippet, sum is referenced in each of the 10 iterations of i, so the repeated reference to sum shows temporal locality.
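To make the two definitions concrete, the loop's memory accesses can be written out as a trace. This is only a sketch under stated assumptions: the array is stored contiguously, elements are 4 bytes, and the addresses are invented. Under those assumptions the reads of array[i] also touch neighbouring addresses, which would be spatial locality, while sum is the same address on every iteration.

```python
def access_trace(n, array_base=1000, element_size=4, sum_address=5000):
    """List the (name, address) pairs the loop touches, in order.
    All addresses are hypothetical."""
    trace = []
    for i in range(n):
        trace.append(("array", array_base + i * element_size))  # read array[i]
        trace.append(("sum", sum_address))                       # read and update sum
    return trace

trace = access_trace(10)
array_addresses = [addr for name, addr in trace if name == "array"]
sum_addresses = [addr for name, addr in trace if name == "sum"]

# Distance between consecutive array accesses (spatial locality if small).
gaps = {b - a for a, b in zip(array_addresses, array_addresses[1:])}
print(gaps, len(sum_addresses), len(set(sum_addresses)))
```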

Find best available data in a given data set according to input data using WEKA?

I tried to use a clear title. What I try to achieve is that I have a list of data as below
ID - ID of people, not important in calculation, but need for output to determine the person
Education {1=Degree, 2=Master, 3=PhD}
CGPA - value from 2.00 until 4.00
Computer = {1=Yes, 0 = No} (Computer knowledge)
Oversea = {1 = Yes, 0 = No} (willing to travel oversea)
ID,Education,CGPA,Computer,Oversea
001,3,3.14,1,0
002,1,3.68,1,1
003,2,2.76,0,1
..........
.........
Say I have 1,000 rows with different values. I want to supply one similar row of data and get the closest record out of the 1,000 rows. I am using WEKA.
I am trying to do something like finding the best resume for a particular job.
I have gone through many examples to understand WEKA better, but I just can't get it done. I am new to WEKA. I tried classifiers and decision trees but couldn't make them work. I am able to get a prediction from the given data, but I cannot filter the data list according to a given input.
Any help is much appreciated. Any link to an article about this, any idea, or even a single spark will be useful.
Sounds like you want to use a nearest neighbour classifier (IBk in Weka). If you're using the Weka GUI, you can only get the class, so you'll have to implement some code to retrieve the actual nearest neighbour.
Have a look at this question for a way of doing this.
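The idea behind a nearest-neighbour lookup can be sketched in a few lines of Python. This is not Weka's API; it only illustrates what retrieving the closest record means. The three records are the sample rows from the question, the query row is hypothetical, and scaling each field to the 0..1 range with the stated value ranges is my assumption so that no single field dominates the distance.

```python
import math

records = [
    {"ID": "001", "Education": 3, "CGPA": 3.14, "Computer": 1, "Oversea": 0},
    {"ID": "002", "Education": 1, "CGPA": 3.68, "Computer": 1, "Oversea": 1},
    {"ID": "003", "Education": 2, "CGPA": 2.76, "Computer": 0, "Oversea": 1},
]

# Value ranges taken from the field descriptions in the question.
RANGES = {"Education": (1, 3), "CGPA": (2.0, 4.0), "Computer": (0, 1), "Oversea": (0, 1)}

def scaled(record):
    """Min-max scale each field to 0..1 (ID is ignored)."""
    return [(record[f] - lo) / (hi - lo) for f, (lo, hi) in RANGES.items()]

def nearest(query, candidates):
    """Return the candidate with the smallest Euclidean distance to the query."""
    q = scaled(query)
    return min(candidates, key=lambda r: math.dist(q, scaled(r)))

# Hypothetical job requirement expressed as a row.
query = {"Education": 2, "CGPA": 3.0, "Computer": 0, "Oversea": 1}
best = nearest(query, records)
print(best["ID"])
```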