How to do recursive calculation in SPSS Modeler - spss-modeler

If I want to compute a value that depends on the previous one (a recursive function), how can I do it in SPSS? Example:
Q0 = 0
Qn = Q(n-1) + Constant
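Unrolling the recurrence from the question shows which values the stream has to produce. Here c stands for the Constant:

```latex
\begin{aligned}
Q_0 &= 0,\\
Q_n &= Q_{n-1} + c = Q_{n-2} + 2c = \dots = Q_0 + n\,c = n\,c.
\end{aligned}
```

For a fixed increment like this one, the closed form n·c would be enough. Record-to-record access is what a more general recurrence needs.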

If by "... the previous one ..." you mean the value of the same field (or a different field) in the previous record, you can use the @OFFSET(FIELD, EXPR) function.
The function allows you to access values from records other than the current one based on a relative reference.
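To make the relative reference concrete, here is a small Python emulation. The helper `offset` is hypothetical. It only mimics the idea that @OFFSET(FIELD, 1) reads FIELD from the record one position earlier and yields nothing (here `None`) when no such record exists. It is not SPSS code.

```python
from typing import Any, Optional


def offset(rows: list[dict[str, Any]], i: int, field: str, k: int) -> Optional[Any]:
    """Hypothetical stand-in for @OFFSET: value of `field` k records before row i.

    Returns None (playing the role of $null$) when that record does not exist.
    """
    j = i - k
    if 0 <= j < len(rows):
        return rows[j][field]
    return None


rows = [{"N": 1}, {"N": 2}, {"N": 3}]
print(offset(rows, 2, "N", 1))  # value from the previous record: 2
print(offset(rows, 0, "N", 1))  # no previous record: None
```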

After a lot of research I couldn't find any way to write a recursive function in SPSS Modeler. The only workaround I found is to use an R Transform node within SPSS. HTH.

Depending on what you need to do, you can either chain several Derive nodes or, after sorting the records, refer to the previous value in a column.

I started by creating a domain context for the stream's data flow (the iterations): a simple CSV source file whose records hold one field, N, ranging from 1 to 100 just to keep the example small. I then connected this data source to a Derive node that defines the field Q:
if not(@NULL(@OFFSET(N,1))) then @OFFSET(Q,1) + 2 else 0 endif
Here I used 2 as the Constant from the example in the question. This behaves as a recursive function, and it relies on the OFFSET function just as Kenneth suggested above.
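The Derive expression above works because each record can read the Q that was already derived for the record before it. Below is a minimal Python sketch of that record-by-record evaluation. It assumes the records arrive in N order (1 to 100), that N is never null, and that the Constant is 2. The function name `derive_q` is illustrative only.

```python
def derive_q(n_values: list[int], constant: int = 2) -> list[int]:
    """Mimic: if not(@NULL(@OFFSET(N,1))) then @OFFSET(Q,1) + constant else 0 endif."""
    q: list[int] = []
    for i, _n in enumerate(n_values):
        # Assumes N is never null, so "previous N exists" is just "not the first record"
        has_previous = i > 0
        if has_previous:
            q.append(q[i - 1] + constant)  # @OFFSET(Q,1) + constant
        else:
            q.append(0)  # first record: Q0 = 0
    return q


q = derive_q(list(range(1, 101)))
print(q[:5])  # [0, 2, 4, 6, 8]
print(q[-1])  # 198
```

The value carried forward is the previously derived Q, so the result follows Q(n) = Q(n-1) + Constant. This only holds while the records stay in the intended order, so sort them first if needed.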

Related

No range function with a step in Azure Data Factory

I have a Set Variable activity which uses the logic:
@range(int(pipeline().parameters.start),int(pipeline().parameters.end))
It is weird that I can't find anything in the documentation that lets me specify a step, so that I can generate numbers like these:
1,3,5,7,9,...
Is there a workaround, other than introducing a new parameter equal to the step and generating the next number with last = last + step?
It is possible to do this with the Filter activity and the range function. Use range to generate all the numbers, then use mod in the Filter condition to keep only the odd ones, i.e.:
Property    Value
Items       @range(1,10)
Condition   @equals(mod(item(),2),1)
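The Filter approach corresponds to the Python sketch below, which generates every integer and keeps the ones where mod(item(), 2) equals 1. ADF's range(startIndex, count) takes a count, not an end value, so @range(1,10) yields the ten integers 1 to 10. The helper names `adf_range` and `filter_by_step` are hypothetical. The second helper shows the same generate-then-filter idea for an arbitrary step by keeping the numbers where (item - start) mod step is 0.

```python
def adf_range(start_index: int, count: int) -> list[int]:
    """Mimic ADF range(startIndex, count): `count` consecutive integers from start_index."""
    return list(range(start_index, start_index + count))


def filter_by_step(start: int, step: int, count: int) -> list[int]:
    """Generate every integer, then keep those that lie on the step grid."""
    return [item for item in adf_range(start, count) if (item - start) % step == 0]


# Items: @range(1,10), Condition: @equals(mod(item(),2),1)
odd = [item for item in adf_range(1, 10) if item % 2 == 1]
print(odd)                       # [1, 3, 5, 7, 9]
print(filter_by_step(1, 3, 10))  # [1, 4, 7, 10]
```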
The other way to do it would be to use a Lookup activity and query a numbers table.
I agree with you that it's a shame range does not have a step argument, and more generally that the ADF expression language isn't more fully featured.

Error in makeClassifTask - columns to join must specify "on="

I am getting an error from makeClassifTask() in the mlr package.
task = makeClassifTask(data = data[,2:20441], target='Disease')
Running this, I get the following error:
Provided data is not a pure data.frame but from class data.table, hence it will be converted.
Error in `[.data.table`(data, target) :
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
If someone could help me out it'd be great.
Since you did not provide the data, I can only guess. I suggest reading the documentation at https://mlr3book.mlr-org.com/tasks.html.
It looks like you left out the first column of your dataset, which might be your target. If so, makeClassifTask() cannot find the target column.
As @Shreyash Gputa correctly pointed out, converting the data.table object to a data.frame solves the issue:
task = makeClassifTask(data = as.data.frame(data[,2:20441]), target='Disease')
This assumes, of course, that data[,2:20441] contains the target variable Disease.

Scala Nested Iteration within RDD

I have to iterate through all the values of a column to find values that are similar to each other. For example:
ID,FN,LN,Phone
-----------
1,James,Butt,872-232-1212
2,Josephine,Darakjy, 872-232-1213
3,Art,Venere,872-232-1214
4,Lenna,Paprocki,872-232-1215
5,Donette, Foller,872-232-1216
6,Jmes,Butt,666-232-1212
7,Donette, Foller,888-232-1216
8,Josphne,Darkjy, 555-232-1213
Inside the loop, I take the FN value 'James' and check whether a similar name exists anywhere in the data set, using some kind of string distance (e.g. Levenshtein). In this case it matches ID 6, 'Jmes', so I create a bucket by adding a new GUID column, GrupId, like this:
ID,FN,LN,Phone,GrupId
----------------------
1,James,Butt,872-232-1212,G1
2,Josephine,Darakjy, 872-232-1213,G2
3,Art,Venere,872-232-1214,G3
4,Lenna,Paprocki,872-232-1215,G4
5,Donette, Foller,872-232-1216,G5
6,Jmes,Butt,666-232-1212,G1
7,Donette, Foller,888-232-1216,G5
8,Josphne,Darkjy, 555-232-1213,G2
I have to do the same operation on multiple columns as well, such as LN and Phone, and the data set could have 1 million records.
Any thoughts, suggestions or links are appreciated. Thank you!
I would definitely not try anything pairwise. I would rather build a per-field, Levenshtein-style index and accumulate results on the fly, probably starting from something like a suffix tree.
I will try to sketch a prototype as soon as I get to a laptop...
Update: after some reading, I am leaning towards Affinity Clustering[1] combined with pairwise (yes, I know) Levenshtein distances cached in a trie[2]. Code in progress...
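For reference, the grouping described in the question can be sketched naively in Python rather than Scala. The sketch uses a dynamic-programming Levenshtein distance and a greedy pass that puts each value into the first group whose first member is within a distance threshold. The threshold of 2 and the helper names are assumptions, chosen so that the FN column reproduces the GrupId column above. Each value is compared against every existing group, which is exactly the pairwise work the answer warns will not scale to 1 million records.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def assign_groups(values: list[str], max_distance: int = 2) -> list[str]:
    """Greedy bucketing: join the first group whose representative is close enough."""
    representatives: list[str] = []
    labels: list[str] = []
    for value in values:
        for k, rep in enumerate(representatives):
            if levenshtein(value, rep) <= max_distance:
                labels.append(f"G{k + 1}")
                break
        else:  # no close group found: start a new one
            representatives.append(value)
            labels.append(f"G{len(representatives)}")
    return labels


names = ["James", "Josephine", "Art", "Lenna", "Donette", "Jmes", "Donette", "Josphne"]
print(assign_groups(names))  # ['G1', 'G2', 'G3', 'G4', 'G5', 'G1', 'G5', 'G2']
```

A fixed threshold is crude for short strings. With max_distance=2, unrelated three-letter names such as "Art" and "Ann" would share a group, so a length-relative threshold may be more appropriate.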

neo4j Similarity cosine graphaware

How do I write a statement that computes cosine similarity with ga.nlp.ml.similarity.cosine for a News node:
CREATE (n:News)
SET n.text = "Scores of people were already lying dead or injured inside a crowded Orlando nightclub,
and the police had spent hours trying to connect with the gunman and end the situation without further violence.
But when Omar Mateen threatened to set off explosives, the police decided to act, and pushed their way through a
wall to end the bloody standoff.";
What is the proper syntax?
This is the call structure:
CALL ga.nlp.ml.similarity.cosine([<nodes>],depth,Query,Relationship type)
//nodes->The list of annotated nodes for which it will compute the distances
//depth->Integer. If 0, ConceptNet 5 imported data is not used for the distance computation. If greater than 0, concepts are considered during the computation, and the value defines how far the generalization should go.
//Query->String. The query used to compute the tags vector. Some queries are already defined, so this could be null.
//Relationship Type->String. The name to assign to the Relationship created between AnnotatedText nodes.
This is an example:
MATCH (a:AnnotatedText)
WITH collect(a) AS list
CALL ga.nlp.ml.similarity.cosine(list, 0, null, "SIMILARITY") YIELD result
RETURN result
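As background, the cosine similarity between two tag vectors is their dot product divided by the product of their norms. The sketch below is plain Python with made-up tag counts. It illustrates the formula only and is not the GraphAware implementation, which builds its vectors through the Query parameter and may weight them differently.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse tag-count vectors."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical tag counts for two annotated texts
news_1 = Counter({"police": 1, "gunman": 1, "nightclub": 1})
news_2 = Counter({"police": 1, "explosive": 1})
print(round(cosine_similarity(news_1, news_2), 3))  # 0.408
```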

Using variable names from a table in Matlab

I have written a small model in Matlab. The model analyses several supply nodes to meet the demand at a demand node. Each supply node is specified as a vector that gives the available supply at each timestep.
To meet the demand, the supply nodes are checked one after another to see whether they can meet it, and the fluxes from the supply nodes to the demand node are updated accordingly. The analysis currently uses a fixed order, defined by the script code. In pseudocode:
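for timestep = 1:numel(demand)
    remaining_demand = demand(timestep);
    % supply node 1: deliver what it can, up to the demand
    if remaining_demand > supply_1(timestep)
        supply_1_demand(timestep) = supply_1(timestep);
    else
        supply_1_demand(timestep) = remaining_demand;
    end
    remaining_demand = remaining_demand - supply_1_demand(timestep);
    % supply node 2: same check against what is still unmet
    if remaining_demand > supply_2(timestep)
        supply_2_demand(timestep) = supply_2(timestep);
    else
        supply_2_demand(timestep) = remaining_demand;
    end
    remaining_demand = remaining_demand - supply_2_demand(timestep);
    % etcetera, etcetera
end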
However, the order in which the supply nodes are analysed needs to vary. I would like to read this order from a table, where the order of analysis is the order in which the nodes appear in the table. The table could look like this:
1 supply_4
2 supply_1
3 supply_5
# etcetera
Is there a way to read variable names from such a table? Preferably without using eval, which I have heard is very slow, because the model will be extended to many more nodes and fluxes.
Maybe you can use structures with dynamic field names:
varNames={'supp_1','supp_2','supp_3'};
supply.(varNames{1}) = 3; %%% set a variable by name
display(supply.(varNames{1})) %%% get value by name
ans =
3
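Combining that idea with the table-driven order gives the Python sketch below, where a dict keyed by node name plays the role of the struct with dynamic field names. The node names, the numbers and the function `allocate` are illustrative assumptions. Each node in turn supplies min(available supply, remaining demand), which is what the if/else blocks in the question compute.

```python
def allocate(demand: list[float], supplies: dict[str, list[float]],
             order: list[str]) -> dict[str, list[float]]:
    """Serve each timestep's demand from the supply nodes in the given order."""
    flows = {name: [0.0] * len(demand) for name in order}
    for t, d in enumerate(demand):
        remaining = d
        for name in order:  # order comes from the table, not from the code
            take = min(supplies[name][t], remaining)
            flows[name][t] = take
            remaining -= take
    return flows


# Order table as (rank, node name) rows, e.g. "1 supply_4", "2 supply_1", "3 supply_5"
order_table = [(2, "supply_1"), (1, "supply_4"), (3, "supply_5")]
order = [name for _, name in sorted(order_table)]

supplies = {"supply_1": [6, 6], "supply_4": [3, 1], "supply_5": [5, 5]}
print(allocate([10, 4], supplies, order))
# {'supply_4': [3, 1], 'supply_1': [6, 3], 'supply_5': [1, 0]}
```

In Matlab the same loop would read supply.(name) through a dynamic field name instead of indexing a dict, so adding nodes only means adding table rows and struct fields.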