List of cases with specific sub sequence in TraMineR - traminer

I'm trying to see the cases that have a specific sub sequence or sub string.
In the user's guide section 10.5.1-10.5.2 it specifies how to find cases with a specific sub sequence:
mysubseqstr <- character(2)
mysubseqstr[1] <- "(Parent)-(Left)-(Left+Marr)"
mysubseqstr[2] <- "(Parent)-(Left+Marr)"
mysubseq <- seqefsub(bf.seqestate, strsubseq = mysubseqstr)
print(mysubseq)
Subsequence Support Count
1 (Parent)-(Left+Marr) 0.4870 974
2 (Parent)-(Left)-(Left+Marr) 0.2275 455
Computed on 2000 event sequences
Constraint Value
countMethod One by sequence
msubcount <- seqeapplysub(mysubseq, method = "count")
msubcount[1:3, ]
Following another question answered here (Find specific patterns in sequences, I can list the sequences that contain the sub sequence:
rownames(msubcount)[msubcount[,1]==1]
but I can't figure out how to get a list of id's (defined with the id= option in the seqdef function) have this sub sequence.

The ids passed with the id= argument of seqdef are used as row names for the state sequence object. Assuming your state sequence object is seq and that you created the event sequence object bf.seqestate from it (which you do not show in your code!), you get the ids of the sequences containing your sub sequences with:
rownames(seq)[msubcount[,1] > 0]
Notice, that I use here > because with method = "count" in seqeapplysub you get the number of occurrences of the sub sequence in each sequence, which can be greater than 1.

Related

Print the hash elements by grouping their values in Raku

I keep record of how many times a letter occur in a word e.g. 'embeddedss'
my %x := {e => 3, m => 1, b => 1, d => 3, s => 2};
I'd like to print the elements by grouping their values like this:
# e and d 3 times
# m and b 1 times
# s 2 times
How to do it practically i.e. without constructing loops (if there is any)?
Optional Before printing the hash, I'd like to convert and assing it to a temporary data structure such as ( <3 e d>, <1 m b>, <2 s> ) and then print it. What could be the most practical data structure and way to print it?
Using .categorize as suggested in the comments, you can group them together based on the value.
%x.categorize(*.value)
This produces a Hash, with the keys being the value used for categorization, and the values being Pairs from your original Hash ($x). This can be looped over using for or .map. The letters you originally counted are the key of the Pairs, which can be neatly extracted using .map.
for %x.categorize(*.value) {
say "{$_.value.map(*.key).join(' and ')} {$_.key} times";
}
Optionally, you can also sort the List by occurrences using .sort. This will sort the lowest number first, but adding a simple .reverse will make the highest value come first.
for %x.categorize(*.value).sort.reverse {
say "{$_.value.map(*.key).join(' and ')} {$_.key} times";
}

Scala - Not enough arguments for method count

I am fairly new to Scala and Spark RDD programming. The dataset I am working with is a CSV file containing list of movies (one row per movie) and their associated user ratings (comma delimited list of ratings). Each column in the CSV represents a distinct user and what rating he/she gave the movie. Thus, user 1's ratings for each movie are represented in the 2nd column from the left:
Sample Input:
Spiderman,1,2,,3,3
Dr.Sleep, 4,4,,,1
I am getting the following error:
Task4.scala:18: error: not enough arguments for method count: (p: ((Int, Int)) => Boolean)Int.
Unspecified value parameter p.
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count()
when I execute the few lines below. For the program below, the second line of code splits all values delimited by "," and produces this:
( Spiderman, [[1,0],[2,1],[-1,2],[3,3],[3,4]] )
( Dr.Sleep, [[4,0],[4,1],[-1,2],[-1,3],[1,4]] )
On the third line, taking the count() throws an error. For each movie (row), I am trying to get the number of common elements. In the above example, [-1, 2] is clearly a common element shared by both Spiderman and Dr.Sleep.
val textFile = sc.textFile(args(0))
var movieRatings = textFile.map(line => line.split(","))
.map(movingRatingList => (movingRatingList(0), movingRatingList.drop(1)
.map(ranking => if (ranking.isEmpty) -1 else ranking.toInt).zipWithIndex));
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count() )).saveAsTextFile(args(1));
My target output of line 3 is as follows:
( Spiderman, Dr.Sleep, 1 ) --> Between these 2 movies, there is 1 common entry.
Can somebody please advise ?
To get the number of elements in a collection, use length or size. count() returns number of elements which satisfy some additional condition.
Or you could avoid building the complete intersection by using count to count the elements of the first collection which the second contains:
movieRating1._2.count(movieRating2._2.contains(_))
The error message seems pretty clear: count takes one argument, but in your call, you are passing an empty argument list, i.e. zero arguments. You need to pass one argument to count.

unique with "with" operator in systemverilog

I am a new SystemVerilog user and I have faced a strange (from my point of view) behavior of combination of unique method called for fixed array with with operator.
module test();
int arr[12] = '{1,2,1,2,3,4,5,8,9,10,10,8};
int q[$]
initial begin
q = arr.unique() with (item > 5 ? item : 0);
$display("the result is %p",q);
end
I've expected to get queue {8,9,10} but instead I have got {1,8,9,10}.
Why there is a one at the index 0 ?
You are trying to combine the operation of the find method with unique. Unfortunately, it does not work the way you expect. unique returns the element, not the expression in the with clause, which is 0 for elements 1,2,3,4 and 5. The simulator could have chosen any of those elements to represent the unique value for 0(and different simulators do pick different values)
You need to write them separately:
module test();
int arr[$] = '{1,2,1,2,3,4,5,8,9,10,10,8};
int q[$]
initial begin
arr = arr.find() with (item > 5);
q = arr.unique();
$display("the result is %p",q);
end
Update explaining the original results
The with clause generates a list of values to check for uniqueness
'{0,0,0,0,0,0,0,8,9,10,10,8};
^. ^ ^ ^
Assuming the simulator chooses the first occurrence of a replicated value to remain, then it returns {arr[0], arr[7], arr[8], arr[9]} from the original array, which is {1,8,9,10}

q - internal state in while loop

In q, I am trying to call a function f on an incrementing argument id while some condition is not met.
The function f creates a table of random length (between 1 and 5) with a column identifier which is dependent on the input id:
f:{[id] len:(1?1 2 3 4 5)0; ([] identifier:id+til len; c2:len?`a`b`c)}
Starting with id=0, f should be called while (count f[id])>1, i.e. so long until a table of length 1 is produced. The id should be incremented each step.
With the "repeat" adverb I can do the while condition and the starting value:
{(count x)>1} f/0
but how can I keep incrementing the id?
Not entirely sure if this will fix your issue but I was able to get it to work by incrementing id inside the function and returning it with each iteration:
q)g:{[id] len:(1?1 2 3 4 5)0; id[0]:([] identifier:id[1]+til len; c2:len?`a`b`c);#[id;1;1+]}
In this case id is a 2 element list where the first element is the table you are returning (initially ()) and the second item is the id. By amending the exit condition I was able to make it stop whenever the output table had a count of 1:
q)g/[{not 1=count x 0};(();0)]
+`identifier`c2!(,9;,`b)
10
If you just need the table you can run first on the output of the above expression:
q)first g/[{not 1=count x 0};(();0)]
identifier c2
-------------
3 a
The issue with the function f is that when using over and scan the output if each iteration becomes the input for the next. In your case your function is working on a numeric value put on the second pass it would get passed a table.

Map word ngrams to counts in scala

I'm trying to create a map which goes through all the ngrams in a document and counts how often they appear. Ngrams are sets of n consecutive words in a sentence (so in the last sentence, (Ngrams, are) is a 2-gram, (are, sets) is the next 2-gram, and so on). I already have code that creates a document from a file and parses it into sentences. I also have a function to count the ngrams in a sentence, ngramsInSentence, which returns Seq[Ngram].
I'm getting stuck syntactically on how to create my counts map. I am iterating through all the ngrams in the document in the for loop, but don't know how to map the ngrams to the count of how often they occur. I'm fairly new to Scala and the syntax is evading me, although I'm clear conceptually on what I need!
def getNGramCounts(document: Document, n: Int): Counts = {
for (sentence <- document.sentences; ngram <- nGramsInSentence(sentence,n))
//I need code here to map ngram -> count how many times ngram appears in document
}
The type Counts above, as well as Ngram, are defined as:
type Counts = Map[NGram, Double]
type NGram = Seq[String]
Does anyone know the syntax to map the ngrams from the for loop to a count of how often they occur? Please let me know if you'd like more details on the problem.
If I'm correctly interpreting your code, this is a fairly common task.
def getNGramCounts(document: Document, n: Int): Counts = {
val allNGrams: Seq[NGram] = for {
sentence <- document.sentences
ngram <- nGramsInSentence(sentence, n)
} yield ngram
allNgrams.groupBy(identity).mapValues(_.size.toDouble)
}
The allNGrams variable collects a list of all the NGrams appearing in the document.
You should eventually turn to Streams if the document is big and you can't hold the whole sequence in memory.
The following groupBycreates a Map[NGram, List[NGram]] which groups your values by its identity (the argument to the method defines the criteria for "aggregate identification") and groups the corresponding values in a list.
You then only need to map the values (the List[NGram]) to its size to get how many recurring values there were of each NGram.
I took for granted that:
NGram has the expected correct implementation of equals + hashcode
document.sentences returns a Seq[...]. If not you should expect allNGrams to be of the corresponding collection type.
UPDATED based on the comments
I wrongly assumed that the groupBy(_) would shortcut the input value. Use the identity function instead.
I converted the count to a Double
Appreciate the help - I have the correct code now using the suggestions above. The following returns the desired result:
def getNGramCounts(document: Document, n: Int): Counts = {
val allNGrams: Seq[NGram] = (for(sentence <- document.sentences;
ngram <- ngramsInSentence(sentence,n))
yield ngram)
allNGrams.groupBy(l => l).map(t => (t._1, t._2.length.toDouble))
}