I wonder what the underlying algorithm implemented in the function seqefsub is. I looked in the book chapter "Exploratory mining of life event histories" and found this:
Efficient algorithms for extracting frequent subsequences have been proposed in the literature, among which the prominent ones are those of Bettini et al. (1996), Srikant and Agrawal (1996), Mannila et al. (1997) and Zaki (2001). The algorithm implemented in TraMineR is an adaptation of the prefix-tree-based search described in Masseglia (2002).
However, the last reference (Masseglia 2002) is a PhD thesis in French. Is there an English-language reference for this algorithm?
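For what it's worth, here is my rough mental model of a prefix-growth search for frequent subsequences, as a toy Python sketch (my own illustration of the general idea, not necessarily what seqefsub actually does):

from collections import defaultdict

def frequent_subsequences(sequences, min_support):
    # sequences   : list of event lists, e.g. [["A","B","C"], ["A","C"]]
    # min_support : minimum number of sequences that must contain a pattern
    found = {}

    def contains(sub, seq):
        # True if `sub` occurs in `seq` as a (possibly non-contiguous)
        # subsequence; `in` on an iterator consumes it, so matches occur in order.
        it = iter(seq)
        return all(e in it for e in sub)

    def grow(prefix, supporting):
        # Try every one-event extension of `prefix`; each node of the
        # implicit prefix tree only re-scans the sequences that still match.
        support_of = defaultdict(list)
        for seq in supporting:
            for event in set(seq):
                if contains(prefix + [event], seq):
                    support_of[event].append(seq)
        for event, seqs in support_of.items():
            if len(seqs) >= min_support:
                pattern = prefix + [event]
                found[tuple(pattern)] = len(seqs)
                grow(pattern, seqs)

    grow([], sequences)
    return found

print(frequent_subsequences([["A","B","C"], ["A","C"], ["B","C"]], min_support=2))
# e.g. {('A',): 2, ('A','C'): 2, ('B',): 2, ('B','C'): 2, ('C',): 3} (order may vary)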
Thanks!
Victor
My professor's slides compare the "Neural Net Language Model" (Bengio et al., 2003) with Google's word2vec (Mikolov et al., 2013). They say that, unlike Bengio's model, in word2vec "the projection layer is shared (not just the weight matrix)".
What does it mean? Shared across what?
The other differences are that there is no hidden layer in Mikolov's model, and that the context contains words from both the past and the future (while only words from the past are accounted for in Bengio's model).
I understand these latter differences, but I have difficulty understanding the "shared layer" concept.
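Here is how I currently picture the difference, as a toy numpy sketch (my own rendering of the slides' claim, so this may be exactly the part I am getting wrong):

import numpy as np

V, d, n = 10_000, 100, 4          # vocab size, embedding dim, context size
C = np.random.randn(V, d)         # embedding (weight) matrix, shared in BOTH models
context = [12, 5, 77, 803]        # toy word indices for the context window

# Bengio et al. (2003): the projected vectors are CONCATENATED in order,
# so each context position feeds its own block of weights in the next layer.
nnlm_projection = np.concatenate([C[w] for w in context])    # shape (n*d,)

# word2vec CBOW: the projections are AVERAGED into one d-dimensional slot,
# so every position goes through the same projection layer and order is lost.
cbow_projection = np.mean([C[w] for w in context], axis=0)   # shape (d,)

Is this reading of "shared projection layer" correct?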
I have crawled the MTurk website and have a dataset of 260 HITs. A number of users have each selected some of these HITs and assigned a rating to each HIT they selected. Now I want to give these users recommendations based on their selections. How is this possible? Can anyone recommend a recommendation algorithm?
It sounds like you should go for one of the Collaborative Filtering (CF) algorithms, since your users give explicit feedback in the form of ratings. First, I would suggest implementing a simple item- or user-based k-Nearest Neighbours algorithm. If the results do not satisfy you, or your data turns out to be very sparse, matrix factorization techniques should do the trick. A good recent survey is [1]; it compares the different methods across different data settings.
If you feel comfortable with this and realize that what you actually need is a ranked Top-N list rather than rating predictions, I would suggest reading about, e.g., Bayesian Personalized Ranking [2].
And the best part is that these algorithms are well known, and most of them are available for almost every programming language; for Python see e.g. https://github.com/Mendeley/mrec/
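To make the kNN suggestion concrete, here is a minimal user-based sketch in plain numpy (the function name and details are mine for illustration, not mrec's API):

import numpy as np

def predict_rating(R, user, item, k=5):
    # R is an (n_users, n_items) ratings matrix where 0 means "not rated".
    raters = np.where(R[:, item] > 0)[0]              # users who rated this item
    raters = raters[raters != user]
    if raters.size == 0:                              # nobody to learn from:
        return R[user][R[user] > 0].mean()            # fall back to the user's mean

    # cosine similarity between `user` and every user who rated `item`
    u = R[user]
    sims = R[raters] @ u / (np.linalg.norm(R[raters], axis=1) * np.linalg.norm(u) + 1e-9)

    order = np.argsort(sims)[::-1][:k]                # k nearest neighbours
    top, w = raters[order], sims[order]
    return float(w @ R[top, item] / (np.abs(w).sum() + 1e-9))

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]], dtype=float)
print(predict_rating(R, user=0, item=2))   # low (~1.7): the most similar user disliked item 2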
[1] J. Lee, M. Sun, and G. Lebanon, "A Comparative Study of Collaborative Filtering Algorithms," arXiv, 2012.
[2] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), 2009, pp. 452–461.
I am trying to simplify a Boolean expression with exactly 39 inputs and roughly 500 to 800 million terms (as in that many and/not/or operations).
A perfect simplification is not needed, but a good one would be nice.
I am aware of K-maps, Quine–McCluskey, and the Espresso algorithms. However, based on what I have read, these methods would take far too long to simplify a circuit of this size.
I need to simplify this expression as much as possible within a 24-hour period.
After searching Google, I find it difficult to find any resources for simplifying an expression of quite this magnitude! Are there any resources or libraries out there that can at least simplify it to some extent within that 24-hour window?
A greedy heuristic called Simplify is described in the somewhat dated book
Robert K. Brayton, Gary D. Hachtel, C. McMullen, Alberto Sangiovanni-Vincentelli
Logic Minimization Algorithms for VLSI Synthesis
You can find the chapter online.
Simplify is based on the unate paradigm. In divide-and-conquer style, it recursively applies Shannon's expansion theorem to split the function into smaller sub-functions. The heuristic rule is to split on the most binate variable first, i.e. the variable that appears both complemented and uncomplemented in the largest number of terms.
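To illustrate the skeleton of that recursion, here is a toy Python sketch (my own simplification of the idea; the real Simplify procedure does far more work when recombining the cofactors):

from collections import Counter

def simplify(cubes):
    # Sum-of-products form: each cube is a dict {var: 0 or 1};
    # a variable that is absent from a cube is a don't-care.
    if len(cubes) <= 1:
        return cubes

    # Most binate variable: occurs both complemented and uncomplemented,
    # in the largest total number of cubes.
    pos, neg = Counter(), Counter()
    for cube in cubes:
        for var, val in cube.items():
            (pos if val else neg)[var] += 1
    binate = [v for v in pos if v in neg]
    if not binate:
        return cubes                       # function is unate: stop splitting
    x = max(binate, key=lambda v: pos[v] + neg[v])

    # Shannon expansion: f = x * f|x=1  +  x' * f|x=0
    fx  = [{v: b for v, b in c.items() if v != x} for c in cubes if c.get(x, 1) == 1]
    fnx = [{v: b for v, b in c.items() if v != x} for c in cubes if c.get(x, 0) == 0]
    sfx, sfnx = simplify(fx), simplify(fnx)

    # Cubes appearing in both simplified cofactors do not depend on x at all.
    common = [c for c in sfx if c in sfnx]
    merged = (common
              + [{**c, x: 1} for c in sfx if c not in common]
              + [{**c, x: 0} for c in sfnx if c not in common])
    return merged if len(merged) <= len(cubes) else cubes

# ax + ax' + a'bc: the sketch merges the first two cubes into a
print(simplify([{"a": 1, "x": 1}, {"a": 1, "x": 0}, {"a": 0, "b": 1, "c": 1}]))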
A second approach could be to use graph partitioning tools like METIS to split the terms into independent (or at least loosely related) subsets. But I am not aware that this has been tried successfully for logic synthesis tasks. My favorite search engine is sceptical and does not return any hits.
A more recent algorithm based on Binary Decision Diagrams was published in
Olivier Coudert: Doing Two-Level Logic Minimization 100 Times Faster
The paper lists examples with very high numbers of terms, similar to your task at hand.
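If you just want something runnable to benchmark before diving into the literature, the Python package pyeda ships an Espresso binding. A minimal usage sketch (based on my reading of pyeda's documented interface; please double-check against the current docs):

# pip install pyeda
from pyeda.inter import exprvars, Or, And, espresso_exprs

X = exprvars("x", 4)                      # Boolean variables x[0] .. x[3]
f = Or(And(X[0], ~X[1]), And(X[0], X[1]), And(~X[0], X[2], X[3]))

fmin, = espresso_exprs(f.to_dnf())        # heuristic two-level minimization
print(fmin)                               # equivalent to x[0] + x[2]*x[3]

Whether any two-level minimizer copes with hundreds of millions of terms in 24 hours is another matter; you may need to partition the problem first.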
A somewhat related simplification technique is BDD sweeping, as described in A Study of Sweeping Algorithms in the Context of Model Checking.
This is a duplicate question. See https://stackoverflow.com/a/60535990/1531728 for resources about logic optimization, or the simplification of Boolean expressions.
I am reading about the Scala collections architecture. The term RNA is mentioned there. Can somebody explain what it is?
From Wikipedia:
Ribonucleic acid (RNA) is a polymeric molecule implicated in various biological roles in coding, decoding, regulation, and expression of genes. RNA and DNA are nucleic acids, and, along with proteins and carbohydrates, constitute the three major macromolecules essential for all known forms of life. [...] Cellular organisms use messenger RNA (mRNA) to convey genetic information (using the letters G, U, A, and C to denote the nitrogenous bases guanine, uracil, adenine, and cytosine) that directs synthesis of specific proteins.
Of course this has nothing to do with Scala or programming as such - it is merely used as a real-life example in the tutorial you're reading.
It's an example problem domain used in that article. From there:
In the next few sections you’ll be walked through two examples that do this, namely sequences of RNA bases and prefix maps implemented with Patricia tries.
If you want to learn more about RNA, check out its Wikipedia page.
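For a feel of why RNA makes a nice collections example: the tutorial builds an RNA sequence type that packs each base into two bits of an integer array. A toy Python analogue of the packing idea (the tutorial's real version is in Scala and integrated with the collections framework):

BASES = "AUGC"                                 # the four RNA bases
CODE = {b: i for i, b in enumerate(BASES)}     # A->0, U->1, G->2, C->3

def pack(seq):
    # Encode an RNA string into one integer, two bits per base.
    bits = 0
    for i, base in enumerate(seq):
        bits |= CODE[base] << (2 * i)
    return bits, len(seq)

def unpack(bits, length):
    # Decode the packed integer back into the RNA string.
    return "".join(BASES[(bits >> (2 * i)) & 0b11] for i in range(length))

bits, n = pack("AUGGC")
print(unpack(bits, n))   # AUGGC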
Discrete mathematics (also finite mathematics) deals with topics such as logic, set theory, information theory, partially ordered sets, proofs, relations, and a number of other topics.
For other branches of mathematics, there are tools that support programming. For statistics, there are R and S, which have many useful statistical functions built in. For numerical analysis, Octave can be used as a language or integrated into C++.
I don't know of any languages or packages that deal specifically with discrete mathematics. Just about every language can be used to implement discrete mathematics algorithms, but I would expect there to be libraries or environments out there designed specifically for these applications.
The current version of Mathematica is 7. License costs:
Home Edition: $295
Standard: $2,495 for Win/Mac/Linux PC ($3,120 for Solaris)
Government: $1,996 ($2,496 for Solaris)
Educational: $1,095 ($1,370 for Solaris)
Student: $139.95 (no Solaris)
Above, the Home Edition link says:
Mathematica Home Edition is a fully functional version of Mathematica Professional with the same features.
The current version of Maple is 12. License costs:
Student: $99
Commercial: $1,895
Academic: $995
Government: $1,795
And yes, check out Sage, mentioned above by Thomas Owens.
Mathematica
Mathematica has a Combinatorica package which, though quite venerable at this point, provides a good deal of support for combinatorics and graphs. Commands like these are available:
NecklacePolynomial[8, m, Cyclic]  (* counting polynomial for necklaces of 8 beads in m colors *)
GrayCodeSubsets[{1, 2, 3, 4}]     (* subsets ordered so that neighbors differ by one element *)
IntegerPartitions[6]              (* all partitions of the integer 6 *)
I'd say Mathematica is your best bet. Even if it does not come with some functionality out of the box, it has very well designed supplementary packages available on the net.
Check out http://www.wolfram.com/products/mathematica/analysis/
You might be interested in the links for Number Theory and Graph Visualizations.
I also found Sage. It appears to be the closest thing to Mathematica that's open source, but I'm not sure how well it handles discrete mathematics.
Maple and MATLAB are a couple of mathematical software packages that may cover part of what you want.
Stanford GraphBase, written primarily by Donald Knuth, is a great package for combinatorial computing. I wouldn't call it an extensive code base, but it has great support for graphs, and a great deal of discrete mathematics can be formulated in terms of graph theory. It's written in CWEB, which is (IMO) a more readable version of C.
EDIT: It's free.
I love Mathematica and used it to prototype ideas during my PhD in computational physics. However, Mathematica tries to be all things to all people and there are a few downsides:
Since Wolfram is a for-profit company, bug fixes sometimes only arrive in the next major release: you pay.
Since it is a proprietary product, sharing code with non-Mathematica users (the world) is problematic.
New features are often half-baked and break when you try to take them beyond the embedded example.
Its user base (tutorials, advice, external libraries) is less active than, say, Python's.
Multipanel figures are difficult to generate; see the SciDraw library.
That being said, Mathematica's core functionality is amazing for the following reasons:
Its default math functionality is quite robust, allowing quick solutions.
It allows both functional and procedural programming.
One can quickly code and publish in a variety of formats: PDF, interactive websites.
A new Discrete Book came out.
Bottom line
Apple users expecting ease of use will like Mathematica for its Apple-like, get-up-and-go feel.
Linux users wanting extensibility will find Mathematica frustrating for its Apple-like, box-welded-shut design.