Can RNA-assisted protein folding be simulated computationally?

I am looking for tools/software to perform protein folding in the presence of its RNA partner. Part of my protein is disordered, and I suspect that it takes up structure while interacting with its RNA partner. So far I have not found any literature where computational tools such as MD have been applied to study this kind of folding.
If anyone has come across such literature or has performed such studies, could you kindly advise me on how to proceed?
Thanking You,
Dolly
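As a hedged starting point: a standard explicit-solvent MD setup of a protein-RNA complex in OpenMM might look like the sketch below. The input PDB name is a placeholder, and the Amber ff14 force-field XML files shipped with OpenMM cover both protein and RNA residues; this is only an illustration of the mechanical setup, not an established protocol for coupled folding and binding.

from openmm.app import PDBFile, Modeller, ForceField, Simulation, PME, HBonds, DCDReporter
from openmm import LangevinMiddleIntegrator
from openmm.unit import kelvin, picosecond, picoseconds, nanometer

# Placeholder input: a PDB containing both the protein and its RNA partner.
pdb = PDBFile("protein_rna_complex.pdb")
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")  # protein + RNA + water

modeller = Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, padding=1.0 * nanometer)  # solvate the complex

system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME,
                                 nonbondedCutoff=1.0 * nanometer, constraints=HBonds)
integrator = LangevinMiddleIntegrator(300 * kelvin, 1 / picosecond, 0.002 * picoseconds)
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)

simulation.minimizeEnergy()
simulation.reporters.append(DCDReporter("trajectory.dcd", 5000))
simulation.step(500_000)  # 1 ns of sampling; disorder-to-order transitions need far longer

Note that observing a disordered region fold upon binding would typically require microsecond-scale trajectories or enhanced-sampling methods on top of plain MD like this.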


How to preprocess text for embedding?

In the traditional "one-hot" representation of words as vectors, you have a vector of the same dimension as the cardinality of your vocabulary. To reduce dimensionality, stopwords are usually removed, and stemming, lemmatization, etc. are applied to normalize the features you want to perform some NLP task on.
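As a toy illustration of that dimensionality (not from the question), with a three-word vocabulary each word becomes a three-dimensional vector:

# Toy one-hot encoding: vector dimension equals vocabulary size.
vocab = ["cats", "chase", "mice"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)  # one dimension per vocabulary word
    vec[index[word]] = 1
    return vec

print(one_hot("chase"))  # [0, 1, 0]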
I'm having trouble understanding whether/how to preprocess text to be embedded (e.g. with word2vec). My goal is to use these word embeddings as features for a NN that classifies texts into topic A / not topic A, and then to perform event extraction on the documents of topic A (using a second NN).
My first instinct is to preprocess by removing stopwords, lemmatizing, stemming, etc. But as I learn a bit more about NNs, I realize that, applied to natural language, the CBOW and skip-gram models would in fact require the whole set of words to be present: to be able to predict a word from its context, one would need to know the actual context, not a reduced form of the context after normalizing... right? The actual sequence of POS tags seems to be key for a human-feeling prediction of words.
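As a concrete point of reference, gensim's word2vec is trained directly on tokenized sentences, so whatever normalization you apply changes every context window the model learns from. A minimal sketch with a made-up two-sentence corpus, using the gensim 4.x API:

from gensim.models import Word2Vec

# CBOW/skip-gram predict words from surrounding words, so removing
# stopwords would alter every training context below.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv.most_similar("cat", topn=2))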
I've found some guidance online but I'm still curious to know what the community here thinks:
Are there any recent, commonly accepted best practices regarding punctuation, stemming, lemmatizing, stopwords, numbers, lowercasing, etc.?
If so, what are they? Is it better in general to process as little as possible, or more on the heavier side to normalize the text? Is there a trade-off?
My thoughts:
It is better to remove punctuation (but e.g. in Spanish don't remove the accents, because they do convey contextual information), convert written-out numbers to numerals, and not lowercase everything (useful for entity extraction); no stemming, no lemmatizing.
Does this sound right?
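A minimal Python sketch of that proposed pipeline (the regex and the tiny number-word table are my own illustrative choices): punctuation is stripped, accents and case are kept, written numbers become numerals, and nothing is stemmed or lemmatized.

import re

NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}  # extend as needed

def preprocess(text):
    # Keep letters (including accented ones), digits, and whitespace;
    # drop punctuation; preserve case; no stemming or lemmatization.
    text = re.sub(r"[^\w\s]", " ", text)
    return [NUMBER_WORDS.get(tok.lower(), tok) for tok in text.split()]

print(preprocess("María bought two Láptops!"))  # ['María', 'bought', '2', 'Láptops']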
I've been working on this problem myself for some time. I totally agree with the other answers that it really depends on your problem, and you must match your input to the output that you expect.
I found that for certain tasks like sentiment analysis it's OK to remove lots of nuance by preprocessing, but for others, e.g. text generation, it is quite essential to keep everything.
I'm currently working on generating Latin text and therefore I need to keep quite a lot of structure in the data.
I found a very interesting paper doing some analysis on that topic, but it covers only a small area. However, it might give you some more hints:
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
by Jose Camacho-Collados and Mohammad Taher Pilehvar
https://arxiv.org/pdf/1707.01780.pdf
Here is a quote from their conclusion:
"Our evaluation highlights the importance of being consistent in the preprocessing strategy employed across training and evaluation data. In general a simple tokenized corpus works equally or better than more complex preprocessing techniques such as lemmatization or multiword grouping, except for a dataset corresponding to a specialized domain, like health, in which sole tokenization performs poorly. Addi- tionally, word embeddings trained on multiword- grouped corpora perform surprisingly well when applied to simple tokenized datasets."
So many questions. The answer to all of them is probably "it depends". You need to consider the classes you are trying to predict and the kind of documents you have. Trying to predict authorship (where you definitely need to keep all kinds of punctuation and case so that stylometry will work) is not the same as sentiment analysis (where you can get rid of almost everything but have to pay special attention to things like negations).
I would say apply the same preprocessing to both ends. The surface forms are your link so you can't normalise in different ways. I do agree with the point Joseph Valls makes, but my impression is that most embeddings are trained in a generic rather than a specific manner. What I mean is that the Google News embeddings perform quite well on various different tasks and I don't think they had some fancy preprocessing. Getting enough data tends to be more important. All that being said -- it still depends :-)
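For example, loading those pretrained Google News vectors with gensim looks like this (the file name is as distributed by Google; note that lookups use raw surface forms, which is exactly the linking issue mentioned above):

from gensim.models import KeyedVectors

# Trained on minimally preprocessed news text, so "Paris" != "paris".
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)
print(vectors.most_similar("Paris", topn=3))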

What is RNA in Scala?

I am reading about the Scala collections architecture. The term RNA is mentioned there. Can somebody explain what it is?
From Wikipedia:
Ribonucleic acid (RNA) is a polymeric molecule implicated in various biological roles in coding, decoding, regulation, and expression of genes. RNA and DNA are nucleic acids, and, along with proteins and carbohydrates, constitute the three major macromolecules essential for all known forms of life. [...] Cellular organisms use messenger RNA (mRNA) to convey genetic information (using the letters G, U, A, and C to denote the nitrogenous bases guanine, uracil, adenine, and cytosine) that directs synthesis of specific proteins.
Of course this has nothing to do with Scala or programming as such - it is merely used as a real-life example in the tutorial you're reading.
It's an example problem domain used in that article. From there:
In the next few sections you’ll be walked through two examples that do this, namely sequences of RNA bases and prefix maps implemented with Patricia tries.
If you want to learn more about RNA, check out its Wikipedia page.

simple speech recognition methods

Yes, I'm aware that speech recognition is fairly complicated (as an understatement). What I'm looking for is a method for distinguishing between maybe 20-30 phrases. An ability to split words (discrete speech is fine) would be nice, but isn't required. The software will be user-dependent (i.e. for use by me). I'm not looking for existing software, but for a good way of going about doing this myself. I've looked into various existing methods and it seems like splitting the sound into phonemes, while common, is somewhat excessive for my needs.
For some context, I'm just looking for a way to control some aspects of my computer with a few simple voice commands. I'm aware that Windows already has speech recognition software, but I'd like to go about this myself as a learning exercise. Commands would be simple, like "Open Google" or "Mute". What I had in mind (not sure if this is a good idea) is that some commands would be compound. So "Mute" would just be "Mute", whereas the "Open" command could be recognized individually and then have its suffixes (Google, Photoshop, etc.) recognized with another network/model/whatever. But I'm not sure if looking for prefixes/word breaks in this way would produce better results than having to deal with an increased number of individual commands.
I've been looking into perceptrons, Hopfield networks (though they're somewhat obsolete, from what I understand) and HMMs, and while I understand the ideas behind these (I've implemented ANNs before), I don't really know which is best suited to this task. I'm assuming that learning vector quantization models would also be appropriate, but I can't really find much literature to this end. Any guidance/resources would be greatly appreciated.
There are some open-source projects in speech recognition:
HTK (Hidden Markov Models Toolkit)
Sphinx
Both have decoder, training, and language-model toolkits: everything needed to build a complete and robust speech recognizer.
VoxForge has acoustic and language models for both open-source speech recognition toolkits.
Some time ago, I read a whitepaper about a limited-vocabulary system, which used a simple recognition process. The system divided each utterance into a small number of bins (6 in time and 4 in magnitude, if I remember correctly, for 24 total), and all it did was count the number of sample audio measurements in each bin. A fuzzy-logic rule base then interpreted each utterance's 24 bin counts and generated an interpretation.
I imagine that (for some applications) a simple matching process might work just as well, in which the 24 bin counts of the current utterance are simply matched against those of each of your stored prototypes, and the one with the least overall difference is the winner.
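To make that concrete, here is a rough Python sketch of the bin-count matching idea (the 6x4 grid and the prototype format are my reading of the description above, not from the whitepaper):

import numpy as np

def bin_counts(samples, time_bins=6, mag_bins=4):
    # Divide the utterance into a time x magnitude grid and count
    # how many audio samples land in each of the 24 bins.
    samples = np.asarray(samples, dtype=float)
    t = np.linspace(0, 1, len(samples), endpoint=False)
    t_idx = np.minimum((t * time_bins).astype(int), time_bins - 1)
    m = (samples - samples.min()) / (samples.max() - samples.min() + 1e-12)
    m_idx = np.minimum((m * mag_bins).astype(int), mag_bins - 1)
    counts = np.zeros((time_bins, mag_bins))
    np.add.at(counts, (t_idx, m_idx), 1)
    return counts.ravel()

def classify(utterance, prototypes):
    # Pick the stored prototype with the smallest overall difference.
    c = bin_counts(utterance)
    return min(prototypes, key=lambda name: np.abs(c - prototypes[name]).sum())

Here prototypes would map each command name ("Mute", "Open", ...) to the bin counts of a recorded template utterance.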

Is there a tool that supports discrete mathematics?

Discrete mathematics (also finite mathematics) deals with topics such as logic, set theory, information theory, partially ordered sets, proofs, relations, and a number of other topics.
For other branches of mathematics, there are tools that support programming. For statistics, there are R and S, which have many useful statistical functions built in. For numerical analysis, Octave can be used as a language or integrated into C++.
I don't know of any languages or packages that deal specifically with discrete mathematics. Just about every language can be used to implement its algorithms, but there should be libraries or environments out there designed specifically for these applications.
The current version of Mathematica is 7. License costs:
Home Edition: $295
Standard: $2,495 Win/Mac/Linux PC ($3,120 for Solaris)
Government: $1,996 ($2,496 for Solaris)
Educational: $1,095 ($1,370 for Solaris)
Student: $139.95 (no Solaris)
Above, the Home Edition link says:
Mathematica Home Edition is a fully functional version of Mathematica Professional with the same features.
The current version of Maple is 12. License costs:
Student: $99
Commercial: $1,895
Academic: $995
Government: $1,795
And yes, check out Sage, mentioned above by Thomas Owens.
Mathematica
Mathematica has a Combinatorica package which, though quite venerable at this point, provides a good deal of support for combinatorics and graphs. Commands like these are available:
Needs["Combinatorica`"]  (* load the package before using its functions *)
NecklacePolynomial[8, m, Cyclic]
GrayCodeSubsets[{1, 2, 3, 4}]
IntegerPartitions[6]
I'd say Mathematica is your best bet. Even if it does not come with some functionality out of the box, it has very well-designed supplementary packages available for it on the net.
Check out http://www.wolfram.com/products/mathematica/analysis/
You might be interested in the links for Number Theory and Graph Visualizations.
I also found Sage. It appears to be the closest thing to Mathematica that's open source, but I'm not sure how well it handles discrete mathematics.
Maple and MATLAB are a couple of mathematical software packages that may cover part of what you want.
Stanford GraphBase, written primarily by Donald Knuth, is a great package for combinatorial computing. I wouldn't call it an extensive code base, but it has great support for graphs, and a great deal of discrete mathematics can be formulated in terms of graph theory. It's written in CWEB, which is (IMO) a more readable version of C.
EDIT: It's free.
I love Mathematica and used it to prototype ideas during my PhD in computational physics. However, Mathematica tries to be all things to all people and there are a few downsides:
Because Wolfram is a for-profit company, bug fixes sometimes only arrive in the next major release: you pay.
Because it is a proprietary product, sharing code with non-Mathematica people (the world) is problematic.
New features are often half-baked and break when you try to take it beyond the embedded example.
Its user base (tutorials, advice, external libraries) is less active than, say, Python's.
Multipanel figures are difficult to generate; see the SciDraw library.
That being said, Mathematica's core functionality is amazing for the following reasons:
Its default math functionality is quite robust, allowing quick solutions.
It allows both functional and procedural programming.
One can quickly code and publish in a variety of formats: PDF, interactive website.
A new Discrete Book came out.
Bottom line
Apple users expecting ease of use will like Mathematica for its Apple-like, get-up-and-go feel.
Linux users wanting extensibility will find Mathematica frustrating for having its Apple-like, box-welded-shut design.

How do I estimate tasks using function points?

What are the steps to estimating using function points?
Is there a quick-reference guide of some sort out there?
I took a conference session on Function Point Analysis a few years back. There is a lot to it. You can check out the Free Function Point Training Manual online, the Fundamentals of Function Points, or I suspect you can get a book on it at a computer store.
You might also check out the International Function Point Users Group and see if they have some resources or a local meeting for you.
You really need to get some training on it. Check with IFPUG. You will unknowingly pick up some destructive bad habits if self-taught. It also helps to have an experienced FP analyst review some of your early attempts.
It's the kind of thing that appears overwhelmingly complex until you "get it" and then it's fairly quick to do. It improved my requirements analysis a lot too. I often spot contradictions and gaps when doing a count.
It isn't limited to BDUF Waterfall projects either. I spent three years using FP and Planning Poker as cross-checks on one another while contracting on agile-methods projects.
I was IFPUG-certified from 2002-2005 and am still using FP analysis. I've seen it misused a lot, and I think that's why it has such a bad reputation.
I recommend you take a look at COSMIC Function points. https://cosmic-sizing.org. COSMIC Function points are also an ISO standard for measuring software size. They are an evolved improvement over IFPUG.
You can quickly estimate size by counting the entries, exits, reads and writes.
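As a toy illustration of that counting rule (my own sketch; the process names and counts are made up), one COSMIC Function Point is simply one data movement:

# COSMIC sizing sketch: 1 CFP per data movement (Entry, Exit, Read, Write)
# in each functional process; total size is just the sum.

def cosmic_size(processes):
    return sum(sum(movements) for movements in processes.values())

processes = {
    "create order": (1, 1, 2, 1),  # 1 Entry, 1 Exit, 2 Reads, 1 Write
    "list orders":  (1, 1, 1, 0),
}
print(cosmic_size(processes))  # 8 CFP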
Compared with the IFPUG manual, learning COSMIC is much easier, the free book below is all you need, and you can read it in a day.
Recommended reading: https://cosmic-sizing.org/publications/measurement-guide/