BioJava secondary structure prediction

Is there any method in BioJava to predict secondary structure from a given sequence?
Or, if not, does anyone know how I could implement it? Is there any source code or executable you can recommend?

I think it is not available in BioJava, but what you can do is run a BLAST search using the BioJava libraries (http://biojava.org/wiki/BioJava:CookBook3:NCBIQBlastService) or the RCSB web page, and from your BLAST results find a protein with a known 3D structure. Afterwards you can use the suggestDomains(Structure s) method of the LocalProteinDomainParser class in the org.biojava.bio.structure.domain package. Domains might give you some ideas about secondary structure.
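A minimal sketch of that call, assuming BioJava 3 package names (newer releases moved these classes under org.biojava.nbio, so adjust the imports to your version; the PDB id 4HHB is just a placeholder for whatever structure your BLAST search turns up):

```java
import java.util.List;

import org.biojava.bio.structure.Structure;
import org.biojava.bio.structure.StructureIO;
import org.biojava.bio.structure.domain.LocalProteinDomainParser;
import org.biojava.bio.structure.domain.pdp.Domain;

public class DomainSuggestion {
    public static void main(String[] args) throws Exception {
        // Fetch the structure of your BLAST hit from the PDB (4HHB is a placeholder id)
        Structure structure = StructureIO.getStructure("4HHB");

        // Ask the parser for likely domain boundaries in the structure
        List<Domain> domains = LocalProteinDomainParser.suggestDomains(structure);

        for (Domain domain : domains) {
            System.out.println(domain);
        }
    }
}
```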

I don't think there is secondary structure prediction from sequence in BioJava. However, there is secondary structure assignment given the structure, based on the DSSP algorithm; see https://github.com/biojava/biojava-tutorial/blob/master/structure/secstruc.md
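A minimal sketch of that assignment, following the linked tutorial (class names from the org.biojava.nbio.structure.secstruc package of newer BioJava releases; check the tutorial for the exact API of your version, and note the input is a 3D structure, not a sequence):

```java
import java.util.List;

import org.biojava.nbio.structure.Structure;
import org.biojava.nbio.structure.StructureIO;
import org.biojava.nbio.structure.secstruc.SecStrucCalc;
import org.biojava.nbio.structure.secstruc.SecStrucState;

public class SecStrucAssignment {
    public static void main(String[] args) throws Exception {
        // Load a structure by PDB id; DSSP-style assignment needs backbone coordinates
        Structure structure = StructureIO.getStructure("5PTI");

        // Run the calculation; 'true' also attaches the assigned states to the structure
        SecStrucCalc calc = new SecStrucCalc();
        List<SecStrucState> states = calc.calculate(structure, true);

        for (SecStrucState state : states) {
            System.out.println(state);
        }
    }
}
```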

Embedding version control in Weaviate

In Weaviate, the vector search engine, I wonder how one can handle versioning of the embedding model.
For instance, considering a (trained) word2vec model, vectors embedded by different model versions must be kept separate.
One option I can think of is to create distinct classes, one per model version.
A custom script may be useful here: when a new model becomes available, create a new class and import the data embedded with it. After that, point the (GET) entrypoints (used for searching nearest vectors) at the new class.
Or maybe Weaviate has another, fancier way to handle this issue, but I couldn't find it.
As of version 1.17.3, you have to manage this yourself, because Weaviate only supports one embedding per object.
There is a feature request to allow multiple embeddings per object here. But it sounds like your request is closer to this one. In any case, have a look at them and upvote the one that addresses your need so the engineering team can prioritize accordingly. Also, feel free to raise a new feature request if neither of these addresses your needs.
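Until then, the class-per-version workaround sketched in the question works. A minimal sketch of creating a versioned class through Weaviate's REST schema endpoint (shown in Java with the JDK's built-in HttpClient; the class name ArticleW2vV2 and the localhost URL are assumptions, and the vectorizer is set to "none" because the vectors come from your own word2vec model):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateVersionedClass {
    public static void main(String[] args) throws Exception {
        // Hypothetical class whose name encodes the embedding model version
        String schemaJson = """
                {
                  "class": "ArticleW2vV2",
                  "description": "Articles embedded with word2vec model v2",
                  "vectorizer": "none"
                }
                """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v1/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(schemaJson))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Your migration script would then re-import each object into the new class with its freshly computed vector, and your search layer would switch its queries over to ArticleW2vV2 once the import completes.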

What is currently the best way to add a custom dictionary to a neural machine translator that uses the transformer architecture?

It's common to add a custom dictionary to a machine translator to ensure that terminology from a specific domain is translated correctly. For example, the term "server" should be translated differently when the document is about data centers than when it is about restaurants.
With a transformer model, this is not very obvious to do, since words are not aligned 1:1. I've seen a couple of papers on this topic, but I'm not sure which would be the best one to use. What are the best practices for this problem?
I am afraid you cannot easily do that. You cannot easily add new words to the vocabulary, because you don't know what embeddings they would get during training. You can try to remove some words, or alternatively you can manually change the bias in the final softmax layer to prevent some words from appearing in the translation. Anything else would be pretty difficult to do.
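The bias trick above usually amounts to masking the logits of the unwanted tokens before the softmax. A minimal sketch (in Java for illustration; the banned token ids are hypothetical and would come from your vocabulary):

```java
/** Prevent specific vocabulary items from being generated by zeroing their softmax probability. */
static void banTokens(float[] logits, int[] bannedTokenIds) {
    for (int id : bannedTokenIds) {
        // exp(-infinity) = 0, so these tokens get probability 0 after the softmax
        logits[id] = Float.NEGATIVE_INFINITY;
    }
}
```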
What you want to do is called domain adaptation. To get an idea of how domain adaptation is usually done, you can have a look at a survey paper.
The most commonly used approaches are probably model fine-tuning or ensembling with a language model. If you have parallel data in your domain, you can try to fine-tune your model on it (with plain SGD and a small learning rate).
If you only have monolingual data in the target language, you can train a language model on that data. During decoding, you can then mix the probabilities from the domain-specific language model and the translation model. Unfortunately, I don't know of any tool that does this out of the box.
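This mixing step is often called "shallow fusion": at each decoding step, the translation model's log-probabilities are combined with the language model's, weighted by an interpolation factor tuned on held-out data. A minimal greedy-decoding sketch (in Java for illustration; real systems apply the same scoring inside beam search):

```java
/**
 * Shallow fusion: score each candidate token by combining translation-model (TM)
 * and domain language-model (LM) log-probabilities, then pick the best one.
 */
static int pickNextToken(double[] tmLogProbs, double[] lmLogProbs, double lambda) {
    int best = -1;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int tok = 0; tok < tmLogProbs.length; tok++) {
        // lambda is the interpolation weight, tuned on a development set
        double score = tmLogProbs[tok] + lambda * lmLogProbs[tok];
        if (score > bestScore) {
            bestScore = score;
            best = tok;
        }
    }
    return best;
}
```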

Deploy Knowledge Studio dictionary pre-annotator to Natural Language Understanding

I'm getting started with Knowledge Studio and Natural Language Understanding.
I'm able to deploy a machine-learning model to Natural Language Understanding and use the API to query it.
I would like to know if there's a way to deploy only the pre-annotator.
I read from Knowledge Studio's documentation that
You can deploy or export a machine-learning annotator. A dictionary pre-annotator can only be used to pre-annotate documents within Watson Knowledge Studio.
Does exist a workaround to create a model that simply does the job of the pre-annotator, i.e. use dictionaries to find entities instead of the machine-learning model?
Does exist a workaround to create a model that simply does the job of the pre-annotator, i.e. use dictionaries to find entities instead of the machine-learning model?
You may need to explain better what you need.
WKS allows you to pre-annotate documents with dictionaries you upload. Once you have created an ML model, you can alternatively use that to annotate your training documents and then manually correct the results. As you continue, the amount of manual work decreases with each model iteration.
The assumption is that you are creating a model with a reasonable number of examples. In your model results, you will want the mentions/relations to be outside, or close to outside, the gray area of the report.
The other interpretation of your request is that you want to create a dictionary-based model only. This is possible using the "Rule-Based Model" functionality. You have to create the parsing rules, but you simply map what you want to find to the dictionary/rule.
Using this in production, though, is still limited; you should get a warning when you deploy these kinds of models.
It's slightly better than a plain keyword search, as you can map items to parts of speech.
One last point: the purpose of WKS is to create a machine-learning model that can discover new terms you haven't seen before. The rule-based engine can only find what you explicitly tell it to find.
If all you want is dictionary entries, then you can build a very simple string-comparison solution (see the sketch below), but you lose the linguistic features.
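To make that last option concrete, here is a minimal sketch of a plain string-comparison dictionary annotator (in Java; the dictionary contents and entity types are made up). Note what it lacks compared with a WKS model: no lemmatization, no part-of-speech mapping, and no ability to find unseen terms:

```java
import java.util.Map;

public class DictionaryAnnotator {
    // Hypothetical dictionary: surface form -> entity type
    private static final Map<String, String> DICTIONARY = Map.of(
            "acme corp", "ORGANIZATION",
            "new york", "LOCATION");

    /** Naive case-insensitive substring matching; no word boundaries or morphology. */
    public static void annotate(String text) {
        String lower = text.toLowerCase();
        for (Map.Entry<String, String> entry : DICTIONARY.entrySet()) {
            int idx = lower.indexOf(entry.getKey());
            while (idx >= 0) {
                System.out.printf("%s [%d..%d] -> %s%n",
                        entry.getKey(), idx, idx + entry.getKey().length(), entry.getValue());
                idx = lower.indexOf(entry.getKey(), idx + 1);
            }
        }
    }
}
```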

Group multiple simulink Bus Objects into structures

Short version
I am considering using BusObjects to implement strict interface control in a (large industrial) application built in Simulink, and I would like to store the BusObjects (hundreds of them) in a MATLAB structure so that the entire application interface specification is well organized. However, it seems that BusObjects can't be contained in structures, nor can they reside in any workspace other than the MATLAB base workspace. Any idea on how to handle this?
Long version
I would like the interface specification to be hierarchical and centralized in some way. I mean, I would like to specify the external interface of my application, then the internal interfaces, then the internal interfaces of the internal interfaces, and so on. And I would like this information to be stored in one object that resembles the hierarchy. I was thinking of using a structure with BusObjects as elements.
Unfortunately, it seems that, for a bus object to work, it must be declared in the MATLAB base workspace as an independent variable of class BusObject. It can't be an element of a structure, an element of a cell array whose elements are BusObjects, or an element of a BusObject vector.
Any suggestion on how to handle this? Take into account that if you have a model with dozens and dozens of blocks and more than three hierarchy levels, you end up with hundreds of bus objects in the MATLAB base workspace without any particular structure. I think that is too messy to leave as it is.
Bus objects are always stored in the global workspace.
Send a request to MathWorks if you want this changed.

Implement "Did you mean?" with Core Data

I'm working on an iOS app. I have a Core Data database with a lot of company names.
When the user enters a company name that does not exist, I would like to show "similar" company names. For example, if the user entered "Aple", I would like to show "Did you mean Apple?".
I know that the technique of finding strings that match a pattern approximately (rather than exactly) is called approximate string matching or, colloquially, fuzzy string searching.
In theory, there are many algorithms of varying suitability: the Levenshtein distance algorithm and so on.
But in practice, has anyone already implemented something similar that can be used easily with Core Data?
I found a solution. Use this NSString category, available on GitHub: NSString-DamerauLevenshtein.
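If you prefer to avoid a dependency, the distance computation itself is a few lines of dynamic programming. Here is a minimal Levenshtein sketch (in Java for consistency with the other examples here; it ports directly to Objective-C or Swift), plus a hypothetical didYouMean helper. Core Data's SQLite store cannot evaluate such a function inside a fetch predicate, so you would typically fetch the candidate names and rank them in memory:

```java
import java.util.List;

public class DidYouMean {
    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Suggest the closest stored name, or null if nothing is close enough. */
    static String didYouMean(String query, List<String> companyNames) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String name : companyNames) {
            int d = levenshtein(query.toLowerCase(), name.toLowerCase());
            if (d < bestDist) { bestDist = d; best = name; }
        }
        // The threshold of 2 edits is a tuning choice, not a fixed rule
        return bestDist <= 2 ? best : null;
    }
}
```

With this, didYouMean("Aple", List.of("Apple", "Amazon")) returns "Apple" at edit distance 1.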
Try looking at Soundex. I believe it is part of the core feature set of SQLite, if that is your underlying data store.