What is currently the best way to add a custom dictionary to a neural machine translator that uses the transformer architecture?

What is currently the best way to add a custom dictionary to a neural machine translator that uses the transformer architecture? - neural-network

It's common to add a custom dictionary to a machine translator to ensure that terminology from a specific domain is correctly translated. For example, the term server should be translated differently when the document is about data centers, vs when the document is about restaurants.
With a transformer model, this is not very obvious to do, since words are not aligned 1:1. I've seen a couple of papers on this topic, but I'm not sure which would be the best one to use. What are the best practices for this problem?

I am afraid you cannot easily do that. You cannot easily add new words to the vocabulary because you don't know what embedding it would get during training. You can try to remove some words, or alternatively you can manually change the bias in the final softmax layer to prevent some words from appearing in the translation. Anything else would be pretty difficult to do.
What you want to do is called domain adaptation. To get an idea of how domain adaptation is usually done, you can have a look at a survey paper.
The most commonly used approaches are probably model finetuning or ensembling with a language model. If you want to have parallel data in your domain, you can try to fine-tune your model on that parallel data (with simple SGD, small learning rate).
If you only have monolingual data in the target language, you train a language model on that data. During the decoding, you can mix the probabilities from the domain-specific language and the translation model. Unfortunately, I don't know of any tool that could do this out of the box.

Related

Attention Text Generation in Character-by-Character fashion

I am searching the web for a couple of days for any text generation model that would use only attention mechanisms.
The Transformer architecture that made waves in the context of Seq-to-Seq models is actually based solely on Attention mechanisms but is mainly designed and used for translation or chat bot tasks so it doesn't fit to the purpose, but the principle does.
My question is:
Does anyone knows or heard of a text generation model based solely on Attention without any recurrence?
Thanks a lot!
P.S. I'm familiar with PyTorch.

Building a character-level self-attentive model is a challenging task. Character-level models are usually based on RNNs. Whereas in a word/subword model, it is clear from the beginning what are the units carrying meaning (and therefore the units the attention mechanism can attend to), a character-level model needs to learn word meaning in the following layers. This makes it quite difficult for the model to learn.
Text generation models are nothing more than conditional languages model. Google AI recently published a paper on Transformer character language model, but it is the only work I know of.
Anyway, you should consider either using subwords units (as BPE, SentencePiece) or if you really need to go for character level, use RNNs instead.

Translating a State Transition System to properties LTL formulae

In the context of bounded model checking, one describes the system as a State Transition System and the properties that need to be checked.
When one needs to provide multiple system descriptions and properties to the Model Checker Tool, it can become tedious to write the property by hand. In my case, I use some temporal logic.
How does one automate the process of translating/parsing the system description and deriving verifiable properties from it (ideally, a set of Initial states, Transitions, Set of States).
For example, consider the Microwave Example given here Given such a system description, how can I arrive at the specifications in an efficient manner?
There is no such open source tool that I know of, that can do this. Any approaches in terms of ideas, theories are welcome.

You can't automatically derive LTL formulae from automata as you suggest, because automata are more expressive than LTL formulae.
That leaves you with mainly two options: 1. find a verification tool which accepts specifications that are directly expressed as automata (I'm not sure which ones do, but I suspect it is worth checking SPIN and NuSMV for this feature.), or 2. use a meta-specification language that makes the writing of specifications easier; for example, https://www.isp.uni-luebeck.de/salt (doi: 10.1007/11901433_41) or IEE1850/PSL. While PSL is more a language definition for tool-implementors, SALT already offers a web front-end that translates your input straight into LTL.
(By the way, I find your approach methodologically challenging though: you're not supposed to derive formulae from your model, but from your initial system description as it is this very model which you're going to verify. But I am not a 100% sure, if I understood this point in your question correctly.)

I think properties of a system, e.g. Microwave system, come from technical and common sense expectation and requirements, not the model. E.g. microwave is supposed to cook the food. But it is not supposed to cook with door open. Nevertheless a repository of typical LTL pattern can be useful to define properties. It also lists properties along with more familiar regex and automata properties.
If you certain you still want to translate automata to LTL automatically check
https://mathoverflow.net/questions/96963/translate-a-buchi-automaton-to-ltl
Kansas Specification Property Repository
http://patterns.projects.cs.ksu.edu/documentation/patterns.shtml

Are Operational Transformation Frameworks only meant for text?

Looking at all the examples of Operational Transformation Frameworks out there, they all seem to resolve around the transformation of changes to plain text documents. How would an OT framework be used for more complex objects?
I'm wanting to dev a real-time sticky notes style app, where people can co-create sticky notes, change their positon and text value. Would I be right in assuming that the position values wouldn't be transformed? (I mean, how would they, you can't merge them right?). However, I would want to use an OT framework to resolve conflicts with the posit-its value, correct?

I do not see any problem to use Operational Transformation to work with Complex Objects, what you need is to define what operations your OT system support and how concurrency is solved for them
For instance, if you receive two Sticky notes "coordinates move operation" from two different users from same 'client state', you need to make both states to converge, probably cancelling out second operation.
This is exactly the same behaviour with text when two users generate two updates to delete a text range that overlaps completely, (or maybe partially), the second update processed must be transformed against the previous and the resultant operation will only effectively delete a portion of the original one, (or completely cancelled with a 'no-op')
You can take a look on this nice explanation about how Google Wave Operational Transformation works and guess from this point how it should work your own implementation

See the following paper for an approach to using OT with trees if you want to go down that route:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74
However, in your particular case, I would use a separate plain text OT document for each stickynote and use an existing library, eg: etherPad, to do the heavy lifting. The positions of the notes could then be broadcast on a last-committer-wins basis.

Operation Transformation is a general technique, it works for any data type. The point is you need to define your transformation functions. Also, there are some atomic attributes that you cannot merge automatically like (position and background color) those will be mostly "last-update wins" or the user solves them manually when there is a conflict.
there are some nice libs and frameworks that provide OT for complex data already out there:
ShareJS : library for Node which provides all operations on JSON objects
DerbyJS: framework for NodeJS, it uses ShareJS for OT stuff.
Open Coweb framework : Dojo foundation project for cooperative web applications using OT

machine learning and code generator from strings

The problem: Given a set of hand categorized strings (or a set of ordered vectors of strings) generate a categorize function to categorize more input. In my case, that data (or most of it) is not natural language.
The question: are there any tools out there that will do that? I'm thinking of some kind of reasonably polished, download, install and go kind of things, as opposed to to some library or a brittle academic program.
(Please don't get stuck on details as the real details would restrict answers to less generally useful responses AND are under NDA.)
As an example of what I'm looking at; the input I'm wanting to filter is computer generated status strings pulled from logs. Error messages (as an example) being filtered based on who needs to be informed or what action needs to be taken.

Doing Things Manually
If the error messages are being generated automatically and the list of exceptions behind the messages is not terribly large, you might just want to have a table that directly maps each error message type to the people who need to be notified.
This should make it easy to keep track of exactly who/which-groups will be getting what types of messages and to update the routing of messages should you decide that some of the messages are being misdirected.
Typically, a small fraction of the types of errors make up a large fraction of error reports. For example, Microsoft noticed that 80% of crashes were caused by 20% of the bugs in their software. So, to get something useful, you wouldn't even need to start with a complete table covering every type of error message. Instead, you could start with just a list that maps the most common errors to the right person and routes everything else to a person for manual routing. Each time an error is routed manually, you could then add an entry to the routing table so that errors of that type are handled automatically in the future.
Document Classification
Unless the error messages are being editorialized by people who submit them and you want to use this information when routing them, I wouldn't recommend treating this as a document classification task. However, if this is what you want to do, here's a list of reasonably good packages for document document classification organized by programming language:
Python - To do this using the Python based Natural Language Toolkit (NLTK), see the Document Classification section in the freely available NLTK book.
Ruby - If Ruby is more of your thing, you can use the Classifier gem. Here's sample code that detects whether Family Guy quotes are funny or not-funny.
C# - C# programmers can use nBayes. The project's home page has sample code for a simple spam/not-spam classifier.
Java - Java folks have Classifier4J, Weka, Lucene Mahout, and as adi92 mentioned Mallet.
Learning Rules with Weka - If rules are what you want, Weka might be of particular interest, since it includes a rule set based learner. You'll find a tutorial on using Weka for text categorization here.

Mallet has a bunch of classifiers which you can train and deploy entirely from the commandline
Weka is nice too because it has a huge number of classifiers and preprocessors for you to play with

Have you tried spam or email filters? By using text files that have been marked with appropriate categories, you should be able to categorize further text input. That's what those programs do, anyway, but instead of labeling your outputs a 'spam' and 'not spam', you could do other categories.
You could also try something involving AdaBoost for a more hands-on approach to rolling your own. This library from Google looks promising, but probably doesn't meet your ready-to-deploy requirements.

Tools or programming libraries to visualize custom logic

I am looking for tools to aid in the visualization of custom business logic used to perform document generation. The logic is expressed as an object-oriented model consisting of a graph of decision points and rendering actions. The basic building blocks are relatively simple, but the overall decision tree is quite large and complex making it hard to visualize.
We are looking for suggestions on tools and/or graphing libraries that can be used to visually represent the decision tree and rendering actions. The choice of programming language is not critical (Delphi, C#, Java would be great) and we are able to easily extract the logic to XML or other data format as required. The preference is for something that will run under Windows and enable printing or PDF output of portions of the resulting diagram.
Requirements
Decision points can be simple yes/no or multiple outputs e.g. (yes, not, sometimes, always etc).
The decision points are linked to external business logic that exist elsewhere in the runtime environment. We need to label the graph node with the type of decision point (e.g. boolean) and string describing the business rule being used.
Rendering actions are linked to named content objects with optional merge variables and inline rendering logic. At a minimum we need to be able to label nodes with the name of the element and ideally also information about variables used to render the content.
We have considered building something around Visio or WinGraphViz, or perhaps using a third-party graphing/flowchart library. Any ideas or pointers would be greatly appreciated.

After some more digging I found WinGraphViz and DotXML to be the closest match to my requirements. I was previously unaware of the "record" element which allows me to render decisions in the logic flow in a clean and legible manner.

You can consider Morphir with the Elm frontend.
It is a solid tool for business logic modeling, and code generation.
Visualization is coming along as well.