Attention Text Generation in Character-by-Character fashion - neural-network

I am searching the web for a couple of days for any text generation model that would use only attention mechanisms.
The Transformer architecture that made waves in the context of Seq-to-Seq models is actually based solely on Attention mechanisms but is mainly designed and used for translation or chat bot tasks so it doesn't fit to the purpose, but the principle does.
My question is:
Does anyone knows or heard of a text generation model based solely on Attention without any recurrence?
Thanks a lot!
P.S. I'm familiar with PyTorch.

Building a character-level self-attentive model is a challenging task. Character-level models are usually based on RNNs. Whereas in a word/subword model, it is clear from the beginning what are the units carrying meaning (and therefore the units the attention mechanism can attend to), a character-level model needs to learn word meaning in the following layers. This makes it quite difficult for the model to learn.
Text generation models are nothing more than conditional languages model. Google AI recently published a paper on Transformer character language model, but it is the only work I know of.
Anyway, you should consider either using subwords units (as BPE, SentencePiece) or if you really need to go for character level, use RNNs instead.

Related

What is currently the best way to add a custom dictionary to a neural machine translator that uses the transformer architecture?

It's common to add a custom dictionary to a machine translator to ensure that terminology from a specific domain is correctly translated. For example, the term server should be translated differently when the document is about data centers, vs when the document is about restaurants.
With a transformer model, this is not very obvious to do, since words are not aligned 1:1. I've seen a couple of papers on this topic, but I'm not sure which would be the best one to use. What are the best practices for this problem?
I am afraid you cannot easily do that. You cannot easily add new words to the vocabulary because you don't know what embedding it would get during training. You can try to remove some words, or alternatively you can manually change the bias in the final softmax layer to prevent some words from appearing in the translation. Anything else would be pretty difficult to do.
What you want to do is called domain adaptation. To get an idea of how domain adaptation is usually done, you can have a look at a survey paper.
The most commonly used approaches are probably model finetuning or ensembling with a language model. If you want to have parallel data in your domain, you can try to fine-tune your model on that parallel data (with simple SGD, small learning rate).
If you only have monolingual data in the target language, you train a language model on that data. During the decoding, you can mix the probabilities from the domain-specific language and the translation model. Unfortunately, I don't know of any tool that could do this out of the box.

Translating a State Transition System to properties LTL formulae

In the context of bounded model checking, one describes the system as a State Transition System and the properties that need to be checked.
When one needs to provide multiple system descriptions and properties to the Model Checker Tool, it can become tedious to write the property by hand. In my case, I use some temporal logic.
How does one automate the process of translating/parsing the system description and deriving verifiable properties from it (ideally, a set of Initial states, Transitions, Set of States).
For example, consider the Microwave Example given here Given such a system description, how can I arrive at the specifications in an efficient manner?
There is no such open source tool that I know of, that can do this. Any approaches in terms of ideas, theories are welcome.
You can't automatically derive LTL formulae from automata as you suggest, because automata are more expressive than LTL formulae.
That leaves you with mainly two options: 1. find a verification tool which accepts specifications that are directly expressed as automata (I'm not sure which ones do, but I suspect it is worth checking SPIN and NuSMV for this feature.), or 2. use a meta-specification language that makes the writing of specifications easier; for example, https://www.isp.uni-luebeck.de/salt (doi: 10.1007/11901433_41) or IEE1850/PSL. While PSL is more a language definition for tool-implementors, SALT already offers a web front-end that translates your input straight into LTL.
(By the way, I find your approach methodologically challenging though: you're not supposed to derive formulae from your model, but from your initial system description as it is this very model which you're going to verify. But I am not a 100% sure, if I understood this point in your question correctly.)
I think properties of a system, e.g. Microwave system, come from technical and common sense expectation and requirements, not the model. E.g. microwave is supposed to cook the food. But it is not supposed to cook with door open. Nevertheless a repository of typical LTL pattern can be useful to define properties. It also lists properties along with more familiar regex and automata properties.
If you certain you still want to translate automata to LTL automatically check
https://mathoverflow.net/questions/96963/translate-a-buchi-automaton-to-ltl
Kansas Specification Property Repository
http://patterns.projects.cs.ksu.edu/documentation/patterns.shtml

Looking for examples where knowledge of discrete mathematics is helpful [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Inspired after watching Michael Feather's SCNA talk "Self-Education and the Craftsman", I am interested to hear about practical examples in software development where discrete mathematics have proved helpful.
Discrete math has touched every aspect of software development, as software development is based on computer science at its core.
http://en.wikipedia.org/wiki/Discrete_math
Read that link. You will see that there are numerous practical applications, although this wikipedia entry speaks mainly in theoretical terms.
Techniques I learned in my discrete math course from university helped me quite a bit with the Professor Layton games.
That counts as helpful... right?
There are a lot of real-life examples where map coloring algorithms are helpful, besides just for coloring maps. The question on my final exam had to do with traffic light programming on a six-way intersection.
As San Jacinto indicates, the fundamentals of programming are very much bound up in discrete mathematics. Moreover, 'discrete mathematics' is a very broad term. These things perhaps make it harder to pick out particular examples. I can come up with a handful, but there are many, many others.
Compiler implementation is a good source of examples: obviously there's automata / formal language theory in there; register allocation can be expressed in terms of graph colouring; the classic data flow analyses used in optimizing compilers can be expressed in terms of functions on lattice-like algebraic structures.
A simple example the use of directed graphs is in a build system that takes the dependencies involved in individual tasks by performing a topological sort. I suspect that if you tried to solve this problem without having the concept of a directed graph then you'd probably end up trying to track the dependencies all the way through the build with fiddly book-keeping code (and then finding that your handling of cyclic dependencies was less than elegant).
Clearly most programmers don't write their own optimizing compilers or build systems, so I'll pick an example from my own experience. There is a company that provides road data for satnav systems. They wanted automatic integrity checks on their data, one of which was that the network should all be connected up, i.e. it should be possible to get to anywhere from any starting point. Checking the data by trying to find routes between all pairs of positions would be impractical. However, it is possible to derive a directed graph from the road network data (in such a way as it encodes stuff like turning restrictions, etc) such that the problem is reduced to finding the strongly connected components of the graph - a standard graph-theoretic concept which is solved by an efficient algorithm.
I've been taking a course on software testing, and 3 of the lectures were dedicated to reviewing discrete mathematics, in relation to testing. Thinking about test plans in those terms seems to really help make testing more effective.
Understanding of set theory in particular is especially important for database development.
I'm sure there are numerous other applications, but those are two that come to mind here.
Just example of one of many many...
In build systems it's popular to use topological sorting of jobs to do.
By build system I mean any system where we have to manage jobs with dependency relation.
It can be compiling program, generating document, building building, organizing conference - so there is application in task management tools, collaboration tools etc.
I believe testing itself properly procedes from modus tollens, a concept of propositional logic (and hence discrete math), modus tollens being:
P=>Q. !Q, therefore !P.
If you plug in "If the feature is working properly, the test will pass" for P=>Q, and then take !Q as given ("the test did not pass"), then, if all these statements are factually correct, you have a valid, sound basis for returning the feature for a fix. By contrast, many, maybe most testers operate by the principle:
"If the program is working properly, the test will pass. The test passed, therefore the program is working properly."
This can be written as: P=>Q. Q, therefore P.
But this is the fallacy of "affirming the consequent" and does not show what the tester believes it shows. That is, they mistakenly believe that the feature has been "validated" and can be shipped. When Q is given, P may in fact either be true or it may be untrue for P=>Q, and this can be shown with a truth table.
Modus tollens is core to Karl Popper's notion of science as falsification, and testing should proceed in much the same way. We're attempting to falsify the claim that the feature always works under every explicit and implicit circumstance, rather than attempting to verify that it works in the narrow sense that it can work in some proscribed way.

Tools or programming libraries to visualize custom logic

I am looking for tools to aid in the visualization of custom business logic used to perform document generation. The logic is expressed as an object-oriented model consisting of a graph of decision points and rendering actions. The basic building blocks are relatively simple, but the overall decision tree is quite large and complex making it hard to visualize.
We are looking for suggestions on tools and/or graphing libraries that can be used to visually represent the decision tree and rendering actions. The choice of programming language is not critical (Delphi, C#, Java would be great) and we are able to easily extract the logic to XML or other data format as required. The preference is for something that will run under Windows and enable printing or PDF output of portions of the resulting diagram.
Requirements
Decision points can be simple yes/no or multiple outputs e.g. (yes, not, sometimes, always etc).
The decision points are linked to external business logic that exist elsewhere in the runtime environment. We need to label the graph node with the type of decision point (e.g. boolean) and string describing the business rule being used.
Rendering actions are linked to named content objects with optional merge variables and inline rendering logic. At a minimum we need to be able to label nodes with the name of the element and ideally also information about variables used to render the content.
We have considered building something around Visio or WinGraphViz, or perhaps using a third-party graphing/flowchart library. Any ideas or pointers would be greatly appreciated.
After some more digging I found WinGraphViz and DotXML to be the closest match to my requirements. I was previously unaware of the "record" element which allows me to render decisions in the logic flow in a clean and legible manner.
You can consider Morphir with the Elm frontend.
It is a solid tool for business logic modeling, and code generation.
Visualization is coming along as well.

Do you create your own code generators?

The Pragmatic Programmer advocates the use of code generators.
Do you create code generators on your projects? If yes, what do you use them for?
In "Pragmatic Programmer" Hunt and Thomas distinguish between Passive and Active code generators.
Passive generators are run-once, after which you edit the result.
Active generators are run as often as desired, and you should never edit the result because it will be replaced.
IMO, the latter are much more valuable because they approach the DRY (don't-repeat-yourself) principle.
If the input information to your program can be split into two parts, the part that changes seldom (A) (like metadata or a DSL), and the part that is different each time the program is run (B)(the live input), you can write a generator program that takes only A as input, and writes out an ad-hoc program that only takes B as input.
(Another name for this is partial evaluation.)
The generator program is simpler because it only has to wade through input A, not A and B. Also, it does not have to be fast because it is not run often, and it doesn't have to care about memory leaks.
The ad-hoc program is faster because it's not having to wade through input that is almost always the same (A). It is simpler because it only has to make decisions about input B, not A and B.
It's a good idea for the generated ad-hoc program to be quite readable, so you can more easily find any errors in it. Once you get the errors removed from the generator, they are gone forever.
In one project I worked on, a team designed a complex database application with a design spec two inches thick and a lengthy implementation schedule, fraught with concerns about performance. By writing a code generator, two people did the job in three months, and the source code listings (in C) were about a half-inch thick, and the generated code was so fast as to not be an issue. The ad-hoc program was regenerated weekly, at trivial cost.
So active code generation, when you can use it, is a win-win. And, I think it's no accident that this is exactly what compilers do.
Code generators if used widely without correct argumentation make code less understandable and decrease maintainability (the same with dynamic SQL by the way). Personally I'm using it with some of ORM tools, because their usage here mostly obvious and sometimes for things like searcher-parser algorithms and grammatic analyzers which are not designed to be maintained "by hands" lately. Cheers.
In hardware design, it's fairly common practice to do this at several levels of the 'stack'. For instance, I wrote a code generator to emit Verilog for various widths, topologies, and structures of DMA engines and crossbar switches, because the constructs needed to express this parameterization weren't yet mature in the synthesis and simulation tool flows.
It's also routine to emit logical models all the way down to layout data for very regular things that can be expressed and generated algorithmically, like SRAM, cache, and register file structures.
I also spent a fair bit of time writing, essentially, a code generator that would take an XML description of all the registers on a System-on-Chip, and emit HTML (yes, yes, I know about XSLT, I just found emitting it programatically to be more time-effective), Verilog, SystemVerilog, C, Assembly etc. "views" of that data for different teams (front-end and back-end ASIC design, firmware, documentation, etc.) to use (and keep them consistent by virtue of this single XML "codebase"). Does that count?
People also like to write code generators for e.g. taking terse descriptions of very common things, like finite state machines, and mechanically outputting more verbose imperative language code to implement them efficiently (e.g. transition tables and traversal code).
We use code generators for generating data entity classes, database objects (like triggers, stored procs), service proxies etc. Anywhere you see lot of repititive code following a pattern and lot of manual work involved, code generators can help. But, you should not use it too much to the extend that maintainability is a pain. Some issues also arise if you want to regenerate them.
Tools like Visual Studio, Codesmith have their own templates for most of the common tasks and make this process easier. But, it is easy to roll out on your own.
It is often useful to create a code generator that generates code from a specification - usually one that has regular tabular rules. It reduces the chance of introducing an error via a typo or omission.
Yes ,
I developed my own code generator for AAA protocol Diameter (RFC 3588).
It could generate structures and Api's for diameter messages reading from an XML file that described diameter application's grammar.
That greatly reduced the time to develop complete diameter interface (such as SH/CX/RO etc.).
in my opinion a good programming language would not need code generators because introspection and runtime code generation would be part of language e.g. in python metaclasses and new module etc.
code generators usually generate more unmanageable code in long term usage.
however, if it is absolutely imperative to use a code generator (eclipse VE for swing development is what I use at times) then make sure you know what code is being generated. Believe me, you wouldn't want code in your application that you are not familiar with.
Writing own generator for project is not efficient. Instead, use a generator such as T4, CodeSmith and Zontroy.
T4 is more complex and you need to know a .Net programming language. You have to write your template line by line and you have to complete data relational operations on your own. You can use it over Visual Studio.
CodeSmith is an functional tool and there are plenty of templates ready to use. It is based on T4 and writing your own temlate takes too much time as it is in T4. There is a trial and a commercial version.
Zontroy is a new tool with a user friendly user interface. It has its own template language and is easy to learn. There is an online template market and it is developing. Even you can deliver templates and sell them online over market.
It has a free and a commercial version. Even the free version is enough to complete a medium-scale project.
there might be a lot of code generators out there , however I always create my own to make the code more understandable and suit the frameworks and guidelines we are using
We use a generator for all new code to help ensure that coding standards are followed.
We recently replaced our in-house C++ generator with CodeSmith. We still have to create the templates for the tool, but it seems ideal to not have to maintain the tool ourselves.
My most recent need for a generator was a project that read data from hardware and ultimately posted it to a 'dashboard' UI. In-between were models, properties, presenters, events, interfaces, flags, etc. for several data points. I worked up the framework for a couple data points until I was satisfied that I could live with the design. Then, with the help of some carefully placed comments, I put the "generation" in a visual studio macro, tweaked and cleaned the macro, added the datapoints to a function in the macro to call the generation - and saved several tedious hours (days?) in the end.
Don't underestimate the power of macros :)
I am also now trying to get my head around CodeRush customization capabilities to help me with some more local generation requirements. There is powerful stuff in there if you need on-the-fly decision making when generating a code block.
I have my own code generator that I run against SQL tables. It generates the SQL procedures to access the data, the data access layer and the business logic. It has done wonders in standardising my code and naming conventions. Because it expects certain fields in the database tables (such as an id column and updated datetime column) it has also helped standardise my data design.
How many are you looking for? I've created two major ones and numerous minor ones. The first of the major ones allowed me to generate programs 1500 line programs (give or take) that had a strong family resemblance but were attuned to the different tables in a database - and to do that fast, and reliably.
The downside of a code generator is that if there's a bug in the code generated (because the template contains a bug), then there's a lot of fixing to do.
However, for languages or systems where there is a lot of near-repetitious coding to be done, a good (enough) code generator is a boon (and more of a boon than a 'doggle').
In embedded systems, sometimes you need a big block of binary data in the flash. For example, I have one that takes a text file containing bitmap font glyphs and turns it into a .cc/.h file pair declaring interesting constants (such as first character, last character, character width and height) and then the actual data as a large static const uint8_t[].
Trying to do such a thing in C++ itself, so the font data would auto-generate on compilation without a first pass, would be a pain and most likely illegible. Writing a .o file by hand is out of the question. So is breaking out graph paper, hand encoding to binary, and typing all that in.
IMHO, this kind of thing is what code generators are for. Never forget that the computer works for you, not the other way around.
BTW, if you use a generator, always always always include some lines such as this at both the start and end of each generated file:
// This code was automatically generated from Font_foo.txt. DO NOT EDIT THIS FILE.
// If there's a bug, fix the font text file or the generator program, not this file.
Yes I've had to maintain a few. CORBA or some other object communication style of interface is probably the general thing that I think of first. You have object definitions that are provided to you by the interface you are going to talk over but you still have to build those objects up in code. Building and running a code generator is a fairly routine way of doing that. This can become a fairly lengthy compile just to support some legacy communication channel, and since there is a large tendency to put wrappers around CORBA to make it simpler, well things just get worse.
In general if you have a large amount of structures, or just rapidly changing structures that you need to use, but you can't handle the performance hit of building objects through metadata, then your into writing a code generator.
I can't think of any projects where we needed to create our own code generators from scratch but there are several where we used preexisting generators. (I have used both Antlr and the Eclipse Modeling Framework for building parsers and models in java for enterprise software.) The beauty of using a code generator that someone else has written is that the authors tend to be experts in that area and have solved problems that I didn't even know existed yet. This saves me time and frustration.
So even though I might be able to write code that solves the problem at hand, I can generate the code a lot faster and there is a good chance that it will be less buggy than anything I write.
If you're not going to write the code, are you going to be comfortable with someone else's generated code?
Is it cheaper in both time and $$$ in the long run to write your own code or code generator?
I wrote a code generator that would build 100's of classes (java) that would output XML data from database in a DTD or schema compliant manner. The code generation was generally a one time thing and the code would then be smartened up with various business rules etc. The output was for a rather pedantic bank.
Code generators are work-around for programming language limitations. I personally prefer reflection instead of code generators but I agree that code generators are more flexible and resulting code obviously faster during runtime. I hope, future versions of C# will include some kind of DSL environment.
The only code generators that I use are webservice parsers. I personally stay away from code generators because of the maintenance problems for new employees or a separate team after hand off.
I write my own code generators, mainly in T-SQL, which are called during the build process.
Based on meta-model data, they generate triggers, logging, C# const declarations, INSERT/UPDATE statements, data model information to check whether the app is running on the expected database schema.
I still need to write a forms generator for increased productivity, more specs and less coding ;)
I've created a few code generators. I had a passive code generator for SQL Stored procedures which used templates. This generated generated 90% of our stored procedures.
Since we made the switch to Entity Framework I've created an active codegenerator using T4 (Text Template Transformation Toolkit) inside visual studio. I've used it to create basic repository partial classes for our entities. Works very nicely and saves a bunch of coding. I also use T4 for decorating the entity classes with certain Attributes.
I use code generation features provided by EMF - Eclipse Modeling Framework.
Code generators are really useful in many cases, especially when mapping from one format to another. I've done code generators for IDL to C++, database tables to OO types, and marshalling code just to name a few.
I think the point the authors are trying to make is that if you're a developer you should be able to make the computer work for you. Generating code is just one obvious task to automate.
I once worked with a guy who insisted that he would do our IDL to C++ mapping manually. In the beginning of the project he was able to keep up, because the rest of us were trying to figure out what to do, but eventually he became a bottleneck. I did a code generator in Perl and then we could pretty much do his "work" in a few minutes.
See our "universal" code generator based on program transformations.
I'm the architect and a key implementer.
It is worth noting that a significant fraction of this generator, is generated using this generator.
We uses Telosys code generator in our projects : http://www.telosys.org/
We have created it to reduce the development duration in recurrent tasks like CRUD screens, documentation, etc...
For us the most important thing is to be able to customize the generator's templates, in order to create new generation targets if necessary and to customize existing templates. That's why we have also created a template editor (for Velocity .vm files).
It works fine for Java/Spring/AngularJS code generator and can be adapt for other targets (PHP, C#, Python, etc )