Automatically generate intents and entities from given sentences in a Rasa chatbot

While training a bot with Rasa NLU, we use a training data file, nlu.yml, in which all intents and their examples are listed. For example, I'm making an FAQ bot on invention questions. Here's a part of the training data:
- intent: invention_disclosure_length
  examples: |
    - How much information is too much in an invention disclosure form?
    - How long can an invention disclosure be?
    - What's ideal length for an invention disclosure?
    - Is overstating the idea in invention disclosure okay?
    - How much information should I add to invention disclosure?
- intent: submit_invention_disclosure
  examples: |
    - How do I submit my invention disclosure once I have completed the form?
    - Steps to submit invention disclosure.
    - I have completed the form. How do I submit it?
- intent: sample_invention_disclosure
  examples: |
    - Are there sample invention disclosure forms to see as reference?
    - Please share some sample invention disclosure forms.
    - Help me with some samples of invention disclosures.
- intent: inventor
  examples: |
    - Who would be considered an inventor?
    - Who is an inventor?
    - What do you mean by an inventor?
    - Please explain the term inventor.
Usually we need to create this manually, identifying and writing each intent on our own. Is there a way or tool where I can just give it a sentence or paragraph and it automatically returns the possible intents and entities?
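One possible semi-automatic approach (not a Rasa feature, just a sketch) is to cluster candidate utterances by embedding similarity and treat each cluster as a draft intent, then let an off-the-shelf NER model propose entities. A minimal sketch, assuming the sentence-transformers, scikit-learn and spaCy packages; the model names and cluster count are illustrative assumptions, and the clusters still need to be reviewed and named by hand:

    # Sketch: cluster utterances into draft intents and propose entities.
    # Assumes: pip install sentence-transformers scikit-learn spacy
    #          python -m spacy download en_core_web_sm
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans
    import spacy

    sentences = [
        "How long can an invention disclosure be?",
        "What's ideal length for an invention disclosure?",
        "How do I submit my invention disclosure once I have completed the form?",
        "Steps to submit invention disclosure.",
        "Who would be considered an inventor?",
        "Who is an inventor?",
    ]

    # Embed each sentence and cluster; each cluster is a candidate intent.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    embeddings = model.encode(sentences)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)

    clusters = {}
    for sentence, label in zip(sentences, labels):
        clusters.setdefault(label, []).append(sentence)

    # Propose entities with a generic NER model; domain-specific terms
    # (e.g., "invention disclosure") would need a custom-trained model.
    nlp = spacy.load("en_core_web_sm")
    for label, examples in clusters.items():
        print(f"candidate intent {label}:")
        for example in examples:
            entities = [(ent.text, ent.label_) for ent in nlp(example).ents]
            print(f"  - {example}  entities: {entities}")

This only drafts an nlu.yml skeleton; you would still curate the examples and pick meaningful intent names yourself.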

Related

Google Places API: Has the option to switch off location biasing been removed from the autocomplete API?

Previously there was an option to disable location biasing by passing location=0,0 & radius=20000000; this option is no longer available in the API docs. There is also a change in the results of the autocomplete API with location=0,0 & radius=20000000.
I had success turning off biasing by using location=0,0 and radius=1.
location=0,0 and radius=20000000 seemed to have the effect of biasing everything. That configuration produced some odd results, like searching for "belize" and getting back only cities and provinces in Belize, and not the country of Belize itself.
I have also noticed this, and it behaves strangely. For example, when you search with the input text "New" and location=0,0&radius=20000000, you get
"New No. 142, Velachery Road, Next To Raptakos Brett & Co, opp. Maruthi Driving School, Indira Gandhi Nagar, Velachery, Chennai, Tamil Nadu, India". This is not at all justifiable when many other popular cities like New York are present at the bottom of the autocomplete list. An answer on this is appreciated.
https://maps.googleapis.com/maps/api/place/autocomplete/json?input=New&key=&location=0,0&radius=20000000
I created a ticket with Google support and am waiting for their answer.
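To reproduce these experiments, here is a minimal sketch of the autocomplete request in Python, assuming the requests package; YOUR_API_KEY is a placeholder:

    # Sketch: compare autocomplete results with different bias settings.
    import requests

    URL = "https://maps.googleapis.com/maps/api/place/autocomplete/json"

    def autocomplete(text, location, radius):
        # location/radius are the bias parameters discussed above.
        params = {
            "input": text,
            "key": "YOUR_API_KEY",  # placeholder
            "location": location,
            "radius": radius,
        }
        return requests.get(URL, params=params).json()

    # radius=20000000 (the old "disable biasing" trick) vs radius=1 (the
    # workaround reported above)
    for radius in (20000000, 1):
        result = autocomplete("New", "0,0", radius)
        print(radius, [p["description"] for p in result.get("predictions", [])])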

Statistics about "Microformat vs HTML+RDFa" adoption

Are there any recent and reliable statistics about Web use of these standards (i.e., webpages using one standard or another)?
Or a specific statistic about the scope of vCard (person and/or organization) use?
Only statistics; this question is not about "what is the best idea?" or "how to use it?". I'm looking for numbers to compare Microformats adoption with adoption of (any kind of) RDFa in HTML.
For "counting pages" statistics, we can consider Microdata a kind of RDFa-HTML.
NOTES
Explain context
RDFa Lite is the only W3C Recommendation; when we talk about "Microdata vs Microformats", Microdata has the better mapping to RDFa Lite. HTML5 became a W3C Recommendation on 2014-10-28, and neither one was blessed by the W3C. I understand that schema.org is the best way to adopt RDFa (reusing community schemas).
On the other hand, Microformats is older and the simplest, so it is perhaps the most used on the Web (is it?).
About "vCard data statistics"
If we need some scope for the statistics, let's use vCard as scope:
Microformats' hCard and h-card are standards for displaying vCards in (any) HTML, and are used for people and organizations.
schema.org's Person and Organization encode vCard information with (standard) RDFa Lite or Microdata.
Other notes
Wikipedia expresses an old (2012) and unconfirmable assertion (no source!), "Microformats such as hCard, however, continue to be published more than schema and others on the web", and WebDataCommons is a mess, with no statistical report.
(Edit) Wikipedia's citation error is now fixed.
(Edit, after #sashoalm's comment)
Note for those who disagree that this question is valid.
This question is a software problem, not a "request for an off-site resource"...
PROBLEM: to decide which library, framework, data model, etc. to use in a project, we need to rely on tools that are in use today and will remain so over the next few years... To make project decisions in software development, we need statistics about usage trends, framework adoption, etc.
PS: here on Stack Overflow there are a lot of discussions about language statistics, which is the same set of problems. Examples: 1, 2, 3, 4, 5, 6. See also the questions tagged [usage-statistics].
Now I see there are some statistics (!!); the Wikipedia link had been lost... I corrected it. It isn't up to date (the data is from "Winter 2013", i.e., collected ~1.5 to 2 years ago), but it shows the reality and the trends.
http://webdatacommons.org/structureddata/index.html#toc2
This is the chart from the report (showing RDFa+HTML dominance!):
Interpreting:
- Section 5, "Extraction Process", says that "on each page, we run our RDF extractor based on the Anything To Triples (Any23) library", so everything (RDF and Microformats) resulted in "triples" (not only RDF).
- The idea behind "per domain" statistics is that domains use uniform policies for all their pages... but I think this uniformity is false: only a few pages per domain adopt "semantic markup". It is not less biased than counting URLs; it is only another picture. Anyway, the outcome was a dead heat, ~57% vs 43%.
- Only 21% of the "semantic markup URLs" of 2013 were Microformats; all the others were RDFa-HTML (Microdata is also a kind of RDFa).
- Using the average of the percentages for domains (Ds) and URLs (Us), (Ds+Us)/2, the outcome is ~60% for RDFa and ~40% for Microformats.
- Before 2013 there was a dominance of Microformats, so the big growth of "RDFa-HTML" since 2011 is evident... The trend is clear.
If we adopt the arithmetic mean of the "per domain" and "per URL" counts, Microformats and RDFa-HTML are near each other, but with a little less Microformats (and a strong tendency for RDFa-HTML to grow in 2014).
Here is a table for the #sashoalm discussion, showing the percentages and totals:
NOTE1: HTML5 was released only on 2014-10-28, so only around 2015-10 will we be able to check the real (definitive) impact of the new standard on the Web. One important expected impact: Microdata was not blessed by HTML5, so the only standard is HTML+RDFa (which recommends RDFa Lite)... In the future there will perhaps be less Microdata and more schema.org.
NOTE2: there is a methodological problem in counting web pages when boilerplate text carries some hugely cloned "semantic markup": I think the "next generation" of statistics could use some per-domain analysis to build URL sub-statistics (sampling) of the diversity of semantically marked pages. Ideally the boilerplate would be down-weighted (e.g., count non-clones once and use 1+SQRT(count) for clones).
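A toy sketch of that down-weighting idea (the per-template counts below are made up; only the 1+sqrt weighting comes from the note above):

    # Count each distinct (non-cloned) marked-up page once; weight a group
    # of clones as 1 + sqrt(number_of_clones) instead of counting them all.
    from math import sqrt

    # Hypothetical per-domain counts: {page_template: number_of_clones}
    domain_pages = {"product_template": 10000, "about_page": 1, "contact_page": 1}

    def weighted_count(pages):
        total = 0.0
        for clones in pages.values():
            total += 1 if clones == 1 else 1 + sqrt(clones)
        return total

    print(weighted_count(domain_pages))  # 103.0 instead of a raw count of 10002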
Conclusion
Today perhaps some people still use Microformats, but there are more pages on the Web using RDFa-HTML (Microdata, RDFa, RDFa Lite, etc.), and the trend is for that share to grow.
If your project is meant for the coming years, the statistics say to use RDFa.
NOTE
Another interesting count for RDFa is not usage but the reuse of vocabularies (!). See Linked Open Vocabularies (LOV).
The latest statistics from WebDataCommons are as follows:
Source: http://webdatacommons.org/structureddata/2016-10/stats/stats.html
Number of domains parsed: 34 million pay-level domains
Number of domains with RDFa, Microdata and Microformats: 5.63 million (16.5%)
Popularity of the different formats: (see the breakdown in the linked report)

Including reference links in markdown as a bullet-point list on GitHub

Currently I'm using this markdown text inside the README.md file of a project on GitHub:
See the docs of [testthat][3] on how to write unit tests.
Links
-----
- http://www.rstudio.com/products/rpackages/devtools/
- https://github.com/hadley/testthat
- http://r-pkgs.had.co.nz/tests.html
---
[1]: http://www.rstudio.com/products/rpackages/devtools/
[2]: https://github.com/hadley/testthat
[3]: http://r-pkgs.had.co.nz/tests.html
I don't like this duplication, but I don't see what choice I have. If I remove the explicit bullet point lists, then GitHub won't display the reference links. If I remove the reference links, then GitHub shows the bullet point list (of course), but the embedded links like "testthat" above don't work.
Is there a better way than duplicating? What am I missing?
Inspired by #mb21, I suppose this would be the right way to do it:
See the docs of [testthat][2] on how to write unit tests.
Links
-----
- [RStudio Devtools][1]
- [testthat][2]
- [More unit test examples][3]
[1]: http://www.rstudio.com/products/rpackages/devtools/
[2]: https://github.com/hadley/testthat
[3]: http://r-pkgs.had.co.nz/tests.html
That is, it's not a good practice to include links verbatim and without a meaningful title. I should keep the link URLs only in the reference links section at the bottom, and in the bullet point list use meaningful titles.
When you view this on GitHub, the URLs shouldn't really matter, and if you really want to know, you can hover the mouse over the link. When you view it in plain text, the links now have meaningful titles, which is useful additional information about the URLs.
I'd write that as follows:
See the docs of [testthat][1] on how to write unit tests.
Links
-----
- [RStudio Devtools](http://www.rstudio.com/products/rpackages/devtools/)
- [Testthat](https://github.com/hadley/testthat)
- [Tests][1]
[1]: http://r-pkgs.had.co.nz/tests.html
Did that answer your question? If not, you'll have to clarify it.
An answer from 8 years in the future!
The answer to your question will depend on what your Markdown parser supports. Nowadays, most parsers support CommonMark (plus some flavouring). However, don't take this for granted and double check it. If CommonMark is not supported, try using the "vanilla" Markdown syntax as described below. Just be aware that the "vanilla" Markdown specification is flawed and may result in broken links (by design, almost).
Using CommonMark
If you can guarantee that your Markdown parser supports CommonMark, then you can do it in a simple way:
Writing unit tests is explained in the [Unit Testing] website
[Unit Testing]: https://unittesting.somedomain.com
In the Links section of the CommonMark specification (currently at version 0.30) you see that a "link" is composed of a link text, a link destination and a title, and each one has its own syntax and quirks. For example, if the link destination contains spaces, you need to wrap it in <angled brackets>, or if your link text is some kind of code, you're allowed to write
[`AwesomeClass`](<../docs/awesome class.md>)
Note: this section itself is written using the CommonMark syntax.
Using vanilla Markdown
The vanilla Markdown specification simply requires an extra set of square brackets with nothing in between, as described in the links section.
Writing unit tests is explained in the [Unit Testing][] website
[Unit Testing]: https://unittesting.somedomain.com
Note
And in this section I've only used vanilla Markdown syntax. Stack Overflow's Markdown parser supports both CommonMark and vanilla Markdown. This is not by accident, since CommonMark intends to be compatible with the original specification (wherever possible!). Stack Overflow does state that it uses the CommonMark spec in its Markdown help page.
Tl;dr
See the docs of [`testthat`] on how to write unit tests.
Links
-----
- [RStudio dev tools]
- [`testthat`]
- [R Packages]
---
[RStudio dev tools]: http://www.rstudio.com/products/rpackages/devtools/
[`testthat`]: https://github.com/hadley/testthat
[R Packages]: http://r-pkgs.had.co.nz/tests.html

LaTeX math in GitHub wikis

Is it possible to include LaTeX-style math in any way in GitHub repo wikis? Googling implies GitHub no longer allows things like MathJax, but most references are years old. What (if any) alternatives are there for including LaTeX-formatted math in GitHub wikis?
You can use chart.apis.google.com to render LaTeX formulas as PNG.
It works nicely with GitHub's markdown:
Example (Markdown):
The ratio of the momentum to the velocity is
the relativistic mass, m.
![f1]
And the relativistic mass and the relativistic
kinetic energy are related by the formula:
![f2]
Einstein wanted to omit the unnatural second term
on the right-hand side, whose only purpose is
to make the energy at rest zero, and to declare
that the particle has a total energy, which obeys:
![f3] which is a sum of the rest energy ![f4]
and the kinetic energy.
[f1]: http://chart.apis.google.com/chart?cht=tx&chl=m=\frac{m_0}{\sqrt{1-{\frac{v^2}{c^2}}}}
[f2]: http://chart.apis.google.com/chart?cht=tx&chl=E_k=mc^2-m_0c^2
[f3]: http://chart.apis.google.com/chart?cht=tx&chl=E=mc^2
[f4]: http://chart.apis.google.com/chart?cht=tx&chl=m_0c^2
Note on https: some installations of GitHub Enterprise reject http and work only if you use https.
Rendered: (the formulas appear as PNG images)
For simple formulas (such as exponents, etc.) you may want to just use the available markup languages. For example, using Textile, you can do:
_E = mc ^2^_
This will be rendered as an italic E = mc²; _ is used for italic style and ^ for superscript.
You can do the same thing in Markdown adding some HTML:
*E = mc<sup>2</sup>*
You can see it in action in this very place:
E = mc²
If you're looking for support for complex math formulas, then you have no better option than using a third-party service that generates images for you. mathURL looks interesting.
As input we give it E = mc ^ 2 and it generates the following link:
http://mathurl.com/render.cgi?E%20%3D%20mc%20%5E%202%5Cnocache
There is a good solution for your problem: use the TeXify GitHub plugin. More details about this plugin, and an explanation of why this is a good approach, can be found in that answer.
GitLab's wiki and markdown support formulas. I moved multiple repos for this reason.
Now GitHub officially supports rendering MathJax in the wiki!
In Markdown, just use $ as the delimiter for inline formulas, and $$ as the delimiter for display formulas.
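For example (assuming GitHub's current math rendering, and reusing the formulas from the answer above), an inline formula and a display formula would be written as:

    The energy is $E = mc^2$, and the relativistic mass is

    $$m = \frac{m_0}{\sqrt{1 - \frac{v^2}{c^2}}}$$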
To add math equations to a GitHub wiki, I used mathURL as suggested by Ionică. It will render your LaTeX equations. Append .png to the generated URL and use that URL as an image (either block or inline) in your markdown.

iCalendar RFC 5545 to string

Is there any open-source library for displaying a human-readable description of recurring events described by the above standard?
I want to obtain "every 5 weeks on Thursday" for the rule RRULE:FREQ=WEEKLY;INTERVAL=5;BYDAY=TH.
C# is preferred but not required at all.
In JavaScript you might use the rrule.js library; it's very useful for generating human-readable text from an RRULE string.
You can try it out from this link.
BTW, as far as I know, ical4j and google-rfc-2445 don't have any functionality for generating human-readable text.
You could try :
DDay.iCal - .NET 2.0
ical4j - Java
google-rfc-2445 - Java
iCalcreator - PHP
php iCalendar - PHP
libical (but it seems broken) - C
pyICSParser - disclaimer: I'm the maintainer - python
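If none of those fits, a hand-rolled formatter for the simple WEEKLY case in the question is easy to sketch. This is a toy (in Python rather than C#, and nowhere near a full RFC 5545 implementation): it handles only FREQ=WEEKLY with INTERVAL and a single BYDAY value:

    # Toy sketch: turn a simple WEEKLY RRULE into English text.
    DAY_NAMES = {"MO": "Monday", "TU": "Tuesday", "WE": "Wednesday",
                 "TH": "Thursday", "FR": "Friday", "SA": "Saturday", "SU": "Sunday"}

    def describe_weekly(rrule: str) -> str:
        # Strip the "RRULE:" prefix and split the "KEY=VALUE" parts.
        parts = dict(p.split("=") for p in rrule.removeprefix("RRULE:").split(";"))
        if parts.get("FREQ") != "WEEKLY":
            raise ValueError("this sketch only handles FREQ=WEEKLY")
        interval = int(parts.get("INTERVAL", "1"))
        day = DAY_NAMES[parts["BYDAY"]]
        unit = "week" if interval == 1 else f"{interval} weeks"
        return f"every {unit} on {day}"

    print(describe_weekly("RRULE:FREQ=WEEKLY;INTERVAL=5;BYDAY=TH"))
    # -> "every 5 weeks on Thursday"

For anything beyond this one pattern (multiple BYDAY values, COUNT/UNTIL, other frequencies), a real library like rrule.js is the safer choice.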