There is a certain organization that periodically provides information in the form of a recorded message on a "hotline". Is there any open source solution (or set of components that could be "wired" together) that would allow me to present this information in text form on a web page?
Since it's the really easy part, I'm going to assume you can fetch the audio from the "hotline", i.e. you have direct access to the actual audio samples.
The hard part is transcribing the audio. You can start by having a look at Wikipedia and following the links from there. One open source option is CMU Sphinx. Google and related search tools such as Google Scholar are likely to become your close friends :)
While there are a number of voice recognition engines available, their accuracy is far from perfect.
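The last step, putting the transcript on a web page, is simple once you have text. Here is a minimal Python sketch. It assumes the recognizer (CMU Sphinx or another engine) has already produced the transcript as a plain string, and it does no speech recognition itself. transcript_to_html is a hypothetical helper name:

```python
import html


def transcript_to_html(transcript: str, title: str = "Hotline message") -> str:
    """Wrap a plain-text transcript in a minimal HTML page.

    Blank-line-separated chunks become <p> elements, and html.escape()
    stops characters such as < and & from being treated as markup.
    """
    paragraphs = [p.strip() for p in transcript.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical transcript; in practice this is the speech recognizer's output.
    text = "Roads in the north are open.\n\nNext update at 5 pm."
    print(transcript_to_html(text))
```

Because recognition accuracy is far from perfect, it may be worth labeling the page text as machine-generated.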
I want to code an artificial intelligence. To teach her the language I can use Wikipedia offline, but for teaching her communication I need other sources. Do you know of large data sources that fit this task and are freely available? For example, chat logs, emails, forum content, or something similar?
for teaching her communication I need other sources.
Some ideas:
Search video transcriptions on YouTube (you may need to edit them for quality)
Search for transcripts of political debates in your country (they may be available for free on the internet)
Search for dialogue from public-domain theater plays
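If you go the theater-play route, the scripts usually need some preprocessing to turn them into conversational turns. Here is a rough Python sketch. It assumes a plain-text script where each speech starts with the speaker's name in capitals followed by a period (for example, HAMLET. Who's there?) and stage directions are in square brackets. Real public-domain texts vary, so treat the pattern as an assumption to adapt. extract_dialogue and to_pairs are hypothetical names:

```python
import re
from typing import List, Tuple

# Assumed layout: "SPEAKER. words..." with the name in capitals.
SPEAKER_LINE = re.compile(r"^([A-Z][A-Z '\-]*)\.\s+(.+)$")


def extract_dialogue(text: str) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) turns from a plain-text play script."""
    turns: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and bracketed stage directions
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((match.group(1).strip(), match.group(2).strip()))
        elif turns:
            # No speaker prefix: treat the line as a continuation of the last turn.
            speaker, said = turns[-1]
            turns[-1] = (speaker, f"{said} {line}")
    return turns


def to_pairs(turns: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each utterance with the reply that follows it."""
    return [(turns[i][1], turns[i + 1][1]) for i in range(len(turns) - 1)]


if __name__ == "__main__":
    # A short made-up sample in the assumed layout.
    script = """BERNARDO. Who's there?
FRANCISCO. Nay, answer me. Stand
and unfold yourself.
[Enter HORATIO]
BERNARDO. Long live the king!"""
    for prompt, reply in to_pairs(extract_dialogue(script)):
        print(f"{prompt!r} -> {reply!r}")
```

The (prompt, reply) pairs are one simple way to shape dialogue for training. Whether that framing suits your AI is up to you.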
I would like to fetch some data from Google Scholar automatically via a Matlab script. I am mostly interested in data like Google Scholar's BibTeX entries and the forward citation feature. However, it seems that there is no API for Google Scholar. Is there a way to automatically fetch bibliographic data from Google Scholar using Matlab? Are there tools or code already available for this?
A word of caution, based on what I found while working further on this project:
There is a reason why Google Scholar does not have an API. Using bots to collect data from Google Scholar is against the EULA. The basic idea is that any program that interfaces with Google Scholar cannot do so in a way that is qualitatively different from an end user. In other words, you cannot automatically fetch large amounts of data. Although the script in @JustinPeel's answer does not necessarily violate the terms, putting it in a massive loop would.
Some specific points from this EULA:
You shall not, and shall not allow any third party to: ...
(i) directly or indirectly generate queries, or impressions of or clicks on Results, through any automated, deceptive, fraudulent or other invalid means (including, but not limited to, click spam, robots, macro programs, and Internet agents);
...
(l) "crawl", "spider", index or in any non-transitory manner store or cache information obtained from the Service (including, but not limited to, Results, or any part, copy or derivative thereof);
If you look at Google Scholar's robots.txt, you can also see that no bots of any kind are allowed.
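If you do write a scraper for any site, you can check its robots.txt programmatically before fetching anything. Here is a small sketch using Python's standard urllib.robotparser module. The robots.txt text below is a made-up example, not Google Scholar's actual file, and the bot name and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that blocks every bot from every path.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(SAMPLE_ROBOTS_TXT, "MyScraperBot", "https://example.org/scholar?q=matlab"))
```

To load a site's real file, call set_url() and read() on the parser instead of parse(). robots.txt is only one signal; the terms quoted above still apply.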
I have heard from some colleagues that you will get into trouble if you try to circumvent this policy, and that it can result in your lab losing access to Google Scholar.
If you really want to use Matlab for this (which I don't really advise), you can look at various web scraping examples, and there is this code that already gets some info from Google Scholar. Basically, just google 'matlab web scraping' and off you go.
I personally would recommend using Python for this because Python is better for general programming, IMHO. For instance, someone has already done something similar to what you want with Python. However, if you know Matlab and don't have the interest or time for Python, follow the links in the first paragraph.
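Whatever language you use, you will still need to turn BibTeX text into fields. Here is a minimal Python sketch. It assumes you already have a single BibTeX entry as a string (for example, one you exported by hand). parse_bibtex_entry is a hypothetical helper that covers only simple entries and is not a substitute for a full BibTeX parser:

```python
import re
from typing import Dict, Tuple

_HEAD = re.compile(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")  # @type{key,
_FIELD = re.compile(r"\s*(\w+)\s*=\s*")                # name =
_BARE = re.compile(r"[^,}\s]+")                        # e.g. year = 2010
_SEP = re.compile(r"\s*,?")                            # optional trailing comma


def _braced(text: str, pos: int) -> Tuple[str, int]:
    """Return (value, next_pos) for a {...} value starting at text[pos] == '{'."""
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    raise ValueError("unbalanced braces")


def parse_bibtex_entry(entry: str) -> Dict[str, str]:
    """Parse one simple entry; @string macros, '#' concatenation and comments are ignored."""
    head = _HEAD.match(entry)
    if not head:
        raise ValueError("not a BibTeX entry")
    fields = {"ENTRYTYPE": head.group(1).lower(), "ID": head.group(2)}
    pos = head.end()
    while True:
        m = _FIELD.match(entry, pos)
        if not m:
            break  # no more "name =" pairs, e.g. we reached the closing brace
        name, pos = m.group(1).lower(), m.end()
        if entry.startswith("{", pos):
            value, pos = _braced(entry, pos)
        elif entry.startswith('"', pos):
            end = entry.index('"', pos + 1)
            value, pos = entry[pos + 1:end], end + 1
        else:
            bare = _BARE.match(entry, pos)
            if not bare:
                raise ValueError(f"missing value for field {name!r}")
            value, pos = bare.group(0), bare.end()
        fields[name] = value
        pos = _SEP.match(entry, pos).end()
    return fields


if __name__ == "__main__":
    sample = '@article{smith2010,\n  title = {A Study of {BibTeX} Parsing},\n  year = 2010\n}'
    print(parse_bibtex_entry(sample))
```

Nested braces inside a value (like {BibTeX} above) are kept as-is, which preserves BibTeX's capitalization hints.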
What's the best way to start training end users in a CMS like DotNetNuke?
The end users will want to add, edit, and delete their own content. They will also need to install modules and understand how everything works.
Should I create a manual? Is there a way to plan some training?
Any ideas?
Edit: the end users are VERY IT-illiterate; they struggled even to understand the rich text editor. I need to train them on how to use the Form and List module and the HTML module for editing content. They want a document of some sort, which is really old school.
PD24, for what most customers do, it usually only takes 5-10 minutes of training. I usually create a couple of Jing videos (Jing is a free screen and audio recording tool). I record a voice-over as I create a page, edit text, add photos, and add modules. Then I send them the links so they can refer back to them whenever they need a reminder.
Works great! (boooo to manuals, no one reads those and they take a lot of time to make!)
Also, DNNCreative is probably too detailed for your client; it's a good resource for DNN implementers.
We have a variety of videos in the video library on DotNetNuke.com; you could point users to those for specific topics.
We (DotNetNuke Corp) also provide custom training solutions: we could develop a custom training program for your client that fits the scope of your project and your delivery requirements. If you want more info, feel free to email me at training@dnncorp.com.
Have a look at www.dnncreative.com; they have some awesome tutorials for developers and users.
I'm nearing the completion of migrating our existing website to a CMS, and I've just finished creating all the various contact forms. The CMS I'm using has CAPTCHA built into its form builder, which is great, but the only method available is the "decipher-the-noisy-image" method.
This approach works well, but it limits access for people with reading or sight disabilities. I've worked around this with a "help" page that allows people with disabilities to contact us by telephone, and I'm considering adding a single-field form that says "Send us your email address and we'll contact you". Accessibility is of particular importance to me as a web developer, but from an organisational perspective, so is reducing the amount of form spam we receive.
So what I'd like to know is: has anyone in the community had experience with other CAPTCHA methods, and how have you managed to make them accessible to people with disabilities?
As a blind person, I find that reCAPTCHA is one of the better CAPTCHA services out there as far as audio options go. The issue with using SMS as the only alternative is that many visually impaired users don't have cell phones that allow them to read text messages.
A good CAPTCHA, like reCAPTCHA, usually includes an audio CAPTCHA. I have also seen sites that send an SMS message with a code that you then enter on the site (Google's Gmail does this).
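The SMS approach boils down to issuing a short one-time code and checking what the user types back. Here is a minimal Python sketch. It assumes 6-digit codes that expire after 5 minutes and allow 3 attempts; all three numbers are arbitrary choices. Actually sending the SMS is left out, and issue_code and verify_code are hypothetical names. As the blind commenter points out, SMS should not be the only alternative:

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional

CODE_DIGITS = 6         # assumption: 6-digit numeric codes
CODE_TTL_SECONDS = 300  # assumption: a code is valid for 5 minutes
MAX_ATTEMPTS = 3        # assumption: 3 guesses per code


@dataclass
class PendingCode:
    code: str
    expires_at: float
    attempts_left: int = MAX_ATTEMPTS


def issue_code(now: Optional[float] = None) -> PendingCode:
    """Create a random code to send by SMS (sending it is out of scope here)."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10 ** CODE_DIGITS):0{CODE_DIGITS}d}"
    return PendingCode(code=code, expires_at=now + CODE_TTL_SECONDS)


def verify_code(pending: PendingCode, submitted: str, now: Optional[float] = None) -> bool:
    """Check a submitted code, enforcing expiry and a limited number of attempts."""
    now = time.time() if now is None else now
    if now > pending.expires_at or pending.attempts_left <= 0:
        return False
    pending.attempts_left -= 1
    # compare_digest avoids leaking, via timing, how much of the code matched.
    return hmac.compare_digest(pending.code.encode(), submitted.strip().encode())
```

Limiting attempts matters as much as randomness: with unlimited guesses, a 6-digit code can simply be brute-forced.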
I am very interested in this because I am implementing a CAPTCHA in jQuery right now.
Many sites, including this one I believe, have an option to play noisy audio with embedded spoken numbers, as an audio equivalent to the traditional CAPTCHA image.
I find the result pretty spooky, actually. Reminds me of numbers stations.
As Michael said, an audio version with each character of the CAPTCHA text spoken aloud is, for better or worse, a commonly provided option. If your CMS is PHP-based, or if PHP is available on your hosting infrastructure anyway, here's an open source CAPTCHA application with an audio download option:
http://www.phpcaptcha.org/
I've implemented a production site with phpcaptcha, and it works as advertised.
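The idea of speaking each character can be sketched in a few lines. This is a rough Python illustration, not how phpcaptcha works internally. It assumes an alphabet without look-alike characters and maps each character to a word (NATO phonetic letters, English digit names). A real system would turn those words into audio through recorded clips or text-to-speech, usually with added noise. The names are hypothetical:

```python
import hmac
import secrets
from typing import List

# Assumption: leave out look-alike characters (0/O, 1/I/L).
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"

# One spoken word per character: NATO phonetic letters and English digit names.
SPOKEN = {
    "A": "alfa", "B": "bravo", "C": "charlie", "D": "delta", "E": "echo",
    "F": "foxtrot", "G": "golf", "H": "hotel", "J": "juliett", "K": "kilo",
    "M": "mike", "N": "november", "P": "papa", "Q": "quebec", "R": "romeo",
    "S": "sierra", "T": "tango", "U": "uniform", "V": "victor",
    "W": "whiskey", "X": "x-ray", "Y": "yankee", "Z": "zulu",
    "2": "two", "3": "three", "4": "four", "5": "five",
    "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def new_challenge(length: int = 5) -> str:
    """Pick a random challenge string using a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def spoken_sequence(challenge: str) -> List[str]:
    """Words to speak, in order; a real system would render these as noisy audio."""
    return [SPOKEN[ch] for ch in challenge]


def check_answer(challenge: str, answer: str) -> bool:
    """Accept answers regardless of case or spacing, which helps audio users."""
    normalized = "".join(answer.split()).upper()
    return hmac.compare_digest(challenge.encode(), normalized.encode())


if __name__ == "__main__":
    challenge = new_challenge()
    print(challenge, spoken_sequence(challenge))
```

Being lenient about case and spacing costs little security but removes a common stumbling block for people typing what they heard.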
I'm looking for a presentation, PDF, blog post, or whitepaper discussing the technical details of how to filter down and display massive amounts of information for individual users in an intelligent (possibly machine learning) way. Coworkers of mine have heard presentations on the Facebook News Feed, but I can't find anything published that goes into the dirty details. Searches seem to turn up only the controversy surrounding the system. Maybe I'm not searching for the right keywords...
@AlexCuse I'm trying to build something similar to Facebook's system. I have large amounts of data, and I need to filter it down to something manageable to present to the user. I cannot use another website because of the scale I have to work at. Also, I just want a technical discussion of how to implement it, not examples of people who have an implementation.
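One common way to frame this kind of filtering is to score every candidate item for a given user and keep only the top k. The Python sketch below is a generic illustration, not a description of how Facebook's News Feed actually works. The per-author affinity, the per-kind weights, and the exponential time decay are all hypothetical, hand-picked stand-ins for values a real system would learn:

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical, hand-picked parameters; a real system would learn these.
KIND_WEIGHT = {"photo": 1.5, "status": 1.0, "link": 0.8}
HALF_LIFE_SECONDS = 6 * 3600  # an item's score halves every 6 hours


@dataclass(frozen=True)
class Item:
    item_id: str
    author: str
    kind: str
    created_at: float  # seconds since the epoch


def score(item: Item, affinity: Dict[str, float], now: float) -> float:
    """Affinity to the author, times the weight of the item kind, times time decay."""
    age = max(0.0, now - item.created_at)
    decay = 0.5 ** (age / HALF_LIFE_SECONDS)
    return affinity.get(item.author, 0.0) * KIND_WEIGHT.get(item.kind, 1.0) * decay


def top_k_feed(items: Iterable[Item], affinity: Dict[str, float],
               now: float, k: int = 10) -> List[Item]:
    """Return the k highest-scoring items for one user, best first."""
    return heapq.nlargest(k, items, key=lambda it: score(it, affinity, now))
```

heapq.nlargest keeps a heap of only k items while scanning the candidates, which matters when each user has far more candidates than you will ever show.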
Are you looking for something along the lines of distributed pub/sub with content-based filtering? If so, you may want to look into Siena and some of the associated papers, such as "Design and Evaluation of a Wide-Area Event Notification Service".
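To make the idea concrete: in content-based pub/sub, subscribers register filters over event attributes, and the broker delivers each published event only to subscribers whose filters match. Below is a single-process Python toy, loosely in the spirit of attribute/operator/value filters but not Siena's API and without any of its distributed routing. Broker and the operator names are hypothetical:

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Supported comparison operators; "prefix" is a string-only extra.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq, "!=": operator.ne, "<": operator.lt,
    "<=": operator.le, ">": operator.gt, ">=": operator.ge,
    "prefix": lambda value, arg: isinstance(value, str) and value.startswith(arg),
}

Constraint = Tuple[str, str, Any]  # (attribute, operator, argument)


@dataclass
class Broker:
    # subscriber -> list of filters; each filter is a list of constraints (ANDed)
    subscriptions: Dict[str, List[List[Constraint]]] = field(default_factory=dict)

    def subscribe(self, subscriber: str, constraints: List[Constraint]) -> None:
        self.subscriptions.setdefault(subscriber, []).append(constraints)

    def publish(self, event: Dict[str, Any]) -> List[str]:
        """Return the subscribers with at least one filter matching the event."""
        return [s for s, filters in self.subscriptions.items()
                if any(self._matches(event, f) for f in filters)]

    @staticmethod
    def _matches(event: Dict[str, Any], constraints: List[Constraint]) -> bool:
        for attr, op, arg in constraints:
            if attr not in event or not OPS[op](event[attr], arg):
                return False
        return True


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("alice", [("type", "=", "stock"), ("price", "<", 50)])
    print(broker.publish({"type": "stock", "price": 42}))
```

A distributed system applies the same matching inside a network of brokers, so events can be dropped close to their source instead of being shipped to every user.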