I am planning to build a speech-to-text program for my community.
It would be for a new language that Google doesn't support yet.
It would probably be something like
this IBM tool.
(For instance, if I speak my language, the speech-to-text program will turn my words into text.)
I know JavaScript and PHP.
And I am still learning Python.
1 - Can I build it with just web knowledge such as JavaScript and PHP?
2 - If you think it would be very difficult for me to develop that kind of tool, which service or program should I use or buy?
If you want to build a tool that looks like the IBM demo (https://speech-to-text-demo.ng.bluemix.net), you are free to fork it on GitHub and customize it to point to your own speech recognition system. The IBM demo page uses IBM Watson on the backend to process the speech and produce the hypotheses. If you want to build a recognition system for a new language, maybe that should be your first step. Once you have that up and running, you can fork the IBM demo project and make it point to your recognition engine. It is definitely a substantial amount of work that will require expertise in both speech recognition and web development.
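To make that concrete: if you train your own acoustic and language models (for example with CMU Pocketsphinx), the "recognition engine" side can be a small web service that the forked demo page posts audio to. Here is a minimal sketch under those assumptions; the /recognize route, model paths, and audio format are my own placeholders, not part of the IBM demo:

```python
# Hypothetical sketch: a tiny HTTP endpoint wrapping a Pocketsphinx
# decoder trained for a new language. All paths are placeholders.
from flask import Flask, request, jsonify
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'model/my_language/acoustic')    # acoustic model
config.set_string('-lm', 'model/my_language/lm.bin')       # language model
config.set_string('-dict', 'model/my_language/words.dic')  # pronunciation dictionary
decoder = Decoder(config)

app = Flask(__name__)

@app.route('/recognize', methods=['POST'])
def recognize():
    # Expects raw 16 kHz, 16-bit mono PCM audio in the request body.
    decoder.start_utt()
    decoder.process_raw(request.data, False, True)
    decoder.end_utt()
    hyp = decoder.hyp()
    return jsonify({'transcript': hyp.hypstr if hyp else ''})

if __name__ == '__main__':
    app.run(port=5000)
```

The forked demo frontend would then record audio in the browser and POST it to this endpoint instead of to Watson. Training the models themselves (collecting recordings, building a pronunciation dictionary) is where most of the work lies.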
Can I ask you a question? What language are you trying to target?
I want to create a third-party plug-in for Serato (software for DJs).
I searched their site and saw that Serato supports VST (VST2) plug-ins. So my question now is: what should I read in order to create a VST plug-in?
Thank you in advance.
A good starting point would be the Wikipedia page on VSTs, just to get the basics if you are not familiar with this technology. First, you need to know about the creator of VST: Steinberg.
VST SDK is a set of C++ classes based around an underlying C API. The
SDK can be downloaded from their website.
Working directly against the raw SDK can be daunting, so I would recommend starting with something simpler. Let’s review a few options:
JUCE
This technology is trending for a few reasons; as their homepage says:
With support for PC, Mac and Linux, JUCE is the perfect tool for building powerful and complex applications. JUCE also supports the development of plug-ins: VST, AU and AAX. Run your desktop applications on mobile! One-click deployment to Android and iOS (requires Android Studio and XCode). Adjust the user interface of your application with the Projucer live coding engine. Use the best audio performance available on iOS and Android.
So the pros of this technology are the big community, multi-platform support, and the fact that it is free, at least for non-commercial development (if you want to sell your plug-in, you have to pay). The cons are that you need a little more than the basics of C++ to get started; fortunately, there are a lot of tutorials on their page, on YouTube, and elsewhere on the internet, and the community is growing, so if you run into issues you can always ask.
SynthEdit and FL SynthMaker
If you don’t want to get into code that fast, you can start practicing with these, as they require little or no programming expertise.
SynthEdit is a framework and visual circuit design environment that allows you
to create your own synths with only drag & drop, without programming,
while still giving you the flexibility of using your own DSP algorithms
inside the modules.
This is cool if you want to get going quickly; it currently has a cost, which you can check on their official website.
FL SynthMaker, aka FlowStone, comes free with FL Studio. It has a straightforward drag-and-drop graphical interface and a wide range of components. You can use it to code modules and DSP in Ruby, and it comes with loads of examples to get you started quickly; its capacity to help you create a prototype in a short time is a plus.
FlowStone is a programming application that is used to create virtual
instruments, effects, and computer control of external hardware without
the need to write basic code. The instruments and effects you create
in SynthMaker can be used in FL Studio as 'native' plugins and shared
with other FlowStone users.
MAX MSP
Max, also known as Max/MSP/Jitter, is a visual programming language for music and multimedia developed and maintained by San Francisco-based software company Cycling '74. Over its more than thirty-year history, composers, performers, software designers, researchers, and artists have used it to create recordings, performances, and installations.
The Max program is modular, with most routines existing as shared
libraries. An application programming interface (API) allows
third-party development of new routines (named external objects).
Thus, Max has a large user base of programmers unaffiliated with
Cycling '74 who enhance the software with commercial and
non-commercial extensions to the program. Because of this extensible
design, which simultaneously represents both the program's structure
and its graphical user interface (GUI), Max has been described as the
lingua franca for developing interactive music performance software.
SOUL
The SOUL project is creating a new language and infrastructure for
writing and deploying audio code. It aims to unlock improvements in
latency, performance, portability and ease-of-development that aren't
possible with the current mainstream techniques that are being used.
SOUL unlocks native-level speed, even when hosted from slower, safer
languages. The SOUL language makes audio coding more accessible and
less error-prone, enhancing productivity for both beginners and expert
professionals.
Maximilian
This is a cross-platform, multi-target audio synthesis and signal processing library. It is written in C++ and provides bindings to JavaScript. It is compatible with native implementations for macOS, Windows, Linux and iOS systems, as well as client-side browser-based applications. The main features are:
- sample playback, recording and looping
- support for WAV and OGG files
- a selection of oscillators and filters
- enveloping
- multichannel mixing for 1, 2, 4 and 8 channel setups
- controller mapping functions
- effects including delay, distortion, chorus, flanging
- granular synthesis, including time and pitch stretching
- atom synthesis
- real-time music information retrieval functions: spectrum analysis, spectral features, octave analysis, Bark scale analysis, and MFCCs
- example projects for Windows and MacOS, using command line and OpenFrameworks environments
- example projects for Firefox and Chromium-based browsers using the Web Audio API ScriptProcessorNode (deprecated!)
- example projects for Chromium-based browsers using the Web Audio API AudioWorklet (e.g. Chrome, Brave, Edge, Opera, Vivaldi)
Extras
A few months ago I found a community that is focused on audio programming. They have a YouTube channel with hundreds of tutorials and a Discord server where you can ask questions, show your projects, or even get a job, if you are interested. It’s called “The Audio Programmer”.
Hope this helps you get started. I know there are a lot of options out there, and this might confuse you at the beginning, but I hope this little guide helps you choose a good starting point depending on your needs and goals, since every technology offers different things.
I have developed a chatbot using this with Docker. It's working fine, and now I want to implement it in Hindi. I found that we can do that with fastText and followed this blog, but I was unable to achieve it. So how can I implement fastText in Rasa NLU with Docker?
You can implement your bot in Hindi with Rasa even without using fastText. The supervised_embeddings pipeline works out of the box for any whitespace-tokenized language (which, as far as I know, is the case for Hindi). Read more about language support here.
Alternatively, if you still want to use fastText, that can be done, since spaCy allows you to load custom language models. If the blog post isn't working for you, you can take a look at this GitHub issue.
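If it helps, here is a hedged sketch of the fastText-into-spaCy step (spaCy 2.x style, which is what blog posts of that era use): load the pretrained Hindi fastText vectors into a blank spaCy pipeline and save it to disk so a spaCy-based Rasa pipeline can reference it. The file name is the standard fastText download; everything else is illustrative:

```python
# Sketch: build a spaCy model carrying fastText's pretrained Hindi vectors.
import gzip
import numpy
import spacy

nlp = spacy.blank('hi')  # blank Hindi pipeline

with gzip.open('cc.hi.300.vec.gz', 'rb') as f:
    f.readline()  # skip the "<rows> <dims>" header line
    for line in f:
        parts = line.rstrip().split(b' ')
        word = parts[0].decode('utf8')
        vector = numpy.asarray([float(v) for v in parts[1:]], dtype='f')
        nlp.vocab.set_vector(word, vector)

nlp.to_disk('hi_fasttext_model')  # point your Rasa pipeline at this model
```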
There are also a bunch of Rasa community members who have built bots in Hindi; you can ask in the community forum and see if they have any additional tips.
I am new to automating Windows apps, apart from some basic use of Sikuli.
I checked some options like PyAutoGUI and pywinauto. Which is the best tool for automating a third-party Windows application: Sikuli, PyAutoGUI, pywinauto, or some other tool?
Preferably a cross-platform tool (Mac and Windows).
If you need automation based on text properties, pywinauto should help you. But there is no popular text-based cross-platform tool in the open-source field.
On macOS, pyatom/ATOMac is good enough if you prefer Python (it requires some compilation during setup, but works well).
This is the big list of open source tools I'm maintaining.
PyAutoGUI has image recognition capabilities (like Sikuli or Lackey), but it is not text-based (it has no Win32 API support either).
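For illustration, the image-based style looks roughly like this (the screenshot file is a placeholder, and note that recent PyAutoGUI versions raise ImageNotFoundException instead of returning None):

```python
# Sketch of PyAutoGUI's image-based automation: find a previously saved
# screenshot of a button on screen and click its center.
import pyautogui

point = pyautogui.locateCenterOnScreen('button.png')  # placeholder image
if point is not None:
    pyautogui.click(point)
```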
The PyAutoIt bindings, and AutoIt itself, don't support the MS UI Automation technology (only the Win32 API).
The Getting Started Guide for pywinauto explains some differences between these two technologies and how to switch between them in pywinauto.
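For comparison, here is a minimal pywinauto sketch, close to the Getting Started example, with Notepad standing in for a third-party app:

```python
# Windows-only sketch: drive Notepad through pywinauto's Win32 backend.
# Swap backend="win32" for backend="uia" to use MS UI Automation instead.
from pywinauto.application import Application

app = Application(backend="win32").start("notepad.exe")
app.UntitledNotepad.Edit.type_keys("Automated with pywinauto!", with_spaces=True)
app.UntitledNotepad.menu_select("Help->About Notepad")
app.AboutNotepad.OK.click()
```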
Anyway, this field is complicated and you may face many challenges. Feel free to ask more detailed questions, because this kind of question is more suitable for the Software Recommendations Stack Exchange site.
I am looking for an API for iOS (ideally free) that will allow me to do some speech recognition. I have seen a few posts about this: iPhone speech recognition API? and free speech recognition engines for iOS?, and after a bit of prospecting I have gathered these SDKs, which look quite interesting:
http://dragonmobile.nuancemobiledeveloper.com/public/index.php?task=home
http://www.politepix.com/openears
http://www.creaceed.com/ceedvocalsdk/ (not free :-\ )
http://www.ispeech.org/
Do any of those really stand out from the crowd, and are they reasonably recent? How do they really differ from each other?
If you want to track just a few keywords, you should not look for a speech recognition API or service. This task is called keyword spotting, and it uses different algorithms than speech recognition. Speech recognition tries to find all the words that have been said, and because of that it consumes far more resources than keyword spotting. A keyword spotter only tries to find a few selected keywords or keyphrases; it is much simpler and much less resource-consuming.
The only possible solution to achieve this functionality is to use an open-source package like OpenEars, powered by Pocketsphinx:
http://www.politepix.com/openears
OpenEars has a Rejecto plugin that implements something similar.
Pocketsphinx itself has recently implemented efficient open-source keyword spotting too, but it hasn't made it into OpenEars yet. It is only available through the Pocketsphinx API: you need to create a kws search and set the target word to look for. I hope this functionality will reach OpenEars soon.
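To illustrate the Pocketsphinx route, here is a hedged sketch of a kws-style search, shown with the Python bindings for brevity (on iOS you would call the equivalent C API); the model path, keyphrase, and threshold are placeholders you would tune for your data:

```python
# Sketch of Pocketsphinx keyword spotting: scan raw audio and report
# whenever the keyphrase is hypothesized. Paths and values are placeholders.
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')               # acoustic model
config.set_string('-keyphrase', 'oh mighty computer')  # target phrase
config.set_float('-kws_threshold', 1e-20)              # lower = more sensitive
decoder = Decoder(config)

decoder.start_utt()
with open('speech.raw', 'rb') as f:
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
        if decoder.hyp() is not None:
            print('keyphrase detected')
            decoder.end_utt()    # reset and keep listening
            decoder.start_utt()
decoder.end_utt()
```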
Nuance gives developers free access (but not for high volume) - See http://www.masshightech.com/stories/2011/09/26/daily13-Nuance-tweaks-mobile-dev-program-with-free-access-to-Dragon.html or http://dragonmobile.nuancemobiledeveloper.com/public/index.php?task=home
Nuance services are typically offered commercially and require up-front fees and transaction fees. The interesting news above is that they now make low-volume use of their services available to developers for free. So for development, testing, and demonstration you can probably use the free Nuance services. However, unlike the Google services that come free with Android, if your app has thousands of users you will likely have to pay for Nuance services.
We have been developing the CeedVocal SDK since 2008; it is based on the Julius & FLite open-source projects.
Here's some context: we wanted to make our speech recognition app (Vocalia) back in 2008 and basically picked Julius (we hesitated between it and PocketSphinx, which appears to be good as well) and optimized its file format so that it would boot in 1-2 seconds instead of 20 on the original iPhone. Then we dutifully trained our own acoustic models in 6 languages. We designed the API, and eventually decided to offer it to other developers as an SDK.
CeedVocal basically supports 2 modes of operation:
- matching of words (or small phrases)
- keyword spotting
In the first mode of operation, it tries to align the input speech to a word (or phrase) in its list of acceptable inputs. This forces the input to a pre-known word, even if the speech is something else; accuracy is good. In the second mode of operation, it tries to spot one of its keywords in the stream of speech. This is a difficult case, and it can be less accurate.
Is it possible to write a Google Wave plugin that turns it into an IDE for programming? With such an extension, Google Wave would be a replacement for Eclipse etc., and it would naturally be a code repository at the same time (replacing SVN, Git, etc.).
Users (programmers) would be able to create code files directly in Wave and add collaborators to do pair programming etc. The whole codebase would live in a Wave folder, and an extension would do the building and compiling on the fly.
How would one go about writing such an extension?
Have you looked at the CodeRun IDE? Except for the collaboration aspect of Google Wave, or coding non-web apps, this might be ideal.
I expect CodeRun will become more collaborative as time goes on.
I like the idea and, to be honest, had thought of something similar myself. Stability and speed of Google Wave are valid concerns, of course.
I see this being built on top of the Wave protocol rather than on Wave itself. I mean, strip out some of the features of the Google Wave product but keep the underlying principles of waves and collaboration.
I definitely would want Google to build an IDE; they have almost everything already:
- Appengine
- Appengine SDK
- GWT
- Google code
- Google apps (docs, talk, mail, sites)
They just need to put it all together in a single IDE and make it online-based (a web app) with offline capabilities, maybe via HTML5 or Gears, so you can use it no matter which browser or OS you use. Then this IDE connects to all the online tools mentioned before, and you just have to write code, write JUnit tests, run the tests, write some how-tos and wikis, and commit. CodeRun is a good starter.
This is one of the first things that popped into my head when I recently watched the presentations from Google I/O, and for Wave in particular. The demonstration of collaborating on the Sudoku app is a good example. The ability to replay conversations was particularly interesting and has obvious uses in developing as a team.