Final version of my request: how to convert speech to text - objective-c++

I asked about converting speech to text in C++, but my question seemed to be unclear, and so was the answer I got. I have already heard that the speech should be in a wave file, so what do I do after that? I want to do it on my own, without using Windows facilities or APIs; C++ libraries are allowed if any exist. I need to know how it works. Is it impossible, in the sense that it would take a million years? There is no tutorial about it on Google, so at least tell me some steps to light my path. What are the topics I have to learn? So many things already use speech-to-text: I can say "John" to my mobile phone and it calls John. Doesn't it recognize "John"? That seems fairly easy, so it should be possible to learn in less than 'a million years'. I need to learn how to convert just some short sentences to strings; for now, qualities like pronunciation are not important. What are the basics, and where can I learn about them? I need to say one sentence to the computer, and the computer has to choose one of the answers that I have already recorded, so I need to convert the speech to a string in order to compare it with the recorded sentences. Please, someone tell me what to do, what to search for, where to look, or who to ask.
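Since the goal above is only to match one spoken sentence against a small set of pre-recorded sentences, the classic do-it-yourself route is template matching rather than full speech recognition: cut the wave file into frames, extract a feature per frame, and compare the feature sequence against each recorded template with dynamic time warping (DTW), which tolerates speaking-rate differences. A minimal sketch in Python (the same algorithm ports directly to C++); the per-frame log energy used here is a deliberately crude stand-in for real features such as MFCCs:

```python
import math

def frame_energies(samples, frame_len=400):
    """Very crude acoustic features: log energy per fixed-size frame.
    Real systems use MFCCs, but the matching logic stays the same."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))
    return feats

def dtw_distance(a, b):
    """Dynamic time warping: distance between two feature sequences,
    allowing frames to stretch or compress in time."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of b
                                 cost[i][j - 1],      # skip a frame of a
                                 cost[i - 1][j - 1])  # match frames
    return cost[n][m]

def recognize(utterance_feats, templates):
    """Pick the pre-recorded sentence whose features are closest."""
    return min(templates,
               key=lambda name: dtw_distance(utterance_feats, templates[name]))
```

The search terms this sketch points at are exactly the topics asked for: framing and windowing, MFCC feature extraction, dynamic time warping, and (one level up) hidden Markov models, which is what phone-style "call John" recognizers historically used.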

Related

Flutter Speech Recognition: how to get a percentage of "similarity" between the right pronunciation of a word and the user's pronunciation

As stated in the question: I am developing an app for French kids to learn English vocabulary. I have added a speech-to-text functionality using the SpeechToText package, which really works well.
However, I am hitting a hard rock now. One of the activities proposed to the students is simply "Listen and repeat", so that they progressively improve their pronunciation.
I thought of using the SpeechToText package for this as well, and it would work if the students pronounced the words quite well. One example: the sound "TH" is problematic for a French speaker and is very often pronounced as a "Z", so the app never really recognizes a word like "Father"; it keeps thinking the user says "Fazza".
Is there a way to compare the "good pronunciation" of a word to what the user says and get a percentage of "similarity"? I know we can compare strings that way.
Would anyone know of a solution for this issue? Any advice?
You can use the Speechace APIs to get the following features:
Pronunciation Assessment
Fluency Assessment
Spontaneous speech assessment
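The string-comparison idea raised in the question is also workable without an external API: compare the target word with whatever the recognizer returned and turn the match into a percentage. A sketch in Python using the standard library's Ratcliff/Obershelp matcher (the app itself is Flutter/Dart, but the same algorithm, or plain Levenshtein distance, ports directly):

```python
from difflib import SequenceMatcher

def similarity_percent(expected, heard):
    """Percentage of similarity between the target word and what the
    speech recognizer returned, via difflib's Ratcliff/Obershelp ratio."""
    ratio = SequenceMatcher(None, expected.lower(), heard.lower()).ratio()
    return round(ratio * 100)
```

So "Father" vs. "Fazza" scores well above zero (the shared "Fa" is credited) while still being far from 100, which gives a graded signal for "close but not quite" pronunciations instead of the recognizer's hard yes/no.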

How do we share full NetLogo models here when working on a problem?

I'm pretty new on this list, and while trying to answer a question posted here on StackOverflow I am wondering if there is some standard way to post entire models here, not just the "CODE" tab portion - or, for that matter, an image of the view screen.
I don't see any place to put attachments on posts or answers. I suppose I could upload the model to the NetLogo-users Google group, which does have a place to upload files, and cross-reference it here. I suppose I could try to open an email channel to the user and send them an attachment. I suppose I could post it on GitHub.
But is there some way to attach a full .nlogo file right here that I'm missing? Some interfaces are really complicated, and looking only at the CODE tab is not adequate.
And yes, since the .nlogo model is pure text I could paste the entire thing into a window here (which would object to having code in a text window, of course), but that's a lot of extra characters in the post.
Unfortunately, there is not. This is actually more of a problem for asking than for answering, as it can be difficult to get the person asking the question to post the relevant bit of code, including the other bits of code that lead up to the problem and give key information such as the contents of a variable. NetLogo does not lend itself to minimal working examples (MWEs) at all, and beginners simply don't have the experience to replace interface variables with global variables, etc.
Uploading to the Google users group and cross-referencing is likely to get the question or answer closed as incomplete. But StackOverflow has a different purpose than the users group - it is supposed to be focused on specific questions with specific answers, such as syntax problems and bugs, not design. The last thing we want is interfaces or full models. We tend to be more lenient than other areas of StackOverflow because we know NetLogo has a very high proportion of beginners without support and that an MWE in NetLogo doesn't really make sense, but questions that require full models are definitely out of scope.

How to give special characters in Trivia Game, Multiple Choice?

I am using the Actions on Google Trivia Game template.
Special characters () are not displayed in the chat window.
In Google Sheets, I entered them in the following format.
Question: How to Add an item to the end of the list
Answer1: append()
Answer2: extend()
In Google Assistant, the answers were displayed without parentheses. How can I give questions and answers with parentheses and other special characters?
This is a good one - it looks like the processor that uses what you entered removes special characters. This does seem odd when you look at the question and the suggestion chips.
However... it makes sense if you think about how people are expected to answer the question. If you run it in "speaker" mode, it won't display the suggestion chips, but users will be expected to verbally give an answer. It is pretty difficult to give an answer with parentheses - so the system automatically removes those from what is expected.
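The behavior described above - stripping characters a user cannot plausibly say aloud so that spoken answers still match - can be imitated with a tiny normalizer. This is an assumption about what the template does, not its actual code; a Python sketch:

```python
import re

def speakable(text):
    """Rough imitation (an assumption, not the template's real logic) of
    normalizing an answer to what a user could say aloud: drop bracket
    characters that have no spoken form, then collapse whitespace."""
    stripped = re.sub(r"[()\[\]{}]", "", text)
    return re.sub(r"\s+", " ", stripped).strip()
```

Under such a scheme both `append()` and a user saying "append" normalize to the same string, which is why the display loses the parentheses too.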

Does anyone know where I can get the entire list of all Unicode code points?

I don't know why I might need it, but I wanted to look up all the Unicode code points because I wanted to find all the cool things in there apart from emoji. So does anyone know where I can get the table? The official one is only useful if you already know the code point and want to find what it does, but I want it the other way round. I can't find anything more than the Wikipedia HTML version and one from UTF-1.
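One way to get the "other way round" table asked for above is to generate it yourself: Python's standard `unicodedata` module ships a copy of the Unicode Character Database, so you can walk every code point and list the ones that have an official name. A minimal sketch:

```python
import unicodedata

def named_code_points(start, end):
    """Yield (code point, character, official name) for every assigned
    code point in [start, end] that has a name in the Unicode database."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        try:
            yield cp, ch, unicodedata.name(ch)
        except ValueError:  # unassigned or unnamed code point
            continue

# e.g. browse a small range of Latin letters
table = list(named_code_points(0x41, 0x45))
```

Running this over `range(0x110000)` dumps the whole named repertoire; the same data is also published directly by the Unicode Consortium as the plain-text files UnicodeData.txt and NamesList.txt.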

Porting Max/MSP .app to iOS

I've read a number of posts on Apple's forums, and a number of posts on the Cycling '74 forums (with my own questions scattered around both) and nobody seems to be able to help me.
I used Max/MSP to write a 'patch' that takes samples and generates music. I'm going to release it as an album similar to Brian Eno's Thursday Afternoon, but wanted to make it available to people so they can have the music last for more than the hour a CD can hold.
What I don't know, and can't figure out, is how. It looks just like a regular OS X app; the only difference I see in the directory structure is that my Max/MSP-made application has extra .framework folders as well as the objects I use (which I guess are similar to 'functions' in JavaScript). I've looked at the package contents of both OS X apps and unpacked .ipa files from the App Store. Being so similar, I would imagine it'd be pretty easy.
Where do I start? Has anybody on this forum done this? Thanks for your time!
[edit] - I just wanted to let you know I've discovered RjDj, an iOS app that allows users to create 'scenes' in Pure Data (Pd) and load them into the RjDj program. I'd rather not go this route.
[edit2] - OK, I agree that it's very different, especially having 4 (I could cut it down to 3) additional frameworks that aren't part of the SDK. But I've been thinking: I can add a JavaScript object inside my program, or make a special new object using C (an object in Max is sort of like a class in JS, I think). Is there anything in these languages that would be able to convert a simple 'touch' to a 'mouse click' in my app?
My application is very, very simple: basically just samples played at randomly generated time intervals, with a 'conductor' to bring in and out the groups the samples are drawn from (piano, fx, etc.). The user just clicks the 'start' button and off it goes, so the .nib file I would need to create is very simple. In my head, the .ipa package and the OS X .app both contain Unix executables, and as long as these are basically the same it should work, right?
Max 6 has been released, with a new object/concept named gen~.
As far as I have discussed with the Cycling '74 developers, gen~ WILL provide its source code output. The code produced by the gen~ object could be usable in any other framework; basically, it will be C++.
That would really open A LOT of possibilities: Max would become a real graphical framework producing output that can be used in the wider programming world, and it would save time for some parts of the code.
As far as I can see from poking around at the Cycling '74 site and forums, there's currently no Max engine available for iOS. libpd is probably your best bet, really. (I'd note that the Inception app uses this Pure Data engine with a custom interface and it works very well.)
Unfortunately, OS X and iOS apps are completely different under the hood. Outwardly they look similar (e.g. you've noted the .app extension), but the internals are completely different.