Watson keyword spotting in Unity - unity3d

I have downloaded the Watson Unity SDK and set it up as shown in the picture, and it works.
My question is: how do I add keyword spotting?
I have read this question: "For Watson's Speech-To-Text Unity SDK, how can you specify keywords?"
But I can't, for example, locate the SendStart function.

The Speech to Text service does not find keywords. To find keywords you would need to take the final text output and send it to the AlchemyLanguage service. The Natural Language Understanding service is still being abstracted into the Watson Unity SDK and will eventually replace AlchemyLanguage.
private AlchemyAPI m_AlchemyAPI = new AlchemyAPI();

private void FindKeywords(string speechToTextFinalResponse)
{
    // Send the final Speech to Text transcript to AlchemyLanguage for keyword extraction.
    if (!m_AlchemyAPI.ExtractKeywords(OnExtractKeywords, speechToTextFinalResponse))
        Log.Debug("ExampleAlchemyLanguage", "Failed to get keywords.");
}

void OnExtractKeywords(KeywordData keywordData, string data)
{
    Log.Debug("ExampleAlchemyLanguage", "GetKeywordsResult: {0}", JsonUtility.ToJson(keywordData));
}
EDIT 1
Natural Language Understanding has been abstracted into the Watson Unity SDK.
NaturalLanguageUnderstanding m_NaturalLanguageUnderstanding = new NaturalLanguageUnderstanding();
private static fsSerializer sm_Serializer = new fsSerializer();

private void FindKeywords(string speechToTextFinalResponse)
{
    Parameters parameters = new Parameters()
    {
        text = speechToTextFinalResponse,
        return_analyzed_text = true,
        language = "en",
        features = new Features()
        {
            entities = new EntitiesOptions()
            {
                limit = 50,
                sentiment = true,
                emotion = true
            },
            keywords = new KeywordsOptions()
            {
                limit = 50,
                sentiment = true,
                emotion = true
            }
        }
    };

    if (!m_NaturalLanguageUnderstanding.Analyze(OnAnalyze, parameters))
        Log.Debug("ExampleNaturalLanguageUnderstanding", "Failed to analyze.");
}

private void OnAnalyze(AnalysisResults resp, string customData)
{
    fsData data = null;
    sm_Serializer.TrySerialize(resp, out data).AssertSuccess();
    Log.Debug("ExampleNaturalLanguageUnderstanding", "AnalysisResults: {0}", data.ToString());
}
EDIT 2
Sorry, I didn't realize Speech to Text had the ability to do keyword spotting. Thanks to Nathan for pointing that out to me! I have added this functionality to a future release of Speech to Text in the Unity SDK. It will look like this in the Watson Unity SDK 1.0.0:
void Start()
{
    // Create credential and instantiate service
    Credentials credentials = new Credentials(_username, _password, _url);
    _speechToText = new SpeechToText(credentials);

    // Add keywords
    List<string> keywords = new List<string>();
    keywords.Add("speech");
    _speechToText.KeywordsThreshold = 0.5f;
    _speechToText.Keywords = keywords.ToArray();
    _speechToText.Recognize(_audioClip, HandleOnRecognize);
}

private void HandleOnRecognize(SpeechRecognitionEvent result)
{
    if (result != null && result.results.Length > 0)
    {
        foreach (var res in result.results)
        {
            foreach (var alt in res.alternatives)
            {
                string text = alt.transcript;
                Log.Debug("ExampleSpeechToText", string.Format("{0} ({1}, {2:0.00})\n", text, res.final ? "Final" : "Interim", alt.confidence));
                if (res.final)
                    _recognizeTested = true;
            }
            if (res.keywords_result != null && res.keywords_result.keyword != null)
            {
                foreach (var keyword in res.keywords_result.keyword)
                {
                    Log.Debug("ExampleSpeechToText", "keyword: {0}, confidence: {1}, start time: {2}, end time: {3}", keyword.normalized_text, keyword.confidence, keyword.start_time, keyword.end_time);
                }
            }
        }
    }
}
Currently you can find the refactor branch here. This release is a breaking change and has all of the higher-level functionality (widgets, config, etc.) removed.

Related

Azure notification hub Registration push variables

I am currently using Azure Notification Hub (FCM) to send one-to-one notifications to a user, as well as notifications to a group of users by using tags (5,000 - 10,000 users at a time).
Now, while sending a notification to a group, I want some personalization, like:
Hi ABC<$(firstname of user1)>, here is new AAAAA for you today.
Hi XYZ<$(firstname of user2)>, here is new AAAAA for you today.
.
.
Hi ZZZ<$(firstname of user5000)>, here is new AAAAA for you today.
I read that this is possible by using push variables with the native registration/installation SDK.
Ref: https://azure.microsoft.com/en-in/blog/updates-from-notification-hubs-independent-nuget-installation-model-pmt-and-more/
But I could not find any option in the registration/installation Java SDK to set these values.
Registration registration = new FcmRegistration(id, token);
registration.getTags().add(tagname);
hub.createRegistration(registration);
Installation installation = new Installation(name);
installation.setPushChannel(token);
installation.setPlatform(NotificationPlatform.Gcm);
installation.addTag(tagname);
hub.createOrUpdateInstallation(installation);
Any help is really appreciated; otherwise, for group notifications I have to send a notification to each user via iteration, and that defeats the benefit of using tags and getting the job done in just one hub API call.
You are correct - this is exactly what ANH templates are for. You can read this blog post about them for some background knowledge. Essentially, once you've created a template, you can do a template send operation that provides just the values that need to be injected. That is, your Installation record will have set the appropriate body:
"Hi $(firstname), here is new $(subject) for you today."
and your send operation provides the values to inject:
{
"firstname": "ABC",
"subject": "AAAAA"
}
Also, make sure to specify the correct tags to scope the audience, in this case something like "ABC" to specify the user, and "new-daily" to specify which templates should be used.
Another trick: you can skip a layer of tag management and send fewer requests by embedding the $(firstname) in the template itself.
"Hi ABC, here is new $(subject) for you today."
Because templates are stored per device, each device can have a separate name embedded in it, reducing the number of tags you need to tinker with. This would make the body you send just:
{
"subject": "AAAAA"
}
and you only need to scope with the tag "new-daily".
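For reference, here is a minimal sketch of that flow, assuming the Microsoft.Azure.NotificationHubs .NET SDK (the Java SDK exposes equivalent installation and template-send operations); the connection string, hub name, IDs, and the "daily" template name are all placeholders:
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.NotificationHubs;

class TemplateSendSketch
{
    static async Task Main()
    {
        // Placeholders - use your own hub connection string and hub name.
        var hub = NotificationHubClient.CreateClientFromConnectionString(
            "<connection-string>", "<hub-name>");

        // Register the device with a per-device template; the first name is
        // baked into the body, so it never needs to travel with the send.
        var installation = new Installation
        {
            InstallationId = "<installation-id>",
            PushChannel = "<fcm-token>",
            Platform = NotificationPlatform.Fcm,
            Tags = new List<string> { "new-daily" },
            Templates = new Dictionary<string, InstallationTemplate>
            {
                ["daily"] = new InstallationTemplate
                {
                    Body = "{\"notification\":{\"body\":\"Hi ABC, here is new $(subject) for you today.\"}}"
                }
            }
        };
        await hub.CreateOrUpdateInstallationAsync(installation);

        // One template send fans out to every device tagged "new-daily";
        // each device's stored template fills in its own text.
        await hub.SendTemplateNotificationAsync(
            new Dictionary<string, string> { ["subject"] = "AAAAA" },
            "new-daily");
    }
}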
Looks like you're on the right track with templating. When you embed an expression into surrounding text, you're effectively doing concatenation, which requires the expression to be wrapped in { }. See the documentation on the template expression language for Azure Notification Hubs, where it states that "when using concatenation, expressions must be wrapped in curly brackets."
In your case, I think you want something along the lines of:
...
{"title":"{'Seattle Kraken vs. ' + $(opponent) + ' starting soon'}",...}
...
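To make that concrete, a complete (hypothetical) FCM template body using concatenation could look like the following; opponent and time are example properties, not anything from your hub:
{
  "notification": {
    "title": "{'Seattle Kraken vs. ' + $(opponent) + ' starting soon'}",
    "body": "{'Puck drop is at ' + $(time) + '.'}"
  }
}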
Thanks a lot! I got it working by extending the API classes on my own in the following manner, as the blog suggested.
package com.springbootazure.controller;
import java.util.Map;
import java.util.HashMap;
import com.google.gson.GsonBuilder;
import com.windowsazure.messaging.FcmRegistration;
public class PushRegistration extends FcmRegistration {
private static final String FCM_NATIVE_REGISTRATION1 = "<?xml version=\"1.0\" encoding=\"utf-8\"?><entry xmlns=\"http://www.w3.org/2005/Atom\"><content type=\"application/xml\"><GcmRegistrationDescription xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://schemas.microsoft.com/netservices/2010/10/servicebus/connect\">";
private static final String FCM_NATIVE_REGISTRATION2 = "<GcmRegistrationId>";
private static final String FCM_NATIVE_REGISTRATION3 = "</GcmRegistrationId></GcmRegistrationDescription></content></entry>";
private Map<String, String> pushVariables = new HashMap<>();
public PushRegistration() {
super();
}
public PushRegistration(String registrationId,
String fcmRegistrationId, Map<String, String> pushVariables) {
super(registrationId, fcmRegistrationId);
this.pushVariables = pushVariables;
}
public PushRegistration(String fcmRegistrationId, Map<String, String> pushVariables) {
super(fcmRegistrationId);
this.pushVariables = pushVariables;
}
@Override
public int hashCode() {
final int prime = 31;
int result = super.hashCode();
result = prime * result
+ ((pushVariables == null) ? 0 : pushVariables.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!super.equals(obj)) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
PushRegistration other = (PushRegistration) obj;
if (pushVariables == null) {
if (other.pushVariables != null) {
return false;
}
} else if (!pushVariables.equals(other.pushVariables)) {
return false;
}
return true;
}
protected String getPushVariablesXml() {
    StringBuilder buf = new StringBuilder();
    // Serialize the push variables map as JSON inside a <PushVariables> element.
    if (pushVariables != null && !pushVariables.isEmpty()) {
        buf.append("<PushVariables>");
        buf.append(new GsonBuilder().disableHtmlEscaping().create().toJson(pushVariables));
        buf.append("</PushVariables>");
    }
    return buf.toString();
}
@Override
public String getXml() {
return FCM_NATIVE_REGISTRATION1 +
getTagsXml() +
getPushVariablesXml() +
FCM_NATIVE_REGISTRATION2 +
fcmRegistrationId +
FCM_NATIVE_REGISTRATION3;
}
}
And afterwards, register a token using:
Map<String, String> pushVariables = new HashMap<>();
pushVariables.put("firstname", "Gaurav");
pushVariables.put("lastname", "Aggarwal");
Registration registration;
if (pushVariables != null && !pushVariables.isEmpty()) {
    registration = new PushRegistration(name, token, pushVariables);
} else {
    // Fall back to a plain registration when there are no push variables.
    registration = new FcmRegistration(name, token);
}
registration.getTags().add(tagname);
registration.getTags().add(category);
hub.createRegistration(registration);
And then send a notification like:
Notification n = Notification.createFcmNotifiation("{\n" +
" \"notification\" : {\n" +
" \"body\" : \"{ $(firstname) + ' starting soon'}\",\n" +
" \"title\" : \"test title\"\n" +
" }\n" +
"}");
hub.sendNotification(n, tagname);

Unity - Google cloud speech-to-text voice recognition, Unity freezes after successful result

A friend of mine and I are working on a VR project in Unity at the moment, and we are trying to implement voice recognition as a feature. We are using Unity version 2018.3.3f1. The idea is that a user can say a word and the voice recognition will check whether they pronounced it correctly. We have chosen to use the Google Cloud speech-to-text service for this, as it supports the target language (Norwegian). In addition, the application is multiplayer, so we are trying to use the streaming version of Google Cloud speech. Here is a link to their documentation: https://cloud.google.com/speech-to-text/docs/streaming-recognize
What we have done is to have a plugin that essentially runs the speech recognition for us. It is a modification of the example code given in the link above:
public Task<bool> StartSpeechRecognition()
{
return StreamingMicRecognizeAsync(20, "fantastisk");
}
static async Task<bool> StreamingMicRecognizeAsync(int inputTime, string inputWord)
{
bool speechSuccess = false;
Stopwatch timer = new Stopwatch();
Task delay = Task.Delay(TimeSpan.FromSeconds(1));
if (NAudio.Wave.WaveIn.DeviceCount < 1)
{
//Console.WriteLine("No microphone!");
return false;
}
var speech = SpeechClient.Create();
var streamingCall = speech.StreamingRecognize();
// Write the initial request with the config.
await streamingCall.WriteAsync(
new StreamingRecognizeRequest()
{
StreamingConfig = new StreamingRecognitionConfig()
{
Config = new RecognitionConfig()
{
Encoding =
RecognitionConfig.Types.AudioEncoding.Linear16,
SampleRateHertz = 16000,
LanguageCode = "nb",
},
InterimResults = true,
}
});
// Compare speech with the input word, finish if they are the same and speechSuccess becomes true.
Task compareSpeech = Task.Run(async () =>
{
while (await streamingCall.ResponseStream.MoveNext(
default(CancellationToken)))
{
foreach (var result in streamingCall.ResponseStream
.Current.Results)
{
foreach (var alternative in result.Alternatives)
{
if (alternative.Transcript.Replace(" ", String.Empty).Equals(inputWord, StringComparison.InvariantCultureIgnoreCase))
{
speechSuccess = true;
return;
}
}
}
}
});
// Read from the microphone and stream to API.
object writeLock = new object();
bool writeMore = true;
var waveIn = new NAudio.Wave.WaveInEvent();
waveIn.DeviceNumber = 0;
waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1);
waveIn.DataAvailable +=
(object sender, NAudio.Wave.WaveInEventArgs args) =>
{
lock (writeLock)
{
if (!writeMore) return;
streamingCall.WriteAsync(
new StreamingRecognizeRequest()
{
AudioContent = Google.Protobuf.ByteString
.CopyFrom(args.Buffer, 0, args.BytesRecorded)
}).Wait();
}
};
waveIn.StartRecording();
timer.Start();
//Console.WriteLine("Speak now.");
//Delay continues as long as a match between the speech and the input word has not been found, and the time passed since recording started is lower than inputTime.
while (!speechSuccess && timer.Elapsed.TotalSeconds <= inputTime)
{
await delay;
}
// Stop recording and shut down.
waveIn.StopRecording();
timer.Stop();
lock (writeLock) writeMore = false;
await streamingCall.WriteCompleteAsync();
await compareSpeech;
//Console.WriteLine("Finished.");
return speechSuccess;
}
We made a small project in Unity to test if this was working with a cube GameObject that had this script:
private CancellationTokenSource tokenSource;
VR_VoiceRecognition.VoiceRecognition voice = new VR_VoiceRecognition.VoiceRecognition();
IDisposable speech;
// Use this for initialization
void Start() {
speech = Observable.FromCoroutine(WaitForSpeech).Subscribe();
}
// Update is called once per frame
void Update() {
}
IEnumerator WaitForSpeech()
{
tokenSource = new CancellationTokenSource();
CancellationToken token = tokenSource.Token;
Debug.Log("Starting up");
Task<bool> t = Task.Run(() => voice.StartSpeechRecognition());
while (!(t.IsCompleted || t.IsCanceled))
{
yield return null;
}
if (t.Status != TaskStatus.RanToCompletion)
{
yield break;
}
else
{
bool result = t.Result;
UnityEngine.Debug.Log(t.Result);
yield return result;
}
}
void OnApplicationQuit()
{
print("Closing application.");
speech.Dispose();
}
We are also using a plugin, UniRx (https://assetstore.unity.com/packages/tools/integration/unirx-reactive-extensions-for-unity-17276), which Unity support recommended to us because they thought it might offer a workaround.
At the moment it works fine when you play it in the editor for the first time. When the voice recognition returns false, everything is fine (two cases where this happens are when it cannot find a microphone or when the user does not say the specific word). However, when it succeeds it still correctly returns true, but if you exit play mode in the editor and try to play again, Unity freezes. Unity support suspects that it might have something to do with the Google .dll files or the Google API. We are not quite sure what to do from here, and we hope that someone can point us in the right direction.

How do I integrate Watson Text to Speech with Speech to Text in Unity

I'm building an AR CV app in Unity using the Watson SDK. I'm a complete noob, but I've managed to follow the videos and create something kinda cool.
The idea is that it will give the candidate a more interesting way to describe themselves than a sheet of paper. My problem is that while I've managed to get speech-to-text streaming done, I don't know what my next steps are. It's for a university project, but my tutor doesn't know either. Also, if TAJ reads this, thank you so much for those YouTube videos!
My question is: how do I add text to speech and Assistant?
The basic idea here is that you will use the Watson Unity SDK services to bring in speech via the microphone and convert it to text. You shouldn't send this text back to Text to Speech, since it's what you just input (unless that's what you want). This text can be used in many ways. One way would be to use the Watson Assistant service and create a kind of script that you can use in natural language. The output of the Message method is text that you could feed into Watson Text to Speech, resulting in an audio clip that can be played back. Essentially, from the StreamingExample:
private void OnRecognize(SpeechRecognitionEvent result, Dictionary<string, object> customData)
{
if (result != null && result.results.Length > 0)
{
foreach (var res in result.results)
{
foreach (var alt in res.alternatives)
{
// Is this the final result for the utterance?
if (res.final)
{
MessageRequest messageRequest = new MessageRequest()
{
Input = new MessageInput()
{
Text = alt.transcript
}
};
// Send the text to Assistant
assistant.Message(OnMessage, OnFail, assistantId, sessionId, messageRequest);
}
}
}
}
}
private void OnMessage(MessageResponse response, Dictionary<string, object> customData)
{
// Send Assistant output to TextToSpeech
textToSpeech.ToSpeech(OnSynthesize, OnFail, response.output.generic[0].text, true);
}
private void OnSynthesize(AudioClip clip, Dictionary<string, object> customData)
{
// Play the clip from TextToSpeech
PlayClip(clip);
}
private void PlayClip(AudioClip clip)
{
if (Application.isPlaying && clip != null)
{
GameObject audioObject = new GameObject("AudioObject");
AudioSource source = audioObject.AddComponent<AudioSource>();
source.spatialBlend = 0.0f;
source.loop = false;
source.clip = clip;
source.Play();
Destroy(audioObject, clip.length);
}
}
You will need to properly instantiate and authenticate the services.
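As a rough sketch of that setup (the constructors and callback signatures here assume the same SDK generation as the snippet above; all usernames, passwords, URLs, and IDs are placeholders, and newer SDK releases use IAM API keys instead):
using System.Collections.Generic;
using IBM.Watson.DeveloperCloud.Connection;
using IBM.Watson.DeveloperCloud.Logging;
using IBM.Watson.DeveloperCloud.Services.Assistant.v2;
using IBM.Watson.DeveloperCloud.Services.SpeechToText.v1;
using IBM.Watson.DeveloperCloud.Services.TextToSpeech.v1;
using IBM.Watson.DeveloperCloud.Utilities;
using UnityEngine;

public class WatsonServiceSetup : MonoBehaviour
{
    private SpeechToText _speechToText;
    private Assistant _assistant;
    private TextToSpeech _textToSpeech;
    private string _assistantId = "<assistant-id>";
    private string _sessionId;

    void Start()
    {
        // Each service gets its own credentials and endpoint.
        _speechToText = new SpeechToText(
            new Credentials("<stt-username>", "<stt-password>", "https://stream.watsonplatform.net/speech-to-text/api"));

        _assistant = new Assistant(
            new Credentials("<assistant-username>", "<assistant-password>", "https://gateway.watsonplatform.net/assistant/api"));
        _assistant.VersionDate = "2018-09-20";

        _textToSpeech = new TextToSpeech(
            new Credentials("<tts-username>", "<tts-password>", "https://stream.watsonplatform.net/text-to-speech/api"));

        // Assistant v2 needs a session before you can send messages.
        _assistant.CreateSession(OnCreateSession, OnFail, _assistantId);
    }

    private void OnCreateSession(SessionResponse response, Dictionary<string, object> customData)
    {
        _sessionId = response.session_id;
    }

    private void OnFail(RESTConnector.Error error, Dictionary<string, object> customData)
    {
        Log.Error("WatsonServiceSetup", "Call failed: {0}", error.ToString());
    }
}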

On unity3d, my IBM chatbot can't recognize sequential text

This is another Unity Watson SDK question.
I solved the first-talk issue by just making a fake object again.
Here is another thing:
In my chatbot I can see sequential text if there are the same intents.
What line do I have to change or add to make that happen?
(Another question: what line do I have to change or add to bring the 'jump to' method into my Unity app?)
using IBM.Watson.DeveloperCloud.Services.Conversation.v1;
using IBM.Watson.DeveloperCloud.Utilities;
using System;
using System.Collections.Generic;
using UnityEngine;
class Watson : MonoBehaviour{
static Credentials credentials;
static Conversation _conversation;
void Start()
{
credentials = new Credentials("xx-xx-xx-xx-xx", "xx", "https://gateway.watsonplatform.net/conversation/api");
// credentials.Url = "";
_conversation = new Conversation(credentials);
}
static Action<string, ManagerChat.Feel, bool> Act;
public static void GoMessage(string _str,Action<string, ManagerChat.Feel,bool> _act)
{
if (!_conversation.Message(OnMessage, "xx-xx-xx-xx-xx", _str))
Debug.Log("ExampleConversation Failed to message!");
Act = _act;
}
static bool GetIntent(Dictionary<string, object> respDict)
{
object intents;
respDict.TryGetValue("intents", out intents);
object intentString = new object();
object confidenceString = new object();
foreach (var intentObj in (intents as List<object>))
{
Dictionary<string, object> intentDict = intentObj as Dictionary<string, object>;
intentDict.TryGetValue("intent", out intentString);
intentDict.TryGetValue("confidence", out confidenceString);
}
string str = intentString as string;
if (str == "6사용자_마무리")
return true;
return false;
}
static string GetOutput(Dictionary<string, object> respDict)
{
object outputs;
respDict.TryGetValue("output", out outputs);
object output;
(outputs as Dictionary<string, object>).TryGetValue("text", out output);
string var = (output as List<object>)[0] as string;
return var;
}
static ManagerChat.Feel GetEntities(Dictionary<string, object> respDict)
{
object entities;
respDict.TryGetValue("entities", out entities);
List<object> entitieList = (entities as List<object>);
if(entitieList.Count == 0)
{
return ManagerChat.Feel.Normal;
}
else
{
object entitie;
(entitieList[0] as Dictionary<string, object>).TryGetValue("value", out entitie);
ManagerChat.Feel feel = ManagerChat.Feel.NONE;
string str = entitie as string;
switch (str)
{
case "Happy":
feel = ManagerChat.Feel.Happy;
break;
case "Expect":
feel = ManagerChat.Feel.Expect;
break;
case "Born":
feel = ManagerChat.Feel.Born;
break;
case "Sad":
feel = ManagerChat.Feel.Sad;
break;
case "Surprise":
feel = ManagerChat.Feel.Surprise;
break;
case "Normal":
feel = ManagerChat.Feel.Normal;
break;
default:
break;
}
return feel;
}
}
static void OnMessage(object resp, string data)
{
Dictionary<string, object> respDict = resp as Dictionary<string, object>;
bool flag = (GetIntent(respDict));
string output = (GetOutput(respDict));
ManagerChat.Feel feel = GetEntities(respDict);
// Debug.Log(resp);
// Debug.Log(data);
Act(output,feel, flag);
}
}
I don't quite follow your question about sequential texts, but I think you may be trying to do more in your application code than is needed.
I'll just give a high-level pattern for how to use the SDK and see if that clears things up. I'm no Unity dev, but the pattern is the same no matter the language.
You only need to give Watson the user's input text and the existing context variables - most importantly the system context, but any custom context you have created is valuable as well.
Then Watson will return an output.text object, which you post to the user, and Watson also returns an updated system context.
Next, the user types something new; your app grabs that text, passes it to Watson along with the context object it returned last time, and the process repeats.
You should not need to do anything in your app code for jump-tos, sequential text, etc., as that's all handled by Watson.
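As a rough illustration in the same style as your code, here is a minimal sketch of that loop; it assumes the Conversation.v1 service from your snippet has a Message overload accepting a MessageRequest (the field names may differ slightly between SDK releases), and it slots into your Watson class:
// Keep the context Watson returned on the previous turn.
static Dictionary<string, object> _lastContext = null;

public static void SendTurn(string userText)
{
    MessageRequest request = new MessageRequest()
    {
        input = new Dictionary<string, object> { { "text", userText } },
        // Echoing the previous context back is what lets Watson handle
        // jump-tos and sequential response variations for you.
        context = _lastContext
    };

    if (!_conversation.Message(OnTurn, "xx-xx-xx-xx-xx", request))
        Debug.Log("ExampleConversation failed to message!");
}

static void OnTurn(object resp, string data)
{
    Dictionary<string, object> respDict = resp as Dictionary<string, object>;

    // Save the updated context (including the system context) for the next turn.
    object context;
    respDict.TryGetValue("context", out context);
    _lastContext = context as Dictionary<string, object>;

    // Then read output.text and post it to the user, as in GetOutput above.
}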
The only feature I can think of for the sequential piece is the response variations for a single dialog node. This feature means that if a user visits a specific dialog node multiple times, you will give different responses, either in sequential order or random order, if there are multiple. It does require more than one input from the user, navigating to the same node multiple times. This is to give your bot some variation, most valuable for common inputs like 'hello', 'goodbye', 'thanks', etc.
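For completeness, that variation behavior is configured on the dialog node itself, not in your app code. In the workspace JSON it looks roughly like this (a sketch; the exact schema depends on your Conversation/Assistant version):
"output": {
  "text": {
    "values": [
      "Hello!",
      "Hi again!",
      "Welcome back!"
    ],
    "selection_policy": "sequential"
  }
}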

UWP trying to run background service throwing exception

I am trying to run a background service in a UWP application. I first check whether the application has background permission; if yes, then I register the service to run.
This code was working fine until I updated Visual Studio along with the Windows 10 SDK to the Creators Update version. Now I can't figure out whether this update changes things for registering a background service.
using System;
using Windows.ApplicationModel.Background;
using BackgroundService;
using SampleApp.Config;
namespace SampleApp.Background
{
class BackgroundClass
{
LocalConfig LC = new LocalConfig();
public async void RequestBackgroundAccess()
{
var result = await BackgroundExecutionManager.RequestAccessAsync();
switch (result)
{
case BackgroundAccessStatus.AllowedMayUseActiveRealTimeConnectivity:
break;
case BackgroundAccessStatus.AllowedWithAlwaysOnRealTimeConnectivity:
break;
case BackgroundAccessStatus.Denied:
break;
case BackgroundAccessStatus.Unspecified:
break;
}
}
public async void RegisterBackgroundSync()
{
var trigger = new ApplicationTrigger();
var condition = new SystemCondition(SystemConditionType.InternetAvailable);
if (!LC.BackgroundSyncStatusGET())
{
var task = new BackgroundTaskBuilder
{
Name = nameof(BackgroundSync),
CancelOnConditionLoss = true,
TaskEntryPoint = typeof(BackgroundSync).ToString(),
};
task.SetTrigger(trigger);
task.AddCondition(condition);
task.Register();
LC.BackgroundSyncStatusSET(true);
}
await trigger.RequestAsync(); //EXCEPTION HAPPENS AT THIS LINE
}
public void RegisterBackgroundService(uint time)
{
var taskName = "BackgroundService";
foreach (var unregisterTask in BackgroundTaskRegistration.AllTasks)
{
if (unregisterTask.Value.Name == taskName)
{
unregisterTask.Value.Unregister(true);
}
}
if(time != 0)
{
var trigger = new TimeTrigger(time, false);
var condition = new SystemCondition(SystemConditionType.InternetAvailable);
var task = new BackgroundTaskBuilder
{
Name = nameof(BackgroundService),
CancelOnConditionLoss = true,
TaskEntryPoint = typeof(BackgroundService).ToString(),
};
task.SetTrigger(trigger);
task.AddCondition(condition);
task.Register();
}
}
}
}
Now, while requesting, I check whether the background service is registered, to avoid re-registration issues. I am getting the following exception:
System.Runtime.InteropServices.COMException occurred
HResult=0x80004005
Message=Error HRESULT E_FAIL has been returned from a call to a COM component.
Source=Windows
StackTrace:
at Windows.ApplicationModel.Background.ApplicationTrigger.RequestAsync()
at SampleApp.Background.BackgroundClass.<RegisterBackgroundSync>d__2.MoveNext()
Please help.
Had this same problem; it was in my Windows 10 privacy settings.
System Settings => Privacy Settings
In the left-hand menu choose Background apps.
Check to make sure your app hasn't been blocked from running background tasks.
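You can also detect this in code before firing the trigger, instead of letting ApplicationTrigger.RequestAsync fail with E_FAIL. A minimal sketch, assuming the Creators Update (15063) access-status values:
using System.Threading.Tasks;
using Windows.ApplicationModel.Background;

public static class BackgroundAccessGuard
{
    // Returns true only when the app is actually allowed to run in the
    // background; call this before registering tasks or signalling triggers.
    public static async Task<bool> EnsureAccessAsync()
    {
        var status = await BackgroundExecutionManager.RequestAccessAsync();
        switch (status)
        {
            case BackgroundAccessStatus.AlwaysAllowed:
            case BackgroundAccessStatus.AllowedSubjectToSystemPolicy:
                return true;
            case BackgroundAccessStatus.DeniedByUser:
                // Blocked under Settings => Privacy => Background apps.
                return false;
            case BackgroundAccessStatus.DeniedBySystemPolicy:
            default:
                return false;
        }
    }
}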