Is it not possible to use curl with the Google Cloud Speech API to recognize 10 to 15 minute files?

I'm using REST API with cURL because I need to do something quick and simple, and I'm on a box that I can't start dumping garbage on; i.e. some thick developer SDK.
I started out base64 encoding flac files and initiating speech.syncrecognize.
That eventually failed with:
{
"error": {
"code": 400,
"message": "Request payload size exceeds the limit: 10485760.",
"status": "INVALID_ARGUMENT"
}
}
So okay, you can't send 31,284,578 bytes in the request; have to use Cloud Storage. So, I upload the flac audio file and try again using the file now in Cloud Storage. That fails with:
{
"error": {
"code": 400,
"message": "For audio inputs longer than 1 min, use the 'AsyncRecognize' method.",
"status": "INVALID_ARGUMENT"
}
}
Great, speech.syncrecognize doesn't like the content size; try again with speech.asyncrecognize. That fails with:
{
"error": {
"code": 400,
"message": "For audio inputs longer than 1 min, please use LINEAR16 encoding.",
"status": "INVALID_ARGUMENT"
}
}
Okay, so speech.asyncrecognize can only do LPCM; upload the file in pcm_s16le format and try again. So finally, I get an operation handle:
{
"name": "9174269756763138681"
}
Keep checking it, and eventually it's complete:
{
"name": "9174269756763138681",
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
}
}
So wait, after all that, with the result now sitting on a queue, there is no REST method to request the result? Someone please tell me that I've missed the glaringly obvious staring me right in the face, and that Google didn't create completely pointless, incomplete, REST API.

So the answer to the question is: no, it is not impossible. You can use curl with the Google Cloud Speech API to recognize 10 to 15 minute files, assuming you navigate and conform to a fairly tight set of constraints, at least in beta1.
What is not obvious from the documentation is that the result should be returned by the operations.get method, which would have been obvious had any of my attempts actually returned something other than empty results.
The source rate in my files is either 44,100 or 48,000 Hz, and I was setting sample_rate to the source native rate. However, contrary to the documentation which states:
Sample rate in Hertz of the audio data sent in all RecognitionAudio
messages. Valid values are: 8000-48000. 16000 is optimal. For best
results, set the sampling rate of the audio source to 16000 Hz. If
that's not possible, use the native sample rate of the audio source
(instead of re-sampling).
after re-sampling to 16,000 Hz I started to get results with operations.get.
I think it's worth noting that correlation does not imply causation. After re-sampling to 16,000 Hz the files become significantly smaller. Thus, I can't prove it's a sample rate issue, and not just the service choking on files over a certain size.
It's also worth noting that the documentation refers to the sample rate inconsistently. It appears that the gRPC API may be expecting sample_rate, and the REST API may be expecting sampleRate, according to their respective detailed definitions, in which case the Quickstart may be giving an incorrect example for the REST API.
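A rough Python sketch of the flow that ended up working is below. Everything in it is an assumption layered on the answer above: the base URL and the `operations/{name}` path follow the v1beta1 REST layout as described, the `sampleRate` spelling is the REST variant guessed at above, and the `results[].alternatives[].transcript` shape is assumed rather than shown anywhere in this thread. The helper names are illustrative only.

```python
import json
import urllib.request

# Assumed v1beta1 REST base URL; not confirmed by the thread above.
BASE = "https://speech.googleapis.com/v1beta1"


def build_async_request(gcs_uri: str, sample_rate: int = 16000) -> dict:
    """Build a request body for speech:asyncrecognize (camelCase field names assumed)."""
    return {
        "config": {"encoding": "LINEAR16", "sampleRate": sample_rate},
        "audio": {"uri": gcs_uri},
    }


def extract_transcripts(operation: dict) -> list:
    """Return the top transcript of each result, or [] if there are none yet."""
    if not operation.get("done"):
        return []
    # An empty "response" (as in the question) simply yields no results.
    results = operation.get("response", {}).get("results", [])
    return [
        alt["transcript"]
        for result in results
        for alt in result.get("alternatives", [])[:1]
    ]


def get_operation(name: str, api_key: str) -> dict:
    """Poll operations.get once (network call; api_key is a placeholder)."""
    url = f"{BASE}/operations/{name}?key={api_key}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

With the empty response from the question, `extract_transcripts` returns an empty list, which matches the "done but nothing to read" symptom described above.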

Related

Google Text-to-Speech API 400 error for specific paragraph

I have an integration set up with Google TTS generating audio for a daily Bloomberg email newsletter. It's been working reliably, but I recently received a 400 error for a specific group of sentences:
"message": "Request contains an invalid argument.",
"status": "INVALID_ARGUMENT"
Sending POST to https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize with this payload:
{
"input": {
"text": "Still it hangs together in a rough schematic way, which means you can try it in thecryptomarket. Crypto, particularly decentralized finance, has some key advantages for this, including:\n\n * Weird and fragmented liquidity, so that you can trade with yourself on a futures exchange, and you can move the price of a token a lot on the spot market;\n * A love of mechanical rules and automated markets, so that if your X position spikes from $1 million to $100 million, some decentralized finance platform will say “yup, now it’s worth $100 million, so it’s good collateral for a $40 million loan”; and\n * A presumption of anonymity, so exchanges will let you trade with yourself, and won’t be able to come after you for your losses, since they just have some anonymous wallet addresses."
},
"voice": {
"name": "en-US-Neural2-A",
"languageCode": "en-US"
},
"audioConfig": {
"pitch": -4,
"speakingRate": 1.2,
"audioEncoding": "MP3"
}
}
I've confirmed this behavior is consistent in Google's API explorer, so I think I've isolated the text content itself as the source of the error. I eliminated unescaped characters as a cause by trying a no-symbol version of the text that still failed. API documentation hasn't yielded any clues on allowable characters or other requirements.
The weirdest thing is that the text above works if I delete either the first sentence or the last sentence. Is Google checking the content itself for prohibited phrases or something???
Any other ideas to resolve this error?
I have been able to reproduce this exact same error with multiple AU, UK and US English Neural2 voices using the text you provided. I've also observed the same behavior with other random segments of my own text.
The workaround I've found (and which is working on the text you provided) is to simply use Wavenet voices instead.
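The workaround can be sketched as a small payload rewrite. This assumes a Wavenet voice exists with the same locale and letter suffix as the Neural2 voice (for example `en-US-Wavenet-A` in place of `en-US-Neural2-A`), which is worth checking against the voice list before relying on it; `with_wavenet_voice` is a hypothetical helper name.

```python
import copy


def with_wavenet_voice(payload: dict) -> dict:
    """Return a copy of a synthesize payload with a Neural2 voice swapped for Wavenet.

    Assumes the Wavenet voice name differs only in the "-Neural2-" segment.
    """
    fallback = copy.deepcopy(payload)  # leave the caller's payload untouched
    voice = fallback.get("voice", {})
    name = voice.get("name", "")
    if "-Neural2-" in name:
        voice["name"] = name.replace("-Neural2-", "-Wavenet-")
    return fallback
```

One way to use it is to retry with `with_wavenet_voice(payload)` only when the original request comes back with the 400 INVALID_ARGUMENT error.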

Github REST API - how to retrieve specific lines of codes (code snippet)

I would like to retrieve specific lines of codes via the REST API.
After a user has authorized access by connecting to their GitHub account (via the Web Application flow), I'd like to be able to programmatically retrieve a block of lines from a repo's file with the REST API.
On the github.com UI, it's quite easy to get only certain lines: you can select multiple lines and get a "permalink" such as this one, for example from line 3 to 7:
https://github.com/{username}/{repo_name}/blob/{specific file ex: ce3f225c2025556705353f8369097e760d063c6bbce3}/{file_path_in_the_repo}#L3-L7
With the API, however, I can't do it. I can get the code, but only for the WHOLE file, not restricted to certain lines, with:
https://api.github.com/repos/{username}/{repository_name}/contents/{file_path}
For example the following code works:
https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb
The result is
{
"name": "send_event_job.rb",
"path": "sentry-rails/app/jobs/sentry/send_event_job.rb",
"sha": "55314dd99703fc121516513a59e20377b2534f48",
"size": 980,
"url": "https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb?ref=master",
"html_url": "https://github.com/getsentry/sentry-ruby/blob/master/sentry-rails/app/jobs/sentry/send_event_job.rb",
"git_url": "https://api.github.com/repos/getsentry/sentry-ruby/git/blobs/55314dd99703fc121516513a59e20377b2534f48",
"download_url": "https://raw.githubusercontent.com/getsentry/sentry-ruby/master/sentry-rails/app/jobs/sentry/send_event_job.rb",
"type": "file",
"content": "aWYgZGVmaW5lZD8oQWN0aXZlSm9iKQogIG1vZHVsZSBTZW50cnkKICAgIHBh\ncmVudF9qb2IgPQogICAgICBpZiBkZWZpbmVkPyg6OkFwcGxpY2F0aW9uSm9i\nKSAmJiA6OkFwcGxpY2F0aW9uSm9iLmFuY2VzdG9ycy5pbmNsdWRlPyg6OkFj\ndGl2ZUpvYjo6QmFzZSkKICAgICAgICA6OkFwcGxpY2F0aW9uSm9iCiAgICAg\nIGVsc2UKICAgICAgICA6OkFjdGl2ZUpvYjo6QmFzZQogICAgICBlbmQKCiAg\nICBjbGFzcyBTZW5kRXZlbnRKb2IgPCBwYXJlbnRfam9iCiAgICAgICMgdGhl\nIGV2ZW50IGFyZ3VtZW50IGlzIHVzdWFsbHkgbGFyZ2UgYW5kIGNyZWF0ZXMg\nbm9pc2UKICAgICAgc2VsZi5sb2dfYXJndW1lbnRzID0gZmFsc2UgaWYgcmVz\ncG9uZF90bz8oOmxvZ19hcmd1bWVudHM9KQoKICAgICAgIyB0aGlzIHdpbGwg\ncHJldmVudCBpbmZpbml0ZSBsb29wIHdoZW4gdGhlcmUncyBhbiBpc3N1ZSBk\nZXNlcmlhbGl6aW5nIFNlbnRyeUpvYgogICAgICBpZiByZXNwb25kX3RvPyg6\nZGlzY2FyZF9vbikKICAgICAgICBkaXNjYXJkX29uIEFjdGl2ZUpvYjo6RGVz\nZXJpYWxpemF0aW9uRXJyb3IKICAgICAgZWxzZQogICAgICAgICMgbWltaWMg\nd2hhdCBkaXNjYXJkX29uIGRvZXMgZm9yIFJhaWxzIDUuMAogICAgICAgIHJl\nc2N1ZV9mcm9tIEFjdGl2ZUpvYjo6RGVzZXJpYWxpemF0aW9uRXJyb3IgZG8K\nICAgICAgICAgIGxvZ2dlci5lcnJvciAiRGlzY2FyZGVkICN7c2VsZi5jbGFz\nc30gZHVlIHRvIGEgI3tleGNlcHRpb259LiBUaGUgb3JpZ2luYWwgZXhjZXB0\naW9uIHdhcyAje2Vycm9yLmNhdXNlLmluc3BlY3R9LiIKICAgICAgICBlbmQK\nICAgICAgZW5kCgogICAgICBkZWYgcGVyZm9ybShldmVudCwgaGludCA9IHt9\nKQogICAgICAgIFNlbnRyeS5zZW5kX2V2ZW50KGV2ZW50LCBoaW50KQogICAg\nICBlbmQKICAgIGVuZAogIGVuZAplbHNlCiAgbW9kdWxlIFNlbnRyeQogICAg\nY2xhc3MgU2VuZEV2ZW50Sm9iOyBlbmQKICBlbmQKZW5kCgo=\n",
"encoding": "base64",
"_links": {
"self": "https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb?ref=master",
"git": "https://api.github.com/repos/getsentry/sentry-ruby/git/blobs/55314dd99703fc121516513a59e20377b2534f48",
"html": "https://github.com/getsentry/sentry-ruby/blob/master/sentry-rails/app/jobs/sentry/send_event_job.rb"
}
}
But if I add L3-L7, like below, it does not change anything. I would have liked it to change, for example, the download_url so that it only includes lines 3 to 7:
https://api.github.com/repos/getsentry/sentry-ruby/contents/sentry-rails/app/jobs/sentry/send_event_job.rb#L3-L7
I can't find in the GitHub Docs which URL to call to retrieve this type of multi-line code snippet PROGRAMMATICALLY with the REST API.
Note: I know how to get the whole "download_url": https://raw.githubusercontent.com/getsentry/sentry-ruby/master/sentry-rails/app/jobs/sentry/send_event_job.rb file and then parse it to only keep line X to line Y, but I would like to know if there's a direct API command to do what you can easily do with the UI.
Thanks
GitHub's REST API does not provide a way to extract just a few lines of a file. In the web interface, you get the entire rendered file with just a few lines highlighted, not just a snippet.
This is because extracting a limited number of lines from a file is actually much more work than extracting the entire file. All files are stored as Git blobs, and there isn't a way to extract only certain lines from a blob without reading the entire file up to that point, since blobs are stored compressed. Therefore, GitHub would actually expend much more effort to read the entire file into memory and then restrict it to just the lines you wanted, and as a result, such an API would be much more restricted and would not be able to handle files that were nearly as large.
Also, in some cases, there is no sane answer to what constitutes a line. While Git normally wants files to be stored with LF endings, if a file has been checked in with CRLF endings, should those be handled? (If so, that's additional work to properly handle them.) If you have a binary file, like a JPEG, there are no lines. Similarly, while files in UTF-16 probably have lines, Git considers them binary files, so they probably wouldn't be able to be handled.
Note that the reason your #L3-L7 doesn't work as part of the API, besides the API not supporting it, is that this is a fragment and is generally not sent to the server. It's supposed to identify a specific portion of a document, and that's typically done client-side, in the web browser. Since with your API request there is no client to do this, the server doesn't even see that part of your request.
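Since the API only returns whole files, the slicing has to happen on the client. A minimal Python sketch, assuming the file is UTF-8 text and that you already have the base64 `content` field from the contents response shown in the question; `extract_lines` and `fragment_of` are illustrative names, not part of any GitHub library.

```python
import base64
from urllib.parse import urlsplit


def extract_lines(content_b64: str, start: int, end: int) -> list:
    """Return lines start..end (1-based, inclusive) of a base64-encoded text file."""
    # b64decode ignores the newlines GitHub inserts into the encoded content.
    text = base64.b64decode(content_b64).decode("utf-8")
    # Split on LF only and drop a trailing CR so CRLF files line up with the UI.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    return lines[start - 1:end]


def fragment_of(url: str) -> str:
    """Return the #fragment of a URL, the part that is not sent to the server."""
    return urlsplit(url).fragment
```

Calling `extract_lines(response["content"], 3, 7)` on the decoded JSON response would give the same five lines that the `#L3-L7` permalink highlights in the web UI.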

Rest API designing PUT vs PATCH

I am developing 2 REST APIs which edit and pause something at my backend.
For editing I was using:
PUT /video/1
What is the best way to develop a pause video service? Should I use PATCH or PUT for this? The input would be just the id. If I use PUT, how can I differentiate between edit and pause? And if I have another API to develop, e.g. video restart, how can I accommodate these verbs in a REST API?
Distinguishing the state using the HTTP method only is a poor idea. What you can do is:
Introduce state, and then use PATCH to change the state:
PATCH /videos/1
{
"state": "PLAYING|PAUSED|STOPPED" // what you need here
}
Mind that you don't patch like an idiot; patching like an idiot is, however, common.
Introduce new endpoints that will reflect the operation invoked on the resource - this is not fully RESTful, however also common:
POST /videos/1/play/
POST /videos/1/stop/
POST /videos/1/pause/
PUT for editing is OK of course; however, remember that PUT is idempotent and requires the whole resource to be sent.
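The first option, a `state` field changed through PATCH, might look like this on the server side. This is only a sketch with a plain dict standing in for the stored video; `apply_patch` and the three state names are taken from the example above or made up for illustration, not from any framework.

```python
ALLOWED_STATES = {"PLAYING", "PAUSED", "STOPPED"}


def apply_patch(video: dict, patch: dict) -> dict:
    """Apply a PATCH body to a video resource, touching only the fields it names."""
    updated = dict(video)  # PATCH is partial: everything not mentioned is kept
    if "state" in patch:
        if patch["state"] not in ALLOWED_STATES:
            raise ValueError(f"unknown state: {patch['state']!r}")
        updated["state"] = patch["state"]
    return updated
```

A PUT handler, by contrast, would replace the stored resource with the request body as a whole, which is why it suits editing but not a one-field pause.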
I do not agree with @Opal's answer here, hence I post this answer. I do feel you use the wrong tools (or terms) to achieve what you want. REST is more than just an HTTP invocation via a cleanly designed URI. As proposed by @Opal in a comment on his answer, WebSockets might be what you are looking for, though REST may be able to serve your needs as well (as plain HTTP would, too).
Pausing a video
It should not be the task of the HTTP server to stop the video but that of the client. Usually partial GET requests are sent to the server, retrieving only a portion of the resource and adding it to a buffer which the client reads. In the background the client side will issue further partial requests to keep the buffer filled while the client is reading it. If the client wants to pause, it simply stops reading the buffer and optionally stops sending further partial GET requests to the server.
This allows you to spread the actual video across multiple servers and let the client talk to any of them and still get the correct responses. If the server has to maintain the client state, you need to ensure that the state is also replicated to all the other serving nodes. Sure, this is possible, but it comes with higher overhead!
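The partial GET requests mentioned above are ordinary HTTP range requests. A small sketch of how a client might compute the `Range` header for each chunk, assuming it already knows the total size (for example from a HEAD request); `range_headers` is an illustrative name and the chunk size is arbitrary.

```python
def range_headers(total_bytes: int, chunk_size: int):
    """Yield one Range header per chunk, covering bytes 0..total_bytes-1."""
    for start in range(0, total_bytes, chunk_size):
        # Byte ranges are inclusive on both ends.
        end = min(start + chunk_size, total_bytes) - 1
        yield {"Range": f"bytes={start}-{end}"}
```

Because this is a generator, "pausing" on the client is just a matter of not asking for the next header until the buffer needs refilling; the server keeps no playback state.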
Updating videos
As you are obviously creating a video-editing system, you have two options here, as also suggested by the PUT definition:
Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in RFC5789).
Separate the resource into smaller resources
Use another method like PATCH
As already pointed out by @Opal in his answer, when you use PATCH to partially update a resource you should not only provide the new content within the body but also instruct the server what it should do with it.
The separation into smaller resources however does feel more natural to me for a video-editing system though. A video can be seen as a sequence of scenes which consist of numerous pictures and maybe an attached soundfile.
A movie therefore could be represented like this in pseudo Json-HAL:
Movie : {
title: The Matrix,
release_year: 1999,
actors: [Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano],
...
link: {
self: http://...,
...
},
embedded: {
Scenes : [
{
description: Trinity chased by police,
links: [
self: http://...,
video: http://.../scene01.vid
]
},
{
description: Thomas Anderson get notified to follow the white rabbit,
start_offset: 5091,
end_offset: 193920,
links: [
self: http://...,
video: http://.../scene02.vid
]
},
...
]
}
}
Instead of having all the bytes in one file you could maintain each scene separately. The movie representation combines the scenes to a full movie if played from scene 1 to scene n.
If now one scene is edited and the whole scene file should be replaced, a simple PUT request is enough. If you want to trim the first or last few seconds off the video, you could introduce a start and stop offset for the respective scene and, instead of re-uploading the full scene, tell the client that it should start at the suggested offset or stop at the suggested position.
The client can use these parameters in the partial GET request to retrieve only the necessary bytes. These fields should then of course be modified via a PATCH command in order to prevent altering the video bytes or its URI. In order for a client to learn the total bytes of a video, it can first issue a HEAD request to the URI and use the content length returned in the response.
This, of course, screams for its own media type, but this is what REST is actually all about. I don't know why so many misuse the term REST for plain URI design or think that a neat URI API is more RESTful, when REST doesn't actually care much about the URI layout.

Don't receive results other than those from first audio chunk

I want some level of real-time speech to text conversion. I am using the web-sockets interface with interim_results=true. However, I am receiving results for the first audio chunk only. The second, third, ... audio chunks that I am sending are not getting transcribed. I do know that my receiver is not blocked since I do receive the inactivity message.
{"error": "Session timed out due to inactivity after 30 seconds."}
Please let me know if I am missing something or if I need to provide more contextual information.
Just for reference this is my init json.
{
"action": "start",
"content-type":"audio/wav",
"interim_results": true,
"continuous": true,
"inactivity_timeout": 10
}
In the result that I get for the first audio chunk, the final json field is always received as false.
Also, I am using golang but that should not really matter.
EDIT:
Consider the following pseudo log
localhost-server receives first 4 seconds of binary data #lets say Binary 1
Binary 1 is sent to Watson
{interim_result_1 for first chunk}
{interim_result_2 for first chunk}
localhost-server receives last 4 seconds of binary data #lets say Binary 2
Binary 2 is sent to Watson
Send {"action": "stop"} to Watson
{interim_result_3 for first chunk}
final result for the first chunk
I am not receiving any transcription for the second chunk
Link to code
You are getting the time-out message because the service waits for you to either send more audio or send a message signalling the end of an audio submission. Are you sending that message? It's very easy:
By sending a JSON text message with the action key set to the value stop: {"action": "stop"}
By sending an empty binary message
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/websockets.shtml
Please let me know if this does not resolve your problem
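The message order described above (start message, audio, then a stop signal) can be sketched as a generator. This only produces the messages; actually sending them over a WebSocket connection is left out, and `watson_messages` is a hypothetical helper. The JSON keys are the ones from the question's init message and the stop message above.

```python
import json
from typing import Iterable, Iterator, Union


def watson_messages(
    chunks: Iterable[bytes], content_type: str = "audio/wav"
) -> Iterator[Union[str, bytes]]:
    """Yield the text and binary messages for one recognition request, in order."""
    # 1. Text message that opens the request.
    yield json.dumps(
        {"action": "start", "content-type": content_type, "interim_results": True}
    )
    # 2. Every audio chunk as a binary message.
    for chunk in chunks:
        yield chunk
    # 3. Text message signalling the end of the audio submission.
    yield json.dumps({"action": "stop"})
```

The point of the ordering is that the stop message goes out once, after the last chunk, rather than after each chunk.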
This is a bit late, but I've open-sourced a Go SDK for Watson services here:
https://github.com/liviosoares/go-watson-sdk
There is some documentation about speech-to-text binding here:
https://godoc.org/github.com/liviosoares/go-watson-sdk/watson/speech_to_text
There is also an example of streaming data to the API in the _test.go file:
https://github.com/liviosoares/go-watson-sdk/blob/master/watson/speech_to_text/speech_to_text_test.go
Perhaps this can help you.
The solution to this question was to set the size header of the wav file to 0.

How to design a REST API to allow returning files with metadata

Suppose I'm designing a REST API and I need the clients to be able to obtain files with metadata. What is a good way to design the resources / operations?
Some ideas come to mind:
A single resource (i.e. GET /files/{fileId}), which returns a multi-part response containing both the file and a JSON/XML structure with metadata. I have a feeling that this is not a very good approach. For example, you cannot use the Accept header for the clients to determine if they want an XML or a JSON metadata representation, since the response type would be multi-part in both cases.
Two resources (i.e. GET /files/{fileId} and GET /files/{fileId}/metadata), where the first one returns the file itself and the second one a JSON/XML structure with metadata. There can be a link from the metadata to the file. However, how do I send a link to the metadata along with the file?
I would suggest using the second idea you presented. This is the strategy used by most of the major web drives (Box, Dropbox, Google Drive, etc). They often have a significantly different URL because they store content and metadata in disparate locations.
You can add a Link header to the file response with a link to the metadata. Link headers are described in RFC 5988. The set of currently-registered link relations is here. Off the cuff, it seems that the describedBy relation is appropriate here.
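As a sketch, the header on the file response could be built like this. The `/files/{fileId}/metadata` path is the one from the question; `metadata_link_header` is an illustrative name, and the relation is written in lowercase here on the assumption that it is registered as `describedby` (relation types are compared case-insensitively).

```python
def metadata_link_header(file_id: int) -> dict:
    """Build a Link header pointing from a file response to its metadata resource."""
    # Target URI goes in angle brackets, followed by the relation type.
    return {"Link": f'</files/{file_id}/metadata>; rel="describedby"'}
```

A client that downloads `GET /files/17` can then read the `Link` header and follow it to the metadata without knowing the URL layout in advance.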
I've had success with the following kind of API design. This differs slightly from what you suggested in that the main resource just contains links to its components.
POST /file
Request
<bytes of file>
Response
Location: /file/17
{
"id": 17
}
GET /file/17
{
"data": "/file/data/17",
"metadata": "/file/metadata/17"
}
GET /file/data/17
<bytes of file>
GET /file/metadata/17
{
"type": "image",
"format": "png"
}
DELETE /file/17
Your first option is not a good choice at all because it violates the following REST constraint.
Manipulation of resources through these representations under Uniform interface Principle.
When a client holds a representation of a resource, including any
metadata attached, it has enough information to modify or delete the resource.
If you break it, your URL will not be considered RESTful.
Wiki about it.