How do you get a bytes object from Google Cloud Storage Bucket - google-cloud-storage

My question on GitHub,
https://github.com/googleapis/python-speech/issues/52
has been active for 9 days, and the only two people who have attempted an answer have both failed. Now I think it might be possible for someone to answer it who understands how Google Cloud buckets work, even if they do not understand how Google's Speech API works. To convert long audio files to text, they must first be uploaded to the Cloud. I was using some syntax that now appears to be broken, and the following syntax might work, except that Google does not explain how to use this code with files that have been uploaded to the Cloud. So, in the code below, published here:
https://cloud.google.com/speech-to-text/docs/async-recognize#speech_transcribe_async-python
The content object has to come from the cloud, and it needs to be a bytes object. Suppose the address of the object is gs://audio_files/cool_audio.
What syntax would I use so that the content object refers to a bytes object?
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result(timeout=90)
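
If the imports above fail, one likely cause is that newer google-cloud-speech releases (2.x) removed the enums and types modules. A minimal sketch of the same request under the 2.x surface, reusing the gs://audio_files/cool_audio object from the question:

from google.cloud import speech

client = speech.SpeechClient()

# In 2.x the classes hang off the top-level speech module, and
# long_running_recognize requires keyword arguments.
audio = speech.RecognitionAudio(uri='gs://audio_files/cool_audio')
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
)

operation = client.long_running_recognize(config=config, audio=audio)
print('Waiting for operation to complete...')
response = operation.result(timeout=90)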

My previous answer didn't really address your question. Let me try again:
Please try this:
audio = types.RecognitionAudio(content=bytes(content, 'utf-8'))  # only applies if content is a str; raw audio data is already bytes
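
For an object that already sits in a bucket, though, there is no str to encode. A minimal sketch of getting an actual bytes object out of GCS with the google-cloud-storage client, assuming the bucket and object names from the question:

from google.cloud import storage
from google.cloud.speech import types

# gs://audio_files/cool_audio means bucket 'audio_files', object 'cool_audio'.
storage_client = storage.Client()
blob = storage_client.bucket('audio_files').blob('cool_audio')
content = blob.download_as_string()  # returns bytes, despite the name
audio = types.RecognitionAudio(content=content)

Alternatively, RecognitionAudio also accepts a uri keyword, so for long audio you can skip the download entirely and pass uri='gs://audio_files/cool_audio'.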

GCS stores objects as a sequence of bytes. If your object has a Content-Encoding header, the content can be transformed while downloading (e.g., gzip content will be uncompressed if the client doesn't supply an Accept-Encoding: gzip header), and if it has a Content-Type header, the client application or library may treat the information differently.
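
If that transformation is what is tripping you up, the Python storage client can fetch the stored bytes untouched. A sketch, assuming a google-cloud-storage release whose download methods accept the raw_download flag:

from google.cloud import storage

blob = storage.Client().bucket('audio_files').blob('cool_audio')
# raw_download asks for the object's bytes exactly as stored, skipping
# the decompressive transcoding described above.
content = blob.download_as_string(raw_download=True)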

Related

Alamofire Chunked Upload - How API knows when its the last chunk?

I have a Golang web server that I have written to handle large file uploads (30 GB or more). In a proof of concept using Dropzone.js, I can upload files of any size with no issue, as long as they are chunked.
The way Dropzone.js implements this is that each chunk has items added to the headers, like:
dzchunkindex: 435
dzchunksize: 10000
dztotalchunkcount: 3498274
So I receive a chunk, create the file (if needed), write the data, and check whether I'm on the last chunk, then repeat as needed. Once I see I've written the last chunk, I close the file. (This bookkeeping is sketched below.)
It seems like Alamofire supports chunked uploads using its AF.Upload method.
However, how should my server know when the last chunk has been uploaded? I can certainly check this a different way; I'm just curious what that way should be. I've combed over the Alamofire docs and can't find much.
I can chunk the file manually and upload it, but I'd rather use Alamofire if possible.
Thanks,
Ed
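
For reference, the last-chunk bookkeeping the question describes is language-agnostic. A minimal sketch in Python (Flask) rather than Go, using the Dropzone header names quoted above; the route, upload directory, and form field name are hypothetical:

import os
from flask import Flask, request

app = Flask(__name__)
UPLOAD_DIR = '/tmp/uploads'  # hypothetical staging directory

@app.route('/upload', methods=['POST'])  # hypothetical route
def upload_chunk():
    # Dropzone sends these headers with every chunk.
    index = int(request.headers['dzchunkindex'])
    total = int(request.headers['dztotalchunkcount'])
    chunk = request.files['file']  # hypothetical form field name

    os.makedirs(UPLOAD_DIR, exist_ok=True)
    path = os.path.join(UPLOAD_DIR, chunk.filename)

    # Assumes chunks arrive in order; otherwise seek to
    # dzchunkindex * dzchunksize before writing.
    with open(path, 'ab') as f:
        f.write(chunk.read())

    # The core check: the zero-based index of the final chunk.
    if index == total - 1:
        return 'upload complete'
    return 'chunk received'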

Play! Framework 2.6: Gzip Filter if the response size is greater than 50 bytes

I am currently working with Play! Framework 2.6, and I am looking into gzipping my responses if they are greater than 50 bytes. However, the framework does not provide this out of the box. Based on this documentation, I can make use of the following code snippet:
new GzipFilter(shouldGzip = (request, response) =>
  response.body.contentType.exists(_.startsWith("text/html")))
However, it does not specify where I would create this. Any idea how I can make it gzip a response only if that response is greater than 50 bytes?
By default, response bodies are streamed, which means you do not know how big the response body will be.
If you already know the size of the response body (e.g., you're serving a file from Amazon S3 and already know the file size), you can set the Content-Length header and check it in the GzipFilter.
You will also likely need to implement your own GzipFilter and adapt it so that it checks the Content-Length.

Getting a strange error from Watson's Document Conversion service

I am trying to convert some documents into answer units with Watson's Document Conversion service, using the watson-developer-cloud JavaScript library in Node.js. Certain ones (an example is at IBM internal link and is a .DOCX file) return this error:
Error: code:400 error: The supplied data appears to be in the Office
2007+ XML. You are calling the part of POI that deals with OLE2 Office
Documents. You need to call a different part of POI to process this
data (eg XSSF instead of HSSF)
If I try to convert it via the Document Conversion demo site, it seems to convert without error. My program downloads the file from the source, writes it to disk, and then uploads it to the Document Conversion service via the above-mentioned library.
Is there any way around this error? Consider that this conversion is part of a massive automated conversion of thousands of documents, so manual handling for these outliers is out of the question.
The service attempts to autodetect the media type of the uploaded file using the first few bytes of the file and the file name. The POI error above means that detection went wrong: the service treated an Office 2007+ XML (.docx) file as an old OLE2 Office document.
If the file name is unavailable (i.e., not passed in by your user), you could provide the media type of the file you are uploading in the file portion of the convert call:
file: {
  value: fs.createReadStream('filename'),
  options: {
    // Note the hyphen after 'openxmlformats': this is the standard .docx media type.
    contentType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  }
}

Issue with decoding base64 encoded app engine data in swift

I am developing an iOS app which gets data from a Google Cloud Endpoints API. The data is a custom Java object that is base64 encoded on the server and then returned by the endpoint method.
On the iOS side I am able to receive the data and print it using the generated client code.
The problem is that I am unable to decode the data back into the GTL**** endpoint auto-generated class.
The decoded data shows up with some hex numbers:
My Code:
let respo2 = GTLDecodeBase64(responce) as? GTLEndpointStatusCollection
I also tried decoding using the Swift classes:
let respo = NSData(base64EncodedString: responce, options: NSDataBase64DecodingOptions(rawValue: 0))
The input is base64 encoded: rO0ABXNyABNqYXZhLnV0aWwuQXJyYXlMaXN0eIHSHZnHYZ......
The desired output should have been readable data,
but instead I'm getting:
<aced0005 73720013 6a617661 2e757469 6c2e4172 7261794c.....
I even tried re-encoding and decoding the base64-decoded data as UTF-8, but to no avail.
What am I doing wrong? Is it possible for data encoded on the server in Java (with custom Java objects) to be decoded back? (I understand Google Endpoints does the serialization/deserialization in between.)
Thanks in advance.
Your base64 decode is actually working: aced0005 is the magic number of Java's built-in object serialization (you can even see java.util.ArrayList spelled out in the decoded bytes), and nothing on iOS knows how to deserialize that format. You should use JSON for serialization rather than manually converting the object to a bytestring and base64 encoding it. If you are using the Endpoints libraries, this is done for you automatically, simply by returning the object from your method. See the docs here for an example and the rest of the Endpoints docs for more details. To consume the API you can use the generated iOS libraries, which also do this for you, as per the examples here. You won't actually see any JSON unless you inspect the HTTP traffic or use the API Explorer.
It sounds like you might just be doing more work than is needed by pre-encoding the object, rather than letting Endpoints do it for you. If you really need to manually serialize an object into some property, you can use a library on the Endpoints side like Jackson to serialize the object to a string property, and NSJSONSerialization on the client to convert it back to an object.

Can I fake uploaded image filesize?

I'm building a simple image file upload form. Programmatically, I'm using the Laravel 5 framework. Through the Input facade (via Illuminate), I can resolve the file object, which is itself an UploadedFile (from Symfony).
The UploadedFile API reference page (Symfony docs) says:

public integer|null getClientSize()
Returns the file size. It is extracted from the request from which the file has been uploaded. It should not be considered as a safe value.
What are the cases where the uploaded file size would be wrongly reported?
Are there known exploits using this?
How can the admin ensure this is detected (and hence logged as a trespass attempt)?
That method is using the "Content-Length" header, which can easily be forged. You'll want to use $_FILES['myfile']['size'] instead, as an answer to another question has already stated: Can $_FILES[...]['size'] be forged?
That value is based on the actual size of the file as written to disk, and is not affected by the headers the client provides.
If you'd like to detect people misbehaving, you can simply compare the Content-Length header to your $_FILES['myfile']['size'] value and log any mismatch as a trespass attempt.