Getting a strange error from Watson's Document Conversion service - ibm-cloud

I am trying to convert some documents into answer units with Watson's Document Conversion service, using the watson-developer-cloud JavaScript library in Node.js. Certain documents (an example is at IBM internal link and is a .DOCX file) return this error:
Error: code:400 error: The supplied data appears to be in the Office
2007+ XML. You are calling the part of POI that deals with OLE2 Office
Documents. You need to call a different part of POI to process this
data (eg XSSF instead of HSSF)
If I try to convert it via the document conversion demo site, it seems to convert without error. My program downloads the file from the source, writes it to disk, and then uploads it to the Document Conversion service via the above-mentioned library.
Is there any way around this error? Consider that this conversion is part of a massive automated conversion of thousands of documents, so manual handling for these outliers is out of the question.

The service attempts to autodetect the media type of the uploaded file using the first few bytes of the file, and the file name.
If the file name is unavailable (i.e., not passed in by your user), you could provide the media type of the file you are uploading in the file portion of the convert call:
file: {
  value: fs.createReadStream('filename'),
  options: {
    contentType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  }
}
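The error text suggests the upload was handled as an older OLE2 (.doc) file even though it is Office 2007+ XML. For a large automated batch, one option is to check the first bytes yourself and choose the contentType value from that. The sketch below is my own illustration in Python, not the service's detection logic: guess_word_media_type, read_head, and the small mapping are hypothetical names, and it only assumes that .docx files start with the ZIP signature and legacy .doc files start with the OLE2 signature.

```python
import os

# Signatures at the start of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                      # Office 2007+ XML files are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy OLE2 compound documents

DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
DOC_TYPE = "application/msword"


def read_head(path, size=8):
    """Read the first few bytes of a file on disk."""
    with open(path, "rb") as handle:
        return handle.read(size)


def guess_word_media_type(first_bytes, file_name):
    """Return a media type for Word files, or None when bytes and name disagree."""
    extension = os.path.splitext(file_name)[1].lower()
    if first_bytes.startswith(ZIP_MAGIC) and extension == ".docx":
        return DOCX_TYPE
    if first_bytes.startswith(OLE2_MAGIC) and extension == ".doc":
        return DOC_TYPE
    # Unknown or mismatched: let the caller log the file for review.
    return None
```

A None result is a cheap way to flag files whose extension does not match their contents before they reach the service.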

Creating an attachment in SharePoint from Microsoft Forms Response - Get File Content using path not working

I am trying to add contents and an attachment from a Form to a SharePoint list. However, the Get file content using path action in my flow is failing. The error I'm receiving says "Unauthorized" and in the file content box, I receive the following message:
"status": 401,
"message": "A potentially dangerous Request.Path value was detected from the client (?).",
"source": "apidod.connectorp.svc.ms"
The file path is as follows (minus the front of the path):
sites/HSMWINGATLANTIC_Supply_Requests/Shared%20Documents/Forms/AllItems.aspx?newTargetListUrl=%2Fsites%2FHSMWINGATLANTIC%5FSupply%5FRequests%2FShared%20Documents&viewpath=%2Fsites%2FHSMWINGATLANTIC%5FSupply%5FRequests%2FShared%20Documents%2FForms%2FAllItems%2Easpx&id=%2Fsites%2FHSMWINGATLANTIC%5FSupply%5FRequests%2FShared%20Documents%2FApps%2FMicrosoft%20Forms%20Fairfax%2FVehicle%20Rental%20Request%2FSupporting%20Documents&viewid=55590b8b%2D4994%2D4e8b%2D804b%2D24f4774c21e920220815 - HSM-40 Truck Request for 15 AUG 20_Charles Power 1.pdf
c.d.power
For that Get file content using path action, you need a relative path without the site URL part. You can extract the correct path with an expression.
In the example below I retrieve the link property from the Attachment question's answer value. I use the json function to turn it into an array, since Microsoft returns a string value for some reason ;)
After that I use nthIndexOf to determine at which forward slash (the starting position in the string) I need to slice with the slice function, in this case the 7th instance, which is index 6.
This should retrieve the part we need for a Get file content using path action. With the decodeUriComponent function I make sure each %20 is turned back into a space character.
Make sure you update the question id to your question id.
decodeUriComponent(slice(json(outputs('Get_response_details')?['body/re67e0cfcd95d488593347d93f2728204'])[0]['link'], nthindexof(json(outputs('Get_response_details')?['body/re67e0cfcd95d488593347d93f2728204'])[0]['link'], '/', 6)))
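To make the slicing logic easier to follow outside the flow designer, here is a rough Python sketch of the same three steps: find the nth slash, slice from it, and decode the result. The helper names and the contoso link are hypothetical. The helper counts occurrences starting from 1, so treat the occurrence number as something to check against your own link rather than a value to copy.

```python
from urllib.parse import unquote


def nth_index_of(text, search, occurrence):
    """Return the index of the nth occurrence (counted from 1), or -1 if absent."""
    index = -1
    for _ in range(occurrence):
        index = text.find(search, index + 1)
        if index == -1:
            return -1
    return index


def relative_path(link, occurrence):
    """Slice the link from the chosen slash onward and decode %XX escapes."""
    start = nth_index_of(link, "/", occurrence)
    if start == -1:
        raise ValueError("link has fewer slashes than expected")
    return unquote(link[start:])


# Hypothetical link, only for illustration.
link = "https://contoso.sharepoint.com/sites/Demo/Shared%20Documents/Apps/file%201.pdf"
print(relative_path(link, 5))
```

With this made-up link, the fifth slash is the one right after the site name, so everything from there on is the site-relative path.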
I found the solution to the issue. This wasn’t working because it is a group form and form responses are sent to the group’s SharePoint site; not the user’s OneDrive. Therefore, the Get file content action should be using the SharePoint connector instead of OneDrive.

How do you get a bytes object from Google Cloud Storage Bucket

My question at Github
https://github.com/googleapis/python-speech/issues/52
has been active for 9 days, and the only two people who attempted an answer both failed. I now think someone who understands how Google Cloud Storage buckets work could answer it, even without understanding how Google's Speech API works. To convert long audio files to text, they must first be uploaded to the Cloud. I was using some syntax that now appears to be broken. The following syntax might work, except that Google does not explain how to use this code with files uploaded to the Cloud. So in the code below, published here:
https://cloud.google.com/speech-to-text/docs/async-recognize#speech_transcribe_async-python
The content object has to be located on the cloud and it needs to be a bytes object. Suppose the address of the object is: gs://audio_files/cool_audio
What syntax would I use such that the content object refers to a bytes object?
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result(timeout=90)
My previous answer didn't really address your question. Let me try again:
Please try this:
audio = types.RecognitionAudio(content=bytes(content, 'utf-8'))
GCS stores objects as a sequence of bytes. If your object has a Content-Encoding header, the content can be transformed while downloading (e.g., gzip content will be uncompressed if the client doesn't supply an Accept-Encoding: gzip header). If it has a Content-Type header, the client application or library may treat the content differently.
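If the goal is to get the object's bytes into your program, a sketch along these lines may help. It assumes the separate google-cloud-storage package is installed and credentials are configured. split_gcs_uri and download_bytes are hypothetical helper names; the download itself uses that library's Blob.download_as_bytes. Note that RecognitionAudio also has a uri field that accepts a gs:// address directly, which may avoid the download entirely.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI needs both a bucket and an object name: {uri!r}")
    return bucket, object_name


def download_bytes(uri):
    """Fetch the object as a bytes value (needs network access and credentials)."""
    # Imported here so split_gcs_uri works even without the package installed.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(uri)
    client = storage.Client()
    return client.bucket(bucket_name).blob(object_name).download_as_bytes()


# Usage, with the address from the question:
# content = download_bytes("gs://audio_files/cool_audio")
```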

NetSuite RESTlet output pdf

NetSuite Restlet PDF file encoding issue
The above thread seems to give a solution for outputting a PDF with a NetSuite RESTlet. As far as I know, you cannot output a PDF from a RESTlet, so I'm very confused. I am using a RESTlet to generate a report, and the information ultimately needs to be output to a PDF, so I was trying to see if there was a workaround. I tried the answer code from the above thread and got the expected error: "error code: INVALID_RETURN_DATA_FORMAT error message:Invalid data format. You should return TEXT."
Am I missing something? Is there a way to export XML to a PDF with a NetSuite RESTlet?
The thread you reference discusses how to generate a PDF file in NetSuite. If you want to return a PDF from a RESTlet, you will have to return it as a member of a JSON object, e.g.:
var pdfFile = genPDF(); // base this on the sample
return {
    fileName: pdfFile.getName(),
    fileContent: nlapiEncrypt(pdfFile.getValue(), 'base64')
};
And then your receiver will have to create the actual file.
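On the receiving side, that could look roughly like the Python sketch below. It assumes the response body is the JSON object shown above, with fileName and fileContent keys. decode_pdf_response and save_pdf_response are hypothetical names, and how you fetch the response from the RESTlet is left out.

```python
import base64
import json
from pathlib import Path


def decode_pdf_response(response_text):
    """Return (file_name, pdf_bytes) from the JSON text the RESTlet sent back."""
    payload = json.loads(response_text)
    # Keep only the final path component so the response cannot pick a directory.
    file_name = Path(payload["fileName"]).name
    pdf_bytes = base64.b64decode(payload["fileContent"])
    return file_name, pdf_bytes


def save_pdf_response(response_text, target_dir):
    """Write the decoded PDF into target_dir and return the path of the new file."""
    file_name, pdf_bytes = decode_pdf_response(response_text)
    target = Path(target_dir) / file_name
    target.write_bytes(pdf_bytes)
    return target
```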
Recall that RESTlets are for application-to-system communications. If you are trying to return a PDF to a browser, you should probably be using a Suitelet.
If this is part of a larger app and you need the RESTlet, then review this post: Save base64 string as PDF at client side with JavaScript, for options to display the RESTlet response.
Reading through that answer, it appears you'll need to encode/convert the PDF to string format before returning, so you'll need to use base64 encoding.
The NS method nlapiEncrypt(content, 'base64') seems like it might be a good place to start.
Another avenue to investigate, which I haven't tried, is to first save the PDF in the file cabinet, then to return a public link to that file. You'll need to make sure the file has the correct permissions.

Can I fake uploaded image filesize?

I'm building a simple image file upload form. Programmatically, I'm using the Laravel 5 framework. Through the Input facade (through Illuminate), I can resolve the file object, which in itself is an UploadedFile (through Symfony).
The UploadedFile's API ref page (Symfony docs) says that
public integer | null getClientSize()
Returns the file size. It is extracted from the request from which the
file has been uploaded. It should not be considered as a safe
value. Return Value integer|null The file size
What will be these cases where the uploaded filesize is wrongly reported?
Are there known exploits using this?
How can the admin ensure this is detected (and hence logged as a trespass attempt)?
That method is using the "Content-Length" header, which can easily be forged. You'll want to use $_FILES['myfile']['size'] instead, as an answer to another question has already stated: Can $_FILES[...]['size'] be forged?
This value checks the actual size of the file, and is not modified by the provided headers.
If you'd like to check for people misbehaving, you can simply compare the content-length header to your $_FILES['myfile']['size'] value.

Talend ESB : How to validate XML request against a XSD file

I am using Talend Open Studio 5.4.
I have created a service which, on finish, generates all the schemas required for that service. I have assigned a new job to the service and am trying to validate the input XML request against the request XSD file.
I followed this link and it worked fine, but when I tried to validate an input XML request as tESBProviderRequest receives it, it did not work.
How can I do this?
I'm not a Talend expert, but the way I do this is by first transforming my payload (whose type is "document") into a string, so that tXSDValidator can work.
The second conversion is just there to let me build a custom response with my input fields.
In any case, I think you have to pass a document object to your tESBProviderResponse.
I hope it helps.