documents4j - DOCX to PDF generation problem

I use simple Java code to convert DOCX to PDF. The library used for the conversion is documents4j.
The same code runs on one machine but not on another. The other machine always gets the "com.documents4j.throwables.ConversionInputException: The input file seems to be corrupt" error.
The problem occurs in the VBScript: if I delete the "dummy-password-to-avoid-lock" string, it works fine.
Fails with the error: Set wordDocument = wordApplication.Documents.Open(inputFile, False, True, False, "dummy-password-to-avoid-lock")
Generates the PDF fine: Set wordDocument = wordApplication.Documents.Open(inputFile, False, True, False)
Is there any way to change the VBScript from Java code?
For example, something like this on the converter builder: IConverter converter = LocalConverter.builder().setVBScript().build(); ?

This seems to be a problem introduced in documents4j version 1.1.3. Try the previous version 1.1.2. I might need to undo this change since older Word versions do not seem to accept the password workaround.

Related

Is there a way to use lsp4e to call language server methods directly?

I'm new to lsp4e and LSP, and as far as I have seen, the framework provides almost everything needed for working with Eclipse. However, is there a way to use these features at will? For example, I would like to use the language server to get all the functions in a file. I think this is done with textDocument/documentSymbol, but how can I do this using the lsp4e framework?
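NOTE:
I checked SymbolKind, and it seems it was not what I was looking for. However, that input helped me find a sample of DocumentSymbol:
// languageServer is an org.eclipse.lsp4j.services.LanguageServer; documentUri is the URI of the file to inspect
DocumentSymbolParams params = new DocumentSymbolParams(
new TextDocumentIdentifier(documentUri.toString()));
// Each entry is either a flat SymbolInformation or a hierarchical DocumentSymbol, depending on server and client capabilities
CompletableFuture<List<Either<SymbolInformation, DocumentSymbol>>> symbols =
languageServer.getTextDocumentService().documentSymbol(params);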
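To get all the functions in a file, keep in mind that hierarchical DocumentSymbol results can nest functions inside classes or other symbols. The list therefore has to be walked recursively and filtered by kind. The sketch below shows that walk under stated assumptions. To stay self-contained, it uses a hypothetical Symbol record in place of lsp4j's DocumentSymbol. It compares against the numeric SymbolKind values from the LSP specification (Method = 6, Function = 12).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for lsp4j's DocumentSymbol: only the fields this sketch needs.
record Symbol(String name, int kind, List<Symbol> children) {}

class SymbolFilter {
    // SymbolKind numeric values from the LSP specification.
    static final int METHOD = 6;
    static final int FUNCTION = 12;

    // Collects the names of every function or method, including nested ones.
    static List<String> functionNames(List<Symbol> roots) {
        List<String> out = new ArrayList<>();
        for (Symbol s : roots) {
            collect(s, out);
        }
        return out;
    }

    // Depth-first: record this symbol if it matches, then descend into its children.
    private static void collect(Symbol s, List<String> out) {
        if (s.kind() == FUNCTION || s.kind() == METHOD) {
            out.add(s.name());
        }
        if (s.children() != null) {
            for (Symbol child : s.children()) {
                collect(child, out);
            }
        }
    }
}
```

With the real lsp4j types, the same walk would work like this:

- Take each Either from symbols.get().
- If isRight() is true, call getRight() to get a DocumentSymbol. Compare its getKind() against SymbolKind.Function and SymbolKind.Method, and recurse into getChildren().
- If isLeft() is true, the entry is a flat SymbolInformation, so use it without recursion.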

Different date value retrieved from Google Sheets in Apps Script V8 and Legacy

I have this very simple four-line function to retrieve a duration value from a Google Sheet with Google Apps Script. The sheet is otherwise blank; this is its only value for the test.
function myFunction() {
var my_ss = SpreadsheetApp.getActiveSpreadsheet();
var my_sh = my_ss.getSheetByName("test");
var my_value = my_sh.getRange("A1").getValue();
console.log(my_value);
}
When I run this in the new V8 runtime, I get the correct duration of 5 minutes (with the wrong GMT offset, which I don't use). However, because of the current V8 bug where the console can't be used to expand arrays and objects, I downgraded the script to the legacy Apps Script runtime. When I execute exactly the same code (the only change is downgrading the environment from the Run menu), I get the correct GMT offset but the wrong duration: 55 minutes and 39 seconds instead of 5 minutes!
I checked the time zone in Apps Script V8, Apps Script legacy, and the Google Sheet; it's the same everywhere, "GMT+01:00 Paris".
Even stranger: this test is actually being run on a copy of an original file created in August 2019. In the original file I get the opposite behavior: the correct value with the legacy runtime, but a different value with the V8 runtime (23:14:21 instead of 00:05:00).
Has anyone else encountered this behavior? Do you know how to run the legacy Apps Script runtime on new files without these problems? Thanks.
As a workaround, you can use getDisplayValue() instead:
var my_value = my_sh.getRange("A1").getDisplayValue();
Then, based on this string, you can construct whatever date object you prefer.
I noticed that we only had this issue when upgrading to V8 in a file that was created before V8 was released (and vice versa). So to avoid it, when I have an old file whose Apps Script project I want to upgrade to V8, I first make a copy of the file. In the copy (a recent file, created after V8 already existed), I don't encounter the problem.

Writing a string to a specific directory using Chaquopy 4.0.0

I am trying a proof of concept here:
Using Chaquopy 4.0.0 (with Python 2.7.15), I am trying to write a string to a file in a specific folder (getFilesDir()) using Python, and then read it in via Android.
To check whether the file was written, I check the file's length (see code below).
I expect a length larger than 0 (to verify that the file has indeed been written to that location), but I keep getting 0.
Any help would be greatly appreciated!
main.py:
import os.path
save_path = "/data/user/0/$packageName/files/"
name_of_file = raw_input("test")
completeName = os.path.join(save_path, name_of_file+".txt")
file1 = open(completeName, "w")
toFile = raw_input("testAsWell")
file1.write(toFile)
file1.close()
onCreate:
if (! Python.isStarted()) {
Python.start(new AndroidPlatform(this));
File file = new File(getFilesDir(), "test.txt");
Log.e("TEST", String.valueOf(file.length()));
}
It's not clear whether you've based your app on the console example, so I'll give an answer for both cases.
If you have based your app on the console example, then the code in onCreate will run before the code in main.py, and the file won't exist the first time you start the activity. It should exist the second time: if it still doesn't, try using the Android Studio file explorer to see what's in the files directory.
If you haven't based your app on the console example, then you'll need to execute main.py manually, like this:
Python.getInstance().getModule("main");
Also, without the input UI which the console example provides, you won't be able to read anything from stdin. So you'll need to do one of the following:
Base your app on the console example; or
Replace the raw_input calls with a hard-coded file name and content; or
Create a normal Android UI with a text box or something, and get input from the user that way.
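One more detail about the length check in onCreate: java.io.File.length() returns 0L both for an empty file and for a file that does not exist. A logged 0 therefore does not show by itself whether main.py has run yet. Below is a minimal sketch of a check that separates the two cases. The FileCheck class and describe method are hypothetical names, not part of Chaquopy or Android.

```java
import java.io.File;

class FileCheck {
    // File.length() returns 0L for a missing file as well as for an empty one,
    // so check exists() first to tell the two cases apart.
    static String describe(File file) {
        if (!file.exists()) {
            return "missing";
        }
        return file.length() + " bytes";
    }
}
```

In the activity, this could be called as Log.e("TEST", FileCheck.describe(new File(getFilesDir(), "test.txt")));. With the console example, that should log "missing" on the first start, because onCreate runs before main.py.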

Tika detects Tesseract but doesn't perform any OCR

I just installed Tika from the GitHub repository and tried to OCR a PDF that contains scanned document pages.
java -cp tika-app/target/tika-app-1.17-SNAPSHOT.jar org.apache.tika.cli.TikaCLI /tmp/testing/sample_scanned.pdf
However, only metadata gets extracted, although I got confirmation beforehand that Tesseract is installed and used:
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless
you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
Note: text from regular PDFs (those containing plain text) is extracted successfully. The problem seems to be the OCR process itself.
This has been tested on CentOS as well as Ubuntu, with the same issue.
Do I need to make changes to config files or specify more parsers? What could cause this?
Thank you.
It turns out that extraction of inline images from PDFs is disabled by default. From PDFParserConfig:
Beware: some PDF documents of modest size (~4MB) can contain thousands of embedded images totaling > 2.5 GB. Also, at least as of PDFBox 1.8.5, there can be surprisingly large memory consumption and/or out of memory errors. Set to true with caution. The default is false.
A simple example to enable it that worked for me:
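import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
Parser parser = new AutoDetectParser();
ContentHandler handler = new BodyContentHandler(Integer.MAX_VALUE);
ParseContext parseContext = new ParseContext();
PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setExtractInlineImages(true); // hand embedded page images to the OCR parser
parseContext.set(PDFParserConfig.class, pdfConfig);
try (InputStream stream = Files.newInputStream(Paths.get("test.pdf"))) { // any InputStream over the PDF works
parser.parse(stream, handler, new Metadata(), parseContext);
System.out.println(handler.toString());
}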

How to decode a string in Visual Basic 6

I'm trying to read a .DB (Paradox 5) file from my Visual Basic 6 application.
Everything works except the encoding/charset: the text shows as "Iieiei 75a" instead of a Cyrillic string.
This is my ODBC connection string:
Driver={Microsoft Paradox Driver (*.db )};DriverID=538;Fil=Paradox 4.X;DataCodePage=ANSI;BDE=2;CollatingSequence=ASCII;AutoTranslate=No;DBQ=C:\Database;DefaultDir=C:\Database
Please note that software like Borland Database Desktop shows these strings without any problems. Also, everything is fine on another PC.
I set the following values via regedit, but it doesn't help:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Xbase]
"DataCodePage"="ANSI"
"BDE"=dword:00000002
I also tried the CharToOem/OemToChar Win API functions, but that doesn't help either.
Any ideas?
Okay, I have solved this by changing the following registry values:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Xbase]
"DataCodePage"="ANSI"
"BDE"=dword:00000002
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"1252"="1251.nls"
The last setting solved the issue.
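My guess at why the last setting works, which the post does not confirm: the Cyrillic text is stored as windows-1251 bytes, but it was being interpreted through code page 1252 (Western European). Mapping "1252" to 1251.nls makes that translation use the Cyrillic table instead.

The Java sketch below only illustrates the underlying effect: the same bytes decode to different text under the two code pages. The sample bytes (the Russian word for "hello") are an illustrative assumption. The exact garbled text the driver produced ("Iieiei 75a") depends on whatever further conversions it applied.

```java
import java.nio.charset.Charset;

class CodePageDemo {
    // Russian "hello" (U+041F U+0440 U+0438 U+0432 U+0435 U+0442) encoded in windows-1251,
    // the Cyrillic ANSI code page.
    static final byte[] PRIVET_CP1251 = {
        (byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2
    };

    // Interpret raw bytes using the named Windows code page.
    static String decode(byte[] raw, String codePage) {
        return new String(raw, Charset.forName(codePage));
    }
}
```

decode(CodePageDemo.PRIVET_CP1251, "windows-1251") yields the Cyrillic word. Decoding the same array as "windows-1252" yields the Latin-1 mojibake "Ïðèâåò".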