Specify input file encoding in UIMA Ruta Workbench

I am experimenting with the ExampleProject available from Apache UIMA Ruta and I would like to test the rules with some files of my own. Initially, I was getting an exception, which I believe was due to UIMA not being able to detect the encoding of the input files. After converting the input files to UTF-8, I no longer get the exception. However, I am not sure whether Ruta is correctly reading the files.
How can I specify the encoding of the input files when using the UIMA Ruta Workbench?

Normally, the encoding specified in the Eclipse project is used for reading the input files and the descriptors. However, there is a bug (in version 2.5.0) that prevents this. Unfortunately, it is currently not possible to set the encoding; it will always be the default encoding, UTF-8.
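As a workaround, you can convert your input files to UTF-8 yourself before placing them in the project's input folder. A minimal sketch in plain Java, assuming the originals are windows-1252 (both the path and the source charset here are assumptions, adjust as needed):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConvertToUtf8 {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("input/document.txt"); // hypothetical input file
        Charset source = Charset.forName("windows-1252"); // assumed source encoding
        // Decode with the original charset, then re-encode the file as UTF-8 in place
        String text = new String(Files.readAllBytes(file), source);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }
}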
DISCLAIMER: I am a developer of UIMA Ruta

Related

Eclipse Xtext and an independent LSP server do not (appear to) work together - should they?

I have an independently written LSP-compliant language server for a custom language and an Xtext framework for that language, both as Eclipse plugins. The two work fine independently; the language server is connected using LSP4E.
But when I try to connect the language server to a project in which Xtext is providing syntax coloring and some parsing checks, it appears that the language server is never started, and it certainly is not providing the error messages to the Eclipse UI that it does when used by itself. I'm not asking Xtext to create a language server itself.
The goal is to use (and not reimplement) the LS for parsing and type checking and language-aware code navigation, while using Xtext for syntax coloring.
Can anyone point me to a successful use of these two technologies together, or confirm that they cannot (yet?) be combined?
Edit: To the comment about checking whether the LS is working: as far as I can tell, the LS is not even started, though it starts fine when used alone. Somehow putting Xtext into the mix has usurped the connection to the LS, or changed it in a way that the launching and use no longer happen.
The LSP4J version installed is (only) Eclipse LSP4J (org.eclipse.lsp4j) 0.15.0.v20220805-0131.
Xtext's components vary, but are basically Eclipse Xtext (org.eclipse.xtext) 2.28.0.v20220829-0438.

Eclipse Plugin Development: German Umlauts stored as UTF8 showing up wrong in SWT controls but correct in String constant

I am writing an Eclipse Plugin. All my source files are encoded as UTF8 (Alt+Enter shows "Text file encoding: UTF8").
When I run my plugin using "right click -> Run as eclipse application" everything works fine.
However, when I install my plugin using an update site in another (but identical, i.e., copied) Eclipse application, German Umlauts (Ä Ö Ü...) get messed up in all SWT-controls, but not in String constants.
Example:
import java.io.ByteArrayInputStream;
import org.eclipse.jface.wizard.Wizard;

public class NewEntityWizard extends Wizard {
    public NewEntityWizard() {
        super();
        // String constant shown in a UI control
        setWindowTitle("This will NOT work: Ä");
    }
    @Override
    public boolean performFinish() {
        return true; // required by Wizard; irrelevant to the example
    }
    public void foo() {
        // The same kind of String constant, written to a file instead
        String contents = "This WILL work: Ä";
        ByteArrayInputStream stream = new ByteArrayInputStream(contents.getBytes());
        // write stream to file test.txt
    }
}
The window title will show up as: "This will NOT work: ä"
When opening the file test.txt in Eclipse with UTF8 encoding, it will contain the correct text: "This WILL work: Ä"
Both will work when run by using Run as, i.e., when not installing the plugin.
How do I resolve this?
I figure the compiled plugin / bin files might (correctly) be encoded in UTF8 but read in a different encoding by the second Eclipse installation. If so: how do I tell the JVM / Eclipse to read the plugin's bin files in UTF8?
Any help is appreciated.
Solved the problem myself. The issue was a bug in Eclipse (I am running Oxygen; not sure whether the problem exists in other versions).
The bug is that the PDE builder, which builds the product, does not respect the encoding settings configured in Eclipse. It uses the default encoding of the platform, which is not UTF8. This seems to be a Windows-only problem. The Java builder does not suffer from this bug, which is why the problem did not occur during testing.
Bug Description:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=516656
Solution:
As a workaround, I added the following line to the eclipse.ini of the building Eclipse installation, built the product again and installed it again. The setting makes UTF-8 the default encoding for the whole JVM:
-Dfile.encoding=UTF-8
This solved the issue.
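For reference, the tail of the building installation's eclipse.ini then looks roughly like this; the flag must come after the -vmargs marker, because everything below that marker is passed to the JVM:

-vmargs
-Dfile.encoding=UTF-8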

Building Delphi project from Visual Studio Code

I have already set up a build and debug environment for Object Pascal inside Visual Studio Code via FPC and GDB, but so far I have only made the build process work for programs consisting of a single .pas file, via
"command": "fpc",
"args": [ "-g", "-Px86_64", "helloWorld.pas" ],
Now I need to build a rather big Delphi project group (something like a solution?), whose main project file is a .groupproj.
Is there a way to build the .groupproj via FPC somehow?
Or at least some workaround like conversion to .lpi and then build via FPC?
Or at least call Delphi compiler/builder from VS Code and build the whole project group via it? (but I don't like this option, because I prefer to not use Delphi)
To get some facts straight for other people who might stumble on this:
FPC supports Delphi source files (.lpr/.dpr, .pp/.pas and .inc), but not Delphi meta information (.dproj/.dof/.bpg/.cfg/.groupproj), which is Delphi-version dependent anyway.
The Lazarus conversion tool also converts .dfms. Basically it is a .dfm cleaner and uses-clause enhancer, just like some conversion tools between Delphi versions. By default, however, it also does substitutions that change Delphi code (which works in FPC's Delphi (-Sd) mode) into the objfpc dialect (-S2 mode) preferred by Lazarus. Always make a backup before trying, and check the configuration of the conversion tool thoroughly.
FPC and Delphi command-line parameters are different.
FPC does not support Lazarus metadata formats like .lpi. The Lazarus utility lazbuild, however, does support building Lazarus projects from the command line.
But luckily the basics are the same:
a main program or library file (.dpr/.lpr)
a set of unit (.pas) and include (.inc) directories; FPC differentiates between the two, Delphi doesn't
autocreated forms must be added to the project
any additional command-line switches, like defines to set, range checking and optimization options
So in the worst case, examine the Delphi projects (either in the IDE or a text editor) for directories and switches, and create either a manual build script or a Lazarus (.lpi) project.
However, it is vital to keep in mind that the default FPC mode is NOT Delphi mode, so whenever you execute FPC, make sure you manually enable Delphi mode (-Sd).
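For example, a manual invocation could look like this (the file and directory names are placeholders; -Fu adds a unit search directory, -Fi an include directory):

fpc -Sd -Px86_64 -Fu./units -Fi./includes MainProject.dpr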
Group project support within Lazarus is very new (as in months) and afaik not even in stable versions yet. Though if you create a bunch of .lpis, a batch file/shell script with a sequence of lazbuild commands on the .lpis might do it.
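A minimal sketch of such a script, with hypothetical project names:

lazbuild project1.lpi
lazbuild project2.lpi
lazbuild project3.lpi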
P.S.: throw VSCode under the bus and use Lazarus.

SDL WorldServer 10.2.1 - how to configure W3C ITS Compliant XML Studio File Type filter

We are using SDL WorldServer 10.2.1 and Tridion 2011 (Windows Server 2003/Tomcat 6.0/SQL Server 2008 R2). Recently the Tridion folks asked me to ensure that the appropriate filters (Management | Linguistic Tool Setup | Filter Configurations) were installed and configured to allow dynamic values to be used from Tridion.
So the appropriate filter (Text Studio File Type) is installed and configured in WS for what Tridion wants to do. The following inline tags are added:
{.+}
~.+?~
So for example, when a new project is created and the uploaded source file is:
Hello~test~my~test~name~test~is~test~Robert
Hello^{thing.thing}World
The result in Browser Workbench is:
{1}{2}{3}{4}
{5}
While this is exactly what is desired, it only works in WS. While Tridion is able to connect and create projects in WS, it is not able to get the same result using the same source file. It is reported that the filter "W3C ITS Compliant XML Studio File Type" should be configured instead. Would someone be kind enough to assist with the configuration?
Thanks
To process special fields and create placeholders in WorldServer for a file sent from a Tridion server, the file needs to be processed by the WorldServer “W3C ITS Compliant XML Studio File Type” filter (the W3C filter). However, the W3C filter expects to find its processing instructions in the Tridion file itself. So to process special fields and create placeholders, add the placeholder instructions within the Tridion file.
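As an illustration of what such embedded instructions look like, here is a minimal sketch of an XML file carrying its own W3C ITS 2.0 rules (the content element names and the rule are invented for this example; the exact instructions WorldServer expects should be taken from its documentation):

<?xml version="1.0" encoding="UTF-8"?>
<content xmlns:its="http://www.w3.org/2005/11/its">
  <!-- Embedded rules: treat field elements as non-translatable placeholders -->
  <its:rules version="2.0">
    <its:translateRule selector="//field" translate="no"/>
  </its:rules>
  <body>Hello <field>{thing.thing}</field> World</body>
</content>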

Buckminster headless build utf-8 encoding

We have a Jenkins CI that builds our Eclipse RCP application. It was set up with this tutorial: http://www.ralfebert.de/blog/eclipsercp/rcp_builds/ . So far so good; we didn't have any problems until we decided to use UTF-8 encoding for our project instead of the default cp1252 encoding. The problem is that if we start the created application, the encoding is broken. I tried everything, especially the JVM argument -Dfile.encoding="utf-8". I tried this in rcp.target, in the Jenkins arguments and in the build.xml build properties.
Does anyone have an idea how I can fix this problem?
Thanks for every response.
We have the same issue as well, unfortunately unsolved so far. I think the charset must be set at compile time, so this may need to be a JVM argument rather than a program argument.
Make sure you have saved the encoding information in the project. If there is no such information, the Eclipse instance's default encoding will be used, which in turn defaults to the platform's default encoding.
The Buckminster build uses the Eclipse builder, which honors the resource metadata.
An anti-pattern is to change the default in Eclipse and not save it in the projects. The next user with a different default then risks creating a complete mess, especially if they also change the default and save.
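For reference, saving the project encoding (Project > Properties > Resource > Text file encoding > Other: UTF-8) writes a .settings/org.eclipse.core.resources.prefs file into the project, which looks roughly like this ("encoding/<project>" is the literal key Eclipse uses for the project-wide setting):

eclipse.preferences.version=1
encoding/<project>=UTF-8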