CAS to XMI conversion in UIMA

When I try to convert a CAS to XMI, I get a UIMARuntimeException caused by the character reference &#55349; (an invalid XML character). Thanks in advance.
Exception:
Caused by: org.xml.sax.SAXParseException; lineNumber: 190920; columnNumber: 36557; Character reference "&#55349" is an invalid XML character.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.uima.util.XmlCasDeserializer.deserializeR(XmlCasDeserializer.java:111)
at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:366)
Code:
private static void serialize(CAS cas, File file) throws SAXException, IOException {
    Watch casToXmi = new Watch(Path.getFileName() + "Cas to Xmi Conversion - " + file.getName());
    casToXmi.start();
    OutputStream outputStream = null;
    try {
        outputStream = new BufferedOutputStream(new FileOutputStream(file));
        XmiCasSerializer xmiSerializer = new XmiCasSerializer(cas.getTypeSystem());
        XMLSerializer xmlSerializer = new XMLSerializer(outputStream, true);
        xmiSerializer.serialize(cas, xmlSerializer.getContentHandler());
    } catch (FileNotFoundException fnfe) {
        throw new FileNotFoundException(fnfe.getMessage());
    } catch (SAXException saxe) {
        throw new SAXException(saxe.getMessage());
    } finally {
        if (outputStream != null) {
            outputStream.close();
        }
    }
    casToXmi.stop();
}

By default, the XMI is serialized as XML 1.0, and XML 1.0 can only represent a restricted range of characters.
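To make the restriction concrete: 55349 from the exception is 0xD835, a UTF-16 high surrogate, which is only a legal character when correctly paired with a low surrogate (an unpaired surrogate is not valid even in XML 1.1). The sketch below checks this against the XML 1.0 character ranges; isValidXml10 is a hypothetical helper written for this example, not part of UIMA.

```java
public class XmlCharCheck {
    // Valid XML 1.0 characters: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF] (surrogates excluded).
    static boolean isValidXml10(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        int ref = 55349; // the &#55349; from the exception, i.e. 0xD835
        System.out.println(Character.isHighSurrogate((char) ref)); // true
        System.out.println(isValidXml10(ref));                     // false
        // Paired with a low surrogate it forms a valid supplementary
        // code point (here U+1D400, a mathematical bold capital A):
        int paired = Character.toCodePoint((char) 0xD835, (char) 0xDC00);
        System.out.println(isValidXml10(paired));                  // true
    }
}
```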
But UIMA has CasIOUtils, which makes it really easy to write the data out:
out = new FileOutputStream(this.outputFile);
CasIOUtils.save(cas, out, SerialFormat.XMI_1_1);
Alternatively, you can configure the serializer in your code to produce XML 1.1 instead which might resolve your issue:
XMLSerializer sax2xml = new XMLSerializer(docOS, prettyPrint);
sax2xml.setOutputProperty(OutputKeys.VERSION, "1.1");
These lines were taken from the XmiWriter of DKPro Core.
Note: I see your code includes a Watch. If speed is your concern, there are other supported formats which save/load considerably faster than XMI, e.g. the binary format SerialFormat.COMPRESSED_FILTERED_TSI. Unlike XMI, this format also supports arbitrary characters in the text.
Disclaimer: I am part of the Apache UIMA project and the maintainer of DKPro Core.

I used SerialFormat.BINARY, which gives a plain custom binary serialized CAS without the type system and with no filtering.
private static void serialize(CAS cas, File file) throws SAXException, IOException {
    Watch casToXmi = new Watch(Path.getFileName() + "Cas to Xmi Conversion - " + file.getName());
    casToXmi.start();
    OutputStream outputStream = null;
    try {
        outputStream = new FileOutputStream(file);
        CasIOUtils.save(cas, outputStream, SerialFormat.BINARY);
    } catch (FileNotFoundException fnfe) {
        throw new FileNotFoundException(fnfe.getMessage());
    } finally {
        if (outputStream != null) {
            outputStream.close();
        }
    }
    casToXmi.stop();
}

Related

How do I decode the base64compressed item in a GSA feed

I have the contents of what a feed sends to the search appliance for indexing, but one XML node is base64compressed. According to the GSA docs, custom feed content is constructed by compressing it (zlib) and then base64-encoding it. I tried to reverse the process by decoding it and then using 7zip to open it, but that did not work.
Rationale: GSA is EOL and we are moving to Solr, but we will continue to use some GSA Connectors for the time being (they are open source). I need to look at the text contents of what gets indexed on the search appliance so I can construct a proper Solr schema.
My experience with GSA is very minimal so I may be thinking about this all wrong, would appreciate any suggestions on how to tackle this.
Thanks!
This code will decode then uncompress the base64compressed item in a GSA feed.
private byte[] decodeUncompress(byte[] data) throws IOException {
    // Decode
    byte[] decodedBytes = Base64.getDecoder().decode(data);
    // Uncompress (zlib/DEFLATE with the standard zlib header)
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    Inflater decompresser = new Inflater(false);
    try (InflaterOutputStream inflaterOutputStream = new InflaterOutputStream(stream, decompresser)) {
        inflaterOutputStream.write(decodedBytes);
    }
    return stream.toByteArray();
}
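As a sanity check that the decode-then-inflate logic round-trips, the sketch below rebuilds a sample payload the way the GSA docs describe (zlib-compress, then base64-encode) and feeds it back through the same steps; the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

public class GsaRoundTrip {
    // Mirror of the answer's decode-then-inflate logic.
    static byte[] decodeUncompress(byte[] data) throws IOException {
        byte[] decoded = Base64.getDecoder().decode(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterOutputStream inflate = new InflaterOutputStream(out, new Inflater(false))) {
            inflate.write(decoded);
        }
        return out.toByteArray();
    }

    // Builds a sample base64compressed payload as the docs describe:
    // zlib-compress, then base64-encode.
    static byte[] compressEncode(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(out, new Deflater())) {
            deflate.write(data);
        }
        return Base64.getEncoder().encode(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "<record>sample feed content</record>".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decodeUncompress(compressEncode(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Note that Inflater(false) expects the standard zlib header, which is what DeflaterOutputStream writes by default; if the feed used raw DEFLATE, you would need Inflater(true) instead.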

Apache FOP: upgrading from 1.1 to 2.1

I am following the migration guide, but I don't seem to get it right.
In FOP 1.1 I have this working code:
public class XsltFactory {
    private static final String FO_CONFIG_FILE = "/path/to/fop-config.xml";

    private static FopFactory fopFactory;

    private static synchronized void initFopFactory(final ServletContext context) throws Exception {
        Configuration cfg = new DefaultConfigurationBuilder()
                .build(XsltFactory.class.getResourceAsStream(FO_CONFIG_FILE));
        fopFactory = FopFactory.newInstance();
        fopFactory.setURIResolver(new ServletContextURIResolver(context));
        fopFactory.setUserConfig(cfg);
    }
}
I adapted the above code to work with FOP 2.1:
public class XsltFactory {
    private static final String FO_CONFIG_FILE = "/path/to/fop-config.xml";

    private static FopFactory fopFactory;

    private static synchronized void initFopFactory(final ServletContext context) throws Exception {
        Configuration cfg = new DefaultConfigurationBuilder()
                .build(XsltFactory.class.getResourceAsStream(FO_CONFIG_FILE));
        FopFactoryBuilder fopFactoryBuilder = new FopFactoryBuilder(
                new URI(ServletContextURIResolver.SERVLET_CONTEXT_PROTOCOL),
                new URIResolverAdapter(new ServletContextURIResolver(context)));
        fopFactoryBuilder.setConfiguration(cfg);
        fopFactory = fopFactoryBuilder.build();
    }
}
But I get the following error:
java.lang.Exception: Fail to create PDF
at ....web.controller.PrintPdfController.renderPdf(PrintPdfController.java:181)
[...]
at weblogic.work.ExecuteThread.run(ExecuteThread.java:263)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 16: servlet-context:
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.failExpecting(URI.java:2835)
at java.net.URI$Parser.parse(URI.java:3038)
at java.net.URI.<init>(URI.java:595)
[...]
... 42 common frames omitted
The PDF cannot be displayed, since it was never created.
EDIT:
After appending "///" to SERVLET_CONTEXT_PROTOCOL, I now get:
Caused by: java.net.MalformedURLException: unknown protocol: servlet-context
at java.net.URL.<init>(URL.java:592)
at java.net.URL.<init>(URL.java:482)
at java.net.URL.<init>(URL.java:431)
at java.net.URI.toURL(URI.java:1096)
at org.apache.fop.fonts.FontDetectorFactory$DefaultFontDetector.detect(FontDetectorFactory.java:94)
... 59 common frames omitted
After a few days of investigation, the migration has finally been done successfully. The problem was coming from the URI resolver, and fixing this problem created new problems, which I solved subsequently.
The guide at https://xmlgraphics.apache.org/fop/2.1/upgrading.html is of relatively limited help.
The core of the problem is the URI resolver. You now have to define a custom resolver, but NOT as in the example provided at:
https://xmlgraphics.apache.org/fop/2.0/servlets.html
ResourceResolver resolver = new ResourceResolver() {
    public OutputStream getOutputStream(URI uri) throws IOException {
        URL url = getServletContext().getResource(uri.toASCIIString());
        return url.openConnection().getOutputStream();
    }

    public Resource getResource(URI uri) throws IOException {
        return new Resource(getServletContext().getResourceAsStream(uri.toASCIIString()));
    }
};
The right way of doing it is:
ResourceResolver resolver = new ResourceResolver() {
    public OutputStream getOutputStream(URI uri) throws IOException {
        URL url = context.getResource(uri.getPath());
        return url.openConnection().getOutputStream();
    }

    public Resource getResource(URI uri) throws IOException {
        return new Resource(context.getResourceAsStream(uri.getPath()));
    }
};
Instead of uri.toASCIIString(), the correct call is uri.getPath().
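A quick stdlib demonstration of the difference (the resource path here is a made-up example): ServletContext.getResource expects a path starting with "/", which getPath() yields, while toASCIIString() keeps the scheme prefix. The same demo shows why the bare protocol constant alone triggers the URISyntaxException from the stack trace above.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathDemo {
    public static void main(String[] args) throws URISyntaxException {
        // A resource URI as FOP would hand it to the resolver.
        URI uri = new URI("servlet-context:///WEB-INF/styles/print.xsl");
        System.out.println(uri.toASCIIString()); // servlet-context:///WEB-INF/styles/print.xsl
        System.out.println(uri.getPath());       // /WEB-INF/styles/print.xsl

        // The protocol constant by itself has no scheme-specific part,
        // so parsing it fails, matching the question's exception.
        try {
            new URI("servlet-context:");
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage());
        }
    }
}
```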
In addition, we had to remove all "servlet-context:" prefixes from font URIs (in fop-config.xml) and image URIs (in any XSL transformation file or template).
Finally, I had an issue with hyphenation: FOP could no longer find the .hyp files because, for some reason, the baseUri was being used instead of the custom context resolver (I had to dig into FOP's source files to find this out). So I had to modify the getResource method of my custom resolver. I know this is a hack, but it works, and it is sufficient for me, as I had already spent three days on this problem:
ResourceResolver resolver = new ResourceResolver() {
    public OutputStream getOutputStream(URI uri) throws IOException {
        URL url = context.getResource(uri.getPath());
        return url.openConnection().getOutputStream();
    }

    public Resource getResource(URI uri) throws IOException {
        InputStream stream = null;
        /*
         * For some reason, in FOP 2.x, the hyphenator does not use the
         * classpath fop-hyph.jar.
         *
         * This causes trouble as FOP tries to find "none.hyp" in the
         * war directory. Setting
         * <hyphenation-base>/WEB-INF/hyph</hyphenation-base> in the
         * fop-config.xml file does not solve the issue. The only
         * solution I could find is to programmatically detect when a
         * .hyp file is being loaded. When this occurs, I modify
         * the path so that the resolver gets the right resource.
         *
         * This is a hack, but after spending three days on it, I just
         * went straight to the point and got a workaround.
         */
        if (uri.getPath().endsWith(".hyp")) {
            String relUri = uri.getPath().substring(
                    uri.getPath().indexOf(baseUri.getPath()) + baseUri.getPath().length());
            stream = context.getResourceAsStream(FopManager.HYPH_DIR + relUri);
        } else {
            stream = context.getResourceAsStream(uri.getPath());
        }
        return new Resource(stream);
    }
};
Note that I also had to create the none.hyp file manually, since it does not exist in the .hyp files provided by OFFO. I just copied en.hyp and renamed it none.hyp. This solved my last problem.
I hope this saves someone a few days of work ;)

Getting error on deploying the process definition in activiti-rest using java code

Hi all, I am trying to deploy a process definition in activiti-rest using Java. But I am getting the error 'Exception in thread "main" Bad Request (400)'. I have searched Google a lot but have not found any solution. Please help me find the actual fault in my code. Below are my Java code and the errors.
My Errors
Starting the internal HTTP client
Exception in thread "main" Bad Request (400) - The request could not be understood by the server due to malformed syntax
at org.restlet.resource.ClientResource.doError(ClientResource.java:590)
at org.restlet.resource.ClientResource.handleInbound(ClientResource.java:1153)
at org.restlet.resource.ClientResource.handle(ClientResource.java:1048)
at org.restlet.resource.ClientResource.handle(ClientResource.java:1023)
at org.restlet.resource.ClientResource.post(ClientResource.java:1485)
at org.restlet.resource.ClientResource.post(ClientResource.java:1424)
at com.bizruntime.activiti.rest.Activiti_Rest_BuyEconomyOrBusinsessClassTIcket.TicketClass.createdeployment(TicketClass.java:40)
at com.bizruntime.activiti.rest.Activiti_Rest_BuyEconomyOrBusinsessClassTIcket.Ticke_Test.main(Ticke_Test.java:13)
My Java Code
/**
 * Client resource
 */
private static ClientResource getClientResource(String uri) {
    ClientResource resource = new ClientResource(uri);
    resource.setChallengeResponse(ChallengeScheme.HTTP_BASIC, "kermit", "kermit");
    return resource;
}

/**
 * Creating the deployment
 */
public static JSONObject createdeployment() {
    String uri = REST_URI + "/repository/deployments";
    log.debug("uri (create deployment): " + uri);
    JSONObject my_data = new JSONObject();
    try {
        my_data.put("name", "BuyTicket.bpmn20.xml");
        Representation response = getClientResource(uri).post(my_data);
        JSONObject object = new JSONObject(response.getText());
        if (object != null) {
            log.info("Deployed successfully...");
            return object;
        }
    } catch (JSONException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
cfr. http://activiti.org/userguide/index.html#_create_a_new_deployment: the body should not be JSON, but multipart/form-data with a single file that is a .bpmn20.xml file (or a .zip in the case of multiple files).
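For illustration, here is a sketch of such a multipart upload using only java.net.HttpURLConnection instead of Restlet. The URL, file name, and kermit credentials are taken from the question; the boundary string, class, and helper names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class DeploymentUpload {
    // Builds a multipart/form-data body containing one file part, which is
    // what POST /repository/deployments expects (not a JSON object).
    static byte[] multipartBody(String boundary, String fileName, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/xml\r\n\r\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(content);
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String boundary = "----deployment-boundary";
        byte[] body = multipartBody(boundary, "BuyTicket.bpmn20.xml",
                Files.readAllBytes(Paths.get("BuyTicket.bpmn20.xml")));
        HttpURLConnection conn = (HttpURLConnection) new URL(
                "http://localhost:8431/activiti-rest/service/repository/deployments").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString("kermit:kermit".getBytes(StandardCharsets.UTF_8)));
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        System.out.println(conn.getResponseCode());
    }
}
```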

Error in uploading a file using Jersey rest service

I am using Jersey to build a REST service that uploads a file, but I am having a problem writing the file to the required location. Java throws a "system cannot find the specified path" error. Here is my web service:
@POST
@Path("/fileupload")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response uploadFile(@FormDataParam("file") InputStream fileUploadStream,
        @FormDataParam("file") FormDataContentDisposition fileDetails) throws IOException {
    StringBuilder uploadFileLocation = new StringBuilder();
    uploadFileLocation.append("c:/logparser/webfrontend/uploads");
    uploadFileLocation.append("/" + dateFormat.format(Calendar.getInstance().getTime()));
    uploadFileLocation.append("/" + fileDetails.getFileName());
    writeToFile(fileUploadStream, uploadFileLocation.toString());
    return Response.status(200).entity("File saved to " + uploadFileLocation).build();
}
private void writeToFile(InputStream uploadInputStream, String uploadFileLocation) {
    log.debug("UploadService, writeToFile method, start()");
    try {
        int read = 0;
        byte[] bytes = new byte[uploadInputStream.available()];
        log.info("UploadService, writeToFile method, copying uploaded files.");
        OutputStream out = new FileOutputStream(new File(uploadFileLocation));
        while ((read = uploadInputStream.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }
        out.flush();
        out.close();
    } catch (Exception e) {
        log.error("UploadService, writeToFile method, error in writing to file " + e.getMessage());
    }
}
From looking at just the code (it's usually helpful to include the exception and stack trace), you're trying to write to a directory based on a timestamp which doesn't exist yet. Try adding a call to File.mkdir/mkdirs. See this question/answer: FileNotFoundException (The system cannot find the path specified)
Side note - Unless you have a reason not to, I'd consider using something like Apache commons-io(FileUtils.copyInputStreamToFile) to do the writing.
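A minimal sketch combining both suggestions with only java.nio: Files.createDirectories plays the role of mkdirs, and Files.copy replaces the manual read/write loop (similar to what commons-io would do). The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UploadWriter {
    // Creates any missing parent directories first (the step missing in the
    // question's code), then streams the upload straight to disk.
    static void writeToFile(InputStream in, String uploadFileLocation) throws IOException {
        Path target = Paths.get(uploadFileLocation);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if it already exists
        }
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```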

opennlp with netbeans is not giving output

How do I use OpenNLP with NetBeans? I made a small program as given in the Apache documentation, but it is not working. I have set the path to the OpenNLP bin as stated in the documentation, but I am still not getting any output. It is not able to find the .bin file and hence cannot create the SentenceModel.
package sp;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceModel;

public class Sp {
    public static void main(String[] args) throws IOException {
        InputStream modelIn = new FileInputStream("en-token.bin");
        try {
            SentenceModel model = new SentenceModel(modelIn);
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                }
            }
        }
    }
}
The current working directory when you run in Netbeans is the base project directory (the one with build.xml in it). Place your .bin file there, and you should be able to open the file like that.
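If you want the failure mode to be obvious rather than a bare FileNotFoundException, a small sketch (the class and method names are made up) that resolves the model file against the working directory and reports the full path it looked at:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

public class ModelLoader {
    // Resolves the model name against the current working directory (which,
    // under NetBeans, is the base project directory) and fails with an
    // explicit absolute path when the file is missing.
    static InputStream openModel(String name) throws FileNotFoundException {
        File f = new File(System.getProperty("user.dir"), name);
        if (!f.isFile()) {
            throw new FileNotFoundException("Model not found at " + f.getAbsolutePath()
                    + "; place " + name + " in the project base directory");
        }
        return new FileInputStream(f);
    }
}
```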