How do I decode the base64compressed item in a GSA feed

I have the contents of what a feed is sending to the search appliance for indexing, but one XML node is base64compressed. According to the GSA docs, custom feeds are constructed by compressing the content (zlib) and then base64-encoding it. I tried to reverse the process by decoding it and then opening the result with 7-Zip, but it did not work.
Rationale: I am looking at this because GSA is EOL. We are moving to Solr but will continue to use some GSA Connectors for the time being (they are open source). I need to see the text content of what gets sent to the search appliance for indexing so I can construct a proper Solr schema.
My experience with GSA is very minimal, so I may be thinking about this all wrong. I would appreciate any suggestions on how to tackle this.
Thanks!

This code will decode then uncompress the base64compressed item in a GSA feed.
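import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

private byte[] decodeUncompress(byte[] data) throws IOException {
    // Decode: base64 text -> zlib-compressed bytes
    // (use Base64.getMimeDecoder() instead if the XML text contains line breaks)
    byte[] decodedBytes = Base64.getDecoder().decode(data);
    // Uncompress: Inflater(false) expects a zlib header, i.e. plain zlib rather than raw deflate
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    Inflater decompresser = new Inflater(false);
    // try-with-resources: close() finishes inflating, and errors propagate instead of being swallowed
    try (InflaterOutputStream inflaterOutputStream = new InflaterOutputStream(stream, decompresser)) {
        inflaterOutputStream.write(decodedBytes);
    } finally {
        decompresser.end(); // release the Inflater's native resources
    }
    return stream.toByteArray();
}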
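To check the decoding independently of a real feed, the sketch below builds a payload the way the question describes the GSA docs (zlib-compress, then base64-encode) and decodes it again. The class and method names are hypothetical. It assumes the node's text is base64 with, at most, surrounding whitespace. The round trip also shows that the decoded bytes are a bare zlib stream, not an archive file. That is a likely reason a general-purpose archiver such as 7-Zip could not open them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

public final class GsaBase64Compressed {

    // Hypothetical helper: zlib-compress, then base64-encode (the feed producer's direction).
    public static String compressEncode(byte[] raw) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream zlib = new DeflaterOutputStream(compressed)) {
            zlib.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(compressed.toByteArray());
    }

    // Reverse direction: base64-decode, then inflate the zlib stream.
    public static byte[] decodeUncompress(String base64) {
        byte[] zlibBytes = Base64.getDecoder().decode(base64.trim());
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InflaterOutputStream inflater = new InflaterOutputStream(plain)) {
            inflater.write(zlibBytes);
        } catch (IOException e) {
            // A ZipException here usually means the bytes are not a zlib stream
            throw new UncheckedIOException(e);
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) {
        String encoded = compressEncode("<html>hello GSA</html>".getBytes(StandardCharsets.UTF_8));
        byte[] zlibBytes = Base64.getDecoder().decode(encoded);
        // Java's default Deflater writes a zlib header whose first byte is 0x78
        System.out.printf("first byte: 0x%02x%n", zlibBytes[0] & 0xff);
        System.out.println(new String(decodeUncompress(encoded), StandardCharsets.UTF_8));
    }
}
```

Once the node's text decodes this way, the resulting bytes are the original document content (for example HTML), which is what you would map into the Solr schema.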

Related

CAS to XMI - UIMA

When I try to convert a CAS to XMI, I receive a UIMARuntimeException caused by "&#55349" (an invalid XML character). Thanks in advance.
Exception:
Caused by: org.xml.sax.SAXParseException; lineNumber: 190920; columnNumber: 36557; Character reference "&#55349" is an invalid XML character.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.uima.util.XmlCasDeserializer.deserializeR(XmlCasDeserializer.java:111)
at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:366)
Code:
private static void serialize(CAS cas, File file) throws SAXException, IOException {
Watch casToXmi = new Watch(Path.getFileName() + "Cas to Xmi Convertion - "+file.getName());
casToXmi.start();
OutputStream outputStream = null;
try {
outputStream = new BufferedOutputStream(new FileOutputStream(file));
XmiCasSerializer xmiSerializer = new XmiCasSerializer(cas.getTypeSystem());
XMLSerializer xmlSerializer = new XMLSerializer(outputStream, true);
xmiSerializer.serialize(cas,xmlSerializer.getContentHandler());
} catch (FileNotFoundException fnfe) {
throw new FileNotFoundException(fnfe.getMessage());
} catch (SAXException saxe) {
throw new SAXException(saxe.getMessage());
} finally {
try {
outputStream.close();
} catch (IOException ioe) {
throw new IOException(ioe.getMessage());
}
}
casToXmi.stop();
}
By default, XMI is serialized as XML 1.0, which can only represent a restricted range of characters.
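The number in the error is a useful clue. 55349 is 0xD835, a UTF-16 high surrogate. A surrogate is not a legal XML character on its own and normally appears only as the first half of a pair encoding a supplementary character. For example, U+1D400 is \uD835\uDC00 in Java.

The sketch below assumes the CAS document text is available as a Java String; the class and method names are hypothetical. It checks the text against the XML 1.0 Char production one code point at a time, which can help locate the characters that cause trouble before you serialize.

```java
public final class XmlCharCheck {

    // The "Char" production of XML 1.0: which code points may appear in a document at all.
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index (in chars) of the first code point XML 1.0 cannot represent, or -1 if there is none.
    // A proper surrogate pair is read as one supplementary code point; a lone surrogate is not.
    public static int firstInvalidIndex(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(55349));               // d835
        System.out.println(Character.isHighSurrogate((char) 55349));  // true
        System.out.println(firstInvalidIndex("x\uD835\uDC00y"));      // -1 (U+1D400 as a pair)
        System.out.println(firstInvalidIndex("x\uD835y"));            // 1 (lone high surrogate)
    }
}
```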
UIMA's CasIOUtils makes it easy to write the CAS out as XMI in XML 1.1:
try (OutputStream out = new FileOutputStream(this.outputFile)) {
    CasIOUtils.save(cas, out, SerialFormat.XMI_1_1);
}
Alternatively, you can configure the serializer in your code to produce XML 1.1 instead, which might resolve your issue:
XMLSerializer sax2xml = new XMLSerializer(docOS, prettyPrint);
sax2xml.setOutputProperty(OutputKeys.VERSION, "1.1");
These lines were taken from the XmiWriter of DKPro Core.
Note: I see your code includes a Watch. If speed is your concern, there are other supported formats that save/load considerably faster than XMI, e.g. the binary format SerialFormat.COMPRESSED_FILTERED_TSI. Unlike XMI, this format also supports any characters in the text.
Disclaimer: I am part of the Apache UIMA project and the maintainer of DKPro Core.
I used SerialFormat.BINARY, which produces a plain custom binary serialized CAS without the type system and without filtering:
private static void serialize(CAS cas, File file) throws SAXException, IOException {
    Watch casToXmi = new Watch(Path.getFileName() + "Cas to Xmi Convertion - "+file.getName());
    casToXmi.start();
    // try-with-resources closes the stream even if save() fails, and never calls close() on null
    try (OutputStream outputStream = new FileOutputStream(file)) {
        CasIOUtils.save(cas, outputStream, SerialFormat.BINARY);
    }
    casToXmi.stop();
}

MSF4J POST method receiving partial data

I'm new to MSF4J, and I need to write a REST API that accepts large XML payloads through POST. I am using the request.getMessageBody() method to get the data. I discovered that it's now deprecated, but I couldn't find its replacement, so I decided to use it anyway.
The problem is that the first time I send data to the microservice, it doesn't receive the whole body. All subsequent requests get the full message body.
When I route the request through the ESB, the ESB receives the whole body, but by the time it reaches the endpoint it has been truncated.
I have also tried sending requests from different REST clients, but the first request always gets an incomplete message body.
@POST
@Consumes({ "application/xml", "application/json", "text/xml" })
@Path("test/")
public Response getReqNotification(@Context Request request) throws Exception {
Response.ResponseBuilder respBuilder =
Response.status(Response.Status.OK).entity(request);
ByteBuf b = request.getMessageBody();
byte[] bb = new byte[b.readableBytes()];
b.duplicate().readBytes(bb);
System.out.println(new String(bb));
return respBuilder.build();
}
I expect it to print the full message (which is about 2000 bytes long) every time I send a request, but I'm only getting around 800 bytes when I first run the microservice.
I hope I'll get some help here. I have tried elsewhere, but WSO2 doesn't have much documentation (⌣_⌣”)
I still don't really understand what I was doing wrong, but with the help of this link I managed to come up with the following code, and it works fine.
The major change is that I now use request.getMessageContentStream() instead of the deprecated request.getMessageBody().
@POST
@Consumes({ "application/xml", "application/json", "text/xml" })
@Path("test/")
public Response getReqNotification(@Context Request request) throws Exception {
    Response.ResponseBuilder respBuilder =
            Response.status(Response.Status.OK).entity(request);
    String data = "";
    InputStream in = request.getMessageContentStream();
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try {
        // Keep reading until end of stream (-1), 8 KB at a time instead of one byte per call
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            bos.write(buffer, 0, n);
        }
        data = bos.toString("UTF-8"); // assumes the request body is UTF-8 encoded
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    System.out.println(data);
    //////do stuff
    return respBuilder.build();
}
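The working version drains the content stream until read() returns -1 (end of stream), rather than taking whatever a single call returns. Below is a standalone sketch of that loop, with hypothetical class and method names. Its demo stream hands out at most 800 bytes per read() call, just to show that one read can return only part of the body. The 800 figure is arbitrary: the thread does not establish that this is what MSF4J does internally. On Java 9+, InputStream.readAllBytes() does the same draining.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class StreamDrain {

    // Reads until end of stream (-1), so a short read never truncates the result.
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Demo-only stream that returns at most 'max' bytes per read, as a network stream may.
    public static final class TrickleInputStream extends FilterInputStream {
        private final int max;

        public TrickleInputStream(InputStream in, int max) {
            super(in);
            this.max = max;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, max));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[2000];
        InputStream stream = new TrickleInputStream(new ByteArrayInputStream(body), 800);
        byte[] once = new byte[body.length];
        System.out.println(stream.read(once));         // 800: one read() call got only part of the body
        System.out.println(readFully(stream).length);  // 1200: draining to -1 gets the rest
    }
}
```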

Read large file using vertx

I am new to Vert.x, and I am using the Vert.x filesystem API to read a large file.
vertx.fileSystem().readFile("target/classes/readme.txt", result -> {
if (result.succeeded()) {
System.out.println(result.result());
} else {
System.err.println("Oh oh ..." + result.cause());
}
});
But all of the RAM is consumed while reading, and the memory is not even released after use. The Vert.x filesystem API docs also warn:
Do not use this method to read very large files or you risk running out of available RAM.
Is there any alternative to this?
To read a large file, you should open an AsyncFile:
OpenOptions options = new OpenOptions();
fileSystem.open("myfile.txt", options, res -> {
if (res.succeeded()) {
AsyncFile file = res.result();
} else {
// Something went wrong!
}
});
An AsyncFile is a ReadStream, so you can use it together with a Pump to copy the bits to a WriteStream:
Pump.pump(file, output).start();
file.endHandler((r) -> {
System.out.println("Copy done");
});
There are different kinds of WriteStream, such as AsyncFile, net sockets, and HTTP server responses.
To read or process a large file in chunks, you need to use the open() method, which returns an AsyncFile on success. On this AsyncFile, call setReadBufferSize() (optional; the default is 8192) and attach a handler(), which will be passed a Buffer of at most the size of the read buffer you just set.
In the example below, I have also attached an endHandler() that prints a final newline, to stay in line with the sample code you provided in the question:
vertx.fileSystem().open("target/classes/readme.txt", new OpenOptions().setWrite(false).setCreate(false), result -> {
if (result.succeeded()) {
result.result().setReadBufferSize(READ_BUFFER_SIZE).handler(data -> System.out.print(data.toString()))
.endHandler(v -> System.out.println());
} else {
System.err.println("Oh oh ..." + result.cause());
}
});
You need to define READ_BUFFER_SIZE somewhere of course.
The reason is that, internally, readFile calls Files.readAllBytes.
What you should do instead is create a stream from your file and pass it to your Vert.x handler:
try (InputStream stream = new FileInputStream("target/classes/readme.txt")) {
// Your handling here
}
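Here is a JDK-only sketch of the chunked-reading idea behind these answers. It reads the file in fixed-size buffers and hands each chunk to a callback, so memory use is bounded by the buffer size rather than the file size. The class name, method name, and 8192-byte chunk size are illustrative assumptions. Because InputStream reads block, code like this should run off the event loop in a Vert.x application, for example via executeBlocking.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

public final class ChunkedFileReader {

    // Calls onChunk(buffer, length) for every chunk and returns the total byte count.
    // The same buffer is reused for each chunk, so copy the bytes if you need to keep them.
    public static long readInChunks(Path file, int chunkSize, BiConsumer<byte[], Integer> onChunk)
            throws IOException {
        long total = 0;
        byte[] buffer = new byte[chunkSize];
        try (InputStream stream = Files.newInputStream(file)) {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                onChunk.accept(buffer, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("large", ".txt");
        Files.write(tmp, new byte[20_000]);
        long[] chunks = {0};
        long total = readInChunks(tmp, 8192, (buf, len) -> chunks[0]++);
        System.out.println(total + " bytes in " + chunks[0] + " chunks");
        Files.delete(tmp);
    }
}
```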

Compact Framework - Upload file via REST

I am looking for the best way to transfer files from the Compact Framework to a server via REST. I have a web service I created using .NET Web API. I've looked at several SO questions and other sites that dealt with sending files, but none of them seem to work for what I need.
I am trying to send media files from WM 6 and 6.5 devices to my REST service. While most of the files are under 300 KB, a few may be around 2-10 MB. Does anyone have some snippets I could use to make this work?
Thanks!
I think this is the minimum for sending a file:
using (var fileStream = File.Open(@"\file.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
{
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.destination.com/path");
request.Method = "POST"; // or PUT, depending on what the server expects
request.ContentLength = fileStream.Length; // see the note below
using (var requestStream = request.GetRequestStream())
{
int bytes;
byte[] buffer = new byte[1024]; // any reasonable buffer size will do
while ((bytes = fileStream.Read(buffer, 0, buffer.Length)) > 0)
{
requestStream.Write(buffer, 0, bytes);
}
}
try
{
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
}
}
catch (WebException ex)
{
// failure
}
}
Note: HTTP needs a way to know when you're "done" sending data. There are three ways to achieve this:
Set request.ContentLength, as in the example, because we know the size of the file before sending anything.
Set request.SendChunked = true to send the data in chunks, each preceded by its own size.
You could also set request.AllowWriteStreamBuffering to write to an in-memory buffer first, but I wouldn't recommend wasting that much memory on the Compact Framework.
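For the SendChunked option, the HTTP/1.1 chunked transfer coding frames each chunk as its size in hexadecimal, CRLF, the bytes, and CRLF. A zero-size chunk marks the end of the body. The Java sketch below is a hypothetical helper for illustration only, since HttpWebRequest does this framing for you. It builds the framing for a byte array with no trailers. In Java's own HttpURLConnection, the counterparts of the ContentLength and SendChunked options are setFixedLengthStreamingMode and setChunkedStreamingMode.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public final class ChunkedEncoding {

    // Frames 'body' as HTTP/1.1 chunked transfer coding: size-in-hex CRLF data CRLF ... 0 CRLF CRLF
    public static byte[] encode(byte[] body, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < body.length; off += chunkSize) {
            int len = Math.min(chunkSize, body.length - off);
            writeAscii(out, Integer.toHexString(len) + "\r\n");
            out.write(body, off, len);
            writeAscii(out, "\r\n");
        }
        writeAscii(out, "0\r\n\r\n"); // last-chunk with no trailers tells the server the body is done
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] framed = encode("hello world".getBytes(StandardCharsets.US_ASCII), 4);
        // Show CRLFs literally so the framing is visible
        System.out.print(new String(framed, StandardCharsets.US_ASCII).replace("\r\n", "\\r\\n\n"));
    }
}
```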

Error in uploading a file using Jersey rest service

I am using Jersey to build a REST service that uploads a file, but I am having trouble writing the file to the required location: Java throws a "system cannot find the path specified" error. Here is my web service:
@POST
@Path("/fileupload")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response uploadFile(@FormDataParam("file") InputStream fileUploadStream, @FormDataParam("file") FormDataContentDisposition fileDetails) throws IOException {
StringBuilder uploadFileLocation= new StringBuilder();
uploadFileLocation.append("c:/logparser/webfrontend/uploads");
uploadFileLocation.append("/"+dateFormat.format(Calendar.getInstance().getTime()));
uploadFileLocation.append("/"+fileDetails.getFileName());
writeToFile(fileUploadStream, uploadFileLocation.toString());
return Response.status(200).entity("File saved to " + uploadFileLocation).build();
}
private void writeToFile(InputStream uploadInputStream, String uploadFileLocation)
{
log.debug("UploadService , writeToFile method , start ()");
try{
int read = 0;
byte[] bytes = new byte[uploadInputStream.available()];
log.info("UploadService, writeToFile method , copying uploaded files.");
OutputStream out = new FileOutputStream(new File(uploadFileLocation));
while ((read = uploadInputStream.read(bytes)) != -1)
{
out.write(bytes, 0, read);
}
out.flush();
out.close();
}
catch(Exception e)
{
log.error("UploadService, writeToFile method, error in writing to file "+e.getMessage());
}
}
From looking at just the code (it's usually helpful to include the exception and stack trace), you're trying to write into a directory named after a timestamp, and that directory doesn't exist yet. Try calling File.mkdir() or File.mkdirs() for that directory before writing. See this question/answer: FileNotFoundException (The system cannot find the path specified)
Side note: unless you have a reason not to, I'd consider using something like Apache Commons IO (FileUtils.copyInputStreamToFile) to do the writing; it also creates missing parent directories.
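Here is a JDK-only sketch of that fix. It creates the timestamped parent directory before opening the file, then streams the upload into it. It uses Files.createDirectories and Files.copy rather than File.mkdirs, and the class and method names are hypothetical.

The sketch also avoids sizing the buffer with available(). That method may return 0, and read() into a zero-length array returns 0 rather than -1. In that case, a copy loop like the one in the question spins forever.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class UploadWriter {

    // Creates any missing parent directories, then copies the whole stream into the target file.
    public static long writeToFile(InputStream upload, Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directories already exist
        }
        return Files.copy(upload, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("uploads");
        // Mirrors the question's layout <base>/<date>/<file name>; this date is made up
        Path target = base.resolve("2017-01-31").resolve("log.txt");
        long written = writeToFile(
                new ByteArrayInputStream("log line".getBytes(StandardCharsets.UTF_8)), target);
        System.out.println(written + " bytes -> " + target);
    }
}
```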