How do I convert text files to .arff format(weka) - cluster-analysis

Please advise me How do I convert text files to .arff format(weka)
because i wan to do data clustering for 1000 txt file.
regards

There are some converters implemented in WEKA, just find the right format or make little changes to your data (using awk, sed...).
Here is the API pages related to this topic: http://weka.sourceforge.net/doc.stable/weka/core/converters/package-summary.html
For exapmle here is how to convert from CSV to ARFF:
java weka.core.converters.CSVLoader filename.csv > filename.arff

Here is the code you can use
package text.Classification;
import java.io.*;
import weka.core.*;
public class TextDirectoryToArff {
public Instances createDataset(String directoryPath) throws Exception {
FastVector atts;
FastVector attVals;
atts = new FastVector();
atts.addElement(new Attribute("contents", (FastVector) null));
String[] s = { "class1", "class2", "class3" };
attVals = new FastVector();
for (String p : s)
attVals.addElement(p);
atts.addElement(new Attribute("class", attVals));
Instances data = new Instances("MyRelation", atts, 0);
System.out.println(data);
InputStreamReader is = null;
File dir = new File(directoryPath);
String[] files = dir.list();
for (int i = 0; i < files.length; i++) {
if (files[i].endsWith(".txt")) {
double[] newInst = new double[2];
File txt = new File(directoryPath + File.separator + files[i]);
is = new InputStreamReader(new FileInputStream(txt));
StringBuffer txtStr = new StringBuffer();
int c;
while ((c = is.read()) != -1) {
txtStr.append((char) c);
}
newInst[0] = data.attribute(0).addStringValue(txtStr.toString());
int j=i%(s.length-1);
newInst[1] = attVals.indexOf(s[j]);
data.add(new Instance(1.0, newInst));
}
}
return data;
}
public static void main(String[] args) {
TextDirectoryToArff tdta = new TextDirectoryToArff();
try {
Instances dataset = tdta.createDataset("/home/asadul/Desktop/Downloads/text_example/class5");
PrintWriter fileWriter = new PrintWriter("/home/asadul/Desktop/Downloads/text_example/abc.arff", "UTF-8");
fileWriter.println(dataset);
fileWriter.close();
} catch (Exception e) {
System.err.println(e.getMessage());
e.printStackTrace();
}
}
}

Related

Open OSM pbf results in Protobuf exception

Using OSMSharp I am having trouble to open a stream for a file (which I can provide on demand)
The error occurs in PBFReader (line 104)
using (var tmp = new LimitedStream(_stream, length))
{
header = _runtimeTypeModel.Deserialize(tmp, null, _blockHeaderType) as BlobHeader;
}
and states: "ProtoBuf.ProtoException: 'Invalid field in source data: 0'" which might mean different things as I have read in this SO question.
The file opens and is visualized with QGis so is not corrupt in my opinion.
Can it be that the contracts do not match? Is OsmSharp/core updated to the latest .proto files for OSM from here (although not sure if this is the real original source for the definition files).
And what might make more sense, can it be that the file I attached is generated for v2 of OSM PBF specification?
In the code at the line of the exception I see the following comment which makes me wonder:
// TODO: remove some of the v1 specific code.
// TODO: this means also to use the built-in capped streams.
// code borrowed from: http://stackoverflow.com/questions/4663298/protobuf-net-deserialize-open-street-maps
// I'm just being lazy and re-using something "close enough" here
// note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
// actually need the other way around (network byte order):
// length = IntLittleEndianToBigEndian((uint)length);
BlobHeader header;
// again, v2 has capped-streams built in, but I'm deliberately
// limiting myself to v1 features
So this makes me wonder if OSM Sharp is (still) up-to-date.
My sandbox code looks like this:
using OsmSharp.Streams;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OsmSharp.Tags;
namespace OsmSharp
{
class Program
{
private const string Path = #"C:\Users\Bernoulli IT\Documents\Applications\Argaleo\Test\";
private const string FileNameAntarctica = "antarctica-latest.osm";
private const string FileNameOSPbf = "OSPbf";
private const Boolean useRegisterSource = false;
private static KeyValuePair<string, string> KeyValuePair = new KeyValuePair<string, string>("joep", "monita");
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
//string fileName = $#"{Path}\{FileNameAntarctica}.pbf";
string fileName = $#"{Path}\{FileNameOSPbf}.pbf";
string newFileName = $"{fileName.Replace(".pbf", string.Empty)}-{Guid.NewGuid().ToString().Substring(0, 4)}.pbf";
Console.WriteLine("*** Complete");
string fileNameOutput = CompleteFlow(fileName, newFileName);
Console.WriteLine("");
Console.WriteLine("*** Display");
DisplayFlow(fileNameOutput);
Console.ReadLine();
}
private static string CompleteFlow(string fileName, string newFileName)
{
// 1. Open file and convert to bytes
byte[] fileBytes = FileToBytes(fileName);
// 2. Bytes to OSM stream source (pbf)
PBFOsmStreamSource osmStreamSource;
osmStreamSource = BytesToOsmStreamSource(fileBytes);
osmStreamSource.MoveNext();
if (osmStreamSource.Current() == null)
{
osmStreamSource = FileToOsmStreamSource(fileName);
osmStreamSource.MoveNext();
if (osmStreamSource.Current() == null)
{
throw new Exception("No current in stream.");
}
}
// 3. Add custom tag
AddTag(osmStreamSource);
// 4. OSM stream source to bytes
//byte[] osmStreamSourceBytes = OsmStreamSourceToBytes(osmStreamSource);
// 5. Bytes to file
//string fileNameOutput = BytesToFile(osmStreamSourceBytes, newFileName);
OsmStreamSourceToFile(osmStreamSource, newFileName);
Console.WriteLine(newFileName);
return newFileName;
}
private static void DisplayFlow(string fileName)
{
// 1. Open file and convert to bytes
byte[] fileBytes = FileToBytes(fileName);
// 2. Bytes to OSM stream source (pbf)
BytesToOsmStreamSource(fileBytes);
}
private static byte[] FileToBytes(string fileName)
{
Console.WriteLine(fileName);
return File.ReadAllBytes(fileName);
}
private static PBFOsmStreamSource BytesToOsmStreamSource(byte[] bytes)
{
MemoryStream memoryStream = new MemoryStream(bytes);
memoryStream.Position = 0;
PBFOsmStreamSource osmStreamSource = new PBFOsmStreamSource(memoryStream);
foreach (OsmGeo element in osmStreamSource.Where(osmGeo => osmGeo.Tags.Any(tag => tag.Key.StartsWith(KeyValuePair.Key))))
{
foreach (Tag elementTag in element.Tags.Where(tag => tag.Key.StartsWith(KeyValuePair.Key)))
{
Console.WriteLine("!!!!!!!!!!!!!! Tag found while reading !!!!!!!!!!!!!!!!!!".ToUpper());
}
}
return osmStreamSource;
}
private static PBFOsmStreamSource FileToOsmStreamSource(string fileName)
{
using (FileStream fileStream = new FileInfo(fileName).OpenRead())
{
PBFOsmStreamSource osmStreamSource = new PBFOsmStreamSource(fileStream);
return osmStreamSource;
}
}
private static void AddTag(PBFOsmStreamSource osmStreamSource)
{
osmStreamSource.Reset();
OsmGeo osmGeo = null;
while (osmGeo == null)
{
osmStreamSource.MoveNext();
osmGeo = osmStreamSource.Current();
if(osmGeo?.Tags == null)
{
osmGeo = null;
}
}
osmGeo.Tags.Add("joep", "monita");
Console.WriteLine($"{osmGeo.Tags.FirstOrDefault(tag => tag.Key.StartsWith(KeyValuePair.Key)).Key} - {osmGeo.Tags.FirstOrDefault(tag => tag.Key.StartsWith(KeyValuePair.Key)).Value}");
}
private static byte[] OsmStreamSourceToBytes(PBFOsmStreamSource osmStreamSource)
{
MemoryStream memoryStream = new MemoryStream();
PBFOsmStreamTarget target = new PBFOsmStreamTarget(memoryStream, true);
osmStreamSource.Reset();
target.Initialize();
UpdateTarget(osmStreamSource, target);
target.Flush();
target.Close();
return memoryStream.ToArray();
}
private static string BytesToFile(byte[] bytes, string fileName)
{
using (FileStream fs = new FileStream(fileName, FileMode.Create, FileAccess.Write))
{
fs.Write(bytes, 0, bytes.Length);
}
return fileName;
}
private static void OsmStreamSourceToFile(PBFOsmStreamSource osmStreamSource, string fileName)
{
using (FileStream fileStream = new FileInfo(fileName).OpenWrite())
{
PBFOsmStreamTarget target = new PBFOsmStreamTarget(fileStream, true);
osmStreamSource.Reset();
target.Initialize();
UpdateTarget(osmStreamSource, target);
target.Flush();
target.Close();
}
}
private static void UpdateTarget(OsmStreamSource osmStreamSource, OsmStreamTarget osmStreamTarget)
{
if (useRegisterSource)
{
osmStreamTarget.RegisterSource(osmStreamSource, osmGeo => true);
osmStreamTarget.Pull();
}
else
{
bool isFirst = true;
foreach (OsmGeo osmGeo in osmStreamSource)
{
Tag? tag = osmGeo.Tags?.FirstOrDefault(t => t.Key == KeyValuePair.Key);
switch (osmGeo.Type)
{
case OsmGeoType.Node:
if (isFirst)
{
for (int indexer = 0; indexer < 1; indexer++)
{
(osmGeo as Node).Tags.Add(new Tag(KeyValuePair.Key + Guid.NewGuid(), KeyValuePair.Value));
}
isFirst = false;
}
osmStreamTarget.AddNode(osmGeo as Node);
break;
case OsmGeoType.Way:
osmStreamTarget.AddWay(osmGeo as Way);
break;
case OsmGeoType.Relation:
osmStreamTarget.AddRelation(osmGeo as Relation);
break;
default:
throw new ArgumentOutOfRangeException();
}
}
}
}
}
}
Already I posted this question on the GITHube page of OSMSharp as is linked here. Any help would be very appreciated.

How I can make dynamic header column and dynamic data to export to Excel/CSV/PDF using MongoDB, Spring Boot, and apache poi

I want to make export function using Spring Boot, I have data on MongoDB NoSQL, and then want to export my document on MongoDB Dynamically using Apache POI ( If any better dependency you can recommend to me).
I don't want to declare header column, entity model, etc., I want to export data dynamically as shown as in my Document database, any one can give me an example for it?
Please try these code may be help for you.
->add Gson Depandencies in porm.xml
Controller code
MasterController.java
#Autowired
MasterServiceImpl masterServiceImpl;
#GetMapping(value="/dynamicfile/{flag}/{fileType}/{fileName}")
public ResponseEntity<InputStreamResource> downloadsFiles(#PathVariable("flag") int flag,#PathVariable("fileType") String fileType,#PathVariable("fileName") String fileName) throws IOException{
List<?> objects=new ArrayList<>();
if(flag==1) {
objects=masterServiceImpl.getData();
}
ByteArrayInputStream in = masterServiceImpl.downloadsFiles(objects,fileType);
HttpHeaders headers = new HttpHeaders();
if(fileType.equals("Excel")) {
headers.add("Content-Disposition", "attachment; filename="+fileName+".xlsx");
}else if(fileType.equals("Pdf")){
headers.add("Content-Disposition", "attachment; filename="+fileName+".pdf");
}else if(fileType.equals("Csv")) {
headers.add("Content-Disposition", "attachment; filename="+fileName+".csv");
}
return ResponseEntity.ok().headers(headers).body(new InputStreamResource(in));
}
Service code :
MasterServiceImpl.java
public static List<HashMap<Object, Object>> getListOfObjectToListOfHashMap(List<?> objects) {
List<HashMap<Object,Object>> list=new ArrayList<>();
for(int i=0;i<objects.size();i++) {
HashMap<Object,Object> map=new HashMap<>();
String temp=new Gson().toJson(objects.get(i)).toString();
String temo1= temp.substring(1, temp.length()-1);
String[] temp2=temo1.split(",");
for(int j=-1;j<temp2.length;j++) {
if(j==-1) {
map.put("SrNo",i+1);
}else {
String tempKey=temp2[j].toString().split(":")[0].toString();
String tempValue=temp2[j].toString().split(":")[1].toString();
char ch=tempValue.charAt(0);
if(ch=='"') {
map.put(tempKey.substring(1, tempKey.length()-1), tempValue.substring(1, tempValue.length()-1));
}else {
map.put(tempKey.substring(1, tempKey.length()-1), tempValue);
}
}
}
list.add(map);
}
return list;
}
public static ByteArrayInputStream downloadsFiles(List<?> objects,String fileType) throws IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
List<HashMap<Object, Object>> list = getListOfObjectToListOfHashMap(objects);
String[] COLUMNs = getColumnsNameFromListOfObject(objects);
try{
if(fileType.equals("Excel")) {
generateExcel(list, COLUMNs,out);
}else if(fileType.equals("Pdf")) {
generatePdf(list, COLUMNs,out);
}else if(fileType.equals("Csv")) {
generatePdf(list, COLUMNs,out);
}
}catch(Exception ex) {
System.out.println("Error occurred:"+ ex);
}
return new ByteArrayInputStream(out.toByteArray());
}
public static final String[] getColumnsNameFromListOfObject(List<?> objects) {
String strObjects=new Gson().toJson(objects.get(0)).toString();
String[] setHeader= strObjects.substring(1, strObjects.length()-1).split(",");
String header="SrNo";
for(int i=0;i<setHeader.length;i++) {
String str=setHeader[i].toString().split(":")[0].toString();
header=header+","+str.substring(1, str.length()-1);
}
return header.split(",");
}
public static final void generateExcel(List<HashMap<Object, Object>> list, String[] COLUMNs,ByteArrayOutputStream out) throws IOException {
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet("Excelshit");
Font headerFont = workbook.createFont();
headerFont.setBold(true);
headerFont.setColor(IndexedColors.BLUE.getIndex());
CellStyle headerCellStyle = workbook.createCellStyle();
headerCellStyle.setFont(headerFont);
Row headerRow = sheet.createRow(0);
for (int col = 0; col < COLUMNs.length; col++) {
Cell cell = headerRow.createCell(col);
cell.setCellValue(COLUMNs[col]);
cell.setCellStyle(headerCellStyle);
}
int rowIdx = 1;
for(int k = 0; k < list.size(); k++){
Row row =sheet.createRow(rowIdx++);
for (Map.Entry<Object, Object> entry : list.get(k).entrySet()){
Object key = entry.getKey();
Object value = entry.getValue();
for (int col = 0; col < COLUMNs.length; col++) {
if(key.toString().equals(COLUMNs[col].toString())) {
row.createCell(col).setCellValue(value.toString());
}
}
}
}
workbook.write(out);
System.out.println(workbook);
}
private static final void generatePdf(List<HashMap<Object, Object>> list, String[] COLUMNs,ByteArrayOutputStream out) throws DocumentException {
Document document = new Document();
com.itextpdf.text.Font headerFont =FontFactory.getFont(FontFactory.HELVETICA_BOLD,8);
com.itextpdf.text.Font dataFont =FontFactory.getFont(FontFactory.TIMES_ROMAN,8);
PdfPCell hcell=null;
PdfPTable table = new PdfPTable(COLUMNs.length);
for (int col = 0; col < COLUMNs.length; col++) {
hcell = new PdfPCell(new Phrase(COLUMNs[col], headerFont));
hcell.setHorizontalAlignment(Element.ALIGN_CENTER);
table.addCell(hcell);
}
for(int index = 0; index < list.size(); index++){
PdfPCell cell = null;
for (Map.Entry<Object, Object> entry : list.get(index).entrySet()){
Object key = entry.getKey();
Object value = entry.getValue();
for (int col = 0; col < COLUMNs.length; col++) {
if(key.toString().equals(COLUMNs[col].toString())) {
cell = new PdfPCell(new Phrase(value.toString(),dataFont));
cell.setVerticalAlignment(Element.ALIGN_MIDDLE);
cell.setHorizontalAlignment(Element.ALIGN_CENTER);
}
}
table.addCell(cell);
}
}
PdfWriter.getInstance(document, out);
document.open();
document.add(table);
document.close();
}
public List<TblDepartment> getData() {
List<TblDepartment> list = new ArrayList<>();
departmentRepository.findAll().forEach(list::add);
return list;
}

Create PDF with iText on iSeries leads to error "The document has no pages."

We use the nice library iText for one of my customer's project to generate a pdf from a string representing a html page. The iText version is 5.5.10.
The following piece of code works well on the development environments and servers running on Windows, but it is not working on the customer's server running on iSeries.
public class GeneratePDFCmdImpl extends ControllerCommandImpl implements
GeneratePDFCmd {
private String charsetStr = null;
private Charset charset = null;
private BaseFont bf = null;
private String destFile = null;
private String destFilename = null;
private String srcContent = null;
private String docName = null;
public void setDocName(String docname) {
this.docName = docname;
}
public void setSrcContent(String srcContent) {
this.srcContent = srcContent;
}
private void prepareDefaultsAndSettings() {
/* srcContent may be more complex html but even this simple one is not working */
srcContent = "<html><head></head><body>This is just a test</body></html>";
docName = "mypdf";
charsetStr = "UTF-8";
destFilename = docName+".pdf";
Date timestamp = new Date();
/* destFile = "/" is just for the sample. In my real project, the value is a folder where my app has full rights
*/
destFile = "/" + destFilename;
charset = Charset.forName(charsetStr);
FontFactory.register("/fonts/arial.ttf","Arial");
bf = FontFactory.getFont("Arial").getBaseFont();
}
#Override
public void performExecute() throws ECException {
super.performExecute();
Document document = null;
OutputStream os = null;
prepareDefaultsAndSettings();
try {
InputStream srcInputStream;
srcInputStream = new ByteArrayInputStream(srcContent.getBytes(charset));
document = new Document(PageSize.A4, 20, 20, 75, 80);
FileOutputStream destOutput = new FileOutputStream(destFile);
PdfWriter writer = PdfWriter.getInstance(document,destOutput);
writer.setPageEvent( new HeaderFooterPageEvent(bf));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, srcInputStream, charset);
document.close();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (DocumentException e) {
e.printStackTrace();
} finally {
if(document != null) {
document.close();
}
document = null;
try {
if (os != null) {
os.close();
}
} catch(IOException e) {
e.printStackTrace();
}
os = null;
}
}
private class HeaderFooterPageEvent extends PdfPageEventHelper {
PdfContentByte cb;
PdfTemplate template;
BaseFont bf;
Font f;
float fs;
public HeaderFooterPageEvent(BaseFont _bf) {
super();
bf = _bf;
f = new Font(bf);
}
#Override
public void onOpenDocument(PdfWriter writer, Document document) {
cb = writer.getDirectContent();
template = cb.createTemplate(50, 50);
}
#Override
public void onEndPage(PdfWriter writer, Document document) {
Date dat = new Date();
ColumnText ct = new ColumnText(writer.getDirectContent());
SimpleDateFormat sdf = new SimpleDateFormat("dd-MM-yyyy HH:mm");
ct.showTextAligned(writer.getDirectContent(), Element.ALIGN_CENTER, new Phrase(sdf.format(dat) ), 100, 30, 0);
String text = "Page " +writer.getPageNumber() + " to ";
float len = bf.getWidthPoint(text, 12);
cb.beginText();
cb.setFontAndSize(bf, 12);
cb.setTextMatrix(450, 30);
cb.showText(text);
cb.endText();
cb.addTemplate(template, 450 + len, 30);
}
#Override
public void onCloseDocument(PdfWriter writer, Document document) {
template.beginText();
template.setFontAndSize(bf, 12);
template.showText(String.valueOf(writer.getPageNumber()));
template.endText();
}
}
}
When executed on the iSeries, we have the error message
com.ibm.commerce.command.ECCommandTarget executeCommand CMN0420E: The following command exception has occurred during processing: "ExceptionConverter: java.io.IOException: The document has no pages.". ExceptionConverter: java.io.IOException: The document has no pages.
at com.itextpdf.text.pdf.PdfPages.writePageTree(PdfPages.java:112)
at com.itextpdf.text.pdf.PdfWriter.close(PdfWriter.java:1256)
at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:900)
at com.itextpdf.text.Document.close(Document.java:415)
at be.ourcustomer.package.GeneratePDFCmdImpl.performExecute(GeneratePDFCmdImpl.java:107)
I don't have much idea about what we do wrong. Any help would be greatly appreciated

Creating custom plugin for chinese tokenization

I'm working towards properly integrating the stanford segmenter within SOLR for chinese tokenization.
This plugin involves loading other jar files and model files. I've got it working in a crude manner by hardcoding the complete path for the files.
I'm looking for methods to create the plugin where the paths need not be hardcoded and also to have the plugin in conformance with the SOLR plugin architecture. Please let me know if there are any recommended sites or tutorials for this.
I've added my code below :
public class ChineseTokenizerFactory extends TokenizerFactory {
/** Creates a new WhitespaceTokenizerFactory */
public ChineseTokenizerFactory(Map<String,String> args) {
super(args);
assureMatchVersion();
if (!args.isEmpty()) {
throw new IllegalArgumentException("Unknown parameters: " + args);
}
}
#Override
public ChineseTokenizer create(AttributeFactory factory, Reader input) {
Reader processedStringReader = new ProcessedStringReader(input);
return new ChineseTokenizer(luceneMatchVersion, factory, processedStringReader);
}
}
public class ProcessedStringReader extends java.io.Reader {
private static final int BUFFER_SIZE = 1024 * 8;
//private static TextProcess m_textProcess = null;
private static final String basedir = "/home/praveen/PDS_Meetup/solr-4.9.0/custom_plugins/";
static Properties props = null;
static CRFClassifier<CoreLabel> segmenter = null;
private char[] m_inputData = null;
private int m_offset = 0;
private int m_length = 0;
public ProcessedStringReader(Reader input){
char[] arr = new char[BUFFER_SIZE];
StringBuffer buf = new StringBuffer();
int numChars;
if(segmenter == null)
{
segmenter = new CRFClassifier<CoreLabel>(getProperties());
segmenter.loadClassifierNoExceptions(basedir + "ctb.gz", getProperties());
}
try {
while ((numChars = input.read(arr, 0, arr.length)) > 0) {
buf.append(arr, 0, numChars);
}
} catch (IOException e) {
e.printStackTrace();
}
m_inputData = processText(buf.toString()).toCharArray();
m_offset = 0;
m_length = m_inputData.length;
}
#Override
public int read(char[] cbuf, int off, int len) throws IOException {
int charNumber = 0;
for(int i = m_offset + off;i<m_length && charNumber< len; i++){
cbuf[charNumber] = m_inputData[i];
m_offset ++;
charNumber++;
}
if(charNumber == 0){
return -1;
}
return charNumber;
}
#Override
public void close() throws IOException {
m_inputData = null;
m_offset = 0;
m_length = 0;
}
public String processText(String inputText)
{
List<String> segmented = segmenter.segmentString(inputText);
String output = "";
if(segmented.size() > 0)
{
output = segmented.get(0);
for(int i=1;i<segmented.size();i++)
{
output = output + " " +segmented.get(i);
}
}
System.out.println(output);
return output;
}
static Properties getProperties()
{
if (props == null) {
props = new Properties();
props.setProperty("sighanCorporaDict", basedir);
// props.setProperty("NormalizationTable", "data/norm.simp.utf8");
// props.setProperty("normTableEncoding", "UTF-8");
// below is needed because CTBSegDocumentIteratorFactory accesses it
props.setProperty("serDictionary",basedir+"dict-chris6.ser.gz");
props.setProperty("inputEncoding", "UTF-8");
props.setProperty("sighanPostProcessing", "true");
}
return props;
}
}
public final class ChineseTokenizer extends CharTokenizer {
public ChineseTokenizer(Version matchVersion, Reader in) {
super(matchVersion, in);
}
public ChineseTokenizer(Version matchVersion, AttributeFactory factory, Reader in) {
super(matchVersion, factory, in);
}
/** Collects only characters which do not satisfy
* {#link Character#isWhitespace(int)}.*/
#Override
protected boolean isTokenChar(int c) {
return !Character.isWhitespace(c);
}
}
You can pass the argument through the Factory's args parameter.

Can I drag items from Outlook into my SWT application?

Background
Our Eclipse RCP 3.6-based application lets people drag files in for storage/processing. This works fine when the files are dragged from a filesystem, but not when people drag items (messages or attachments) directly from Outlook.
This appears to be because Outlook wants to feed our application the files via a FileGroupDescriptorW and FileContents, but SWT only includes a FileTransfer type. (In a FileTransfer, only the file paths are passed, with the assumption that the receiver can locate and read them. The FileGroupDescriptorW/FileContents approach can supply files directly application-to-application without writing temporary files out to disk.)
We have tried to produce a ByteArrayTransfer subclass that could accept FileGroupDescriptorW and FileContents. Based on some examples on the Web, we were able to receive and parse the FileGroupDescriptorW, which (as the name implies) describes the files available for transfer. (See code sketch below.) But we have been unable to accept the FileContents.
This seems to be because Outlook offers the FileContents data only as TYMED_ISTREAM or TYMED_ISTORAGE, but SWT only understands how to exchange data as TYMED_HGLOBAL. Of those, it appears that TYMED_ISTORAGE would be preferable, since it's not clear how TYMED_ISTREAM could provide access to multiple files' contents.
(We also have some concerns about SWT's desire to pick and convert only a single TransferData type, given that we need to process two, but we think we could probably hack around that in Java somehow: it seems that all the TransferDatas are available at other points of the process.)
Questions
Are we on the right track here? Has anyone managed to accept FileContents in SWT yet? Is there any chance that we could process the TYMED_ISTORAGE data without leaving Java (even if by creating a fragment-based patch to, or a derived version of, SWT), or would we have to build some new native support code too?
Relevant code snippets
Sketch code that extracts file names:
// THIS IS NOT PRODUCTION-QUALITY CODE - FOR ILLUSTRATION ONLY
final Transfer transfer = new ByteArrayTransfer() {
private final String[] typeNames = new String[] { "FileGroupDescriptorW", "FileContents" };
private final int[] typeIds = new int[] { registerType(typeNames[0]), registerType(typeNames[1]) };
#Override
protected String[] getTypeNames() {
return typeNames;
}
#Override
protected int[] getTypeIds() {
return typeIds;
}
#Override
protected Object nativeToJava(TransferData transferData) {
if (!isSupportedType(transferData))
return null;
final byte[] buffer = (byte[]) super.nativeToJava(transferData);
if (buffer == null)
return null;
try {
final DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer));
long count = 0;
for (int i = 0; i < 4; i++) {
count += in.readUnsignedByte() << i;
}
for (int i = 0; i < count; i++) {
final byte[] filenameBytes = new byte[260 * 2];
in.skipBytes(72); // probable architecture assumption(s) - may be wrong outside standard 32-bit Win XP
in.read(filenameBytes);
final String fileNameIncludingTrailingNulls = new String(filenameBytes, "UTF-16LE");
int stringLength = fileNameIncludingTrailingNulls.indexOf('\0');
if (stringLength == -1)
stringLength = 260;
final String fileName = fileNameIncludingTrailingNulls.substring(0, stringLength);
System.out.println("File " + i + ": " + fileName);
}
in.close();
return buffer;
}
catch (final Exception e) {
return null;
}
}
};
In the debugger, we see that ByteArrayTransfer's isSupportedType() ultimately returns false for the FileContents because the following test is not passed (since its tymed is TYMED_ISTREAM | TYMED_ISTORAGE):
if (format.cfFormat == types[i] &&
(format.dwAspect & COM.DVASPECT_CONTENT) == COM.DVASPECT_CONTENT &&
(format.tymed & COM.TYMED_HGLOBAL) == COM.TYMED_HGLOBAL )
return true;
This excerpt from org.eclipse.swt.internal.ole.win32.COM leaves us feeling less hope for an easy solution:
public static final int TYMED_HGLOBAL = 1;
//public static final int TYMED_ISTORAGE = 8;
//public static final int TYMED_ISTREAM = 4;
Thanks.
even if
//public static final int TYMED_ISTREAM = 4;
Try below code.. it should work
package com.nagarro.jsag.poc.swtdrag;
imports ...
public class MyTransfer extends ByteArrayTransfer {
private static int BYTES_COUNT = 592;
private static int SKIP_BYTES = 72;
private final String[] typeNames = new String[] { "FileGroupDescriptorW", "FileContents" };
private final int[] typeIds = new int[] { registerType(typeNames[0]), registerType(typeNames[1]) };
#Override
protected String[] getTypeNames() {
return typeNames;
}
#Override
protected int[] getTypeIds() {
return typeIds;
}
#Override
protected Object nativeToJava(TransferData transferData) {
String[] result = null;
if (!isSupportedType(transferData) || transferData.pIDataObject == 0)
return null;
IDataObject data = new IDataObject(transferData.pIDataObject);
data.AddRef();
// Check for descriptor format type
try {
FORMATETC formatetcFD = transferData.formatetc;
STGMEDIUM stgmediumFD = new STGMEDIUM();
stgmediumFD.tymed = COM.TYMED_HGLOBAL;
transferData.result = data.GetData(formatetcFD, stgmediumFD);
if (transferData.result == COM.S_OK) {
// Check for contents format type
long hMem = stgmediumFD.unionField;
long fileDiscriptorPtr = OS.GlobalLock(hMem);
int[] fileCount = new int[1];
try {
OS.MoveMemory(fileCount, fileDiscriptorPtr, 4);
fileDiscriptorPtr += 4;
result = new String[fileCount[0]];
for (int i = 0; i < fileCount[0]; i++) {
String fileName = handleFile(fileDiscriptorPtr, data);
System.out.println("FileName : = " + fileName);
result[i] = fileName;
fileDiscriptorPtr += BYTES_COUNT;
}
} catch (Exception e) {
e.printStackTrace();
} finally {
OS.GlobalFree(hMem);
}
}
} finally {
data.Release();
}
return result;
}
private String handleFile(long fileDiscriptorPtr, IDataObject data) throws Exception {
// GetFileName
char[] fileNameChars = new char[OS.MAX_PATH];
byte[] fileNameBytes = new byte[OS.MAX_PATH];
COM.MoveMemory(fileNameBytes, fileDiscriptorPtr, BYTES_COUNT);
// Skip some bytes.
fileNameBytes = Arrays.copyOfRange(fileNameBytes, SKIP_BYTES, fileNameBytes.length);
String fileNameIncludingTrailingNulls = new String(fileNameBytes, "UTF-16LE");
fileNameChars = fileNameIncludingTrailingNulls.toCharArray();
StringBuilder builder = new StringBuilder(OS.MAX_PATH);
for (int i = 0; fileNameChars[i] != 0 && i < fileNameChars.length; i++) {
builder.append(fileNameChars[i]);
}
String name = builder.toString();
try {
File file = saveFileContent(name, data);
if (file != null) {
System.out.println("File Saved # " + file.getAbsolutePath());
;
}
} catch (IOException e) {
System.out.println("Count not save file content");
;
}
return name;
}
private File saveFileContent(String fileName, IDataObject data) throws IOException {
File file = null;
FORMATETC formatetc = new FORMATETC();
formatetc.cfFormat = typeIds[1];
formatetc.dwAspect = COM.DVASPECT_CONTENT;
formatetc.lindex = 0;
formatetc.tymed = 4; // content.
STGMEDIUM stgmedium = new STGMEDIUM();
stgmedium.tymed = 4;
if (data.GetData(formatetc, stgmedium) == COM.S_OK) {
file = new File(fileName);
IStream iStream = new IStream(stgmedium.unionField);
iStream.AddRef();
try (FileOutputStream outputStream = new FileOutputStream(file)) {
int increment = 1024 * 4;
long pv = COM.CoTaskMemAlloc(increment);
int[] pcbWritten = new int[1];
while (iStream.Read(pv, increment, pcbWritten) == COM.S_OK && pcbWritten[0] > 0) {
byte[] buffer = new byte[pcbWritten[0]];
OS.MoveMemory(buffer, pv, pcbWritten[0]);
outputStream.write(buffer);
}
COM.CoTaskMemFree(pv);
} finally {
iStream.Release();
}
return file;
} else {
return null;
}
}
}
Have you looked at https://bugs.eclipse.org/bugs/show_bug.cgi?id=132514 ?
Attached to this bugzilla entry is an patch (against an rather old version of SWT) that might be of interest.
I had the same problem and created a small library providing a Drag'n Drop Transfer Class for JAVA SWT. It can be found here:
https://github.com/HendrikHoetker/OutlookItemTransfer
Currently it supports dropping Mail Items from Outlook to your Java SWT application and will provide a list of OutlookItems with the Filename and a byte array of the file contents.
All is pure Java and in-memory (no temp files).
Usage in your SWT java application:
if (OutlookItemTransfer.getInstance().isSupportedType(event.currentDataType)) {
Object o = OutlookItemTransfer.getInstance().nativeToJava(event.currentDataType);
if (o != null && o instanceof OutlookMessage[]) {
OutlookMessage[] outlookMessages = (OutlookMessage[])o;
for (OutlookMessage msg: outlookMessages) {
//...
}
}
}
The OutlookItem will then provide two elements: filename as String and file contents as array of byte.
From here on, one could write it to a file or further process the byte array.
To your question above:
- What you find in the file descriptor is the filename of the outlook item and a pointer to an IDataObject
- the IDataObject can be parsed and will provide an IStorage object
- The IStorageObject will be then a root container providing further sub-IStorageObjects or IStreams similar to a filesystem (directory = IStorage, file = IStream
You find those elements in the following lines of code:
Get File Contents, see OutlookItemTransfer.java, method nativeToJava:
FORMATETC format = new FORMATETC();
format.cfFormat = getTypeIds()[1];
format.dwAspect = COM.DVASPECT_CONTENT;
format.lindex = <fileIndex>;
format.ptd = 0;
format.tymed = TYMED_ISTORAGE | TYMED_ISTREAM | COM.TYMED_HGLOBAL;
STGMEDIUM medium = new STGMEDIUM();
if (data.GetData(format, medium) == COM.S_OK) {
// medium.tymed will now contain TYMED_ISTORAGE
// in medium.unionfield you will find the root IStorage
}
Read the root IStorage, see CompoundStorage, method readOutlookStorage:
// open IStorage object
IStorage storage = new IStorage(pIStorage);
storage.AddRef();
// walk through the content of the IStorage object
long[] pEnumStorage = new long[1];
if (storage.EnumElements(0, 0, 0, pEnumStorage) == COM.S_OK) {
// get storage iterator
IEnumSTATSTG enumStorage = new IEnumSTATSTG(pEnumStorage[0]);
enumStorage.AddRef();
enumStorage.Reset();
// prepare statstg structure which tells about the object found by the iterator
long pSTATSTG = OS.GlobalAlloc(OS.GMEM_FIXED | OS.GMEM_ZEROINIT, STATSTG.sizeof);
int[] fetched = new int[1];
while (enumStorage.Next(1, pSTATSTG, fetched) == COM.S_OK && fetched[0] == 1) {
// get the description of the the object found
STATSTG statstg = new STATSTG();
COM.MoveMemory(statstg, pSTATSTG, STATSTG.sizeof);
// get the name of the object found
String name = readPWCSName(statstg);
// depending on type of object
switch (statstg.type) {
case COM.STGTY_STREAM: { // load an IStream (=File)
long[] pIStream = new long[1];
// get the pointer to the IStream
if (storage.OpenStream(name, 0, COM.STGM_DIRECT | COM.STGM_READ | COM.STGM_SHARE_EXCLUSIVE, 0, pIStream) == COM.S_OK) {
// load the IStream
}
}
case COM.STGTY_STORAGE: { // load an IStorage (=SubDirectory) - requires recursion to traverse the sub dies
}
}
}
}
// close the iterator
enumStorage.Release();
}
// close the IStorage object
storage.Release();