ITextSharp / PDFBox text extract fails for certain pdfs - itext

The code below extracts the text from a PDF correctly via ITextSharp in many instances.
using (var pdfReader = new PdfReader(filename))
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
var currentText = PdfTextExtractor.GetTextFromPage(
pdfReader,
1,
strategy);
currentText =
Encoding.UTF8.GetString(Encoding.Convert(
Encoding.Default,
Encoding.UTF8,
Encoding.Default.GetBytes(currentText)));
Console.WriteLine(currentText);
}
However, in the case of this PDF I get the following instead of text: "\u0001\u0002\u0003\u0004\u0005\u0006\a\b\t\a\u0001\u0002\u0003\u0004\u0005\u0006\u0003"
I have tried different encodings and even PDFBox but still failed to decode the PDF correctly. Any ideas on how to solve the issue?

Extracting the text nonetheless
#Bruno's answer is the answer one should give here, the PDF clearly does not provide the information required to allow proper text extraction according to section 9.10 Extraction of Text Content of the PDF specification ISO 32000-1...
But there actually is a slightly evil way to extract the text from the PDF at hand nonetheless!
Wrapping one's text extraction strategy in an instance of the following class, the garbled text is replaced by the correct text:
public class RemappingExtractionFilter : ITextExtractionStrategy
{
ITextExtractionStrategy strategy;
System.Reflection.FieldInfo stringField;
public RemappingExtractionFilter(ITextExtractionStrategy strategy)
{
this.strategy = strategy;
this.stringField = typeof(TextRenderInfo).GetField("text", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
}
public void RenderText(TextRenderInfo renderInfo)
{
DocumentFont font =renderInfo.GetFont();
PdfDictionary dict = font.FontDictionary;
PdfDictionary encoding = dict.GetAsDict(PdfName.ENCODING);
PdfArray diffs = encoding.GetAsArray(PdfName.DIFFERENCES);
;
StringBuilder builder = new StringBuilder();
foreach (byte b in renderInfo.PdfString.GetBytes())
{
PdfName name = diffs.GetAsName((char)b);
String s = name.ToString().Substring(2);
int i = Convert.ToInt32(s, 16);
builder.Append((char)i);
}
stringField.SetValue(renderInfo, builder.ToString());
strategy.RenderText(renderInfo);
}
public void BeginTextBlock()
{
strategy.BeginTextBlock();
}
public void EndTextBlock()
{
strategy.EndTextBlock();
}
public void RenderImage(ImageRenderInfo renderInfo)
{
strategy.RenderImage(renderInfo);
}
public String GetResultantText()
{
return strategy.GetResultantText();
}
}
It can be used like this:
ITextExtractionStrategy strategy = new RemappingExtractionFilter(new LocationTextExtractionStrategy());
string text = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
Beware, I had to use System.Reflection to access private members. Some environments may forbid this.
The same in Java
I initially coded this in Java for iText because that's my primary development environment. Thus, here the initial Java version:
public class RemappingExtractionFilter implements TextExtractionStrategy
{
public RemappingExtractionFilter(TextExtractionStrategy strategy) throws NoSuchFieldException, SecurityException
{
this.strategy = strategy;
this.stringField = TextRenderInfo.class.getDeclaredField("text");
this.stringField.setAccessible(true);
}
#Override
public void renderText(TextRenderInfo renderInfo)
{
DocumentFont font =renderInfo.getFont();
PdfDictionary dict = font.getFontDictionary();
PdfDictionary encoding = dict.getAsDict(PdfName.ENCODING);
PdfArray diffs = encoding.getAsArray(PdfName.DIFFERENCES);
;
StringBuilder builder = new StringBuilder();
for (byte b : renderInfo.getPdfString().getBytes())
{
PdfName name = diffs.getAsName((char)b);
String s = name.toString().substring(2);
int i = Integer.parseUnsignedInt(s, 16);
builder.append((char)i);
}
try
{
stringField.set(renderInfo, builder.toString());
}
catch (IllegalArgumentException | IllegalAccessException e)
{
e.printStackTrace();
}
strategy.renderText(renderInfo);
}
#Override
public void beginTextBlock()
{
strategy.beginTextBlock();
}
#Override
public void endTextBlock()
{
strategy.endTextBlock();
}
#Override
public void renderImage(ImageRenderInfo renderInfo)
{
strategy.renderImage(renderInfo);
}
#Override
public String getResultantText()
{
return strategy.getResultantText();
}
final TextExtractionStrategy strategy;
final Field stringField;
}
(RemappingExtractionFilter.java)
It can be used like this:
String extractRemapped(PdfReader reader, int pageNo) throws IOException, NoSuchFieldException, SecurityException
{
TextExtractionStrategy strategy = new RemappingExtractionFilter(new LocationTextExtractionStrategy());
return PdfTextExtractor.getTextFromPage(reader, pageNo, strategy);
}
(from RemappedExtraction.java)
Why does this work?
First of all, this is not the solution to all extraction problems, merely for extracting text from PDFs like the OP has presented.
This method works because the names the PDF uses in its fonts' encoding differences arrays can be interpreted even though they are not standard. These names are built as /Gxx where xx is the hexadecimal representation of the ASCII code of the character this name represents.

A good test to find out whether or not a PDF allows text to be extracted correctly, is by opening it in Adobe Reader and to copy and paste the text.
For instance: I copied the word ABSTRACT and I pasted it in Notepad++:
Do you see the word ABSTRACT in Notepad++? No, you see %&SOH'"%GS. The A is represented as %, the B is represented as &, and so on.
This is a clear indication that the content of the PDF isn't accessible: there is no mapping between the encoding that was use (% = A, & = B,...) and the actual characters that humans can understand.
In short: the PDF doesn't allow you to extract text, not with iText, not with iTextSharp, not with PDFBox. You'll have to find an OCR tool instead and OCR the complete document.
For more info, you may want to watch the following videos:
https://www.youtube.com/watch?v=4ur9WRWVrbM (~5 minutes)
https://www.youtube.com/watch?v=wxGEEv7ibHE (~15 minutes)
https://www.youtube.com/watch?v=g-QcU9B4qMc (~45 minutes)

Related

In iText 7 java how do you update Link text after it's already been added to the document

I am using iText7 to build a table of contents for my document. I know all the section names before I start, but don't know what the page numbers will be. My current process is to create a table on the first page and create all the Link objects with generic text "GO!". Then as I add sections I add through the link objects and update the text with the page numbers that I figured out as I created the document.
However, at the end, what gets written out for the link is "GO!", not the updated page number values I set as I was creating the rest of the document.
I did set the immediateFlush flag to false when I created the Document.
public class UpdateLinkTest {
PdfDocument pdfDocument = null;
List<Link>links = null;
Color hyperlinkColor = new DeviceRgb(0, 102, 204);
public static void main(String[] args) throws Exception {
List<String[]>notes = new ArrayList<>();
notes.add(new String[] {"me", "title", "this is my text" });
notes.add(new String[] {"me2", "title2", "this is my text 2" });
new UpdateLinkTest().exportPdf(notes, new File("./test2.pdf"));
}
public void exportPdf(List<String[]> notes, File selectedFile) throws Exception {
PdfWriter pdfWriter = new PdfWriter(selectedFile);
pdfDocument = new PdfDocument(pdfWriter);
Document document = new Document(pdfDocument, PageSize.A4, false);
// add the table of contents table
addSummaryTable(notes, document);
// add a page break
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// add the body of the document
addNotesText(notes, document);
document.close();
}
private void addSummaryTable(List<String[]> notes, Document document) {
links = new ArrayList<>();
Table table = new Table(3);
float pageWidth = PageSize.A4.getWidth();
table.setWidth(pageWidth-document.getLeftMargin()*2);
// add header
addCell("Author", table, true);
addCell("Title", table, true);
addCell("Page", table, true);
int count = 0;
for (String[] note : notes) {
addCell(note[0], table, false);
addCell(note[1], table, false);
Link link = new Link("Go!", PdfAction.createGoTo(""+ (count+1)));
links.add(link);
addCell(link, hyperlinkColor, table, false);
count++;
}
document.add(table);
}
private void addNotesText(List<String[]> notes, Document document)
throws Exception {
int count = 0;
for (String[] note : notes) {
int numberOfPages = pdfDocument.getNumberOfPages();
Link link = links.get(count);
link.setText(""+(numberOfPages+1));
Paragraph noteText = new Paragraph(note[2]);
document.add(noteText);
noteText.setDestination(++count+"");
if (note != notes.get(notes.size()-1))
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
}
}
private static void addCell(String text, Table table, boolean b) {
Cell c1 = new Cell().add(new Paragraph(text));
table.addCell(c1);
}
private static void addCell(Link text, Color backgroundColor, Table table, boolean b) {
Cell c1 = new Cell().add(new Paragraph(text));
text.setUnderline();
text.setFontColor(backgroundColor);
table.addCell(c1);
}
}
Quite more work needs to be done compared to the code you have now because the changes to the elements don't take any effect once you've added them to the document. Immediate flush set to false allows you to relayout the elements, but that does not happen automatically. The way you calculate the current page the paragraph will be placed on (int numberOfPages = pdfDocument.getNumberOfPages();) is not bulletproof because in some cases pages might be added in advance, even if the content is not going to be placed on them immediately.
There is a very low level way to achieve your goal but with the recent version of iText (7.1.15) there is a simpler way as well, which still requires some work though. Basically your use case is very similar to target-counter concept in CSS, with page counter being the target one in your case. To support target counters in pdfHTML add-on we added new capabilities to our layout module which are possible to use directly as well.
To start off, we are going to tie our Link elements to the corresponding Paragraph elements that they will point to. We are going to do it with ID property in layout:
link.setProperty(Property.ID, String.valueOf(count));
noteText.setProperty(Property.ID, String.valueOf(count));
Next up, we are going to create custom renderers for our Link elements and Paragraph elements. Those custom renderers will interact with TargetCounterHandler which is the new capability in layout module I mentioned in the introduction. The idea is that during layout operation the paragraph will remember the page on which it was placed and then the corresponding link element (remember, link elements are connected to paragraph elements) will ask TargetCounterHandler during layout process of that link element which page the corresponding paragraph was planed on. So in a way, TargetCounterHandler is a connector.
Code for custom renderers:
private static class CustomParagraphRenderer extends ParagraphRenderer {
public CustomParagraphRenderer(Paragraph modelElement) {
super(modelElement);
}
#Override
public IRenderer getNextRenderer() {
return new CustomParagraphRenderer((Paragraph) modelElement);
}
#Override
public LayoutResult layout(LayoutContext layoutContext) {
LayoutResult result = super.layout(layoutContext);
TargetCounterHandler.addPageByID(this);
return result;
}
}
private static class CustomLinkRenderer extends LinkRenderer {
public CustomLinkRenderer(Link link) {
super(link);
}
#Override
public LayoutResult layout(LayoutContext layoutContext) {
Integer targetPageNumber = TargetCounterHandler.getPageByID(this, getProperty(Property.ID));
if (targetPageNumber != null) {
setText(String.valueOf(targetPageNumber));
}
return super.layout(layoutContext);
}
#Override
public IRenderer getNextRenderer() {
return new CustomLinkRenderer((Link) getModelElement());
}
}
Don't forget to assign the custom renderers to their elements:
link.setNextRenderer(new CustomLinkRenderer(link));
noteText.setNextRenderer(new CustomParagraphRenderer(noteText));
Now, the other thing we need to do it relayout. You already set immediateFlush to false and this is needed for relayout to work. Relayout is needed because on the first layout loop we will not know all the positions of the paragraphs, but we will already have placed the links on the pages by the time we know those positions. So we need the second pass to use the information about page numbers the paragraphs will reside on and set that information to the links.
Relayout is pretty straightforward - once you've put all the content you just need to call a single dedicated method:
// For now we have to prepare the handler for relayout manually, this is going to be improved
// in future iText versions
((DocumentRenderer)document.getRenderer()).getTargetCounterHandler().prepareHandlerToRelayout();
document.relayout();
One caveat is that for now you also need to subclass the DocumentRenderer since there is an additional operation that needs to be done that is not performed under the hood - propagation of the target counter handler to the root renderer we will be using for the second layout operation:
// For now we have to create a custom renderer for the root document to propagate the
// target counter handler to the renderer that will be used on the second layout process
// This is going to be improved in future iText versions
private static class CustomDocumentRenderer extends DocumentRenderer {
public CustomDocumentRenderer(Document document, boolean immediateFlush) {
super(document, immediateFlush);
}
#Override
public IRenderer getNextRenderer() {
CustomDocumentRenderer renderer = new CustomDocumentRenderer(document, immediateFlush);
renderer.targetCounterHandler = new TargetCounterHandler(targetCounterHandler);
return renderer;
}
}
document.setRenderer(new CustomDocumentRenderer(document, false));
And now we are done. Here is our visual result:
Complete code looks as follows:
public class UpdateLinkTest {
PdfDocument pdfDocument = null;
Color hyperlinkColor = new DeviceRgb(0, 102, 204);
public static void main(String[] args) throws Exception {
List<String[]> notes = new ArrayList<>();
notes.add(new String[] {"me", "title", "this is my text" });
notes.add(new String[] {"me2", "title2", "this is my text 2" });
new UpdateLinkTest().exportPdf(notes, new File("./test2.pdf"));
}
public void exportPdf(List<String[]> notes, File selectedFile) throws Exception {
PdfWriter pdfWriter = new PdfWriter(selectedFile);
pdfDocument = new PdfDocument(pdfWriter);
Document document = new Document(pdfDocument, PageSize.A4, false);
document.setRenderer(new CustomDocumentRenderer(document, false));
// add the table of contents table
addSummaryTable(notes, document);
// add a page break
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
// add the body of the document
addNotesText(notes, document);
// For now we have to prepare the handler for relayout manually, this is going to be improved
// in future iText versions
((DocumentRenderer)document.getRenderer()).getTargetCounterHandler().prepareHandlerToRelayout();
document.relayout();
document.close();
}
private void addSummaryTable(List<String[]> notes, Document document) {
Table table = new Table(3);
float pageWidth = PageSize.A4.getWidth();
table.setWidth(pageWidth-document.getLeftMargin()*2);
// add header
addCell("Author", table, true);
addCell("Title", table, true);
addCell("Page", table, true);
int count = 0;
for (String[] note : notes) {
addCell(note[0], table, false);
addCell(note[1], table, false);
Link link = new Link("Go!", PdfAction.createGoTo(""+ (count+1)));
link.setProperty(Property.ID, String.valueOf(count));
link.setNextRenderer(new CustomLinkRenderer(link));
addCell(link, hyperlinkColor, table, false);
count++;
}
document.add(table);
}
private void addNotesText(List<String[]> notes, Document document) {
int count = 0;
for (String[] note : notes) {
Paragraph noteText = new Paragraph(note[2]);
noteText.setProperty(Property.ID, String.valueOf(count));
noteText.setNextRenderer(new CustomParagraphRenderer(noteText));
document.add(noteText);
noteText.setDestination(++count+"");
if (note != notes.get(notes.size()-1))
document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
}
}
private static void addCell(String text, Table table, boolean b) {
Cell c1 = new Cell().add(new Paragraph(text));
table.addCell(c1);
}
private static void addCell(Link text, Color backgroundColor, Table table, boolean b) {
Cell c1 = new Cell().add(new Paragraph(text));
text.setUnderline();
text.setFontColor(backgroundColor);
table.addCell(c1);
}
private static class CustomLinkRenderer extends LinkRenderer {
public CustomLinkRenderer(Link link) {
super(link);
}
#Override
public LayoutResult layout(LayoutContext layoutContext) {
Integer targetPageNumber = TargetCounterHandler.getPageByID(this, getProperty(Property.ID));
if (targetPageNumber != null) {
setText(String.valueOf(targetPageNumber));
}
return super.layout(layoutContext);
}
#Override
public IRenderer getNextRenderer() {
return new CustomLinkRenderer((Link) getModelElement());
}
}
private static class CustomParagraphRenderer extends ParagraphRenderer {
public CustomParagraphRenderer(Paragraph modelElement) {
super(modelElement);
}
#Override
public IRenderer getNextRenderer() {
return new CustomParagraphRenderer((Paragraph) modelElement);
}
#Override
public LayoutResult layout(LayoutContext layoutContext) {
LayoutResult result = super.layout(layoutContext);
TargetCounterHandler.addPageByID(this);
return result;
}
}
// For now we have to create a custom renderer for the root document to propagate the
// target counter handler to the renderer that will be used on the second layout process
// This is going to be improved in future iText versions
private static class CustomDocumentRenderer extends DocumentRenderer {
public CustomDocumentRenderer(Document document, boolean immediateFlush) {
super(document, immediateFlush);
}
#Override
public IRenderer getNextRenderer() {
CustomDocumentRenderer renderer = new CustomDocumentRenderer(document, immediateFlush);
renderer.targetCounterHandler = new TargetCounterHandler(targetCounterHandler);
return renderer;
}
}
}

how to search any keyword from string using jfce AutoCompleteField

I have swt text where in I have written like "new AutoCompleteField (textSearch,new TextContentProvider(), searchList); it works but it finds the strings start with expression. I want to create my own proposal provider where i can write something if my string contains any keyword, i should get autoComplete popup.
You can't use the existing AutoCompleteField for this since you need to change the content proposal provider.
A suitable IContentProposalProvider would be something like:
public class AnyPositionContentProposalProvider implements IContentProposalProvider
{
private final String [] proposals;
public AnyPositionContentProposalProvider(String [] theProposals)
{
proposals = theProposals;
}
#Override
public IContentProposal [] getProposals(String contents, int position)
{
List<IContentProposal> result = new ArrayList<>();
for (String proposal : proposals) {
if (proposal.contains(contents)) {
result.add(new ContentProposal(proposal));
}
}
return result.toArray(new IContentProposal [result.size()]);
}
}
The following methods set this up to work like AutoCompleteField:
// Installs on a Text control
public static void installAnyPositionMatch(Text control, String [] proposals)
{
installAnyPositionMatch(control, new TextContentAdapter(), proposals);
}
// Install on any control with a content adapter
public static void installAnyPositionMatch(Control control, IControlContentAdapter controlContentAdapter, String [] proposals)
{
IContentProposalProvider proposalProvider = new AnyPositionContentProposalProvider(proposals);
ContentProposalAdapter adapter = new ContentProposalAdapter(control, controlContentAdapter, proposalProvider, null, null);
adapter.setPropagateKeys(true);
adapter.setProposalAcceptanceStyle(ContentProposalAdapter.PROPOSAL_REPLACE);
}

Rewrite Adobe CQ Image src attribute

In AEM, content such as pages and images contains the '/content/' prefix in them. We are able to rewrite these url via Link Checker Transformer configuration and resourceResolver.map() method. URLs are being rewritten for HTML elements <a> and <form>.
But I want it to work for <img> elements as well.
I tried including the <img> elements to the Link Checker Transformer configuration by adding it to the 'Rewrite Elements' list as img:src:
I also checked the answers from What am I missing for this CQ5/AEM URL rewriting scenario? but both attempts didn't work for this issue.
Is there any way to do this?
Even if the rewriter and Link Checker Transformer didn't work. I used a custom LinkRewriter by using the Transformer and TransformerFactory interfaces. I based on the sample from Adobe for my code. I worked out something like this:
#Component(
metatype = true,
label = "Image Link Rewriter",
description = "Maps the <img> elements src attributes"
)
#Service(value = TransformerFactory.class)
#Property(value = "global", propertyPrivate = true)
public class ImageLinkRewriter implements Transformer, TransformerFactory {
// some variables
public CustomLinkTransformer() { }
#Override
public void init(ProcessingContext context,
ProcessingComponentConfiguration config) throws IOException {
// initializations here
}
#Override
public final Transformer createTransformer() {
return new CustomLinkTransformer();
}
#Override
public void startElement(String uri, String localName,
String qName, Attributes atts) throws SAXException {
if ("img".equalsIgnoreCase(localName)) {
contentHandler.startElement(uri, localName, qName, rewriteImageLink(atts));
}
}
private Attributes rewriteImageLink(Attributes attrs) {
String attrName = "src";
AttributesImpl result = new AttributesImpl(attrs);
String link = attrs.getValue(attrName);
String mappedLink = resource.getResourceResolver().map(request, link);
result.setValue(result.getIndex(attrName), mappedLink);
return result;
}
}
Hope this helps others. Here are a few references:
TransformerFactory interface
Transformer interface
Adobe

GWT FileUpload - Servlet options and handling response

I am new to GWT and am trying to implement a file upload functionality.
Found some implementation help over the internet and used that as reference.
But have some questions related to that:
The actual upload or writing the contents of file on server(or disk) will be done by a servlet.
Is it necessary that this servlet (say MyFileUploadServlet) extends HttpServlet? OR
I can use RemoteServiceServlet or implement any other interface? If yes, which method do I need to implement/override?
In my servlet, after everything is done, I need to return back the response back to the client.
I think form.addSubmitCompleteHandler() can be used to achieve that. From servlet, I could return text/html (or String type object) and then use SubmitCompleteEvent.getResults() to get the result.
Question is that can I use my custom object instead of String (lets say MyFileUploadResult), populate the results in it and then pass it back to client?
or can I get back JSON object?
Currently, after getting back the response and using SubmitCompleteEvent.getResults(), I am getting some HTML tags added to the actual response such as :
pre> Image upload successfully /pre> .
Is there a way to get rid of that?
Thanks a lot in advance!
Regards,
Ashish
To upload files, I have extended HttpServlet in the past. I used it together with Commons-FileUpload.
I made a general widget for form-based uploads. That was to accommodate uploads for different file types (plain text and Base64). If you just need to upload plain text files, you could combine the following two classes into one.
public class UploadFile extends Composite {
#UiField FormPanel uploadForm;
#UiField FileUpload fileUpload;
#UiField Button uploadButton;
interface Binder extends UiBinder<Widget, UploadFile> {}
public UploadFile() {
initWidget(GWT.<Binder> create(Binder.class).createAndBindUi(this));
fileUpload.setName("fileUpload");
uploadForm.setEncoding(FormPanel.ENCODING_MULTIPART);
uploadForm.setMethod(FormPanel.METHOD_POST);
uploadForm.addSubmitHandler(new SubmitHandler() {
#Override
public void onSubmit(SubmitEvent event) {
if ("".equals(fileUpload.getFilename())) {
Window.alert("No file selected");
event.cancel();
}
}
});
uploadButton.addClickHandler(new ClickHandler() {
#Override
public void onClick(ClickEvent event) {
uploadForm.submit();
}
});
}
public HandlerRegistration addCompletedCallback(
final AsyncCallback<String> callback) {
return uploadForm.addSubmitCompleteHandler(new SubmitCompleteHandler() {
#Override
public void onSubmitComplete(SubmitCompleteEvent event) {
callback.onSuccess(event.getResults());
}
});
}
}
The UiBinder part is pretty straighforward.
<g:HTMLPanel>
<g:HorizontalPanel>
<g:FormPanel ui:field="uploadForm">
<g:FileUpload ui:field="fileUpload"></g:FileUpload>
</g:FormPanel>
<g:Button ui:field="uploadButton">Upload File</g:Button>
</g:HorizontalPanel>
</g:HTMLPanel>
Now you can extend this class for plain text files. Just make sure your web.xml serves the HttpServlet at /textupload.
public class UploadFileAsText extends UploadFile {
public UploadFileAsText() {
uploadForm.setAction(GWT.getModuleBaseURL() + "textupload");
}
}
The servlet for plain text files goes on the server side. It returns the contents of the uploaded file to the client. Make sure to install the jar for FileUpload from Apache Commons somewhere on your classpath.
public class TextFileUploadServiceImpl extends HttpServlet {
private static final long serialVersionUID = 1L;
#Override
protected void doPost(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
if (! ServletFileUpload.isMultipartContent(request)) {
response.sendError(HttpServletResponse.SC_BAD_REQUEST,
"Not a multipart request");
return;
}
ServletFileUpload upload = new ServletFileUpload(); // from Commons
try {
FileItemIterator iter = upload.getItemIterator(request);
if (iter.hasNext()) {
FileItemStream fileItem = iter.next();
// String name = fileItem.getFieldName(); // file name, if you need it
ServletOutputStream out = response.getOutputStream();
response.setBufferSize(32768);
int bufSize = response.getBufferSize();
byte[] buffer = new byte[bufSize];
InputStream in = fileItem.openStream();
BufferedInputStream bis = new BufferedInputStream(in, bufSize);
long length = 0;
int bytes;
while ((bytes = bis.read(buffer, 0, bufSize)) >= 0) {
out.write(buffer, 0, bytes);
length += bytes;
}
response.setContentType("text/html");
response.setContentLength(
(length > 0 && length <= Integer.MAX_VALUE) ? (int) length : 0);
bis.close();
in.close();
out.flush();
out.close();
}
} catch(Exception caught) {
throw new RuntimeException(caught);
}
}
}
I cannot recall how I got around the <pre></pre> tag problem. You may have to filter the tags on the client. The topic is also addressed here.

Need help loading XML data into XNA 4.0 project

I'd like to do this the right way if possible. I have XML data as follows:
<?xml version="1.0" encoding="utf-8"?>
<XnaContent>
<Asset Type="PG2.Dictionary">
<Letters TotalInstances="460100">
<Letter Count="34481">a</Letter>
...
<Letter Count="1361">z</Letter>
</Letters>
<Words Count="60516">
<Word>aardvark</Word>
...
<Word>zebra</Word>
</Words>
</Asset>
</XnaContent>
and I'd like to load this in (using Content.Load< Dictionary >) into one of these
namespace PG2
{
public class Dictionary
{
public class Letters
{
public int totalInstances;
public List<Character> characters;
public class Character
{
public int count;
public char character;
}
}
public class Words
{
public int count;
public HashSet<string> words;
}
Letters letters;
Words words;
}
}
Can anyone help with either instructions or pointers to tutorials? I've found a few which come close but things seem to have changed slightly between 3.1 and 4.0 in ways which I don't understand and a lot of the documentation assumes knowledge I don't have. My understanding so far is that I need to make the Dictionary class Serializable but I can't seem to make that happen. I've added the XML file to the content project but how do I get it to create the correct XNB file?
Thanks!
Charlie.
This may help Link. I found it useful to work the other way round to check that my xml data was correctly defined. Instantate your dictionary class set all the fields then serialize it to xml using a XmlSerializer to check the output.
You need to implement a ContentTypeSerializer for your Dictionary class. Put this in a content extension library and add a reference to the content extension library to your content project. Put your Dictionary class into a game library that is reference by both your game and the content extension project.
See:
http://blogs.msdn.com/b/shawnhar/archive/2008/08/26/customizing-intermediateserializer-part-2.aspx
Here is a quick ContentTypeSerializer I wrote that will deserialize your Dictionary class. It could use better error handling.
using System;
using System.Collections.Generic;
using System.Xml;
using Microsoft.Xna.Framework.Content.Pipeline.Serialization.Intermediate;
namespace PG2
{
[ContentTypeSerializer]
class DictionaryXmlSerializer : ContentTypeSerializer<Dictionary>
{
private void ReadToNextElement(XmlReader reader)
{
reader.Read();
while (reader.NodeType != System.Xml.XmlNodeType.Element)
{
if (!reader.Read())
{
return;
}
}
}
private void ReadToEndElement(XmlReader reader)
{
reader.Read();
while (reader.NodeType != System.Xml.XmlNodeType.EndElement)
{
reader.Read();
}
}
private int ReadAttributeInt(XmlReader reader, string attributeName)
{
reader.MoveToAttribute(attributeName);
return int.Parse(reader.Value);
}
protected override Dictionary Deserialize(IntermediateReader input, Microsoft.Xna.Framework.Content.ContentSerializerAttribute format, Dictionary existingInstance)
{
Dictionary dictionary = new Dictionary();
dictionary.letters = new Dictionary.Letters();
dictionary.letters.characters = new List<Dictionary.Letters.Character>();
dictionary.words = new Dictionary.Words();
dictionary.words.words = new HashSet<string>();
ReadToNextElement(input.Xml);
dictionary.letters.totalInstances = ReadAttributeInt(input.Xml, "TotalInstances");
ReadToNextElement(input.Xml);
while (input.Xml.Name == "Letter")
{
Dictionary.Letters.Character character = new Dictionary.Letters.Character();
character.count = ReadAttributeInt(input.Xml, "Count");
input.Xml.Read();
character.character = input.Xml.Value[0];
dictionary.letters.characters.Add(character);
ReadToNextElement(input.Xml);
}
dictionary.words.count = ReadAttributeInt(input.Xml, "Count");
for (int i = 0; i < dictionary.words.count; i++)
{
ReadToNextElement(input.Xml);
input.Xml.Read();
dictionary.words.words.Add(input.Xml.Value);
ReadToEndElement(input.Xml);
}
ReadToEndElement(input.Xml); // read to the end of words
ReadToEndElement(input.Xml); // read to the end of asset
return dictionary;
}
protected override void Serialize(IntermediateWriter output, Dictionary value, Microsoft.Xna.Framework.Content.ContentSerializerAttribute format)
{
throw new NotImplementedException();
}
}
}