iText 7 SplitByOutlines - final document stays open and can't be closed

I've written a custom splitter to split my PDF by its outlines/bookmarks. It works, but the last document remains open and is corrupted. The document has 71 outlines, but the splitter returns only 70 documents even though it created 71.
Here is the custom splitter:
class CustomSplitter : PdfSplitter
{
private int _order;
private readonly string _destinationFolder;
private readonly string _podName;
private readonly IList<string> _splitFileNames;
public CustomSplitter(PdfDocument pdfDocument, string destinationFolder, string podName, IList<string> splitFileNames) : base(pdfDocument)
{
_destinationFolder = destinationFolder;
_order = 0;
_podName = podName;
_splitFileNames = splitFileNames;
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
{
string splitFileName = _destinationFolder + "\\" + _podName + _order++ + ".pdf";
_splitFileNames.Add(splitFileName);
return new PdfWriter(splitFileName);
}
}
I need to keep track of the names of the files so I can rename them. I call my custom splitter with this code:
IList<string> splitFileNames = new List<string>();
PdfSplitter pdfSplitter = new CustomSplitter(pdfDoc, yearDir, fileName, splitFileNames);
IList<PdfDocument> splitDocs = pdfSplitter.SplitByOutlines(outlineNames);
This is the first time I am posting here. I did search and found nothing that used SplitByOutlines.
Thank you.

Property of the class is not displayed in XML string after serialization

I'm running an app calling a web reference and assigning different values to its properties.
As I'm debugging the app, I see the object has all the properties needed but when I run the following logic:
private string ClassToXML(Object classObject)
{
var myString = new System.IO.StringWriter();
var serializer = new XmlSerializer(classObject.GetType());
serializer.Serialize(myString, classObject);
return myString.ToString();
}
myString contains an XML string with only the properties that are not System.Nullable; the nullable ones are missing.
I run a console test app where I created the same Web references used in the original code.
This is a snippet:
class Program
{
static void Main(string[] args)
{
MerchantWSBO merchantWSBO = new MerchantWSBO();
MerchantWSVO merchantWSVO = new MerchantWSVO();
merchantWSVO.discoverRetained = true;
merchantWSBO.overview = merchantWSVO;
string xmlToSend = ClassToXML(merchantWSBO);
}
private static string ClassToXML(Object classObject)
{
var myString = new System.IO.StringWriter();
var serializer = new XmlSerializer(classObject.GetType());
serializer.Serialize(myString, classObject);
return myString.ToString();
}
}
And the above logic builds an XML String and that string has all the properties I need.
I don't see anything wrong, but the same properties are not being displayed in the XML String in my original app.
I'm not sure what's wrong with that.

VSCode: Create a document in memory with URI for automated testing?

Background
I created an extension that interacts with documents. In order to test the extension I need to create documents, that the extension can work with. The extension has to access the document via uri.
Currently I'm using vscode.workspace.openTextDocument({content: _content, language: _language}); for document creation. The problem is, it does not have a valid URI.
Question
How can I create a virtual document in memory, that has a valid URI?
As there was no native solution to this, I created my own and I'd like to share it here:
A TextDocumentContentProvider for files in memory. Example usage shown below
memoryfile.ts
import * as vscode from 'vscode';
const _SCHEME = "inmemoryfile";
/**
* Registration function for In-Memory files.
* You need to call this once, if you want to make use of
* `MemoryFile`s.
**/
export function register_memoryFileProvider ({ subscriptions }: vscode.ExtensionContext)
{
const myProvider = new (class implements vscode.TextDocumentContentProvider
{
provideTextDocumentContent(uri: vscode.Uri): string
{
let memDoc = MemoryFile.getDocument (uri);
if (memDoc == null)
return "";
return memDoc.read ();
}
})();
subscriptions.push(vscode.workspace.registerTextDocumentContentProvider(
_SCHEME, myProvider));
}
/**
* Management class for in-memory files.
**/
class MemoryFileManagement
{
private static _documents: {[key: string]: MemoryFile} = {};
private static _lastDocId: number = 0;
public static getDocument(uri: vscode.Uri) : MemoryFile | null
{
return MemoryFileManagement._documents[uri.path];
}
private static _getNextDocId(): string{
MemoryFileManagement._lastDocId++;
return "_" + MemoryFileManagement._lastDocId + "_";
}
public static createDocument(extension = "")
{
let path = MemoryFileManagement._getNextDocId ();
if (extension != "")
path += "." + extension;
let self = new MemoryFile(path);
MemoryFileManagement._documents[path] = self;
return self;
}
}
/**
* A file in memory
**/
export class MemoryFile
{
/******************
** Static Area **
******************/
public static getDocument(uri: vscode.Uri) : MemoryFile | null {
return MemoryFileManagement.getDocument (uri);
}
public static createDocument(extension = "") {
return MemoryFileManagement.createDocument (extension);
}
/******************
** Object Area **
******************/
public content: string = "";
public uri: vscode.Uri;
constructor (path: string)
{
this.uri = vscode.Uri.from ({scheme: _SCHEME, path: path})
}
public write(strContent: string){
this.content += strContent;
}
public read(): string {
return this.content;
}
public getUri(): vscode.Uri {
return this.uri;
}
}
Example usage
Register the provider
You need to register the provider somewhere in the beginning of your test code (I do it in index.ts before Mocha is instantiated):
register_memoryFileProvider (extensionContext);
(How do I get the extension context?)
Create a document
Creating and using a file works as follows:
// create the in-memory document
let memfile = MemoryFile.createDocument ("ts");
memfile.write ("my content");
// create a vscode.TextDocument from the in-memory document.
let doc = await vscode.workspace.openTextDocument (memfile.getUri ());
Notes
Be aware that LSP commands might not work with this approach, because they might be registered for a specific scheme.
As rioV8 said, you can also use an existing document and change its content. Here is the code:
import * as path from 'path';
import * as vscode from 'vscode';
export class TmpFile
{
private static _lastDocId: number = 0;
private static _getNextDocId(): string{
this._lastDocId++;
return "tmpfile_" + this._lastDocId;
}
public static async createDocument(strContent: string, extension:string = "")
: Promise<vscode.TextDocument | null>
{
let folder = "/tmp"
let filename = this._getNextDocId ();
let ext = (extension != "" ? "." + extension : "");
const newFile = vscode.Uri.parse('untitled:' + path.join(folder, filename + ext));
{
const edit = new vscode.WorkspaceEdit();
edit.insert(newFile, new vscode.Position(0, 0), strContent);
let success = await vscode.workspace.applyEdit(edit);
if (!success)
return null;
}
let document = await vscode.workspace.openTextDocument(newFile);
return document;
}
}
Pros
It's a file (scheme), so all LSP commands will work
The path (used above) does not even need to exist.
Cons
The file is really opened in the editor. You need to close it later
The file is a changed file in the editor, so it will ask you to save the changes upon closing.
Files cannot be closed in vscode. You can only run:
vscode.window.showTextDocument(doc.uri, {preview: true, preserveFocus: false})
.then(() => {
return vscode.commands.executeCommand('workbench.action.closeActiveEditor');
});
which is a rather nasty workaround.

ITextSharp / PDFBox text extract fails for certain pdfs

The code below extracts the text from a PDF correctly via ITextSharp in many instances.
using (var pdfReader = new PdfReader(filename))
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
var currentText = PdfTextExtractor.GetTextFromPage(
pdfReader,
1,
strategy);
currentText =
Encoding.UTF8.GetString(Encoding.Convert(
Encoding.Default,
Encoding.UTF8,
Encoding.Default.GetBytes(currentText)));
Console.WriteLine(currentText);
}
However, in the case of this PDF I get the following instead of text: "\u0001\u0002\u0003\u0004\u0005\u0006\a\b\t\a\u0001\u0002\u0003\u0004\u0005\u0006\u0003"
I have tried different encodings and even PDFBox but still failed to decode the PDF correctly. Any ideas on how to solve the issue?
Extracting the text nonetheless
@Bruno's answer is the answer one should give here: the PDF clearly does not provide the information required to allow proper text extraction according to section 9.10 Extraction of Text Content of the PDF specification ISO 32000-1...
But there actually is a slightly evil way to extract the text from the PDF at hand nonetheless!
Wrapping one's text extraction strategy in an instance of the following class, the garbled text is replaced by the correct text:
public class RemappingExtractionFilter : ITextExtractionStrategy
{
ITextExtractionStrategy strategy;
System.Reflection.FieldInfo stringField;
public RemappingExtractionFilter(ITextExtractionStrategy strategy)
{
this.strategy = strategy;
this.stringField = typeof(TextRenderInfo).GetField("text", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
}
public void RenderText(TextRenderInfo renderInfo)
{
DocumentFont font =renderInfo.GetFont();
PdfDictionary dict = font.FontDictionary;
PdfDictionary encoding = dict.GetAsDict(PdfName.ENCODING);
PdfArray diffs = encoding.GetAsArray(PdfName.DIFFERENCES);
;
StringBuilder builder = new StringBuilder();
foreach (byte b in renderInfo.PdfString.GetBytes())
{
PdfName name = diffs.GetAsName((char)b);
String s = name.ToString().Substring(2);
int i = Convert.ToInt32(s, 16);
builder.Append((char)i);
}
stringField.SetValue(renderInfo, builder.ToString());
strategy.RenderText(renderInfo);
}
public void BeginTextBlock()
{
strategy.BeginTextBlock();
}
public void EndTextBlock()
{
strategy.EndTextBlock();
}
public void RenderImage(ImageRenderInfo renderInfo)
{
strategy.RenderImage(renderInfo);
}
public String GetResultantText()
{
return strategy.GetResultantText();
}
}
It can be used like this:
ITextExtractionStrategy strategy = new RemappingExtractionFilter(new LocationTextExtractionStrategy());
string text = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
Beware, I had to use System.Reflection to access private members. Some environments may forbid this.
The same in Java
I initially coded this in Java for iText because that's my primary development environment. Thus, here is the initial Java version:
public class RemappingExtractionFilter implements TextExtractionStrategy
{
public RemappingExtractionFilter(TextExtractionStrategy strategy) throws NoSuchFieldException, SecurityException
{
this.strategy = strategy;
this.stringField = TextRenderInfo.class.getDeclaredField("text");
this.stringField.setAccessible(true);
}
@Override
public void renderText(TextRenderInfo renderInfo)
{
DocumentFont font =renderInfo.getFont();
PdfDictionary dict = font.getFontDictionary();
PdfDictionary encoding = dict.getAsDict(PdfName.ENCODING);
PdfArray diffs = encoding.getAsArray(PdfName.DIFFERENCES);
;
StringBuilder builder = new StringBuilder();
for (byte b : renderInfo.getPdfString().getBytes())
{
PdfName name = diffs.getAsName((char)b);
String s = name.toString().substring(2);
int i = Integer.parseUnsignedInt(s, 16);
builder.append((char)i);
}
try
{
stringField.set(renderInfo, builder.toString());
}
catch (IllegalArgumentException | IllegalAccessException e)
{
e.printStackTrace();
}
strategy.renderText(renderInfo);
}
@Override
public void beginTextBlock()
{
strategy.beginTextBlock();
}
@Override
public void endTextBlock()
{
strategy.endTextBlock();
}
@Override
public void renderImage(ImageRenderInfo renderInfo)
{
strategy.renderImage(renderInfo);
}
@Override
public String getResultantText()
{
return strategy.getResultantText();
}
final TextExtractionStrategy strategy;
final Field stringField;
}
(RemappingExtractionFilter.java)
It can be used like this:
String extractRemapped(PdfReader reader, int pageNo) throws IOException, NoSuchFieldException, SecurityException
{
TextExtractionStrategy strategy = new RemappingExtractionFilter(new LocationTextExtractionStrategy());
return PdfTextExtractor.getTextFromPage(reader, pageNo, strategy);
}
(from RemappedExtraction.java)
Why does this work?
First of all, this is not the solution to all extraction problems, merely for extracting text from PDFs like the OP has presented.
This method works because the names the PDF uses in its fonts' encoding differences arrays can be interpreted even though they are not standard. These names are built as /Gxx where xx is the hexadecimal representation of the ASCII code of the character this name represents.
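To make the naming scheme concrete, here is a minimal, library-free Java sketch of just the decoding step. The class and method names (GlyphNameDecoder, decodeName, decodeAll) are hypothetical and not part of iText. It assumes every name has the /Gxx form described above, so it would reject standard glyph names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of iText: decodes names of the form /Gxx,
// where xx is the hexadecimal code of the character the name stands for.
class GlyphNameDecoder {

    // Decodes a single name such as "/G41" to the character it represents.
    static char decodeName(String glyphName) {
        if (glyphName == null || !glyphName.startsWith("/G") || glyphName.length() < 3) {
            throw new IllegalArgumentException("Not a /Gxx name: " + glyphName);
        }
        // Skip the leading "/G" and parse the remainder as a hexadecimal code.
        int code = Integer.parseInt(glyphName.substring(2), 16);
        return (char) code;
    }

    // Decodes a sequence of names into a string, one character per name.
    static String decodeAll(List<String> glyphNames) {
        StringBuilder builder = new StringBuilder();
        for (String name : glyphNames) {
            builder.append(decodeName(name));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // 0x41, 0x42 and 0x53 are the ASCII codes of A, B and S.
        System.out.println(decodeAll(Arrays.asList("/G41", "/G42", "/G53")));
    }
}
```

The filter classes above do the same thing per byte of the PDF string, looking the name up in the font's Differences array first.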
A good test to find out whether a PDF allows text to be extracted correctly is to open it in Adobe Reader and copy and paste the text.
For instance: I copied the word ABSTRACT and I pasted it in Notepad++:
Do you see the word ABSTRACT in Notepad++? No, you see %&SOH'"%GS. The A is represented as %, the B is represented as &, and so on.
This is a clear indication that the content of the PDF isn't accessible: there is no mapping between the encoding that was used (% = A, & = B,...) and the actual characters that humans can understand.
In short: the PDF doesn't allow you to extract text, not with iText, not with iTextSharp, not with PDFBox. You'll have to find an OCR tool instead and OCR the complete document.
For more info, you may want to watch the following videos:
https://www.youtube.com/watch?v=4ur9WRWVrbM (~5 minutes)
https://www.youtube.com/watch?v=wxGEEv7ibHE (~15 minutes)
https://www.youtube.com/watch?v=g-QcU9B4qMc (~45 minutes)

Google Web Toolkit - How to prevent duplication of the records?

I am new to GWT. I am trying to use the cell table to do this. Here is my question:
Name Gender
Ali M
Abu M
Siti F
page 1
Name Gender
Siti F
Noor F
Ahmad F
page 2
I use SimplePager for paging. Everything is OK except the next page.
When I click next page, the Siti record appears a second time.
How can I prevent Siti from appearing on page 2? Below is my code:
private static class Contact{
private final String name;
private final String gender;
public Contact(String name, String gender){
this.name = name;
this.gender = gender;
}
}
private static final List<Contact> CONTACTS = Arrays.asList(
new Contact("Ali","M"),
new Contact("Abu","M"),
new Contact("Siti","F"),
new Contact("Noor","F"),
new Contact("Ahmad","M")
);
public void onModuleLoad(){
final CellTable<Contact> table = new CellTable<Contact>();
table.setPageSize(3);
TextColumn<Contact> nameColumn = new TextColumn<Contact>(){
@Override
public String getValue(Contact object) {
return object.name;
}
};
TextColumn<Contact> genderColumn = new TextColumn<Contact>(){
@Override
public String getValue(Contact object) {
return object.gender;
}
};
table.addColumn(nameColumn, "Name");
table.addColumn(genderColumn, "Gender");
AsyncDataProvider<Contact> provider = new AsyncDataProvider<Contact>(){
@Override
protected void onRangeChanged(HasData<Contact> display) {
int start = display.getVisibleRange().getStart();
int end = start + display.getVisibleRange().getLength();
end = end >= CONTACTS.size() ? CONTACTS.size() : end;
List<Contact> sub = CONTACTS.subList(start,end);
updateRowData(start,sub);
}
};
provider.addDataDisplay(table);
provider.updateRowCount(CONTACTS.size(), true);
SimplePager.Resources pagerResources = GWT.create(SimplePager.Resources.class);
SimplePager pager = new SimplePager(TextLocation.CENTER, pagerResources, false, 0, true);
pager.setDisplay(table);
Please help me to solve this problem. Thanks.
You have most probably run into the GWT last-page problem, described in these questions:
GWT - celltable with simple pager issue
SimplePager row count is working incorrectly
The solution here is to set:
setRangeLimited(false)
and the last page is paged correctly, i.e. it contains only Noor and Ahmad.
So in conclusion: no record is actually duplicated here; this is a pagination bug that affects the last page. You will observe the same behavior with other amounts of data, but as far as I can tell it is only ever a last-page issue.
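The effect can be reproduced with plain arithmetic. The sketch below is a simplified model of how a range-limited pager might pick the start index of the requested page; it is not GWT code. The class and method names are hypothetical, and the clamping rule is an assumption inferred from the behavior described in the question.

```java
// Simplified, hypothetical model of a pager's page-start calculation (not GWT code).
class PageStartModel {

    // Returns the index of the first row shown for a requested page start.
    static int pageStart(int requestedStart, int pageSize, int rowCount, boolean rangeLimited) {
        int start = requestedStart;
        if (rangeLimited) {
            // Keep the page full: never start later than rowCount - pageSize.
            start = Math.min(start, rowCount - pageSize);
        }
        // Never start before the first row.
        return Math.max(0, start);
    }

    public static void main(String[] args) {
        // 5 contacts, 3 per page, the user asks for the page starting at row 3.
        System.out.println(pageStart(3, 3, 5, true));   // range limited
        System.out.println(pageStart(3, 3, 5, false));  // range not limited
    }
}
```

With the limit on, the model starts the second page at index 2, which is Siti, so the page shows Siti, Noor and Ahmad. With the limit off it starts at index 3 and shows only Noor and Ahmad.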

GWT-Editors and sub-editors

I'm trying to run an example of Editors with sub-editors.
When flushing the parent the value of child editor is null.
The classes are Person and Address.
The main editor is:
// editor fields
public TextField firstname;
public TextField lastname;
public NumberField<Integer> id;
public AddressEditor address = new AddressEditor();
public PersonEditor(Person p){
asWidget();
}
@Override
public Widget asWidget() {
setWidth(400);
setBodyStyle("padding: 5px;");
setHeaderVisible(false);
VerticalLayoutContainer c = new VerticalLayoutContainer();
id = new NumberField<Integer>(new IntegerPropertyEditor());
// id.setName("id");
id.setFormat(NumberFormat.getFormat("0.00"));
id.setAllowNegative(false);
c.add(new FieldLabel(id, "id"), new VerticalLayoutData(1, -1));
firstname = new TextField();
// firstname.setName("firstname");
c.add(new FieldLabel(firstname, "firstname"), new VerticalLayoutData(1, -1));
lastname = new TextField();
lastname.setName("lastname");
c.add(new FieldLabel(lastname, "lastname"), new VerticalLayoutData(1, -1));
c.add(address);
add(c);
return this;
The sub-editor:
public class AddressEditor extends Composite implements Editor<Address> {
private AddressProperties props = GWT.create(AddressProperties.class);
private ListStore<Address> store = new ListStore<Address>(props.key());
ComboBox<Address> address;
public AddressEditor() {
for(int i = 0; i < 5; i ++)
store.add(new Address("city" + i));
address = new ComboBox<Address>(store, props.nameLabel());
address.setAllowBlank(false);
address.setForceSelection(true);
address.setTriggerAction(TriggerAction.ALL);
initWidget(address);
}
And this is where the Driver is created:
private HorizontalPanel hp;
private Person googleContact;
PersonDriver driver = GWT.create(PersonDriver.class);
public void onModuleLoad() {
hp = new HorizontalPanel();
hp.setSpacing(10);
googleContact = new Person();
PersonEditor pe = new PersonEditor(googleContact);
driver.initialize(pe);
driver.edit(googleContact);
TextButton save = new TextButton("Save");
save.addSelectHandler(new SelectHandler() {
@Override
public void onSelect(SelectEvent event) {
googleContact = driver.flush();
System.out.println(googleContact.getFirstname() + ", " + googleContact.getAddress().getCity());
if (driver.hasErrors()) {
new MessageBox("Please correct the errors before saving.").show();
return;
}
}
});
The value of googleContact.getFirstname() is filled but googleContact.getAddress() is always null.
What am I missing?
The AddressEditor needs to map to the Address model - presently, it doesn't seem to, unless Address only has one getter/setter, called getAddress() and setAddress(Address), which wouldn't really make a lot of sense.
If you want just a ComboBox<Address> (which implements Editor<Address> already), consider putting that combo in the PersonEditor directly. Otherwise, you'll need to add @Path("") to the AddressEditor.address field, to indicate that it should be directly editing the value itself, and not a sub property (i.e. person.getAddress().getAddress()).
Another way to build an address editor would be to list each of the properties of the Address type in the AddressEditor. This is what the driver is expecting by default, so it is confused when it sees a field called 'address'.
Two quick thoughts on the code itself: first, there is no need to pass a person into the PersonEditor - that's the job of the driver itself. Second, your editor fields do not need to be public, they just can't be private.