I am trying to parse the tag structure of a tagged PDF using the iTextSharp library.
When a tag contains more than one child tag, we can access those children using the following code:
PdfDictionary docElement = kids.GetAsDict(0);
PdfArray kids_doc = docElement.GetAsArray(PdfName.K);
But when a tag contains only one child, I cannot use the code above, because GetAsArray returns null instead of a PdfArray object.
So I tried to cast the PdfObject to a PdfArray with the following code:
var docElement = kids.GetAsDict(0);
PdfObject pdfObj = docElement.Get(PdfName.K);
PdfArray arr = (PdfArray)pdfObj;
but it throws an exception saying that it is unable to convert a PdfIndirectReference to a PdfArray.
Can anyone help me get the kids of a tag as a PdfArray object whenever the tag has at least one kid?
I resolved the issue with a different approach: instead of converting the PdfObject/PdfDictionary to a PdfArray, I created a new PdfArray, removed the existing PdfName.K entry, and added the new array as PdfName.K. Here is the solution:
PdfArray arr = new PdfArray();
PdfObject pdfObj = docElement.Get(PdfName.K);
arr.Add(pdfObj);
for (int i = 1; i < kidsCount; i++)
{
arr.Add(kids[i]);
}
docElement.Remove(PdfName.K);
docElement.Put(PdfName.K,arr);
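The underlying issue is that the /K entry of a structure element may hold either a single kid (a structure element dictionary, often stored as an indirect reference, or an integer marked-content ID) or an array of kids, so code that expects an array has to normalize first. Below is a minimal, library-free Java sketch of a read-only alternative to rewriting /K. It is an assumption-based model, not iTextSharp code: a PDF dictionary is modeled as a Map<String, Object>, an array as a List, and an indirect reference as the hypothetical Ref class. In iTextSharp 5 the corresponding steps would presumably be resolving the entry with PdfDictionary.GetDirectObject(PdfName.K) and testing it with PdfObject.IsArray(); check these against the version you use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StructKids {
    // Follow (hypothetical) indirect references until a direct object is reached,
    // similar in spirit to resolving a PdfIndirectReference.
    static Object resolve(Object o) {
        while (o instanceof Ref) {
            o = ((Ref) o).target;
        }
        return o;
    }

    // Return the kids of a structure element as a list, whether /K holds
    // nothing, a single kid, or an array of kids.
    static List<Object> kidsOf(Map<String, Object> element) {
        Object k = resolve(element.get("K"));
        if (k == null) {
            return Collections.emptyList();
        }
        if (k instanceof List<?>) {
            List<Object> kids = new ArrayList<>();
            for (Object kid : (List<?>) k) {
                kids.add(resolve(kid)); // array entries are often references too
            }
            return kids;
        }
        return Collections.singletonList(k); // a single kid becomes a one-element list
    }

    public static void main(String[] args) {
        Map<String, Object> child = new HashMap<>();
        child.put("S", "P");

        Map<String, Object> oneKid = new HashMap<>();
        oneKid.put("K", new Ref(child)); // single kid stored as a reference

        Map<String, Object> twoKids = new HashMap<>();
        List<Object> array = new ArrayList<>();
        array.add(new Ref(child));
        array.add(7); // integer marked-content ID
        twoKids.put("K", array);

        System.out.println(kidsOf(oneKid).size());  // prints 1
        System.out.println(kidsOf(twoKids).size()); // prints 2
    }
}

// Hypothetical stand-in for an indirect reference: it just points at its target.
class Ref {
    final Object target;

    Ref(Object target) {
        this.target = target;
    }
}
```

Unlike removing and re-adding /K, this leaves the document untouched, which may matter if the PDF is later written back out.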
I have written a piece of code that creates a Word document by mail merge and converts it to PDF, using Syncfusion (assembly Syncfusion.DocIO.Portable, version 17.1200.0.50), Angular 7+ and .NET Core. Please see the code below.
private MemoryStream MergePaymentPlanInstalmentsScheduleToPdf(List<PaymentPlanInstalmentReportModel>
PaymentPlanDetails, byte[] templateFileBytes)
{
if (templateFileBytes == null || templateFileBytes.Length == 0)
{
return null;
}
var templateStream = new MemoryStream(templateFileBytes);
var pdfStream = new MemoryStream();
WordDocument mergeDocument = null;
using (mergeDocument = new WordDocument(templateStream, FormatType.Docx))
{
if (mergeDocument != null)
{
var mergeList = new List<PaymentPlanInstalmentScheduleMailMergeModel>();
var obj = new PaymentPlanInstalmentScheduleMailMergeModel();
obj.Applicants = 0;
if (PaymentPlanDetails != null && PaymentPlanDetails.Any())
{
// Number of distinct students (applicants) in the payment plan
obj.Applicants = PaymentPlanDetails.Select(a => a.StudentID).Distinct().Count();
}
mergeList.Add(obj);
var reportDataSource = new MailMergeDataTable("Report", mergeList);
var tableDataSource = new MailMergeDataTable("PaymentPlanDetails", PaymentPlanDetails);
List<DictionaryEntry> commands = new List<DictionaryEntry>();
commands.Add(new DictionaryEntry("Report", ""));
commands.Add(new DictionaryEntry("PaymentPlanDetails", ""));
MailMergeDataSet ds = new MailMergeDataSet();
ds.Add(reportDataSource);
ds.Add(tableDataSource);
mergeDocument.MailMerge.ExecuteNestedGroup(ds, commands);
mergeDocument.UpdateDocumentFields();
using (var converter = new DocIORenderer())
{
using (var pdfDocument = converter.ConvertToPDF(mergeDocument))
{
pdfDocument.Save(pdfStream);
pdfDocument.Close();
}
}
mergeDocument.Close();
}
}
return pdfStream;
}
Once the document is generated, there is a blank page (with the footer) at the end. I have searched extensively but could not find a solution. Following the usual advice, I have already done the basic checks, such as making sure that the original Word template file has no page breaks.
I am wondering whether there is something I can do in my code to remove any extra page breaks, or anything else that could cause this.
Any other suggestions are also appreciated, including modifications to the Word document itself.
Please refer to the documentation link below on removing an empty page at the end of a Word document using the Syncfusion Word library (Essential DocIO).
https://www.syncfusion.com/kb/10724/how-to-remove-empty-page-at-end-of-word-document
Please apply the code snippet from that article to the document in your application before converting it from Word to PDF.
Note: I work for Syncfusion.
I have multiple copies of a PDF document, each commented by a different user. I would like to merge all of these comments into a new PDF, "merged".
I wrote this sub inside a class called document with properties "path" and "directory".
Public Sub MergeComments(ByVal pdfDocuments As String())
Dim oSavePath As String = Directory & "\" & FileName & "_Merged.pdf"
Dim oPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(Path),
New PdfWriter(New IO.FileStream(oSavePath, IO.FileMode.Create)))
For Each oFile As String In pdfDocuments
Dim oSecundairyPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(oFile))
Dim oAnnotations As New PDFannotations
For i As Integer = 1 To oSecundairyPDFdocument.GetNumberOfPages
Dim pdfPage As PdfPage = oSecundairyPDFdocument.GetPage(i)
For Each oAnnotation As Annot.PdfAnnotation In pdfPage.GetAnnotations()
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Next
Next
Next
oPDFdocument.Close()
End Sub
This code throws an exception that I have not been able to solve:
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
What do I need to change in order to perform this task? Or am I completely off with my code block?
You need to explicitly copy the underlying PDF object to the destination document. After that, you can add that object to the page's annotations.
Instead of adding the annotation directly:
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Copy the object to the destination document first, wrap it in a PdfAnnotation using the PdfAnnotation.makeAnnotation method, and then add it as usual. The code is in Java, but it should be easy to convert to VB:
PdfObject annotObject = oAnnotation.getPdfObject().copyTo(pdfDocument);
pdfDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
Here is working Java code that copies annotations from one document to another using the copyTo method:
// Source of the pages, the copy whose annotations should be merged, and the output.
PdfDocument document = new PdfDocument(new PdfReader(sourceFileName));
PdfDocument toMergeDocument = new PdfDocument(new PdfReader(targetFileName));
PdfDocument writeDocument = new PdfDocument(new PdfWriter(targetFileName + "_MergedVersion.pdf"));
// Assumes both input documents have the same number of pages.
int pageCount = toMergeDocument.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) {
// Copy the page itself from the source document.
PdfPage page = document.getPage(i);
writeDocument.addPage(page.copyTo(writeDocument));
// Copy each annotation object into writeDocument before attaching it to the new page.
PdfPage pdfPage = toMergeDocument.getPage(i);
List<PdfAnnotation> pageAnnots = pdfPage.getAnnotations();
if (pageAnnots != null) {
for (PdfAnnotation pdfAnnotation : pageAnnots) {
PdfObject annotObject = pdfAnnotation.getPdfObject().copyTo(writeDocument);
writeDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
}
}
}
// Close the output first; closing a PdfDocument also closes its PdfReader or PdfWriter.
writeDocument.close();
toMergeDocument.close();
document.close();
I have retrieved data from a database and I can show it in the table of a TableViewer. My task is that, when a row is clicked, its data should appear in another form where I can edit it, and when I click Update, the view should be refreshed with the new values. I am using viewer.addSelectionChangedListener to retrieve the selected row, but I do not know how to update the table. Please suggest a few ideas.
I have written the code below in the constructor of my class, so the table is generated whenever an object is created; l1 is the list of data that I pass to the other UI:
input[i] = new MyModel(persons[i].getDateOfRegistration(), persons[i].getFirstName(),
persons[i].getMiddleName(), persons[i].getLastName(), persons[i].getGender(),
persons[i].getDob(), persons[i].getContactNumber(), persons[i].getMaritalStatus(),
persons[i].getAddress(), persons[i].getCountry(), persons[i].getBloodGroup(),
persons[i].getInsuranceDetails().getPolicyHolderName(),
persons[i].getInsuranceDetails().getPolicyNumber(),
persons[i].getInsuranceDetails().getSubscriberName(),
persons[i].getInsuranceDetails().getRelationshipToPatient());
viewer.setInput(input);
table.setHeaderVisible(true);
table.setLinesVisible(true);
GridData gridData = new GridData();
gridData.verticalAlignment = GridData.FILL;
gridData.horizontalSpan = 2;
gridData.grabExcessHorizontalSpace = true;
gridData.grabExcessVerticalSpace = true;
gridData.horizontalAlignment = GridData.FILL;
viewer.getControl().setLayoutData(gridData);
viewer.addSelectionChangedListener(new ISelectionChangedListener() {
public void selectionChanged(SelectionChangedEvent event) {
IStructuredSelection selection = (IStructuredSelection) event.getSelection();
StringBuffer sb = new StringBuffer("Selection - ");
int j = 0;
String[] s;
for (Iterator iterator = selection.iterator(); iterator.hasNext();) {
sb.append(iterator.next() + ", ");
System.out.println("Testing::" + sb);
}
System.out.println(sb);
String result[] = new String[18];
List l1 = new ArrayList(100);
String[] parts = sb.toString().split("=");
for (int i = 0; i < parts.length; i++) {
System.out.println("s" + parts[i]);
String[] s1 = parts[i].split(",");
l1.add(s1[0]);
}
SWTPatientRegistrationUpdatePageEventView swtPatientRegistrationUpdatePageEventView = new SWTPatientRegistrationUpdatePageEventView();
swtPatientRegistrationUpdatePageEventView.openParentUpdateShell(l1);
// viewer.refresh();
//flag=false;
//refreshingData(list);
}
});
Make sure you follow the TableViewer MVC architecture throughout. The steps to resolve your issue are as follows:
1. Create a proper data structure (model class) to hold the data, and use it in the array that you pass to viewer.setInput().
2. When you click or double-click a table row, fetch the data and save it in a new object of that data structure.
3. Pass that object to your form dialog.
4. When you are done updating in the form, fill the updated data into a new model object.
5. After the form closes, pass that model object back to the parent.
6. Update the corresponding array element with the object received from the form dialog, and refresh the TableViewer.
Please let me know if you need sample code. Thanks.
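The steps above can be sketched in plain Java without SWT. This is a minimal sketch under assumptions: MyModel is reduced to two hypothetical fields, the edit form is simulated by a function that returns an edited copy, and the final refresh is only indicated in a comment (in JFace it would be viewer.refresh()). The main difference from the question's code is that the selected row's model object is used directly, for example via IStructuredSelection.getFirstElement(), instead of parsing its toString() output with split("=").

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class TableEditFlow {
    // Steps 2-6: copy the selected row, let the form edit the copy,
    // then replace the array element that backs the viewer's input.
    static void editRow(MyModel[] input, MyModel selected, UnaryOperator<MyModel> form) {
        MyModel edited = form.apply(selected.copy());
        for (int i = 0; i < input.length; i++) {
            if (input[i] == selected) {
                input[i] = edited;
                break;
            }
        }
        // In JFace you would now call viewer.refresh() so the table shows the new values.
    }

    public static void main(String[] args) {
        MyModel[] input = {
            new MyModel("Ann", "Lee"),
            new MyModel("Bob", "Ray")
        };
        // Stand-in for selection.getFirstElement() inside the selection listener.
        MyModel selected = input[1];
        // Stand-in for the update form: the user changes the last name.
        editRow(input, selected, m -> new MyModel(m.firstName, "Roy"));
        System.out.println(Arrays.toString(input)); // prints [Ann Lee, Bob Roy]
    }
}

// Hypothetical, reduced version of the question's MyModel.
class MyModel {
    final String firstName;
    final String lastName;

    MyModel(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Copy the row so the form edits a separate object (step 2).
    MyModel copy() {
        return new MyModel(firstName, lastName);
    }

    @Override
    public String toString() {
        return firstName + " " + lastName;
    }
}
```

Because the form only ever sees a copy, cancelling the form leaves the table's data untouched; the array element is replaced only when an edited object comes back.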
I currently use iTextSharp to read some PDF files and parse the text I get back. I have encountered strange behavior with some of them: for a 4-page PDF, for example, the returned string contains the pages in the following order:
1 2 1 3 1 4
My code for reading the files is as follows:
using (PdfReader reader = new PdfReader(fileStream))
{
StringBuilder sb = new StringBuilder();
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
for (int page = 0; page < reader.NumberOfPages; page++)
{
string text = PdfTextExtractor.GetTextFromPage(reader, page + 1, strategy);
if (!string.IsNullOrWhiteSpace(text))
sb.Append(Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text))));
}
Debug.WriteLine(sb.ToString());
}
Here is a link to a file with which this behaviour occurs:
https://onedrive.live.com/redir?resid=D9FEFF3BF45E05FD!1536&authkey=!AFLRlskAvlg89yY&ithint=file%2cpdf
Hope you guys can help me out!
Thanks to Chris Haas, I found out what was going wrong. The samples I found online on how to use iTextSharp are either incorrect or do not fit my implementation.
A new SimpleTextExtractionStrategy must be created for every page you read. Otherwise the strategy keeps the text of all previous pages, and that text is repeated in the resulting string.
Also, the line that appends to the StringBuilder can be simplified, because the extracted text is already a .NET string and the encoding round trip can only lose characters that the default code page cannot represent. Change it from:
sb.Append(Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text))));
to
sb.Append(text);
Thus the following code gives the correct result:
using (PdfReader reader = new PdfReader(fileStream))
{
StringBuilder sb = new StringBuilder();
for (int page = 0; page < reader.NumberOfPages; page++)
{
string text = PdfTextExtractor.GetTextFromPage(reader, page + 1, new SimpleTextExtractionStrategy());
if (!string.IsNullOrWhiteSpace(text))
sb.Append(text);
}
Debug.WriteLine(sb.ToString());
}
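To see why reusing one strategy repeats earlier pages, here is a library-free Java model. AccumulatingStrategy and getTextFromPage are hypothetical stand-ins, not iTextSharp API: the stand-in strategy appends every chunk it receives and returns everything collected so far, which is the behavior described in the answer above.

```java
import java.util.Arrays;
import java.util.List;

public class StrategyReuse {
    // Hypothetical stand-in for an accumulating text extraction strategy.
    static class AccumulatingStrategy {
        private final StringBuilder collected = new StringBuilder();

        void renderText(String chunk) {
            collected.append(chunk);
        }

        String getResultantText() {
            return collected.toString(); // everything seen so far, not just this page
        }
    }

    // Stand-in for GetTextFromPage: feeds one page's text to the strategy.
    static String getTextFromPage(String pageText, AccumulatingStrategy strategy) {
        strategy.renderText(pageText);
        return strategy.getResultantText();
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("1 ", "2 ", "3 ");

        // Reusing one strategy: each call also returns the earlier pages.
        AccumulatingStrategy shared = new AccumulatingStrategy();
        StringBuilder reused = new StringBuilder();
        for (String page : pages) {
            reused.append(getTextFromPage(page, shared));
        }
        System.out.println(reused); // prints "1 1 2 1 2 3 "

        // New strategy per page: each call returns only that page.
        StringBuilder fresh = new StringBuilder();
        for (String page : pages) {
            fresh.append(getTextFromPage(page, new AccumulatingStrategy()));
        }
        System.out.println(fresh); // prints "1 2 3 "
    }
}
```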
I am trying to use the iText library with C# to extract the text of PDF files.
I created a PDF by exporting it from Excel 2013, then copied a sample from the web showing how to use iText (and added the library reference to the project).
It reads the first page perfectly, but the output is garbled after that: it keeps part of the first page and merges it with the next page. The commented-out lines are from my attempts to solve the problem; the string "thePage" is recreated inside the for loop.
Here is the code. I can email the PDF to anyone who can help with this issue.
Thanks in advance.
public static string ExtractTextFromPdf(string path)
{
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
using (PdfReader reader = new PdfReader(path))
{
StringBuilder text = new StringBuilder();
//string[] theLines;
//theLines = new string[COLUMNS];
//string thePage;
for (int i = 1; i <= reader.NumberOfPages; i++)
{
string thePage = "";
thePage = PdfTextExtractor.GetTextFromPage(reader, i, its);
string [] theLines = thePage.Split('\n');
foreach (var theLine in theLines)
{
text.AppendLine(theLine);
}
// text.AppendLine(" ");
// Array.Clear(theLines, 0, theLines.Length);
// thePage = "";
}
return text.ToString();
}
}
A strategy object accumulates text data and does not know whether a new page has started.
Thus, use a new strategy object for each page.
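One way to make "a new strategy per page" hard to forget is to pass the extraction loop a factory instead of a strategy instance. The following library-free Java sketch shows that design; the Strategy interface, CollectingStrategy and extractText are hypothetical stand-ins for ITextExtractionStrategy, LocationTextExtractionStrategy and the question's ExtractTextFromPdf, not iTextSharp API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class PerPageExtraction {
    // Hypothetical stand-in for ITextExtractionStrategy.
    interface Strategy {
        void renderText(String chunk);

        String getResultantText();
    }

    // Accumulates text, like the strategies used in the question.
    static class CollectingStrategy implements Strategy {
        private final StringBuilder sb = new StringBuilder();

        public void renderText(String chunk) {
            sb.append(chunk);
        }

        public String getResultantText() {
            return sb.toString();
        }
    }

    // Takes a factory, so every page is guaranteed its own strategy object.
    static String extractText(List<String> pages, Supplier<? extends Strategy> newStrategy) {
        StringBuilder text = new StringBuilder();
        for (String page : pages) {
            Strategy strategy = newStrategy.get(); // fresh state for this page
            strategy.renderText(page);
            for (String line : strategy.getResultantText().split("\n")) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("page one", "page two\nsecond line");
        System.out.print(extractText(pages, CollectingStrategy::new));
    }
}
```

With this shape, the loop rather than the caller owns the strategy's lifetime, so the shared-state bug from the question cannot be reintroduced by moving a declaration out of the loop.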