Concatenate multiple PDF/A with different conformance levels - itext

Is it possible to concatenate a number of PDF/A files (with possibly different conformance levels: some PDF/A-1B, some PDF/A-3B, etc.) into a single PDF/A?
I was thinking that using the latest level (PDF/A-3A or PDF/A-3B) would be OK, but I get errors when validating with VeraPDF.
Here is my code:
public static byte[] CreateConformantCopy(List<byte[]> sourcePdfs)
{
    var version = PdfVersion.PDF_1_7;
    var type = PdfAType.PDF_A_3B;

    WriterProperties wp = new WriterProperties();
    wp.UseSmartMode();
    wp.SetPdfVersion(version.ToPdfVersion());

    PdfOutputIntent oi = new PdfOutputIntent("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1",
        Assembly.GetExecutingAssembly().GetManifestResourceStream("xxx.Resources.sRGB_CS_profile.icm"));

    using (var mergedPdf = new MemoryStream())
    {
        var writer = new PdfWriter(mergedPdf, wp);
        using (PdfADocument newDoc = new PdfADocument(writer, type.ToPdfAConformanceLevel(), oi, new DocumentProperties() { }))
        {
            Document document = new Document(newDoc, PageSize.A4.Rotate());
            newDoc.SetTagged();
            newDoc.GetCatalog().SetLang(new PdfString(Thread.CurrentThread.CurrentUICulture.Name));
            newDoc.GetCatalog().SetViewerPreferences(
                new PdfViewerPreferences()
                    .SetDisplayDocTitle(true)
                    .SetCenterWindow(true)
            );

            PdfMerger merger = new PdfMerger(newDoc);
            for (int k = 0; k < sourcePdfs.Count; k++)
            {
                using (var inDoc = PdfHelper.GetDocument(sourcePdfs[k]))
                {
                    var numberOfPages = inDoc.GetNumberOfPages();
                    merger.Merge(inDoc, 1, numberOfPages);
                }
            }
            newDoc.Close();
        }
        return mergedPdf.ToArray();
    }
}

PDF/A-1 and PDF/A-2 have several differences in their requirements, so merging them together might not be possible. Looking at your validation errors, I think this is exactly the case. For example, the very first one is about XMP metadata: PDF/A-2 is stricter here, and you get this error because your first file (which is probably a valid PDF/A-1) does not actually satisfy the PDF/A-2 rules.
What is possible, however, is to attach a PDF/A-1 document to a PDF/A-2 one. This does not even require the use of PDF/A-3, which allows arbitrary attachments; the PDF/A-2 standard does allow attaching valid PDF/A-1 (as well as PDF/A-2) documents.
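As a rough sketch of that route (iText 7 for .NET): embed the PDF/A-1 file into the open PDF/A-2 document instead of merging its pages. The file names here are placeholders, and the exact CreateEmbeddedFileSpec overload may vary between iText versions, so treat this as a starting point rather than a drop-in:

PdfFileSpec fs = PdfFileSpec.CreateEmbeddedFileSpec(
    newDoc,                                 // the open PdfADocument (PDF/A-2B target)
    File.ReadAllBytes("source-pdfa1.pdf"),  // the PDF/A-1 file to attach (placeholder path)
    "PDF/A-1 attachment",                   // description
    "source-pdfa1.pdf",                     // display name
    new PdfName("application/pdf"),         // MIME type
    null,                                   // file parameters
    PdfName.Data);                          // AFRelationship (mainly relevant for PDF/A-3)
newDoc.AddFileAttachment("source-pdfa1.pdf", fs);

Keep in mind that for a PDF/A-2 container the attached file must itself be a valid PDF/A-1 or PDF/A-2 document; only PDF/A-3 lifts that restriction.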

Related

Use mongodb BsonSerializer to serialize and deserialize data

I have complex classes like this:
abstract class Animal { ... }
class Dog : Animal { ... }
class Cat : Animal { ... }
class Farm
{
    public List<Animal> Animals { get; set; }
    ...
}
My goal is to send objects from computer A to computer B.
I was able to achieve this by using BinaryFormatter serialization. It enabled me to serialize complex classes like Animal in order to transfer objects from computer A to computer B. Serialization was very fast, and I only had to worry about placing a serializable attribute on top of my classes. But BinaryFormatter is now obsolete, and future versions of .NET may remove it entirely.
As a result I have these options:
1. Use System.Text.Json
This approach does not work well with polymorphism. In other words, I cannot deserialize an array of cats and dogs, so I will try to avoid it.
2. Use protobuf
I do not want to create protobuf map files for every class; I have over 40 classes, so that is a lot of work. Or maybe there is a converter that I am not aware of? But even then, how would the converter be smart enough to know that my array of animals can contain cats and dogs?
3. Use Newtonsoft (Json.NET)
I could use this solution and build something like this: https://stackoverflow.com/a/19308474/637142. Or, even better, serialize the objects together with their type, like this: https://stackoverflow.com/a/71398251/637142 (a minimal sketch follows this list). This will probably be my go-to option.
4. Use MongoDB.Bson.Serialization.BsonSerializer
Because we are dealing with a lot of complex objects, we are already using MongoDB, and MongoDB is able to store a Farm object easily. My goal is to retrieve objects from the database in binary format, send that binary data to another computer, and use BsonSerializer to deserialize it back into objects.
5. Have computer B connect to the database remotely
I cannot use this option because one of our requirements is to do everything through an API. For security reasons we are not allowed to connect to the database remotely.
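For option 3, here is a minimal sketch of the approach from the linked answers, assuming farm is a populated Farm instance and using Json.NET's TypeNameHandling (on untrusted input this should be paired with a custom SerializationBinder):

using Newtonsoft.Json;

var settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.Auto };
string json = JsonConvert.SerializeObject(farm, settings);        // embeds "$type" for the Animal subclasses
Farm clone = JsonConvert.DeserializeObject<Farm>(json, settings); // cats and dogs come back as cats and dogs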
I am hoping I can use step 4. It will be the most efficient because we are already using MongoDB. If we use step 3, which will work, we are doing extra steps: we do not need the data in JSON format, so why not just send it in binary and deserialize it once it is received by computer B? MongoDB.Driver is already doing this; I wish I knew how it does it. (A minimal round-trip sketch is shown below.)
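For option 4, a minimal round-trip sketch using MongoDB.Bson. This assumes farm is a populated Farm and that the driver can discriminate the subclasses, e.g. via [BsonKnownTypes(typeof(Dog), typeof(Cat))] on Animal:

using MongoDB.Bson;
using MongoDB.Bson.Serialization;

byte[] bytes = farm.ToBson();                         // raw BSON bytes, ready to send over a socket
Farm clone = BsonSerializer.Deserialize<Farm>(bytes); // subtypes restored via the _t discriminator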
This is what I have worked so far:
MongoClient m = new MongoClient("mongodb://localhost:27017");
var db = m.GetDatabase("TestDatabase");
var collection = db.GetCollection<BsonDocument>("Farms");

// I have 1s and 0s in here.
var binaryData = collection.Find("{}").ToBson();

// this is not readable
var t = System.Text.Encoding.UTF8.GetString(binaryData);
Console.WriteLine(t);

// how can I convert those 0s and 1s to a Farm object?
var collection = db.GetCollection<RawBsonDocument>(nameof(this.Calls));
var sw = new Stopwatch();
var sb = new StringBuilder();
sw.Start();

// get items
IEnumerable<RawBsonDocument>? objects = collection.Find("{}").ToList();
sb.Append("TimeToObtainFromDb: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

var ms = new MemoryStream();
var largestSize = 0;

// write data to a memory stream for demo purposes; in the real example I will write this to a TCP socket
foreach (var item in objects)
{
    var bsonType = item.BsonType;

    // write the object, prefixed with its length
    var bytes = item.ToBson();
    ushort sizeOfBytes = (ushort)bytes.Length;
    if (bytes.Length > largestSize)
        largestSize = bytes.Length;
    var size = BitConverter.GetBytes(sizeOfBytes);
    ms.Write(size);
    ms.Write(bytes);
}
sb.Append("time to serialize into bson to memory: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

// now, on the client side (computer B), let's pretend we are deserializing the stream
ms.Position = 0;
var clones = new List<Call>();
byte[] sizeOfArray = new byte[2];
byte[] buffer = new byte[102400]; // make this large: if a document is larger than 102400 bytes it will fail!
while (true)
{
    var i = ms.Read(sizeOfArray, 0, 2);
    if (i < 1)
        break;
    var sizeOfBuffer = BitConverter.ToUInt16(sizeOfArray);
    int position = 0;
    while (position < sizeOfBuffer)
        position += ms.Read(buffer, position, sizeOfBuffer - position); // accumulate: Read returns the count read this call
    //using var test = new RawBsonDocument(buffer);
    using var test = new RawBsonDocumentWrapper(buffer, sizeOfBuffer);
    var identityBson = test.ToBsonDocument();
    var cc = BsonSerializer.Deserialize<Call>(identityBson);
    clones.Add(cc);
}
sb.Append("time to deserialize from memory into clones: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

var serializedjs = new List<string>();
foreach (var item in clones)
{
    var foo = item.SerializeToJsStandards();
    if (foo.Contains("jaja"))
        throw new Exception();
    serializedjs.Add(foo);
}
sb.Append("time to serialize into js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

foreach (var item in serializedjs)
{
    try
    {
        var obj = item.DeserializeUsingJsStandards<Call>();
        if (obj is null)
            throw new Exception();
        if (obj.IdAccount.Contains("jsfjklsdfl"))
            throw new Exception();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex);
        throw;
    }
}
sb.Append("time to deserialize js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

How to remove the extra page at the end of a word document which created during mail merge

I have written a piece of code to create a Word document via mail merge using Syncfusion (assembly Syncfusion.DocIO.Portable, Version=17.1200.0.50), Angular 7+, and .NET Core. Please see the code below.
private MemoryStream MergePaymentPlanInstalmentsScheduleToPdf(
    List<PaymentPlanInstalmentReportModel> PaymentPlanDetails, byte[] templateFileBytes)
{
    if (templateFileBytes == null || templateFileBytes.Length == 0)
    {
        return null;
    }

    var templateStream = new MemoryStream(templateFileBytes);
    var pdfStream = new MemoryStream();
    WordDocument mergeDocument = null;
    using (mergeDocument = new WordDocument(templateStream, FormatType.Docx))
    {
        if (mergeDocument != null)
        {
            var mergeList = new List<PaymentPlanInstalmentScheduleMailMergeModel>();
            var obj = new PaymentPlanInstalmentScheduleMailMergeModel();
            obj.Applicants = 0;
            if (PaymentPlanDetails != null && PaymentPlanDetails.Any())
            {
                var applicantCount = PaymentPlanDetails.GroupBy(a => a.StudentID)
                    .Select(s => new
                    {
                        StudentID = s.Key,
                        Count = s.Select(a => a.StudentID).Distinct().Count()
                    });
                obj.Applicants = applicantCount?.Count() > 0 ? applicantCount.Count() : 0;
            }
            mergeList.Add(obj);

            var reportDataSource = new MailMergeDataTable("Report", mergeList);
            var tableDataSource = new MailMergeDataTable("PaymentPlanDetails", PaymentPlanDetails);
            List<DictionaryEntry> commands = new List<DictionaryEntry>();
            commands.Add(new DictionaryEntry("Report", ""));
            commands.Add(new DictionaryEntry("PaymentPlanDetails", ""));
            MailMergeDataSet ds = new MailMergeDataSet();
            ds.Add(reportDataSource);
            ds.Add(tableDataSource);
            mergeDocument.MailMerge.ExecuteNestedGroup(ds, commands);
            mergeDocument.UpdateDocumentFields();

            using (var converter = new DocIORenderer())
            {
                using (var pdfDocument = converter.ConvertToPDF(mergeDocument))
                {
                    pdfDocument.Save(pdfStream);
                    pdfDocument.Close();
                }
            }
            mergeDocument.Close();
        }
    }
    return pdfStream;
}
Once the document is generated, I notice there is a blank page (with the footer) at the end. I searched for a solution on the internet over and over again, but could not find one. As experts suggest, I have already done the initial checks, such as making sure that the initial Word template file has no page breaks, etc.
I am wondering if there is something I can do from my code to remove any extra page breaks, or anything else that could cause this.
Any other suggested solution, including modifications to the MS Word document itself, would also be appreciated.
Please refer to the documentation link below on removing an empty page at the end of a Word document using the Syncfusion Word library (Essential DocIO):
https://www.syncfusion.com/kb/10724/how-to-remove-empty-page-at-end-of-word-document
Please run that code snippet before converting the Word document to PDF in your sample application.
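The gist of that snippet, as a rough sketch (the DocIO member names here are from memory, so verify them against the KB article): walk the last section backwards and drop trailing empty paragraphs before the PDF conversion.

WTextBody body = mergeDocument.LastSection.Body;
// remove paragraphs at the end of the body that contain no text
while (body.ChildEntities.Count > 0 &&
       body.ChildEntities[body.ChildEntities.Count - 1] is WParagraph para &&
       string.IsNullOrWhiteSpace(para.Text))
{
    body.ChildEntities.Remove(para);
}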
Note: I work for Syncfusion.

Support for basic datatypes in H5Attributes?

I am trying out the beta HDF5 toolkit of ILNumerics.
Currently I see that H5Attributes only support ILNumerics arrays. Is there any plan to extend them to basic datatypes (such as string) as part of the final release?
Do the ILNumerics H5 wrappers provide a provision for extending any functionality to a particular datatype?
ILNumerics internally uses the official HDF5 libraries from the HDF Group, of course. H5Attributes in HDF5 correspond to datasets, with the limitation of not being capable of partial I/O. Besides that, H5Attributes are plain arrays! Support for basic (scalar) element types is given by assuming the stored array to be scalar.
Strings are a completely different story: strings in general are variable-length datatypes. In terms of HDF5, strings are arrays of element type Char; the number of characters in the string determines the length of the array. In order to store a string into a dataset or attribute, you will have to store its individual characters as elements of the array. In ILNumerics, you can convert your string into ILArray<Char> or ILArray<byte> (for ASCII data) and store that into the dataset/attribute.
Please consult the following test case, which stores a string as the value of an attribute and reads the content back into a string.
Disclaimer: This is part of our internal test suite. You will not be able to compile the example directly, since it depends on the existence of several functions which may not be available. However, it will show you how to store strings into datasets and attributes:
public void StringASCIAttribute()
{
    string file = "deleteA0001.h5";
    string val = "This is a long string to be stored into an attribute.\r\n";

    // transfer string into ILArray<Char>
    ILArray<Char> A = ILMath.array<Char>(' ', 1, val.Length);
    for (int i = 0; i < val.Length; i++)
    {
        A.SetValue(val[i], 0, i);
    }

    // store the string as attribute of a group
    using (var f = new H5File(file))
    {
        f.Add(new H5Group("grp1")
        {
            Attributes = {
                { "title", A }
            }
        });
    }

    // check by reading back
    using (var f = new H5File(file))
    {
        // must exist in the file
        Assert.IsTrue(f.Get<H5Group>("grp1").Attributes.ContainsKey("title"));
        // check size
        var attr = f.Get<H5Group>("grp1").Attributes["title"];
        Assert.IsTrue(attr.Size == ILMath.size(1, val.Length));
        // read back
        ILArray<Char> titleChar = attr.Get<Char>();
        ILArray<byte> titleByte = attr.Get<byte>();
        // compare byte values (sum)
        int origsum = 0;
        foreach (var c in val) origsum += (Byte)c;
        Assert.IsTrue(ILMath.sumall(ILMath.toint32(titleByte)) == origsum);
        StringBuilder title = new StringBuilder(attr.Size[1]);
        for (int i = 0; i < titleChar.Length; i++)
        {
            title.Append(titleChar.GetValue(i));
        }
        Assert.IsTrue(title.ToString() == val);
    }
}
This stores arbitrary strings as a Char array into HDF5 attributes, and would work just the same for H5Dataset.
As an alternative solution, you may use the HDF5DotNet (http://hdf5.net/default.aspx) wrapper to write attributes as strings:
H5.open();
Uri destination = new Uri(@"C:\yourFileLocation\FileName.h5");

// Create an HDF5 file
H5FileId fileId = H5F.create(destination.LocalPath, H5F.CreateMode.ACC_TRUNC);

// Add a group to the file
H5GroupId groupId = H5G.create(fileId, "groupName");

string myString = "String attribute";
byte[] attrData = Encoding.ASCII.GetBytes(myString);

// Create an attribute of type STRING attached to the group
H5AttributeId attrId = H5A.create(groupId, "attributeName",
    H5T.create(H5T.CreateClass.STRING, attrData.Length),
    H5S.create(H5S.H5SClass.SCALAR));

// Write the string into the attribute
H5A.write(attrId, H5T.create(H5T.CreateClass.STRING, attrData.Length), new H5Array<byte>(attrData));
H5A.close(attrId);
H5G.close(groupId);
H5F.close(fileId);
H5.close();

Referencing other documents' content

We need to create a matrix from the contents of two other documents. For example:
doc1 has fields like:
4.2 Requirements A
Blah
doc2 has fields like:
2.1 Analysis A
Blah Blah
and we want to create another document (called a Traceability Matrix) which is like:
Col1 | Col2 | Col3
4.2  | 2.1  | Blah Blah Blah
4.2 and 2.1 should be dynamically updated in doc3.
We have tried hyperlinks and cross-referencing, but nothing seems useful for combining different documents. Is there any way to do this?
EDIT:
Here is an example:
Technical Specification Num | Requirement Num | Requirement
4.2                         | 2.1             | A sentence that explains the relationship between the two columns: Technical Specification and Requirement Num
I have now created a working example of how this can be implemented using MS Word Interop and C#.
The code contains comments that should explain the most interesting parts.
The sample is implemented as a C# console application using:
.NET 4.5
Microsoft Office Object Library version 15.0, and
Microsoft Word Object Library version 15.0
... that is, the MS Word Interop API that ships with MS Office 2013 Preview.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Office.Interop.Word;
using Application = Microsoft.Office.Interop.Word.Application;

namespace WordDocStats
{
    internal class Program
    {
        private static void Main()
        {
            // Open word
            var wordApplication = new Application() { Visible = true };

            // Open document A, get its headings, and close it again
            var documentA = wordApplication.Documents.Open(@"C:\Users\MyUserName\Documents\documentA.docx", Visible: true);
            var headingsA = GetHeadingsInDocument(documentA);
            documentA.Close();

            // Same procedure for document B
            var documentB = wordApplication.Documents.Open(@"C:\Users\MyUserName\Documents\documentB.docx", Visible: true);
            var headingsB = GetHeadingsInDocument(documentB);
            documentB.Close();

            // Open the target document (document C)
            var documentC = wordApplication.Documents.Open(@"C:\Users\MyUserName\Documents\documentC.docx", Visible: true);

            // Add a table to it (the traceability matrix)
            // The number of rows is the number of headings + one row reserved for a table header
            documentC.Tables.Add(documentC.Range(0, 0), headingsA.Count + 1, 3);

            // Get the traceability matrix
            var traceabilityMatrix = documentC.Tables[1];

            // Add a table header and border
            AddTableHeaderAndBorder(traceabilityMatrix, "Headings from document A", "Headings from document B", "My Description");

            // Insert headings from doc A and doc B into doc C's traceability matrix
            for (var i = 0; i < headingsA.Count; i++)
            {
                // Insert headings from doc A
                var insertRangeColOne = traceabilityMatrix.Cell(i + 2, 1).Range;
                insertRangeColOne.Text = headingsA[i].Trim();

                // Insert headings from doc B
                var insertRangeColTwo = traceabilityMatrix.Cell(i + 2, 2).Range;
                insertRangeColTwo.Text = headingsB[i].Trim();
            }

            documentC.Save();
            documentC.Close();
            wordApplication.Quit();
        }

        // Based on:
        // -> http://csharpfeeds.com/post/5048/Csharp_and_Word_Interop_Part_4_-_Tables.aspx
        // -> http://stackoverflow.com/a/1817041/700926
        private static void AddTableHeaderAndBorder(Table table, params string[] columnTitles)
        {
            const int headerRowIndex = 1;
            for (var i = 0; i < columnTitles.Length; i++)
            {
                var tableHeaderRange = table.Cell(headerRowIndex, i + 1).Range;
                tableHeaderRange.Text = columnTitles[i];
                tableHeaderRange.Font.Bold = 1;
                tableHeaderRange.Font.Italic = 1;
            }

            // Repeat header on each page
            table.Rows[headerRowIndex].HeadingFormat = -1;

            // Enable borders
            table.Borders.Enable = 1;
        }

        // Based on:
        // -> http://stackoverflow.com/q/7084270/700926
        // -> http://stackoverflow.com/a/7084442/700926
        private static List<string> GetHeadingsInDocument(Document document)
        {
            object headingsAtmp = document.GetCrossReferenceItems(WdReferenceType.wdRefTypeHeading);
            return ((Array)(headingsAtmp)).Cast<string>().ToList();
        }
    }
}
Basically, the code first loads all headings from the two given documents and stores them in memory. Then it opens the target document, creates and styles the traceability matrix, and finally, it inserts the headings into the matrix.
The code is based on the assumptions that:
A target document (documentC.docx) exists.
The two input documents (documentA.docx and documentB.docx) contain the same number of headings - this assumption is made based on your comment about not wanting a Cartesian product.
I hope this meets your requirements :)

How to best detect encoding in XML file?

To load XML files with arbitrary encoding I have the following code:
Encoding encoding;
using (var reader = new XmlTextReader(filepath))
{
    reader.MoveToContent();
    encoding = reader.Encoding;
}

var settings = new XmlReaderSettings { NameTable = new NameTable() };
var xmlns = new XmlNamespaceManager(settings.NameTable);
var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default, encoding);

using (var reader = XmlReader.Create(filepath, settings, context))
{
    return XElement.Load(reader);
}
This works, but it seems a bit inefficient to open the file twice. Is there a better way to detect the encoding such that I can do:
Open file
Detect encoding
Read XML into an XElement
Close file
OK, I should have thought of this earlier. Both XmlTextReader (which gives us the Encoding) and XmlReader.Create (which allows us to specify the encoding) accept a Stream. So how about first opening a FileStream and then using it with both XmlTextReader and XmlReader, like this:
using (var txtreader = new FileStream(filepath, FileMode.Open))
{
    using (var xmlreader = new XmlTextReader(txtreader))
    {
        // Read in the encoding info
        xmlreader.MoveToContent();
        var encoding = xmlreader.Encoding;

        // Rewind to the beginning
        txtreader.Seek(0, SeekOrigin.Begin);

        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        var xmlns = new XmlNamespaceManager(settings.NameTable);
        var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default, encoding);
        using (var reader = XmlReader.Create(txtreader, settings, context))
        {
            return XElement.Load(reader);
        }
    }
}
This works like a charm. Reading XML files in an encoding-independent way could have been more elegant, but at least I'm getting away with opening the file only once.
Another option, quite simple, is to use LINQ to XML. The Load method automatically reads the encoding from the XML file. You can then get the encoding value via the XDeclaration.Encoding property.
An example from MSDN:
// Create the document
XDocument encodedDoc16 = new XDocument(
new XDeclaration("1.0", "utf-16", "yes"),
new XElement("Root", "Content")
);
encodedDoc16.Save("EncodedUtf16.xml");
Console.WriteLine("Encoding is:{0}", encodedDoc16.Declaration.Encoding);
Console.WriteLine();
// Read the document
XDocument newDoc16 = XDocument.Load("EncodedUtf16.xml");
Console.WriteLine("Encoded document:");
Console.WriteLine(File.ReadAllText("EncodedUtf16.xml"));
Console.WriteLine();
Console.WriteLine("Encoding of loaded document is:{0}", newDoc16.Declaration.Encoding);
While this may not serve the original poster, as he would have to refactor a lot of code, it is useful for someone who has to write new code for their project, or who thinks the refactoring is worth it.