Migrating from itext2 to itext7 - itext

Years ago, I wrote a small app in itext2 to gather reports on a weekly basis and concatenate them into one PDF. The app used com.lowagie.text.pdf.PdfCopy to copy and merge the PDFs. And it worked fine. Performed exactly as expected.
A few weeks ago I looked into migrating the application to itext7. To that end, I used the copyPagesTo method of com.itextpdf.kernel.pdf.PdfDocument. When run on the same file set, this produces warnings like:
WARN PdfNameTree - Name "section.1" already exists in the name tree; old value will be replaced by the new one.
When I click on the link to "section.1" in the first document of the merged PDF, I am taken to "section.1" of the last document. This is not what I expected and not what happens when using the itext2 app. In the PDFs produced by itext2, if I click on the link to "section.1" of the first document in the combined PDF, I am taken to section 1 of the first document.
There is a hint in Javadocs for copyPagesTo saying
If outlines destination names are the same in different documents, all
such outlines will lead to a single location in the resultant
document. In this case iText will log a warning. This can be avoided
by renaming destinations names in the source document.
There is, however, no explanation of how this should be done. I find it odd that this should be necessary in itext7, although it wasn't in itext2.
Is there a simple way to get around this problem?
I've also tried the Sejda desktop app and it produces correct results, but I would prefer to automate the process through a batch script.

My guess is iText 2 didn't even know it might be a problem.
If iText can't deduplicate destination names, the procedure is roughly:
Follow /Catalog -> /Names -> /Dests in each document to find the destination name tree.
Deduplicate the names, by adding suffixes. Remember that a name with a suffix added might be equal to an existing name in the same or another document. Be careful!
Now you can rewrite the destination name trees. Since you have only used suffixes, you can do this in place - the lexicographic ordering of the names is unaltered so the search tree structure is not broken.
Now, rewrite destination links in each PDF for the new names. For example any dictionary entry with key /Dest, or any /D in a /GoTo action.
Now, after all this preprocessing, the files will merge without name clashes.
(I know all this because I've just implemented it for my own PDF software. It's slightly hairy stuff, but not intractable.)
If you like, I can provide a development version of cpdf with this functionality for you to test.
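The renaming step can be sketched in Python. This is only an illustration of the suffixing logic on plain lists of names, not iText or cpdf code: the function name dedupe_destination_names and the ".N" suffix format are assumptions made for the example, and reading or rewriting the actual /Dests name trees is left out.

```python
def dedupe_destination_names(docs):
    """Compute a rename map per document so all destination names are unique.

    docs: list of lists, each holding the destination names of one PDF.
    Returns a list of {old_name: new_name} dicts, one per document.
    The ".N" suffix format is a hypothetical choice for this sketch.
    """
    # Every original name is reserved, so a suffixed name can never
    # collide with a name that already exists in some other document.
    reserved = {name for names in docs for name in names}
    assigned = set()
    mappings = []
    for names in docs:
        mapping = {}
        for name in names:
            candidate = name
            n = 0
            # Keep the name if it is still free; otherwise try suffixes.
            while candidate in assigned or (candidate != name and candidate in reserved):
                n += 1
                candidate = f"{name}.{n}"
            assigned.add(candidate)
            mapping[name] = candidate
        mappings.append(mapping)
    return mappings


docs = [
    ["section.1", "section.2"],
    ["section.1", "section.1.1"],
    ["section.1"],
]
result = dedupe_destination_names(docs)
for mapping in result:
    print(mapping)
```

Each mapping would then drive both the name tree rewrite and the /Dest and /D rewrites for that document. Name tree keys have to stay sorted, so if a renamed key could sort differently from its neighbours the tree would need rebuilding rather than editing in place.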

VS Code Regex search to remove references based on containing text in string

I am attempting to remove all references to a managed package that is going to be uninstalled. The references are spread throughout the code base, which I have open in VS Code.
I have been using a search query to find the field permissions, but I am wondering whether there is a way to match any field whose name merely contains "agf" (they all use it) instead of specifying each exact field name.
Below is the search query:
<fieldPermissions>
<editable>false</editable>
<field>User.agf_Certified_Product_Owner__c</field>
<readable>false</readable>
</fieldPermissions>
In the field, I want to be able to find and delete the 5 associated lines from multiple files if they match "agf" in any combination. Something like the below:
<fieldPermissions>
<editable>false</editable>
<field>agf</field>
<readable>false</readable>
</fieldPermissions>
With any combination of agf in the field, delete all from any file it appears in.
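A minimal sketch of the kind of pattern being asked for, written in Python so it can be run over many files. It assumes every block has exactly the four-element layout shown above (editable, field, readable, in that order); the names BLOCK and strip_agf_permissions are made up for the example. A similar multi-line pattern may also work in VS Code's regex search with an empty replacement, but that is not verified here.

```python
import re

# Matches one <fieldPermissions> block whose <field> text contains "agf".
# [^<]* keeps each piece inside its own element, so a match cannot
# spill over into a neighbouring block.
BLOCK = re.compile(
    r"[ \t]*<fieldPermissions>\s*"
    r"<editable>[^<]*</editable>\s*"
    r"<field>[^<]*agf[^<]*</field>\s*"
    r"<readable>[^<]*</readable>\s*"
    r"</fieldPermissions>[ \t]*\r?\n?"
)


def strip_agf_permissions(xml_text):
    """Return xml_text with every matching fieldPermissions block removed."""
    return BLOCK.sub("", xml_text)


sample = (
    "<Profile>\n"
    "<fieldPermissions>\n"
    "<editable>false</editable>\n"
    "<field>User.agf_Certified_Product_Owner__c</field>\n"
    "<readable>false</readable>\n"
    "</fieldPermissions>\n"
    "<fieldPermissions>\n"
    "<editable>true</editable>\n"
    "<field>Account.Name</field>\n"
    "<readable>true</readable>\n"
    "</fieldPermissions>\n"
    "</Profile>\n"
)
cleaned = strip_agf_permissions(sample)
print(cleaned)
```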
Not an answer but too long for a comment
You don't have to? Profiles/perm sets don't block package's delete. Probably neither do reports.
You'd use your time better by searching for all instances of agf__ (that's with double underscore), should find fields, objects... used in classes, flows, page layouts etc. And search for agf. (with dot) should find all instances where your Apex code calls their classes marked as global.
Alternatively Apex / VF pages with dependencies on package will have it listed in their "meta.xml", for example
<?xml version="1.0" encoding="UTF-8"?>
<ApexClass xmlns="http://soap.sforce.com/2006/04/metadata">
<apiVersion>54.0</apiVersion>
<packageVersions>
<majorNumber>236</majorNumber>
<minorNumber>1</minorNumber>
<namespace>SBQQ</namespace>
</packageVersions>
<status>Active</status>
</ApexClass>
Last but not least - why not just spawn a dev sandbox and attempt the delete there? If it succeeds - great. If not - it'll list the dependencies that blocked the delete. It'll be "the real thing"; it'll smite you even if your VS Code project doesn't contain all flows and layouts and thus could lull you into a false sense of security. I'd seriously do it in a sandbox and then run all tests for good measure, just in case there are some dynamic SOQL queries that don't count as hard, delete-blocking references.
After delete's done - fetch Profiles / Permsets from this org and the field references will be gone from the xml.

"Two output file names resolved to the same output path" error when nesting more than one .resx file within form in .NET application

I have a Windows Forms .NET application in Visual Studio. Making a form "Localizable" adds a Form1.resx file nested below the form. I also want to add a separate .resx file for each form (Form1Resources.resx). This is used for custom form-specific resources, e.g. messages generated using the code behind.
This is set up as follows:
It would be tidier to nest the custom .resx file beneath the form (see this question for details about how to do this), as follows:
However, this results in the following error when I build the application:
Two output file names resolved to the same output path:
"obj\Debug\WindowsFormsApp1.Form1.resources" WindowsFormsApp1
I'm guessing that MSBuild uses some logic to find nested .resx files and generate .resources file based on its parent. Is there any way that this can be resolved?
Note that it is not possible to add custom messages to the Form1.resx file - this is for design-specific resources only and any resources that you add get overwritten when you save changes in design mode.
The error comes from the GenerateResource task because the two .resx files (EmbeddedResource items in MSBuild) passed to it both have the same ManifestResourceName metadata value. That value gets created by the CreateManifestResourceNames task, and presumably when it sees an EmbeddedResource which has the DependentUpon metadata set (to Form1.cs in your case) it always generates something of the form '$(RootNamespace).%(DependentUpon)': both your .resx files end up with WindowsFormsApp1.Form1 as ManifestResourceName. That could arguably be treated as the reason why having all .resx files under Form1 is not tidier: it's not meant for it, it requires extra fiddling, and it could be confusing for others, since they'd typically expect the .resx file placed beneath a form to contain what it always does.
Anyway: there's at least 2 ways to work around this:
there's a Target called CreateCustomManifestResourceNames which is meant to be used for custom ManifestResourceName creation. A bit too much work for your case probably, just mentioning it for completeness
manually declare a ManifestResourceName yourself which doesn't clash with the other(s); if the metadata is already present it won't get overwritten
Generic code sample:
<EmbeddedResource Include="Form1Resources.resx">
<DependentUpon>Form1.cs</DependentUpon>
<ManifestResourceName>$(RootNamespace).%(FileName)</ManifestResourceName>
...
</EmbeddedResource>

LibXML: Comment-out a block of Elements

Is there a way to add/initiate a comment (e.g. $dom->createComment ...) such that it comments out an entire block of XML tags? Basically I want to turn off the content inside the comment.
For example, it would look like this:
<TT>
<AA>keep</AA>
<!-- comment to blocking
<BB>hideme1</BB>
<CC>hideme2</CC>
-->
<DD>d's content is good</DD>
</TT>
Actually this question is a precursor to my attempt to figure out a method to mark up/label/identify the changes made to XML files in support of new client software functionality, while keeping the ability to remove/back out these XML changes in the rare event the client needs to fall back to the previous software version (and no, I can't simply point back to the original XML file, because the client is allowed to make minor modifications to existing node text values). This is all going to be controlled via a Perl script and LibXML's core modules (I can't use modules the client doesn't have).
So basically I've identified three possible types of xml changes as a result of new client sw functionality:
1.) ADD new element node(s) (typically to support new sw functionality)
2.) DELETE element node(s), or blocks of (would be rare, but never-the-less a possibility)
3.) CHANGE node text values (rare, but the new sw may require a new value)
For all three types, the client needs the ability to back out the changes. One thing I was thinking of using is ATTRIBUTES, since the existing XML files don't use them. For example, for an ADD change type, I could include an attribute like 'ADD="sw version 4.1"'. This way, if it needs to be removed, I could simply have the Perl script find those attribute strings and delete the elements that carry them (using LibXML methods). Same thing with the CHANGE change type - I could use an attribute like CHG="newvalue_oldvalue", then again use straight Perl (or LibXML) to switch back the value based on the contents of the attribute. The DELETE change type is giving me a problem though (as well as the others lol!). I want to be able to "keep" the deleted lines in the XML file solely in case the sw falls back a version (at some later point the Perl script could eventually clean up/delete them).
I know this is a lot; I'm new to LibXML (but not to Perl). I was just wondering if any of you have any thoughts as to how to go about it, or have seen anything resembling this kind of request ... I'd be grateful for any kind of advice! Thank you...
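One way the comment-out idea could work is to serialize the elements, store that text in a single comment node, and remove the originals; backing out means parsing the comment text again. The sketch below shows this with Python's standard xml.dom.minidom rather than Perl's LibXML, so it only illustrates the technique; the helper names comment_out and restore are invented for the example. It assumes the hidden elements are siblings and that their serialized form contains no "--", which XML comments do not allow.

```python
from xml.dom import minidom


def comment_out(doc, nodes):
    """Replace sibling element nodes with one comment holding their XML."""
    parent = nodes[0].parentNode
    text = "".join(node.toxml() for node in nodes)
    if "--" in text:
        raise ValueError("XML comments may not contain '--'")
    comment = doc.createComment(text)
    parent.insertBefore(comment, nodes[0])
    for node in nodes:
        parent.removeChild(node)
    return comment


def restore(doc, comment):
    """Parse the comment text back into elements and put them in its place."""
    parent = comment.parentNode
    # Wrap in a throwaway root so several sibling elements parse as one document.
    wrapper = minidom.parseString("<w>" + comment.data + "</w>").documentElement
    for child in list(wrapper.childNodes):
        parent.insertBefore(doc.importNode(child, True), comment)
    parent.removeChild(comment)


original = "<TT><AA>keep</AA><BB>hideme1</BB><CC>hideme2</CC><DD>ok</DD></TT>"
doc = minidom.parseString(original)
root = doc.documentElement
hidden = [root.getElementsByTagName("BB")[0], root.getElementsByTagName("CC")[0]]

comment = comment_out(doc, hidden)
commented = root.toxml()
print(commented)

restore(doc, comment)
restored = root.toxml()
print(restored)
```

The same three steps (serialize, create a comment, remove the originals) would be the shape of a Perl version; the exact LibXML method names are not shown here because they are not confirmed in this thread beyond createComment.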

Generate a torrent/magnet link from a single file in a torrent collection

I was wondering if it is possible, given a torrent collection (i.e. a torrent containing multiple files), to extract a single file from it, generating an almost new torrent/magnet link that downloads only that single file but uses the same source (announce, etc.), instead of downloading the whole torrent and then selecting what to download.
Thanks for any hint about.
2019 Update: Yes, you now can! In 2017 a draft BEP was released that covers the question's behaviour for magnet URIs! This is great, as it creates a standard that keeps a consistent info_hash between a magnet URI pointing to the multi-file torrent, and a magnet URI pointing to a single file within that multi-file torrent. They will share a swarm, which means you can, as the question asks "[generate] an almost new torrent/magnet link to download only that single file but using the same source".
The draft BEP:
http://www.bittorrent.org/beps/bep_0053.html BEP 53: "Magnet URI extension - Select specific file indices for download"
Example URI to request files 0, 2, 4 and the inclusive range 6 through to 8:
magnet:?xt=urn:btih:HASH&dn=NAME&tr=TRACKER&so=0,2,4,6-8
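A small sketch of building such a URI programmatically. The helper names select_only and magnet_for_files are made up for the example, and only the xt and so parameters are emitted; dn and tr could be appended the same way.

```python
def select_only(indices):
    """Compress file indices into the 'so' syntax, e.g. 0,2,4,6-8."""
    parts = []
    start = prev = None
    for i in sorted(set(indices)):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i  # extend the current consecutive run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = i
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def magnet_for_files(info_hash, indices):
    """Build a magnet URI that keeps the original info_hash and adds 'so'."""
    return f"magnet:?xt=urn:btih:{info_hash}&so={select_only(indices)}"


uri = magnet_for_files("HASH", [0, 2, 4, 6, 7, 8])
print(uri)
```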
And the draft BEP is making its way into bittorrent libraries:
https://gitlab.com/proninyaroslav/libretorrent/tags/1.9 LibreTorrent 1.9 2018-NOV-26
https://github.com/webtorrent/webtorrent/issues/1395 Webtorrent 0.100.0 2018-MAY-23
2013-MAY-03 Original Answer:
Sometimes yes, but not often, and the resulting swarm has no peers.
Firstly, you need the original .torrent file, so if you only have a magnet URI you need to resolve that to a .torrent using DHT. Any bittorrent library that supports magnet URIs has the code for that task.
Once you have the .torrent, you then need to get the hashes relating to the file you're interested in. The .torrent file contains a very long string, each 20 bytes representing the hash of each piece in the torrent. Piece length is fixed for a torrent, typically between 256KB and 1MB. If the file starts at exactly a piece offset, and is sized equal to a multiple of the piece size or is the last file in the torrent then you can reuse these hashes. You can then create a new .torrent file with that information, and generate a new magnet URI from the torrent file, re-using the announce or using a new one.
Torrent info structure: https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure
You are unlikely to be lucky enough to get that offset: with a piece length generally between 256KB and 1MB, and given that a file could start at any byte within a piece, the chance is between 1/262144 and 1/1048576. If you can't re-use the hashes, you need to generate new hashes, which means you can't re-use the .torrent and would need to download the files to generate the new piece hashes.
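The alignment condition can be checked with a few lines of arithmetic. This is a sketch under the assumption that the files are laid out back to back in the order listed in the torrent; the function name reusable_piece_range is hypothetical.

```python
def reusable_piece_range(file_lengths, index, piece_length):
    """Return (first_piece, end_piece) if the file's piece hashes can be
    reused as-is, or None if the file is not piece-aligned.

    file_lengths: sizes in bytes of all files, in torrent order.
    index: position of the wanted file in that list.
    """
    offset = sum(file_lengths[:index])  # byte offset where the file starts
    length = file_lengths[index]
    is_last = index == len(file_lengths) - 1
    if offset % piece_length != 0:
        return None  # the file starts in the middle of a piece
    if length % piece_length != 0 and not is_last:
        return None  # its final piece also holds data from the next file
    first = offset // piece_length
    count = -(-length // piece_length)  # ceiling division
    return first, first + count


PIECE = 262144  # 256KB
print(reusable_piece_range([524288, 262144, 100], 1, PIECE))
print(reusable_piece_range([100, 262144], 1, PIECE))
```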
The killer is that in the end, the torrent created has a different info_hash. The info_hash is the hash of the info describing the torrent, which was a description of many files and now in your new hash is the description of a single file, thus is a new torrent so there's no-one available to leech from. Peers collect into swarms based on the info_hash, and if you create a new torrent based on one file from a multifile torrent, the peers from the multifile torrent don't know about it and won't be available to leech from.
Even if you're lucky enough to get the right piece offsets, you create a torrent that doesn't have anyone sharing the file.
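To make the info_hash point concrete, here is a sketch that bencodes two info dictionaries and hashes them with SHA-1. The bencoder is a minimal one written for this example, and the file names, sizes and all-zero piece hashes are placeholder values, not data from a real torrent.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, str/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info):
    """SHA-1 of the bencoded info dictionary, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


# Placeholder multi-file torrent: two files of exactly one piece each.
multi = {
    "name": "collection",
    "piece length": 262144,
    "pieces": b"\x00" * 40,
    "files": [
        {"length": 262144, "path": ["a.bin"]},
        {"length": 262144, "path": ["b.bin"]},
    ],
}
# Single-file torrent reusing the first piece hash of the collection.
single = {
    "name": "a.bin",
    "piece length": 262144,
    "pieces": b"\x00" * 20,
    "length": 262144,
}
multi_hash = info_hash(multi)
single_hash = info_hash(single)
print(multi_hash != single_hash)
```

Even with a reused piece hash, the two dictionaries describe different content, so the hashes differ and the peers end up in separate swarms.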
So, could you instead re-use the magnet URI and just specify a file name within the torrent? No, the BEP that describes how Bittorrent uses magnet URIs doesn't cover this behaviour. http://www.bittorrent.org/beps/bep_0009.html

Merging doxygen modules

I have a large amount of code that I'm running doxygen against. To improve performance I'm trying to break it into modules and merge the result into one set of docs. I thought tag files would do the trick, but either I have it configured wrong or I'm misunderstanding how it works.
The directories are laid out:
root +
     |-src+
     |    |-a
     |
     |-doc+
          |-a.dox
          |-main.dox
          |-main.md
          |-output+
                  |-a+
                  |  |-html
                  |-main+
                        |-html
In addition to 'a' there are other peer directories, but I am starting with one.
a.dox generates output and a tag file into root/doc/output
OUTPUT_DIRECTORY=output/a
GENERATE_TAGFILE = output/a/a.tag
INPUT=../src/a
main.dox just inputs the markdown file that has a mainpage tag and refers to the other project's tag file.
OUTPUT_DIRECTORY=output/main
INPUT = main.md
TAGFILES=output/a/a.tag=output/a/html
Should this merge or link all the docs under main where I can browse 'a' globals, modules, pages, etc? Or does this only generate links to 'a' if I explicitly cross-reference a documented entity in 'a' from inside of 'main'?
If this should work, any thoughts on where my syntax is incorrect? I've tried various ways to define TAGFILES, is the output directory relative to the main.dox file? To the a.tag file? Or to the a/html directory?
If I'm off base and TAGFILES don't work this way, is there another way to merge sets of doxygen directories into one?
Thanks.
I suggest you read this topic on how I recommend to use tag files and the conditions that should apply: https://stackoverflow.com/a/8247993/784672
To answer your first question: doxygen will in general not merge the various index files together (then no performance would be gained). Although for a part you can still get external members in the index by setting ALLEXTERNALS to YES.
Doxygen will (auto)link symbols from other sources imported via a tag file. So in general you should divide your code into more or less self-contained modules/components/libraries, and if one such module depends on another, then import its tag file so that doxygen can link to the other documentation set. If you run doxygen twice (once for the tag file and once for the documentation) you can also resolve cyclic dependencies if you have them.
In my case I made a custom index page with links to all modules, and made a custom entry in the menu of each generated page that linked back to this index (see http://www.doxygen.nl/manual/customize.html#layout for how to add a user-defined entry to the navigation menu/tree).
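On the question of what the location after the "=" in TAGFILES is relative to: as far as I understand the doxygen manual, a relative location is resolved from the HTML output directory of the project that imports the tag file, not from main.dox or from a.tag. Assuming that is right, and assuming the layout from the question with doxygen run from root/doc, the value can be worked out like this (the helper name tagfile_entry is made up for the example):

```python
import posixpath


def tagfile_entry(tag_path, external_html_dir, own_html_dir):
    """Build a 'tagfile=location' value for TAGFILES.

    The location is expressed relative to own_html_dir, i.e. the html
    directory of the project that imports the tag file (an assumption
    about how doxygen resolves it, see the lead-in).
    """
    location = posixpath.relpath(external_html_dir, start=own_html_dir)
    return f"{tag_path}={location}"


entry = tagfile_entry("output/a/a.tag", "output/a/html", "output/main/html")
print("TAGFILES = " + entry)
```

If the assumption holds, the links generated in output/main/html would then point at ../../a/html rather than at output/a/html.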