Set filename to ditamap title in DITA-OT Command Line PDF transformation - dita

I have a script that builds a system on a regular schedule, and as part of that system, I need to convert several documents from dita to PDF.
I can run a command line of the following shape from my script without problems:
dita --input=<file location> --output=<output location> --format=pdf
But due to naming conventions and other restrictions, the names of the ditamap files are not always well-formed or human-readable (and I am not able to change them). I'm aware of the outputBase.file parameter that I can pass in on the command line, but I would like dita to scan the file and substitute the document title as the filename, something along the lines of:
dita --input=<file> --output=<output> --format=pdf --outputBase.file=$title
Is this even possible?

You don't have to change the dita command line. Instead, you can rename the output PDF file to the document title with the following steps:
At the start of your PDF plug-in's processing, read the main map's title (bookmap or map) with an XSLT task and write an XML file that contains the title.
Load the title into a property of your choice (such as document.title). The <xmlproperty> task in the Ant script is useful for this.
After the PDF file is generated, rename it in <output location> to ${document.title}.pdf in the last phase of the build process.
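If the calling script would rather do this itself than extend the plug-in, the same read-the-title-then-rename idea can be sketched in Python. This is only a rough sketch under assumptions: the function names map_title and safe_filename are made up for illustration, and the title is assumed to live in <booktitle>/<mainbooktitle> (bookmap), in a <title> child (map), or in the older title attribute.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def map_title(ditamap_xml: str) -> Optional[str]:
    """Return the title of a DITA map or bookmap, or None if none is found.

    Assumed locations (not exhaustive): bookmap/booktitle/mainbooktitle,
    map/title, and the legacy title attribute on the root element.
    """
    root = ET.fromstring(ditamap_xml)
    for path in ("booktitle/mainbooktitle", "title"):
        node = root.find(path)
        if node is not None:
            # Flatten inline markup and collapse runs of whitespace.
            text = " ".join("".join(node.itertext()).split())
            if text:
                return text
    return root.get("title") or None


def safe_filename(title: str) -> str:
    """Turn a title into a conservative file name (hypothetical helper)."""
    # Drop characters that are not allowed in Windows file names.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", title)
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    return cleaned or "untitled"
```

After the dita run has finished, the script can rename the generated PDF in the output location to safe_filename(title) + ".pdf", falling back to the original name when map_title returns None.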
In my experience, one user needed PDF output from a document authored as a bookmap, and the technique above worked fine for that user.
Hope this helps your development.

Related

How to make your own templates for codelabs in claat

I need to embed codelabs into an existing web site.
So, I need to change the actual HTML output (I need to get rid of etc.)
In claat's own help I see:
Note that the built-in templates of the formats are not guaranteed to be stable.
They can be found in https://github.com/googlecodelabs/tools/tree/master/claat/render.
Please avoid using default templates in production. Use your own copies.
To use a custom format, specify a local file path to a Go template file.
More info on Go templates: https://golang.org/pkg/text/template/.
Except that:
The link provided is very API-ish
The only command line option that mentions templates is this:
-extra string
Additional arguments to pass to format templates. JSON object of string,string key values.
What do I actually need to do to pass claat a different template?
As the help message implies, use the format option. html or md are merely shortcuts to use the built-in templates.
claat export -f [template filepath] [source filepath]
I was also trying to solve this a year after it was first asked. The help doesn't say so explicitly, but digging into the source shows where the format option is parsed.
https://github.com/googlecodelabs/tools/blob/main/claat/render/template.go#L166
Following that, we can pass our own HTML or text template file (relative or absolute path) to the -f option, instead of the default html or md values, which load the built-in templates.
These templates are parsed according to:
https://pkg.go.dev/html/template
https://pkg.go.dev/text/template
as the help message indicates.

Is it possible to extract metadata such as Content Created date from files - I can't get this with PowerShell

I need to extract the "Content Created" date out of thousands of files, but haven't been able to find a way to do this using PowerShell / other Command Line utility.
Does someone out there know a way to obtain this metadata? If so, please can you advise me. Thanks.
I've looked at various resources online, including this site, but haven't been successful thus far.
I've been unable to find a native PowerShell cmdlet that does what you want. However, I found the article "Use PowerShell to Find Metadata from Photograph Files" and the script it uses, the "get file meta data function".
The article talks about image files, but the function is not specific for image files.
I tested it out on a folder containing a Word and an Excel file and the returned Metadata from the Word file contains the Content Created date. The Excel file does not contain/return that value. This is not unexpected as the Details tab of properties for the Excel file does not contain a Content Created value so it seems to be specific for Word files, and maybe some other file or document types.
Update:
You write that you need to extract this info from thousands of files, but if those files are anything but Word files you probably won't be able to do that.
As far as I can tell this should work with any file type that exposes the metadata you want. However, it seems that the ContentCreated property is unique to Word. I tried adding a text file (.txt), an Acrobat PDF (.pdf), an MS Access database (.mdb), an Excel workbook (.xlsx) and a Word document (.docx) to my test folder, and the only one that has/returns that metadata property is the Word file.
You should also be aware that the script seems to return localized metadata, so to get the info I wanted programmatically I had to pipe the output of the script to Select-Object -Property Name,'Innehåll skapat' (which is the Swedish name for Content created). So if you're running on a non-English system you may need to check what the output looks like before creating your Select-Object statement.
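For .docx files specifically, there is a route that sidesteps the localized property names. This is a different technique from the shell-based script above, sketched in Python under the assumption that the files are Office Open XML packages: such a file is a zip archive, and as far as I know the "Content created" value corresponds to the dcterms:created element in docProps/core.xml. The function name content_created is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

# Qualified name of <dcterms:created> inside docProps/core.xml.
DCTERMS_CREATED = "{http://purl.org/dc/terms/}created"


def content_created(package) -> Optional[str]:
    """Return the creation timestamp stored in an OOXML package.

    `package` is a path or a binary file-like object. Returns None when
    the archive has no core properties part or no created element.
    """
    with zipfile.ZipFile(package) as archive:
        try:
            core_xml = archive.read("docProps/core.xml")
        except KeyError:
            return None
    node = ET.fromstring(core_xml).find(DCTERMS_CREATED)
    return node.text if node is not None else None
```

Looping over a folder with pathlib.Path.rglob("*.docx") and calling content_created on each path would cover the thousands-of-files case. Files that are not zip archives (for example the old binary .doc format) raise zipfile.BadZipFile and need separate handling.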
Another option is Power Query in Excel 2013 or later (Data tab): Connect to data > Folder.

merging PDFs with Ghostscript ignoring outline and using pdfmark instead

I am using a Batch script to merge different PDFs in one complete file.
%gsc% -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=letter -dEPSFitPage -o %dsk%%zus%%ext% %mfd% %pth%tmp\pdfmarks
%dsk%%zus%%ext%: Path and name of final (complete) document
%mfd%: Path and name of docs to be merged (c:\test\1.pdf c:\test\2.pdf ...)
%pth%tmp = path to the pdfmarks file
Additionally, I am creating a pdfmark document inside the script, which gs uses to create the bookmarks. Unfortunately, some of the docs I am merging already have their own bookmarks, and I have not yet found a way to ignore those. GS should only use the bookmarks inside the pdfmarks file.
How can this be done?
Firstly, you are not 'merging' PDF files when you use Ghostscript's pdfwrite device. The process is described in detail here.
The important point is that the way the input file(s) are constructed has no bearing on the way the output file is constructed. If any other software you use relies on the file being constructed in a particular fashion it may not work on the output PDF file.
The -dEPSFitPage switch only has any effect when the input is an EPS file. If you want to 'fit' PostScript or PDF files then you need to use -dPDFFitPage, -dPSFitPage or just -dFitPage. However, all of these rely on you first selecting a media size, and then preventing it being altered by setting -dFIXEDMEDIA. For EPS files you would more normally use -dEPSCrop which sets the media size to the EPS declared BoundingBox.
You can prevent the PDF interpreter reading the Outlines tree (which you are calling Bookmarks) and then creating a pdfmark from it to pass to the pdfwrite device by using the -dNO_PDFMARK_OUTLINES switch which oddly isn't documented, presumably an oversight.
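Putting the switches from this answer together, the command line could be assembled as in the Python sketch below. The helper name build_gs_command and the default executable name gs are assumptions (on Windows the console executable is typically gswin64c); the switches themselves are the ones named above.

```python
from typing import List, Sequence


def build_gs_command(
    output_pdf: str,
    inputs: Sequence[str],
    pdfmarks: str,
    gs_exe: str = "gs",  # assumption: adjust to your installation
    fit_page: bool = False,
) -> List[str]:
    """Build a pdfwrite command that ignores the inputs' own outlines."""
    cmd = [gs_exe, "-sDEVICE=pdfwrite", "-sPAPERSIZE=letter"]
    if fit_page:
        # Scaling to the selected media only works when the media size
        # is fixed first, hence -dFIXEDMEDIA together with -dFitPage.
        cmd += ["-dFIXEDMEDIA", "-dFitPage"]
    # Stop the PDF interpreter from turning existing outlines into pdfmarks.
    cmd.append("-dNO_PDFMARK_OUTLINES")
    cmd += ["-o", output_pdf]
    cmd += list(inputs)
    # The pdfmarks file goes last, as in the original batch command.
    cmd.append(pdfmarks)
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True). Passing the arguments as a list avoids quoting problems with paths that contain spaces.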

Doxygen-produced PDF - change url color?

I’m using Doxygen 1.8.10 (on Windows) to generate LaTeX files, and MiKTex 2.9 to generate a PDF. The PDF is functional, but not very pretty. I’ve figured out how to customize the title page (I added graphics and non-default text) and how to get the images into the PDF.
But... how do I change the styling for things such as the color of URLs (which are just text in the Doxygen comments, and then Doxygen turns them into \href items)?
I believe I need to change something in the hyperref package’s config or what Doxygen writes to the .tex files, but I’m not sure which approach is right, nor how to do either one...
I’ve created a custom_doxygen.sty file, and assigned it to the LATEX_EXTRA_STYLESHEET. I assume that it’s being picked up by Doxygen because Doxygen is successfully picking up my custom LATEX_HEADER file, which is in the same directory as the custom_doxygen.sty file. But what I don’t know is what to put into the custom_doxygen.sty file?
If I run everything as default (that is, no LATEX_EXTRA_STYLESHEET), the following code gets written to the refman.tex file:
% Hyperlinks (required, but should be loaded last)
\usepackage{ifpdf}
\ifpdf
\usepackage[pdftex,pagebackref=true]{hyperref}
\else
\usepackage[ps2pdf,pagebackref=true]{hyperref}
\fi
\hypersetup{%
colorlinks=true,%
linkcolor=blue,%
citecolor=blue,%
unicode%
}
And what I need is for the “urlcolor” to also be blue (its default in the hyperref package is magenta—an odd choice for sure).
I tried just basically copying what was in the refman.tex file to the custom_doxygen.sty file (and making sure that the custom_doxygen.sty file is assigned to the LATEX_EXTRA_STYLESHEET setting in my Doxyfile) and adding a “urlcolor=blue,%” to the setup section, but there’s no change in the output.
If I manually edit the refman.tex file (that is, I add "urlcolor=blue,%" to the \hypersetup) after it's output from Doxygen, and then use the edited file as input to MiKTeX, I get the desired output.
So a workaround could be to just script the desired change and run the script every time. But it would certainly be better to get Doxygen to write the necessary configuration. Plus, there are other things I want to customize (such as the font of explicit html hrefs), so I'd like to learn how to do things properly.
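For reference, the scripted workaround could look roughly like the Python sketch below. It is a post-processing step on the generated refman.tex, not a Doxygen setting, and the function name add_urlcolor is made up for illustration. It assumes the \hypersetup block has the shape shown above.

```python
import re


def add_urlcolor(tex: str, color: str = "blue") -> str:
    """Insert a urlcolor option into the first \\hypersetup{...} block.

    Leaves the text unchanged when a urlcolor option is already present,
    so the script can safely run after every Doxygen build.
    """
    if re.search(r"urlcolor\s*=", tex):
        return tex
    # Match the opening line of the block, e.g. "\hypersetup{%".
    opening = re.compile(r"(\\hypersetup\{%?\n)")
    # A function replacement avoids backslash escaping in the new text.
    return opening.sub(
        lambda m: m.group(1) + f"    urlcolor={color},%\n", tex, count=1
    )
```

A build script would read refman.tex, pass its contents through add_urlcolor, and write the result back before running MiKTeX.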

Splitting Phing build file

I've got a huge phing build file here. Is there a way to put things like filesets into an external file used by the build.xml? Just need some organisation here.
You can try using the import task, which lets you split a build file into multiple files.
You can also look into property files.
FileLists also support a listfile attribute, which points to a text file with one file name per line.
FileSets support the includesfile and excludesfile attributes, each of which points to a text file with a list of patterns.