no "tags" / "structure tags" in libHaru PDF? - tags

I am using libHaru (by including the source in my C++ code) to generate PDF files. I am hoping to make these PDF files accessible by adding "tags" (aka "structure tags"). From what I can see in the documentation and source code, libHaru does not support this. Can someone confirm that libHaru indeed does not support tags? And if it's not supported directly, I wonder if there is a way to add tags by modifying the libHaru code? Has anybody done this?

I looked through the 22 page manual for libHaru, and there was no mention of tags, so I think it's safe to assume that it doesn't support tagging.
Attempting to make any library tag PDFs (and do it well) would be a non-trivial task. You'd essentially be re-inventing the wheel. Consider the fact that Adobe Acrobat Pro is only just mediocre at tagging PDFs and requires a ton of human intervention to get it right.
There is a product called CommonLook Dynamic that is made for creating accessible PDFs from live data on a webserver, but I can't vouch for it myself. I have used other products from this company, and they've been very good, but they're not at all cheap.
Generally speaking, PDF tagging is often a very complex thing. To make it work with an automated algorithm, the source code formatting has to be perfectly formed and dead simple. If your source material is at all complex or malformed, it won't come out right.
As an example, it's not possible for PDF generation software to do a good job at things like crafting good alt text for images, creating useful PDF metadata, or tagging complex tables. These are things that require human intervention.

Related

Documentation and version control

Given a project I'm about to start there will be documentation produced.
What is the best practice for this?
Should the documents live with the code and assets or should there be a separate documentation store?
Edit
I'd like a wiki but I will need to print the documents etc... It's a university project.
It really depends on your team. Where I work, we keep documentation in a wiki which is linked in with our team website. For the purposes of shipping documentation, the wiki can be exported and we run it through a parser that "fancifies" the look and feel of the documentation for customer purposes.
Storing the documentation with the code (typically in your source repository) is not a bad idea. Just make sure to keep them separated. For example, keep a docs folder which is on the same level with your src folder in your repository. This way, you can quickly ship the current documentation, you can easily track revisions, and anybody new to the project can immediately jump in without having to go to multiple locations for information.
Storing it in source control is fine.
This is an interesting question -- basically, what others are saying is right about generated documentation, source files and templates/etc. should be stored in source control and generated during your build process.
As far as requirements/specs/etc. documentation, I have worked both ways, and I very much prefer using SharePoint or a Wiki/document portal that is designed for document sharing/versioning. The reason is, most non-developer folks aren't comfortable working with source control systems, and you don't gain any of the advantages of intelligent merging if you are using a binary format like Word. Plus it's nice to have internet-based access so you can reference and work on the docs in a distributed team without people having to install extra software.
Here's a 2017 summary of the options and my experience:
(extreme 1) Completely external (e.g. a wiki, Google Docs, LaTeX, MS Word, MS Onedrive)
People aren't bothered about keeping it up to date (half of them don't even know where to find the page that needs updating since it's so out of the trenches).
wiki platforms are “captive user interfaces” - your data gets stored in their proprietary schemas and is not easy to examine with a simple text editor (Confluence is even worse in that you have no access to the plaintext content at all anymore)
(extreme 2) Completely internal (e.g. javadoc)
pollutes the source code, and is usually too low level to be of any use. Well-written source code is still the best form of low level documentation.
However, I feel package-info.java files are underutilized.
(balance) Colocated documentation (e.g. README.md)
A good half way solution, with the benefits of version control. If a single README.md file is not enough, consider a doc/ folder. The only drawback of this I've seen is whether to source control helpful graphics (e.g. png files) and risk bloating the repo.
One interesting way to avoid this problem is to use plaintext diagram tools (I find Grapheasy and Text Diagram to be a breath of fresh air).
plaintext can be easily read even if your rendering engine changes as the years go by.
Github's success is in no small part thanks to its README.md located in the root of the project.
One tiny disadvantage of this approach though is that your continuous integration system will trigger a new build each time you make edits to the README.md file.
If you are writing versioned user documentation associated with each release of the product, then it makes sense to put the documentation in source control along with its associated product release.
If you are writing internal developer documentation, use automated internal source code documentation (javadoc, doxygen, .net annotations, etc) for source level documentation and a project wiki for design level documentation.
I think most of us in the industry are not really following best-practices and it of course also depends a lot on your situation.
In an agile environment where you would have a very iterative process of release, you will want to "travel light". In this particular case, Jason's suggestion of a separate Wiki really works great.
In a water-fall/big bang model, you will have a better opportunity to have a decent documentation update with each new release. Also you will need to clearly document what version of the requirements was agreed on and have loads of documentation for every tiny change you do to requirements (due to the effects it has on subsequent stages). Often if the documentation can live together with the version controlled source code it is the best.
Are you using any sort of auto-documentation or is it completely manual? Assuming that you are using an auto-documentation system, the documentation is more or less generated on the fly, and would be part of the code itself.
To me, (assuming that it's possible with whatever code you are using), this would be the preferred method of handling it, as you wouldn't need to maintain the documentation source at all.

How can I generate comparable spec documents?

Currently in my company we're using Excel as a tool to write specs.
On the kind of projects we're working on, spec changes are a matter of course, and constitute a lot of the work, beyond the implementation.
I'm looking for a tool that will allow me to write a spec, including screenshots and arrows, but with comparable content. i.e., something that will allow me to compare between versions of the spec.
Free tools are better, but I guess anything with a license of less than 100USD will be OK too.
MS Word handles text and pictures, and has a document diff'ing tool built in. Open Office (being a MS Word Clone) may have something similar, and is free.
Maybe XMLmind Editor will interest you.
There's an excellent free Sequence Diagram editor: QUICK SEQUENCE DIAGRAM EDITOR. It's artifact is an xml document with CDATA tag which encloses simple DSL used for SD declaration.

Is Dreamweaver worth getting if I probably won't use its WYSIWYG editor?

In the past I've done web application development using Visual Studio. Initially I'd use the design view, editing the page visually. But over time I learned more and more (X)HTML, CSS, and Javascript. I became familiar with the tags for ASP.NET server controls and their common attributes.
I got to the point where I'd do all the markup by hand (still in Visual Studio though) and then test the site in an actual browser. Of course I'd also still use Visual Studio for programming server side functionality in C#, but never the WYSIWYG page editor. I was able to get work done faster too, getting the site to look just the way I wanted, and the same across different browsers.
Now I'm going to be taking charge of a public facing website (entirely static content - no ASP.NET, PHP, or anything). The website was created and maintained using Dreamweaver, which I don't have and never used before.
I'll be working from home, so the organization is looking into getting me a copy of Dreamweaver. Even though it's not money out of my own pocket ...
Is it worth using Dreamweaver if I probably won't touch the visual editor?
Or should I tell them to save their money and I'll just use Notepad++.
Or am I crazy and should relearn to use a WYSIWYG editor?
I do 95% of my web dev stuff using Dreamweaver's code editor. But, for the other 5%, the WYSIWYG stuff really comes in handy.
Plus, it's not your money anyway. I'd say get it and if the WYSIWYG stuff is too much for you just keep it in source code mode and use it as an editor.
You may not know until you see the code. If they were using things like Dreamweaver templates, unless you are going to extricate them, you may end up needing Dreamweaver for sanity sake.
Dreamweaver is really useful if you maintain a site with templates. If the site is in PHP or ASP, then all you need to do is put the common parts (header, footer etc.) in a separate file and include them in the different pages. If the pages are static then the common parts can't be included. Which means that if you want to change the menu, you have to change it in all pages. With dreamweaver, you can save a page as a template and when you create a new page from a template, dreamweaver stores it in the comments. Next time you update the template, all the pages that use the template are updated. I found this to be the best use of dreamweaver.
I haven't used a WYSIWYG HTML editor in years, all the HTML I produce these days is hand-coded, and it's something I would recommend to anyone. WYSIWYG Editors simply make it far too easy to throw in tons of unnecessary markup, and then you end up with unwieldy pages that are tricky to work with and hard to fix browser compatibility problems in.
However. If you're taking over a large existing codebase that has been produced this way, I'd say you probably want to make sure you at least have access to Dreamweaver or a similar editor (if they were produced in Dreamweaver, that's probably the best choice). Simply because many pages designed in this way are rather verbose, and can be a nightmare to deal with in a text editor.
This depends - you mean old school Dreamweaver or CS4 Dreamweaver?
With all the new additions (code hinting with some of the newer javascript frameworks, a "preview" that is integrated with webkit so you can see your page in action, being able to test AJAX calls and do a "code freeze") I'm tempted to walk away from jedit and try it out.
I believe that DreamWeaver gives you intellisense in the code editor for HTML, so I would use it for that, if you're not paying for it. I wouldn't pay for that myself though :)
If the Visual Studio editor works fine for you, there is no point in switching.
And if you don't like WYSIWIG editing, then there's no point in learning it. I stopped using WYSIWIG years ago, and like you, I've found it to be much more flexible and reliable to edit HTML/CSS by hand.
If you like DreamWeaver more and the organisation is willing to pay for it, then go for it!
FWIW, I do a lot of HTML and javascript coding in dreamweaver's code view- the JSF extensions are nice as well. I got it as part of the CS3 bundle, since I needed to get my hands on photoshop and illustrator as well to carve up graphics. If possible, try to get your company to get the whole bundle, since graphics manipulation is always important when you're maintaining a site- and most designers will be giving you photoshop source files. I never ever go in wysiwyg mode, and it's still useful.
I use dreamweaver, but not for the same reasons as everyone here seem to. I like the syntax highlighting, and I absolutely LOVE the way Dreamweaver handles FTP in the window on the right. If I could find another editor that would offer these two things, I would, but none seem to be that great.
I code my pages by hand usually (I do a LOT of PHP, which dreamweaver 8 obviously can't preview) so I do a lot of things like (1) edit page (2) upload changes (3) preview live on testing server. However, I still use the WYSIWIG editor occasionally, especially if I need to throw something together using tables or form elements. I just find it to be a bit quicker that way than doing things by hand.
That said, I never use Dreamweaver (8, mind) for CSS, as the implementation is buggy at best. I much prefer to do CSS and more complex HTML by hand. I also do not use the standard method of templating, as I prefer to have one "index.php" that calls in the appropriate template and stuffs data into it that it generated before.
All that said though, Dreamweaver offers a nice enough set of tools that I don't really want to leave it, and it certainly won't hurt to learn it, especially if its free. I'd say at least try it out and see if its going to work before making a final decision. It comes down to what you personally prefer to use.
I hand-code but there are times Dreamweaver is incredibly useful:
Making visual-tweaks to someone else's complex HTML. It's much quicker to use the WYSIWYG if you're short on time and the code is a mess.
Dreamweaver has got an incredibly good search and replace. The tag-based searching is the best I've seen anywhere for you whilst the regex seach/replace allows back-references, named groups in the replace field etc.
The code Dreamweaver produces isn't too horrific and it's fairly good at not breaking your own nice code if you ever dip into the visual editor.
I use dreamweaver CS5 for code only on a daily basis, and it's a great tool. It is very effective, and its a great tool even for people who already know how to write code. Some of its' features that make it one of the best editors, in my opinion, are:
Code coloring
Customizable color-schemes
Error highlighting
in-app validation
Autocomplete & Codehinting (works great!)
in-app FTP
New document type dialog (great for quick start)
Search & replace
Code Snippets
There are many more features, like setting up a local server and binding it to a database so you can write queries more easily and use dreamweaver's "help" with server-side code, but I haven't really got into it.
Bottom line:
If you are considering getting Dreamweaver mostly for code editing, then I'd say it's definitely a great deal - even if you aren't going to use some of its' other features.
Dreamweaver's a tad bloated for something which you really can just do in Notepad (++ or otherwise). No WYSIWYG will give you code to the same quality as hand-crafted code. Especially since it's vanilla HTML, just use an everyday programmer's text editor. Having intellisense isn't that important: I mean, there's only about 10 tags you need to know.

Industry experience with WYSIWYG editors

Just wanted to get an idea for ways (web) developers get round the short fall of (most) WYSIWYG editors, whereby the users that are editing the text aren't always HTML literate enough to produce good/great results.
In the past we have resigned ourselves to either locking down the editor or simply not supplying one.
What are other peoples experiences?
If I understand your question correctly, my advice would be to allow basic text formatting in the editor (bold, italicize, underline, paragraph breaks, etc). Anything beyond that should be handled either by custom fields in your CMS system that talk to the corresponding template, or directly by your designers / front-end people. There really should be no designing going on in your text editor.
Also, using a templating language like Markdown might help editors feel more comfortable formatting their pages.
If you have the resources (as the question implies you do) you get the users to supply copy and designs in what they do know (Powerpoint, Word, Fireworks, etc) and get the people who can do a correct implementation (but who might not be able to write decent prose, etc) to put it into the HTML/CMS/magicthing.
Sometimes it is possible to use something like WYMeditor - it isn't that simple but produces clean semantic code. The other way is using some wiki-like code - Markdown for example. And you can ease editor's life by using some helpers like MarkItUp editor (it also supports original Wiki and Textile).

MS Word is evil! Is there a good alternative? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Improve this question
As a developer I really don't like writing documentation but when I have to I'd like to make the process as painless as possible.
The problem with Word is that it constantly gets in my way. I worry more about the layout than about the actual content ... that's why I'd like to get rid of Word.
Ideally I'd like to write my content and then 'compile' it into a document.
I've heard of LaTeX but I don't have any experience with it whatsoever. Would this be the right technology for the job? What editor (Windows) should I use? Is it a good idea to start with LyX?
EDIT: I'm not asking about documenting code (I use Sandcastle for that).
Update 2014:
We have now switched to GFM (GitHub Flavored Markdown).
It's really easy to work with.
Write code & documentation in the same IDE!
Everything can be versioned!
Get great output either as raw txt, html or pdf!
My solution to this was to invest some time in creating a decent Word Template for myself.
The important thing to do is make sure you have a Style defined for everything you can put in the document.
Once you have all the Styles defined and all of the document content tagged with the correct Style instead of formatted in an ad hoc fashion, you'll be surprised how easy it is to produce good looking Word documents quickly every time.
The wider problem here is that everyone spends hours in Word and yet it is very rare for companies to invest in Word training. At some point you have to bite the bullet and take the time to teach yourself how to use it properly, just like you would with any other tool.
Anything you can do with LyX you can do with LaTeX. LaTeX is suitable for all sorts of things; it has been used for everything from manuals to lecture slides to novels.
I think LaTeX is probably worth looking into as an option; if you've ever wanted to "code" for your word processor, LaTeX is for you. At the simplest level you can define new commands to do things for you, but there's a lot of power there. And the output looks really neat.
In my opinion, LyX is fantastic in certain circumstances, handy in others, and occasionally just gets in your way. I think it should be seen as a productivity booster for LaTeX. In other words, learn to use LaTeX before trying LyX. Both are of course free and available for Windows, though the learning curve is quite steep compared with MS Word. For long documents, or plenty of similar documents, LaTeX/LyX is probably a worthwhile investment.
I've found that wikis can be good for this. Find a wiki you like that lets you do a bit of formatting, but nothing really heavy. Ideally it should let you format code easily too - to be honest, the markdown available on SO is probably a good start.
That way:
You have change tracking built-in (assuming a decent wiki)
You can edit from anywhere
Everyone always sees the same documentation (instant distribution)
You can concentrate on content instead of formatting
You could write your documentation using your own XML format and then transform it into any format with XSL (e.g. PDF via FOP+XSL-FO ).
See also the DocBook XML format.
LaTeX is an extremely powerful tool and might well be overkill here as it is designed for scientific/mathematical literature. It has a (relatively) steep learning curve and can be tricky to coax to do exactly as you want if you're new to it. I LOVE LaTeX, but it is not really a general purpose word processor.
Have you considered OpenOffice instead?
LaTeX is really a very powerful language if you need to write documents.
Perhaps you can try texmaker, a cross-platform LaTeX editor:
Texmaker is a clean, highly
configurable LaTeX editor with good
hot key support and extensive Latex
documentation. Texmaker integrates
many tools needed to develop
documents with LaTeX, in just one
application. It has some nice
features such as syntax highlighting,
insertion of 370 mathematical symbols
with only one click, and "structure
view" of the document for easier
navigation.
What about using HTML? This way you could then publish the documentation if there will be need for many people to access it from many places.
Despite all efforts and reasonable expectation I don't think Word Processing has been "solved" yet.
My response to what I also personally find a deeply frustrating experience with MS Word is to avoid it altogether and use an auto-documenting tool like GhostDoc to generate XML from what I've already written in the code (DRY!) and deal with the XML from an XSLT based intranet site or similar later.
Are you talking about documenting your actual code? If so, I recommend Doxygen for unmanaged code and Sandcastle for managed code. Both will compile your help or build it as a website for you.
Both applications will read special tags above functions / classes / variables and compile that into the help.
Well I've never found anything wrong with MS-Word in the first place. (i.e if you take the time to know how to use it effectively). OpenOffice indeed is an amazing & credible free alternative - but then if you hate MS Word for layout related problems, the same problem is gonna occur with OpenOffice too.
Never tried the Latex system myself, but have heard its good for scientific work. I think using some HTML WYSIWYG editor would be best for you, if you want to just focus on the content.
I considered a wiki, but I decided to go with a modified Markdown notation, for the simple reason, that a wiki's content isn't easily exported and distributed outside of the wiki itself, while the Markdown can be rendered into HTML.
Answer to chris' question about my workflow: I write the documentation with a Notepad-like application (TextWrangler, only because of its word-wrapping feature) in its raw Markdown format. Then I have a small localhost documentation website with my modified Markdown parser (extended for a few features and a bit more HTML-oriented functionality) that checks for the timestamps for the documentation files - if a file has been updated, it parses that file into HTML, and stores the file in a cache.
This way I'm able to edit the source documentation on my desktop, and just press F5 in my browser to see the results immediately.
I haven't got around to trying it yet, but I've always thought AsciiDoc would be good for this kind of thing.
If you want something simpler than LaTeX, you can have a look at ReStructured Text
Read this book: http://en.wikipedia.org/wiki/The_Pragmatic_Programmer . There is some idee fixe inside, so that documentation should be built automatically. Think about using your IDE for this, or look for some additional tools. Most modern languages support generating documentation as you write the code. This can simply maintain your doc in touch with latest changes in the code.
I prefer to use a RTF editor which is a lot less clunkier than words. This way the formatting and all the headers/footers nonsense will not take up half your time. Wordpad has worked for me on several occasions. I'm stuck with Word for now though :(
there are a lot of possible ways:
embedded documentation, e.g. javadoc: good for describing APIs, not so good for the "big picture"
plain html: can be checked in under version control, a definite plus
a wiki, e.g. confluence -- great for collaboration, but has version control different from your source
LaTeX or somesuch: better suited for books or papers than typical documentation; support for graphics is cumbersome
an Office clone, e.g. OpenOffice: mostly the same as Word+Visio, but open source, with a nicer document format
I usually document the software structure (the "metaphors" of a project, component interrelations, external systems) up front, using Visio, in "freeform" UML. These are then embedded in confluence, which can be converted to PDF if someone wants a printout.
LyX
LyX is a WYSIWYM front end to LaTeX: You get the convenience of a document processor (somewhat similar to Word) with the consistency and power of LaTeX: It doesn't get in your way and can do a lot of things that professional writers need.
Note: The correct answer for you really depends on your way of thinking --- we can't decide this for you. This answer simply shows an excellent choice if you think of documentations as documents and want something similar to Word (where Word is good) that doesn't suck as Word (where Word is bad for programmers).
But many programmers think of documentation differently and hence prefer different metaphors. I myself had the same problem years ago, worked with LaTeX (as I am a mathematician), found LyX and finally settled on a Wiki/Source system that I wrote myself.
Vim is the solution for anything that means writing plain text in the most efficient possible way. If you need formatting, then use XML, Latex or something similar (in Vim).
Vim changed my life!
Simple answer: LaTeX sounds like just what you are looking for.
I use it for writing documentation myself. I will never go back to Word if I have the option.
At phc, we started with latex, then moved to docbook, and have settled (permanently I hope) on Restructured Text/Sphinx.
Latex was chosen because we are academics, and latex is the tool of choice. I believe it didn't generate good enough HTML.
Docbook was chosen for power, but it was very unwieldy. It put us off writing any documentation: code had to be manually formatted, we kept forgetting the syntax, and it was difficult to read. The learning curve was also steep.
Finally, we moved to reST, using sphinx, and that was a great decision. Documentation is now very easy to write, and both PDF and HTML versions look beautiful (though the PDF could do with some customization). Its very easy to customize too.
The best bit about reST though, is that its human readable in source form. That is a wonderful advantage. I've switched to using reST for all my stuff now, especially anything over the web (except of course academic papers, where one would be foolish to use anything but latex).
You may want to look into doxygen at http://www.doxygen.nl/, see their nice examples. In this case, the documentation is presented by tags in comments in the source.
Another option would be to use an online system like trac from http://trac.edgewall.org/ which is a wiki/doc/issuetracking system that lives on top of subversion.