Store and index YAML with PostgreSQL, with Javascript lib or reusable functions? - postgresql

PostgreSQL 9.2 has native JSON support. I'd like to store human readable config files, however, in YAML. And I think I'd like to index a few (but not all) of the config file values. Therefore I'm wondering:
Is it somehow possible to include a third-party JavaScript library that parses YAML in Postgres, for example js-yaml? Then I could have my own YAML JavaScript helper, in the same way as there's the built-in JSON helper in PostgreSQL 9.2.
Alternatively:
Is it possible to declare individual reusable JavaScript functions? If so, then I could add my own YAML parsing functions (based on simple regexps) that are able to parse a subset of YAML, for example the top-level key-value pairs here:
# some "top level key-value pairs":
the_key: 'the value'
another_key: 'another value'
# But this however:
would_be_too_complicated_to_parse_manually_with_regexps: |
  block string
  with newlines
Worst case scenario would be that I'd need to duplicate YAML parsing code in each PostgreSQL stored procedure (if I cannot add 3rd party libraries or declare reusable functions).
(Performance wouldn't be terribly important in my case.)
(I've googled a while for "postgresql plv8 reusable function" and "postgresql plv8 library" but found nothing of relevance)

The PL/v8 procedural language is probably the way to go. It's a 'trusted' language, which means (among other things) that it does not provide any way to load an external module from a file. But it does have a find_function() method (called as plv8.find_function()) that lets you define your own JavaScript function and call it from another function (JS or not). See the description of it in this blog post:
http://umitanuki.hatenablog.com/entry/2012/04/10/171935

Related

Is there a way to perform a procedure in NetLogo knowing only its name?

I am developing a NetLogo extension and I want to add a command where the user gives me a list of names of procedures that take no parameters.
Later I will run these procedures, but the only thing I will know is the name of each procedure that was passed to me before.
The command that the user will use to inform the name of the procedures is the following:
qlearningextension:actions ["procedure1" "procedure2" "procedure3"]
Later the extension will run these procedures. I want to know if there is a way to get a procedure when I only have its name.
My recommendation would be to change the syntax of your primitive from taking in a list of strings to taking in a repeatable number of anonymous commands. You can do this by setting the syntax to CommandType | RepeatableType. A good reference should be the ControlFlow extension (cf), which uses a similar technique to accept at least 1, but possibly many, boolean/command combinations for its variadic cf:ifelse primitive.
The anonymous commands provided will be checked for correctness at compile time, meaning you won't have to rely on the extension user properly typing the name of the procedures or forgetting if they change a name. The commands will also be easily executable by your extension prim at runtime, you won't have to "search" for the right procedure to execute (again, see the cf example).
Users of your extension will need to wrap your prim in parens when using the "more than 1" repeatable syntax: (qlearningextension:actions [procedure1] [procedure2] [procedure3])

Create an immutable clone of concat_ws

This blog post shows an example of how to create an immutable_concat function in Pg:
CREATE OR REPLACE FUNCTION immutable_concat(VARIADIC "any")
RETURNS text AS 'text_concat'
LANGUAGE internal IMMUTABLE
I'd like to do the same with concat_ws and the corresponding text_concat_ws does exist, however, the following just crashes the process:
CREATE OR REPLACE FUNCTION immutable_concat_ws(VARIADIC "any")
RETURNS text AS 'text_concat_ws'
LANGUAGE internal IMMUTABLE
Update: The signature of immutable_concat_ws should be (glue, *parts): one glue (text or varchar) and one or more parts (text, varchar or null).
What am I missing here?
Firstly, the function requires two parameters in the definition, like Richard already suggested, and you updated your question accordingly.
Secondly, you can create that function with "any" input using LANGUAGE internal. Does not mean that you should, though.
concat_ws() is only STABLE for a reason. Among others, the text representation of date or timestamp depends on locale / datestyle settings, so the result is not immutable. Indexes building on this could silently break. Restricted to text input, it's safe to declare it IMMUTABLE.
Since you only need text input (or varchar, which has an implicit cast to text), limit it to your use case and be safe:
CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE internal IMMUTABLE PARALLEL SAFE AS
'text_concat_ws';
Creating a LANGUAGE internal function requires superuser privileges. If that's not an option, the next best thing would be an SQL function like the one in this related answer:
PostgreSQL full text search on many columns
Mark it as PARALLEL SAFE in Postgres 9.6 or later (it qualifies!) to enable parallelism when involving this function. The manual:
all user-defined functions are assumed to be parallel unsafe unless otherwise marked.
Resist the temptation to do things like this: immutable_concat_ws('|', now()::text, 'foo'). This would reintroduce said dependencies in the call.
Related:
Combine two columns and add into one new column
OK, so you're mapping to internal "C" functions, which I must admit I've never done myself.
However, text_concat_ws is "with separator" so it doesn't just take a variadic list of text arguments - it takes a separator THEN the variadic list of text arguments. Adjust your function definition accordingly.
If you're going to be doing this, you probably want to hook a debugger up to the backend or run it single process if that's practical.
Also - I just found the doxygen interface to the PostgreSQL source code while replying to your question. Thanks :-)

How to embed a function supplied by an extension into a .ods file

I am using a function for sorting arrays inside my calc document, which is supplied via an extension. However, transferring the file to a different system, where the extension isn't installed, breaks the function. Since the file is designed to be shared with many other users, it is impractical to instruct each of them to install the necessary extension individually.
Is there a way to embed/link the function supplied by the extension into the .ods file itself in such a way that the function won't break on file transfer?
When I tried to embed an add-in into a document by modifying manifest.xml, the add-in was ignored. Also, I ran into other limitations of embedding, such as not allowing importing from a pythonpath folder.
The documentation is rather difficult, but these two links perhaps support my conclusion:
https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Spreadsheet_Add-Ins
https://wiki.openoffice.org/wiki/Documentation/DevGuide/WritingUNO/Deployment_Options_for_Components
So it looks like the possibilities are to either require people to install the extension, or use a user-defined function instead of an add-in. It should be possible to embed a UDF in a document.

postgresql jsonb processing with c api

Postgres extension development
I am working with C API for postgres-9.4 installed from ubuntu trusty main repo. This might be a silly question, but please bear with me.
I would like to use a function that converts a cstring to Jsonb* structure defined in
http://doxygen.postgresql.org/jsonb_8h.html
There are functions doing exactly this already defined in
http://doxygen.postgresql.org/jsonb_8c.html
Namely, the function Datum jsonb_in(PG_FUNCTION_ARGS). However, I am not sure if I can call this function from the C API in a portable and safe manner, as at first glance it seems intended to be called by Postgres itself.
I could also use the function jsonb_from_cstring
http://doxygen.postgresql.org/jsonb_8c.html#ab23eca28d5880f86a0943d71c90d6654
but it is declared and defined in jsonb.c and not declared in jsonb.h, and hence linking with this function is not a very clean solution. I tried finding the symbols for jsonb_from_cstring in libpq.so, however there are none. I am guessing I need a non-standard build of postgres?
So the question is, what is the best way to convert a cstring to a Jsonb* structure from within C API?
Edit:
The extension gets json data as a string from external source and is supposed to be able to store this string in a Jsonb type
This was answered on the postgres mailing list:
http://www.postgresql.org/message-id/CAFj8pRCeGL7q_EGTz2=FyQZ2Qrtn1x_76mz3fuR=b7beEug7Wg@mail.gmail.com
Quote:
you can call "input function" - jsonb_in
Jsonb *targetjsonbvar = DatumGetJsonb(DirectFunctionCall1(jsonb_in,
                                      CStringGetDatum(cstrvalue)));

How to internationalize java source code?

EDIT: I completely re-wrote the question since it seems like I was not clear enough in my first two versions. Thanks for the suggestions so far.
I would like to internationalize the source code for a tutorial project (please notice, not the runtime application). Here is an example (in Java):
/** A comment */
public void doSomething() {
    System.out.println("Something was done successfully");
}
in English, and then have the French version be something like:
/** Un commentaire */
public void faitQuelqueChose() {
    System.out.println("Quelque chose a été fait avec succès.");
}
and so on. And then have something like a properties file somewhere to edit these translations with usual tools, such as:
com.foo.class.comment1=A comment
com.foo.class.method1=doSomething
com.foo.class.string1=Something was done successfully
and for other languages:
com.foo.class.comment1=Un commentaire
com.foo.class.method1=faitQuelqueChose
com.foo.class.string1=Quelque chose a été fait avec succès.
I am trying to find the easiest, most efficient and unobtrusive way to do this with the least amount of manual grunt work (other than obviously translating the actual text). Preferably working under Eclipse. For example, the original code would be written in English, then externalized (to properties, preferably leaving the original source untouched), translated (humanly) and then re-generated (as a separate source file / project).
Some leads I have found (other than what AlexS suggested):
ANTLR, a language parser / generator. There seems to be a supporting Eclipse plugin.
Using Eclipse's AST (Abstract Syntax Tree) and I guess building some kind of plugin.
I am just surprised there isn't a tool out there that does this already.
I'd use unique strings as method names (or as anything else you want to be replaced by localized versions).
public void m37hod_1() {
    System.out.println(m355a6e_1);
}
then I'd define a propertyfile for each language like this:
m37hod_1=doSomething
m355a6e_1="Something was done successfully"
And then I'd write a small program parsing the sourcefiles and replacing the strings. So everything just outside eclipse.
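The "small program" mentioned above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's actual tool: the class name TokenTranslator and the method translate are hypothetical, the properties are read from a string rather than from a per-language file, and whole-token matching is an added assumption so that m37hod_1 does not also rewrite m37hod_10.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenTranslator {

    /** Replaces every placeholder token in the source with its value from the properties text. */
    static String translate(String source, String propertiesText) {
        Properties translations = new Properties();
        try {
            translations.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String result = source;
        for (String token : translations.stringPropertyNames()) {
            // \b keeps the match to whole identifiers; quote() treats the token literally.
            Pattern wholeToken = Pattern.compile("\\b" + Pattern.quote(token) + "\\b");
            // quoteReplacement() protects '$' and '\' inside the translated text.
            String replacement = Matcher.quoteReplacement(translations.getProperty(token));
            result = wholeToken.matcher(result).replaceAll(replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "public void m37hod_1() {\n"
                + "    System.out.println(m355a6e_1);\n"
                + "}\n";
        String english = "m37hod_1=doSomething\n"
                + "m355a6e_1=\"Something was done successfully\"\n";
        System.out.print(translate(source, english));
    }
}
```

In a real build the properties text would come from the per-language file, and the loop would run over every source file in the project.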
Or I'd use the ant task Replace and propertyfiles as well, instead of a standalone translation program.
Something like this:
<replace
    dir="${src}"
    value="defaultvalue"
    propertyFile="${language}.properties">
  <replacefilter
      token="m37hod_1"
      property="m37hod_1"/>
  <replacefilter
      token="m355a6e_1"
      property="m355a6e_1"/>
</replace>
Using one of these methods you won't have to explain anything about localization in your tutorials (unless you want to), but can concentrate on your real topic.
What you want is a massive code change engine.
ANTLR won't do the trick; ASTs are necessary but not sufficient. See my essay on Life After Parsing. Eclipse's "AST" may be better, if the Eclipse package provides some support for name and type resolution; otherwise you'll never be able to figure out how to replace each "doSomething" (might be overloaded or local), unless you are willing to replace them all identically (and you likely can't do that, because some symbols refer to Java library elements).
Our DMS Software Reengineering Toolkit could be used to accomplish your task. DMS can parse Java to ASTs (including comment capture), traverse the ASTs in arbitrary ways, analyze/change ASTs, and then export the modified ASTs as valid source code (including the comments).
Basically you want to enumerate all comments, strings, and declarations of identifiers, export them to an external "database" to be mapped (manually? by Google Translate?) to an equivalent. In each case you want to note not only the item of interest, but its precise location (source file, line, even column) because items that are spelled identically in the original text may need different spellings in the modified text.
Enumeration of strings is pretty easy if you have the AST; simply crawl the tree and look for tree nodes containing string literals. (ANTLR and Eclipse can surely do this, too).
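As a rough illustration of crawling a tree for string literals and recording their locations, here is a sketch that uses the JDK's own Compiler Tree API (com.sun.source) rather than ANTLR, Eclipse or DMS. That choice is an assumption made only to keep the example self-contained; it needs a JDK (not a bare JRE), and the class name StringLiteralFinder and its helper names are hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class StringLiteralFinder {

    /** Wraps a source string so javac can parse it without touching the file system. */
    static final class InMemorySource extends SimpleJavaFileObject {
        private final String code;

        InMemorySource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns one "line:column value" entry per string literal in the given source. */
    static List<String> findStringLiterals(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a JRE-only install
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new InMemorySource(className, code)));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<String> found = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitLiteral(LiteralTree node, Void unused) {
                        if (node.getKind() == Tree.Kind.STRING_LITERAL) {
                            long start = positions.getStartPosition(unit, node);
                            long line = unit.getLineMap().getLineNumber(start);
                            long column = unit.getLineMap().getColumnNumber(start);
                            found.add(line + ":" + column + " " + node.getValue());
                        }
                        return super.visitLiteral(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String code = "class Foo {\n"
                + "    void doSomething() {\n"
                + "        System.out.println(\"Something was done successfully\");\n"
                + "    }\n"
                + "}\n";
        findStringLiterals("Foo", code).forEach(System.out::println);
    }
}
```

Each entry pairs the literal with its line and column, which is the kind of precise location the capture step needs to keep alongside the text to be translated.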
Enumeration of comments is also straightforward if the parser you have captures comments. DMS does. I'm not quite sure if ANTLR's Java grammar does, or the Eclipse AST engine; I suspect they are both capable.
Enumeration of declarations (classes, methods, fields, locals) is relatively straightforward; there are rather more cases to worry about (e.g., anonymous classes containing extensions to base classes). You can code a procedure to walk the AST and match the tree structures, but here's the place that DMS starts to make a difference: you can write surface-syntax patterns that look like the source code you want to match. For instance:
pattern local_for_loop_index(i: IDENTIFIER, t: type, e: expression, e2: expression, e3:expression): for_loop_header
= "for (\t \i = \e,\e2,\e3)"
will match declarations of local for loop variables, and return subtrees for the IDENTIFIER, the type, and the various expressions; you'd want to capture just the identifier (and its location, easily done by taking it from the source position information that DMS stamps on every tree node). You'd probably need 10-20 such patterns to cover the cases of all the different kinds of identifiers.
Capture step completed, something needs to translate all the captured entities to your target language. I'll leave that to you; what's left is to put the translated entities back.
The key to this is the precise source location. A line number isn't good enough in practice; you may have several translated entities in the same line, in the worst case, some with different scopes (imagine nested for loops for example). The replacement process for comments, strings and the declarations are straightforward; rescan the tree for nodes that match any of the identified locations, and replace the entity found there with its translation. (You can do this with DMS and ANTLR. I think Eclipse ADT requires you generate a "patch" but I guess that would work.).
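To make the point about precise locations concrete, here is a minimal sketch that patches text at exact character offsets instead of by line number. It works on a plain string rather than on a tree, which is a simplification; the class name LocationPatcher and the Edit type are hypothetical. Applying the edits from the last offset to the first keeps the earlier offsets valid while the text changes length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LocationPatcher {

    /** One replacement: the half-open character range [start, end) and its new text. */
    static final class Edit {
        final int start;
        final int end;
        final String replacement;

        Edit(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Applies non-overlapping edits back to front so earlier offsets are not shifted. */
    static String apply(String source, List<Edit> edits) {
        List<Edit> ordered = new ArrayList<>(edits);
        ordered.sort(Comparator.comparingInt((Edit e) -> e.start).reversed());
        StringBuilder out = new StringBuilder(source);
        for (Edit e : ordered) {
            out.replace(e.start, e.end, e.replacement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two entities on the same line get different translations, told apart by offset.
        String source = "int i = 0; int j = i;";
        List<Edit> edits = List.of(
                new Edit(4, 5, "compteur"),   // declaration of i
                new Edit(15, 16, "autre"),    // declaration of j
                new Edit(19, 20, "compteur")  // use of i
        );
        System.out.println(apply(source, edits));
    }
}
```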
The fun part comes in replacing the identifier uses. For this, you need to know two things:
For any use of an identifier, which declaration does it use? If you know this, you can replace it with the new name for the declaration; DMS provides full name and type resolution as well as a usage list, making this pretty easy.
Do renamed identifiers shadow one another in scopes differently than the originals? This is harder to do in general. However, for the Java language, we have a "shadowing" check, so you can at least decide after renaming that you have an issue. (There's even a renaming procedure that can be used to resolve such shadowing conflicts.)
After patching the trees, you simply rewrite the patched tree back out as a source file using DMS's built-in prettyprinter. I think Eclipse AST can write out its tree plus patches. I'm not sure ANTLR provides any facilities for regenerating source code from ASTs, although somebody may have coded one for the Java grammar. This is harder to do than it sounds, because of all the picky detail. YMMV.
Given your goal, I'm a little surprised that you don't want a source file "foo.java" containing "class foo { ... }" to get renamed to the correspondingly translated .java file name. This would require not only writing the transformed tree to the translated file name (pretty easy) but perhaps even reconstructing the directory tree (DMS provides facilities for doing directory construction and file copies, too).
If you want to do this for many languages, you'd need to run the process once per language. If you wanted to do this just for strings (the classic internationalization case), you'd replace each string (that needs changing, not all of them do) by a call on a resource access with a unique resource id; a runtime table would hold the various strings.
One approach would be to finish the code in one language, then translate to others.
You could use Eclipse to help you.
Copy the finished code to language-specific projects.
Then:
Identifiers: In the Outline view (Window>Show View>Outline), select each item and Refactor>Rename (Alt+Shift+R). This takes care of renaming the identifier wherever it's used.
Comments: Use Search>File to find all instances of "/*" or "//". Click on each and modify.
Strings:
Use Source>Externalize strings to find all of the literal strings.
Search>File for "Messages.getString()".
Click on each result and modify.
On each file, use Edit>Find/Replace, replacing "//\$NON-NLS-.*\$" with an empty string.
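The last step can also be done outside the editor. The sketch below is an assumption-laden illustration, not an Eclipse feature: the class name NlsMarkerStripper is hypothetical, and the pattern is deliberately narrower than the one quoted above (it matches only markers of the form //$NON-NLS-1$ with a number) so that it cannot swallow other text on the line that happens to contain a later "$".

```java
import java.util.regex.Pattern;

public class NlsMarkerStripper {

    // Optional leading whitespace, then a single //$NON-NLS-<number>$ marker.
    private static final Pattern MARKER = Pattern.compile("\\s*//\\$NON-NLS-\\d+\\$");

    /** Removes every NON-NLS marker comment from one line of source text. */
    static String strip(String line) {
        return MARKER.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "String s = Messages.getString(\"Foo.0\"); //$NON-NLS-1$";
        System.out.println(strip(line));
    }
}
```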
For the printed/logged strings, Java has built-in internationalization support, namely ResourceBundle. There is a tutorial about this on the Oracle site.
Eclipse also has a feature for this ("Externalize Strings", as I recall).
For the function names, I don't think there is anything available, since this would require you to maintain the source code in many versions...
Regards
Use a .properties file, like:
Locale locale = new Locale(language, country);
ResourceBundle captions = ResourceBundle.getBundle("Messages", locale);
This way, Java picks the Messages properties file that matches the given locale (the default locale is acquired from the operating system or the Java locale settings).
The file should be on the classpath, called Messages.properties (the default one), or Messages_de.properties for German, etc.
See this for a complete tutorial:
http://docs.oracle.com/javase/tutorial/i18n/intro/steps.html
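A self-contained way to try the lookup described above is sketched below. To avoid depending on the real classpath, it writes two small properties files into a temporary directory and loads them through a URLClassLoader; that setup, the class name BundleLookupDemo, the key "greeting" and the German text are all assumptions made for the illustration. It also uses a no-fallback control so that the result does not depend on the default locale of the machine running it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookupDemo {

    /** Looks up the "greeting" key for the given locale from freshly written bundle files. */
    static String lookup(Locale locale) {
        try {
            Path dir = Files.createTempDirectory("bundles");
            Files.writeString(dir.resolve("Messages.properties"),
                    "greeting=Something was done successfully\n");
            Files.writeString(dir.resolve("Messages_de.properties"),
                    "greeting=Etwas wurde erfolgreich erledigt\n");

            // Stand-in for "the file should be on the classpath".
            ClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() });

            // Skip the default-locale fallback so unknown locales go straight to Messages.properties.
            ResourceBundle.Control noFallback =
                    ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_PROPERTIES);

            ResourceBundle captions = ResourceBundle.getBundle("Messages", locale, loader, noFallback);
            return captions.getString("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(Locale.GERMAN)); // served by Messages_de.properties
        System.out.println(lookup(Locale.FRENCH)); // no Messages_fr, so the default file is used
    }
}
```

With the plain two-argument getBundle call, a locale without its own file would first fall back to the default locale of the JVM before reaching Messages.properties.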
As far as the source code goes, I'd strongly recommend staying with English. Method names like getUnternehmen() are worse for the average developer than plain English ones.
If you need to familiarize foreign developers with your code, write proper developer documentation in their language.
If you'd like to have Javadoc in both English and other languages, see this SO thread.
You could write your code using FreeMarker templates (or another templating language such as Velocity).
doSomething.tml
/** ${lang['doSomething.comment']} */
public void ${lang['doSomething.methodName']}() {
    System.out.println("${lang['doSomething.message']}");
}
lang_en.prop
doSomething.comment=A comment
doSomething.methodName=doSomething
doSomething.message=Something was done successfully
And then merge the template with each language prop file during your build (using Ant / Gradle / Maven etc.)