Customize DITA: allow p in dt

I am using the DITA standard.
I need to allow the p element inside dt.
That is, I want to disallow #PCDATA in the dt element and allow the element p there instead.
In the file commonElements.mod I changed the following code
<!ENTITY % term.cnt
"#PCDATA |
%basic.ph; |
%data.elements.incl; |
%foreign.unknown.incl; |
%image;
"
>
to
<!ENTITY % term.cnt
"p |
%basic.ph; |
%data.elements.incl; |
%foreign.unknown.incl; |
%image;
"
>
The result: #PCDATA is still allowed in dt, and the p tag is not.
In which DITA file do I have to make my changes?
Thanks

Do not change any official DITA files!
You should create a specialization.
You must not allow the <p> element in <dt>; that violates the specification. Instead, you could create a new element that is based on an element that is allowed in <dt>, and then change the appearance of your new element so that it looks like a <p> element, as sketched below.
Maybe you should explain what you're trying to achieve.
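For illustration, here is a rough sketch of what such a domain specialization could look like as a DTD module. All names here (mydom.mod, mydom-d, mypara) are hypothetical and the content model is simplified; consult the DITA specialization documentation for the complete wiring:
<!-- mydom.mod: declare <mypara> as a specialization of <ph> -->
<!ENTITY % mypara "mypara">
<!ELEMENT mypara (#PCDATA | %basic.ph;)*>
<!ATTLIST mypara
  %univ-atts;
  outputclass CDATA #IMPLIED
>
<!-- the class attribute records the specialization ancestry -->
<!ATTLIST mypara
  %global-atts;
  class CDATA "+ topic/ph mydom-d/mypara "
>
In your document-type shell you would then extend the %ph; entity (for example <!ENTITY % ph "ph | mypara">) so that <mypara> becomes valid wherever <ph> is allowed, including inside <dt>, and style it as a block in your output pipeline (CSS/XSLT).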

Related

What is the proper format for uploading multi-label multi-class classification datasets with text and label in Doccano?

I'd like to upload datasets to my doccano annotation project, in which the labels have already been set up beforehand as 8 classes with tags.
I'd like to know the correct upload format (CSV or JSON) for multi-label classification datasets with text and label columns.
For example, I have 8 classes (a, b, c, ..., h).
When I upload the file in this kind of format:
| text | label |
| ------ | --------- |
| text_1 | [a, b] |
| text_2 | [a, b ,c] |
| text_3 | [a, c] |
For text_1, I expect only a and b to be shown, yet the label turns out to be the literal string [a, b].
Another example: 0-7 are my project-defined classes, and in this case only the labels with tag numbers 5 and 6 should be marked. However, a long list of mixed-up labels is returned instead. (The original post illustrated this with a screenshot.)
How do I modify my dataset upload format to fix this?
I found a solution.
There were a lot of mistaken labels in this project: at the beginning I uploaded the label column in the wrong format, the literal string "[a, b]" (while doccano requires an actual array), and those labels were stored inside the project. Wrong labels like this can mess up subsequent uploads.
My debugging steps:
delete all labels in label management
re-create the labels with tags
re-upload the file in JSON format, and it works (the format is sketched below)
Now the annotation displays correctly.
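For reference, a minimal sketch of the JSONL payload that works, with label as a real JSON array rather than the string "[a, b]" (the texts and classes are just examples):
{"text": "text_1", "label": ["a", "b"]}
{"text": "text_2", "label": ["a", "b", "c"]}
{"text": "text_3", "label": ["a", "c"]}
Each line is one document; doccano matches every entry of the label array against an existing label with the same name, which is why the labels must be re-created correctly first.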

How to remove multiple characters between 2 special characters in a column in SSIS expression

I want to remove the characters starting from '#' up to the ';' in a derived column expression in SSIS.
For example, I have input column values and the desired outputs (these were posted as images in the original question; the answer below reconstructs sample data).
Note: the length after '#' is not fixed.
I have already tried this in SQL, but I want to do it via an SSIS derived column expression.
First of all: please do not post pictures. We prefer copy-and-pastable sample data. And please try to provide a minimal, complete and reproducible example, best served as DDL, INSERT and code, as I do here for you.
And just to mention this: if you control the input, you should not mix information within one string... If this is needed, try to use a "real" text container like XML or JSON.
SQL Server is not meant for string manipulation. There is no RegEx or repeated/nested pattern matching. So we would have to use a recursive / procedural / looping approach. But, if performance is not so important, you might use an XML hack.
--DDL and INSERT
DECLARE @tbl TABLE(ID INT IDENTITY,YourString VARCHAR(1000));
INSERT INTO @tbl VALUES('Here is one without')
,('One#some comment;in here')
,('Two comments#some comment;in here#here is the second;and some more text')
--The query
SELECT t.ID
,t.YourString
,CAST(REPLACE(REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),'#','<!--'),';','--> ') AS XML) SeeTheIntermediateXML
,CAST(REPLACE(REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),'#','<!--'),';','--> ') AS XML).value('.','nvarchar(max)') CleanedValue
FROM @tbl t
The result
+----+-------------------------------------------------------------------------+-----------------------------------------+
| ID | YourString | CleanedValue |
+----+-------------------------------------------------------------------------+-----------------------------------------+
| 1 | Here is one without | Here is one without |
+----+-------------------------------------------------------------------------+-----------------------------------------+
| 2 | One#some comment;in here | One in here |
+----+-------------------------------------------------------------------------+-----------------------------------------+
| 3 | Two comments#some comment;in here#here is the second;and some more text | Two comments in here and some more text |
+----+-------------------------------------------------------------------------+-----------------------------------------+
The idea in short:
Using some string methods we can wrap your unwanted text in XML comments.
Look at this
Two comments<!--some comment--> in here<!--here is the second--> and some more text
When this XML is read with .value(), the content is returned without the comments.
Hint 1: Use '-->;' in your replacement to keep the semi-colon as delimiter.
Hint 2: If there might be a semi-colon ; somewhere else in your string, you would see the --> in the result. In this case you'd need a third REPLACE() against the resulting string.
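To illustrate Hint 2, here is a sketch of that third REPLACE() against the result of the query above (same @tbl as before): a stray semicolon without a preceding '#' survives the XML round trip as '--> ', so we translate it back afterwards.
SELECT t.ID
      ,REPLACE(
           CAST(REPLACE(REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),'#','<!--'),';','--> ') AS XML).value('.','nvarchar(max)')
       ,'--> ',';') AS CleanedValue
FROM @tbl t;
Note that this third REPLACE() only touches text outside the comments; the '-->' sequences that close a comment are consumed by the XML parser and never reach the result.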

Need an explanation of the kdb/q script to save a partitioned table

I'm trying to understand this code snippet from:
https://code.kx.com/q/kb/loading-from-large-files/
so that I can customize it myself (e.g. partition by hours, minutes, number of ticks, ...):
$ cat fs.q
\d .Q
/ extension of .Q.dpft to separate table name & data
/ and allow append or overwrite
/ pass table data in t, table name in n, : or , in g
k)dpfgnt:{[d;p;f;g;n;t]if[~&/qm'r:+en[d]t;'`unmappable];
{[d;g;t;i;x]#[d;x;g;t[x]i]}[d:par[d;p;n];g;r;<r f]'!r;
#[;f;`p#]#[d;`.d;:;f,r#&~f=r:!r];n}
/ generalization of .Q.dpfnt to auto-partition and save a multi-partition table
/ pass table data in t, table name in n, name of column to partition on in c
k)dcfgnt:{[d;c;f;g;n;t]*p dpfgnt[d;;f;g;n]'?[t;;0b;()]',:'(=;c;)'p:?[;();();c]?[t;();1b;(,c)!,c]}
\d .
r:flip`date`open`high`low`close`volume`sym!("DFFFFIS";",")0:
w:.Q.dcfgnt[`:db;`date;`sym;,;`stats]
.Q.fs[w r#]`:file.csv
But I couldn't find any resources that explain it in detail. For example:
if[~&/qm'r:+en[d]t;'`unmappable];
what does it do with the parameter d?
(Promoting this to an answer as I believe it helps answer the question).
Following on from the comment chain: in order to translate the k code into q code (or simply to understand the k code), you have a few options, none of which are particularly well documented, as that defeats the purpose of the q language - to be the wrapper that obscures the k language.
Option 1 is to inspect the built-in functions in the .q namespace
q).q
| ::
neg | -:
not | ~:
null | ^:
string | $:
reciprocal| %:
floor | _:
...
Option 2 is to inspect the q.k script which creates the above namespace (be careful not to edit/change this):
vi $QHOME/q.k
Option 3 is to lookup some of the nuggets of documentation on the code.kx website, for example https://code.kx.com/q/wp/parse-trees/#k4-q-and-qk and https://code.kx.com/q/basics/exposed-infrastructure/#unary-forms
Option 4 is to google search for reference material for other/similar versions of k, for example k2/k3. They tend to be similar-ish.
A final point to note: in most of these examples you'll see a colon (:) after the primitives. This colon is required in q/kdb to use the monadic form of the primitive (most are heavily overloaded), while in k it is not required to explicitly force the monadic form. This is why where shows as &: in the q reference but will usually just be & in actual k code, as the sketch below demonstrates.
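A small sketch of that last point in a q session (the boolean list is just an example):
q)where 0 1 1 0 1b        / q keyword
1 2 4
q)k)& 0 1 1 0 1b          / the same monadic primitive written in k
1 2 4
Both return the indices of the 1s; q spells the monadic & as the keyword where, while k code uses the bare primitive.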

Search text between symbols

I have this text (taken from a concatenated field row):
Astronomic Event 2013/1434H - Aceh ....
How do we search it by the keywords 2013 or 1434h?
I have tried the code below, but it returns no rows.
to_tsvector result:
'2013/1434h':8,12 'aceh':1 'bin.....
Sample Case:
WITH sample_table as
(SELECT to_tsvector('Astronomic Event 2013/1434H - Aceh') sample_content)
SELECT *
FROM sample_table, to_tsquery('2013') q
WHERE sample_content @@ q
How do We search it by 2013 or 1434h keywords?
It seems like you want to replace:
to_tsquery('1434h') q
with:
to_tsquery('1434h | 2013') q
http://www.postgresql.org/docs/current/static/functions-textsearch.html
Side note: the to_tsquery() syntax is extremely capricious. It doesn't allow for much, if any, leeway, and many of the assumptions in Postgres are everything but end-user friendly.
More often than not, you'll be better off using plainto_tsquery(), which allows any amount of garbage to be thrown at it. Thus, consider pre-processing the string before issuing the query. For instance, you could split the string, and OR the original parts together:
where sc.text_index @@ (plainto_tsquery('1434h') || plainto_tsquery('2013'))
Doing so will make your code a bit more complex, but it won't rely on your users needing to understand that (contrary to what they're accustomed to in Google) they should enter 'quick & brown & fox & jumps & lazy & dog' instead of plain 'The quick brown fox jumps over the lazy dog'.
Edit: I ended up actually trying your sample query, and it seems you're running into a parser issue:
# SELECT alias, description, token FROM ts_debug('Astronomic Event 2013/1434H - Aceh');
alias | description | token
-----------+-------------------+------------
asciiword | Word, all ASCII | Astronomic
blank | Space symbols |
asciiword | Word, all ASCII | Event
blank | Space symbols |
file | File or path name | 2013/1434H
blank | Space symbols |
blank | Space symbols | -
asciiword | Word, all ASCII | Aceh
(8 rows)
http://www.postgresql.org/docs/current/static/textsearch-parsers.html
It looks like you might need to write (or find) and configure an app-specific parser. Having never done this personally, the best I can do is to highlight that Postgres allows this and includes a sample:
http://www.postgresql.org/docs/current/static/test-parser.html
Alternatively, change your tsvector-related trigger so that it matches e.g. \d{4}/\d+[a-zA-Z] or whatever seems most appropriate, and adds spaces accordingly, before converting it to a tsvector. Something as simple as the following might do the trick if you never need to store file names:
SELECT alias, description, token
FROM ts_debug(replace('Astronomic Event 2013/1434H - Aceh', '/', ' / '));
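Putting it together, the replace() workaround makes the original sample case match (a sketch based on the query from the question):
WITH sample_table AS
  (SELECT to_tsvector(replace('Astronomic Event 2013/1434H - Aceh', '/', ' / ')) AS sample_content)
SELECT *
FROM sample_table, to_tsquery('2013') q
WHERE sample_content @@ q;
With the slash padded by spaces, the parser emits 2013 and 1434h as separate tokens, so both to_tsquery('2013') and to_tsquery('1434h') find a match.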

Easy to remember fingerprints for data?

I need to create fingerprints for RSA keys that users can memorize or at least easily recognize. The following ideas have come to mind:
Break the SHA1 hash into portions of, say, 4 bits and use them as coordinates for Bezier splines. Draw the splines and use that picture as a fingerprint.
Use the SHA1 hash as input for some fractal algorithm. The result would need to be unique for a given input, i.e. the output can't be a solid square half the time.
Map the SHA1 hash to entries in a word list (as used in spell checkers or password lists). This would create a passphrase consisting of real words.
Instead of a word list, use some other large data set like Google maps (map the SHA1 hash to map coordinates and use the map region(s) as a fingerprint)
Any other ideas? I'm sure this has been implemented in one form or another.
OpenSSH contains something like that, under the name "visual host key". Try this:
ssh -o VisualHostKey=yes somesshhost
where somesshhost is some machine with an SSH server running. It will print out a "fingerprint" of the server key, both in hexadecimal, and as an ASCII-art image which may look like this:
+--[ RSA 2048]----+
| .+ |
| + o |
| o o + |
| + o + |
| . o E S |
| + * . |
| X o . |
| . * o |
| .o . |
+-----------------+
Or like this:
+--[ RSA 1024]----+
| .*BB+ |
| . .++o |
| = oo. |
| . =o+.. |
| So+.. |
| ..E. |
| |
| |
| |
+-----------------+
Apparently, this is inspired by techniques described in this article. OpenSSH is opensource, with a BSD-like license, so chances are that you could simply reuse their code (it seems to be in the key.c file, function key_fingerprint_randomart()).
For item 3 (entries in a word list), see RFC-1751 - A Convention for Human-Readable 128-bit Keys, which notes that
The authors of S/Key devised a system to make the 64-bit one-time
password easy for people to enter.
Their idea was to transform the password into a string of small
English words. English words are significantly easier for people to
both remember and type. The authors of S/Key started with a
dictionary of 2048 English words, ranging in length from one to four
characters. The space covered by a 64-bit key (2^64) could be covered
by six words from this dictionary (2^66) with room remaining for
parity. For example, an S/Key one-time password of hex value:
EB33 F77E E73D 4053
would become the following six English words:
TIDE ITCH SLOW REIN RULE MOT
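Translated into code, here is a minimal Python sketch of that scheme (not the exact RFC 1751 encoding; the word list is a stand-in for a real 2048-word dictionary):
import hashlib

# Stand-in dictionary; a real implementation would ship 2048 short English words.
WORDLIST = [f"w{i:04d}" for i in range(2048)]

def words_fingerprint(data: bytes, n_words: int = 6) -> str:
    """Map a SHA-1 hash to n_words entries of a 2048-word list (11 bits per word)."""
    bits = int.from_bytes(hashlib.sha1(data).digest(), "big")
    picked = []
    for _ in range(n_words):
        picked.append(WORDLIST[bits & 0x7FF])  # low 11 bits select one of 2048 words
        bits >>= 11
    return " ".join(picked)

print(words_fingerprint(b"...RSA public key bytes..."))
Six words at 11 bits each cover 66 bits of the hash, which matches the 2^66 figure quoted above.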
You could also use a compound fingerprint to improve memorability, like English words followed (or preceded) by one or more key-dependent images.
For generating the image, you could use things like Identicon, Wavatar, MonsterID, or RoboHash.
Example:
TIDE ITCH SLOW
REIN RULE MOT
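If you want to try the image part quickly, RoboHash-style services generate an image directly from any string, e.g. https://robohash.org/TIDE-ITCH-SLOW-REIN-RULE-MOT.png (a hypothetical example URL following RoboHash's documented pattern of hashing whatever appears in the path).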
I found something called random art which generates an image from a hash. There is a Python implementation available for download: http://www.random-art.org/about/
There is also a paper about using random art for authentication: http://sparrow.ece.cmu.edu/~adrian/projects/validation/validation.pdf
It's from 1999; I don't know if further research has been done on this.
Your first suggestion (draw the path of splines for every four bytes, then fill using the nonzero fill rule) is exactly what I use for visualization in hashblot.