Programmatically merge cells in OpenOffice - Perl

I want to write a script that generates an OpenOffice Calc table.
I have downloaded the package "libooolib-perl" for Debian, and it works well, but I have one problem:
I can't merge cells. I want the headline to look like this:
This is the Head-Line of the Document |
This is subheadline 1 | This is subheadline 2 | This is subheadline 3 |
This is content 1 | This is content 2 | This is content 3 |
This is content 4 | This is content 5 | This is content 6 |
As you can see, the first line is a single cell spanning 3 columns. As far as I know, I cannot achieve this with CSV or another non-binary format, so I need a proper library that can merge cells.

cellSpan does the job!
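use OpenOffice::OODoc;
# Create a new document (this example creates a text document).
my $document = odfDocument(file => 'filename.odt', create => 'text');
# Append a table named "Table" with 4 rows and 3 columns.
my $table = $document->appendTable("Table", 4, 3);
# Let cell A1 span 3 columns, then fill in the headline.
$document->cellSpan($table, "A1", 3);
$document->cellValue($table, "A1", "This is the Head-Line of the Document");
#(...)
$document->save;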

It would appear that the linked Perl module does not support merging cells.
Perhaps the documentation of the OpenOffice document format helps:
http://books.evc-cit.info/oobook/book_onepart.html#merged-spreadsheet-cells-section
It contains code samples, albeit in Python; perhaps you can use that knowledge to implement the missing function in libooolib-perl.
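As a rough illustration of what such a function would have to emit, here is a minimal Python sketch that builds one spreadsheet row in which the first cell spans several columns. It relies on the OpenDocument attribute table:number-columns-spanned and the table:covered-table-cell element; the function name merged_header_row is hypothetical, and the fragment is only a row, not a complete content.xml (cell value types and styles are left out).

```python
import xml.etree.ElementTree as ET

# OpenDocument namespaces for table and text elements.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("table", TABLE_NS)
ET.register_namespace("text", TEXT_NS)


def merged_header_row(title, span):
    """Build a table:table-row whose first cell spans `span` columns.

    Hypothetical helper for illustration; not part of any library.
    """
    row = ET.Element(f"{{{TABLE_NS}}}table-row")
    cell = ET.SubElement(row, f"{{{TABLE_NS}}}table-cell")
    cell.set(f"{{{TABLE_NS}}}number-columns-spanned", str(span))
    paragraph = ET.SubElement(cell, f"{{{TEXT_NS}}}p")
    paragraph.text = title
    # The columns hidden by the merge stay in the row as covered cells.
    for _ in range(span - 1):
        ET.SubElement(row, f"{{{TABLE_NS}}}covered-table-cell")
    return row


row = merged_header_row("This is the Head-Line of the Document", 3)
print(ET.tostring(row, encoding="unicode"))
```

The printed fragment shows one spanning cell followed by two covered cells, which is the structure the linked section describes for merged spreadsheet cells.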

Related

What is the proper format for uploading a multi-label multi-class classification datasets with text and label in Doccano?

I'd like to upload datasets to my doccano annotation project, in which 8 label classes with tags have already been defined.
What is the correct CSV or JSON upload format for a multi-label classification dataset with a text column and a label column?
For example, I have 8 classes (a, b, c, ..., h).
When I upload the file in this format:
| text | label |
| ------ | --------- |
| text_1 | [a, b] |
| text_2 | [a, b ,c] |
| text_3 | [a, c] |
I expect text_1 to show only the labels a and b, yet it shows a single label that reads [a, b].
In another example, 0-7 are the classes defined in my project. Only the labels with tags 5 and 6 should be marked, but a long, mixed-up list of labels is returned instead.
How do I modify the format of my upload dataset to get this right?
I found a solution.
The project contained many wrong labels because I initially uploaded the label column in the wrong format, the string "[a, b]" (an array is required), and those labels were stored in the project. Such leftover labels can mess up later uploads.
My debugging steps:
Delete all labels in label management.
Re-create the labels with tags.
Re-upload the file in JSON format; this works.
After that, the annotations display correctly.
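For the re-upload step, a small conversion script can help. The following is a minimal Python sketch, assuming the import expects one JSON object per line with a "text" field and a "label" field holding an array; those field names are an assumption and should match whatever the project's import dialog is configured with. The helper names parse_labels and to_jsonl are hypothetical.

```python
import json


def parse_labels(raw):
    """Turn a string such as "[a, b ,c]" into the list ["a", "b", "c"]."""
    inner = raw.strip().strip("[]")
    return [part.strip() for part in inner.split(",") if part.strip()]


def to_jsonl(rows):
    """Serialize (text, raw_label) pairs as one JSON object per line."""
    lines = []
    for text, raw in rows:
        # "text" and "label" are assumed field names (see the note above).
        record = {"text": text, "label": parse_labels(raw)}
        lines.append(json.dumps(record))
    return "\n".join(lines)


rows = [("text_1", "[a, b]"), ("text_2", "[a, b ,c]"), ("text_3", "[a, c]")]
print(to_jsonl(rows))
# First line printed: {"text": "text_1", "label": ["a", "b"]}
```

The point of the conversion is that each label becomes its own array element instead of the whole bracketed string being stored as a single label.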

How to write in two columns like a table in Linux man pages?

I'm creating a custom man page for my C library, and I'd like to do a thing like this
LIST OF FUNCTIONS                        |<--- terminal window side
                                         |
Function     Description                 |
function1    function1's description     |
function2    function2's description     |
             which is longer than the    |<--- here if the text
             first one                   |     overlaps out of the window,
function3    function3's description     |     it auto-aligns to Description
...          ...                         |
How could I do that?
I think it is a combination of the man page layout described at https://tldp.org/HOWTO/Man-Page/q3.html and groff (https://www.linuxjournal.com/article/1158). For example, a DESCRIPTION section is written like this:
.SH DESCRIPTION
.B foo
frobnicates the bar library by tweaking internal
symbol tables. By default it parses all baz segments
and rearranges them in reverse order by time for the
.BR xyzzy (1)
linker to find them. The symdef entry is then compressed
using the WBG (Whiz-Bang-Gizmo) algorithm.
All files are processed in the order specified.
The Linux Journal article gives the following command for rendering a man page:
$ groff -Tascii -man coffee.man | more
The groff man page starts with the following:
The man macro package for groff is used to produce manual pages
(“man pages”) like the one you are reading.
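Neither link shows the two-column layout itself. One common way to get a hanging second column that re-wraps with the terminal width is the .TP (tagged paragraph) macro of the man macro package: the line after .TP is the tag, and the text that follows is indented to the given width. The Python sketch below only generates such source text from (name, description) pairs; the function name function_list and the indent of 12 are illustrative assumptions, and descriptions that begin with a period or an apostrophe would need escaping.

```python
def function_list(entries, indent=12):
    """Return man-page source listing (name, description) pairs with .TP.

    Hypothetical helper; the section title follows the question's example.
    """
    lines = [".SH LIST OF FUNCTIONS"]
    for name, description in entries:
        lines.append(f".TP {indent}")  # tagged paragraph with a hanging indent
        lines.append(f".B {name}")     # tag line: the first column, in bold
        lines.append(description)      # body: groff wraps and aligns it
    return "\n".join(lines) + "\n"


source = function_list([
    ("function1", "function1's description"),
    ("function2", "function2's description which is longer than the first one"),
])
print(source)
```

Saving the output to a file and rendering it with the groff command quoted above should show the descriptions aligned in the second column, wrapping under each other when the window is narrow.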

Parsing a text file or an html file to create a table

I have a simple issue with a .msg file from Outlook. Code that someone helped me with stopped working because the HTMLBody of the .msg file varies between emails, even though they come from the same source. My next option was to save the email as both a .txt and an .html file. I have no knowledge of HTML, so I have no idea how to grab the table from the HTML markup, but the text version turned out to be easy to work with. For example, this is the data from one table:
Summary
Date
Good mail
Rule matches
Spam
Malware
2019-10-22
4927
4519
2078
0
2019-10-23
4783
4113
1934
0
This is from the text file. Summary is the keyword; the 5 lines after it are the column headers of the table, and each following group of 5 lines is one row. There are 7 rows in total, so the headers are followed by 7 rows.
What I want to do is create a table from this text, using the first 5 lines after Summary as my columns. Since each .msg is different, these 5 columns appear in a different order in each file, so I cannot rely on fixed positions. My best attempt was to use ConvertFrom-String to create a table, but I have little idea how to format the table under the conditions above.
In short: the text file contains a table as shown above, with 5 columns and 7 rows below the headers. The email contains more data after the table, so I need to stop there and grab only that part, which should be easy.
How can I use ConvertFrom-String to create the table from those 5 columns? How can I set the delimiter to a newline, and how can I use the first 5 lines as the column headers?
I think trying to make this work with ConvertFrom-StringData is adding more work than necessary. But here is an alternative that works with your sample set.
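# Read the file as an array of lines.
$text = Get-Content -Path File.txt
# Only parse when the first line starts with the Summary keyword.
$formattedText = if ($text[0] -match '^Summary') {
    # Join each group of 5 lines into one comma-separated line.
    for ($i = 1; $i -lt $text.count; $i += 5) {
        $text[$i..($i+4)] -join ','
    }
}
# The first CSV line supplies the property names of the objects.
$formattedText | ConvertFrom-Csv | ConvertTo-Html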
Explanation:
If we assume your text data is in File.txt, Get-Content is used to read the data as an array ($text). If the first line begins with Summary, the file will be parsed.
The for loop advances 5 lines on each iteration until the end of the file. It begins with the $text values at indexes 1, 2, 3, 4, and 5, joined together by a comma. Then the index ($i) is increased by 5 and the next five values are joined together. Each iteration creates a new line of comma-separated values. The only reason for joining with a comma is to allow the simple ConvertFrom-Csv call later.
ConvertFrom-Csv converts the CSV lines in $formattedText into an array of objects, with the first line supplying those objects' property names.
Finally, the array is piped to ConvertTo-Html, which will output all of the objects in a table.
Note: If you want to resize or add extra format to the table, you may need to do that after the code is generated. If your data has commas, you will need a different delimiter when joining the strings. You will then need to add the -Delimiter parameter to the ConvertFrom-Csv with the delimiter you choose.
Adaptation:
The code is fairly flexible. If you need to work with a different number of properties, change $i+=5 to that number and change $text[$i..($i+4)] to match: the two values around .. must differ by the number of properties minus one (for example, $i+=6 with $text[$i..($i+5)] for six properties).
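The same chunking idea can be sketched in Python, which also makes it easy to stop after the 7 rows mentioned in the question instead of reading to the end of the file. This is an illustrative sketch only: the function name parse_summary and its parameters are hypothetical, and it assumes a line that is exactly Summary precedes the headers (list.index raises ValueError otherwise).

```python
def parse_summary(lines, columns=5, max_rows=7):
    """Return a list of dicts built from the lines that follow 'Summary'."""
    start = lines.index("Summary") + 1
    headers = lines[start:start + columns]
    rows = []
    position = start + columns
    # Take complete groups of `columns` lines until max_rows is reached.
    while len(rows) < max_rows and position + columns <= len(lines):
        values = lines[position:position + columns]
        rows.append(dict(zip(headers, values)))
        position += columns
    return rows


sample = [
    "Summary",
    "Date", "Good mail", "Rule matches", "Spam", "Malware",
    "2019-10-22", "4927", "4519", "2078", "0",
    "2019-10-23", "4783", "4113", "1934", "0",
]
for row in parse_summary(sample):
    print(row)
```

Because each row is keyed by the header text, the result does not depend on the order in which the five columns appear in a given email.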

Org (version 7.9) converts periods and hyphens in tables to 0

I am using the Org mode that comes with Emacs 24.3, and I am having an issue that when Org creates a table from the result of a code block it is replacing characters like '-' and '.' with 0 (integer zero). Then when I pass the table to another code block that's expecting a column of strings I get type errors etc.
I haven't been able to find anything useful, as it seems to be practically un-Googleable. Has anyone had the same problem? If I update to the latest version of org-mode, will that fix it?
EDIT:
I updated to Org 8.2 and this problem seems to have gone away. Now I have another (related) problem, where returning a table with a cell containing a string consisting of one double quote character ('"' in python) messes something up; Org added 2 extra columns to the table, one had something like
(quote (quote ) ())
in it. The reason my tables have things like this in them is that I'm working with part-of-speech tags from natural language data.
It's pretty obvious Org is doing some stuff to try to interpret the table contents, and not dealing well with meta characters. Technically I think these are bugs where Org should be dealing better with unexpected input.
EDIT 2:
Here is a minimal reproduction with Org 7.9.3f (system Python is 3.4):
#+TBLNAME: table
| DT | The |
| . | . |
| - | - |
#+BEGIN_SRC python :var table=table
return table
#+END_SRC
#+RESULTS:
| DT | The |
| 0 | 0 |
| 0 | 0 |
Incidentally, Org does not like the '"' character at all, in tables or in code blocks (I just get an "End of file during parsing" message when the above table has a cell with just '"' in it). It is probably better to avoid it altogether, so I think my problem is solved. If nobody wants to add anything, I'll answer this myself in a day or so.

Search text between symbol

I have this text (taken from a concatenated field in a row):
Astronomic Event 2013/1434H - Aceh ....
How do we search it by the keywords 2013 or 1434h?
I have tried the code below, but it returns no rows.
to_tsvector result:
'2013/1434h':8,12 'aceh':1 'bin.....
Sample Case:
WITH sample_table as
(SELECT to_tsvector('Astronomic Event 2013/1434H - Aceh') sample_content)
SELECT *
FROM sample_table, to_tsquery('2013') q
WHERE sample_content @@ q
How do We search it by 2013 or 1434h keywords?
It seems like you want to replace:
to_tsquery('1434h') q
with:
to_tsquery('1434h | 2013') q
http://www.postgresql.org/docs/current/static/functions-textsearch.html
Side note: the to_tsquery() syntax is extremely strict. It leaves little if any room for free-form input, and many of the assumptions in Postgres are anything but end-user friendly.
More often than not, you'll be better off using plainto_tsquery(), which allows any amount of garbage to be thrown at it. Thus, consider pre-processing the string before issuing the query. For instance, you could split the string, and OR the original parts together:
where sc.text_index @@ (plainto_tsquery('1434h') || plainto_tsquery('2013'))
Doing so will make your code a bit more complex, but it won't rely on your users needing to understand that (contrary to what they're accustomed to in Google) they should enter 'quick & brown & fox & jumps & lazy & dog' instead of plain 'The quick brown fox jumps over the lazy dog'.
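The pre-processing step could look roughly like the Python sketch below, which splits the user's input on whitespace and builds one plainto_tsquery() call per keyword, joined with the tsquery OR operator (||). The function name or_query is hypothetical, and the %s placeholders assume a psycopg-style driver; other drivers use different placeholder syntax.

```python
def or_query(keywords):
    """Return (sql_fragment, params) OR-ing one plainto_tsquery per keyword."""
    terms = keywords.split()
    if not terms:
        raise ValueError("at least one keyword is required")
    # One placeholder per keyword; the values travel separately as parameters.
    fragment = " || ".join("plainto_tsquery(%s)" for _ in terms)
    return f"({fragment})", terms


fragment, params = or_query("1434h 2013")
print(fragment)  # (plainto_tsquery(%s) || plainto_tsquery(%s))
print(params)    # ['1434h', '2013']
```

The fragment would then be placed on the right-hand side of the @@ operator, with params passed to the driver alongside the query so that user input is never concatenated into the SQL text.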
Edit: I ended up actually trying your sample query, and it seems you're actually running into a parser issue:
# SELECT alias, description, token FROM ts_debug('Astronomic Event 2013/1434H - Aceh');
   alias   |    description    |   token
-----------+-------------------+------------
 asciiword | Word, all ASCII   | Astronomic
 blank     | Space symbols     |
 asciiword | Word, all ASCII   | Event
 blank     | Space symbols     |
 file      | File or path name | 2013/1434H
 blank     | Space symbols     |
 blank     | Space symbols     | -
 asciiword | Word, all ASCII   | Aceh
(8 rows)
http://www.postgresql.org/docs/current/static/textsearch-parsers.html
It looks like you might need to write (or find) and configure an app-specific parser. Having never done this personally, the best I can do is to highlight that Postgres allows this and includes a sample:
http://www.postgresql.org/docs/current/static/test-parser.html
Alternatively, change your tsvector-related trigger so that it matches e.g. \d{4}/\d+[a-zA-Z] or whatever seems most appropriate, and adds spaces accordingly, before converting it to a tsvector. Something as simple as the following might do the trick if you never need to store file names:
SELECT alias, description, token
FROM ts_debug(replace('Astronomic Event 2013/1434H - Aceh', '/', ' / '));
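If the clean-up is done in application code rather than in a trigger, the narrower pattern mentioned above can be applied before the text reaches to_tsvector. The following is a minimal Python sketch under that assumption; the function name split_dual_year is hypothetical, and the pattern is the one suggested above, so it only touches a 4-digit number followed by a slash, digits and one letter.

```python
import re

# Four digits, a slash, then digits followed by a single letter (e.g. 2013/1434H).
DUAL_YEAR = re.compile(r"(\d{4})/(\d+[a-zA-Z])")


def split_dual_year(text):
    """Put spaces around the slash so the parser sees two separate tokens."""
    return DUAL_YEAR.sub(r"\1 / \2", text)


print(split_dual_year("Astronomic Event 2013/1434H - Aceh"))
# Astronomic Event 2013 / 1434H - Aceh
```

Unlike the blanket replace() call, this leaves other slashes, such as those in real file paths, untouched.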