How can I extract a specific table from MS Word and copy it to Excel in Perl?

I have an MS Word 2003 file that contains several tables, and I want to extract the contents of one specific table. The tables come under different sections, and I want to extract only the contents of the table under section 6, not any other table, and copy them to a new Excel sheet with their formatting.
SECTION 4
Table data
Table data
Table data
SECTION 5
Table data
Table data
Table data
SECTION 6
Table data # TABLE DATA TO BE EXTRACTED AND IMPORTED TO A NEW EXCEL SHEET
Table data # TABLE DATA TO BE EXTRACTED AND IMPORTED TO A NEW EXCEL SHEET
Table data # TABLE DATA TO BE EXTRACTED AND IMPORTED TO A NEW EXCEL SHEET
SECTION 7
Table data
Table data
Table data

Unless you are planning to use something like antiword, your starting point is the Perl module Win32::OLE, which is installed as part of ActiveState Perl. Use OLE to start Microsoft Word and open your document. Then look at the Sections collection of the Document object and find the Section object for your section six. Finally, look at the Tables property of that Section's Range property and find the Table object you want in it.
Copying to an Excel sheet works in a similar way, through Excel's own OLE object model.
It's difficult to write a code example unless I have a document to work with, so I'm not even going to attempt that.
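For what it's worth, the object path described above can be sketched in Perl. This is untested and rests on several assumptions: Windows with Word and Excel installed, "section six" being a real Word section (one created with a section break) rather than just a heading reading "SECTION 6", and a hypothetical helper named copy_section_tables. To keep the formatting, it copies each table's Range through the clipboard and uses Excel's Worksheet Paste method instead of writing cell values one by one.

```perl
#!/usr/bin/perl
# Untested sketch: copy every table in Word section 6 into a new Excel workbook.
# Assumptions: Windows with Word and Excel installed, "section 6" is a real Word
# section (created with a section break, not just a heading), absolute paths.
use strict;
use warnings;

# Hypothetical helper: returns the number of tables copied.
sub copy_section_tables {
    my ($doc_path, $xls_path, $section_no) = @_;

    require Win32::OLE;                        # loaded only when actually called
    Win32::OLE->Option(Warn => 3);             # die on any OLE error

    my $word = Win32::OLE->new('Word.Application', 'Quit');
    my $doc  = $word->Documents->Open($doc_path);

    my $excel = Win32::OLE->new('Excel.Application', 'Quit');
    my $book  = $excel->Workbooks->Add;
    my $sheet = $book->Worksheets->Item(1);

    # Document -> Sections -> Section -> Range -> Tables
    my $tables = $doc->Sections->Item($section_no)->Range->Tables;
    my $count  = $tables->Count;

    my $row = 1;
    for my $i (1 .. $count) {
        # Copy via the clipboard so fonts, borders and shading come along
        $tables->Item($i)->Range->Copy;
        $sheet->Paste($sheet->Range("A$row"));
        # Next free row, leaving one blank row between tables
        $row = $sheet->UsedRange->Rows->Count + 2;
    }

    $book->SaveAs($xls_path);
    $book->Close(0);
    $doc->Close(0);                            # 0: do not save the Word document
    return $count;
}

# Usage: perl copy_tables.pl C:\path\in.doc C:\path\out.xlsx
if ($^O eq 'MSWin32' && @ARGV == 2) {
    my $n = copy_section_tables($ARGV[0], $ARGV[1], 6);
    print "Copied $n table(s)\n";
}
```

If "SECTION 6" in your document is only a heading and not a section break, Sections->Item(6) will point at the wrong part of the document, so check that first.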

Related

Display comma separated values from an SQLite database in LibreOffice Base

I have an SQLite database used in QGIS with geographic data.
Some of the fields can store multiple values.
The possible values are stored in a separate table, and each value is referenced by the primary key of its item.
QGIS stores the selected keys in the field like this: {3,4,9,10,11,15,16}.
For example, the table 'list1' contains:
1 AA
2 BB
3 CC
etc.
How can I create the same checkboxes in a new LibreOffice Base form from this QGIS database?
I can't even find a way to get the multiple values.
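As a starting point for getting at the multiple values: the field holds a brace-delimited list of primary keys from 'list1', so decoding it is a matter of splitting the string and looking each key up. A minimal Perl sketch, where parse_keys and labels_for are hypothetical helpers and %list1 holds only the sample rows above (in practice it would be read from the SQLite table). It does not address building the Base form itself.

```perl
#!/usr/bin/perl
# Sketch: decode a QGIS multi-value field such as "{3,4,9,10,11,15,16}"
# into labels from a lookup table like 'list1'.
use strict;
use warnings;

# Hypothetical sample data copied from the question; normally read from SQLite.
my %list1 = (1 => 'AA', 2 => 'BB', 3 => 'CC');

# Turn "{3,4,9}" into (3, 4, 9); tolerates spaces and an empty "{}".
sub parse_keys {
    my ($value) = @_;
    return () unless defined $value;
    $value =~ s/^\s*\{//;                     # drop the opening brace
    $value =~ s/\}\s*$//;                     # drop the closing brace
    return grep { length } map { s/^\s+|\s+$//gr } split /,/, $value;
}

# Map keys to labels; keys missing from the lookup are shown as "?<key>".
sub labels_for {
    my ($value, $lookup) = @_;
    return map { exists $lookup->{$_} ? $lookup->{$_} : "?$_" } parse_keys($value);
}

print join(', ', labels_for('{1,3}', \%list1)), "\n";   # prints "AA, CC"
```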

Is it safe to change the data source format to "OLE DB Database Files" when using an HTM/HTML file as the data source, to get around the 62-field limit?

I have a mail merge data source in HTM/HTML format that contains 70 fields, but there is a limit of 62 fields for such data sources (Reference).
Is it safe to change the data source type to "OLE DB Database Files" in the Confirm Data Source dialog when selecting the data source?
When you choose the "All web pages" type (and this is the default type in the case of an HTML file), you are in essence choosing a Word internal file converter to retrieve your data. You end up with the concatenated columns because:
The internal converter is not primarily designed to "read data sources". It's there to convert a document in HTML format into a document in Word format.
Your HTML file contains a table in HTML format, so naturally, the converter tries to convert that into a Word table.
However, Word tables can only have 63 columns, whereas HTML tables can have more, so the converter has to deal with that somehow. In this case, it concatenates the column data so column 63 ends up containing all the remaining data in the row.
Once the document is converted, Word uses the converted document as the data source. It's really no different from the situation where it uses a Word document as the data source.
If your HTML file actually contained (say) one paragraph of 70 comma-delimited values for each row of data, rather than an HTML table row with td cells, Word would end up treating the data as 70 separate columns (but it would also probably ask for the column delimiter every time you used the file, and you would have to ensure that commas in the data were correctly quoted).
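The quoting mentioned above follows the usual CSV convention: a value containing a comma, a double quote or a line break is wrapped in double quotes, and any embedded quotes are doubled. A minimal core-Perl sketch of that rule; csv_field and csv_line are hypothetical names and the Field1 .. Field70 header is made up. The CPAN module Text::CSV handles more edge cases.

```perl
#!/usr/bin/perl
# Sketch: build comma-delimited lines with CSV-style quoting, so that a value
# containing a comma, quote or line break stays in a single column.
use strict;
use warnings;

sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        $value =~ s/"/""/g;                   # double any embedded quotes
        $value = qq{"$value"};                # then wrap the whole field
    }
    return $value;
}

sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Header row of 70 hypothetical field names, Field1 .. Field70
my @header = map { "Field$_" } 1 .. 70;
print csv_line(@header);
print csv_line('Smith, John', 'He said "hi"', 42);
# second line printed: "Smith, John","He said ""hi""",42
```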
In general, when you choose the "OLE DB Database Files" option, Word either knows of an OLE DB provider that can read the specified type of file, or it won't be able to read the file. In this case, it tries to read the file using the Jet OLE DB provider (or, in recent versions of Word, the ACE OLE DB provider).
The Jet/ACE providers are one of the mechanisms used to read Access .mdb/.accdb data, but they can also read a number of other formats, such as Excel workbook data and plain text files, using what Jet/ACE calls "Installable ISAMs" (IISAMs).
Since there is an IISAM for HTML format data, Word will try to get the data using that IISAM.
In that case, as long as the IISAM can actually read the HTML (it may not be able to read more modern versions of HTML very well) it works much more like the case where Word gets data from Excel. For example, if your HTML file contained two tables, you may get to choose which table to read, cf. an Excel workbook with multiple worksheets and perhaps named ranges.
Jet/ACE IISAMs generally do not support more than 255 columns, so 70 should be fine. However, you may need to verify what the HTML IISAM does with:
Columns with mixed data types (for example, where some rows contain numbers and others contain text). When the Excel IISAM finds such data in the first 8 rows (by default), it tries to choose a format; sometimes that means cells with text are read as if they contained "0". FWIW I do not think the HTML IISAM does that, but I would check anyway.
Columns with large amounts of text, particularly if there is more than one such column. The IISAM is quite likely to truncate such columns to 255 characters or even fewer.
Columns with non-ANSI data (for example, Unicode text in Arabic, Hindi or Chinese).
Other than delimited text files, which will let you go over that 255 limit if they are read by the internal converter, the only data source I know of that lets Word see thousands of columns is SQL Server. Other servers with OLE DB providers, such as MySQL, might allow that too. If you have to use a very large number of columns, be aware that you may not see all the available field names in the relevant dropdowns in Word, but you should be able to insert the MERGEFIELD codes manually in the usual way.
What is your current mail merge connection method (OLE DB, DDE)? Switching to the OLE DB connection method, which is Word's default, would not change the data source type, only the connection method. Whether that works with your data source is easy to establish: switch to OLE DB and leave the data source type alone. If it doesn't work, close the document without saving (or revert to the current connection method).
Regardless, the screen you're showing allows you to specify a datasource type, not the connection method. HTML files are not OLE DB database files and you'd be unlikely to find your datasource if you switched to that file type.
In any event, the 62-field limitation most likely only relates to the fields you can see via the GUI. If you know the field name, you can insert its reference via the keyboard. To do so, simply press Ctrl-F9 to create a pair of field braces (i.e. { }) and fill in between them with 'MERGEFIELD' and the field name, thus { MERGEFIELD FieldName }.
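With 70 fields, typing each { MERGEFIELD ... } by hand is tedious, and it can be scripted through Word's object model instead. An untested Perl sketch using Win32::OLE, assuming Word is already running with the main merge document active; mergefield_code and insert_mergefields are hypothetical helpers, the Field1 .. Field70 names are placeholders for your real field names, and -1 is the value of wdFieldEmpty, which makes Word take the field type from the code text.

```perl
#!/usr/bin/perl
# Untested sketch: insert one MERGEFIELD per field name at the cursor in the
# active Word document. Assumptions: Windows, Word already running with the
# merge main document active; field names below are hypothetical placeholders.
use strict;
use warnings;

# Build the field code text; names containing spaces are quoted.
sub mergefield_code {
    my ($name) = @_;
    return $name =~ /\s/ ? qq{MERGEFIELD "$name"} : "MERGEFIELD $name";
}

sub insert_mergefields {
    my @names = @_;
    require Win32::OLE;                        # Windows only, loaded on demand
    Win32::OLE->Option(Warn => 3);             # die on OLE errors
    my $word = Win32::OLE->GetActiveObject('Word.Application')
        or die "Word is not running\n";
    my $sel = $word->Selection;
    for my $name (@names) {
        # -1 is wdFieldEmpty: Word reads the field type from the code text
        $sel->Fields->Add($sel->Range, -1, mergefield_code($name), 0);
        $sel->TypeParagraph;                   # one field per line
    }
}

# Usage: perl add_fields.pl --insert
insert_mergefields(map { "Field$_" } 1 .. 70)
    if $^O eq 'MSWin32' && @ARGV && $ARGV[0] eq '--insert';
```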

How to copy data from a CSV file to an Azure SQL Server table?

I have a dataset based on a CSV file. It exposes the data as follows:
Name,Age
John,23
I have an Azure SQL Server instance with a table named [People] that has the columns Name and Age.
I am using the Copy Data task activity and trying to copy data from the csv data set into the azure table.
There is no option to specify the table name. Instead, there is a field for a stored procedure name.
How does this work? Where do I put the target table name in the image below?
You should DEFINITELY have a table name to write to. If you don't have a table, something is wrong with your setup. Make sure you have a table to write to, and that the field names in your table match the fields in the CSV file. Then follow the steps in the article linked below. There are several steps to click through, but they are all fairly intuitive, so just follow the instructions one by one and you should be fine.
http://normalian.hatenablog.com/entry/2017/09/04/233320
You can insert records into the SQL Database table directly, without a stored procedure, by setting the table on the sink dataset rather than on the Copy activity, which is where you are currently looking.
Have a look at the below screenshot which shows the Table field within my dataset.

Tableau does not import all records from data source (.txt)

I have a simple text file with three columns as a data source of customer names. 'Refresh All Extracts' does not include certain blocks of customers. For example, the key to the table is state/customer, and in the Data Source tab this table omits specific states, not just random records. It's repeatable, but I can't understand why a state here or there would be omitted from the data source.
EDIT:
I turned on Data Interpreter. It tells me data has been excluded, but the Data Source tab with my table doesn't show any records marked * DATA REMOVED *. If it did, should I expect to see a reason why?

How do I create a table row with a Word file attachment?

In my Java web application (JSF, Seam, JBoss), I want to add a row to a table with some data fields, such as a date, a String, etc., but one field will be a Word attachment. Does anybody know of an example of how to do this? Thanks a lot for any help.
Basically, a loop creates rows of data in a table. Before creating each row, I create a Word file and save it to a local path (the physical file path is hardcoded); then I create the table row and attach this Word file to it. The loop continues until all rows are created.
I'd appreciate any pointers to resources or example code.
Thanks a lot,