Bulk-wrap HTML files and save them in UTF-8 encoding

I exported Apple Notes to HTML files using Automator scripts. I need them as HTML to import into Notion.
There are 2 main problems:
1. The files are saved with the wrong encoding; they need to be saved in UTF-8.
2. There are no <html>, <head>, or <body> tags.
There are too many files to edit manually.
The files look like this:
<div>some text</div>
<div>another line</div>
Is there any way I could do this in Sublime Text 3 or from the terminal?
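One possible approach from the terminal is a short Python script. The sketch below is only an illustration: the function name wrap_html_files, the output folder, and the page template are hypothetical, and the source encoding has to be supplied by you because the question does not say which encoding the export used.

```python
import html
from pathlib import Path

# Hypothetical page skeleton; adjust the head section as needed.
TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_html_files(src_folder, out_folder, source_encoding):
    """Wrap every *.html fragment in src_folder and write it as UTF-8.

    source_encoding is an assumption you must supply (for example
    "utf-16" or "mac_roman"); it is not detected automatically.
    Returns the number of files written.
    """
    out_dir = Path(out_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for path in sorted(Path(src_folder).glob("*.html")):
        body = path.read_text(encoding=source_encoding)
        if "<html" in body.lower():
            page = body  # already a full document: only re-encode it
        else:
            page = TEMPLATE.format(title=html.escape(path.stem),
                                   body=body.strip())
        # Write to a separate folder so the originals stay untouched.
        (out_dir / path.name).write_text(page, encoding="utf-8")
        written += 1
    return written
```

Calling wrap_html_files("notes", "notes_utf8", "utf-16") would process a folder named notes; both folder names and the encoding are placeholders. Writing to a second folder makes it easy to retry with a different encoding if the first guess produces garbled text.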

Related

Is it possible to build a LibreOffice document from code similar to the way a web page is built from HTML and CSS?

Is it possible to build a LibreOffice document from code similar to the way a web page is built from HTML and CSS? Can one write an ODF file in which the content and styling are separate, and then open/view it in LibreOffice? If so, can one write the code in a text editor as done for HTML/CSS?
There are two reasons I ask. 1) When I need to make a style change in LibreOffice, I have to manually make the same adjustments in a hundred places, such as changing the style of block quotes. 2) I'd like to build documents from a database of text.
I found a question on this in relation to databases but it was about eight years old.
Thank you for any direction you may be able to provide.
Unzip an .odt file that contains styles. You will see two files, content.xml and styles.xml. Edit these files using a text editor and then zip the folder back up to get a modified .odt file.
Be aware that there are two types of styles in the XML files. Named styles are what most people think of as styles, whereas automatic styles are custom formatting, like when you select some text and change the font directly.
The link from tohuwawohu describes utilities to work programmatically with the file. Also, as mentioned in the link, it's not too hard to write code yourself. For example, in Python, import the built-in libraries zipfile and xml.etree.
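As a rough sketch of the zipfile and xml.etree idea, the function below copies an .odt file and points paragraphs that use one named style at another. The function name restyle_paragraphs and the style names passed to it are hypothetical; the namespace URI is the standard ODF text namespace. It only touches content.xml and leaves styles.xml alone.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace for text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def restyle_paragraphs(src_path, dst_path, old_style, new_style):
    """Copy an .odt file, switching paragraphs from old_style to new_style.

    Returns the number of paragraphs changed. Every archive entry is
    copied in its original order with its original compression, so the
    "mimetype" entry stays first and uncompressed.
    """
    changed = 0
    attr = f"{{{TEXT_NS}}}style-name"
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                # Re-register the document's own prefixes so the output
                # keeps "text:", "office:", ... instead of ns0, ns1, ...
                for _, (prefix, uri) in ET.iterparse(
                        io.BytesIO(data), events=["start-ns"]):
                    ET.register_namespace(prefix, uri)
                root = ET.fromstring(data)
                for para in root.iter(f"{{{TEXT_NS}}}p"):
                    if para.get(attr) == old_style:
                        para.set(attr, new_style)
                        changed += 1
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            zout.writestr(item, data)
    return changed
```

ElementTree drops namespace declarations that no element or attribute name uses, which may matter for documents that reference prefixes inside attribute values, so it is worth testing on a copy first. Changing the definition of a named style in styles.xml follows the same read, modify, and write-back pattern.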

How do I automate converting PDF to HTML?

I work for a publisher and am trying to extract content from our fully laid out PDFs. I've tried pdftohtml, pdftotext, pdfminer, and other Python-based approaches to getting the content, as well as saving to Word, HTML, XML, etc. from the original Acrobat files.
I don't need just the text, I also need the text formatting. That's because, for example, I need all the blue text in the document.
When I save to HTML, Word, etc. from Acrobat, the resulting files contain screenshots of the pages, not the laid out text. When I extract text using different Python modules I get the text but lose the text formatting.
The only solution I've found is to manually copy and paste from the PDF into a Word document and then save it as HTML. I'm hoping to automate this.
Why does copying from Acrobat into Word achieve what I can't do by other means? Has anybody come across this problem before?
Maybe you can consider another method. This software (https://pdfapi.codeplex.com/) can convert PDF files to HTML directly via MVS. If you are able to use MVS, I think it could be useful for converting the text in your PDF files to HTML while keeping the formatting. Of course, it's just a suggestion; you can give it a try.

How to create a .doc or Word file on iPhone by code [duplicate]

I have an iPhone app consisting of a few forms in which I collect data from users. At the end of these forms, after the user has filled in all the data, I want all the collected data to be exported and an MS Word .doc file to be generated. The data is not simple text either: there are headings and tables along with normal text. Is there any way I can accomplish this?
Short answer yes, long answer:
You can't do this to create "proper" Word documents; however, you should be able to accomplish this on any platform by building the Word doc from HTML and saving it with a .doc extension (instead of .html). You can put anything in there, including custom layouts, though I'd probably stick to paragraphs, tables, and floated elements (like imgs and such).
There may be extra code you will need in the HTML doc (for instance to make it open in page view rather than in HTML view) but you can figure all that out by saving a word doc in HTML format. :) There's also a lot of information on the internet about it if you know where to look.
I did something like this not long ago. I'll see if I can find an example and post it here.
Update
This is the only "custom" stuff I have in my html word doc:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
And this - to make it open in Page view:
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Print</w:View>
<w:Zoom>100</w:Zoom>
</w:WordDocument>
</xml><![endif]-->
The rest of it is just standard HTML and CSS (remember to put the CSS INSIDE the HTML document in <style> tags, because Word isn't going to remotely fetch your CSS files).
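To show how the pieces above fit together, here is a minimal sketch in Python (the same string-building idea carries over to any language, including code running on the phone). The names build_word_html and save_as_doc, the CSS rules, and the two-column table layout are hypothetical choices for illustration; the namespaces and the Page view block are the ones quoted above.

```python
import html

# HTML skeleton using the Word namespaces and the "Print" view block
# quoted in the answer. The CSS rules are illustrative only.
WORD_HTML_TEMPLATE = """<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Print</w:View>
<w:Zoom>100</w:Zoom>
</w:WordDocument>
</xml><![endif]-->
<style>
table {{ border-collapse: collapse; }}
td, th {{ border: 1px solid #000; padding: 4px; }}
</style>
</head>
<body>
<h1>{heading}</h1>
<table>
{rows}
</table>
</body>
</html>
"""


def build_word_html(heading, answers):
    """Return Word-flavoured HTML for a heading and (label, value) pairs."""
    rows = "\n".join(
        f"<tr><th>{html.escape(label)}</th><td>{html.escape(value)}</td></tr>"
        for label, value in answers
    )
    return WORD_HTML_TEMPLATE.format(heading=html.escape(heading), rows=rows)


def save_as_doc(path, heading, answers):
    """Write the HTML to path; give the file a .doc extension."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_word_html(heading, answers))
```

For example, save_as_doc("form.doc", "Survey", [("Name", "Ann")]) would write a file that Word should open as a document. User-entered values are escaped so that characters like & or < in form data do not break the markup.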
If it is acceptable to be connected when you produce your document, you could use an on-line service like the Docmosis cloud service. It can do the mail merge and deliver the document in various formats. It can be called from iOS.
Hope that helps.
Prepare an HTML file that matches the required document format.
Add this line at the top:
<html xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:w='urn:schemas-microsoft-com:office:word' xmlns='http://www.w3.org/TR/REC-html40'>
Then save the HTML file with a .doc extension.
This should resolve your problem.
For more formatting options, refer to this link:
http://sebsauvage.net/wiki/doku.php?id=word_document_generation

How can I properly display Vietnamese characters in ColdFusion?

I'm having a hard time trying to properly display Vietnamese text in ColdFusion. I've set the charset to UTF-8, but still no luck. The same text works fine in an HTML page. What else am I missing? Any suggestions would be much appreciated.
Html:
ColdFusion:
Thanks!
There are two things you need to watch out for, as far as I recall off the top of my head.
The first is to ensure that the .cfm file itself is saved as UTF-8 - this is a file system option, and will probably be settable in your editor. This ensures that the UTF-8 characters are correctly preserved when saving the file.
The other is that every .cfm file that includes any UTF-8 text should start with:
<cfprocessingdirective pageencoding="utf-8" />
This ensures that ColdFusion delivers the page to the browser in the correct format.
Just to be sure, when you display your working HTML, can you check the page encoding used by your browser (e.g. in Firefox you can right-click and choose Page Info)? Maybe your text is not UTF-8 encoded, which could explain the problem.
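To illustrate why an encoding mismatch garbles Vietnamese text, here is a small sketch in Python rather than ColdFusion. The helper name render_with_charset and the sample string are made up for the illustration; it simply decodes UTF-8 bytes with whatever charset the reader assumes.

```python
def render_with_charset(text, assumed_charset):
    """Encode text as UTF-8, then decode it as a reader assuming
    assumed_charset would see it."""
    return text.encode("utf-8").decode(assumed_charset)


# "Tiếng Việt", written with escapes so the source file's own
# encoding cannot affect the demonstration.
sample = "Ti\u1ebfng Vi\u1ec7t"

restored = render_with_charset(sample, "utf-8")    # round-trips cleanly
garbled = render_with_charset(sample, "latin-1")   # one character per byte
```

Each accented Vietnamese letter here takes three bytes in UTF-8, so a reader that assumes a single-byte charset shows three wrong characters in its place. That is the kind of output you get when the file, the processing directive, and the browser disagree about the encoding.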

How to paste HTML to clipboard with GTK+

How do I paste HTML to the clipboard so that it is recognized as HTML in applications such as Open Office and MS Word? It is possible when using gtkhtml or gecko if you've already rendered it, but I need a straight GTK+ solution.
You call gtk_clipboard_set_with_data or gtk_clipboard_set_with_owner, passing a GtkTargetEntry with "text/html" as the value for the target field.
It's good practice to also provide "UTF8_STRING" and "STRING" targets for applications that don't support HTML.
Here's an example of some code that does this: GEdit HTML clipboard plugin.