Building an automated script to copy a page of text, translate it in Google Translator and paste it in Word - powershell

So, as the title says, I would like to make an automated script that is going to take all the text from one PDF page, copy it, paste it into Google Translate and then copy the translated text into another Microsoft Word document.
Since that PDF has a lot of pages (150+), I thought it may be easier to make an automated script to do that.
What language should I use, how complicated would it be, and would I actually save time with this script, given that I have to learn the tool first? I have some programming experience (C++, JavaScript, PHP), but I do not have a strong grasp of algorithms (Flood Fill and the like).
Thanks in advance!
EDIT: I found that I could use AutoIt for scripting, but I don't know whether I would be better off with AutoIt or PowerShell. I also want to learn something that would enable me to create other scripts (for example, to automate some processes I do in Camtasia Studio). So, AutoIt or PowerShell?

As an AutoIt user I would say AutoIt.
Copying text out of PDFs is not quite as simple as you might imagine. Mileage will vary on how the PDF was created, and there are several methods you can use:
Most PDFs will have most of the text in the file itself, allowing you to get the text using a simple method like this.
This method uses zlib on the PDF, presumably to decompress its streams. I have never tried it, so I can't say for certain.
There are a variety of examples of using third-party programs to do this, which may be better. There is one using Debenu and another using XPDF.
Automating other programs such as Acrobat should be possible. Acrobat has an API that can be used, though I'm not aware of it already being wrapped in AutoIt.
As for the rest of the requirements, there is a UDF for translating with Google Translate here, and the Word UDF is a standard one that comes with the AutoIt installation.
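The zlib method mentioned above most likely relies on the fact that PDF content streams are commonly compressed with the Flate filter, which zlib can inflate. Below is a minimal Python sketch of that idea, not the AutoIt code the answer refers to. The function name `inflate_pdf_streams` and the regular expression are illustrative assumptions, and the result is raw PDF drawing operators, not clean text.

```python
import re
import zlib

# Naive pattern: everything between a "stream" keyword and the next "endstream".
# Assumption: the compressed data never contains the literal bytes "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_pdf_streams(pdf_bytes):
    """Return the inflated bytes of every stream that zlib can decompress."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() tolerates the newline left before "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate data (uncompressed, another filter, or encrypted).
            continue
    return results
```

The inflated content still has to be parsed for text-showing operators such as `Tj`, and fonts with custom encodings will not map directly to readable characters, which is one reason the third-party tools may be the better route.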

Related

Create a desktop app that generates MS Word files

I'm now working with tons of MS Word files and trying to find a way in improving my workflow.
I'm wondering if there's a way to create a desktop app which can preview certain parts from a Word file, select them and generate a new one with controls in Word's text style, paragraph, etc.
I suppose this would require the MS Word API and some particular framework. I've been using Electron/Node.js to create some cross-platform applications, and I wonder whether it can do this as well. Or is there any reference that I can dig into?
Sorry if this sounds like a rookie question. I've tried searching but still can't figure out where to start.
There are three possible ways to get the job done:
Automate MS Word to get the job done. See Automate MS Office Applications using Python win32com module for more information. For example:
import win32com.client
word = win32com.client.Dispatch("Word.Application")
Use the Open XML SDK for generating Word documents at runtime, see Welcome to the Open XML SDK 2.5 for Office for more information.
Use third-party components.
If you are on Windows, there seems to be a way to access Word files from Python: https://www.blog.pythonlibrary.org/2010/07/16/python-and-microsoft-office-using-pywin32/. Maybe from Node too.
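For the Open XML route, it may help to see that a .docx file is a ZIP archive of XML parts. The following is a minimal sketch using only the Python standard library, not the Open XML SDK itself. The function name `build_docx` is hypothetical, and the document produced carries plain paragraphs with no styles.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx with one plain paragraph per string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as `output.docx` should give a document that Word can open. Since the format is just ZIP plus XML, the same approach should be available from Node.js with a ZIP library, so Word itself does not need to be installed.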

Powershell import encoded module

I am writing a script with a lot of modules, but I don't really want the user to see my source code, so I figured I would encode everything in base64, since the user won't be able to decode it even though it is that basic.
I tried to somehow add an encoded module but no luck.
So my question is -
Is it possible to import a base64 encoded module to the main script file?
If you have any better solutions to hide source code please share, I would be more than happy to try them out.
P.S. I tried to find some info on making .dll files but found out I would have to rewrite the script in C# (if I didn't miss anything).
Also I tried to put all modules into one encoded file, but then the file gets too big and Powershell is not able to process it anymore.
You've got two options, which can be combined if you would like to be extremely sure that no one will be able to access your code. Making your code into an exe was already mentioned. There are several projects to do this, but This one is nice as it is wholly contained within PS.
The other, imo better, method is to use an obfuscator, which will take your code, replace variable names with nonsense strings, and make other changes so that your code is very difficult to read. It's still possible to work out your code, but generally not worth the effort. You can find a working one Here.
But I do have to add that obfuscating your code really goes against the PowerShell ethos, and I recommend against doing it unless you have some sort of requirement to do so passed down from management. And please note that this is NOT an acceptable method of obscuring code that includes passwords, API keys, or any other information that needs to be secured, as all of those are quite easy to extract from code that has been obfuscated this way.
You could change your ps1 to an exe file by using
https://ps2exe.codeplex.com/
You'd still be able to get at the code if you tried, but it would prevent a casual look.
Why do you want to hide the modules?
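On the original base64 idea: base64 is an encoding, not encryption, so anyone can reverse it in one line. A small Python sketch of the round trip follows. It uses the UTF-16LE form that PowerShell's `-EncodedCommand` parameter expects, and the function names are illustrative assumptions.

```python
import base64


def encode_for_powershell(script):
    """Encode script text as base64 over UTF-16LE, as -EncodedCommand expects."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def decode_from_powershell(encoded):
    """Recover the original script text from its base64 form."""
    return base64.b64decode(encoded).decode("utf-16-le")


if __name__ == "__main__":
    original = "Write-Output 'secret logic'"
    encoded = encode_for_powershell(original)
    print(encoded)
    # Decoding needs no key, so the source is only hidden from a casual look.
    print(decode_from_powershell(encoded))
```

This is why the answers above point to obfuscation or packaging instead: the encoded form can stop a casual look but offers no real protection.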

modify a template PDF using iTextSharp server side

So I'm brand spanking new to iTextSharp, and I know I have quite a bit of reading ahead of me, but in an attempt to shave a bunch of time off a relatively trivial task I thought I'd reach out to the Stack brain trust.
I have a very simple goal: starting with a template PDF, I need to create a new PDF with a few of the characters changed. We're talking single characters on each page. I don't need a detailed answer complete with code (although that'd be awesome) so much as a general list of the tools and APIs I'm going to need.
The data I need will already be in a db which I could output to xml files if need be.
So far it looks like my template will need the "editable" characters tagged somehow (not sure how to do that yet), and then I can modify the copy using PdfStamper. Is that the right path or is there a better way?
Thanks for any insight.

Is there currently a way to get Emacs muse-mode to output RTF, ODT or DOC format?

Muse is a special mode in emacs that can be used as a wiki. It has multiple output formats like static HTML pages, LaTeX, PDF etc.
But sometimes I need to output something that less tech-savvy people can edit/correct and send back to me.
I think either RTF, ODT or DOC would do the trick.
My problem is that muse only supports HTML, LaTeX, TexInfo and XML out of the box.
Implementing my own output format is currently not an option, as I cannot program in elisp and learning it would take too much time.
I searched for a way to convert to or use markdown as pandoc can convert to RTF. But I found only the following discussion that does not solve my problem.
My last resort would be to convert to HTML and then to RTF, ODT or DOC but AFAIK the results are far from great.
I would appreciate a solution that can be automated (with custom scripts).
I think importing HTML into MS Word (or a compatible word processor) should work. As I remember, OpenOffice has some scripting support, so you can launch it and run commands inside it.
Another way is to write an RTF export backend. It shouldn't be too complicated, although there may be a lot of details to take into account. If you go this way, please write to the Muse mailing list, and I'll try to help you.
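Since the question mentions pandoc and asks for something scriptable, the HTML detour can be sketched in Python as follows. This assumes `pandoc` is installed and on the PATH and that Muse has already published the page as HTML. The function names are hypothetical, and the output quality concern raised in the question still applies. Pandoc writes DOCX, not the legacy DOC format.

```python
import os
import subprocess

# Output formats pandoc can write that a word processor can open.
SUPPORTED_TARGETS = {"rtf", "odt", "docx"}


def build_pandoc_command(html_path, target_format):
    """Return the pandoc argument list that converts an HTML file."""
    if target_format not in SUPPORTED_TARGETS:
        raise ValueError("unsupported target format: %s" % target_format)
    output_path = os.path.splitext(html_path)[0] + "." + target_format
    return [
        "pandoc", html_path,
        "--from", "html",
        "--to", target_format,
        "--standalone",  # produce a complete document, which RTF needs
        "--output", output_path,
    ]


def convert(html_path, target_format):
    """Run pandoc and return the path of the converted file."""
    command = build_pandoc_command(html_path, target_format)
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `convert("page.html", "odt")` after each Muse publish step would then produce `page.odt` next to the HTML file.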

Is there a Platform-independent Web-based replacement for Word Templates?

The above Title is my Manager's words, not mine. :)
This is a follow-up to a question that I posted previously. After reading my assessment on the impacts of converting Word Templates from PC to Mac, I have now been asked to investigate whether Word Templates can be replaced with a "Platform-independent Web-based solution" (her words, not mine). She has suggested using Adobe Forms (ie. Adobe Designer).
Personally, I think the only truly platform-independent web-based solution is text files or html forms. What do other people think?
It's called WordprocessingML (aka. WordXML, WordML)...
Overview of WordprocessingML [Word 2003 XML Reference] at http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx.
MSDN Search for "WordML" at http://social.msdn.microsoft.com/Search/en-US?query=WordML&ac=3
It could be called XForms...
The Web was supposed to be about platform-independent electronic documents. In other words, if you truly want platform independence, then I agree with you and your forms should be in HTML. Yet, HTML forms are really not a good development platform. That is why Adobe, Microsoft, and others provide "form" solutions. XForms is an attempt to make developing and using HTML forms more flexible, overcome their limitations, and provide a platform-independent object model for completing HTML forms. You might want to look at XForms at http://www.w3.org/MarkUp/Forms/.
But, I wouldn't call it PDF
In my opinion, working with PDF files is difficult. I have not looked at the file format specification, but I have heard it is not trivial. Moreover, you need a custom editor and you are locked into one vendor, which is Adobe. (Yet, there are open-source projects and other vendors who support the file format.) Adobe is not known for creating programs that are easy to use.
My Suggestion
If you are already using Word, then moving to WordML should be fairly easy. You can easily convert your existing Word documents into WordML by simply saving them as XML from the Save Dialog; therefore, you can automate this process through code. In addition, I believe WordML supports form templates (the actual form) and data documents (the actual data for a form).
It's called PDF...
At the core (and without the millions of extra "unnecessary features"), that's exactly the niche that Adobe PDFs were designed to fill.
I'd suggest you look more into Adobe Acrobat Professional for more info. Although, I don't think there's any good way to directly convert Word docs to PDF format.
Note: This question should be moved to Super User since it's not really programming related
Google Docs meets those requirements of a Platform-independent Web-based solution. Your mileage will vary with Google Docs though - if you just want to use it for letters, it's good. Much beyond that, it's rather limited. Unless you get the Premier (read: Corporate) version which you have to pay for, you won't be able to programmatically fiddle with the templates.
If you want a "Platform-independent solution", go with ODF or OOXML. You can make either "web-based" to your heart's content, maybe with HTML5 or another solution such as Flash or Silverlight.