Import docx file with comments into emacs org (vision) - emacs

I collaborate with other researchers and frequently have the following workflow:
1. I write a draft in Emacs org, then export it to docx.
2. Other authors make edits using track changes and add comments.
3. I revise the draft in Emacs org.
For step 3, I import back the docx file manually, which typically involves:
- Accepting all track changes.
- Copying and pasting text back into the org file, making sure that I do not delete markup (pandoc can help here).
- Putting the comments in a list and making todos and further edits; often I write down a note about what I did to address the comment.
I've been looking for ways to make this process better. I found other discussions of this issue, and it boils down to this: if you can, have your collaborators edit the manuscript as a text file (not realistic for me, at least not at this point); or do some manual import similar to the one I described above.
So this post is about your thoughts / ideas regarding a great solution to importing back edited docx files that might become reality in the future, and how it could be done.
I think there are two parts here:
1. How to automatically import back text without destroying markup such as footnotes, references etc.?
2. How to automatically extract all the notes and integrate them into the Emacs org file?
For the second question, my vision would be to have some sort of comment blocks above the paragraph the comment refers to, or a list of headlines, each of them representing a comment and a link to the paragraph. A properties drawer would be a great additional feature; it could have one entry for open/closed and one entry for response / notes.
P.S.: I think this is a real barrier to using text-based manuscript writing, and it would be a huge step forward if there were a good way. Even more, with all the capabilities of Emacs org, I bet the end result would be much better than revising a paper within Word, which is just painful.

Here's how you might be able to do it:
- Assume all changes are properly marked.
- Assume you know the "base version" of your org file.
- Assume every marked change comes with a "before" and an "after".
Then, analyze the .docx (same for .odt) looking for marked changes. Ignore everything else. Take the "before" version of each change, turn it into plain text, and try to find the matching element in the org file, then replace that text with the "after" version.
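To give a feel for what that analysis might look like, here is a minimal Python sketch. It assumes the usual docx layout, where the body lives in word/document.xml, tracked insertions are wrapped in w:ins, and tracked deletions keep their text in w:delText. The function names paragraph_changes and apply_changes are made up for this illustration, and the sketch ignores moved text, changes that were inserted and then deleted again, and anything outside the main body.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_changes(docx_file):
    """Yield (before, after) plain-text pairs for paragraphs with tracked changes."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for para in root.iter(f"{{{W}}}p"):
        # Remember which text nodes sit inside a tracked insertion.
        inserted = {id(t) for ins in para.iter(f"{{{W}}}ins")
                    for t in ins.iter(f"{{{W}}}t")}
        before, after, changed = [], [], False
        for node in para.iter():
            if node.tag == f"{{{W}}}delText":    # deleted: only in "before"
                before.append(node.text or "")
                changed = True
            elif node.tag == f"{{{W}}}t":
                after.append(node.text or "")
                if id(node) in inserted:          # inserted: only in "after"
                    changed = True
                else:                             # untouched: in both versions
                    before.append(node.text or "")
        if changed:
            yield "".join(before), "".join(after)


def apply_changes(org_text, changes):
    """Replace each "before" that occurs exactly once; report the rest."""
    unmatched = []
    for before, after in changes:
        if before and org_text.count(before) == 1:
            org_text = org_text.replace(before, after)
        else:
            unmatched.append(before)
    return org_text, unmatched
```

The plain-text match is the weak point: a paragraph that contains org markup (emphasis, links, footnote references) will not match its docx rendering literally, so those paragraphs end up in the unmatched list and still need manual attention.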
For comments, you could probably try a similar approach.
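As a rough illustration for the comments, the sketch below assumes they are stored in word/comments.xml as w:comment elements carrying w:id and w:author attributes. It turns each comment into an org TODO headline with a properties drawer, along the lines the question describes. The function names and the property names COMMENT_ID, STATUS and RESPONSE are invented for this example, and linking a comment back to its paragraph is not attempted here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/comments.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_comments(docx_file):
    """Return a list of dicts with the id, author and text of each comment."""
    with zipfile.ZipFile(docx_file) as archive:
        if "word/comments.xml" not in archive.namelist():
            return []  # the document has no comments part
        root = ET.fromstring(archive.read("word/comments.xml"))
    comments = []
    for comment in root.iter(f"{{{W}}}comment"):
        # One line of plain text per paragraph inside the comment.
        text = "\n".join(
            "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
            for para in comment.iter(f"{{{W}}}p")
        )
        comments.append({
            "id": comment.get(f"{{{W}}}id"),
            "author": comment.get(f"{{{W}}}author", ""),
            "text": text,
        })
    return comments


def to_org(comments):
    """Render comments as org TODO headlines with a properties drawer."""
    lines = []
    for c in comments:
        title = c["text"].splitlines()[0] if c["text"] else "(empty comment)"
        lines += [
            f"* TODO {title}",
            ":PROPERTIES:",
            f":COMMENT_ID: {c['id']}",   # hypothetical property names
            f":AUTHOR: {c['author']}",
            ":STATUS: open",
            ":RESPONSE:",
            ":END:",
            c["text"],
            "",
        ]
    return "\n".join(lines)
```

The output of to_org could be pasted under a "Comments" heading in the manuscript, with STATUS and RESPONSE filled in by hand as each comment is addressed.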
Caveat: I have no idea how easy/hard it is to find the marked changes, extract the "before/after" info and turn it into plain text.
Oh, and this will probably only work acceptably for small localized changes, e.g. the kind of thing you might get from a reviewer. For things coming from another author who may end up making larger changes and reorganizations it'll probably break down miserably.

Related

Is there a way to prevent MS docx document editing in OpenOffice?

I know this is a rather strange question, but we have multiple authors working on one document. Some contributors use OpenOffice to edit it, although it originated in MS Word and is edited in Word by the majority. The document is quite complex, with differently structured paragraphs and fonts, bullets, numbering, embedded pictures, references to comments below the line, sections copied and pasted with source formatting instead of as plain text, etc., so it is generally "fragile" and maybe slightly exceeds what the OpenOffice authors expected in terms of MS compatibility. The result is various formatting issues: words glued together (a space is occasionally missing), page footers/headers modified or missing entirely, etc. We are unable to control the behaviour of contributors and editors to the extent I would like, so I am trying to find out whether there is a way to force users to use exclusively MS Word for a particular docx and to prevent them from using anything else. (I am not on the MS payroll; I personally moved a couple of people around me with "standard" document writing needs to OpenOffice, but the incompatibility in this case creates useless editing work for us.)
Thanks for any hint.
"whether there is a way to force users to use exclusively MS Word for a particular docx and to prevent them from using anything else"
To me, it sounds like a terrible idea to try to enforce this with a macro or similar (and it probably wouldn't work even if you tried). Instead, come up with a better workflow and communicate with anyone who may be involved so they know what to do.
First question, is the document under configuration control? For example, if a bad change is made, do you have a way of going back to a previous version? There are many different configuration management tools available, both free and commercial.
Next, I would strongly recommend making final changes with only one Office suite. Pick either LibreOffice (or Apache OpenOffice - is that what you mean by OpenOffice? The OpenOffice.org suite was forked several years ago) or MS Word to be the official editing tool, but not both.
If you pick MS Word, then people can still make preliminary changes to the document using LibreOffice. However, someone with MS Word will then need to use a Diff tool to see the changes and then use MS Word to incorporate those changes into the document. Or ideally, Track Changes would be turned on to make it easier to see what changes were made and who made them. Comments can also be added to explain why changes were made.
What is even better is to get people to send marked-up PDF files that contain their proposed changes. PDF files cannot easily be edited, which is good because it avoids the kinds of problems that led you to write this question, and the formatting will not appear differently on another computer. However, this requires a certain amount of education so that everyone agrees to do it this way, and in my experience, that's not easy with a diverse group.
If you ever see that someone has made changes to the main document using LibreOffice, you or someone else needs to go back to the latest version not edited by LibreOffice and then use MS Word to incorporate all of the new changes.
At this point, if both suites have been used to edit the document, then I would probably start off with a new blank document and copy all of the text unformatted into it. This would require redoing all tables and other formatting. Otherwise, it's likely to be nearly impossible to get a clean document, and the underlying formatting may have no end to the number of problems that keep popping up.

Is there a text editor that will "pretty print" for display without changing the underlying text?

There are two competing requirements:
1) I want to pretty print so that I can read and understand code written by my peers.
2) I don't want to check in the pretty printed code to source control because:
- It might not meet the coding standard where I work.
- When reviewing history in source control, formatting changes obscure the 'real' changes.
For discussion of problems with checking in formatting changes, see this other Stack Overflow question: Committing when changing source formatting?
At the same time, I still need to be able to edit the file and save changes. I want to be able to edit the ugly text while I see the pretty text.
Does anyone know an editor with a feature like this?

Writing a macro

I'm mostly new to programming and so I have come here for some help.
Recently, at work, the program that we have used for years has drastically changed and all of our old file types are no longer supported. This has left us completely out in the cold as to how we can access our old files without using the older software. With that being said, here is my problem with macros that I'm in need of help:
I need to be able to open a file in a specific program, copy all the text in the file, paste the text into a new Notepad document and then save the Notepad file under the file's original name as a simple text document. I need to do this for an entire folder (and eventually folders within a folder, but that can wait for now).
If I need to clarify anything let me know. Like I said, I'm new to this stuff and I'd appreciate any tips you guys could give me.
Since you mention Notepad, I'll assume that you are working in Windows. In that case, you're probably best off writing it in PowerShell. I don't have the skills for that, but if you add "Windows" and "PowerShell" to the tags, you may have a better chance of finding someone who does. (You may want to try this question over at SuperUser.)
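Whatever language ends up being used, the batch part of the job is a simple loop over the folder. Here is a minimal Python sketch of that loop. The extract_text argument is a placeholder: how to actually get the text out depends entirely on whether the old program offers a command-line export or some automation interface, which the question does not say. The function name convert_folder and its parameters are made up for this example.

```python
from pathlib import Path


def convert_folder(source_dir, target_dir, extract_text, pattern="*"):
    """Write one .txt file per input file, keeping the original base name.

    extract_text is a caller-supplied function (hypothetical here) that
    takes a Path and returns the file's text as a string.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(source.glob(pattern)):
        if not path.is_file():
            continue  # skip sub-folders for now
        out = target / (path.stem + ".txt")
        out.write_text(extract_text(path), encoding="utf-8")
        written.append(out)
    return written
```

Handling folders within folders later would mostly mean swapping glob for rglob and recreating the sub-folder structure under the target directory.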

Diff/Compare Tool That Lets Me Write Comments On Differing Lines

I'm looking for a diff/compare tool that shows differing lines from two text files, and gives me a space to comment on those differences. Ideally this application would have three panes: pane one would be file A, pane two would be file B, and pane three would be a comment I can enter on why the files are different.
We're going to be using this diff tool to compare test and production environments. Sometimes it'll be justifiable that the two files are different but we need to have a space to explain why. I'd rather not write those comments in the files themselves.
I've used TortoiseMerge, WinDiff and Beyond Compare. I like Beyond Compare the most because it lets me see the whole file, just the differing lines, or the differing lines in context.
Tools that sit inside Visual Studio or Eclipse are fine too.
It sounds to me like you might want to use a code review tool for this (even if you're not really performing code reviews). They record diffs in a database and allow comments on those diffs.
A couple free ones are:
CodeStriker - I've used this and it works pretty well, but it required more tweaking and mucking around in Perl than I'd have liked (that was a while ago, though)
Review Board - never used this, but it sure looks nice. I'm trying to get it installed at my current place of work.
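If a full review tool is more than you need, the underlying idea (record the diffs somewhere and attach comments to them, outside the files themselves) is small enough to sketch. The Python below is only an illustration using the standard difflib module; the function name differing_blocks and the record layout are invented for this example.

```python
import difflib
import json


def differing_blocks(lines_a, lines_b):
    """Return one record per differing block, each with an empty comment slot."""
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        blocks.append({
            "a_start": i1 + 1,            # 1-based line number in file A
            "a_lines": lines_a[i1:i2],
            "b_start": j1 + 1,            # 1-based line number in file B
            "b_lines": lines_b[j1:j2],
            "comment": "",                # filled in by the reviewer
        })
    return blocks


test_cfg = ["host=test", "debug=true", "port=80"]
prod_cfg = ["host=prod", "debug=true", "port=80"]
review = differing_blocks(test_cfg, prod_cfg)
review[0]["comment"] = "Host differs by design between environments."
# The review can be kept in a separate file next to the two configs.
print(json.dumps(review, indent=2))
```

Keeping the comments in a separate JSON file means neither the test nor the production file has to be touched, which is the requirement stated in the question.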
I would like to suggest CodeGen. It includes not only text compare tools but also codec and database tools.
For more detail, see the GitHub repository:
https://github.com/work7z/CodeGen

How to highlight the differences between two versions of a text in .NET web app?

I have been supporting a web application at work for our Call Center unit for about 2 years now. The app is written in ASP.NET 3.5 with a SQL Server 2005 database. I’ve been asked to expand the call detail section to allow agents to edit the current call note with the ability to revert back to its previous version. Now, that’s all cool, but now the manager wants to be able to click on any particular note and see all edits with changes highlighted in yellow (and if something was deleted, he wants to SEE the deleted text crossed out). Actually, what I need is very similar to how Stack Overflow handles edits on their questions.
I’ve been thinking about how to go about this, and after doing research and Googling of course, I am still unsure which route to take. I am fairly new to .NET development. Any ideas on the best technique for highlighting the changes in the UI? I am afraid I am going to have to store a copy of the entire note each time they make a change, because the manager wants to be able to easily review notes and revert back to ANY version (not just the most recent one) before sending the monthly call report off to our VIP customers.
Since this department OFTEN changes their mind on things, I want to make sure the new functionality is scalable and easy to maintain. Any ideas would be greatly appreciated. I am really just looking for someone to point me in the right direction; maybe there are some tools out there that can be useful, recommended keywords for a Google search, etc.
This will be difficult to do.
You'll need a "text editor" control that can not only edit the text, but which can also tell you what changes were made.
You then need to store not only the final text string, but also the list of changes.
You'll then need to be able to display the text plus changes, using strike-outs, and different colors for inserts vs. changes.
You'll need to do this not only for the changes of a single user; you'll need to store each user's changes in the database, and will need to be able to display all the changes at once.
Your manager should be really sure he needs this.
Some tools for doing the diff for you can be found at Any decent text diff/merge engine for .NET?.
This would entail storing every version, like you say. This should allow you to implement it similarly to SO. I seem to recall reading or hearing Jeff mention it, likely in one of the SO podcasts, but I wasn't able to find it.
Easiest would be to store the text for each revision, then when the user wants to see the diff use a diff tool to generate the highlighted text.
Here is some Javascript diff code:
http://ejohn.org/projects/javascript-diff-algorithm/
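To show how little is needed once every revision is stored, here is a minimal word-level sketch of the "diff on demand" idea. It is written in Python with the standard difflib module purely for illustration; a .NET app would use one of the diff libraries linked elsewhere in this thread. The function name highlight_changes is made up, and the sketch normalizes whitespace by splitting on it.

```python
import difflib
import html


def highlight_changes(old, new):
    """Return HTML where deleted words are in <del> and added words in <ins>."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(added)
        if tag in ("delete", "replace"):
            parts.append(f"<del>{removed}</del>")   # shown crossed out
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{added}</ins>")     # style yellow via CSS
    return " ".join(parts)


print(highlight_changes("Refund issued today", "Refund denied today"))
# prints: Refund <del>issued</del> <ins>denied</ins> today
```

Browsers render del with a strike-through by default; a style rule such as a yellow background on ins would give the highlighting the manager asked for. Comparing any two stored revisions this way also covers the "see all edits" requirement without storing a separate list of changes.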
If all the computers have Word installed, you may be able to use a Word control to accomplish this. TortoiseSVN has scripts in its program directory which can take two Word documents and produce a document with changes highlighted. To see this, create c:\aaa.doc and c:\bbb.doc, then install TortoiseSVN and run:
wscript.exe "C:\program files\tortoisesvn\Diff-Scripts\diff-doc.js" c:\aaa.doc c:\bbb.doc //E:javascript
I think you should see http://en.wikipedia.org/wiki/Revision_control