MS Word (2007) - increased file size after removing content

MS Word (2007 in my case, though I have seen the same with 2010; I haven't used 2013 yet) surprises me with its file sizes. I have a standard .docx of 96 kB; after changing one character (a 7 to a 6) and saving again, it was 101 kB. I knew that Word sometimes saves additional information, so I searched a bit and found that the Office button menu (the round button in the upper left corner) has Prepare and then Inspect Document. I chose to remove the Properties and also Headers and Footers. After saving, the file size was 104 kB.
So what is MS Word doing when it saves a document after small changes or deletions that makes the file size increase? And how can I get rid of this behaviour?

Word file sizes can increase if there's "dross" in the file: sometimes, a document becomes damaged and left-overs accumulate. If the damage is not critical, Word will work around it, but the "bad" information often remains in the file. Under some circumstances, Word encounters the problem every time it saves, which will cause file size to increase.
It can help to save the document to another file format, such as RTF, HTML or an earlier version of Word, and then open that file in Word. Another thing you can try is to copy/paste the content to a new document WITHOUT any section breaks and WITHOUT the last paragraph mark (because "dross" often accumulates in the non-visible section information).
But these attempts should always be done on a COPY of the document because information can get lost in the dual conversion process.

According to support.microsoft.com/en-us/kb/111277, the file size of your Word document may increase unexpectedly in the following situations:
Allow Fast Saves option is turned on. A fast save appends the changes to the end of your document, which increases the size of the document. By contrast, when you turn off the Allow Fast Saves option and save the document, Word performs a full save, which incorporates all your revisions (instead of appending them). If you perform a full save after a file was fast saved, Word reduces the size of the file.
Note: Even with the Allow Fast Saves option turned on, Word periodically performs a full save of your document. As a result, the file size of your document may change substantially between save operations.
The option to Embed TrueType Fonts is selected. To check this, on the Tools menu, click Options and then click the Save tab.
You are automatically saving versions of a document. On the File menu, click Versions. Check to see whether Automatically Save a Version on Close is selected.
If you open a document from a previous version of Word, Word may temporarily allocate more disk space for the document than is actually necessary.
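A .docx file is a ZIP archive of XML parts and media, so one way to see where the extra kilobytes went is to list the parts and their sizes before and after a save. The following is a minimal Python sketch; the function name `docx_part_sizes` is made up for illustration, and it only reports sizes without changing anything.

```python
import zipfile


def docx_part_sizes(path):
    """Return (name, compressed_size, uncompressed_size) for each part
    in the archive, sorted with the largest compressed part first.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        parts = [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]
    # The compressed size is what contributes to the size of the .docx on disk.
    return sorted(parts, key=lambda part: part[1], reverse=True)
```

Running this on a copy of the document made before and after the edit, and comparing the two lists, should show whether the growth is in word/document.xml, in embedded media, or in some other part.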


Is there an Emacs read-only or view mode that allows inserting some text?

Here's the use case: I'm writing a novel in Emacs (in org-mode). One part of my writing/editing flow is to read over some large portion of what I've written, collecting notes/possible edits/etc as I go. The sort of thing you'd do, on paper, by printing it all out and then writing notes in the margin.
I want to prevent myself from actually doing any writing during this kind of review -- but that's surprisingly hard. If the buffer is editable, I start to type a brief note about a fix, then find myself restructuring or fixing a sentence, and next thing I know, I've spent five minutes polishing a single paragraph. This not only slows me down, it breaks my ability to imagine a reader's response.
I've tried just putting the buffer in view-mode, and that sort of works -- but then it's laborious to try to identify the places I want to go back and review/fix up.
My ideal would be something in view-mode that I genuinely can't edit, but where, as I move the cursor through it, I could hit some key combination that lets me enter a brief note in the minibuffer. That note would then get inserted into the main buffer at point, possibly inside brackets or a comment or some such.
Does anyone know of something like that? Or have any pointers to something similar which I could try to adapt?
You can easily set bookmarks at any locations. And bookmarks can contain annotations.
If you use library Bookmark+:
The annotations are in Org Mode by default, and they can even be separate files (by default they are part of the bookmarks themselves, so stored in your bookmarks file).
You can bookmark not just a position but also a region of text, whether a sentence, paragraph, page, or an arbitrary span of text.
You can automatically name bookmarks as you set them, if you don't care about the names.
Updated after OP's comment saying "I prefer to shove the comments/questions/notes directly into the text of the novel. Because I end up adding/deleting/moving text a ton, and I want the notes to move with the text":
Bookmarks move with the surrounding text. That is, they generally get relocated automatically, since the surrounding text is recorded as part of the bookmark, and when jumping to a bookmark that text is looked for.
Occasionally the context has changed so much that a bookmark can't be relocated automatically, and you are prompted to relocate it manually.
But yes, bookmarks are stored in a bookmark file, separately from the files they target. There are both advantages and disadvantages to this feature. Advantages include (1) removing clutter from the text (annotations, including notes about possible text changes are metadata), (2) immediate access to particular text locations from anywhere, (3) a separate, persistent record/history of work or thoughts on it, (4) you can have multiple, separate sets of bookmarks/annotations for the same target text.
One thing you might find handy, when using bookmarks especially for annotating a particular file: C-x p C-l switches to a bookmark file that has only bookmarks for the current file or buffer, creating such a file on the fly if none exists. (This is available only with Bookmark+.)

VSCode: activeTextEditor encoding

Is there any way to get current document encoding (that is in the bottom bar) in my extension code?
Something like vscode.window.activeTextEditor.encoding
This does not appear to be possible.
Since it's nearly impossible to prove a negative, the rest of this answer documents what I explored.
The string "encoding" does not appear (in this sense) anywhere in the API docs nor in the index.d.ts file it is derived from. (With VSCode 1.37.1, current as of writing.)
I dug into the vscode sources to see if there might be a clever solution, but came up empty. The code that executes when the encoding is changed by the user is in editorStatus.ts, class ChangeEncodingAction. This makes its way to textFileEditorModel.ts, function updatePreferredEncoding, which sets preferredEncoding. That field controls what happens when the file is saved, and is used to populate the status indicator, but doesn't go anywhere else I can find.
Reading the status indicator itself does not appear possible since the API allows extensions to create new indicators with window.createStatusBarItem but not enumerate existing ones. And directly accessing the DOM is not possible.
I also came up empty searching through VSCode issues related to encoding, both open and closed, but only skimmed the most recent ~100 closed issue titles.
Alternatives
My main suggestion at this point would be to file an enhancement request on the VSCode GitHub repository.
It should also be possible to do something with reflection but of course it would be fragile.
Finally, the encoding controls how the document in memory (a sequence of characters) maps to a file on disk (a sequence of bytes). Depending on what you're trying to do, it might work to speculatively encode the document in several encodings and compare each to what is on disk (so long as the file is not dirty).
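The speculative-encoding idea can be sketched as follows. This is written in Python only to show the logic; an extension would do the equivalent in TypeScript with the document text and the bytes read from disk. The function name `matching_encodings` and the candidate list are assumptions chosen for illustration, and the sketch assumes the in-memory text uses the same line endings as the file.

```python
def matching_encodings(
    text,
    disk_bytes,
    candidates=("utf-8", "utf-8-sig", "utf-16-le", "utf-16-be", "cp1252", "latin-1"),
):
    """Return the candidate encodings under which `text` encodes to
    exactly the bytes found on disk."""
    matches = []
    for name in candidates:
        try:
            encoded = text.encode(name)
        except UnicodeEncodeError:
            # The text contains characters this encoding cannot represent.
            continue
        if encoded == disk_bytes:
            matches.append(name)
    return matches
```

The result can be ambiguous: pure ASCII text encodes identically under several of these candidates, so more than one name may come back, and an empty list means none of the candidates fits.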

Prevent Word 2010 from saving o:gfxdata base64 or uuencoded VML?

I am working with .docx files containing several drawing canvases with images inserted and some lines and arrows drawn in Word 2010. I am using 2010 format with no compatibility mode.
Word inserts an o:gfxdata attribute into each v:shape and v:group element and fills it with ascii encoded something. From what I have read it may be a copy of the VML describing the v:shape or v:group. I don't know if I just don't know what to look for, but I cannot determine what this data is for as its removal has no apparent effect on my ability to read or edit the document in Word 2003, 2007, or 2010.
It does swell the document.xml to almost twice the (apparent) necessary size. This considerably slows OpenTBS' processing so I would like to remove it, if possible. Does anyone know of a way to tell Word 2010 to quit saving this extra data? Or what it is for? I have really struggled to find any documentation on it beyond this post.
Edit:
Here is a sample .docx. The document.xml is ~141KB and OpenTBS takes an average of 10.35 seconds to create a file that includes this as a subtemplate 21 times. If I remove all of the o:gfxdata attributes, the file size is reduced to ~37KB and OpenTBS takes only 2.99 seconds to produce the same file.
Edit 2:
After further investigation, it appears the removal of the o:gfxdata may cause Word 2003 with an older Compatibility Pack installed to object to the file with the following error:
"This is a pre-release version of the Compatibility Pack and can open pre-release Office 2007 files only. Do you want to check for a newer version of the Compatibility Pack?"
I have been able to open the file by installing a newer compatibility pack - though it prompts the user about the incompatibility and converts the file in order to open it. This does not damage my file, but it is something to look out for.
The o:gfxdata attribute is poorly documented on the web.
According to your investigations, it's some kind of compatibility extra information.
You can delete those attributes in your template using OpenTBS.
The cleaning can be done once on your template without any merging, and then save the cleaned template as a new template. Or you can perform the cleaning each time you open the template.
Cleaning the DOCX file:
// Remove every o:gfxdata attribute from $TBS->Source.
while ($x = clsTbsXmlLoc::FindStartTagHavingAtt($TBS->Source, 'o:gfxdata', 0)) {
    $x->ReplaceAtt('o:gfxdata', '');
    $TBS->Source = str_replace(' o:gfxdata=""', '', $TBS->Source);
}
Note that the class clsTbsXmlLoc is provided with OpenTBS and is undocumented.
The code should work since OpenTBS 1.8.0. (which is currently in stable beta version).
I've noticed that once the o:gfxdata attributes are deleted, they do not come back immediately when you edit the docx.
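If the cleaning is done once, outside OpenTBS, the same removal can be applied to the text of document.xml with a regular expression. The following Python sketch is not part of OpenTBS; the name `strip_gfxdata` is hypothetical, and it assumes the attribute values are written in double quotes. Extracting document.xml from the .docx and writing it back are not shown.

```python
import re

# Matches the attribute together with the whitespace in front of it,
# so that removing it leaves the rest of the start tag intact.
GFXDATA_ATTR = re.compile(r'\s+o:gfxdata="[^"]*"')


def strip_gfxdata(xml_text):
    """Return xml_text with every o:gfxdata attribute removed."""
    return GFXDATA_ATTR.sub("", xml_text)
```

As with the OpenTBS approach, this is best tried on a copy of the template first, given the Compatibility Pack behaviour described in the question.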

A rotating log file in perl

I have implemented a log file that stores the CPU and memory state of a process every minute. I have limited the maximum size of the file to 3 MB (that's enough for my purpose).
The script is called by a cron job every minute; it logs the details for that minute and renames the file as "Log_.log".
When the size reaches "3MB - 100 bytes", I reset the file pointer to the beginning, overwrite the first entry in the log file, and rename the file as "Log_<0+some offset>.log".
Since I am renaming the file every minute to record the file pointer position, is this a good/efficient approach?
I do not want to maintain more than one log file for this purpose.
Another option is to keep the file pointer position in a separate file, but that means yet another file! I'd rather not maintain one if the renaming approach is good enough :)
Thanks in Advance.
Are you an engineer? This is a nice example of some simple task, solved by a perfectly working but overly complex solution.
Unless the content you put in takes exactly as many bytes as the content you take out, writing "in" a file will actually cause the whole following part after your writing position to be rewritten to disk. Append is much cheaper.
Renaming the file to store the pointer works, but it's not very elegant and makes things more complex (for one, your process needs write permission on the directory in which the file resides; otherwise write access to two files would be sufficient).
Unless disk space is an issue (and really, it rarely is), your approach is less efficient than say, append everything to a file, and rotate the file when it reaches its maximum size. This way you always have the last 3MB of logs available, and maximum 3MB more in your current file. It will make parsing the file a lot easier too, instead of recalculating the entire pointer position thing.
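The append-and-rotate approach can be sketched as follows. The question is about Perl; this is a Python sketch of the logic only, and the function name `append_with_rotation`, the ".1" backup suffix and the UTF-8 encoding are assumptions made for illustration.

```python
import os

MAX_BYTES = 3 * 1024 * 1024  # 3 MB limit from the question


def append_with_rotation(path, line, max_bytes=MAX_BYTES):
    """Append one log entry to `path`.

    If the entry would push the file past `max_bytes`, the current file
    is first moved to `path + ".1"` (replacing any older backup) and the
    entry starts a fresh file.
    """
    entry = line.rstrip("\n") + "\n"
    try:
        current_size = os.path.getsize(path)
    except FileNotFoundError:
        current_size = 0
    if current_size and current_size + len(entry.encode("utf-8")) > max_bytes:
        os.replace(path, path + ".1")
    with open(path, "a", encoding="utf-8") as log:
        log.write(entry)
```

Each cron run would call this once with the line for that minute. No file pointer has to be stored anywhere, because the position to write at is always the end of the current file.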
Update to answer your comment:
Renaming a file every minute (or even every second) shouldn't slow down your system significantly, don't worry about that.
Our concerns are mainly with "why you think you need to rename the file". It's not better technically, it's not better from a logical point of view, and it makes a lot of other (future) tasks harder. You could store the file pointer in a separate file, or at the end of your file, and there are better^H^H^H^H^H^H simpler solutions that don't require the file pointer at all.
I'm confused why you would rename your file. What does this accomplish?
Are the log entries fixed size? Or variable size?
If the entries are fixed size, then there is no trouble in re-writing the existing file from the start: you won't ever have incomplete entries in your file, and if you are writing a counter or timestamps to the file, it should be clear where the 'cursor' is located.
If the entries are variable size, then you should probably not begin re-writing the file from the beginning without somehow making it clear where the 'cursor' is located in the file, and write code that is resilient to reading truncated log entries.
Can you re-use existing tools such as RRDtool?

Word Document with images File size

I am placing screenshots into an MS Word document. When I save the document, I am not sure which format the images are stored in. The document is becoming very large. Is there any option in MS Word to save the document as a smaller file?
Whilst this isn't a programming-related question, I'll attempt to answer what I think is your question.
All images saved in Word are stored at their original resolution, at their original size, regardless of any resizing/cropping that is performed in Word itself. If you want to reduce file-size, crop/compress the images externally before inserting them into a Word document.
Look at Word Tips sites like
http://www.klariti.com/microsoft-word/Reduce-Microsoft-Word-File-Size3.shtml
and apply the solution.
A quick technique I use to reduce the size is to open MS Paint via Start Menu > Run and paste the screenshot in there. Then you can save the file as a JPG and insert the JPG into your Word document via Insert > Picture > From File.