I'm in the process of updating a heap of manuals which, for whatever reason, have heading formats applied to empty lines. These empty lines sometimes appear in the table of contents, which is annoying the end recipients somewhat.
The documents have a mix of styles I need to replace, i.e. the paragraphs in the boxes below, but I can't work out a way to find an empty line with a specific style that may or may not differ from the previous line.
I'm getting the impression it's not doable, but thought I'd ask here before doing it manually. The main effect is on navigation, but occasionally on the TOC as well, usually when compatibility is maintained.
I love Emacs and Org-Mode. But I can only stand to use Org Mode in the clean view (or whatever it's called - with org-indent-mode on).
My problem is that I often want to use headers that don't have a bullet in front of them. I want one asterisk to be the start of the list, not the header.
Example:
List 1
List 2
Header 1
List 3
List 4
But when I try to do this, Header 1 gets indented to the level of List 2.
I know just turning off org-indent-mode and getting used to that is one solution. But is there a way to reset the indentation for Header 1?
The things you are talking about changing are pretty fundamental to org-mode; basically you are trying to change the org-mode syntax. The reason Header 1 in your example is not being dedented is that org-mode does not see it as a headline, because headlines by definition start with leading stars. Also, while it is technically supported to use * to identify a plain list item, this is not recommended and can cause some unexpected behavior (see footnote 1 here).
That being said, you can have some control over the appearance of headlines. For example, you can use the org-bullets package. You can then define the bullets to use in place of * like this:
(setq org-bullets-bullet-list
'("◉" "◎" "⚫" "○" "►" "◇"))
which will define the bullets used for the first six levels of headlines. You can replace the bullets in that list with other utf-8 symbols, and you can even use " " as one of the symbols, so that your Headlines will be preceded by a single space. However, note that this only affects the way headlines are displayed; they will still be preceded by * in the actual file.
I know it is not very helpful, but my overall suggestion would be to stick with the org-mode syntax if you want to use org-mode, i.e., use a structure like this:
- List one
- List two
* Header 1
- List three
- List four
with * starting a headline, and - starting a plain list. Since org-mode files are just plain text, the magic of that mode depends heavily on those files having a set structure. In my own experience, if you try to change that structure (another example is changing timestamp formats), it will cause more headaches than it relieves, and cause a lot of the functionality that makes org-mode so great to break.
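To make the distinction concrete, here is a rough Python sketch of how a line gets classified. The two regular expressions are my own simplified approximations of the Org rules (stars at column 0 followed by a space make a headline; -, +, an indented *, or a number start a plain list item), not what org-mode actually uses internally.

```python
import re

# Assumption: simplified approximations of Org's syntax rules, for illustration only.
HEADLINE = re.compile(r"^\*+ ")                          # stars at column 0, then a space
LIST_ITEM = re.compile(r"^(\s*[-+]|\s+\*|\s*\d+[.)]) ")  # -, +, indented *, or 1. / 1)

def classify(line):
    """Return 'headline', 'list item', or 'text' for one line of an Org file."""
    if HEADLINE.match(line):
        return "headline"
    if LIST_ITEM.match(line):
        return "list item"
    return "text"

for sample in ["* Header 1", "- List one", "  * starred item", "Header 1"]:
    print(f"{sample!r:20} -> {classify(sample)}")
```

A bare "Header 1" comes out as plain text, which is why it simply continues whatever precedes it instead of resetting the indentation.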
Just as a side note: I prefer a cleaner view as well, and one option I like to enable in addition to org-indent-mode is (setq org-hide-leading-stars t), which will display only a single star/bullet per headline (although the leading stars will still be present in the actual text file).
How can you work with the two sides of a bilingual/parallel text?
I know how to run diff and ediff on a text to spot little differences, but a bilingual text will have two completely different sides (except for the paragraphs, number of chapters and other structural elements like notes). End of line and end of paragraph are certainly useful to mark the units of the two sides.
Is it possible to open two buffers, side by side, and tell what matches what?
This is a hard problem, but I dug up an old blog post I read a while ago that is relevant (and even mentions emacs for preprocessing):
https://languagefixation.wordpress.com/2011/02/09/how-to-create-parallel-texts-for-language-learning-part-1/
Especially check out Part 2.
Beyond that, my suggestion is twofold:
1) Operate on small parts at a time (a chapter or less) and not an entire book
2) Utilize alignment tools available to generate metadata which emacs uses to just 'prettify' the buffer
As there's no existing solution (that I know of or can find), you'll have to get dirty with elisp and create a major or minor mode to colorize matching segments and/or navigate segments.
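As a rough illustration of the "generate metadata first" idea, here is a small Python sketch that pairs the paragraphs of two texts by position. It assumes both sides have already been preprocessed so that paragraphs are separated by blank lines and line up one-to-one; the function names and the sample sentences are made up for the example, and real alignment tools do far more than this.

```python
def paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(" ".join(current))
            current = []
    if current:
        blocks.append(" ".join(current))
    return blocks

def align(left_text, right_text):
    """Pair paragraphs by position; a count mismatch means manual fixing is needed."""
    left, right = paragraphs(left_text), paragraphs(right_text)
    if len(left) != len(right):
        raise ValueError(f"paragraph counts differ: {len(left)} vs {len(right)}")
    return list(zip(left, right))

# Made-up sample text, one chapter-sized chunk per side.
english = "Good morning.\n\nThe weather is nice\ntoday."
french = "Bonjour.\n\nIl fait beau aujourd'hui."
for number, (source, target) in enumerate(align(english, french), start=1):
    print(number, source, "|", target)
```

The resulting pairs (or just their numbers) are the kind of metadata an Emacs mode could then use to highlight or jump between matching segments.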
Quick Hack
However, I hacked some elisp together that takes preprocessed text and uses Emacs's concepts of 'paragraph' and 'sentence' to colorize the buffer; it's a little verbose, so I stuck it in a gist:
https://gist.github.com/terranpro/3175bb9f3ed00b3a145c
It's pretty ugly but should give you a start; just run it once in each of the text buffers. But be aware that you'll need to have the text already ready in terms of emacs' paragraphs and sentences (two spaces after a period!!). Hope it gives you a decent starting point.
fname = dir('*sir');
dayH = zeros(length(fname),1360,3600);
for i=1:length(fname)
dayH(i,:,:) = loadsir(fname(i).name);
end
fname = dir('*sir');
dayH = cell(1,length(fname));
for i=1:2
dayH{i} = loadsir(fname(i).name);
end
Basically it loads all my files. I have a separate .m file called loadsir that loads those specialized files. The output of the .sir files will be an array 1360x3600.
Right now that code is crashing, saying, "Cannot display summaries of variables with more than 524288 elements." I guess it's because 1360 x 3600 is about 4.9 million elements?
Putting Serg's comment as an answer:
Most likely you missed a semicolon (;) somewhere in loadsir. Matlab then thinks you want to print the output, which it won't do due to the large number of elements.
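For reference, the numbers from the question can be checked quickly. The sketch below is plain arithmetic in Python (not Matlab), and the 8 bytes per element is an assumption based on zeros() creating doubles; loadsir may return something else.

```python
rows, cols = 1360, 3600          # array size stated in the question
display_limit = 524288           # limit quoted in the error message
elements = rows * cols
bytes_per_file = elements * 8    # assumption: values stored as 8-byte doubles

print(elements)                  # 4896000
print(elements > display_limit)  # True
print(bytes_per_file / 2**20)    # size of one file's array in MiB
```

So a single file's array is already well over the display limit, which is why an accidental print fails.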
Additionally, to prevent such things from happening in the future:
Matlab is an interpreted language, meaning no compilation is necessary. Code can be parsed while you type it, which allows for things like auto-correct. This sort of thing is already included in standard Matlab. If you don't already, code in Matlab's own editor every now and then. It warns you of such silly mistakes/errors (and a lot more) in several ways, including the right vertical bar in the editor. The little square at the top right of the window should always be green. If it's orange or red, there are things to be improved or corrected, respectively.
The right vertical bar is an overview of all the lines in your file that leave room for improvement. If a small orange/red bar appears somewhere, a mouseover will tell you what's wrong with what line. Clicking it will navigate the editor to the line, which will likely be wavy-underlined in either orange or red. Mousing over the line often gives useful suggestions, and <alt>+<enter> is often enough to fix the simple mistakes. I find it an indispensable tool when developing larger applications in Matlab.
You can of course configure which errors/warnings this tool ("code analyzer", formerly "mlint") displays. Sometimes, there will be a warning about an inefficiency that you simply cannot work around. Add an OK-directive behind the line to suppress it (%#ok), but don't make a habit of suppressing anything and everything "annoying", because that will of course completely defeat the purpose of the code analyzer :)
For the past year and a half, I've maintained a monolithic buffer in Org Mode for my engineering notes with my current employer. Despite containing mostly pointers to other documents, this file has become quite large by human standards (48,290 lines of text), while remaining trivially searchable and editable through programmatic means (read: grep and Org Mode tag search).
One thing bothers me, though. When I perform a tag search using Org Mode 6.33x, Org's sparse tree view retains the folded representation of unmatched trees within the buffer (that is, content preceded by a single asterisk, *). This is generally useful for smaller buffers or those better organized into a single tree with multiple branches. However, this doesn't work especially well for documentation where each new tree is generated chronologically, one for each day, as I've been doing.
Before I continue, I'll note that my workaround is inherent in what I've just asked, as are the obvious alterations in my documentation habits with this buffer. However, the following questions remain:
1) Why does Org Mode organize trees in this manner when performing sparse tag searching? The technical details are self-evident, the UX decisions less so.
2) If I wished to correct this issue with a script written in Emacs Lisp, what hooks and commands should I explore in more detail to restructure the document view? Writing overrides for the standard commands (for example, org-match-sparse-tree) is already self-evident.
Thank you in advance.
As you already noticed, the problem only affects the top-level headings. The good thing is that in org-mode you can easily demote all headings with a few keystrokes, which avoids the problem. Cleaning up afterwards also takes just a few keystrokes.
Step-by-step instructions:
Mark the full buffer
Call M-right (org-metaright, which demotes every heading in the marked region)
Input * root\n at the beginning of the file
Now, build up your subtree and do what you want with it.
When done, you can remove * root\n at the beginning of the file and promote the headings again with M-left.
I have the impression that you can even leave the overall heading where it is for your application.
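If you would rather see the effect of these steps outside Emacs, here is a small Python sketch of the same transformation on the raw text. The function names and the sample notes are made up for the example; inside Emacs the keystrokes above do the job.

```python
import re

def wrap_under_root(org_text, root_title="root"):
    """Demote every headline one level and put them under a new top-level headline."""
    demoted = [re.sub(r"^(\*+ )", r"*\1", line) for line in org_text.splitlines()]
    return "\n".join([f"* {root_title}"] + demoted)

def unwrap_root(org_text):
    """Undo wrap_under_root: drop the first line and promote every headline."""
    lines = org_text.splitlines()[1:]
    return "\n".join(re.sub(r"^\*(\*+ )", r"\1", line) for line in lines)

# Made-up sample: one top-level tree per day, as described in the question.
notes = "* 2010-01-04\nsome text\n** meeting :work:\n* 2010-01-05"
wrapped = wrap_under_root(notes)
print(wrapped)
print(unwrap_root(wrapped) == notes)
```

After wrapping, every daily tree is a child of the single root headline, so a sparse tree only has one top-level entry left to show.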
My goal is to come up with a script that tracks the point at which a line was added, even if the line is subsequently modified or moved around (both of which confuse traditional VCS 'blame' scripts). I've done some minor background research (see bottom) but didn't find anything useful. I have a concept for how to proceed, but the runtime would be atrocious (there's a factorial involved).
The two missing features are tracking edited-in-place lines separate from a deletion-and-addition of that line, and tracking entire functions moved around so they're in different hunks. For those experienced with diff but unfamiliar with the terminology, a subsequence is a contiguous group of + or - lines, with a type of either delete (all -), add (all +), or replace (a combination). I need more information, on moves and edit-in-place lines, vaguely alluded to in an entry on c2: DiffAlgorithm (paragraph starts with "My favorite mode"). Does anyone know what that is? (seems to be based on Tichy, see bottom.)
Here's more info on the two missing features:
no concept of a change on a line, (a fourth type, something like edit-in-place). In this hunk, the parent of 'bc' is 'b' but 'd' is new and isn't a descendant of 'b':
a
-b
+bc
+d
The workaround for this isn't too complicated if the position of edits is the same: just an expanded version of markup_instraline_changes, but comparing edit distance on all equal-sized subsets of old and new lines.
no concept of "moving" code that preserves the ownership of the lines, e.g. this diff shouldn't alter the ownership of "line", although its position changes.
a
-line
c
+line
This could be dealt with in the same way but with much worse runtime (instead of only checking single blocks marked 'replace', you'd need to check Levenshtein distance between all added against all removed lines) and with likely false positives (some, like whitespace-only lines, aren't relevant to my problem).
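To illustrate the brute-force idea for both cases, here is a rough Python sketch that compares every added line against every removed line. It swaps in difflib's similarity ratio for a true Levenshtein distance, and the function name and the 0.6 threshold are arbitrary choices for the example. Lines are used as dictionary keys, so duplicate added lines collapse into one entry.

```python
import difflib

def trace_added_lines(old_lines, new_lines, threshold=0.6):
    """Map each added line to its most similar removed line, or None if it looks new."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])

    parents = {}
    for line in added:
        if not line.strip():
            continue  # whitespace-only lines are not worth tracing
        best, best_ratio = None, threshold
        for candidate in removed:
            ratio = difflib.SequenceMatcher(None, candidate, line).ratio()
            if ratio >= best_ratio:
                best, best_ratio = candidate, ratio
        parents[line] = best
    return parents

# The two hunks from above: an in-place edit plus a new line, and a pure move.
print(trace_added_lines(["a", "b"], ["a", "bc", "d"]))
print(trace_added_lines(["a", "line", "c"], ["a", "c", "line"]))
```

The inner loop is the all-against-all comparison, so the cost grows with the product of added and removed lines, and a low threshold will produce the false positives mentioned above.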
Research I've done: reading about gestalt pattern matching (Ratcliff and Obershelp, used in Python's difflib) and An O(ND) Difference Algorithm and its Variations (EW Myers).
After posting the question, I found references to Tichy84, which appears to be The string-to-string correction problem with block moves (which I haven't read yet), according to Walter Tichy's paper a year later on RCS.
You appear to be interested in origin tracking, the problem of tracing where a line came from.
Ideally, you'd instrument the editor to remember how things were edited, and store the edits with the text in your repository, thus solving the problem trivially, but none of us software engineers seem to be smart enough to implement this simple idea.
As a weak substitute, one can look at a sequence of source code revisions from the repository and reconstruct a "plausible" history of changes. This is what you seem to be doing by proposing the use of "diff". As you've noted, diff doesn't understand the idea of "moving" or "copying".
SD Smart Differencer tools compare source text by parsing the text according to the language it is in, discovering the code structures, and computing least-Levenshtein differences in terms of programming language constructs (identifiers, expressions, statements, blocks, classes, ...) and abstract editing operators "insert", "delete", "copy", "move" and "rename identifier within a scope". They produce diff-like output, a little richer because they tell you line/column -> line/column with different editing operations.
Obviously the "move" and "copy" edits are the ones most interesting to you in terms of tracking specific lines (well, specific language constructs). Our experience is that code goes through lots of copy and edits, too, which I suspect won't surprise you.
These tools are in Beta, and are presently available for COBOL, Java and C#. Lots of other languages are in the pipe, because the SmartDifferencer is built on top of a language-parameterized infrastructure, DMS Software Reengineering Toolkit, which has quite a number of already existing, robust language grammars.
I think the question of how much a line can be edited while remaining a descendant of some previously written line is very subjective and based on context, both things that a computer cannot work with. You'd have to specify some sort of configurable minimum similarity on lines in your program, I think... The other problem is that it is entirely possible for two identical lines to be written completely independently (for example incrementing the value of some variable), and this will be quite a common thing, so your desired algorithm quite often won't give truthful or useful information about a line.
I would like to suggest an algorithm for this though (which makes tons of hopefully obvious assumptions by the way) so here goes:
Convert both texts to lists of lines
Copy the lists and Strip all whitespace from inside of each line
Delete blank lines from both lists
Repeat
    Do a Levenshtein distance from the old to new lists ...
     ... keeping all intermediate data
    Find all lines in the new text that were matched with old lines
        Mark the line in both new/old original lists as having been matched
        Delete the line from the new text (the copy)
        Optional: If some matched lines are in a contiguous sequence ...
         ... in either original text assign them to a grouping as well!
Until there is nothing left but unmatchable lines in the new text
Group together sequences of unmatched lines in both old and new texts ...
 ... which are contiguous in the original text
    Attribute each with the line match before and after
Run through all groups in old text
    If any match before and after attributes with new text groups for each
        //If they are inside the same area basically
        Concatenate all the lines in both groups (separately and in order)
            Include a character to represent where the line breaks are
        Repeat
            Do a Levenshtein distance on these concatenations
            If there are any significantly similar subsequences found
                //I can't really define this but basically a high proportion
                //of matches throughout all lines involved on both sides
                For each matched subsequence
                    Find suitable newline spots to delimit the subsequence
                    Mark these lines matched in the original text
                    //Warning splitting+merging of lines possible
                    //No 1-to-1 correspondence of lines here!
                    Delete the subsequence from the new text group concat
                    Delete also from the new text working list of lines
        Until there are no significantly similar subsequences found
Optional: Regroup based on remaining unmatched lines and repeat last step
    //Not sure if there's any point in trying that at the moment
Concatenate the ENTIRE list of whitespaced-removed lines in the old text
Concatenate the lines in new text also (should only be unmatched ones left)
    //Newline character added in both cases
Repeat
    Do Levenshtein distance on these concatenations
    Match similar subsequences in the same way as earlier on
        //Don't need to worry deleting from list of new lines any more though
        //Similarity criteria should be a fair bit stricter here to avoid
        // spurious matchings. Already matched lines in old text might have
        // even higher strictness, since all of copy/edit/move would be rare
While you still have matchings
//Anything left unmatched in the old text is deleted stuff
//Anything left unmatched in the new text is newly written by the author
Print out some output to show all the comparing results!
Well, hopefully you can see the basics of what I mean with that completely untested algorithm. Find obvious matches first, and verbatim moves of chunks of decreasing size, then compare stuff that's likely to be similar, then look for anything else which is similar, but both modified and moved: probably just coincidentally similar.
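As a starting point, here is a rough Python sketch of just the first Repeat/Until loop from the outline: strip whitespace, drop blank lines, then repeatedly match whatever still lines up verbatim and delete it from the working copy of the new text. It uses difflib's matching blocks instead of a Levenshtein table, which is my substitution, and the function names are made up. The fuzzy phases are left out.

```python
import difflib

def normalize(line):
    """Strip all whitespace inside a line, as in the second step of the outline."""
    return "".join(line.split())

def match_verbatim(old_lines, new_lines):
    """Return {new_index: old_index} for lines that survive verbatim, moved or not."""
    old = [(i, normalize(l)) for i, l in enumerate(old_lines) if l.strip()]
    new = [(j, normalize(l)) for j, l in enumerate(new_lines) if l.strip()]
    old_texts = [text for _, text in old]
    matches = {}
    while new:
        matcher = difflib.SequenceMatcher(
            None, old_texts, [text for _, text in new], autojunk=False)
        blocks = [b for b in matcher.get_matching_blocks() if b.size]
        if not blocks:
            break  # nothing but unmatchable lines left in the new text
        used_new = set()
        for a, b, size in blocks:
            for k in range(size):
                matches[new[b + k][0]] = old[a + k][0]
                used_new.add(b + k)
        # Delete matched lines from the working copy of the new text only,
        # so later passes can still pick up moved or copied lines.
        new = [entry for n, entry in enumerate(new) if n not in used_new]
    return matches

print(match_verbatim(["a", "line", "c"], ["a", "c", "line", "brand new"]))
```

Anything missing from the returned mapping is what the later, fuzzier phases would have to deal with.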
Well, if you try implementing this, tell me how it works out, what details you changed, and what kind of assignments you made to the various variables involved... I expect there will be some test cases where it works brilliantly and others where it just fails abysmally due to some massive oversight. The idea is that most stuff will be matched before you get to the inefficient final loop, and indeed the previous one.