How to diff rewrapped text? - diff

When editing documents I always stick to a maximum line width of 80 or 150 characters, depending on what I am writing (code, text, etc.). If I change only a little, the whole paragraph shifts, and many lines are re-broken to fit the given line width. How do I diff this to see the actual change and not the rewrapping artifacts?
Example, textwidth=30:
The actual changes are rather tiny:
line 9 insert: "Now I change a little"
line 15 insert: "Fill in here something and write totally new stuff with much more lines. "
line 18 change: s/Duis/TYPO/
The fact that I use (g)vimdiff here does not matter if other software can accomplish the desired diff.
Of course, software can wrap text automatically when it reaches the window border, so I also tried using line breaks only at the end of each paragraph. This does not work well because diffs are line-based by default, so even a small change in a paragraph marks the whole line, that is, the whole paragraph, as changed :(.

GNU wdiff does a word-by-word diff, not treating spaces and new lines any differently. One can even find vim syntax files for it (e.g. here).
$ cat file1
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Aenean vel molestie
nulla. Pellentesque placerat lacus vel
eros malesuada tristique. Nulla vitae
volutpat justo. Donec est mauris,
$ cat file2
Lorem amet, consectetur adipiscing some
inserted text! elit. Aenean vel molestie
nulla. Pellentesque placerat lacus vel
eros malesuada replacement. Nulla vitae
volutpat justo. Donec est mauris,
$ wdiff file1 file2
Lorem [-ipsum dolor sit-] amet, consectetur
adipiscing {+some inserted text!+} elit. Aenean vel molestie
nulla. Pellentesque placerat lacus vel
eros malesuada [-tristique.-] {+replacement.+} Nulla vitae
volutpat justo. Donec est mauris,
([- ... -] is deleted text, {+ ... +} is inserted text).
(There are other diff programs that do a similar thing: e.g. adiff, and maybe some of the ones listed in https://stackoverflow.com/questions/12625/best-diff-tool)
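To illustrate why a word-based comparison ignores rewrapping, here is a minimal Python sketch. It is not wdiff's actual algorithm; it simply splits both texts on any whitespace (so line breaks and spaces are treated alike) and uses the standard library's difflib.SequenceMatcher. The function name word_diff and the choice to reuse wdiff's [-...-] and {+...+} markers are assumptions made for illustration.

```python
import difflib


def word_diff(old, new):
    """Return a wdiff-style word diff of two strings (illustrative sketch)."""
    # split() with no arguments splits on any whitespace run, so a newline
    # and a space are equivalent: rewrapping alone produces no difference.
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words only present in the old text
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words only present in the new text
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("one two\nthree four", "one two three\nfive four"))
```

Unlike wdiff, this sketch does not preserve the original line breaks in its output; it joins everything with single spaces.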

I like Beyond Compare for this kind of side-by-side file comparison. It also lets you do folder comparisons and bit-level comparisons. You can right-click one file to select it as the left-hand side and then another as the right-hand side, or select two files and right-click Compare to bring them both up straight away.

I use DiffMerge which is free and available on many platforms.


Dart: Is there a way to split strings into sentences without using Dart's split method?

I'm looking to split a paragraph of text into individual sentences using Dart. The problem I am having is that sentences can end in a number of punctuation marks (e.g. '.', '!', '?') and in some cases (such as the Japanese language), sentences can end in unique symbols (e.g. '。').
Additionally, Dart's split method removes the split value from the string. For example, "Hello World!" becomes "Hello World" when using the code text.split('!');
I've looked around at Dart packages available but I'm unable to find anything that does what I'm looking for.
Ideally, I'm looking for something similar to BreakIterator in Java which allows the programmer to define which locale they wish to use when detecting punctuation and also maintains the punctuation mark when splitting the string into sentences. I'm happy to use a solution in Dart that doesn't automatically detect sentence endings based on Locale but if this isn't available I would like to have the ability to define all sentence endings to look for when splitting a string.
Any help is appreciated. Thank you in advance.
It can be done using a regex, something like this:
String str1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vulputate odio eros, sit amet ultrices ipsum auctor sed. Mauris in faucibus elit. Nulla quam orci? ultrices a leo a, feugiat pharetra ex. Nunc et ipsum lorem. Integer quis congue nisi! In et sem eget leo ullamcorper consectetur dignissim vitae massa。Nam quis erat ac tellus laoreet posuere. Vivamus eget sapien eget neque euismod mollis.";
void main() {
  // Regular expression: a run of word characters, whitespace, commas or
  // apostrophes, followed by optional sentence-ending punctuation and
  // any trailing whitespace.
  final re = RegExp(r"(\w|\s|,|')+[。.?!]*\s*");
  // Iterate over all the matches:
  for (final Match m in re.allMatches(str1)) {
    final match = m.group(0);
    print("match: $match");
  }
}
output:
// match: Lorem ipsum dolor sit amet, consectetur adipiscing elit.
// match: In vulputate odio eros, sit amet ultrices ipsum auctor sed.
// match: Mauris in faucibus elit.
// match: Nulla quam orci?
// match: ultrices a leo a, feugiat pharetra ex.
// match: Nunc et ipsum lorem.
// match: Integer quis congue nisi!
// match: In et sem eget leo ullamcorper consectetur dignissim vitae massa。
// match: Nam quis erat ac tellus laoreet posuere.
// match: Vivamus eget sapien eget neque euismod mollis.
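The same idea, with the sentence endings passed in by the caller as the question asks, can be sketched in Python. This is only an illustration under stated assumptions: the function name split_sentences and its terminators parameter are made up here, and the approach is a plain character-class regex, not a locale-aware segmenter like Java's BreakIterator. It will, for instance, split after abbreviations such as "e.g.".

```python
import re


def split_sentences(text, terminators=".!?。"):
    """Split text into sentences, keeping the ending punctuation (sketch)."""
    # Escape the caller-supplied endings so they are safe in a character class.
    cls = re.escape(terminators)
    # A sentence is a run of non-terminators followed by one or more
    # terminators, or by the end of the text for a trailing fragment.
    pattern = re.compile(rf"[^{cls}]+(?:[{cls}]+|$)")
    sentences = (m.group(0).strip() for m in pattern.finditer(text))
    return [s for s in sentences if s]


if __name__ == "__main__":
    for sentence in split_sentences("Hello World! Nulla quam orci? 元気です。Done."):
        print(sentence)
```

Because the terminators are a parameter, the set of sentence endings can be extended per language without touching the pattern itself.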

Is it possible to crop an Element using Elm's std library? If not, is there a 3rd party lib for this?

I often run across a situation where I've constructed an Element or Form and wish to crop the view down to a given area (i.e. for scrolling within a smaller rectangle) though I haven't been able to find any methods for this within their respective modules.
Is it possible to do this using Elm's std library? If not, are there any 3rd-party libraries capable of doing this?
Otherwise, perhaps there is a better way of achieving this?
Any help or suggestions appreciated!
No scrollbars (using the std library)
I can't find a way to crop and still have scrollbars with the current Graphics.Element. What is possible is to crop without scrollbars, either through a container that's smaller than its contents or by resizing an element with size. I think the container way is more robust, as resizing an image will actually warp the image.
Here's an example:
import Graphics.Element exposing (..)
import Text
string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec consectetur ante hendrerit. Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean ut gravida lorem. Ut turpis felis, pulvinar a semper sed, adipiscing id dolor. Pellentesque auctor nisi id magna consequat sagittis. Curabitur dapibus enim sit amet elit pharetra tincidunt feugiat nisl imperdiet. Ut convallis libero in urna ultrices accumsan. Donec sed odio eros. Donec viverra mi quis quam pulvinar at malesuada arcu rhoncus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. In rutrum accumsan ultricies. Mauris vitae nisi at sem facilisis semper ac in est."
main : Element
main =
    let
        element =
            leftAligned (Text.fromString string)
                |> container 400 300 topLeft
    in
    container 205 200 topLeft element
Scrollbars (via "3rd party" library)
If you want scrollbars, you'll probably need at least a little bit of html from elm-html. Note that the author of the library is also the author of Elm, so it's not quite 3rd party :P . You can keep it minimal by using conversions to html and from html and wrapping it in a div with style attributes that define the smaller size and the right overflow property. As long as that div has a known size, it should be easy to convert back to an Element.

How do I compose format=flowed emails that include hanging indents with vim?

Is there a good way to configure vim to send format=flowed emails that include hanging indents?
My complete vimrc (for testing purposes) is:
set nocompatible
set fo+=awn
set tw=72
set ai
I'm typing something like:
1. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam
posuere dui lorem, et condimentum nulla. Sed pharetra justo nec ante
fringilla non mattis nisi blandit. Donec molestie ligula dolor.
Nulla facilisi. Aliquam vel nulla elit, mollis facilisis metus. Sed
id eros a ante blandit convallis id sit amet elit. Duis malesuada
lobortis leo a placerat. Sed ut ipsum nisl. Sed pretium mauris vitae
velit sollicitudin iaculis.
vim adds a trailing space to each line except the last, per set fo+=w. It also adds spaces for the hanging indent. It looks great!
My mail client sets the format=flowed header. The result when this email is viewed in either Mail.app or mutt is not pretty:
1. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam posuere dui lorem, et condimentum nulla. Sed pharetra justo nec ante fringilla non mattis nisi blandit. Donec molestie ligula dolor. Nulla facilisi. Aliquam vel nulla elit, mollis facilisis metus. Sed id eros a ante blandit convallis id sit amet elit. Duis malesuada lobortis leo a placerat. Sed ut ipsum nisl. Sed pretium mauris vitae velit sollicitudin iaculis.
The paragraph wraps correctly, in the sense that resizing the reader client reflows it (which is not what you'll see here on stackoverflow, but you get the idea). The problem is, there are 5 spaces between "Etiam" and "posuere" and all the other lines that have been joined back together.
Is there a fix for this in vim? Or is this a limitation of the format=flowed spec? How do other people handle this?
This is a limitation of the "format=flowed" MIME parameter as specified in RFC 3676. There is nothing in the specification that would allow a client to recognize the leading spaces as ornaments intended only for plaintext versions of the mail.
Section 4.1 of the RFC states:
If the first character of a line is a space, the line has been space-stuffed (see Section 4.4). Logically, this leading space is deleted before examining the line further (that is, before checking for flowed).
The referenced "space-stuffing" from Section 4.4:
Space-stuffing adds a single space to the start of any line which needs protection when the message is generated. On reception, if the first character of a line is a space, it is logically deleted. This occurs after the test for a quoted line (which logically counts and deletes any quote marks), and before the test for a flowed line.
So an RFC 3676-compliant mail client would remove a single leading space from each line that begins with one, and then (optionally) remove any line breaks that follow a trailing space character. This process would not touch the remaining leading whitespace.
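The effect can be reproduced with a small Python sketch of the receiving side. It is a deliberately simplified reading of RFC 3676, under these assumptions: no quoted lines, DelSp=no (the trailing space of a flowed line is kept), and lines separated by "\n". The function name decode_flowed is made up for illustration.

```python
def decode_flowed(text):
    """Simplified format=flowed decoding (no quoting, DelSp=no)."""
    paragraphs = []
    current = ""
    for line in text.split("\n"):
        # Space-unstuffing: logically delete exactly one leading space.
        if line.startswith(" "):
            line = line[1:]
        # A line ending in a space is flowed, except the signature
        # separator "-- ", which is always treated as a fixed line.
        if line.endswith(" ") and line != "-- ":
            current += line
        else:
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    message = "1. Lorem ipsum \n    dolor sit \n    amet."
    # Only one of the four indent spaces is removed from each continuation
    # line, so the rest of the indent ends up inside the joined paragraph.
    print(decode_flowed(message))
```

In this example the continuation lines are indented by four spaces; after unstuffing, three remain, and together with the trailing space of the previous line they leave a four-space gap in the middle of the paragraph.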

selection of text and then split it into three parts with jQuery for AJAX

I would like to take a bunch of HTML inside a container (DIV) and let the user select part of it. It's not an "editable area" that I am looking for, as we don't want the user to be able to overwrite or change the text, just mark it.
After the user has selected it, I would like to know what was selected BUT also WHERE that selected part IS.
E.g.
We have a bullet list and the user selects bullet lines 3 and 4.
We have a Headline1 and three paragraphs. Then the user selects part of the middle paragraph. I would like to know where in that paragraph.
I've researched a little, and from what I understand MSIE has a problem with selection when it comes to the startPos and endPos of the selection.
Secondly, what if the marked text occurs multiple times inside the whole container?
Here is an example:
<div id="markable">
<h1>Here is a nice headline</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque non
tempor metus. Ut malesuada posuere nunc eu venenatis. Donec sagittis tempus
neque, tempus iaculis sapien consectetur id.</p>
<p>Nulla tempus porttitor pellentesque. Curabitur cursus dictum felis quis tempus.
Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia
Curae; Quisque fringilla massa id libero commodo venenatis.</p>
<ol>
<li>here is a bullet line #1</li>
<li>here is a bullet line #2</li>
<li>here is a bullet line #3</li>
<li>here is a bullet line #4</li>
<li>here is a bullet line #5</li>
<li>here is a bullet line #6</li>
</ol>
<h2>here is a sub-headline</h2>
<p>Aenean auctor fringilla dolor. Aenean pulvinar tortor sed lacus auctor cursus.
Sed sit amet imperdiet massa. Class aptent taciti sociosqu ad litora torquent
per conubia nostra, per inceptos himenaeos. Fusce lectus neque, rhoncus et
malesuada at, blandit at risus. Vivamus rhoncus ante vel erat mollis
consequat.</p>
</div>
The problem here is that if the user selects "tempus", it's not good enough to know the word; I also need to know WHICH occurrence of the word (in what paragraph/headline/bullet element) it is.
The reason is that we want "readers" to be able to mark things of interest. Sometimes whole paragraphs, sometimes just a single word or headline.
The perfect solution
would be if we could somehow detect which "element" in the DOM (counting from the top, I guess) is selected, and secondly how much of that particular element (start and end point) is selected.
Because then we could make some sort of Ajax call back to our ASP.NET backend which tells it what has been marked, and then do whatever ...
I've found some online code editors that do a BUNCH of the above plus a lot more than needed, but I believe the solution for this is much simpler. I just can't find the proper way to get started with a jQuery solution.
In hope of a jQuery Yoda reading this. :-)
Sorry, this is only a partial answer. It will give you the indexes of the elements in which the selection starts and ends in all browsers, but the offsets of the beginning and end of the selection will only work in Gecko and WebKit browsers. IE only supports a TextRange object, which is a bit of a mystery to me and a bit of a pain to work with (the link at the bottom of this answer has an example implementation covering all browsers).
This solution returns indexes of elements that the selection contains (in relation to your #markable container) and indexes of start and end selection (in relation to their containing nodes).
In this example I am using events to capture the elements which contain the selection (this gets around browser differences), but you can easily do this without events as well (for Firefox, Opera, Chrome and Safari), as the Selection object gives you anchorNode and focusNode, which are the DOM nodes in which the selection starts and ends respectively (more info here: http://www.quirksmode.org/dom/range_intro.html).
<html>
<head>
<script src="http://code.jquery.com/jquery-1.4.4.js"></script>
<script>
var end;
var start;
var indexes = [];
var endIndex;
var startIndex;
$(document).ready(function () {
    // On mouseup, record the index of the element in which the selection ends.
    $("#markable").mouseup(function (event) {
        end = event.target;
        indexes.push($("*", "#markable").index($(end)));
        // Normalize start and end just in case someone selects 'backwards'.
        indexes.sort(sortASC);
        event.stopPropagation();
    });
    // On mousedown, reset and record the index of the element in which the selection starts.
    $("#markable").mousedown(function (event) {
        indexes.length = 0;
        start = event.target;
        event.stopPropagation();
        indexes.push($("*", "#markable").index($(start)));
    });
    $(".button").click(function () {
        var sel = getSel();
        alert("Index of the element the selection starts in (relative to #markable): " + indexes[0] + "\n" +
            "Index of the beginning of the selection in the start node: " + sel.anchorOffset + "\n" +
            "Index of the element the selection ends in (relative to #markable): " + indexes[1] + "\n" +
            "Index of the end of the selection in the end node: " + sel.focusOffset);
    });
});
function sortASC(a, b) { return a - b; }
function sortDESC(a, b) { return b - a; }
// Return the current selection object (a TextRange in old IE),
// or undefined if the browser supports none of these APIs.
function getSel() {
    if (window.getSelection) {
        return window.getSelection();
    } else if (document.getSelection) {
        return document.getSelection();
    } else if (document.selection) {
        return document.selection.createRange();
    }
}
</script>
</head>
<body>
<div id="markable">
<h1>Here is a nice headline</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque non
tempor metus. Ut malesuada posuere nunc eu venenatis. Donec sagittis tempus
neque, tempus iaculis sapien consectetur id.</p>
<p>Nulla tempus porttitor pellentesque. Curabitur cursus dictum felis quis tempus.
Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia
Curae; Quisque fringilla massa id libero commodo venenatis.</p>
<ol>
<li>here is a bullet line #1</li>
<li>here is a bullet line #2</li>
<li>here is a bullet line #3</li>
<li>here is a bullet line #4</li>
<li>here is a bullet line #5</li>
<li>here is a bullet line #6</li>
</ol>
<h2>here is a sub-headline</h2>
<p>Aenean auctor fringilla dolor. Aenean pulvinar tortor sed lacus auctor cursus.
Sed sit amet imperdiet massa. Class aptent taciti sociosqu ad litora torquent
per conubia nostra, per inceptos himenaeos. Fusce lectus neque, rhoncus et
malesuada at, blandit at risus. Vivamus rhoncus ante vel erat mollis
consequat.</p>
</div>
<input type=button class=button value="Get selection data">
</body>
</html>
And here is a link which will give you more of a cross browser solution (scroll down to example 2)
http://help.dottoro.com/ljjmnrqr.php
EDIT: For IE you need to use document.body.createTextRange() to get a text range. I am still not sure how you get the equivalent of anchorOffset but the following link might be helpful:
http://bytes.com/topic/javascript/answers/629503-ie-selection-range-set-range-start-click-position-get-char-offset
Here is a cross browser library that will do all you want for you
http://code.google.com/p/rangy/

Analyzing and storing text in a data structure

I hope you understand what I want to do. It is hard to choose the best words, because English is not my first language and I distrust automatic translators. I will try to explain as well as I can.
I was thinking about analyzing a long text. Suppose, for example, that I have a string divided into paragraphs.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla vitae elit libero, a pharetra augue. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras mattis consectetur purus sit amet fermentum.
Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Aenean eu leo quam. Pellentesque ornare sem lacinia quam venenatis vestibulum. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur blandit tempus porttitor. Maecenas sed diam eget risus varius blandit sit amet non magna.
I would like to store this string in an array or something similar, in a way I can find the length or location of the two paragraphs very quickly. For example (pseudocode):
Array => {
paragraphs => {
"Lorem ipsum dolor sit amet, [...] fermentum.",
...
}
}
I don't really know whether this has a name. I suppose there is a lot of theory about how to do this type of task. I am really interested in practices that take performance into account when processing a large amount of text. I would like to have something to study and read carefully.
Any help would be greatly appreciated. Thanks in advance,
—Alberto
Perhaps read up on Apache UIMA; it's all about analyzing unstructured information, text analysis being a major component of it.
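For the narrower goal in the question (finding the length or location of a paragraph quickly), one common approach is to keep the text as a single string and store only the character offsets of each paragraph. The Python sketch below is an illustration, not something from UIMA; the class name ParagraphIndex and its methods are hypothetical, and it assumes paragraphs are separated by one or more newlines.

```python
import bisect
import re


class ParagraphIndex:
    """Offsets of each paragraph in a text, for fast length/location lookups."""

    def __init__(self, text):
        self.text = text
        # (start, end) character offsets of every non-empty line;
        # each such line is treated as one paragraph.
        self.spans = [(m.start(), m.end()) for m in re.finditer(r"[^\n]+", text)]
        self.starts = [start for start, _ in self.spans]

    def paragraph(self, i):
        start, end = self.spans[i]
        return self.text[start:end]

    def length(self, i):
        start, end = self.spans[i]
        return end - start

    def locate(self, offset):
        """Index of the paragraph containing a character offset, or None."""
        # Binary search over the sorted start offsets: O(log n) per lookup.
        i = bisect.bisect_right(self.starts, offset) - 1
        if i >= 0 and offset < self.spans[i][1]:
            return i
        return None


if __name__ == "__main__":
    index = ParagraphIndex("First paragraph.\nSecond one here.")
    print(index.length(0), index.locate(17), index.paragraph(1))
```

Building the index is a single pass over the text; after that, length lookups are constant time and no paragraph text is copied until paragraph() is called.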