Extract first 50 words from a string (Perl)

I write in Perl.
I need to split a string into its first 50 words (or the entire text if it has fewer than 50 words) and the remaining words (an empty string if the text has no more than 50 words).
In both parts the word separators should be preserved: a newline should remain a newline and a space should remain a space.

Assuming that by word you mean just a sequence of non-whitespace characters, this can be done with a single regex. The one below looks for up to N-1 consecutive sequences of non-whitespace characters followed by whitespace characters, and then a further stretch of non-whitespace characters. This is the first part of the string. Any following whitespace is skipped, and then the rest of the string forms the second part.
I have used the /s modifier so that a dot . within the regex matches any character, including newlines. The /x modifier allows for insignificant whitespace within the regex to make it more readable.
Thanks to @knarf for the data.
use strict;
use warnings;
my $text = 'Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Donec hendrerit tempor tellus. Donec pretium posuere
tellus. Proin quam nisl, tincidunt et, mattis eget, convallis nec,
purus. Cum sociis natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus. Nulla posuere. Donec vitae dolor. Nullam
tristique diam non turpis. Cras placerat accumsan nulla. Nullam
rutrum. Nam vestibulum accumsan nisl.';
my ($first, $rest) = wsplit($text, 50);
print $first, "\n\n";
print $rest, "\n";
sub wsplit {
my ($s, $n) = @_;
--$n;
$s =~ / ( (?: \S+ \s+ ){0,$n} \S+ ) \s* (.*) /xs;
}
output
Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Donec hendrerit tempor tellus. Donec pretium posuere
tellus. Proin quam nisl, tincidunt et, mattis eget, convallis nec,
purus. Cum sociis natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus. Nulla posuere. Donec vitae dolor. Nullam
tristique diam non turpis. Cras placerat
accumsan nulla. Nullam
rutrum. Nam vestibulum accumsan nisl.
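For readers more at home outside Perl, the same single-regex idea can be sketched in Python. This is only an illustration under the same assumption (a word is a run of non-whitespace characters); the function name wsplit is borrowed from the Perl above, and returning a pair of empty strings when there are no words at all is my own choice, not something the answer specifies.

```python
import re

def wsplit(text, n):
    """Split text into its first n whitespace-delimited words and the rest.

    Assumes n >= 1. Whitespace inside each part is preserved; the
    whitespace between the two parts is skipped, as in the Perl regex.
    """
    # Up to n-1 "word + whitespace" groups, then one more word.
    pattern = re.compile(r"((?:\S+\s+){0,%d}\S+)\s*(.*)" % (n - 1), re.DOTALL)
    match = pattern.search(text)
    if match is None:
        # No words at all (empty or whitespace-only input).
        return "", ""
    return match.group(1), match.group(2)

first, rest = wsplit("one two\nthree four five", 3)
print(repr(first))
print(repr(rest))
```

re.DOTALL plays the role of the /s modifier, so the final (.*) also swallows newlines.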

I came up with this naive way but I guess there's a better one using a single regex.
use strict;
use warnings;
use Data::Dumper;
my $text = 'Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Donec hendrerit tempor tellus. Donec pretium posuere
tellus. Proin quam nisl, tincidunt et, mattis eget, convallis nec,
purus. Cum sociis natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus. Nulla posuere. Donec vitae dolor. Nullam
tristique diam non turpis. Cras placerat accumsan nulla. Nullam
rutrum. Nam vestibulum accumsan nisl.';
sub wsplit {
my ($s, $words) = @_;
my $pos = length $s;
my $n = 0;
while ($s =~ /\S+/g) {
$n++;
if ($n == $words) {
$pos = pos $s;
last;
}
}
return [substr($s, 0, $pos), substr($s, $pos)]
}
print Dumper(wsplit($text, 8));
Output:
$VAR1 = [
'Lorem ipsum dolor sit amet, consectetuer adipiscing
elit.',
' Donec hendrerit tempor tellus. Donec pretium posuere
tellus. Proin quam nisl, tincidunt et, mattis eget, convallis nec,
purus. Cum sociis natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus. Nulla posuere. Donec vitae dolor. Nullam
tristique diam non turpis. Cras placerat accumsan nulla. Nullam
rutrum. Nam vestibulum accumsan nisl.'
];
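The scanning approach above translates fairly directly to Python. The following is a rough sketch, not a drop-in replacement: wsplit_scan is a made-up name, and re.finditer stands in for Perl's /g match in scalar context with pos. Unlike the single-regex version, this one leaves the separator at the start of the second part, so joining the two parts gives back the original string.

```python
import re

def wsplit_scan(text, words):
    """Return (first `words` words, remainder), keeping all separators."""
    pos = len(text)  # default: fewer than `words` words, so keep everything
    for count, match in enumerate(re.finditer(r"\S+", text), start=1):
        if count == words:
            pos = match.end()  # cut right after the last wanted word
            break
    return text[:pos], text[pos:]

first, rest = wsplit_scan("one two\nthree four", 2)
print(repr(first), repr(rest))
```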

Related

Multiline Regex replacement with Autohotkey

Can't seem to wrap my head around the proper regex!
MY GOAL
add 2 spaces to each line of a selected block of text
MY CONTEXT
some markdown tools I use need 2 spaces at the end of each line to properly manage lists, etc.
if a file is edited multiple times, I do not want to end up with lines ending with 4+ spaces
a block of text can be a line, a paragraph, or the whole file content as shown in the editor
I have some kind of macro in Notepad++ that does the trick but I want to do the same with Autohotkey to be editor-independent
MY EXAMPLE
----
# 2020-03-17
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a,
MY SNIPPET SO FAR
; CTL+SHIFT+F12
^+F12::
Clipboard = ; Empty the clipboard so that ClipWait has something to detect
SendInput, ^c ; Copy selected text
ClipWait
OutputText := ""
Loop, parse, Clipboard, `n, `r
{
OutputText .= RegExReplace(A_LoopField,"m)^(.*) *$","$1 `r`n")
}
SendRaw % OutputText
return
MY PROBLEM
Between the characters ignored when looping, what I am trying to match, and what I replace the group with, I end up with far more lines and spaces than needed.
CURRENT OUTPUT
----
# 2020-03-17
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a,
DESIRED OUTPUT
----
# 2020-03-17
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a,
You're getting too many lines in the output because the Send command is tripped up by the carriage returns, which aren't needed in there anyway. I don't know exactly why that happens, and I haven't looked into it, since the approach isn't good anyway.
Your indentation is also getting messed up because your text editor automatically adds indentation based on the previous line.
In any case, sending such (long) input as keystrokes is never a good idea.
Make use of the clipboard and just send a Ctrl+V to instantly and reliably paste in the text.
Here's an example of that, along with another way to add the spaces at the end:
inp := "
(
----
# 2020-03-17
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a,
)"
Loop, Parse, inp, `n, `r
OutputText .= (A_LoopField = "" ? "" : RTrim(A_LoopField) "  ") "`n"
Clipboard := OutputText
SendInput, ^v
The ternary A_LoopField = "" ? "" : RTrim(A_LoopField) "  " evaluates to an empty string if the line was empty, so the two spaces aren't added to blank lines; otherwise it evaluates to the trimmed line followed by two spaces.
I think that's the behavior you were going for.
RTrim is used to trim any trailing spaces (or tabs) off the end, so we're sure to end up with just the two we want.
And, of course, at the end of every line we add one line feed `n.
Your regex approach was fine as well; it just seemed off to me at first, so here's another way. I'd guess this one is more efficient, though you'd need seriously large inputs and/or slow hardware for that to make any meaningful difference.
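To make the trimming logic easier to check outside AutoHotkey, here is a small Python sketch of the same per-line rule. The function name add_markdown_breaks is hypothetical, and it only models the text transformation, not the clipboard handling. The point it illustrates is that trimming before appending makes the operation safe to run repeatedly on the same text.

```python
def add_markdown_breaks(text):
    """End every non-empty line with exactly two spaces; leave empty lines empty."""
    out = []
    for line in text.splitlines():
        if line == "":
            out.append("")  # blank lines get no trailing spaces
        else:
            # Drop existing trailing spaces/tabs, then add exactly two spaces.
            out.append(line.rstrip(" \t") + "  ")
    return "\n".join(out) + "\n"

once = add_markdown_breaks("# title\n\nsome text    \nmore text")
twice = add_markdown_breaks(once)
print(once == twice)
```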

Strip \n within paragraphs of text in a file

Say I have a file with multiple paragraphs similar to
Lorem ipsum dolor sit amet. Velit et ornare feugiat ve fringilla adipiscing, non
augue risus, eleifend. Laoreet a, taciti porttitor mus. Erat leo metus
venenatis. Natoque eni, nunc quis elit est. Nec enim dui. Sem parturient lectus,
sed, egestas. Amet nascetur quisque, nonummy amet ut odio proin hymenaeos sit,
consequat proin hymenaeos vestibulum. Duis ad penatibus natoque, fames nec amet
eni inceptos. Ligula orci scelerisque laoreet, massa leo dictumst feugiat
praesent varius netus suspendisse. Et et quis volutpat quam, aenean sit, magnis
integer ad luctus hendrerit per. Lectus adipiscing nascetur quisque consectetuer
feugiat etiam eros. Natoque massa. Semper ut nam tortor. Odio ut nullam mus,
sociis at, luctus aliquet at odio habitant fames.
Penatibus ipsum lacus blandit ad dis ante dolor. Cursus porta penatibus
facilisi. Nisl erat rutrum primis dis elit dolor penatibus pretium duis
sollicitudin ut. Sed urna leo massa cubilia eget, elementum mus. Ve metus ac
vitae at litora tincidunt id, ac hac. Dis justo nullam. Fames sollicitudin,
augue ve at. Tristique. Primis convallis praesent, eget. Nullam, penatibus ut,
proin non mus id nascetur dis, lorem arcu. Magna urna nascetur ornare, nunc
proin quisque cum, pharetra. Quisque, litora eu lobortis diam eros. Vel mi
hymenaeos ipsum in. Ligula curabitur ve, magnis hymenaeos euismod.
The file was generated by processing a markdown file, which as you can see has broken lines at around 80 characters. Using Perl or sed or awk (I'm running Linux so could use any solution, but I'm not much of a Python or Ruby user), how can I undo the breaking of lines within paragraphs?
I know how to strip \n from an entire file, but that would run the two paragraphs shown into a single unbroken line. I don't want that. I just want to operate a paragraph at a time, so any solution should skip lines where \n is the only content.
The file I have uses Unix/Linux file-endings, i.e. line feeds, hence only \n are present. I do need to preserve the spaces between paragraphs.
Breaks/newlines are replaced with a space character:
perl -00 -lpe 's|\r?\n| |g' file
Here is a brief explanation of the switches, and the deparsed source:
perl -MO=Deparse -00 -lpe 's|\r?\n| |g' file
BEGIN { $/ = ""; $\ = "\n\n"; } # see below
LINE: while (defined($_ = <ARGV>)) { # -p switch
chomp $_; # also -l switch
s/\r?\n/ /g;
}
continue {
print $_; # -p switch
}
-00 => $/ = ""; # input record separator set to paragraph mode
-l => $\ = "\n\n"; # output record separator set to $/
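The paragraph-mode idea (read a paragraph at a time, join its lines, write it back with a blank line after it) can be sketched in Python as follows. This is an approximation rather than an exact port: unwrap_paragraphs is a made-up name, the whole text is held in memory, and the output ends with a single newline instead of the "\n\n" that the Perl one-liner prints after every paragraph.

```python
import re

def unwrap_paragraphs(text):
    """Join the lines of each paragraph with spaces; keep one blank line between paragraphs."""
    # Paragraphs are separated by one or more empty lines, as in perl -00.
    paragraphs = re.split(r"\n{2,}", text.strip("\n"))
    joined = (" ".join(p.split("\n")) for p in paragraphs)
    return "\n\n".join(joined) + "\n"

print(unwrap_paragraphs("first line\nsecond line\n\nnext paragraph\ncontinues\n"), end="")
```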
Try to chomp() last newline when a regular expression matches any line with a non-blank character:
perl -pe 'chomp if m/\S/' infile
EDIT: To keep a blank line between paragraphs and a final newline character, try the following:
perl -pe 'm/\S/ ? chomp() : print "\n"; END { print "\n" }' infile
Without having to read the whole file into memory:
$ cat file
Lorem ipsum dolor sit amet. Velit et ornare feugiat ve fringilla adipiscing, non
augue risus, eleifend. Laoreet a, taciti porttitor mus. Erat leo metus
venenatis. Natoque eni, nunc quis elit est.
Penatibus ipsum lacus blandit ad dis ante dolor. Cursus porta penatibus
facilisi. Nisl erat rutrum primis dis elit dolor penatibus pretium duis
sollicitudin ut. Sed urna leo massa cubilia eget, elementum mus. Ve metus ac
vitae at litora tincidunt id, ac hac. Dis justo nullam.
$ awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1}1' file
Lorem ipsum dolor sit amet. Velit et ornare feugiat ve fringilla adipiscing, non augue risus, eleifend. Laoreet a, taciti porttitor mus. Erat leo metus venenatis. Natoque eni, nunc quis elit est.
Penatibus ipsum lacus blandit ad dis ante dolor. Cursus porta penatibus facilisi. Nisl erat rutrum primis dis elit dolor penatibus pretium duis sollicitudin ut. Sed urna leo massa cubilia eget, elementum mus. Ve metus ac vitae at litora tincidunt id, ac hac. Dis justo nullam.
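The same record-at-a-time behaviour can be approximated in Python with a generator that buffers only the current paragraph. This is a sketch under my own assumptions: unwrap_stream is a hypothetical name, a line containing only whitespace is treated as a paragraph separator, and the caller decides how to print the joined paragraphs.

```python
def unwrap_stream(lines):
    """Yield each paragraph as one line, reading the input line by line."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            buffer.append(line)      # still inside a paragraph
        elif buffer:
            yield " ".join(buffer)   # blank line closes the paragraph
            buffer = []
    if buffer:
        yield " ".join(buffer)       # last paragraph without trailing blank line

sample = ["Lorem ipsum\n", "dolor sit.\n", "\n", "Penatibus ipsum\n", "lacus.\n"]
print("\n\n".join(unwrap_stream(sample)))
```

Passing an open file object instead of the list would process the file without loading it all at once.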
"lines where \n is the only content" means at least two consecutive newline characters.
You can do it easily with a regex. The pattern needs lookarounds so that the characters on either side of the newline are not consumed by the replacement: (?<=[^\r\n])\n(?=[^\r\n])
A sample Python file:
import re
mystring = """sjdfkj
adlfklk
dlkfl """
print(re.sub(r"(?<=[^\r\n])\n(?=[^\r\n])", " ", mystring))

Kentico CMS: Text area input length calculation and determining max length - specifically when there are line breaks in the text

How does Kentico calculate the length of inputted content in a text area on a form and how much value does it give to a line break? A line break is 2 characters according to my JavaScript calculation but seems like Kentico calculates it as being more than 2 characters.
Summary of problem:
I have a maximum length of 2500 set on a text area input on a form on my Kentico site.
I have entered some text into this text area, and by my JavaScript calculation (used to show how many characters the user has left) the character length is exactly 2500 (including line breaks and spaces), so it should validate and send. However, Kentico rejects my input, saying that the max length has been exceeded.
If I remove the line break and type some extra characters to bring my character calculation back up to 2500, the form sends without failing.
Test used that fails:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque vitae
augue ac enim molestie scelerisque a id metus. Suspendisse purus
justo, iaculis quis accumsan ut, congue vitae mauris. Nunc luctus
vulputate scelerisque. Nullam ullamcorper porta elit, sed ornare lorem
placerat dictum. Sed quis enim quis nibh convallis sagittis nec vitae
felis. Sed porttitor, nibh et volutpat posuere, neque dui sollicitudin
sapien, at scelerisque lacus elit quis enim. Donec at metus lectus.
Sed quis enim quis nibh convallis sagittis nec vitae felis. Sed
porttitor, nibh et volutpat posuere, neque dui sollicitudin sapien, at
scelerisque lacus elit quis enim. Donec at metus lectus. Lorem ipsum
dolor sit amet, consectetur adipiscing elit. Quisque vitae augue ac
enim molestie scelerisque a id metus. Suspendisse purus justo, iaculis
quis accumsan ut, congue vitae mauris. Nunc luctus vulputate
scelerisque. Nullam ullamcorper porta elit, sed ornare lorem placerat
dictum. Sed quis enim quis nibh convallis sagittis nec vitae felis.
Sed porttitor, nibh et volutpat posuere, neque dui sollicitudin
sapien, at scelerisque lacus elit quis enim. Donec at metus
lectus.Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Quisque vitae augue ac enim molestie scelerisque a id metus.
Suspendisse purus jus
to, iaculis quis accumsan ut, congue vitae mauris. Nunc luctus
vulputate scelerisque. Nullam ullamcorper porta elit, sed ornare lorem
placerat dictum. Sed quis enim quis nibh convallis sagittis nec vitae
felis. Sed porttitor, nibh et volutpat posuere, neque dui sollicitudin
sapien, at scelerisque lacus elit quis enim. Donec at metus lectus.
Sed quis enim quis nibh convallis sagittis nec vitae felis. Sed
porttitor, nibh et volutpat posuere, neque dui sollicitudin sapien, at
scelerisque lacus elit quis enim. Donec at metus lectus. Lorem ipsum
dolor sit amet, consectetur adipiscing elit. Quisque vitae augue ac
enim molestie scelerisque a id metus. Suspendisse purus justo, iaculis
quis accumsan ut, congue vitae mauris. Nunc luctus vulputate
scelerisque. Nullam ullamcorper porta elit, sed ornare lorem placerat
dictum. Sed quis enim quis nibh convallis sagittis nec vitae felis.
Sed porttitor, nibh et volutpat posuere, neque dui sollicitudin
sapien, at scelerisque lacus elit quis enim. Donec at metus
lectus.Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Quisque vitae augue ac enim molestie scelerisque a id metus.
Suspendisse purus justo, iaculis quis accumsan ut, congue vitae maur d
Test used that passes: Notice that the line break has been removed and 2 extra characters added to the end to bring it back up to 2500 characters
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque vitae
augue ac enim molestie scelerisque a id metus. Suspendisse purus
justo, iaculis quis accumsan ut, congue vitae mauris. Nunc luctus
vulputate scelerisque. Nullam ullamcorper porta elit, sed ornare lorem
placerat dictum. Sed quis enim quis nibh convallis sagittis nec vitae
felis. Sed porttitor, nibh et volutpat posuere, neque dui sollicitudin
sapien, at scelerisque lacus elit quis enim. Donec at metus lectus.
Sed quis enim quis nibh convallis sagittis nec vitae felis. Sed
porttitor, nibh et volutpat posuere, neque dui sollicitudin sapien, at
scelerisque lacus elit quis enim. Donec at metus lectus. Lorem ipsum
dolor sit amet, consectetur adipiscing elit. Quisque vitae augue ac
enim molestie scelerisque a id metus. Suspendisse purus justo, iaculis
quis accumsan ut, congue vitae mauris. Nunc luctus vulputate
scelerisque. Nullam ullamcorper porta elit, sed ornare lorem placerat
dictum. Sed quis enim quis nibh convallis sagittis nec vitae felis.
Sed porttitor, nibh et volutpat posuere, neque dui sollicitudin
sapien, at scelerisque lacus elit quis enim. Donec at metus
lectus.Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Quisque vitae augue ac enim molestie scelerisque a id metus.
Suspendisse purus justo, iaculis quis accumsan ut, congue vitae
mauris. Nunc luctus vulputate scelerisque. Nullam ullamcorper porta
elit, sed ornare lorem placerat dictum. Sed quis enim quis nibh
convallis sagittis nec vitae felis. Sed porttitor, nibh et volutpat
posuere, neque dui sollicitudin sapien, at scelerisque lacus elit quis
enim. Donec at metus lectus. Sed quis enim quis nibh convallis
sagittis nec vitae felis. Sed porttitor, nibh et volutpat posuere,
neque dui sollicitudin sapien, at scelerisque lacus elit quis enim.
Donec at metus lectus. Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Quisque vitae augue ac enim molestie scelerisque a id
metus. Suspendisse purus justo, iaculis quis accumsan ut, congue vitae
mauris. Nunc luctus vulputate scelerisque. Nullam ullamcorper porta
elit, sed ornare lorem placerat dictum. Sed quis enim quis nibh
convallis sagittis nec vitae felis. Sed porttitor, nibh et volutpat
posuere, neque dui sollicitudin sapien, at scelerisque lacus elit quis
enim. Donec at metus lectus.Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Quisque vitae augue ac enim molestie scelerisque a id
metus. Suspendisse purus justo, iaculis quis accumsan ut, congue vitae
maur dee
The problem lay in the fact that my JavaScript calculation gave a line break a length of 1, whereas Kentico's calculation gives a line break a length of 2, so the two didn't match up. Hence my character counter said that the length of the entered text was OK, but Kentico's check deemed it over the max length.
This is what I had previously:
enteredText = textareaVariableName.val();
characterCount = enteredText.length; //one line break entered returned 1
This is what I have changed it to:
enteredText = textareaVariableName.val();
enteredTextEncoded = escape(enteredText);
//next I match any line break characters - %0A - after encoding the text area text
linebreaks = enteredTextEncoded.match(/%0A/g);
(linebreaks != null) ? linebreaksLength = linebreaks.length : linebreaksLength = 0;
characterCount = enteredText.length + linebreaksLength; //one line break entered now returns 2
Is there a better way I could check for line breaks in the text, rather than to encode the text and then check for the substring %0A ?
EDIT/UPDATE: I believe the following is a better solution as opposed to what I was doing above.
var limit = 2500; //for example
enteredText = textareaVariableName.val();
numberOfLineBreaks = (enteredText.match(/\n/g)||[]).length;
left = limit - enteredText.length - numberOfLineBreaks;
if (left < 0) {
//character count over code here
} else {
//character count within limits code here
}
This is basically a JavaScript problem that depends on the browser. In Firefox, Chrome, or any WebKit-based browser, textareaVariableName.val().length counts only 1 character for a new line (\n). The same goes for the jQuery implementation. But in IE, document.getElementById('textareaVariableName').value.length counts 2 for a new line (\r\n).
In Kentico, the text is validated against the actual count of characters, and therefore the validation fails.
A quick fix for this is a simple regular expression for counting the actual length:
function getTextLength(elementId){
if (elementId) {
var elem = document.getElementById(elementId);
if (elem) {
var str = elem.value;
if (str) {
str = str.replace(/(\r\n|\r|\n)/g, '\r\n');
return str.length;
}
}
}
return 0;
}
This should help you to count characters correctly independently of the browser used by the customer.
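The normalisation step is easy to verify in isolation. Below is a minimal Python sketch of the same counting rule (every line break counted as the two characters \r\n); crlf_length and chars_left are hypothetical names, and the limit of 2500 is just the example value from the question.

```python
import re

def crlf_length(text):
    """Length of text when every line break (\\r\\n, \\r or \\n) counts as two characters."""
    # The alternation tries \r\n first, so an existing CRLF is not doubled.
    return len(re.sub(r"\r\n|\r|\n", "\r\n", text))

def chars_left(text, limit=2500):
    """Characters remaining before the limit; negative means over the limit."""
    return limit - crlf_length(text)

print(crlf_length("ab\ncd"))
print(chars_left("ab\ncd", limit=10))
```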

Working with columns and adding elements after

I'm creating a pdf using the MultiColumnText object within iTextSharp. The text carries over to the second page where it only fills the left column. So I have two questions:
1) Is it possible to fill in all three columns on the second page, and only take up as much vertical space as it requires?
2) Is it possible to add additional page elements after the column object ends without knocking the new elements over to a new page?
The only thing I can think of would be to "write out" your text into a ColumnText using go(true) to simulate layout and find out how tall your text really is (with no page breaks), and use that knowledge to construct columns with specific heights such that they'll be even.
This gets Really Difficult if your columns aren't all the same width.
Don't forget about the page's top & bottom margins when calculating how much room you have to work with.
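The arithmetic behind "columns with specific heights" can be sketched as below. This is a simplified model under stated assumptions, not iTextSharp code and not the formula used in the C# solution further down: all columns have the same width, columns_used and y_line are what a simulated layout pass reports for the final page, and the slack value is an arbitrary allowance for lines wrapping differently. The function name and parameters are all hypothetical.

```python
def balanced_bottom(top, bottom, y_line, columns_used, total_columns=3, slack=6.0):
    """Estimate a bottom Y so the final page's text spreads evenly over all columns.

    top, bottom   -- usable page area (margins already applied), in points
    y_line        -- Y position where the simulated text ended
    columns_used  -- how many columns the simulation touched on the final page (1..total)
    """
    full_height = top - bottom
    # Completely filled columns plus the partly filled last one.
    used_height = (columns_used - 1) * full_height + (top - y_line)
    # Share the text evenly, add some slack, never exceed the page.
    column_height = min(full_height, used_height / total_columns + slack)
    return top - column_height

# Text ended 300 points down the first column of a LETTER page with 50-point margins.
print(balanced_bottom(top=742, bottom=50, y_line=442, columns_used=1))
```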
Here is my solution to disperse the text equally among 3 columns on the final page.
The trick was to:
1) simulate the code
2) find which column the code ended in
3) find how far down the page the text went
4) calculate the new 'bottom'
5) only apply the new 'bottom' to the final page of the non-simulated output
class Program
{
static void Main(string[] args)
{
string fileName = "columntexttest.pdf";
Document doc = new Document(PageSize.LETTER, 50, 50, 50, 50);
PdfWriter pdfWrite = PdfWriter.GetInstance(doc, new FileStream(fileName, FileMode.Create));
doc.Open();
PdfContentByte cb = pdfWrite.DirectContent;
ColumnText ct = new ColumnText(cb);
//default values
int colCount = 0;
float bottom = doc.Bottom;
int pageCount = 0;
AddText(ct);
CreateColumnText(doc, ct, ref bottom, true, ref pageCount, ref colCount); //simulation
AddText(ct);
CreateColumnText(doc, ct, ref bottom, false, ref pageCount, ref colCount); //non-simulation
doc.Add(new Paragraph("testing new paragraph"));
doc.Close();
System.Diagnostics.Process.Start(fileName);
}
private static void CreateColumnText(Document doc, ColumnText ct, ref float bottom, bool simulate, ref int pageCount, ref int colCount)
{
//resetting variables for non-simulation
int status = 0;
int currentPage = 1;
int currentColumn = 0;
float tempBottom = bottom;
float tempBottom2 = tempBottom;
if (simulate)
{
pageCount = 1;
}
//column attributes
float gutter = 15f;
float colwidth = (doc.Right - doc.Left - gutter * 2) / 3;
while (ColumnText.HasMoreText(status))
{
//calculates the bottom Y
if (simulate == false && currentPage == pageCount)
{
if (colCount == 1) //1 column on final page
{
tempBottom2 = (doc.Top - tempBottom) / 3 + 6;
bottom = doc.Top - tempBottom2;
}
else if (colCount == 2) //2 columns on final page
{
tempBottom2 = ((doc.Top - tempBottom) + doc.Top) / 3 + 6;
bottom = doc.Top - tempBottom2;
}
else if (colCount == 0) //0 colCount means 3 columns
{
tempBottom2 = ((doc.Top - tempBottom) + doc.Top * 2) / 3 + 6;
bottom = doc.Top - tempBottom2;
}
}
else
{
bottom = doc.Bottom; //default value for all pages except the last, or the value for a single page
}
if (currentColumn == 0) //writes first column
{
float[] left = {doc.Left, doc.Top, //top = 742 (true top is 792 then a 50 point margin)
doc.Left, bottom }; //bottom = 50
float[] right = {doc.Left + colwidth, doc.Top,
doc.Left + colwidth, bottom};
ct.SetColumns(left, right);
currentColumn++;
}
else if (currentColumn == 1) //writes second column
{
float[] left2 = {doc.Left+ colwidth + gutter, doc.Top,
doc.Left + colwidth + gutter, bottom};
float[] right2 = {doc.Right - colwidth - gutter, doc.Top,
doc.Right - colwidth - gutter, bottom};
ct.SetColumns(left2, right2);
currentColumn++;
}
else //writes third column
{
float[] left3 = { doc.Right - colwidth, doc.Top,
doc.Right- colwidth, bottom};
float[] right3 = { doc.Right, doc.Top,
doc.Right, bottom};
ct.SetColumns(left3, right3);
currentColumn = 0;
}
status = ct.Go(simulate); //simulate mode
if (currentColumn == 0 && status == 2) //creates new page only if text remains.
{
doc.NewPage();
currentPage += 1;
}
}
//values carry forward to non-simulation mode
pageCount = currentPage;
bottom = ct.YLine;
colCount = currentColumn;
}
private static void AddText(ColumnText ct)
{
Font font2 = new Font(Font.NORMAL, 9f);
ct.AddText(new Phrase("orem ipsum dolor sit amet, consectetuer adipiscing elit. Suspendisse blandit blandit turpis. Nam in lectus ut dolor consectetuer bibendum. Morbi neque ipsum, laoreet id; dignissim et, viverra id, mauris. Nulla mauris elit, consectetuer sit amet, accumsan eget, congue ac, libero. Vivamus suscipit. Nunc dignissim consectetuer lectus. Fusce elit nisi; commodo non, facilisis quis, hendrerit eu, dolor? Suspendisse eleifend nisi ut magna. Phasellus id lectus! Vivamus laoreet enim et dolor. Integer arcu mauris, ultricies vel, porta quis, venenatis at, libero. Donec nibh est, adipiscing et, ullamcorper vitae, placerat at, diam. Integer ac turpis vel ligula rutrum auctor! Morbi egestas erat sit amet diam. Ut ut ipsum? Aliquam non sem. Nulla risus eros, mollis quis, blandit ut; luctus eget, urna. Vestibulum vestibulum dapibus erat. Proin egestas leo a metus?\n\n", font2));
ct.AddText(new Phrase("Vivamus enim nisi, mollis in, sodales vel, convallis a, augue? Proin non enim. Nullam elementum euismod erat. Aliquam malesuada eleifend quam! Nulla facilisi. Aenean ut turpis ac est tempor malesuada. Maecenas scelerisque orci sit amet augue laoreet tempus. Duis interdum est ut eros. Fusce dictum dignissim elit. Morbi at dolor. Fusce magna. Nulla tellus turpis, mattis ut, eleifend a, adipiscing vitae, mauris. Pellentesque mattis lobortis mi.\n\n", font2));
ct.AddText(new Phrase("Nullam sit amet metus scelerisque diam hendrerit porttitor. Aenean pellentesque, lorem a consectetuer consectetuer, nunc metus hendrerit quam, mattis ultrices lorem tellus lacinia massa. Aliquam sit amet odio. Proin mauris. Integer dictum quam a quam accumsan lacinia. Pellentesque pulvinar feugiat eros. Suspendisse rhoncus. Sed consectetuer leo eu nisi. Suspendisse massa! Sed suscipit lacus sit amet elit! Aliquam sollicitudin condimentum turpis. Nunc ut augue! Maecenas eu eros. Morbi in urna consectetuer ipsum vehicula tristique.\n\n", font2));
ct.AddText(new Phrase("Donec imperdiet purus vel ligula. Vestibulum tempor, odio ut scelerisque eleifend, nulla sapien laoreet dui; vel aliquam arcu libero eu ante. Curabitur rutrum tristique mi. Sed lobortis iaculis arcu. Suspendisse mauris. Aliquam metus lacus, elementum quis, mollis non, consequat nec, tortor.\n", font2));
ct.AddText(new Phrase("Quisque id diam. Ut egestas leo a elit. Nulla in metus. Aliquam iaculis turpis non augue. Donec a nunc? Phasellus eu eros. Nam luctus. Duis eu mi. Ut mollis. Nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aenean pede. Nulla facilisi. Vestibulum mattis adipiscing nulla. Praesent orci ante, mattis in, cursus eget, posuere sed, mauris.\n\n", font2));
ct.AddText(new Phrase("Nulla facilisi. Nunc accumsan risus aliquet quam. Nam pellentesque! Aenean porttitor. Aenean congue ullamcorper velit. Phasellus suscipit placerat tellus. Vivamus diam odio, tempus quis, suscipit a, dictum eu; lectus. Sed vel nisl. Ut interdum urna eu nibh. Praesent vehicula, orci id venenatis ultrices, mauris urna mollis lacus, et blandit odio magna at enim. Pellentesque lorem felis, ultrices quis, gravida sed, pharetra vitae, quam. Mauris libero ipsum, pharetra a, faucibus aliquet, pellentesque in, mauris. Cras magna neque, interdum vel, varius nec; vulputate at, erat. Quisque vitae urna. Suspendisse potenti. Nulla luctus purus at turpis! Vestibulum vitae dui. Nullam odio.\n\n", font2));
ct.AddText(new Phrase("Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Sed eget mi at sem iaculis hendrerit. Nulla facilisi. Etiam sed elit. In viverra dapibus sapien. Aliquam nisi justo, ornare non, ultricies vitae, aliquam sit amet, risus! Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Phasellus risus. Vestibulum pretium augue non mi. Sed magna. In hac habitasse platea dictumst. Quisque massa. Etiam viverra diam pharetra ante. Phasellus fringilla velit ut odio! Nam nec nulla.\n\n", font2));
ct.AddText(new Phrase("Integer augue. Morbi orci. Sed quis nibh. Nullam ac magna id leo faucibus ornare. Vestibulum eget lectus sit amet nunc facilisis bibendum. Donec adipiscing convallis mi. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus enim. Mauris ligula lorem, pellentesque quis, semper sed, tristique sit amet, justo. Suspendisse potenti. Proin vitae enim. Morbi et nisi sit amet sapien ve.", font2));
ct.Alignment = Element.ALIGN_JUSTIFIED;
}
}

Can a pixbuf inserted into a GTK+ text buffer be set as "floating"?

I'm writing an application [a Pidgin plugin, actually], which inserts an image embedded into a GtkTextBuffer. Currently, I add it using:
gtk_text_buffer_insert_pixbuf(textBuffer, &iter, pixbuf);
However, this just puts the image "inline" with the text. What I'm looking for is something similar to HTML's "float". For example, assuming my image is about twice the size of a line of text, I currently get this [where X is the image]
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam gravida
XXXX
XXXX ante in massa dignissim aliquam. Nullam tempus quam luctus eros volutpat laoreet.
XXXX
XXXX sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
Mauris semper, nunc quis gravida molestie,
leo neque imperdiet nulla, vel consectetur nisi nisl non metus. Maecenas pharetra
magna nec magna mattis faucibus convallis nibh
Ideally, I'd like to have:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam gravida
XXXX ante in massa dignissim aliquam. Nullam tempus quam luctus eros volutpat laoreet.
XXXX
XXXX sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
XXXX Mauris semper, nunc quis gravida molestie,
leo neque imperdiet nulla, vel consectetur nisi nisl non metus. Maecenas pharetra
magna nec magna mattis faucibus convallis nibh
Note that there are four paragraphs, where the second and third have an image in the beginning.
Is this possible?
The short answer is no; images in TextView are just treated as a character (which may be a lot bigger than a usual character). There isn't any layout engine in the HTML sense. (Layout is limited to what PangoLayout can do.)
You could probably hack something together, using an approach such as:
leave a margin the size of the image on your paragraph
add an expose event handler to paint the image to the window (see the "border windows" examples which are I think in gtk-demo or the docs somewhere, but draw to the main window not border windows)
Some amount of work, but it would probably get the job done.