How to find a string of text, but exclude text from a text file? - scala

How to exclude a line of text with “K-Address”, when there is a line of text with “Address”?
I want the line of text with the word "Address", but then I don't want the line of text if it has "K-Address" (for example). I have this code below but it grabs both lines with Address and K-Address, so I have 2 lines. I just want the one line with "Address". How can I make this happen?
myRDD.filter(line => line.contains("Address") && !(line.contains("K-Address")) )

myRDD.filter(line => line.contains("Address") && !(line.contains("K-Address")) )
This filter is correct: it keeps every line that contains "Address" and drops any line that also contains "K-Address".
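The same include-and-exclude logic can be sketched outside Spark. The snippet below is a minimal Python illustration, not the asker's code: the sample lines are made up, and a plain list stands in for myRDD.

```python
# Hypothetical sample lines standing in for the contents of myRDD.
lines = [
    "Address: 12 Main Street",
    "K-Address: 99 Side Road",
    "Phone: 555-0100",
]

# Keep lines that mention "Address" unless they also mention "K-Address".
kept = [line for line in lines if "Address" in line and "K-Address" not in line]

print(kept)
```

Note that a line containing both a standalone "Address" and "K-Address" would also be dropped by this condition.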

Related

Randomly choose 1 text-line from each of 3 lists of text-lines but instead it brings 3 blank text-lines

I have this code to randomly choose one line of text from each of the three lists of text lines within the same file, and copy them together to the clipboard.
It works but not properly because it brings three blank lines instead of three actual text lines.
It seems to me that the issue may be in the way this code is copying the text lines, but I'm not an expert.
Can somebody help me find where the issue is and maybe code it properly to bring the actual text lines instead of the blank ones?
My AutoHotkey version is 1.1.36.02. Thanks in advance to everyone.
Random, rand1, 1, 4
Random, rand2, 1, 4
Random, rand3, 1, 4
; Defining 3 lists of text lines:
list1 = 1st text line, 2nd text line, 3rd text line, 4th text line
list2 = 5th text line, 6th text line, 7th text line, 8th text line
list3 = 9th text line, 10th text line, 11th text line, 12th text line
; Selecting randomly one text line from each list:
selectedLine1 := list1[rand1]
selectedLine2 := list2[rand2]
selectedLine3 := list3[rand3]
; Concatenating the 3 selected text lines and copy them to the clipboard:
clipboard = %selectedLine1% `n %selectedLine2% `n %selectedLine3%
; Seeing the result on a message box:
msgbox, Randoms: %rand1%, %rand2%, %rand3%`nSelected lines:`n%selectedLine1%`n%selectedLine2%`n%selectedLine3%
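This solves the problem. list1, list2 and list3 are plain strings, not arrays, so list1[rand1] returns nothing. StrSplit() turns each string into an array that can be indexed: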
Random, rand1, 1, 4
Random, rand2, 1, 4
Random, rand3, 1, 4
; Defining 3 lists of text lines:
list1 = 1st text line, 2nd text line, 3rd text line, 4th text line
list2 = 5th text line, 6th text line, 7th text line, 8th text line
list3 = 9th text line, 10th text line, 11th text line, 12th text line
; Selecting randomly one text line from each list
; (the third StrSplit parameter trims the space that follows each comma):
selectedLine1 := StrSplit(list1, ",", " ")[rand1]
selectedLine2 := StrSplit(list2, ",", " ")[rand2]
selectedLine3 := StrSplit(list3, ",", " ")[rand3]
; Concatenating the 3 selected text lines and copy them to the clipboard:
clipboard = %selectedLine1% `n %selectedLine2% `n %selectedLine3%
; Seeing the result on a message box:
msgbox, Randoms: %rand1%, %rand2%, %rand3%`nSelected lines:`n%selectedLine1%`n%selectedLine2%`n%selectedLine3%
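For comparison, here is the split-then-pick idea as a small Python sketch. It is only an illustration of the technique, not AutoHotkey code; the helper name pick_line is made up for this example.

```python
import random

# The same comma-separated strings as in the script above.
list1 = "1st text line, 2nd text line, 3rd text line, 4th text line"
list2 = "5th text line, 6th text line, 7th text line, 8th text line"
list3 = "9th text line, 10th text line, 11th text line, 12th text line"


def pick_line(text):
    """Split a comma-separated string into items and return one at random."""
    items = [item.strip() for item in text.split(",")]
    return random.choice(items)


# One random item per list, joined with newlines like the clipboard text.
selected = [pick_line(lst) for lst in (list1, list2, list3)]
result = "\n".join(selected)
print(result)
```

Indexing the unsplit string would not give a whole text line here either, which is why the split has to happen first.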

iText PDFSweep RegexBasedCleanupStrategy does not work in some cases

I'm trying to use iText PDFSweep RegexBasedCleanupStrategy to redact some words from a PDF. However, I only want to redact a word when it stands alone, not when it appears inside another word.
For example, I want to redact "al" as a single word, but I don't want to redact the "al" in "mineral".
So I added the word boundary ("\b") to the regex passed as a parameter to RegexBasedCleanupStrategy,
new RegexBasedCleanupStrategy("\\bal\\b")
However, pdfAutoSweep.cleanUp does not work if the word is at the end of a line.
In short
The cause of this issue is that the routine that flattens the extracted text chunks into a single String for applying the regular expression does not insert any indicator for a line break. Thus, in that String the last letter from one line is immediately followed by the first letter of the next which hides the word boundary. One can fix the behavior by adding an appropriate character to the String in case of a line break.
The problematic code
The routine that flattens the extracted text chunks into a single String is CharacterRenderInfo.mapString(List<CharacterRenderInfo>) in the package com.itextpdf.kernel.pdf.canvas.parser.listener. In case of a merely horizontal gap this routine inserts a space character but in case of a vertical offset, i.e. a line break, it adds nothing extra to the StringBuilder in which the String representation is generated:
if (chunk.sameLine(lastChunk)) {
    // we only insert a blank space if the trailing character of the previous string wasn't a space, and the leading character of the current string isn't a space
    if (chunk.getLocation().isAtWordBoundary(lastChunk.getLocation()) && !chunk.getText().startsWith(" ") && !chunk.getText().endsWith(" ")) {
        sb.append(' ');
    }
    indexMap.put(sb.length(), i);
    sb.append(chunk.getText());
} else {
    indexMap.put(sb.length(), i);
    sb.append(chunk.getText());
}
A possible fix
One can extend the code above to insert a newline character in case of a line break:
if (chunk.sameLine(lastChunk)) {
    // we only insert a blank space if the trailing character of the previous string wasn't a space, and the leading character of the current string isn't a space
    if (chunk.getLocation().isAtWordBoundary(lastChunk.getLocation()) && !chunk.getText().startsWith(" ") && !chunk.getText().endsWith(" ")) {
        sb.append(' ');
    }
    indexMap.put(sb.length(), i);
    sb.append(chunk.getText());
} else {
    sb.append('\n');
    indexMap.put(sb.length(), i);
    sb.append(chunk.getText());
}
This CharacterRenderInfo.mapString method is only called from the RegexBasedLocationExtractionStrategy method getResultantLocations() (package com.itextpdf.kernel.pdf.canvas.parser.listener), and only for the task mentioned, i.e. applying the regular expression in question. Thus, enabling it to properly allow recognition of word boundaries should not break anything but indeed should be considered a fix.
One might instead add a different character for a line break, e.g. a plain space ' ', if one does not want to treat vertical gaps any differently from horizontal ones. For a general fix one might, therefore, consider making this character a settable property of the strategy.
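The effect of the missing separator can be reproduced with any regex engine. The following Python sketch is only an illustration of the word-boundary problem and does not use iText; the two text lines are made up.

```python
import re

# Hypothetical text chunks, one per extracted line of the page.
lines = ["redact al", "mineral water"]
pattern = re.compile(r"\bal\b")

# Flattening without a separator glues "al" to "mineral" and hides the boundary.
glued = "".join(lines)

# Flattening with a newline keeps the boundary visible to the regex.
separated = "\n".join(lines)

print(pattern.search(glued))      # None: no standalone "al" is found
print(pattern.search(separated))  # a match for "al" at the end of the first line
```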
Versions
I tested with iText 7.1.4-SNAPSHOT and PDFSweep 2.0.3-SNAPSHOT.

Applescript to resort piles of numbers

I'm trying to resort a bunch of numbers with Applescript. I'm very new to the language and I thought I'd ask you for help.
I have a group of numbers which looks like this in my TextEdit file:
v 0.186472 0.578063 1.566364
v -0.186472 0.578063 1.566364
v 0.335649 0.578063 1.771483
What I need is a script that rearranges these numbers so they look like this:
(0.186472, 0.578063, 1.566364),
(-0.186472, 0.578063, 1.566364),
(0.335649, 0.578063, 1.771483),
So the numbers have to be separated by commas, and the three numbers on each line have to be enclosed in parentheses (). Finally, there has to be another comma after every parenthesized group of three, and the v at the start of every line has to be deleted.
I've only so far managed to get rid of every "v" using:
set stringToFind to "v"
set stringToReplace to ""
But now I'm stuck and I'm hoping for help.
The native way to find and replace strings in AppleScript is to use text item delimiters. Each line holds a fixed number of values separated by spaces (or tabs), so text item delimiters, text items and string concatenation are enough to solve your problem.
I've added an additional linefeed at the front and at the back of the string to show that lines that don't contain 4 words are ignored.
set theString to "
v 0.186472 0.578063 1.566364
v -0.186472 0.578063 1.566364
v 0.335649 0.578063 1.771483
"
set theLines to paragraphs of theString
set oldTIDs to AppleScript's text item delimiters
repeat with i from 1 to count theLines
    set AppleScript's text item delimiters to {space, tab}
    if (count of text items of item i of theLines) = 4 then
        set theNumbers to text items 2 thru -1 of item i of theLines
        set AppleScript's text item delimiters to ", "
        set item i of theLines to "(" & (theNumbers as string) & "),"
    else
        set item i of theLines to missing value
    end if
end repeat
set theLines to text of theLines
set AppleScript's text item delimiters to linefeed
set newString to theLines as string
set AppleScript's text item delimiters to oldTIDs
return newString

Removing Unwanted commas from a csv

I'm writing a program in Progress, OpenEdge, ABL, and whatever else it's known as.
I have a CSV file that is delimited by commas. However, there is a "gift message" field, and users enter messages with "commas", so now my program will see additional entries because of those bad commas.
The CSV fields are not in double quotes, so I CANNOT just use my main method, which is:
/** this next block of code will remove all unwanted commas from the data. **/
if v-line-cnt > 1 then /** we won't run this against the headers, otherwise they will get deleted **/
    assign
        v-data = replace(v-data,'","',"\t") /** Here is a special technique to replace the comma delimiter with a tab **/
        v-data = replace(v-data,','," ") /** now that we removed the comma delimiter above, we can remove all nuisance commas **/
        v-data = replace(v-data,"\t",'","'). /** all nuisance commas are gone, we turn the tabs back to commas. **/
Any advice?
edit:
From Progress, I can call Linux commands, so I should be able to execute C++/PHP/shell scripts etc. from my Progress program. I look forward to advice; until then I shall look into using external scripts.
You are not providing quite enough data for a perfect answer but given what you say I think the IMPORT statement should handle this automatically.
In my example here commaimport.csv is a comma-separated csv-file with quotes around text fields. Integers, logical variables etc have no quotes. The last field contains a comma in one line:
commaimport.csv
=======================
"Id1", 123, NO, "This is a message"
"Id2", 124, YES, "This is a another message, with a comma"
"Id3", 323, NO, "This is a another message without a comma"
To import this file I define a temp-table matching the file layout and use the IMPORT statement with comma as delimiter:
DEFINE TEMP-TABLE ttImport NO-UNDO
FIELD field1 AS CHARACTER FORMAT "xxx"
FIELD field2 AS INTEGER FORMAT "zz9"
FIELD field3 AS LOGICAL
FIELD field4 AS CHARACTER FORMAT "x(50)".
INPUT FROM VALUE("c:\temp\commaimport.csv").
REPEAT :
CREATE ttImport.
IMPORT DELIMITER "," ttImport.
END.
INPUT CLOSE.
FOR EACH ttImport:
DISPLAY ttImport.
END.
You don't have to import into a temp-table. You could import into variables instead.
DEFINE VARIABLE c AS CHARACTER NO-UNDO FORMAT "xxx".
DEFINE VARIABLE i AS INTEGER NO-UNDO FORMAT "zz9".
DEFINE VARIABLE l AS LOGICAL NO-UNDO.
DEFINE VARIABLE d AS CHARACTER NO-UNDO FORMAT "x(50)".
INPUT FROM VALUE("c:\temp\commaimport.csv").
REPEAT :
IMPORT DELIMITER "," c i l d.
DISP c i l d.
END.
INPUT CLOSE.
This will render basically the same output.
You don't show what your data file looks like. But if the problematic field is the last one, and there are no quotes, then your best bet is probably to read it using INPUT UNFORMATTED to get it a line at a time, and then split the line into fields using ENTRY(). That way you can treat everything after the nth comma as a single field no matter how many commas the line has.
For example, say your input file has three columns like this:
boris,14.23,12 the avenue
mark,32.10,flat 1, the grange
percy,1.00,Bleak house, Dartmouth
... so that column three is an address which might contain a comma and is not enclosed in quotes so that IMPORT DELIMITER can't help you.
Something like this would work in that case:
/* ...skipping a lot of definitions here ... */
input from "datafile.csv".
repeat:
import unformatted v-line.
create tt-thing.
assign tt-thing.name = entry(1, v-line, ',')
tt-thing.price = entry(2, v-line, ',')
tt-thing.address = entry(3, v-line, ',').
do v-i = 4 to num-entries(v-line, ','):
tt-thing.address = tt-thing.address
+ ','
+ entry(v-i, v-line, ',').
end.
end.
input close.
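The idea of treating everything after the nth comma as one field can also be sketched in Python, which is an option if the parsing is handed to an external script. This is a minimal illustration using the three sample rows above; the function name parse_line and the field_count parameter are made up for the example.

```python
def parse_line(line, field_count=3):
    """Split on at most field_count - 1 commas; the rest stays in the last field."""
    return [field.strip() for field in line.split(",", field_count - 1)]


rows = [
    "boris,14.23,12 the avenue",
    "mark,32.10,flat 1, the grange",
    "percy,1.00,Bleak house, Dartmouth",
]

for row in rows:
    print(parse_line(row))
```

As with the ENTRY() loop, this only works when the field that may contain commas is the last one.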

How can I add a string to the last line in multiline EditText, Matlab?

I often use this way to add a string to the last line in multiline editText.
Example: The before editText: (handles.txtLine)
line 1
line 2
line 3
and I want to add the string "line 4" to it. So I do:
msg = get(handles.txtLine,'string');
msg_i = sprintf('\nline 4');
msg = [msg msg_i];
set(handles.txtLine,'string',msg)
Result:
line 1
line 2
line 3
line 4
Are there other methods to do the same function?
The String property of a multiline edit control can be set in three ways:
1. a multiline character array, e.g. txt1 = ['line 1'; 'line 2']. Here txt1 has size 2x6.
2. a single line character array containing newline characters, e.g. txt2 = sprintf('line 1\nline 2'). Here txt2 has size 1x13.
3. a cell array of strings, e.g. txt3 = {'line 1', 'line 2'}
You would add or remove text from the string in each case in different ways, and each method has advantages and disadvantages.
Option 1 is usually inconvenient, as all your lines have to have exactly the same length, or be padded with spaces. But if that's the case, then it's easy to add or remove lines.
Option 2 (basically the way you're doing it now) is also usually less convenient: while it's easy to append lines, it's less easy to remove them from the middle unless you parse the string looking for newlines. But if you only ever need to add lines, it's probably fine.
I would modify the way you're using sprintf and then concatenating:
msg = sprintf('%s\n%s', msg, 'line 4');
is a simpler and more flexible syntax.
Your general method of getting, modifying and setting the String property is fine, although if you wanted you could combine it all into one statement, such as:
set(handles.txtLine, 'String', sprintf('%s\n%s', get(handles.txtLine, 'String'), 'line 4'))
Option 3 would typically be the most convenient, as long as you're comfortable with cell arrays. Each line can be whatever you like, and it's easy to add or remove items.
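The trade-off between the newline-separated string and the cell array is not specific to MATLAB. As a rough analogy only (this is Python, not MATLAB, with a str standing in for the single line character array and a list standing in for the cell array), it looks like this:

```python
# Analogue of the newline-separated string: one string for all lines.
text = "line 1\nline 2\nline 3"
text = text + "\n" + "line 4"  # appending a line is easy

# Removing "line 2" means parsing the string back into lines first.
text_without_2 = "\n".join(l for l in text.split("\n") if l != "line 2")

# Analogue of the cell array: one list entry per line.
lines = ["line 1", "line 2", "line 3"]
lines.append("line 4")   # adding a line is easy
lines.remove("line 2")   # removing a line from the middle is just as easy

print(text_without_2)
print(lines)
```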