How to find the shortest substring of a string before a certain text in Python 3 - substring

I am trying to extract the shortest substring of a string before a certain text in Python 3. For instance, I have the following string.
\\n...\\n...\\n...TEXT
I want to extract the shortest substring that contains exactly two \\n before 'TEXT'. The example text may contain any number of \\n, with arbitrary letters between them.
I have already tried this in Python 3.4, but the result is the original text. It seems that the search starts matching at the first '\n' and treats the remaining '\n' occurrences as ordinary text.
text='\\n abcd \\n efg \\n hij TEXT'
pattern1=re.compile(r'\\n.*?\\n.*?TEXT', re.IGNORECASE)
obj = re.search(pattern1, text)
obj.group(0)
When I run my code, I get \\n abcd \\n efg \\n hij TEXT, which is exactly the same as the input.
I would like the result to be
\\n efg \\n hij TEXT
Can anyone help me with this?

Using regex with negative lookahead:
import re
text = '\\n abcd \\n efg \\n hij TEXT'
pattern = re.compile(r'(\\n(?!.*\\n.*\\).*)')
res = re.search(pattern, text)
res.group(0)
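A variant that stops at TEXT instead of running to the end of the string can be sketched as follows. It is not part of the original answer: the helper name `segment` is made up for illustration, and the idea is simply to forbid a further \\n between the two markers, so the leftmost possible match is already the shortest one.

```python
import re

text = '\\n abcd \\n efg \\n hij TEXT'

# "segment" (hypothetical name) matches any run of characters that does
# not contain another literal backslash-n marker.
segment = r'(?:(?!\\n).)*?'

# Exactly two markers, each followed by a marker-free run, then TEXT.
pattern = re.compile(r'\\n' + segment + r'\\n' + segment + 'TEXT', re.IGNORECASE)

match = pattern.search(text)
result = match.group(0) if match else None
print(result)
```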
Using python methods:
text = '\\n abcd \\n efg \\n hij TEXT'
text[text[:text.rfind("\\n")].rfind("\\n"):]
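The slicing approach can be generalised to any number of markers with str.rsplit. This is only a sketch: the function name `shortest_tail` and its parameters are assumptions for illustration, not something from the question.

```python
def shortest_tail(text, marker='\\n', target='TEXT', count=2):
    """Return the shortest substring ending in `target` that starts at a
    `marker` and contains exactly `count` markers, or None if not found."""
    end = text.find(target)
    if end == -1:
        return None
    # Ignore anything after the first occurrence of the target.
    head = text[:end + len(target)]
    # Split from the right so only the last `count` markers are used.
    parts = head.rsplit(marker, count)
    if len(parts) <= count:
        return None  # fewer than `count` markers before the target
    return marker + marker.join(parts[1:])


print(shortest_tail('\\n abcd \\n efg \\n hij TEXT'))
```

Splitting from the right avoids the nested rfind calls and makes the number of markers a parameter.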

I am not sure I understand the problem correctly...
A simple approach based on splitting the text may be useful:
text = '\\\n abcd \\\n efg \\\n hij TEXT - the rest of string'
# keep only the part before 'TEXT'
text = text.split('TEXT')[0]
list_part = text.split('\\\n ')
print(list_part)
# pick the shortest non-empty part
minimal_set = text
for parts in list_part:
    if len(parts) != 0 and len(parts) < len(minimal_set):
        minimal_set = parts
print(minimal_set)

Related

Extract lines from input text file to text file with some transformation with Mule 4

I need to read a text file, extract some data, and send the extracted data to another system, but I am unable to get this working.
Input file:
1BoraBora Island
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
3BR 209078 BoraBora 6798989 99999
1 BR 67854 JAIHIND 789 000Y247 9898983
2 BR CR9 BoraBora 123 QK J12Y64 00010520
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Output should be:
1BoraBora Island
0000000000000000000000
1 BR 67854 JAIHIND 789 000Y247 9898983
2 BR CR9 BoraBora 123 QK J12Y64 00010520
I need to extract only the rows that have "BR" starting at the 3rd character.
Please guide me how to achieve this in text format only.
Assuming that the input is text/plain, you can use a DataWeave script and the substring() function to extract a given position from each line:
%dw 2.0
import * from dw::core::Strings
output text/plain
var lines=payload splitBy "\n" // separate text into an array of lines
---
lines[0] ++"\n" ++ lines[1] ++"\n"
++ (lines[2 to -1] // use the range selector to get the remaining lines
filter (substring($,2,4)=="BR") // filter lines that have "BR" at the right position
reduce ($$++"\n"++$) // concatenate the remaining lines again into a single text file
)
Output:
1BoraBora Island
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1 BR 67854 JAIHIND 789 000Y247 9898983
2 BR CR9 BoraBora 123 QK J12Y64 00010520
Since you are working with text, you can also use a regex with the scan function to find all lines that match your condition, then joinBy a newline character:
%dw 2.0
output text/plain
---
flatten(payload scan /(?<=^|\n).{2}BR.*/)
joinBy "\n"
(?<=^|\n).{2}BR.* Regex breakdown:
(?<= is a positive lookbehind, which means the rest of the pattern is matched only if it is preceded by the pattern given inside the lookbehind
(?<=^|\n) is a positive lookbehind for either the start of the string (^) or a new line (\n)
.{2}BR.* indicates any character twice followed by the literal BR then any number of any character thereafter
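To see how the pattern behaves outside Mule, here is a rough Python sketch (Python is used purely for illustration, it is not DataWeave). Python's re module requires fixed-width lookbehinds, so the sketch anchors with ^ under re.MULTILINE instead of (?<=^|\n); the sample lines are taken from the question and the variable names are assumptions.

```python
import re

sample = "\n".join([
    "1BoraBora Island",
    "3BR 209078 BoraBora 6798989 99999",
    "1 BR 67854 JAIHIND 789 000Y247 9898983",
    "2 BR CR9 BoraBora 123 QK J12Y64 00010520",
])

# ^ with re.MULTILINE matches at the start of every line; .{2} skips two
# characters, BR must follow, and .* takes the rest of that line.
matches = re.findall(r"^.{2}BR.*", sample, flags=re.MULTILINE)
print("\n".join(matches))
```

Only the two lines with BR in the third and fourth positions are returned; "3BR ..." is skipped because its BR starts at the second character.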

Swift String including Special Characters

I have a user enter a multi-line string in an NSTextView.
var textViewString = textView.textStorage?.string
Printing the string ( print(textViewString) ), I get a multi-line string, for example:
hello this is line 1
and this is line 2
I want a swift string representation that includes the new line characters. For example, I want print(textStringFlat) to print:
hello this is line 1\n\nand this is line 2
What do I need to do to textViewString to expose the special characters?
If you just want to replace the newlines with the literal characters \ and n then use:
let escapedText = someText.replacingOccurrences(of: "\n", with: "\\n")

Using tab as sed separator

I would like to insert a new tab-delimited row into the file inp.txt.
This is the input produced by R:
inp <- 'AX-1 1 125
AX-2 2 456
AX-3 3 3445'
inp <- read.table(text=inp, header=F)
write.table(inp, "inp.txt", col.names=F, row.names=F, quote=F, sep="\t")
This is what I am trying to do:
sed -i '1i The name\tThe pos\tThe pos2\' inp.txt
However, the three column names (The name, The pos, The pos2) are not separated by tabs in the output file; it just contains the literal \t string. Can someone help me with the syntax?
Put the tab in a variable:
tab=$(printf '\t')
or
tab=$'\t'
Then you can use it in your sed script:
sed -i "1i The name${tab}The pos${tab}The pos2" inp.txt
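If sed is not a requirement, the same header insertion can be sketched in Python. The function name `prepend_header` is hypothetical, and the sketch assumes the file is small enough to read into memory.

```python
from pathlib import Path


def prepend_header(path, columns):
    """Insert a tab-separated header line at the top of the file."""
    file = Path(path)
    original = file.read_text()
    # "\t" here is a real tab character, not the two-character string \t.
    file.write_text("\t".join(columns) + "\n" + original)
```

For the file from the question, the call would be prepend_header("inp.txt", ["The name", "The pos", "The pos2"]).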

Extract a substring using command line utilities

I have a text file including lines in the form of:
(term1 x:a y:b (term2 z:c k:a))
I want to extract only terms from this line using command line utilities such as awk, grep, sed. i.e I want the result to be:
term1
term2
I have written a regex that matches everything but the terms, but could not find a way to negate it.
(\()|( \()|( (.*?) \()|( (.*?)\)+)
How can I form a command that extracts every substring after '(' and before ' '?
Thanks
Try this:
sed "s/(\([^ )]*\)[^(]*/\1\n/g"
For example:
$ echo "(term1 x:a y:b (term2 (term3) z:c k:a) x (termX a:b ) )" | sed "s/(\([^ )]*\)[^(]*/\1\n/g"
term1
term2
term3
termX
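For comparison, the same extraction can be sketched in Python with re.findall, which returns the captured groups directly, so nothing has to be negated. The function name `extract_terms` is an assumption for illustration.

```python
import re


def extract_terms(line):
    """Return every run of characters that directly follows '(' and
    stops at the next space or parenthesis."""
    return re.findall(r"\(([^ ()]+)", line)


line = "(term1 x:a y:b (term2 (term3) z:c k:a) x (termX a:b ) )"
terms = extract_terms(line)
print("\n".join(terms))
```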

Open Office replace()

Does Open Office BASIC support a function replace(string, search string, replace with)?
Yes, Open Office BASIC supports this function, but be aware it is not case sensitive.
Yes, the function is SUBSTITUTE(input, search_text, replace_text[, occurrence])
Examples:
input                                          | result
-----------------------------------------------+------------------
=SUBSTITUTE("nyan cat cat","cat","nyan!")      | nyan nyan! nyan!
=SUBSTITUTE("nyan cat cat","cat","nyan!", 1)   | nyan nyan! cat
=SUBSTITUTE("nyan cat cat","cat","nyan!", 2)   | nyan cat nyan!
Here is the official documentation
Replaces part of a text string with a different text string.
Syntax:
REPLACE(originaltext; startposition; length; newtext)
In originaltext, removes length characters beginning at character startposition, replaces them with newtext, and returns the result.
startposition and length must be 1 or more.
Example:
REPLACE("mouse"; 2; 3; "ic")
returns mice. Beginning at character position 2, 3 characters (ous) are removed and replaced by ic.
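The positional behaviour described above can be mimicked with string slicing. The Python sketch below is an illustration of the documented semantics, not the Open Office implementation, and `replace_at` is a made-up name.

```python
def replace_at(original_text, start_position, length, new_text):
    """Remove `length` characters starting at the 1-based
    `start_position` and put `new_text` in their place."""
    if start_position < 1 or length < 1:
        raise ValueError("startposition and length must be 1 or more")
    begin = start_position - 1  # convert to a 0-based index
    return original_text[:begin] + new_text + original_text[begin + length:]


print(replace_at("mouse", 2, 3, "ic"))  # mice
```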