Perl script to find and replace, or insert if blank

I am trying to write a Perl script to search a config file for the following line:
remote_phonebook.data.1.url =
and do one of two things:
if the right-hand side of the = is blank, add someString
if there is something there, replace it with someString
This will insert just fine:
s/remote_phonebook\.data\.1\.url = /remote_phonebook.data.1.url = someString/;
However, if someString is already there, it gets appended a second time, so the line looks like this:
remote_phonebook.data.1.url = someString someString
This replaces just fine if someString already exists, but won't insert anything if the value is blank:
s/remote_phonebook\.data\.1\.url = someString/remote_phonebook.data.1.url = someString/;

.* is your friend, here. It means "match 0 or more (*) of any character (.)":
s/remote_phonebook\.data\.1\.url =.*/remote_phonebook.data.1.url = someString/;
So whether or not there is anything after the =, you'll end up with the contents you want. To make sure that you're matching from the start of the line (so "xxxremote_phonebook..." won't match), and to allow for more (or fewer) spaces before the "=", I'd use:
s/^remote_phonebook\.data\.1\.url\s*=.*/remote_phonebook.data.1.url = someString/;

s/^\s*remote_phonebook\.data\.1\.url\s*=\K.*/ someString/;
The .* will match anything up to a newline.
The \K tells Perl to keep everything matched before it, so only the text after the = is replaced and you don't have to repeat the key in the replacement.
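For anyone doing the same edit outside Perl, here is a minimal Python sketch of the "replace everything after the =" approach. Python's re module has no \K, so the key and the = are captured in a group and written back instead. The key and someString come from the question; set_value is a hypothetical helper name.

```python
import re

KEY = "remote_phonebook.data.1.url"
VALUE = "someString"  # placeholder value from the question

# re.MULTILINE makes ^ and $ apply to each line. Group 1 keeps any
# indentation, the key and the "="; ".*" is whatever follows on that line.
PATTERN = re.compile(r"^([ \t]*" + re.escape(KEY) + r"[ \t]*=).*$", re.MULTILINE)


def set_value(config_text: str) -> str:
    """Replace the right-hand side of KEY, whether it is blank or not."""
    # A function replacement avoids backslash interpretation in VALUE.
    return PATTERN.sub(lambda m: m.group(1) + " " + VALUE, config_text)
```

Both a blank value and an existing someString end up as `remote_phonebook.data.1.url = someString`, and a line such as `xxxremote_phonebook.data.1.url = a` is left alone because of the ^ anchor.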

Related

Converting numbers into timestamps (inserting colons at specific places)

I'm using AutoHotkey for this as the code is the most understandable to me. So I have a document with numbers and text, for example like this
120344 text text text
234000 text text
and the desired output is
12:03:44 text text text
23:40:00 text text
I'm sure StrReplace can be used to insert the colons in, but I'm not sure how to specify the position of the colons or ask AHK to 'find' specific strings of 6 digit numbers. Before, I would have highlighted the text I want to apply StrReplace to and then press a hotkey, but I was wondering if there is a more efficient way to do this that doesn't need my interaction. Even just pointing to the relevant functions I would need to look into to do this would be helpful! Thanks so much, I'm still very new to programming.
hfontanez's answer was very helpful in figuring out that for this problem, I had to use a loop and substring function. I'm sure there are much less messy ways to write this code, but this is the final version of what worked for my purposes:
Loop, read, C:\[location of input file]
{
    if (A_LoopReadLine = "")
        continue  ; skip the blank lines in the file
    one := A_LoopReadLine
    x := SubStr(one, 1, 2)  ; first two digits
    y := SubStr(one, 3, 2)  ; next two digits
    z := SubStr(one, 5)     ; last two digits plus the rest of the line
    two := x . ":" . y . ":" . z
    FileAppend, %two%`r`n, C:\[location of output file]
}
return
Assuming that the "timestamp" component is always 6 characters long and always at the beginning of the string, this solution should work just fine.
String test = "012345 test test test";
test = test.substring(0, 2) + ":" + test.substring(2, 4) + ":" + test.substring(4, test.length());
This outputs 01:23:45 test test test
Why? Because you temporarily create a String object that is two characters long, then add the colon before taking the next pair. Lastly, you append the rest of the String and assign the result to whichever String variable you want. Remember, the substring method doesn't modify the String object you call it on; it returns a new String object. Therefore, the variable test is unmodified until the assignment at the end.
Alternatively, you can use a StringBuilder and append each component like this:
StringBuilder sbuff = new StringBuilder();
sbuff.append(test.substring(0,2));
sbuff.append(":");
sbuff.append(test.substring(2,4));
sbuff.append(":");
sbuff.append(test.substring(4,test.length()));
test = sbuff.toString();
You could also use a "fancy" loop to do this, but I think for something this simple, looping is just overkill. Oh, I almost forgot, this should work with both of your test strings because after the last colon insert, the code takes the substring from index position 4 all the way to the end of the string indiscriminately.
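The question also asked how to find the six-digit numbers instead of relying on fixed positions. A regular expression can do that; the sketch below is in Python (not AutoHotkey or Java) and assumes each timestamp is exactly six digits at the start of a line. The name add_colons is hypothetical.

```python
import re

# Exactly six digits at the start of a line, not followed by another word
# character; re.MULTILINE makes ^ match at the start of every line.
SIX_DIGITS = re.compile(r"^(\d{2})(\d{2})(\d{2})\b", re.MULTILINE)


def add_colons(text: str) -> str:
    """Turn '120344 text' into '12:03:44 text' on every matching line."""
    return SIX_DIGITS.sub(r"\1:\2:\3", text)
```

Lines that start with more or fewer than six digits are left unchanged. AutoHotkey also has a RegExReplace function, so a similar pattern may carry over, but its option and backreference syntax differs from Python's, so check its documentation before porting.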

Substring returns too little data

I want to grab lots of text content from a .sql file between a --Start and --End comment.
Whatever I do, I can't get the Substring method to grab only the text between the --Start and --End comments:
text.sql
This text I want not
--Start
this text I want here
--End
This text I want not
This is what I tried:
$insertStartComment = "--Start"
$insertEndComment = "--End"
$content = [IO.File]::ReadAllText("C:\temp\test.sql")
$insertStartPosition = $content.IndexOf($insertStartComment) + $insertStartComment.Length
$insertEndPosition = $content.IndexOf($insertEndComment)
$content1 = $content.Substring($insertStartPosition, $content1.Length - $insertEndPosition)
$content = $content1.Substring(0,$content1.Length - $insertEndPosition)
It would be nice if someone could help me out find my error :-)
There's an attempt to use an uninitialized variable in the code:
$content1 = $content.Substring($insertStartPosition, $content1.Length - $insertEndPosition)
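The variable $content1 isn't initialized yet, so the length passed to Substring is computed from a variable that doesn't exist, and the call goes haywire. When you run the code again in the same session, the variable is already set, and the results get even weirder. The length argument should be the distance between the two markers, $insertEndPosition - $insertStartPosition, i.e. $content.Substring($insertStartPosition, $insertEndPosition - $insertStartPosition); with that, the second Substring call is no longer needed.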
Use PowerShell's Set-StrictMode (for example, Set-StrictMode -Version Latest) to turn references to uninitialized variables into errors.
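For reference, the index arithmetic the Substring call needs can be sketched in Python, where slicing takes a start and an end rather than a start and a length. The function name and the .strip() of the surrounding newlines are choices made for this sketch.

```python
def extract_between(text: str, start_marker: str = "--Start", end_marker: str = "--End") -> str:
    """Return the text between the first start_marker and the next end_marker."""
    # Skip past the marker itself, as the question's code does with .Length.
    start = text.index(start_marker) + len(start_marker)
    # Look for the end marker only after the start marker.
    end = text.index(end_marker, start)
    # In PowerShell terms this slice is $content.Substring($start, $end - $start).
    return text[start:end].strip()
```

str.index raises ValueError if either marker is missing, which is usually preferable to silently returning the wrong slice.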
It's not the Substring approach you are looking for, but I figured I would toss out a RegEx solution. This finds the text between --Start and --End in a text file. Here I group the matched text with a named capture called LineYouWant and display each match it finds. This also works if you have multiple --Start/--End blocks in a single file, as long as each block holds a single line: without the Singleline option, . does not match a newline.
$Text = [IO.File]::ReadAllText("C:\users\proxb\desktop\SQL.txt")
[regex]::Matches($Text,'.*--Start\s+(?<LineYouWant>.*)\s+--End.*') | ForEach {
$_.Groups['LineYouWant'].Value
}
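If a block can span several lines, the dot has to be allowed to cross newlines. A Python sketch of that variant (not PowerShell): re.DOTALL lets . match newlines, and the non-greedy .*? stops at the nearest --End, so several multi-line blocks in one file are returned separately. The name blocks_between is hypothetical.

```python
import re

# DOTALL lets "." match newlines; ".*?" is non-greedy, so each match ends
# at the first --End after its --Start. \s* trims the surrounding whitespace.
BLOCK = re.compile(r"--Start\s*(.*?)\s*--End", re.DOTALL)


def blocks_between(text: str) -> list:
    """Return the contents of every --Start ... --End block."""
    return BLOCK.findall(text)
```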

Adding new line to NSCharacterSet

I want to split a string on all newlines and commas (and place the pieces into an array), so I wrote this:
let results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: ",\n"))
However, the newlines still exist in my array (the commas are being removed). What's the correct way to add newlines to the NSCharacterSet? Or, how do I add a comma to NSCharacterSet.newlineCharacterSet()?
Thanks.
Here is a janky solution, but I'm still looking for a more elegant one:
var results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: ","))
text = results.joinWithSeparator(" ")
results = text.componentsSeparatedByCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
(one-line) SOLUTION:
var results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: " ,\u{000A}\u{000B}\u{000C}\u{000D}\u{0085}"))
Explanation is below.
You can unite two NSCharacterSet by first using an NSMutableCharacterSet, for example:
let charset = NSMutableCharacterSet(charactersInString: ",")
charset.formUnionWithCharacterSet(NSCharacterSet.newlineCharacterSet())
let results = text.componentsSeparatedByCharactersInSet(charset)
So MartinR brought to my attention that there are more newline characters than just "\n".
I looked at the values used in NSCharacterSet.newlineCharacterSet and added them all, giving me:
var results = text.componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: " ,\u{000A}\u{000B}\u{000C}\u{000D}\u{0085}"))
This got rid of all the whitespace, commas, and newlines. Interestingly, when I tried each newline value on its own to figure out which one my text used, none of them worked; only when I used them all together were my newlines stripped.
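Apple's documentation describes newlineCharacterSet as U+000A through U+000D, U+0085, U+2028 and U+2029, so the hand-written list above leaves out the last two. As a language-neutral illustration (Python, not Swift), here is a split on commas plus that documented newline set. Treating a run of separators as one and dropping empty pieces are choices made for this sketch; componentsSeparatedByCharactersInSet itself returns empty strings between adjacent separators.

```python
import re

# Comma plus the characters Apple documents for newlineCharacterSet:
# U+000A-U+000D, U+0085, U+2028 and U+2029.
SEPARATORS = re.compile(r"[,\x0a-\x0d\x85\u2028\u2029]+")


def split_fields(text: str) -> list:
    """Split on commas and newlines, treating a run of separators as one."""
    return [piece for piece in SEPARATORS.split(text) if piece]
```

Because "\r\n" is a run of two separator characters, it is consumed as a single separator instead of leaving an empty piece behind.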

Regexp to find a matching condition in a string

Hi, I need help using regexp for condition matching.
For example, my file has the following content:
{hello.program='function'`;
bye.program='script'; }
I am trying to use regexp to match the string that has .program='function' in them:
pattern = '[.program]+\=(function)'
also tried pattern='[^\n]*(.hello=function)[^\n]*';
pattern_match = regexp(myfilename,pattern , 'match')
but this returns pattern_match = {}, while I expect the result to be hello.program='function'`;
If 'function' comes with quotes, you need to include them in the match. Also, you need to escape the dot (otherwise it means "any character"). [.program]+ looks for one or more of the characters listed in the square brackets, but you can just look for program instead. Also, you don't need to escape the = sign (which is probably what messed up the match).
cst = {'hello.program=''function''';'bye.program=''script'''; };
pat = 'hello\.program=''function''';
out = regexp(cst,pat,'match');
out{1}{1} %# first string from list, first match
hello.program='function'
EDIT
In response to the comment
my file contains
m2 = S.Parameter;
m2.Value = matlabstandard;
m2.Volatility = 'Tunable';
m2.Configurability = 'None';
m2.ReferenceInterfaceFile ='';
m2.DataType = 'auto';
my objective is to find all the lines that match .DataType='auto'
Here's how you find the matching lines with regexp
%# read the file with textscan into a variable txt, one cell per line
%# (without 'Delimiter','\n', '%s' splits the file at every space)
fid = fopen('myFile.m');
txt = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
txt = txt{1};
%# find the match. Allow spaces and equal signs between DataType and 'auto'
match = regexp(txt,'\.DataType[ =]+''auto''','match')
%# match is not empty only if a match was found. Identify the non-empty match
matchedLine = find(~cellfun(@isempty,match));
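For comparison, a Python sketch (not MATLAB) that reports the 1-based numbers of the lines containing .DataType = 'auto'. It allows optional whitespace around the = instead of the [ =]+ class used above; the name matching_lines is hypothetical.

```python
import re

# Escaped dot, optional whitespace around "=", then the quoted value 'auto'.
DATATYPE_AUTO = re.compile(r"\.DataType\s*=\s*'auto'")


def matching_lines(text: str) -> list:
    """Return 1-based line numbers whose line contains .DataType = 'auto'."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if DATATYPE_AUTO.search(line)]
```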
Try this as it matches .program='function' exactly:
(\.)program='function'
I think this did not work:
'[.program]+\=(function)'
because of how the []'s work. Here is a link explaining why I say that: http://www.regular-expressions.info/charclass.html
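To see what the square brackets do, here is a quick check in Python's re module (an approximation only, since MATLAB's regexp is a different engine). [.program]+ matches any run of the characters . p r o g a m in any order, and in Python the original pattern fails on the sample line because 'function' is wrapped in quotes that the pattern does not allow for.

```python
import re

CHAR_CLASS = re.compile(r"[.program]+")          # a set of single characters
LITERAL = re.compile(r"\.program")               # the literal text ".program"
ORIGINAL = re.compile(r"[.program]+\=(function)")

line = "hello.program='function';"

# "ogram.rap" uses only characters from the set, so the class accepts it.
class_accepts_scramble = CHAR_CLASS.fullmatch("ogram.rap") is not None     # True
literal_finds_scramble = LITERAL.search("ogram.rap") is not None           # False
# The original pattern wants "function" right after "=", but a quote sits there.
original_matches_line = ORIGINAL.search(line) is not None                  # False
fixed_matches_line = re.search(r"\.program='function'", line) is not None  # True
```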

Insert double quotes multiple times into string

I have inherited a flat html file with a few hundred lines similar to this:
<blink>
<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>
</blink>
So far I have not been able to work out a sed command that inserts the closing double quote for each attribute value; it probably needs something other than sed. Can anyone suggest an easy way to do this?
Thanks
sed -i 's/"\([^" >]\+\)\( \|>\)/"\1"\2/g' file.html
Explanation (this relies on GNU sed, since \+ and \| are GNU extensions to basic regular expressions):
" - leading double quote
\([^" >]\+\) - non-quote-or-space-or-'>' chars, grouped (into group 1)
\( \|>\) - terminating space or '>', grouped (into group 2)
We replace it with '"<group1>"<group2>'.
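The same substitution can be sketched with Python's re module (not sed). Here a lookahead checks for the terminating space or > without consuming it, so it never has to be put back, and the pattern also requires the = before the opening quote. Like the sed version, it assumes attribute values contain no spaces; close_quotes is a hypothetical name.

```python
import re

# =" followed by characters that are not a quote, space or ">", but only
# when the next character is a space or ">" (the value was never closed).
# Already-closed values such as ="a" are left alone.
UNCLOSED_VALUE = re.compile(r'="([^" >]+)(?=[ >])')


def close_quotes(html: str) -> str:
    return UNCLOSED_VALUE.sub(r'="\1"', html)
```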
One solution that pops out at me is to parse each line of the file looking for a quote. When it finds one, set a flag to track that it is inside a quoted value, then keep scanning until it reaches the first space or > and insert an additional " just before it. Clear the flag, then continue through the string looking for the next quote. Probably not a perfect solution, but perhaps a start.
If all lines share the same structure, you could use a simple texteditor to globally replace
' bordercolor'
with
'" bordercolor'
(without the single quotes). This is independent of the field values and works the same way for the other fields. You still have to do some manual work, but if it's just one big file, I'd bite the bullet this time rather than spend possibly more time working out a sed solution.
This should do if your file is simple - it won't work if you have whitespace which should be inside the quotes - in that case, a more complex code will be needed, but can be done along the same lines.
#!/usr/bin/env python3
# Change "utf-8" below to your file's encoding.
with open("<myfile.html>", encoding="utf-8") as infile:
    data = infile.read()

new_data = []
inside_tag = False
inside_quotes = False
for char in data:
    if char == "<":
        inside_tag = True
    if inside_tag and char == '"':
        # Toggle, so a value that is already closed (="a") is left alone
        # and quotes in ordinary text outside tags are ignored.
        inside_quotes = not inside_quotes
    if inside_tag and inside_quotes and (char.isspace() or char == ">"):
        # An open quote followed by a space or ">" is unterminated: close it.
        new_data.append('"')
        inside_quotes = False
    if char == ">":
        inside_tag = False
    new_data.append(char)

with open("<mynewfile.html>", "w", encoding="utf-8") as outputfile:
    outputfile.write("".join(new_data))
With bash:
for file in *
do
    flag=0
    # IFS= keeps leading whitespace; -r keeps backslashes
    while IFS= read -r line
    do
        case "$line" in
            *"<blink>"*)
                flag=1
                ;;
        esac
        if [ "$flag" -eq 1 ]; then
            case "$line" in
                *class=\"pagetxt*">" )
                    # add the missing quote before the final >
                    line="${line%>}\">"
                    flag=0
                    ;;
            esac
        fi
        echo "${line}"
    done < "$file" > temp
    mv temp "$file"
done
Regular expressions are your friend:
Find: (="[^" >]+)([ >])
Replace: \1"\2
After you've done that, make sure to run this one too:
Find: </?blink>
Replace: \n
(This won't fix more than one class on an element, like <element class="class1 class2 id="jimmy">)