How to do preg_replace on a long string

I want to find and replace a long line of JavaScript code. The code also contains a lot of / and \ characters.
Is this even possible?

You can raise PCRE's backtracking limit (pcre.backtrack_limit) manually so that PHP can handle very long strings.
Put the following line somewhere before calling preg_replace:
ini_set('pcre.backtrack_limit', '99999999999');
Even better, if you can modify your php.ini file, change the value of pcre.backtrack_limit there so that the new limit applies globally.

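It depends on how long. There is an upper limit (see http://nz.php.net/manual/en/function.preg-last-error.php for how to detect whether you reach it); when it is hit, preg_replace returns null and preg_last_error() reports the error.
You can escape variables going into your pattern with preg_quote if you need to. Pass your pattern delimiter as the second argument, e.g. preg_quote($code, '/'), because / is only escaped when it is given as the delimiter; \ is always escaped.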
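For comparison, here is a minimal sketch of the same idea in Python, where re.escape plays roughly the role of preg_quote. The JavaScript snippet, the search text and the replacement are made-up examples. Python patterns have no delimiter, so / needs no escaping at all; a replacement function is used because re.sub would otherwise interpret backslashes in the replacement string.

```python
import re

def replace_literal(text: str, needle: str, replacement: str) -> str:
    """Replace every occurrence of `needle` in `text`, treating it as a literal."""
    # re.escape escapes regex metacharacters such as \ . ( ) [ ] *
    pattern = re.compile(re.escape(needle))
    # A function replacement inserts `replacement` verbatim, so any
    # backslashes in it are not read as escape sequences or group references.
    return pattern.sub(lambda _m: replacement, text)

# Hypothetical JavaScript line full of / and \ characters.
js = r'var re = /a\/b\\c/; var path = "C:\\dir\\file";'
print(replace_literal(js, r'/a\/b\\c/', r'/x\/y/'))
```

For a purely literal search like this one, a plain string replacement (str.replace in Python, str_replace in PHP) avoids regular expressions entirely.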

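preg_replace has practical size limits (the PCRE backtracking and recursion limits), and sadly it is not obvious in advance where they will bite. What I did was divide the whole string into chunks of smaller strings, run preg_replace
on each chunk, and then combine the chunks back together. Be aware that a match spanning two chunks will be missed, so split at points your pattern can never cross.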
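The chunking approach can be sketched in Python as below. This is an illustration of the split/replace/join idea only: Python's re module has no backtrack limit setting, and the choice of ";" as the split point is an assumption that suits JavaScript statements, valid only if the pattern can never match across a ";".

```python
import re

def chunked_sub(pattern: str, replacement: str, text: str, delimiter: str = ";") -> str:
    """Apply re.sub to each delimiter-separated chunk, then rejoin.

    Assumes no match can span `delimiter`; a match that straddles a
    chunk boundary is silently missed.
    """
    compiled = re.compile(pattern)
    chunks = text.split(delimiter)
    return delimiter.join(compiled.sub(replacement, chunk) for chunk in chunks)

code = "var a = 1;var b = 2;var c = 3"
print(chunked_sub(r"var (\w)", r"let \1", code))  # let a = 1;let b = 2;let c = 3
```

A pattern such as r"1;var" shows the limitation: it occurs in the input but is never found, because every chunk has had its ";" removed.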

Related

Removing spaces from a string using Powershell

I have an issue where extracting data from a database sometimes (quite often) adds spaces inside strings of text where they should not be.
What I'm trying to do is create a small script that will look at these strings and remove the spaces.
The problem is that the spaces can be in any position in the string, and the string is a variable that changes.
Example:
"StaffID": "0000 25" <- The space in the number should not be there.
Is there a way to have the script look at this particular line and, if it finds spaces, remove them?
Or: "DateOfBirth": "23-10-199 0" <- It would also need to look at these spaces and remove them.
The problem is that the same data also has lines such as:
"Address": " 91 Broad street" <- The spaces should be here obviously.
I've tried using TRIM, but that only removes spaces from start/end.
It is worth mentioning that the extracted data is in JSON format and is then imported into the new system through an API.
You should think through the logic of what you want to do, and whether it is programmatically possible to teach your script where spaces are and are not appropriate. In general that is a very hard problem (it amounts to understanding the meaning of the text), so unfortunately you are probably going to have to define the rules by hand.
If it were me, I'd specify the data format I expect from each column and try my best to parse those strings accordingly. For example, if you know that StaffID never contains spaces, you can have a rule that simply deletes them:
$staffid = $staffid -replace '\s+', ''
(The .Replace() string method does a literal replacement, so it would look for the text \s+ rather than whitespace; the -replace operator uses a regular expression.)
There are more complicated things you can do with forced formatting (-replace) that have already been covered in another answer, but again, that requires knowing exactly what data is going to come out of which column.
You might want to look more closely at where those spaces are coming from, rather than process the output like this. Is the retrieval script doing it? Maybe you can optimize the database that you're drawing from?
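The per-column rule idea can be sketched in Python as below (the thread itself uses PowerShell). The set of "no whitespace" fields is a hypothetical rule set built from the examples in the question; free-text fields such as Address are passed through untouched.

```python
import json
import re

# Hypothetical rule set: fields that should never contain whitespace.
NO_SPACE_FIELDS = {"StaffID", "DateOfBirth"}

def clean_record(record: dict) -> dict:
    """Strip all whitespace from fields known not to contain any."""
    cleaned = {}
    for key, value in record.items():
        if key in NO_SPACE_FIELDS and isinstance(value, str):
            cleaned[key] = re.sub(r"\s+", "", value)  # drop every whitespace run
        else:
            cleaned[key] = value  # leave free-text fields like Address alone
    return cleaned

raw = '{"StaffID": "0000 25", "DateOfBirth": "23-10-199 0", "Address": " 91 Broad street"}'
print(json.dumps(clean_record(json.loads(raw))))
```

Keeping the rules in one explicit set makes it easy to review which columns are being altered before the data is sent to the API.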

PostgreSQL Trimming Leading and Trailing Characters: = and "

I'm building an import tool that uses a quoted CSV file. However, several of the fields in the CSV file look like this:
"=""38000"""
Where 38000 is the data I need. The data integration software I use (Talend 6.11) already strips the leading and trailing double quotes for me (so, "38000" becomes 38000), but I can't find a way to get rid of those others.
So, essentially, I need "=""38000""" to become "38000" where the leading "=" is removed and the trailing "" is removed.
Is there a TRIM function that can accomplish this for me? Perhaps there is a method in Talend that can do this?
As the other answer stated, you could do that operation in SQL. Or, you could do it in Java, Groovy, etc. within Talend. However, if there is an existing Talend component that does the job, my preference is to use it. That leads to faster development, potentially less testing, and easier maintenance. Having said that, it is important to review all the available components so that you know what you can use.
You can use the Talend component tReplace to inspect each of the input columns you want to trim of quotes and equals signs. A single tReplace component can do search and replace operations on multiple input columns. If all of the replacements are related to each other, I would keep them within a single tReplace. When it gets to the point of doing unrelated replacements, I might place those in a new tReplace so that logical operations are organized and grouped together.
tReplace
For a given Input Column
search for "=", replace with ""
search for "\"", replace with ""
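To make the effect of those two tReplace rules concrete, here is a small Python sketch of the same two search/replace steps. It is not Talend code; it only shows that the rules remove every = and every " anywhere in the value, not just at the ends.

```python
def treplace_like(value: str) -> str:
    """Mirror the two tReplace rules: remove every "=" and every '"'."""
    for search in ("=", '"'):
        value = value.replace(search, "")
    return value

print(treplace_like('"=""38000"""'))  # 38000
```

Because the removal applies everywhere, a value that legitimately contains = or " in the middle would lose those characters too.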
Something like that:
SELECT format( '"%s"', trim( both '"=' from '"=""38000"""' ) );
-[ RECORD 1 ]---
format | "38000"
First: trim() removes every leading and trailing " and = character. The result is simply 38000.
Second: format() adds the double quotes back to get the desired end result.
Alternatively, you can use regular expressions (for example regexp_replace) and other PostgreSQL string functions.
See more:
https://www.postgresql.org/docs/current/static/functions-string.html
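The trim-then-format logic of that query can be mirrored in Python as a quick way to check the expected result. Python's str.strip with a character set removes those characters only from both ends, which matches trim(both '"=' from ...); this is a sketch for illustration, not something Talend or PostgreSQL runs.

```python
def trim_and_quote(value: str) -> str:
    # Like trim(both '"=' from value): strip any run of '"' and '='
    # from both ends, leaving the middle untouched.
    core = value.strip('"=')
    # Like format('"%s"', core): wrap the result in double quotes.
    return '"' + core + '"'

print(trim_and_quote('"=""38000"""'))  # "38000"
```

Unlike a replace-everything approach, this keeps any = or " that appears inside the value, e.g. '"=a=b"' becomes '"a=b"'.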

Ogg metadata - Vorbis Comment end

I want to implement a class to read vorbis comments. I know that a field will start with a field name, followed by an equal sign and the value. But how does it end? Documentation makes me think that a semicolon will end the field but I checked an ogg file with a hex editor and I cannot see any.
This is what I think it should look like in a file:
TITLE=MY SUPER TITLE;
The field name is title, followed by the equals sign and then the value is MY SUPER TITLE. And finally the semicolon to end the field.
But inside my file, the fields actually look like this:
TITLE=MY SUPER TITLE....
It's almost the same as above, but there is no semicolon. The dots are characters that cannot be displayed. I thought, okay, it seems like the dots represent a value that says "this is the end of the field!", but they are almost always different. I noticed that there are always exactly 4 dots. The first dot always has a different value. The other three usually have a value of 0, but not always...
My question now, how does a field end? How do I read this comment?
Also, yeah I know that there are libraries and that I should use them instead of reinventing the wheel over and over again. I will use libraries later but first I want to know how to do it myself. Educational purpose only.
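Fields are not terminated at all. Each field is preceded by a little-endian 32-bit unsigned integer that gives its length in bytes; you read that many bytes and convert them to a string via UTF-8. The four bytes you see after MY SUPER TITLE are therefore the length prefix of the next field, which is why the first byte varies and the other three are usually 0 (the next comment is shorter than 256 bytes).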
See NVorbis' implementation (LoadComments(...)) for details.
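A minimal Python sketch of that length-prefixed layout follows. It assumes `data` starts right at the vendor length field, i.e. after the packet type byte and the "vorbis" signature of the comment header, and it ignores the trailing framing bit; it is not NVorbis' code.

```python
import struct

def parse_vorbis_comments(data: bytes):
    """Parse a Vorbis comment block starting at the vendor length field."""
    offset = 0

    def read_u32() -> int:
        nonlocal offset
        (value,) = struct.unpack_from("<I", data, offset)  # little-endian uint32
        offset += 4
        return value

    def read_utf8(length: int) -> str:
        nonlocal offset
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("truncated comment block")
        offset += length
        return raw.decode("utf-8")

    vendor = read_utf8(read_u32())
    count = read_u32()  # number of user comments that follow
    comments = []
    for _ in range(count):
        field = read_utf8(read_u32())
        # Each comment is NAME=value; split on the first '=' only.
        name, sep, value = field.partition("=")
        if not sep:
            continue  # skip malformed entries without '='
        comments.append((name.upper(), value))  # field names are case-insensitive
    return vendor, comments

# Build a small test block: vendor "test", then two comments.
block = struct.pack("<I", 4) + b"test" + struct.pack("<I", 2)
for text in (b"TITLE=MY SUPER TITLE", b"ARTIST=Someone"):
    block += struct.pack("<I", len(text)) + text

print(parse_vorbis_comments(block))
```

In this block the four bytes right after MY SUPER TITLE are 0e 00 00 00: the length (14) of ARTIST=Someone, which matches the pattern of "dots" described in the question.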

Regular Expression for number.(space), objective-c

I have an NSArray of lines (Objective-C, iPhone), and I'm trying to find the lines that start with a number followed by a dot and a space, which can have any number of spaces (including none) before the number and any text after it, e.g.:
1. random text
2. text random
3.
What regular expression would I use to get this? (I'm trying to learn regular expressions, and I needed the above expression anyway, so I thought I'd use it as an example.)
With C#:
@"^ *[0-9]+\. "
It doesn't check for the presence of something after the ., so this is legal:
1.(space)
If you delete the @ and escape the \, it should work in other languages (it is pretty "down-to-earth" as regexes go).
I would suggest (Perl-compatible regex):
^\s*\d+\.\s
At the beginning of a line:
Any number (0-n) of whitespace characters
One or more digits
A dot
A whitespace character
Something like
^\s*\d+\.
But it depends on the language.
/^\s*[0-9]+\.\s+/
would be my guess. Note that the leading \s* already allows optional whitespace before the number.
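Here is a quick way to try the space-only variant of the pattern, using Python's re module. The sample lines are made up. Using [0-9] instead of \d keeps it to ASCII digits, since Python's \d also matches other Unicode digits; the same pattern should carry over to NSRegularExpression in Objective-C, whose ICU syntax supports these constructs.

```python
import re

# Optional leading spaces, one or more ASCII digits, a dot, then one space.
NUMBERED = re.compile(r"^ *[0-9]+\. ")

def numbered_lines(lines):
    """Return the lines that start like '1. ' (after optional spaces)."""
    return [line for line in lines if NUMBERED.match(line)]

sample = ["1. random text", "  2. text random", "3.", "no number", "10.x"]
print(numbered_lines(sample))  # ['1. random text', '  2. text random']
```

"3." is rejected because nothing follows the dot. To also accept such lines, the final space could be replaced with (?: |$).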

How to replace variables in an HTML file that is fed to a UIWebView?

In my app, I want to replace some variables inside an HTML file before it is loaded into a UIWebView. The file is local. For example, I add a variable $userName$ and then I would replace it with @"John".
Would I have to read the file in as string, and then perform some string replacement action?
Yes, that should work. Take a look at the replaceOccurrencesOfString:withString:options:range: method of NSMutableString. If there is a lot of content and the position of your variable is not known exactly but is expected within known limits (for example, within the first 100 characters), you might want to pass a range so that the whole content does not have to be compared against the term you are replacing. If the position of the variable is fixed, you could use replaceCharactersInRange:withString: instead, which avoids searching the string at all.
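The read-then-replace approach, including the range-limited search, can be sketched in Python as below. This is not the Cocoa API. The helper names are hypothetical; `replace_in_range` only imitates the idea of passing a range by restricting str.find to a slice.

```python
def fill_placeholders(html: str, values: dict) -> str:
    """Replace $name$ placeholders; unknown placeholders are left as-is."""
    for name, value in values.items():
        html = html.replace(f"${name}$", value)
    return html

def replace_in_range(html: str, token: str, value: str, start: int, end: int) -> str:
    """Replace the first `token` lying entirely within html[start:end]."""
    index = html.find(token, start, end)
    if index == -1:
        return html  # not found inside the range: leave the text unchanged
    return html[:index] + value + html[index + len(token):]

template = "<p>Hello, $userName$!</p>"
print(fill_placeholders(template, {"userName": "John"}))          # <p>Hello, John!</p>
print(replace_in_range(template, "$userName$", "John", 0, 100))  # <p>Hello, John!</p>
```

If the values come from user input, they would normally be HTML-escaped first (for example with Python's html.escape) before being placed in the page.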