Select part of a file name with regex - regex-group

I have a file u01/appl/wandl/import/AT/file.csv. I need to get only the AT part and store it in a variable using a regex. It is always a two-letter code, but it could be BE, UK, etc.

\/([A-Z]{2})\/
This regex finds exactly two uppercase letters surrounded by slashes, so it matches things like /AT/, /UK/ or /BE/.
Because the parentheses enclose only the two-letter code, only the code is captured, in group 1.
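As an illustration, here is a minimal Python sketch of reading group 1 into a variable (Python is an assumption, since the question doesn't name a language; the helper name country_code is hypothetical). The slashes need no escaping in Python; \/ only matters in languages that delimit regexes with slashes.

```python
import re

# Group 1 captures the two uppercase letters between slashes
CODE_RE = re.compile(r"/([A-Z]{2})/")

def country_code(path):
    """Return the two-letter code from the path, or None if there is none."""
    match = CODE_RE.search(path)
    return match.group(1) if match else None

print(country_code("u01/appl/wandl/import/AT/file.csv"))  # AT
```

Note that search() returns the first /XX/ segment it finds, so a path with more than one two-letter directory would need a more specific pattern.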

Related

Selecting cases based on the first few characters in SPSS?

I want to select cases whose first three characters are a particular value,
for example cases whose first three characters are "I22".
The length of the whole value can vary, e.g. "I228" or "I2279", but they share the same first three characters, "I22".
I usually use compute variable_name= "I228".
But this is tedious, as I have to enter every variation of "I22", e.g. "I228", "I229" and so on.
It would be much easier if I could just select cases based on the same first three characters.
You can use the CHAR.SUBSTR function to get the first three characters of your string variable. For example:
if char.substr(variable_name,1,3)="I22" keep_this=1.
or:
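select if char.substr(variable_name,1,3)="I22".
(The syntax command is SELECT IF; "Select Cases" is the name of the menu dialog. SELECT IF permanently removes the other cases from the active dataset, so precede it with TEMPORARY. if the selection should apply only to the next procedure.)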
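Outside SPSS, the same "first three characters" test can be sketched in Python, purely to show the logic; the sample values below are hypothetical. The main thing to watch is that CHAR.SUBSTR counts from 1, while Python slices count from 0.

```python
# Hypothetical sample values standing in for the SPSS string variable
values = ["I228", "I2279", "I30", "XI22"]

# CHAR.SUBSTR(variable_name,1,3) is 1-based; value[:3] is the 0-based equivalent
selected = [value for value in values if value[:3] == "I22"]
print(selected)  # ['I228', 'I2279']
```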

Office.js: search a Word document using a regular expression

I want to search for strings like "number 1", "number 152" or "number 36985".
In all of these strings "number " is constant, but the digits change and can be of any length.
I tried the Search option with wildcards, but it doesn't seem to work.
Basic regex operators such as + don't seem to work.
I tried 'number*[1-9]*' and 'number*[1-9]+' but no luck.
This pattern only matches up to one digit: for the string 'number 12345' it matches only 'number 1'.
Does anyone know how to do this?
Word doesn't use regular expressions in its search (Find) functionality. It has its own set of wildcard rules. These are very similar to RegEx, but not identical and not as powerful.
Using Word's wildcards, the search text below locates the examples given in the question. (Note that the semicolon separator in 1;100 may be something else, depending on the list separator set in Windows or on the Mac. My European locale uses a semicolon; in the United States it would be a comma, for example.)
"number [0-9]{1;100}"
The 100 is an arbitrary maximum I chose for the number of repeats of the term just before it. Depending on how long you expect the numbers to be, it can be much smaller.
The logic of the search text is: "number " is a literal; the characters following the literal must be in the range 0 through 9; and there may be one to one hundred of them. Anything in that range is a match.
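For comparison, here is the same search written as an ordinary regular expression, sketched with Python's re module (not Word's Find engine, which does not accept this syntax). The main translation is that the repeat count {1;100} always takes a comma in regex syntax, whatever the locale; the sample text is made up.

```python
import re

# Word wildcard "number [0-9]{1;100}" written as a regular expression
pattern = re.compile(r"number [0-9]{1,100}")

text = "see number 1, number 152 and number 36985"
print(pattern.findall(text))  # ['number 1', 'number 152', 'number 36985']
```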
The only way RegEx can be used in Word is to extract a string and run the search on the string. But this dissociates the string from the document, meaning Word-specific content (formatting, fields, etc.) will be lost.
Try putting < and > at the ends of your search string to mark the beginning and end of a word. This works for me: '<number [1-9]*>'. Note, though, that in Word wildcards * matches any run of characters, not only digits. To require one or more digits, use '<number [0-9]@>': Word wildcards use @ where other regex systems use +.

How to pad a field with spaces to a fixed length in Notepad++

I've written a Notepad++ macro that transforms an LDIF file into a CSV file with a few fields. Everything works, but one problem remains: two of the fields must have a specific length, and at the moment I cannot guarantee that, because they don't have that length in the source file.
For instance, I generate this line:
12345,namenamename,123456
And I need the 2nd and 3rd fields to be exactly 30 characters (padded with spaces on the right) and 9 characters (padded with zeros on the left), so in this case I should generate:
12345,namenamename                  ,000123456
I haven't found a way to make Notepad++ match a pattern and pad it with spaces/zeros, so I thought of adding one space/zero to the relevant field and repeating that step as many times as needed to reach the required length (at most 29 and 8 times, because the fields are never empty), using the length in the regex to decide whether a field still needs padding (for instance, \d{1,8} for the third field).
My question is: can I repeat only one step of the macro several times (and the rest of the macro only 1 repetition)?
I've read the wiki page on this topic (http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Editing_Configuration_Files#.3CMacros.3E) and didn't find anything there either.
If it isn't possible, what would be a good solution? Should I create two more macros and, after running the main one, run these two new macros several times?
Thanks in advance!
A two-pass solution with Notepad++ is possible. Pick a pair of characters, or two short character sequences, that never occur in your data file. I will use =#<= and =>#= here.
In the first pass, generate or convert the input text into the form 12345,=#<=namenamename______________________________,000000000123456=>#=, i.e. add 30 spaces after the name and nine zeroes before the number (underscores are used here just to make the spaces visible).
In the second pass, do a regular expression search for =#<=(.{30})_*,0*(\d{9})=>#= and replace with \1,\2. With real data, use a space instead of the _ in the search text.
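The two passes can be sketched in Python to check the logic before recording it as a macro. Assumptions: each input line has exactly three comma-separated fields, real spaces replace the underscores, and the function names are hypothetical. Python's re.sub accepts the same \1,\2 replacement text.

```python
import re

OPEN, CLOSE = "=#<=", "=>#="  # markers assumed never to occur in the data

def first_pass(line: str) -> str:
    # Wrap fields 2 and 3 in markers: 30 spaces after the name, 9 zeros before the number
    ident, name, number = line.split(",")
    return f"{ident},{OPEN}{name}{' ' * 30},{'0' * 9}{number}{CLOSE}"

# Keep the first 30 characters of the padded name and the last 9 digits of the number
SECOND = re.compile(re.escape(OPEN) + r"(.{30}) *,0*(\d{9})" + re.escape(CLOSE))

def second_pass(text: str) -> str:
    return SECOND.sub(r"\1,\2", text)

print(repr(second_pass(first_pass("12345,namenamename,123456"))))
```

The greedy 0* first swallows all the zeros and then backtracks until exactly nine digits remain for \d{9}, which is what produces the left-padded number.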
I recently suggested a similar solution to the question "special timestamp format of csv".

Strip excess padding from a string

I asked a question earlier today and got a really quick answer from llbrink. I really should have asked that question before I spent several hours trying to find an answer.
So here's another question that I have never found an answer to (although I have created a workaround, which seems very kludgy).
My AHK program asks the user for a login name. The program then compares the login name with an existing list of names in a file.
The login name in the file may contain spaces, but there are never spaces at the beginning of the name. When the user enters the name, he may include spaces at the beginning. This means that when my program compares the name with those in the file, it cannot find a match (because of the extra spaces).
I want to find a way of stripping the spaces from the beginning of the input.
My workaround has been to split the input string into an array (which does ignore leading spaces) and then use the first element of the array. This is my code:
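name := DoStrip(name)

DoStrip(xyz) ; strip leading and trailing spaces from string
{
    ; split on commas, omitting spaces at the start and end of each piece
    StringSplit, out, xyz, `,, %A_Space%
    Return out1 ; first piece = the trimmed name (assumes the name contains no comma)
}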
This seems to be a very laboured way to do it - is there a better way ?
I don't see a problem with your example if it works in all cases.
There is a much simpler way, though: just use AutoTrim, which works like this:
AutoTrim, On ; not required, it is on by default
my_variable = %my_variable%
There are also many other ways to trim strings in AutoHotkey,
which you can combine into something useful.
You can also use the LTrim() and RTrim() functions (or Trim() for both ends) to remove whitespace at the beginning and at the end of a string, e.g. name := LTrim(name).
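The general approach (normalize the typed name first, then compare it with the stored names) is sketched below in Python rather than AutoHotkey, just to show the idea. The names and the find_login helper are hypothetical; str.strip() removes all surrounding whitespace, which is slightly broader than AHK's default of spaces and tabs.

```python
from typing import List, Optional

KNOWN = ["John Smith", "Mary Jones"]  # hypothetical names read from the file

def find_login(entered: str, known_names: List[str]) -> Optional[str]:
    # Remove whitespace typed before/after the name, then compare exactly
    name = entered.strip()
    return name if name in known_names else None

print(find_login("   John Smith", KNOWN))  # John Smith
```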

Removing a trailing Space from Regex Matched group

I'm using the ICU regular expression library (icucore) via RegKit on the iPhone to replace a pattern in a large string.
The pattern I'm looking for looks something like this:
| hello world (P1)|
I'm matching this pattern with the following regular expression
\|((\w*|.| )+)\((\w\d+)\)\|
When a match is found, this splits it into three groups, of which group 1 (the text) and group 3 (the text in parentheses) are of interest to me.
I'm converting these formatted strings into HTML links, so the above would be transformed into
Hello world
My problem is the trailing space in the first group: when the link is highlighted and underlined, the underline extends beyond the printed characters.
While I know I could extract all the matches and process them manually, using the ICU library's search-and-replace feature is a much cleaner solution, so I would rather not do that.
Many thanks as always
Would the following work as an alternative regular expression?
\|((\w*|.| )+)\s+\((\w\d+)\)\|
Inserting the extra \s+ pulls the space outside the first group.
That said, given your example and regex, I'm not sure why you don't simply use:
\|(.+)\s+\((\w\d+)\)\|
which has the same effect. However, both your original regex and my simpler one would fail on
| hello world (P1)| followed on the same line by | howdy world (P1)|,
because the greedy match would roll both up into a single match.
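To see the rollup concretely, here is a small sketch using Python's re module rather than ICU (for this pattern, greedy .+ should behave the same in both). The sample line is made up, and it uses P2 for the second tag so the two matches can be told apart:

```python
import re

line = "| hello world (P1)| and | howdy world (P2)|"
greedy = re.compile(r"\|(.+)\s+\((\w\d+)\)\|")

# .+ runs to the end of the line and only backtracks as far as the LAST "(P..)|"
print(greedy.findall(line))  # [(' hello world (P1)| and | howdy world', 'P2')]
```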
\|\s*([\w ,.-]+)\s+\((\w\d+)\)\|
will put the trailing space(s) outside the capturing group. Of course, this only works if there is always at least one space. Can you guarantee that?
If not, use
\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|
This uses a lookbehind assertion to make sure the capturing group ends in a non-space character.
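A sketch of the lookbehind version doing the search-and-replace, again in Python's re rather than ICU. The href layout in the replacement is hypothetical, since the question doesn't show the link markup. Because the character class excludes | and (, two tags on one line stay separate matches:

```python
import re

# (?<!\s) stops group 1 from ending in whitespace, so \s* absorbs any trailing spaces
LINK = re.compile(r"\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|")

def to_links(text: str) -> str:
    # Hypothetical link format: group 2 as the target, group 1 as the text
    return LINK.sub(r'<a href="\2">\1</a>', text)

print(to_links("| hello world (P1)| and | howdy world (P2)|"))
# <a href="P1">hello world</a> and <a href="P2">howdy world</a>
```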