How to extract a string from a sentence using the substring and position functions?

I have to extract a value from a string, and I am working in a Cognos application that doesn't support regex. It has some built-in functions such as substring and position.
My string looks similar to this:
/content/folder[#name='ab_Salary Reports']/folder[#name='INT Salary Reports']/folder[#name='INT Sal Sche']/jobDefinition[#name='Salary Rep R025']
I need to extract Salary Rep R025, i.e. the last name value.
A static substring (fixed start and length) will not work because the string varies.

Use the position function to locate the starting and ending points of your target substring. Try:
position('/jobDefinition', [pathstring])
combined with substring:
substring( [pathstring], position('/jobDefinition', [pathstring]) + 22, length([pathstring]) - (position('/jobDefinition', [pathstring]) + 22) - 1 )
This starts 22 characters after the position where it finds /jobDefinition, i.e. just past /jobDefinition[#name=' (14 + 8 = 22 characters). The length is the full length of the string minus that starting point, minus 1, which is exactly the number of characters left before the closing '].
If your strings end differently, you may need to adjust by +1 or -1 to include or exclude the quotes.
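To double-check the offsets, here is a Python sketch that emulates the 1-based position and substring functions used above. The two helpers are stand-ins written for this example, not Cognos code; the assumption is that position returns 1 for a match at the first character and 0 when there is no match, and that substring counts from 1.

```python
def position(needle: str, haystack: str) -> int:
    """1-based index of needle in haystack, 0 if absent (assumed Cognos behavior)."""
    return haystack.find(needle) + 1  # str.find returns -1 when not found


def substring(s: str, start: int, length: int) -> str:
    """length characters of s, starting at the 1-based position start."""
    return s[start - 1:start - 1 + length]


path = ("/content/folder[#name='ab_Salary Reports']/folder[#name='INT Salary Reports']"
        "/folder[#name='INT Sal Sche']/jobDefinition[#name='Salary Rep R025']")

prefix_len = len("/jobDefinition[#name='")  # 14 + 8 = 22
start = position('/jobDefinition', path) + prefix_len
value = substring(path, start, len(path) - start - 1)  # stop before the closing ']
print(value)
```

Because the length is derived from the total length, the same expression works whatever folder names precede /jobDefinition, as long as the path ends with the jobDefinition segment.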
Also note that this uses Report Studio functions. Cognos reports are built on queries against tables, so, depending on your data source, you may have native functions available. For example, most of the reports I work with come out of an Oracle database, so I can use Oracle string functions instead of Report Studio functions. They work better, and they are processed on the database side rather than on the Cognos Dispatcher, which is usually faster.


Postgres regexp_replace: inability to replace source text with first captured group

Using PostgreSQL, I am unable to design the correct regex pattern to achieve the desired output of an SQL statement that uses regexp_replace.
My source text consists of several scattered blocks of text of the form 'PU*' followed by a date string in the form 'YYYY-MM', for example 'PU*2020-11'. These blocks are surrounded by strings of unpredictable, arbitrary text (including other instances of 'PU*' followed by the same date format, such as 'PU*2017-07'), white space, and line feeds.
My desire is to replace the entire source text with the FIRST instance of the 'YYYY-MM' text pattern. In the above example, the desired output would be '2020-11'.
Currently, my search pattern results in the correct replacement text in place of the first capturing group, but unfortunately, all of the text AFTER the first capturing group also inadvertently appears in the output, which is not the desired output.
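For comparison, here is a small Python sketch of the intended behavior: keep only the first YYYY-MM that follows PU* and discard everything else. The sample text is hypothetical, built from the SQL literal in this question but with real newlines. re.search returns only the first match, so there is no trailing text to get rid of. In PostgreSQL, substring(source from 'PU\*(\d{4}-\d{2})') works analogously, returning the first parenthesized subexpression of the first match. As far as I understand PostgreSQL's regex matching rules, the regexp_replace attempt keeps the tail because the whole pattern takes the greediness of its first quantifier (the non-greedy *?), so the final (\s|\S*)* matches as little as possible.

```python
import re

# Hypothetical sample modeled on the SQL literal in the question.
source = ("First line\nExec committee; PU*2020-08\nPU*2019-09\nPU*2017-10\n\n"
          "added by Terranze\n")

# re.search scans left to right and stops at the first match.
match = re.search(r"PU\*(\d{4}-\d{2})", source)
first_date = match.group(1) if match else None
print(first_date)
```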
Version: postgres (PostgreSQL) 13.0
A more complex example of source text:
First line
Exec committee
added by Terranze
My pattern so far:
Current SQL statement:
select regexp_replace('First line\nExec committee; PU*2020-08\nPU*2019-09\nPU*2017-10\n\nadded by Terranze\n', '(\s|\S)*?PU\*(\d{4}-\d{2})(\s|\S*)*', '\2') as _regex;
Current output in psql:
2020-08\nPU*2019-09--cancelled\nPU*2017-10\n\nadded by Terranze\n
(1 row)
Desired output: 2020-08
Any help is appreciated. Thanks!
How about this expression:

PostgreSQL Trimming Leading and Trailing Characters: = and "

I'm building an import tool that uses a quoted CSV file. However, several of the fields in the CSV file look like this:
"=""38000"""
Where 38000 is the data I need. The data integration software I use (Talend 6.11) already strips the leading and trailing double quotes for me (so, "38000" becomes 38000), but I can't find a way to get rid of those others.
So, essentially, I need "=""38000""" to become "38000" where the leading "=" is removed and the trailing "" is removed.
Is there a TRIM function that can accomplish this for me? Perhaps there is a method in Talend that can do this?
As the other answer stated, you could do that operation in SQL. Or you could do it in Java, Groovy, etc., within Talend. However, if there is an existing Talend component that does the job, my preference is to use it. That leads to faster development, potentially less testing, and easier maintenance. Having said that, it is important to review all the available components so you know your options.
You can use the Talend component tReplace to inspect each of the input columns you want to strip of quotes and equal signs. A single tReplace component can perform search-and-replace operations on multiple input columns. If all of the replacements are related to each other, I would keep them within a single tReplace. When it gets to the point of doing unrelated replacements, I might place those in a new tReplace so that logical operations are organized and grouped together.
For a given input column:
search for "=", replace with ""
search for "\"", replace with ""
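The two tReplace rules above can be mimicked in Python to check their effect. This only emulates a plain (non-regex) search and replace; it is not Talend code. It assumes Talend has already stripped the outer quotes, so the value reaching tReplace is =""38000"". Unlike a trim, these rules also remove any = or " inside the value.

```python
# Value as it arrives after Talend strips the outer quotes (assumed).
raw = '=""38000""'

# Rule 1: search "=", replace with "".  Rule 2: search '"', replace with "".
cleaned = raw.replace("=", "").replace('"', "")
print(cleaned)
```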
Something like this:
SELECT format( '"%s"', trim( both '"=' from '"=""38000"""' ) );
-[ RECORD 1 ]---
format | "38000"
1st: the trim() function removes all leading and trailing " and = characters. The result is simply 38000.
2nd: format() adds the double quotes back to get the desired end result.
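The same trim-then-quote logic can be checked in Python. str.strip() with a character argument behaves like PostgreSQL's trim(both ... from ...) here: it removes any combination of the listed characters from both ends only, so a = or " in the middle of the value would survive.

```python
raw = '"=""38000"""'      # the field exactly as it appears in the CSV

value = raw.strip('"=')    # like trim(both '"=' from raw): strips " and = at both ends
quoted = f'"{value}"'      # like format('"%s"', value): adds the double quotes back
print(value)
print(quoted)
```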
Alternatively, you can use regular expressions and other PostgreSQL string functions.

How to fill a field with spaces until a length in Notepad++

I've prepared a macro in Notepad++ to transform an LDIF file into a CSV file with a few fields. Everything works, but I have one final problem: two of the fields must have a specific length, and at the moment I cannot guarantee that length because the source file does not provide them padded.
For instance, I generate this line:
12345,namenamename,123456
I need to ensure that the 2nd and 3rd fields are 30 characters (padded with spaces on the right) and 9 characters (padded with zeros on the left) long, so in this case I should generate:
12345,namenamename                  ,000123456
I haven't found a way for Notepad++ to match a pattern and pad it with spaces or zeros, so I thought of adding one space/zero to the relevant field and repeating that step as many times as needed to reach the lengths (that is, 29 and 8 times, because the fields are never empty), limiting the match by length in the regex (for instance \d{1,8} for the third field).
My question is: can I repeat only one step of the macro several times (and run the rest of the macro only once)?
I've read the wiki page on this topic but didn't find anything there either.
If that is not possible, what would be a good solution? Should I create two more macros and, after running the main one, run those two several times?
Thanks in advance!
A two-pass solution with Notepad++ is possible. Find a pair of characters, or two short sequences of characters, that never occur in your data file. I will use =#<= and =>#= here.
First pass: generate or convert the input text into the form 12345,=#<=namenamename______________________________,000000000123456=>#=, i.e. add 30 spaces after the name and nine zeroes before the number (underscores are used here just to make things clearer).
Second pass: do a regular expression search for =#<=(.{30})_*,0*(\d{9})=>#= and replace with \1,\2. With real spaces as padding, put a space in place of the underscore: =#<=(.{30}) *,0*(\d{9})=>#=.
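The two passes can be sketched in Python with re.sub, which is a convenient way to check the patterns before recording the Notepad++ replacements. The sample line and the first-pass pattern ^([^,]*),([^,]*),(\d+)$ are assumptions about the data (three comma-separated fields, a name of at most 30 characters). The second pass uses real spaces as padding, so its pattern has " *" where the underscore illustration has "_*".

```python
import re

line = "12345,namenamename,123456"  # hypothetical input line

# Pass 1: add the markers, 30 spaces after the name and 9 zeros before the number.
pass1 = re.sub(
    r"^([^,]*),([^,]*),(\d+)$",
    lambda m: f"{m[1]},=#<={m[2]}{' ' * 30},{'0' * 9}{m[3]}=>#=",
    line,
)

# Pass 2: keep exactly 30 characters of the padded name and the last 9 digits.
# 0* backtracks until (\d{9}) can grab the final nine digits before =>#=.
pass2 = re.sub(r"=#<=(.{30}) *,0*(\d{9})=>#=", r"\1,\2", pass1)
print(repr(pass2))
```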
I have just suggested a similar solution in the question "special timestamp format of csv".

zip code + 4 mail merge treated like an arithmetic expression

I'm trying to do a simple mail merge in Word 2010, but when I insert an Excel field that is supposed to hold a ZIP code from Connecticut (e.g. 06880), I run into two problems:
The leading zero gets suppressed, so 06880 becomes 6880. I know that I can toggle the field code and add a numeric picture switch, {MERGEFIELD ZipCode \# 00000}, and that at least works.
But here's the real problem I can't seem to figure out:
A ZIP+4 value such as 06470-5530 gets treated like an arithmetic expression: 6470 - 5530 = 940, so with the picture switch above it becomes 00940, which is wrong.
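The arithmetic described above can be reproduced in Python to see where 00940 comes from. This only illustrates the evaluation (read as a number or subtract, then apply the 00000 picture as zero-padding to five digits); it says nothing about how Word is implemented internally.

```python
zip_plus4 = "06470-5530"
plain_zip = "06880"

# A plain 5-digit ZIP read as a number loses its leading zero;
# the 00000 picture pads it back to five digits.
plain_formatted = f"{int(plain_zip):05d}"

# A ZIP+4 read as an expression: 06470 - 5530, then padded to five digits.
left, right = zip_plus4.split("-")
as_number = int(left) - int(right)
zip4_formatted = f"{as_number:05d}"

print(plain_formatted, zip4_formatted)
```

Keeping the value as text end to end is the only way the ZIP+4 survives unchanged, which is what the answers below aim for.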
Is there perhaps something in my Excel spreadsheet, or an option in Word, that I need to set to make this work properly? Please advise, thanks.
See macropod's post in this conversation
As long as the ZIP codes are reaching Word (with or without "-" signs in the 5+4 format ZIPs), his field code should sort things out. However, if you are mixing text and numeric formats in your Excel column, there is a danger that the OLE DB provider or ODBC driver (if that is what you are using to get the data) will treat the column as numeric and return all the text values as 0.
Yes, Word sometimes treats text strings as numeric expressions as you have noticed. It will do that when you try to apply a numeric format, or when you try to do a calculation in an { = } field, when you sum table cell contents in an { = } field, or when Word decides to do a numeric comparison in (say) an { IF } field - in the latter case you can get Word to treat the expression as a string by surrounding the comparands by double-quotes.
In Excel, to force the text data type when entering data that looks like a number, a date, a fraction, etc. but is not numeric (ZIP code, phone number, etc.), simply type an apostrophe before the data.
Typing 06470 will be interpreted as the number 6470, but typing '06470 will store the text "06470".
The simplest fix I've found is to save the Excel file as CSV. Word then takes everything at face value.

Excel: searching for a value within multiple arrays within cells

I'm trying to set up an error check between two systems and need to compare week numbers in different formats. One system produces week numbers in a text format, e.g. "8-15, 18, 31-32", and the other produces discrete values. How can I check whether a value such as 16 falls within a multiple range like the one above?
It's part of a bigger issue where I'm checking a reference number, day, time and week number (e.g. XXX111 Weds 9:00 9) in one system against the output of another system (e.g. XXX111 Wed 9:00 7-11, 13, 16, 52-53 or XXX111 Thu 9:00 5, 6, 11-16). Despite lots of searching I've hit a wall with the part above, so any help would be greatly appreciated.
I'd rather not use VBA if possible. Thanks in advance for your wisdom.
A number that is not part of a range (e.g. 18) is not a problem.
The ranges are in Text format.
I hope the following helps or at least is ‘a step in the right direction’:
A Parse the components
E.g. for 8-15, 18, 31-32, paste into a cell (say A1) and choose Data > Data Tools > Text to Columns > Delimited > Next > check Comma, Space and Treat consecutive delimiters as one > Next > select the columns as required > select Text for each > Finish.
It may be easier to work with a single column, so select the data, Copy > select A2 > Paste Special > Transpose > OK, and delete the contents of Row 1.
B Add your search value (16) into B1
C Copy the formula below into B2 and copy down as required:
The result should be TRUE where the search value is within, or on either bound of, the discrete range.
The formula uses the hyphen to ‘recognise’ a discrete range. SEARCH looks for where it is positioned (because there could be one or two characters either side of it). LEFT and RIGHT are for the lower and upper bounds (in the case of RIGHT used in conjunction with LEN to address whether the upper bound is one or two characters). VALUE is required to convert the Text into something that can be equated to the search value. AND is for the process to consider both bounds in determining whether ‘in range’.
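The same bound check can be sketched in Python over the whole text, without splitting it into cells first. The function name week_in_ranges is hypothetical. It treats each comma-separated item either as a single week or as an inclusive low-high range, mirroring the LEFT/RIGHT split on the hyphen described above, and also covers the single numbers (e.g. 18) mentioned in the question.

```python
def week_in_ranges(value: int, spec: str) -> bool:
    """True if value matches any item in a spec such as "8-15, 18, 31-32".

    Each item is a single week ("18") or an inclusive range ("8-15").
    """
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # Lower bound left of the hyphen, upper bound right of it.
            low, high = (int(part) for part in item.split("-", 1))
            if low <= value <= high:
                return True
        elif int(item) == value:
            return True
    return False


print(week_in_ranges(16, "8-15, 18, 31-32"))
print(week_in_ranges(16, "5, 6, 11-16"))
```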
“I’d rather not use VBA if possible” – but might be advisable!
However, using some absolute references ($) should make it a little easier than it would otherwise be with standard formulae alone: the discrete ranges (which can be appended in column A) can be queried for various search values by copying the formulae to the right and down as required and entering further search values (formatted as Number) in Row 1.