Need to Replace in string with pentaho - pentaho-spoon

I need to replace part of the string values in a field's rows. Does anybody have experience using Replace in String with Pentaho?
string to replace:

Use the Replace in String step. Type the string you want to find in the Search field, and type the string that should replace it in the Replace with field.
You can always browse to ../Pentaho/design-tools/data-integration/samples/transformations for the examples.

An alternative to "Replace in string":
If you want to replace part of a string, you could use the "Strings cut" step to cut out the part you want to replace, replace that value with whatever you want, and then use "Concat fields" to put the strings together again.
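The two approaches can be compared outside Pentaho with a minimal Python sketch. This is not Pentaho code: the function names and the sample value are hypothetical, and the sketch only illustrates the logic of a plain search-and-replace versus cutting the string at fixed positions, swapping one piece, and concatenating the pieces again.

```python
def replace_in_string(value: str, search: str, replacement: str) -> str:
    """Search-and-replace, similar in spirit to a 'Replace in string' step."""
    return value.replace(search, replacement)


def cut_and_concat(value: str, start: int, end: int, replacement: str) -> str:
    """Cut the string at fixed positions, swap the middle part, and rejoin."""
    head = value[:start]   # part before the piece being replaced
    tail = value[end:]     # part after the piece being replaced
    return head + replacement + tail


sample = "ORDER-2019-XYZ"  # hypothetical field value
print(replace_in_string(sample, "2019", "2020"))
print(cut_and_concat(sample, 6, 10, "2020"))
```

The positional version only makes sense when the part to replace always sits at the same offsets; otherwise the search-based version is the safer choice.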

Related

How do I remove the actual decimal from a numeric field that I'm converting to text? ex: 125.02 needs to be 12502

I'm creating an OCR line for our remits that our scanner will read. The scanner doesn't allow the '.' in the field; it assumes the last 2 digits are the decimal place values. I'm converting the field to text but I'm not sure how to remove the '.' and keep the decimal place values.
The simplest solution would be to create a Formula Field and use the Replace() function. The formula for your Formula Field would look like this:
StringVar myVariable;
myVariable := Replace({table.column}, ".", "");
myVariable;
This will search {table.column} for the decimal point and replace it with an empty string.
However, if your intent is to barcode the value, there may be a UFL available that could also do this for you. When creating barcodes, User Function Libraries are usually preferred because they have functions specifically designed to encode your barcode values. They aren't required though, and you can always choose to encode barcode values manually with Formula Fields.
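The same transformation can be sketched in Python to make the logic concrete. This is not Crystal Reports syntax: the function name is hypothetical, and the sketch assumes the amount should always carry exactly two decimal places before the decimal point is dropped.

```python
from decimal import Decimal, ROUND_HALF_UP


def amount_to_ocr_digits(amount: str) -> str:
    """Format an amount with two decimals, then drop the decimal point."""
    fixed = Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Fixed-point formatting avoids scientific notation before the replace.
    return format(fixed, "f").replace(".", "")


print(amount_to_ocr_digits("125.02"))  # 12502
print(amount_to_ocr_digits("125.5"))   # 12550
```

Forcing two decimals first matters because the scanner reads the last 2 digits as the decimal places; a value such as 125.5 would otherwise be read as 12.55.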

How can I use tsvector on a string with numbers?

I would like to use a postgres tsquery on a column that has strings that all contain numbers, like this:
FRUIT-239476234
If I try to make a tsquery out of this:
select to_tsquery('FRUIT-239476234');
What I get is:
'fruit' & '-239476234'
I want to be able to search by just the numeric portion of this value like so:
239476234
It seems that it is unable to match this because it is interpreting my hyphen as a "negative sign" and doesn't think 239476234 matches -239476234. How can I tell postgres to treat all of my characters as text and not try to be smart about numbers and hyphens?
An answer from the future. Once version 13 of PostgreSQL is released, you will be able to use the dict_int module to do this.
create extension dict_int ;
ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 100, ABSVAL=true);
ALTER TEXT SEARCH CONFIGURATION english ALTER MAPPING FOR int WITH intdict;
select to_tsquery('FRUIT-239476234');
to_tsquery
-----------------------
'fruit' & '239476234'
But you would probably be better off creating your own TEXT SEARCH DICTIONARY as well as copying the 'english' CONFIGURATION and modifying the copy, rather than modifying the default ones in place. Otherwise you have the risk that upgrading will silently lose your changes.
If you don't want to wait for v13, you could back-patch this change and compile into your own version of the extension for a prior server.
This is done by the text search parser, which is not configurable (short of writing your own parser in C, which is supported).
The simplest solution is to pre-process all search strings by replacing - with a space.
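A minimal Python sketch of that pre-processing step, run in the application before the search string reaches PostgreSQL. The function name is hypothetical, and how the cleaned string is then turned into a query (for example by passing it to plainto_tsquery) is an assumption about the surrounding application, not something the answer specifies.

```python
def normalize_search_term(term: str) -> str:
    """Replace hyphens with spaces so the parser sees no leading minus sign."""
    # split()/join collapses any repeated whitespace left by the replacement.
    return " ".join(term.replace("-", " ").split())


print(normalize_search_term("FRUIT-239476234"))  # FRUIT 239476234
```

The indexed text would likely need the same normalization before it is converted to a tsvector, so that both sides produce the token 239476234.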

Docvariable with empty string value

In Word, I'm using DocVariables to manage pluralization.
A VBA macro is changing the value of several docvariables to pluralize / singularize them.
But sometimes I want to use a DocVariable only to enable or disable an 's' suffix.
Problem: I cannot set it to an empty string, because that deletes the DocVariable.
The field then displays an error in Word.
So I'm looking for a way to achieve this. It could be:
A way to keep a DocVariable in existence with an empty string or an equivalent value
A field formula that handles the case where the variable doesn't exist
Any other workaround would be appreciated.
Thank you
A Document Variable (used in DocVariable field codes) cannot exist if it has no content.
One possibility would be to also store the space in this DocVariable, so that it displays s[space] or just [space].
Otherwise you may need to write this information to a Bookmark (possibly using a Set field) and display the content using a Ref field.

zip code + 4 mail merge treated like an arithmetic expression

I'm trying to do a simple mail merge in Word 2010, but when I insert an Excel field that is supposed to represent a zip code from Connecticut (e.g. 06880), I run into 2 problems:
The leading zero gets suppressed, so 06880 becomes 6880. I know that I can toggle the field code and change it to {MERGEFIELD ZipCode \# 00000}, and that at least works.
But here's the real problem I can't seem to figure out:
A zip+4 field such as 06470-5530 gets treated like an arithmetic expression. 6470 - 5530 = 940, so with the field code above it becomes 00940, which is wrong.
Is there something in my Excel spreadsheet or an option in Word that I need to set to make this work properly? Please advise, thanks.
See macropod's post in this conversation
As long as the ZIP codes are reaching Word (with or without "-" signs in the 5+4 format ZIPs), his field code should sort things out. However, if you are mixing text and numeric formats in your Excel column, there is a danger that the OLE DB provider or ODBC driver (if that is what you are using to get the data) will treat the column as numeric and return all the text values as 0.
Yes, Word sometimes treats text strings as numeric expressions as you have noticed. It will do that when you try to apply a numeric format, or when you try to do a calculation in an { = } field, when you sum table cell contents in an { = } field, or when Word decides to do a numeric comparison in (say) an { IF } field - in the latter case you can get Word to treat the expression as a string by surrounding the comparands by double-quotes.
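The arithmetic behind the 00940 result can be reproduced with a short Python sketch. It only mimics the behaviour described above (it says nothing about how Word is implemented), and the helper name format_zip is hypothetical.

```python
def format_zip(zip_text: str) -> str:
    """Keep a ZIP or ZIP+4 value as text, padding the 5-digit part with zeros."""
    base, sep, plus4 = zip_text.partition("-")
    return base.zfill(5) + sep + plus4


# Treating "06470-5530" as a numeric expression, then applying a 00000 picture:
as_number = 6470 - 5530
print(f"{as_number:05d}")        # 00940

# Treating the same value as text keeps both parts intact:
print(format_zip("06470-5530"))  # 06470-5530
print(format_zip("6880"))        # 06880
```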
In Excel, to force the string data type when entering data that looks like a number, a date, a fraction, etc. but is not numeric (ZIP code, phone number, etc.), simply type an apostrophe before the data.
Typing 06470 is interpreted as the number 6470, but typing '06470 gives the string "06470".
The simplest fix I've found is to save the Excel file as CSV. Word takes it all at face value then.

String to Unicode in C#

I want to use Unicode in my code. My Unicode value is 0100, and I am prepending the string \u to that value. When I use string myVal="\u0100", it works, but when I build it as shown below, it does not work. The value ends up looking like "\\u0100". How do I resolve this?
I want to build it like the line below, because the Unicode value may vary.
string uStr=@"\u" + "0100";
There are a couple of problems here. One is that @"\u" is actually the literal two-character string \u (it can also be written as "\\u").
The other issue is that you cannot construct a string in the way you describe because "\u" is not a valid string by itself. The compiler is expecting a value to follow "\u" (like "\u0100") to determine what the encoded value is supposed to be.
You need to keep in mind that strings in .NET are immutable, which means that when you look at what is going on behind the scenes with your concatenated example (@"\u" + "0100"), this is what is actually happening:
Create the string "\u"
Create the string "0100"
Create the string "\u0100"
So you have three strings in memory. In order for that to happen all of the strings must be valid.
The first option that comes to mind for handling those values is to actually parse them as integers, and then convert them to characters. Something like:
var unicodeValue = (char)int.Parse("0100", System.Globalization.NumberStyles.AllowHexSpecifier);
Which will give you the single Unicode character value. From there, you can add it to a string, or convert it to a string using ToString().
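For comparison, the same parse-then-convert idea looks like this in Python (a different language from the C# above, used here only to illustrate the technique). The helper name is hypothetical.

```python
def code_point_to_char(hex_digits: str) -> str:
    """Parse hexadecimal digits as an integer and return that code point."""
    return chr(int(hex_digits, 16))


ch = code_point_to_char("0100")
print(ch == "\u0100")  # True
print(hex(ord(ch)))    # 0x100
```

In both languages the escape sequence \u0100 is resolved when the source code is compiled or parsed, so a value that is only known at run time has to go through a numeric conversion like this one.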