Split: A subscript must be between 1 and the size of the array - crystal-reports

I have a super simple formula. The problem is that sometimes the data doesn't have a second value, or sometimes the value is blank.
Split ({PO_RECEIVE.VENDOR_LOT_ID}," ")[2]
ID
111 222
123
123 222
I was thinking if I could come up with some logic to figure out whether the string has multiple value's it would solve my problem, but haven't quiet found what I'm looking for:
If {PO_RECEIVE.VENDOR_LOT_ID} = SingleOrBlankString then
{PO_RECEIVE.VENDOR_LOT_ID} else
Split ({PO_RECEIVE.VENDOR_LOT_ID}," ")[2]
Better Example Data:
3011111*42011111111
2711 00291111111
711111//12111111111
/J1111 69111111111
170111

If the string can contain a maximum of two values, separated by a space, then you can check if the string contains a space using the InStr function:
If InStr({PO_RECEIVE.VENDOR_LOT_ID}, " ") > 0 Then
{PO_RECEIVE.VENDOR_LOT_ID}
Else
Split ({PO_RECEIVE.VENDOR_LOT_ID}," ")[2]
If there can be multiple spaces between the parts you can use following formulas to get the values:
Left part:
This function returns the left part of the string until the first space.
If InStr({PO_RECEIVE.VENDOR_LOT_ID}, " ") > 0 Then
Left({PO_RECEIVE.VENDOR_LOT_ID}, InStr({PO_RECEIVE.VENDOR_LOT_ID}, " "))
Right part:
This function returns the right part of the string after the last space.
The InStrRev-function returns the position of the last space because it searches the string backwards.
The Len-function returns the length of the string.
[length] - [position of last space] = [length of the right part]
If InStr({PO_RECEIVE.VENDOR_LOT_ID}, " ") > 0 Then
Right({PO_RECEIVE.VENDOR_LOT_ID}, Len({PO_RECEIVE.VENDOR_LOT_ID}) - InStrRev(testString, " "))

Related

How to strip extra spaces when writing from dataframe to csv

Read in multiple sheets (6) from an xlsx file and created individual dataframes. Want to write each one out to a pipe delimited csv.
ind_dim.to_csv (r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
Currently outputs like this:
1|value1 |value2 |word1 word2 word3 etc.
Want to strip trailing blanks
Suggestion
Include the method .apply(lambda x: x.str.rstrip()) to your output string (prior to the .to_csv() call) to strip the right trailing blank from each field across the DataFrame. It would look like:
Change:
ind_dim.to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
To:
ind_dim.apply(lambda x: x.str.rstrip()).to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
It can be easily inserted to the output code string using '.' referencing. To handle multiple data types, we can enforce the 'object' dtype on import by including the argument dtype='str':
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
Or on the DataFrame itself by:
df = pd.DataFrame(df, dtype='str')
Proof
I did a mock-up where the .xlsx document has 5 sheets, with each sheet having three columns: The first column with all numbers except an empty cell in row 2; the second column with both a leading blank and a trailing blank on strings, an empty cell in row 3, and a number in row 4; and the third column * with all strings having a leading blank, and an empty value in row 4*. Integer indexes and integer columns have been included. The text in each sheet is:
0 1 2
0 11111 valueB1 valueC1
1 valueB2 valueC2
2 33333 valueC3
3 44444 44444
4 55555 valueB5 valueC5
This code reads in our .xlsx testing_xlsx_dtype.xlsx to the DataFrame dictionary ind_dim.
Next, it loops through each sheet using a for loop to place the sheet name variable as a key to reference the individual sheet DataFrame. It applies the .str.rstrip() method to the entire sheet/DataFrame by passing the lambda x: x.str.rstrip() lambda function to the .apply() method called on the sheet/DataFrame.
Finally, it outputs the sheet/DataFrame as a .csv with the pipe delimiter using .to_csv() as seen in the OP post.
# reads xlsx in
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
# loops through sheets, applies rstrip(), output as csv '|' delimit
for sheet in ind_dim:
ind_dim[sheet].apply(lambda x: x.str.rstrip()).to_csv(sheet + '_ind_dim_out.csv', sep='|')
Returns:
|0|1|2
0|11111| valueB1| valueC1
1|| valueB2| valueC2
2|33333|| valueC3
3|44444|44444|
4|55555| valueB5| valueC5
(Note our column 2 strings no longer have the trailing space).
We can also reference each sheet using a loop that cycles through the dictionary items; the syntax would look like for k, v in dict.items() where k and v are the key and value:
# reads xlsx in
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
# loops through sheets, applies rstrip(), output as csv '|' delimit
for k, v in ind_dim.items():
v.apply(lambda x: x.str.rstrip()).to_csv(k + '_ind_dim_out.csv', sep='|')
Notes:
We'll still need to apply the correct arguments for selecting/ignoring indexes and columns with the header= and names= parameters as needed. For these examples I just passed =None for simplicity.
The other methods that strip leading and leading & trailing spaces are: .str.lstrip() and .str.strip() respectively. They can also be applied to an entire DataFrame using the .apply(lambda x: x.str.strip()) lambda function passed to the .apply() method called on the DataFrame.
Only 1 Column: If we only wanted to strip from one column, we can call the .str methods directly on the column itself. For example, to strip leading & trailing spaces from a column named column2 in DataFrame df we would write: df.column2.str.strip().
Data types not string: When importing our data, pandas will assume data types for columns with a similar data type. We can override this by passing dtype='str' to the pd.read_excel() call when importing.
pandas 1.0.1 documentation (04/30/2020) on pandas.read_excel:
"dtypeType name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion."
We can pass the argument dtype='str' when importing with pd.read_excel.() (as seen above). If we want to enforce a single data type on a DataFrame we are working with, we can set it equal to itself and pass it to pd.DataFrame() with the argument dtype='str like: df = pd.DataFrame(df, dtype='str')
Hope it helps!
The following trims left and right spaces fairly easily:
if (!require(dplyr)) {
install.packages("dplyr")
}
library(dplyr)
if (!require(stringr)) {
install.packages("stringr")
}
library(stringr)
setwd("~/wherever/you/need/to/get/data")
outputWithSpaces <- read.csv("CSVSpace.csv", header = FALSE)
print(head(outputWithSpaces), quote=TRUE)
#str_trim(string, side = c("both", "left", "right"))
outputWithoutSpaces <- outputWithSpaces %>% mutate_all(str_trim)
print(head(outputWithoutSpaces), quote=TRUE)
Starting Data:
V1 V2 V3 V4
1 "Something is interesting. " "This is also Interesting. " "Not " "Intereting "
2 " Something with leading space" " Leading" " Spaces with many words." " More."
3 " Leading and training Space. " " More " " Leading and trailing. " " Spaces. "
Resulting:
V1 V2 V3 V4
1 "Something is interesting." "This is also Interesting." "Not" "Intereting"
2 "Something with leading space" "Leading" "Spaces with many words." "More."
3 "Leading and training Space." "More" "Leading and trailing." "Spaces."

How do I find letters in words that are part of a string and remove them? (List comprehensions with if statements)

I'm trying to remove vowels from a string. Specifically, remove vowels from words that have more than 4 letters.
Here's my thought process:
(1) First, split the string into an array.
(2) Then, loop through the array and identify words that are more than 4 letters.
(3) Third, replace vowels with "".
(4) Lastly, join the array back into a string.
Problem: I don't think the code is looping through the array.
Can anyone find a solution?
def abbreviate_sentence(sent):
split_string = sent.split()
for word in split_string:
if len(word) > 4:
abbrev = word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
sentence = " ".join(abbrev)
return sentence
print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"
I just figured out that the "abbrev = words.replace..." line was incomplete.
I changed it to:
abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "") if len(words) > 4 else words for words in split_string]
I found the part of the solution here: Find and replace string values in list.
It is called a List Comprehension.
I also found List Comprehension with If Statement
The new lines of code look like:
def abbreviate_sentence(sent):
split_string = sent.split()
for words in split_string:
abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
if len(words) > 4 else words for words in split_string]
sentence = " ".join(abbrev)
return sentence
print(abbreviate_sentence("follow the yellow brick road")) # => "fllw the yllw brck road"

String to Integer (atoi) [Leetcode] gave wrong answer?

String to Integer (atoi)
This problem is implement atoi to convert a string to an integer.
When test input = " +0 123"
My code return = 123
But why expected answer = 0?
======================
And if test input = " +0123"
My code return = 123
Now expected answer = 123
So is that answer wrong?
I think this is expected result as it said
Requirements for atoi:
The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.
Your first test case has a space in between two different digit groups, and atoi only consider the first group which is '0' and convert into integer

Returning the string between the 5th and 6th Spaces in a String

I have a column of strings that look like this:
Target Host: dcmxxxxxxc032.erc.nam.fm.com Target Name:
dxxxxxxgsc047.erc.nam.fm.com Filesystem /u01 has 4.98% available space
- fallen below warning (20) or critical (5) threshold.
The column name is [Description]
The substring I would like returned is (dxxxxxxgsc047.erc.nam.fm.com)
The only consistency in this data is that the desired string occurs between the 5th and 6th occurrences of spaces " " in the string, and after the phrase "Target Name: " The length of the substring varies, but it always ends in another " ", hence my attempt to grab the substring between the 5th and 6th spaces.
I have tried
MID([Description],((FIND([Description],"Target Name: "))+13),FIND([Description]," ",((FIND([Description],"Target Name"))+14)))
But that does not work.
(Edit: We use Tableau 8.2, the Tableau 9 only functions can't be part of the solution, thanks though!)
Thank you in advance for your help.
In Tableau 9 you can use regular expressions in formulas, it makes the task simpler:
REGEXP_EXTRACT([Description], "Target Name: (.*?) ")
Alternatively in Tableau 9 you can use the new FINDNTH function:
MID(
[Description],
FINDNTH([Description]," ", 5) + 1,
FINDNTH([Description]," ", 6) - FINDNTH([Description]," ", 5) - 1
)
Prior to Tableau 9 you'd have to use string manipulation methods similar to what you've tried, just need to be very careful with arithmetic and providing the right arguments (the third argument in MID is length, not index of the end character, so we need to subtract the index of the start character):
MID(
[Description]
, FIND([Description], "Target Name:") + 13
, FIND([Description], " ", FIND([Description], "Target Name:") + 15)
- (FIND([Description], "Target Name:") + 13)
)
Well, you need to find "Target name: " and then the " " after it, not so hard. I'll split in 3 fields just to be more clear (you can mix everything in a single field). BTW, you were in the right direction, but the last field on MID() should be the string length, not the char position
[start]:
FIND([Description],"Target name: ")+13
[end]:
FIND([Description]," ",[start])
And finally what you need:
MID([Description],[start]+1,[end]-[start]-1)
This should do. If you want to pursue the 5th and 6th " " approach, I would recommend you to find each of the " " until the 6th.
[1st]:
FIND([Description], " ")
[2nd]:
FIND([Description], " ",[1st] + 1)
And so on. Then:
MID([Description],[5th]+1,[6th]-[5th]-1)
A simple solution -
SPLIT( [Description], " ", 3 )
This returns a substring from the Description string, using the space delimiter character to divide the string into a sequence of tokens.
The string is interpreted as an alternating sequence of delimiters and
tokens. So for the string abc-defgh-i-jkl, where the delimiter
character is ‘-‘, the tokens are abc, defgh, i and jlk. Think of these
as tokens 1 through 4. SPLIT returns the token corresponding to the
token number. When the token number is positive, tokens are counted
starting from the left end of the string; when the token number is
negative, tokens are counted starting from the right. -
Tableau String Functions
I don't know Tableau, but perhaps something like this?
MID(
MID([Description], FIND([Description],"Target Name: ") + 13, 50),
1,
FIND(MID([Description], FIND([Description],"Target Name: ") + 13, 50), " ")
)

Format a variable in iReport in a string with multiple fields

I have a text field that has the following expression:
$F{casNo} + " Total " + $P{chosenUom} + ": " + $V{total_COUNT}
casNo is a string, chosenUom is a string. total_COUNT is a sum variable of doubles. The total_COUNT variable displays, but it's got 8 - 10 decimal places (1.34324255234), all I need is something along the lines of 1.34.
Here's what I tried already:
$F{casNo} + " Total " + $P{chosenUom} + ": " + new DecimalFormat("0.00").format($V{total_COUNT}).toString()
Any help would be appreciated
For now I'm just doing basic math, but I'm hoping for a real solution, not a workaround
((int)($V{total_COUNT}*100.0))/100.0
You can format the in lline numbers by using:
new DecimalFormat("###0.00").format(YOUR NUMBER)
You might split the text field into two, one containing everything but the $V{total_COUNT}, and the second containing only $V{total_COUNT}, but with the Pattern property set to something like "#0.00".
You'd have to get a bit creative with layout, though, to prevent unwanted word-wrapping and spacing; for example, first text field could be wide and right-aligned, while text field containing the count could be left-aligned and wide enough to accommodate the formatted number.