Special characters other than delimiter in DelimitedLineTokenizer of FlatFileItemReader - spring-batch

I am trying to read a .unl file with Spring Batch, using FlatFileItemReader with "|" as the delimiter.
The delimiter does not work when the reader encounters data like this:
001-A472468827" |N|100|
A line cannot be split by the delimiter if a field contains " followed by spaces, or if it contains the # character.
Setting quoteCharacter doesn't seem to help.
In this situation, is there a way to import special characters such as " and # as they are?
@Bean
@StepScope
public FlatFileItemReader<ExampleDTO> unlFileReader() throws MalformedURLException {
return new FlatFileItemReaderBuilder<ExampleDTO>()
.name("unlFileReader")
/*.encoding(StandardCharsets.UTF_8.name())*/
.resource(fileService.inputFileResource(UNZIP_PATH + "example.unl"))
.fieldSetMapper(new BeanWrapperFieldSetMapper<>())
.targetType(ExampleDTO.class)
.delimited().delimiter("|")
.quoteCharacter('#')
.quoteCharacter('"')
.quoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER)
.includedFields(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141
)
.names(ExampleDTO.getFieldNameArrays())
.build();
}

You are calling quoteCharacter() several times. Note that each call overrides the previous value; it does not add the character to a list of quote characters. Only one quote character is used (the last one set if you chain such calls).
Data cannot be divided by the delimiter if it contains " and spaces or if it contains the # character
This is because " is the default quote character. If the input contains a single ", you need to specify another quote character (otherwise Spring Batch considers that a "bug" in your data, which is true, as the field is not correctly quoted). Here is a quick test that passes:
@Test
void testPipeDelimiter() {
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setDelimiter("|");
tokenizer.setQuoteCharacter(' ');
String s = "001-A472468827\"|N|100|";
FieldSet fieldSet = tokenizer.tokenize(s);
Assertions.assertEquals("001-A472468827\"", fieldSet.readString(0));
Assertions.assertEquals("N", fieldSet.readString(1));
Assertions.assertEquals("100", fieldSet.readString(2));
}
This test shows that the " is part of the first field. The same test passes with a # in the input:
@Test
void testPipeDelimiter() {
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setDelimiter("|");
tokenizer.setQuoteCharacter(' ');
String s = "001-A472468827#|N|100|";
FieldSet fieldSet = tokenizer.tokenize(s);
Assertions.assertEquals("001-A472468827#", fieldSet.readString(0));
Assertions.assertEquals("N", fieldSet.readString(1));
Assertions.assertEquals("100", fieldSet.readString(2));
}
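To check which field boundaries to expect, independently of Spring Batch, the same line can be split with plain String.split. This is only a sketch under the assumption that the file never uses quoting at all; it is not how DelimitedLineTokenizer works internally, and the class name PipeSplitDemo is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class PipeSplitDemo {

    // Splits on the literal pipe character. The limit of -1 keeps trailing
    // empty fields, so a line ending in "|" yields a final empty string.
    static List<String> splitOnPipe(String line) {
        return Arrays.asList(line.split("\\|", -1));
    }

    public static void main(String[] args) {
        // The quote and the # are ordinary characters here.
        System.out.println(splitOnPipe("001-A472468827\" |N|100|"));
        System.out.println(splitOnPipe("001-A472468827#|N|100|"));
    }
}
```

Because nothing is treated as a quote character, " and # stay in the field they were found in.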

Related

How to cut a string from the end in UIPATH

I have this string: "C:\Procesos\rrhh\CorteDocumentos\Cortados\10001662-1_20060301_29_1_20190301.pdf" and I'm trying to get this part: "20190301". The problem is that the length is not always the same. It could also be:
"9001662-1_20060301_4_1_20190301".
I've tried item.ToString.Substring(66,8), but it sometimes doesn't work.
What can I do?
This is a code example of what I said in my comment.
Sub Main()
Dim strFileName As String = ""
Dim di As New DirectoryInfo("C:\Users\Maniac\Desktop\test")
Dim aryFi As FileInfo() = di.GetFiles("*.pdf")
Dim fi As FileInfo
For Each fi In aryFi
Dim arrname() As String
arrname = Split(Path.GetFileNameWithoutExtension(fi.Name), "_")
strFileName = arrname(arrname.Count - 1)
Console.WriteLine(strFileName)
Next
End Sub
You could achieve this with a simple regular expression, which has the added benefit of validating the pattern.
If you need to get exactly eight digits from the end of the file name (after an underscore), you can use this pattern:
_(\d{8})\.pdf
And then this VB.NET line:
Regex.Match(fileName, "_(\d{8})\.pdf").Groups(1).Value
Note that Regex is case sensitive by default, so to avoid a situation where "pdf" is matched and "PDF" is not, the pattern can be adjusted like this:
(?i)_(\d{8})\.pdf
You can then use it directly in any expression window.
PS: You should also ensure that the System.Text.RegularExpressions namespace is listed in the Imports.
You can also achieve it this way :)
Path.GetFileNameWithoutExtension(Str1).Split("_"c).Last
Path.GetFileNameWithoutExtension
Returns the file name of the specified path string without the extension.
So with your string it returns 10001662-1_20060301_29_1_20190301.
Split then splits that string on _ and returns an array of strings.
Last
Returns the last element of the array returned by Split.
Regards..!!
AKsh
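For comparison, here is how the two approaches above (the regular expression and the split on the last underscore) might look in Java. This is a sketch, not UiPath code: the class and method names are made up for illustration, and the pattern adds a $ anchor that the answers above do not use, so that only the token right before the extension can match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDateDemo {

    // Eight digits after an underscore, directly before a ".pdf" extension
    // at the end of the string. CASE_INSENSITIVE also accepts ".PDF".
    private static final Pattern TRAILING_DATE =
            Pattern.compile("_(\\d{8})\\.pdf$", Pattern.CASE_INSENSITIVE);

    // Regex variant: returns empty when the name does not follow the pattern.
    static Optional<String> trailingDate(String path) {
        Matcher m = TRAILING_DATE.matcher(path);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Split variant: takes whatever follows the last underscore in the
    // file name, with directories and extension stripped, without validation.
    static String lastToken(String path) {
        int slash = Math.max(path.lastIndexOf('\\'), path.lastIndexOf('/'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        if (dot > 0) {
            name = name.substring(0, dot);
        }
        return name.substring(name.lastIndexOf('_') + 1);
    }

    public static void main(String[] args) {
        String path = "C:\\Procesos\\rrhh\\CorteDocumentos\\Cortados\\"
                + "10001662-1_20060301_29_1_20190301.pdf";
        System.out.println(trailingDate(path).orElse("no match"));
        System.out.println(lastToken(path));
    }
}
```

The regex variant rejects names that do not end in eight digits, while the split variant returns the last token whatever it contains.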

Spring Batch - Comma separated values - Save in Data Base

I have a file which contains a list of values (user IDs) separated by commas (",") as follows.
111, 222, 333, 444, 555, 777 …………
The file contains millions of such records, and I want to save these values into a single column of a table in an RDBMS.
I tried to use DelimitedLineTokenizer for parsing the data.
The issue is that DelimitedLineTokenizer considers only one entry per line, and the rest of the values are ignored. The first entry ("111") is saved and the rest of the values on the same line are ignored. If there is a second line, its first element is saved and the rest are ignored.
Is there a way to tokenize all the comma-separated values from a single line and save all of them into the DB?
The query is as follows.
INSERT INTO users (id) VALUES (:userid)
I used the following code to parse the file and save it in the DB.
public FlatFileItemReader<User> reader() {
FlatFileItemReader<User> reader = new FlatFileItemReader<User>();
// The tokenizer needs its own variable name; it cannot also be called "reader".
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
tokenizer.setNames(new String[] {"userid"});
blah…blah….blah….
reader.setLineMapper(new DefaultLineMapper<User>() {
{
setLineTokenizer(tokenizer);
setFieldSetMapper(new BeanWrapperFieldSetMapper<User>() {
{
setTargetType(User.class);
}
});
}
});
return reader;
}
@Bean
public UserItemProcessor processor() {
return new UserItemProcessor();
}
@Bean
public Job importUserJob(JobCompletionNotificationListener listener) {
return jobBuilderFactory.get("importUserJob").incrementer(new RunIdIncrementer()).listener(listener)
.flow(step1()).end().build();
}
@Bean
public Step step1() {
return stepBuilderFactory.get("step1").<User, User> chunk(5).reader(reader()).processor(processor())
.writer(writer()).build();
}
Basically, you have two delimiters for the target object: comma and newline. So either you write a custom reader that handles both delimiters, or you pre-process your file to bring it into a standard format.
In my opinion, you are better off pre-processing your file to replace every comma with a newline character.
You might keep the original file as is and write the pre-processed data to a new temporary file.
You can either do that as a separate Spring Batch step (not recommended due to the file size) or, if it is going to be a scheduled job, in your kick-off script.
Replace comma with newline in java
How to break lines at a specific character in Notepad++?
Notepad++ find and replace string with a new-line
Replace comma with new line in a text file using tr in Linux
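The pre-processing step described above could be sketched in plain Java as follows. It streams the input one character at a time, so the file never has to fit in memory, and writes one value per line. The class name CommaToLines and its method names are made up for illustration, and trimming the spaces around each value is an assumption based on the sample data.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

class CommaToLines {

    // Copies the input to the output, emitting one trimmed value per line.
    // Commas and existing line breaks both end a value; empty values are skipped.
    static void splitToLines(Reader in, Writer out) throws IOException {
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == ',' || c == '\n' || c == '\r') {
                emit(token, out);
            } else {
                token.append((char) c);
            }
        }
        emit(token, out); // last value, which has no trailing delimiter
    }

    private static void emit(StringBuilder token, Writer out) throws IOException {
        String value = token.toString().trim();
        if (!value.isEmpty()) {
            out.write(value);
            out.write('\n');
        }
        token.setLength(0);
    }

    // Convenience wrapper for small in-memory inputs.
    static String convert(String text) {
        StringWriter out = new StringWriter();
        try {
            splitToLines(new StringReader(text), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Usage: java CommaToLines <original file> <temporary file>
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            splitToLines(in, out);
        }
    }
}
```

The temporary file can then be read by the existing FlatFileItemReader with one userid per line.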

Search removing comma using Entity Framework

I want to search the database for a text that contains a comma, but the search string has no comma.
For example, in the database I have the following value:
"Development of computer programs, including electronic games"
I try to find it using the following string as the reference:
"development of computer programs including electronic games"
NOTE that the only difference is that the text in the database contains a comma and my search reference does not.
Here is my code:
public async Task<ActionResult>Index(string nomeServico)
{
using (MyDB db = new MyDB())
{
// 1st We receive the following string:"development-of-computer-programs-including-electronic-games"
// but we remove all "-" characters
string serNome = nomeServico.RemoveCaractere("-", " ");
// we search the service that contains (in the SerName field) the value equal to the parameter of the Action.
Servicos servico = db.Servicos.FirstOrDefault(c => c.SerNome.ToLower().Equals(serNome, StringComparison.OrdinalIgnoreCase));
}
}
The problem is that the data in the database contains a comma and the search value doesn't.
In your code you are replacing "-" with " " in your search string. But as per your requirement you also need to replace "," with "" in the DB value.
Try doing something like this:
string serNome = nomeServico.RemoveCaractere("-", " ").ToLower();
Servicos servico = db.Servicos.FirstOrDefault(c => c.SerNome.Replace(",","").ToLower() == serNome);

Parsing an XML string containing "&#x20;" (which must be preserved)

I have code that is passed a string containing XML. This XML may contain one or more instances of &#x20; (an entity reference for the blank space character). I have a requirement that these references should not be resolved (i.e. they should not be replaced with an actual space character).
Is there any way for me to achieve this?
Basically, given a string containing the XML:
<pattern value="[A-Z0-9&#x20;]" />
I do not want it to be converted to:
<pattern value="[A-Z0-9 ]" />
(What I am actually trying to achieve is to simply take an XML string and write it to a "pretty-printed" file. This is having the side-effect of resolving occurrences of &#x20; in the string to a single space character, which need to be preserved. The reason for this requirement is that the written XML document must conform to an externally-defined specification.)
I have tried creating a sub-class of XmlTextReader to read from the XML string and overriding the ResolveEntity() method, but this isn't called. I have also tried assigning a custom XmlResolver.
I have also tried, as suggested, to "double encode". Unfortunately, this has not had the desired effect, as the &amp; is not decoded by the parser. Here is the code I used:
string schemaText = @"...<pattern value=""[A-Z0-9&amp;#x20;]"" />...";
XmlWriterSettings writerSettings = new XmlWriterSettings();
writerSettings.Indent = true;
writerSettings.NewLineChars = Environment.NewLine;
writerSettings.Encoding = Encoding.Unicode;
writerSettings.CloseOutput = true;
writerSettings.OmitXmlDeclaration = false;
writerSettings.IndentChars = "\t";
StringBuilder writtenSchema = new StringBuilder();
using ( StringReader sr = new StringReader( schemaText ) )
using ( XmlReader reader = XmlReader.Create( sr ) )
using ( TextWriter tr = new StringWriter( writtenSchema ) )
using ( XmlWriter writer = XmlWriter.Create( tr, writerSettings ) )
{
XPathDocument doc = new XPathDocument( reader );
XPathNavigator nav = doc.CreateNavigator();
nav.WriteSubtree( writer );
}
The written XML ends up with:
<pattern value="[A-Z0-9&amp;#x20;]" />
If you want it to be preserved, you need to double-encode it: &amp;#x20;. The XML reader will translate entities; that's more or less how XML works.
<pattern value="[A-Z0-9&amp;#x20;]" />
What I did above is replace "&" with "&amp;", thereby escaping the ampersand.
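The effect of the double encoding on the parsing side can be sketched with the standard Java DOM parser (the question itself uses .NET, so this only illustrates the XML behaviour, not the .NET API). The class name EntityDemo and the method patternValue are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityDemo {

    // Parses the XML and returns the "value" attribute of the root element
    // exactly as the parser hands it to the application.
    static String patternValue(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("value");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Single encoding: the character reference is resolved to a real space.
        System.out.println(patternValue("<pattern value=\"[A-Z0-9&#x20;]\" />"));
        // Double encoding: only &amp; is resolved, leaving the text "&#x20;".
        System.out.println(patternValue("<pattern value=\"[A-Z0-9&amp;#x20;]\" />"));
    }
}
```

After parsing, the double-encoded input carries the six characters &#x20; as ordinary text. A writer that serializes that text will normally escape the ampersand again, which matches the &amp;#x20; seen in the written output above.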

how to maintain the spaces between the characters?

I am using the following code:
String keyword=request.getParameter("keyword");
keyword = keyword.toLowerCase();
keyword.replaceAll("  "," "); //first double space and then single space
keyword = keyword.trim();
System.out.println(keyword);
I give the input "t  s" (with two spaces),
but I am getting:
[3/12/10 12:07:10:431 IST] 0000002c SystemOut O t  s // here i am getting the two spaces
How can I reduce the two spaces to a single space?
Use the following program:
public class whitespaces {
public static void main(String []args){
try{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String str = br.readLine();
// Backslashes must be doubled inside a Java string literal for regex escapes.
System.out.println( str.replaceAll("\\b\\s{2,}\\b", " "));
}catch(Exception e){
e.printStackTrace();
}
}
}
thanks,
murali
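One thing worth checking in the question's snippet is that the result of replaceAll is never assigned. Java strings are immutable, so the call returns a new string and leaves keyword unchanged. A minimal sketch of the same steps with the result kept; the class name CollapseSpaces and the method normalize are made up for illustration, and using Locale.ROOT for lower-casing is an assumption.

```java
import java.util.Locale;

class CollapseSpaces {

    // Lower-cases the keyword, collapses every run of two or more whitespace
    // characters into one space, and trims both ends. Each call returns a new
    // string, so the calls are chained instead of being discarded.
    static String normalize(String keyword) {
        return keyword.toLowerCase(Locale.ROOT)
                .replaceAll("\\s{2,}", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("T  S"));
    }
}
```

In the original code the equivalent fix is to write keyword = keyword.replaceAll(...), in the same way the toLowerCase() and trim() results are already assigned.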
If your database always has only one space, you could use a keypress event to automatically ignore any occurrences of multiple spaces (by replacing double spaces with a single space in the search string, or something similar).
StackOverflow has solved the same (or at least a similar) problem regarding spaces in tags by not having them. Instead, if you want to denote a space in a tag on SO, you use - (dash). You could run a query to replace all spaces with - in your database (even though it would probably take quite some time to run, you'll only have to do it once). If you want to display them as spaces on the page, just do a replace when you render.