CsvParser not working for missing double quotes - scala

I have messages in a file like the ones below, and I am using com.univocity.parsers.csv.CsvParser to split each string on a delimiter (in this case, -):
1-bc-"name"-def-address
1-abc-"name-def-address
I create my CsvParser object like this:
private val settings = new CsvParserSettings()
settings.getFormat.setDelimiter('-')
settings.setIgnoreLeadingWhitespaces(true)
settings.setIgnoreTrailingWhitespaces(true)
settings.setReadInputOnSeparateThread(false)
settings.setNullValue("")
settings.setMaxCharsPerColumn(-1)
val parser = new CsvParser(settings)
and parse each input line like this:
for (line <- Source.fromFile("path\\test.txt").getLines) {
println(parser.parseLine(line).toList)
}
and the output is:
List(1, bc, name, def, address)
List(1, abc, name-def-address)
As the output shows, the first message was split properly, but for the second message everything after the first double quote is taken as a single value. Does anyone know why it behaves like this, and how I can get the desired output? I am reading every message as a string, so it should simply treat a quote/double quote as an ordinary character.

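Author of this library here. When a quote is found right after your - delimiter, the parser treats it as the start of a quoted value and tries to find a closing quote. Your second line has no closing quote, so the rest of the line ends up in one value.
The easiest way around this is to make the parser simply ignore quotes with: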
settings.getFormat().setQuote('\0');
Hope it helps.
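For comparison, if no quote handling is wanted at all, the same result can be sketched without the parser by splitting on the delimiter directly. This is plain Java, not the univocity API, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class DashSplit {
    // Splits on '-' and treats quote characters as ordinary data.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String> splitLine(String line) {
        return Arrays.asList(line.split("-", -1));
    }

    public static void main(String[] args) {
        // prints [1, abc, "name, def, address]
        System.out.println(splitLine("1-abc-\"name-def-address"));
    }
}
```

Unlike the parser with quoting disabled, this sketch does no whitespace trimming or null-value substitution, so it only fits input as simple as the lines in the question.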

Related

Make scala ignore multiple quotes in input

How do I make scala ignore the quotes inside of a String?
e.g.
val line1 = "<row Id="85" PostTypeId="1""
I want <row Id="85" PostTypeId="1" to be treated as a single string. However, Scala reports an error because it treats "<row Id=" as a complete string and cannot make sense of what follows.
Thanks in advance
val line1 = """<row Id="85" PostTypeId="1""""
Note the triple quotes ("""blahblah"""): inside them, double quotes do not need escaping.

What is the most effective way in systemVerilog to know how many words a string has?

I have Strings in the following structure:
cmd, addr, data, data, data, data, ……., \n
For example:
"write,A0001000,00000000, \n"
I have to know how many words the String has.
I know that I can go over the String and count the commas, but is there a more effective way to do it?
UVM provides a facility to do regexp matching using the DPI, in case you're already using that. Have a look at the functions in uvm_svcmd_dpi.svh
Verilab also provides svlib, a package containing string matching functions.
A simpler option would be to change the commas(,) to a space, then you can use $sscanf (or $fscanf to skip the intermediate string and read directly from a file), assuming each command has a maximum number of words.
int code; // returns the number of words read
string str,word[5];
code = $sscanf(str,"%s %s %s %s %s", word[0],word[1],word[2],word[3],word[4]);
You can use %h if you know a word is in hex and translate it directly to a numeric value instead of a string.
The first step is to define very clearly what a word actually is, i.e. what constitutes the start of a word and what constitutes its end. Once you understand this, it should become obvious how to parse the string correctly.
In Java, StringTokenizer is a convenient way to count the words in a string.
import java.util.StringTokenizer;
String sampleString = "cmd addr data data data data....";
StringTokenizer st = new StringTokenizer(sampleString);
int count = st.countTokens(); // number of whitespace-separated tokens
Hope this will help you :)
In Java you can use the following code to count the words in a string:
public class WordCounts {
    public static void main(String[] args) {
        String text = "cmd, addr, data, data, data, data";
        String trimmed = text.trim();
        // Split on runs of commas and/or whitespace, so "a,b" and "a, b" both count as two words.
        int words = trimmed.isEmpty() ? 0 : trimmed.split("[,\\s]+").length;
        System.out.println(words); // prints 6
    }
}

Xstream ignore whitespace characters

I load data from XML into Java classes using the XStream library. The text in several tags is very long and spans more than one line. Because of this formatting, the corresponding fields in my Java classes contain additional characters such as \n and \t. Is there any way to load the data from the XML file without these characters?
Each such XML element is spread over two lines: the opening tag and the very long text are on the first line, and the closing tag is on the second line.
You can use regex or the string split method.
String string = "004-034556";
String[] parts = string.split("-");
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556
Just split your string. In your case it would be
String wantedText = parts[0];
Another solution would be to put your values into a string array, loop over the array, and match and remove any characters you don't want.
You can see how to match and remove Here
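The "match and remove" idea can be sketched as a post-processing step applied to a field after it has been loaded. This is a plain-Java sketch, not an XStream feature. The class name, method name, and sample text are hypothetical, and it assumes that collapsing every run of whitespace into one space is acceptable for the text in question.

```java
public class WhitespaceCleanup {
    // Collapses every run of whitespace (spaces, tabs, newlines) into a
    // single space and trims both ends.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String fromXml = "\n\t\tA very long text that\n\t\tspans two lines\n\t";
        // prints: A very long text that spans two lines
        System.out.println(normalize(fromXml));
    }
}
```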

Correct preg_replace format

What is the correct preg_replace format to remove everything between parentheses, including the parentheses themselves, as in (xxxxxx)? For example, I have a (not very long) string like this: AAAAABBBBB (AAAAABBBBB), and all I want is AAAAABBBBB. I need to use preg_replace because the string and its length change.
Thanks
Not sure if I understood your question right, but this should return the string inside the parentheses:
$string = "CCCCC (AAAAABBBBB)";
return preg_replace("(.+\(|\))","",$string);
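This regex looks for any text followed by "(" and for a single ")" and replaces both with "". The outermost parentheses in the pattern are the regex delimiters, which PHP allows in bracket style, so the actual pattern is .+\(|\).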
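If the goal is instead to drop the parenthesized part and keep what precedes it, as the question's example suggests, the pattern could look like the sketch below. It is written in Java rather than PHP, and the class and method names are made up for illustration. The pattern itself uses only constructs that PCRE also supports.

```java
public class StripParenthesized {
    // Removes every "(...)" group together with any whitespace before it.
    // [^)]* stops at the first closing parenthesis, so nested parentheses
    // are not handled.
    static String strip(String s) {
        return s.replaceAll("\\s*\\([^)]*\\)", "");
    }

    public static void main(String[] args) {
        // prints AAAAABBBBB
        System.out.println(strip("AAAAABBBBB (AAAAABBBBB)"));
    }
}
```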

Is it possible with Eclipse to only find results in string literals?

Let's assume I have the following Java code:
public String foo()
{
// returns foo()
String log = "foo() : Logging something!";
return log;
}
Can I search in Eclipse for foo() occurring only in a String literal, but not anywhere else in the code? In the example here, Eclipse should find only the third occurrence of foo(), not the first one, which is a function name, and not the second one, which is in a comment.
Edit: Simple Regular Expressions won't work, because they will find foo() in a line like
String temp = "literal" + foo() + "another literal"
But here foo() is a function name and not a String literal.
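You can try it like this:
"[^"\n]*foo\(\)[^"\n]*"
The parentheses have to be escaped. The character classes exclude newlines and additional quotes, which cuts down on wrong matches. Note that the pattern can still match between the closing quote of one literal and the opening quote of the next on the same line, as in the concatenation example above.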
Maybe you should use a regex to find any occurrence of foo() between two " characters?
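Because a single regex cannot easily tell an opening quote from a closing one, another option is to scan each line and track whether the current position is inside a literal. The sketch below is not an Eclipse feature. It is a standalone Java helper with made-up names, and it assumes one-line string literals; it does not handle comments, char literals, or text blocks.

```java
public class LiteralSearch {
    // Returns true if `needle` occurs inside a double-quoted string literal
    // on this single line. Backslash escapes inside literals are respected.
    static boolean inStringLiteral(String line, String needle) {
        boolean inside = false;
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inside && c == '\\' && i + 1 < line.length()) {
                // Keep the escaped character so \" does not end the literal.
                literal.append(c).append(line.charAt(++i));
            } else if (c == '"') {
                if (inside && literal.indexOf(needle) >= 0) {
                    return true; // closing quote of a literal containing needle
                }
                inside = !inside;
                literal.setLength(0);
            } else if (inside) {
                literal.append(c);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(inStringLiteral("String log = \"foo() : Logging something!\";", "foo()"));
        // prints false
        System.out.println(inStringLiteral("String temp = \"literal\" + foo() + \"another literal\"", "foo()"));
    }
}
```

The second call returns false because the text between the two literals is never collected, which is exactly the case the plain regex gets wrong.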