What's wrong with this simplest JFlex code?

I'm learning JFlex and wrote the simplest possible spec, which matches a single character, #:
import com.intellij.lexer.FlexLexer;
import com.intellij.psi.tree.IElementType;
%%
%class PubspecLexer
%implements FlexLexer
%unicode
%type IElementType
%function advance
%debug
Comment = "#"
%%
{Comment} { System.out.println("Found comment!"); return PubTokenTypes.Comment; }
. { return PubTokenTypes.BadCharacter; }
Then I generate the PubspecLexer class and try it:
public static void main(String[] args) throws IOException {
    PubspecLexer lexer = new PubspecLexer(new StringReader("#!!!!!"));
    for (int i = 0; i < 3; i++) {
        IElementType token = lexer.advance();
        System.out.println(token);
    }
}
But it prints 3 nulls:
null
null
null
Why does it return neither Comment nor BadCharacter?

Actually, it's not a JFlex problem: the idea-flex skeleton changes the original usage.
When using JFlex to write IntelliJ IDEA plugins, we use a patched "JFlex.jar" together with "idea-flex.skeleton"; the latter defines the zzRefill method as:
private boolean zzRefill() throws java.io.IOException {
    return true;
}
Instead of the original:
private boolean zzRefill() throws java.io.IOException {
    // ... some code omitted
    /* finally: fill the buffer with new input */
    int numRead = zzReader.read(zzBuffer, zzEndRead,
                                zzBuffer.length - zzEndRead);
    // ... some code omitted
    // numRead < 0
    return true;
}
Notice the zzReader in the original code: it holds the string #!!!!! that I passed in. In the idea-flex version it is never used.
So to work with the idea-flex version, I should wrap the lexer like this:
public class MyLexer extends FlexAdapter {
    public MyLexer() {
        super(new PubspecLexer((Reader) null));
    }
}
Then:
public static void main(String[] args) {
    String input = "#!!!!!";
    MyLexer lexer = new MyLexer();
    lexer.start(input);
    for (int i = 0; i < 3; i++) {
        System.out.println(lexer.getTokenType());
        lexer.advance();
    }
}
Which prints:
match: --#--
action [19] { System.out.println("Found comment!"); return PubTokenTypes.Comment(); }
Found comment!
Pub:Comment
match: --!--
action [20] { return PubTokenTypes.BadCharacter(); }
Pub:BadCharacter
match: --!--
action [20] { return PubTokenTypes.BadCharacter(); }
Pub:BadCharacter
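This works because FlexAdapter (com.intellij.lexer.FlexAdapter) never touches the Reader at all: its start() method hands the text straight to the generated lexer's buffer. Roughly, a sketch of the idea (not the actual IntelliJ source):

public void start(CharSequence buffer, int startOffset, int endOffset, int initialState) {
    // feed the text directly into the generated lexer via FlexLexer.reset(),
    // which is why zzRefill() can simply return true: the buffer is already full
    myFlex.reset(buffer, startOffset, endOffset, initialState);
}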

Related

How to solve NoClassDefFoundError?

I am enrolled in the Duke University course "Java-Programming-Arrays-Lists-and-Structured-data" offered by Coursera.
I am using Eclipse instead of BlueJ for my own ease, but when I try to run the program I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: edu/duke/FileResource
I have imported the jar file (edu.duke.*) and am trying to run the program from the main method.
Can someone kindly help me solve this problem?
import java.lang.*;
import edu.duke.*;

public class WordLengths {
    public void countWordLengths(FileResource resource, int[] counts) {
        for (String word : resource.words()) {
            int wordLength = word.length();
            for (int i = 0; i < wordLength; i++) {
                char curChar = word.charAt(i);
                if ((i == 0) || (i == wordLength - 1)) {
                    if (!Character.isLetter(curChar))
                        wordLength--;
                }
            }
            counts[wordLength]++;
            System.out.println(" Words of length " + wordLength + " " + word);
        }
    }

    public void indexOfMax(int[] values) {
        int maxIndex = 0;
        int position = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] > maxIndex) {
                maxIndex = values[i];
                position = i;
            }
        }
        System.out.println("The most common word is :" + position);
    }

    public void testCountWordLengths() {
        FileResource f = new FileResource("C:\\Users\\Ramish-HP\\eclipse-workspace\\Assignment1\\smallHamlet.txt");
        int[] counts = new int[31];
        countWordLengths(f, counts);
        indexOfMax(counts);
    }
}

public class WordLengthsTest {
    public static void main(String[] args) {
        WordLengths wl = new WordLengths();
        wl.testCountWordLengths();
    }
}
java.lang.NoClassDefFoundError is commonly thrown when the Java Virtual Machine or a ClassLoader instance tries to load the definition of a class (as part of a normal method call or as part of creating a new instance using the new expression) and no definition of the class can be found.
The searched-for class definition existed when the currently executing class was compiled, but the definition can no longer be found at run time.
Reference: https://docs.oracle.com/javase/7/docs/api/java/lang/NoClassDefFoundError.html
I suggest checking your runtime libraries: the edu.duke jar must be on the classpath when you run the program, not only when you compile it.
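For example, if the Duke library jar sits next to your sources (the jar file name below is hypothetical), compiling and running with an explicit classpath looks like this on Linux/macOS (use ; instead of : on Windows):

javac -cp .:duke-library.jar WordLengths.java WordLengthsTest.java
java -cp .:duke-library.jar WordLengthsTest

In Eclipse, the equivalent is adding the jar under Project > Properties > Java Build Path > Libraries, which puts it on both the compile-time and run-time classpath of the project.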

Why doesn't NetBeans print output from this code in the lower console?

This code does not produce any text output in the lower output console. Am I missing a configuration setting, or is the code wrong?
The code seems to compile, as the console displays that the build was successful in 0 seconds.
public class SpaceRemover {
    public static void main(String[] args) {
        String mostFamous = "Rudolph the Red-Nosed Reindeer";
        char[] mfl = mostFamous.toCharArray();
        for (int dex = 0; dex < mfl.length; dex++) {
            char current = mfl[dex];
            if (current != ' ') {
                System.out.print(current);
            } else {
                System.out.print('.');
            }
        }
        System.out.println();
    }
}
The build was successful, yet no text was printed.

Avro: serialize and deserialize List<UUID>

I cannot understand how to serialize a List<UUID> to binary format and deserialize it back to a List<UUID>. I have tried to use CustomEncoding for this purpose:
public class ListUUIDAsListStringEncoding extends CustomEncoding<List<UUID>> {
    {
        schema = Schema.createArray(Schema.createUnion(Schema.create(Schema.Type.STRING)));
        schema.addProp("CustomEncoding", "com.my.lib.common.schemaregistry.encoding.ListUUIDAsListStringEncoding");
    }

    @Override
    protected void write(Object datum, Encoder out) throws IOException {
        var list = (List<UUID>) datum;
        out.writeArrayStart();
        out.setItemCount(list.size());
        for (Object r : list) {
            if (r instanceof UUID) {
                out.startItem();
                out.writeString(r.toString());
            }
        }
        out.writeArrayEnd();
    }

    @Override
    protected List<UUID> read(Object reuse, Decoder in) throws IOException {
        var newArray = new ArrayList<UUID>();
        for (long i = in.readArrayStart(); i != 0; i = in.arrayNext()) {
            for (int j = 0; j < i; j++) {
                newArray.add(UUID.fromString(in.readString()));
            }
        }
        return newArray;
    }
}
The write method seems to work correctly, but the read method stops with the exception java.lang.ArrayIndexOutOfBoundsException: 36 when trying to read a string.
What am I doing wrong, and how do I deserialize the data correctly?
Solved it myself.
Putting my encoding class here in case someone needs it:
public class ListUuidAsNullableListStringEncoding extends CustomEncoding<List<UUID>> {
    {
        schema = Schema.createUnion(
                Schema.create(Schema.Type.NULL),
                Schema.createArray(Schema.create(Schema.Type.STRING))
        );
    }

    @Override
    protected void write(Object datum, Encoder out) throws IOException {
        if (datum == null) {
            out.writeIndex(0);  // null branch of the union
            out.writeNull();
        } else {
            out.writeIndex(1);  // array branch of the union
            out.writeArrayStart();
            out.setItemCount(((List) datum).size());
            for (Object item : (List) datum) {
                if (item instanceof UUID) {
                    out.startItem();
                    out.writeString(item.toString());
                }
            }
            out.writeArrayEnd();
        }
    }

    @Override
    protected List<UUID> read(Object reuse, Decoder in) throws IOException {
        switch (in.readIndex()) {
            case 1:
                var newArray = new ArrayList<UUID>();
                for (long i = in.readArrayStart(); i != 0; i = in.arrayNext()) {
                    for (int j = 0; j < i; j++) {
                        newArray.add(UUID.fromString(in.readString()));
                    }
                }
                return newArray;
            default:
                in.readNull();
                return null;
        }
    }
}
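To actually apply the encoding, Avro's reflect API lets you attach a CustomEncoding to a field with @AvroEncode. A minimal sketch (the record and field names here are made up):

import org.apache.avro.reflect.AvroEncode;
import java.util.List;
import java.util.UUID;

public class EventRecord {
    // serialized as a nullable array of strings using the class above
    @AvroEncode(using = ListUuidAsNullableListStringEncoding.class)
    private List<UUID> ids;
}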

Creating a custom plugin for Chinese tokenization

I'm working on properly integrating the Stanford segmenter within Solr for Chinese tokenization.
The plugin involves loading other jar files and model files. I've got it working in a crude manner by hardcoding the complete paths to those files.
I'm looking for a way to build the plugin so the paths need not be hardcoded, and to bring the plugin into conformance with the Solr plugin architecture. Please let me know if there are any recommended sites or tutorials for this.
I've added my code below:
public class ChineseTokenizerFactory extends TokenizerFactory {
    /** Creates a new ChineseTokenizerFactory */
    public ChineseTokenizerFactory(Map<String,String> args) {
        super(args);
        assureMatchVersion();
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    @Override
    public ChineseTokenizer create(AttributeFactory factory, Reader input) {
        Reader processedStringReader = new ProcessedStringReader(input);
        return new ChineseTokenizer(luceneMatchVersion, factory, processedStringReader);
    }
}
public class ProcessedStringReader extends java.io.Reader {
    private static final int BUFFER_SIZE = 1024 * 8;
    //private static TextProcess m_textProcess = null;
    private static final String basedir = "/home/praveen/PDS_Meetup/solr-4.9.0/custom_plugins/";
    static Properties props = null;
    static CRFClassifier<CoreLabel> segmenter = null;
    private char[] m_inputData = null;
    private int m_offset = 0;
    private int m_length = 0;

    public ProcessedStringReader(Reader input) {
        char[] arr = new char[BUFFER_SIZE];
        StringBuffer buf = new StringBuffer();
        int numChars;
        if (segmenter == null) {
            segmenter = new CRFClassifier<CoreLabel>(getProperties());
            segmenter.loadClassifierNoExceptions(basedir + "ctb.gz", getProperties());
        }
        try {
            while ((numChars = input.read(arr, 0, arr.length)) > 0) {
                buf.append(arr, 0, numChars);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        m_inputData = processText(buf.toString()).toCharArray();
        m_offset = 0;
        m_length = m_inputData.length;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int charNumber = 0;
        for (int i = m_offset + off; i < m_length && charNumber < len; i++) {
            cbuf[charNumber] = m_inputData[i];
            m_offset++;
            charNumber++;
        }
        if (charNumber == 0) {
            return -1;
        }
        return charNumber;
    }

    @Override
    public void close() throws IOException {
        m_inputData = null;
        m_offset = 0;
        m_length = 0;
    }

    public String processText(String inputText) {
        List<String> segmented = segmenter.segmentString(inputText);
        String output = "";
        if (segmented.size() > 0) {
            output = segmented.get(0);
            for (int i = 1; i < segmented.size(); i++) {
                output = output + " " + segmented.get(i);
            }
        }
        System.out.println(output);
        return output;
    }

    static Properties getProperties() {
        if (props == null) {
            props = new Properties();
            props.setProperty("sighanCorporaDict", basedir);
            // props.setProperty("NormalizationTable", "data/norm.simp.utf8");
            // props.setProperty("normTableEncoding", "UTF-8");
            // below is needed because CTBSegDocumentIteratorFactory accesses it
            props.setProperty("serDictionary", basedir + "dict-chris6.ser.gz");
            props.setProperty("inputEncoding", "UTF-8");
            props.setProperty("sighanPostProcessing", "true");
        }
        return props;
    }
}
public final class ChineseTokenizer extends CharTokenizer {
    public ChineseTokenizer(Version matchVersion, Reader in) {
        super(matchVersion, in);
    }

    public ChineseTokenizer(Version matchVersion, AttributeFactory factory, Reader in) {
        super(matchVersion, factory, in);
    }

    /** Collects only characters which do not satisfy
     *  {@link Character#isWhitespace(int)}. */
    @Override
    protected boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }
}
You can pass the arguments through the factory's args parameter, as sketched below.
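For example, the factory could read the model directory from its args map instead of hardcoding it. A minimal sketch (assuming Lucene/Solr 4.x; "basedir" is a made-up attribute name):

public ChineseTokenizerFactory(Map<String,String> args) {
    super(args);
    assureMatchVersion();
    // consume a custom "basedir" attribute from the <tokenizer .../> element in schema.xml
    String basedir = require(args, "basedir");
    if (!args.isEmpty()) {
        throw new IllegalArgumentException("Unknown parameters: " + args);
    }
    // hand basedir to the segmenter setup instead of the hardcoded constant
}

The attribute is then supplied in schema.xml, e.g. <tokenizer class="com.example.ChineseTokenizerFactory" basedir="/path/to/models"/>. For paths resolved relative to the Solr config directory, implementing ResourceLoaderAware and loading the model files through the supplied ResourceLoader is the more idiomatic route.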

Eclipse editor plug-in: syntax highlighting

I'm working on an editor plugin for a custom language, and I've managed to set it up so all the necessary keywords highlight. The problem is that keywords become highlighted even when they are part of another word.
For example, let's say public is a keyword and I declare a variable called publicVar, so it looks like this: public int publicVar. public highlights as expected, but the 'public' part of publicVar is also highlighted, which is not what I want.
public WFSPartitionScanner()
{
    int index = 0;
    int numOfRules = 5 + reversedWords.length + commonFunctions.length + directives.length +
            BIFs.length + operators.length + strongOperators.length;
    IToken string = new Token(WFS_STRING);
    IToken comment = new Token(WFS_COMMENT);
    IToken reversedWord = new Token(WFS_REVERSED_WORD);
    IToken commonFunction = new Token(WFS_COMMON_FUNCTION);
    IToken directive = new Token(WFS_DIRECTIVE);
    IToken bif = new Token(WFS_BIF);
    IToken operator = new Token(WFS_OPERATOR);
    IToken strongOperator = new Token(WFS_STRONG_OPERATOR);
    IToken numberToken = new Token(WFS_NUMBER);
    IPredicateRule[] rules = new IPredicateRule[numOfRules];
    rules[index] = new MultiLineRule("\"", "\"", string, '\\');
    rules[++index] = new MultiLineRule("\'", "\'", string, '\\');
    rules[++index] = new SingleLineRule("//", "\n", comment);
    rules[++index] = new MultiLineRule("/*", "*/", comment);
    rules[++index] = new WFSNumberRule(numberToken);
    for (int i = 0; i < reversedWords.length; i++)
    {
        rules[++index] = new WordPatternRule(new WordDetector(reversedWords[i]), reversedWords[i], "", reversedWord);
    }
    for (int i = 0; i < commonFunctions.length; i++)
    {
        rules[++index] = new WordPatternRule(new WordDetector(commonFunctions[i]), commonFunctions[i], "", commonFunction);
    }
    for (int i = 0; i < BIFs.length; i++)
    {
        rules[++index] = new WordPatternRule(new WordDetector(BIFs[i]), BIFs[i], "", bif);
    }
    for (int i = 0; i < directives.length; i++)
    {
        rules[++index] = new WordPatternRule(new WordDetector(directives[i]), directives[i], "", directive);
    }
    for (int i = 0; i < operators.length; i++)
    {
        rules[++index] = new WordPatternRule(new WordDetector(operators[i]), operators[i], "", operator);
    }
    for (int i = 0; i < strongOperators.length; i++)
    {
        rules[++index] = new WordPatternRule(new WordDetector(strongOperators[i]), strongOperators[i], "", strongOperator);
    }
    setPredicateRules(rules);
}
public class WordDetector implements IWordDetector {
    private char start;
    private char[] part;

    public WordDetector(String word)
    {
        this.start = word.charAt(0);
        this.part = new char[word.length() - 1];
        for (int i = 1; i < word.length(); i++)
        {
            part[i - 1] = word.charAt(i);
        }
    }

    @Override
    public boolean isWordPart(char c) {
        for (int i = 0; i < part.length; i++)
        {
            if (c == part[i])
            {
                return true;
            }
        }
        return false;
    }

    @Override
    public boolean isWordStart(char c) {
        return (c == start);
    }
}
I've also tried changing the WordPatternRule from
WordPatternRule(new WordDetector('KEYWORD'), 'KEYWORD', "", reversedWord);
to
WordPatternRule(new WordDetector('KEYWORD'), 'FIRST LETTER OF KEYWORD', 'LAST LETTER OF KEYWORD', reversedWord);
but I got the same results.
OK, I figured out part of the problem. To fix it, I made my own WordRule:
public class WFSWordRule extends WordRule implements IPredicateRule {
    private IToken successToken;

    public WFSWordRule(IWordDetector detector, String[] keywords, IToken token) {
        super(detector);
        this.successToken = token;
        for (String word : keywords) {
            addWord(word, token);
        }
    }

    @Override
    public IToken evaluate(ICharacterScanner scanner, boolean arg1) {
        return super.evaluate(scanner);
    }

    @Override
    public IToken getSuccessToken() {
        return successToken;
    }
}
This doesn't completely solve the problem, though. Now, if there is a keyword at the end of a word, the keyword is still highlighted. Using the same example as before: if I have a keyword 'Public' and a variable called 'varPublic', 'Public' is highlighted in both cases. But if I have a variable called 'PublicVar', 'Public' is not highlighted. Any tips?
If you're trying to highlight whole words rather than parts of words, I think you should use WordRule instead of WordPatternRule, as you have. From what I understand, WordPatternRule is for finding patterns within a word, whereas WordRule is for finding individual words.
I've used WordRule in an Eclipse plug-in I've been working on, and I don't have the problem of words being highlighted inside other words. You can look at its code and use it as an example: it takes an IWordDetector implementation, and you use its addWord method to add all the words you want it to detect.
Unlike CombinedWordRule, which greg-449 mentioned, it is a public part of the JFace framework you are already using.
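A minimal sketch of that approach (reusing the reversedWord token from the question; the detector here accepts Java-style identifiers):

// detect whole identifier-shaped words, then map specific keywords to a token
WordRule keywordRule = new WordRule(new IWordDetector() {
    @Override
    public boolean isWordStart(char c) { return Character.isJavaIdentifierStart(c); }
    @Override
    public boolean isWordPart(char c) { return Character.isJavaIdentifierPart(c); }
}, Token.UNDEFINED);  // anything that isn't a registered word falls through
keywordRule.addWord("public", reversedWord);

Because the detector consumes the whole identifier, publicVar is scanned as a single word, fails the addWord lookup, and is therefore not highlighted.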
Not sure if this is still useful, but I fixed this issue by overriding the nextToken method in my scanner:
public IToken nextToken() {
    if (this.fContentType == null || this.fRules == null) {
        // don't try to resume
        this.fTokenOffset = this.fOffset;
        this.fColumn = RuleBasedScanner.UNDEFINED;
        if (this.fRules != null) {
            for (IRule fRule : this.fRules) {
                IToken token = fRule.evaluate(this);
                try {
                    if (!token.isUndefined()) {
                        if (this.fTokenOffset > 0 && (isWordEnd(this.fDocument.getChar(this.fTokenOffset - 1))
                                || isSpecialKey(this.fDocument.getChar(this.fTokenOffset)))) {
                            this.fContentType = null;
                            return token;
                        }
                    }
                } catch (BadLocationException ex) {
                    ex.printStackTrace();
                }
            }
        }
        if (this.read() == ICharacterScanner.EOF) {
            return Token.EOF;
        }
        return this.fDefaultReturnToken;
    }
    // resume path omitted in the original snippet; fall back to the default behaviour
    return super.nextToken();
}
isWordEnd checks for:
c == (char) 0 || c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '(' || c == ')' || c == ',' || c == ';'
isSpecialKey checks for:
c == '%'
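As plain helper methods, those checks amount to the following sketch:

private boolean isWordEnd(char c) {
    return c == (char) 0 || c == ' ' || c == '\t' || c == '\n' || c == '\r'
            || c == '(' || c == ')' || c == ',' || c == ';';
}

private boolean isSpecialKey(char c) {
    return c == '%';
}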
For the other issue, which you've already fixed, I reimplemented the endSequenceDetected method in my rule:
@Override
protected boolean endSequenceDetected(ICharacterScanner scanner) {
    int c = scanner.read();
    scanner.unread();
    return isWordEnd((char) c);
}