I'd like to find every line that starts with doi = { or url = { and remove it from the file. For example, in the following data I'd like to remove the url and doi fields.
I don't know how to use the Replace command for this, since I don't know the complete string, and I can't see how a macro would help when these lines are not at regular intervals.
@article{Carrion2006,
author = {Carrion, M. and Arroyo, J.M.},
doi = {10.1109/TPWRS.2006.876672},
journal = {IEEE Trans. Power Syst.},
title = {{Bla Bla Bla 1}},
pages = {1371--1378},
url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1664974},
year = {2006}
}
@article{Chandrasekaran2012,
author = {Chandrasekaran, K. and Hemamalini, S. and Simon, Sishaj P. and Padhy, Narayana Prasad},
issn = {03787796},
journal = {Electr. Power Syst. Res.},
pages = {109--119},
publisher = {Elsevier B.V.},
title = {{Bla Bla Bla 2}},
url = {http://linkinghub.elsevier.com/retrieve/pii/S0378779611002471},
volume = {84},
year = {2012}
}
How about:
Find what: ^(?:doi|url)\s*=\s*[^}]+\},\R
Replace with: (leave empty)
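For anyone who prefers a script over the editor's Replace dialog, here is a minimal sketch of the same idea in Python. It assumes each doi or url field sits on its own line and ends with a comma, as in the sample data. Python's re module does not support \R, so the line break is written as \r?\n instead. The function name strip_doi_and_url is made up for this illustration.

```python
import re

# Same pattern as above, except that \R is spelled out as \r?\n
# because Python's re module has no \R escape.
FIELD_LINE = re.compile(r"^(?:doi|url)\s*=\s*[^}]+\},\r?\n", re.MULTILINE)


def strip_doi_and_url(bibtex: str) -> str:
    """Remove every line that holds a doi or url field."""
    return FIELD_LINE.sub("", bibtex)


sample = (
    "author = {Carrion, M. and Arroyo, J.M.},\n"
    "doi = {10.1109/TPWRS.2006.876672},\n"
    "journal = {IEEE Trans. Power Syst.},\n"
    "url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1664974},\n"
    "year = {2006}\n"
)

cleaned = strip_doi_and_url(sample)
print(cleaned)
```

Note that a doi or url field that is the last field of an entry (no trailing comma) would not be matched by this pattern.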
I want to extract from the full String this part:
Will
full String:
"Will posted an update in the group Testing"
Another example
longerName
full String:
"longerName posted an update in the group Testing"
Any help, please?
This could work:
String str = "Will posted an update in the group Testing";
String result = str.split("</a>")[0] + "</a>";
Off the top of my head, this could work.
Regex
<a\b[^>]*>(.*?)</a>
Dart
final myString = "Will bla bla bla";
final regexp = RegExp(r'<a\b[^>]*>(.*?)</a>');
// final match = regexp.firstMatch(myString);
// final link = match.group(0);
Iterable matches = regexp.allMatches(myString);
matches.forEach((match) {
print(myString.substring(match.start, match.end));
});
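The same regex can be tried outside Dart. Below is a minimal Python sketch, assuming the full string starts with an anchor tag around the name. The href value is invented for the illustration, since the original markup is not shown. Group 0 is the whole link and group 1 is only the text between the tags.

```python
import re

ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>")

# Hypothetical input: the href value is made up for illustration.
text = (
    '<a href="https://example.com/members/will/">Will</a>'
    " posted an update in the group Testing"
)

match = ANCHOR.search(text)
whole_link = match.group(0) if match else None  # the complete <a ...>...</a> element
link_text = match.group(1) if match else None   # only the text inside the tags

print(whole_link)
print(link_text)
```

A regex like this is fine for a single, well-formed link; for arbitrary HTML an HTML parser is usually the safer choice.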
I am trying to write a parser for LOLCODE (GOD, WHAT AM I DOING???)
(just in case, to explain those strange words =) )
So, I need to have tokens for O RLY? and YA RLY.
I am trying to do like this:
reserved = { ...,
'O': 'IF_O',
'RLY?': 'IF_RLY',
'YA': 'THEN_YA',
'RLY': 'THEN_RLY', ...}
tokens = reserved.values() + (...)
t_IF_O = r'O'
t_IF_RLY = r'RLY\?'
t_THEN_YA = r'YA'
t_THEN_RLY = r'RLY'
When I write O RLY?, it is tokenized as IF_O THEN_RLY followed by an undefined symbol ?.
If I replace RLY? with, for example, RLYY (changing the dictionary entry 'RLY?': 'IF_RLY' to 'RLYY': 'IF_RLY' and setting t_IF_RLY = r'RLYY'), then it works for O RLYY.
So I think the problem is the question mark in reserved words, and I do not know a workaround for it.
Sorry, but I can't reproduce this problem. Here is a working sample (ply=3.10, python=3.6):
import ply.lex as lex
tokens = (
'IF_O',
'IF_RLY',
'THEN_YA',
'THEN_RLY'
)
t_IF_O = r'O'
t_IF_RLY = r'RLY\?'
t_THEN_YA = r'YA'
t_THEN_RLY = r'RLY'
t_ignore = ' \t'
def t_error(t):
    print(t)
lexer = lex.lex()
lexer.input('O RLY?')
while True:
    token = lexer.token()
    if token is None:
        break
    print(token)
And it prints:
LexToken(IF_O,'O',1,0)
LexToken(IF_RLY,'RLY?',1,2)
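One likely reason this works: as far as I know, PLY adds tokens defined as strings in order of decreasing regex length, so RLY\? is tried before RLY. The sketch below imitates that ordering with only the standard re module. It is not PLY's actual implementation, and the helper names build_master_pattern and tokenize are made up for the illustration.

```python
import re

RULES = {
    "IF_O": r"O",
    "IF_RLY": r"RLY\?",
    "THEN_YA": r"YA",
    "THEN_RLY": r"RLY",
}


def build_master_pattern(rules):
    """Join the rules into one alternation, longest regex first."""
    ordered = sorted(rules.items(), key=lambda item: len(item[1]), reverse=True)
    return re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in ordered))


def tokenize(text):
    """Return (token_type, value) pairs, skipping spaces and tabs."""
    master = build_master_pattern(RULES)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t":
            pos += 1
            continue
        match = master.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("O RLY?"))
print(tokenize("YA RLY"))
```

If the shorter RLY alternative were tried first, RLY? would split into THEN_RLY plus a stray ?, which is the behaviour described in the question.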
I am struggling with writing the correct rule which involves macros to identify organizations in a text.
To Identify Matrix Inc. in:
With it's rising share prices Matrix Inc. has come out a winner this quarter.
I am trying to check for words like Inc within the entity, so I defined a macro and a rule as below:
$ORGANIZATION_TITLES = "/pharmaceuticals?|group|corp|corporation|international|co.?|inc.?|incorporated|holdings|motors|ventures|parters|llc|limited liability corporation|pvt.? ltd.?/"
ENV.defaults["stage"] = 1
{
ruleType: "tokens",
pattern: ([$ORGANIZATION_TITLES]),
action: ( Annotate($0, ner, "ORGANIZATION") )
}
ENV.defaults["stage"] = 2
{ ( [{tag:NNP}]+? ($ORGANIZATION_TITLES)) => ORGANIZATION }
I tried using bindings also and then applying the rule.
env.bind("$ORGANIZATION_TITLES", TokenSequencePattern.compile(env,"/pharmaceuticals?|group|corp|corporation|international|co.?|inc.?|incorporated|holdings|motors|ventures|parters|llc|limited liability corporation|pvt.? ltd.?/"));
Nothing seems to be working. I need to define more complex pattern rules involving macros like:
pattern: ( [ { ner:PERSON } ]+ /,/*? ($TITLES_CORPORATE_PREFIXES)*? $TITLES_CORPORATE+? /,/*? /of|for/? /,/*? [ { ner:ORGANIZATION } ]+ )
where $TITLES_CORPORATE_PREFIXES and $TITLES_CORPORATE are macros similar to $ORGANIZATION_TITLES.
What am I doing wrong?
EDIT
Here's my code:
public static void main(String[] args)
{
String rulesFile = "D:\\Workspace\\resource\\NERRulesFile.txt";
String dataFile = "D:\\Workspace\\resource\\GoldSetSentences.txt";
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// pipeline.addAnnotator(new TokensRegexAnnotator(rulesFile));
String inputText = "Bill Edelman , CEO and Chairman , for Paragonix commented on the Supply Agreement with Essential Pharmaceuticals .";
Annotation document = new Annotation(inputText.toLowerCase());
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(TokenSequencePattern.getNewEnv(), rulesFile);
/* Next we can go over the annotated sentences and extract the annotated words,
Using the CoreLabel Object */
for (CoreMap sentence : sentences)
{
List<MatchedExpression> matched = extractor.extractExpressions(sentence);
for(MatchedExpression phrase : matched){
// Print out matched text and value
System.out.println("matched: " + phrase.getText() + " with value " + phrase.getValue());
// Print out token information
CoreMap cm = phrase.getAnnotation();
for (CoreLabel token : cm.get(TokensAnnotation.class))
{
String word = token.get(TextAnnotation.class);
String lemma = token.get(LemmaAnnotation.class);
String pos = token.get(PartOfSpeechAnnotation.class);
String ne = token.get(NamedEntityTagAnnotation.class);
System.out.println("matched token: " + "word="+word + ", lemma="+lemma + ", pos=" + pos + ", ne=" + ne);
}
}
}
}
Here is a rules file that should work:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
$ORGANIZATION_TITLES = "/inc\.|corp\./"
{ pattern: ([{pos: NNP}]+ $ORGANIZATION_TITLES), action: ( Annotate($0, ner, "RULE_FOUND_ORG") ) }
I have made some changes to our code base to make the TokensRegexAnnotator more easily accessible. You will need to get the latest version from GitHub: https://github.com/stanfordnlp/CoreNLP
If you run this command or the equivalent Java API call, it should work:
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,tokensregex -tokensregex.rules organization.rules -file samples.txt -outputFormat text -tokensregex.caseInsensitive
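To make the intent of the rule concrete, here is a plain Python illustration of what the pattern ([{pos: NNP}]+ $ORGANIZATION_TITLES) is meant to express: one or more NNP tokens followed by a token that matches the title macro. This is not CoreNLP or TokensRegex code. The tokens and POS tags are hand-written assumptions, and find_organizations is a made-up helper name.

```python
import re

# Stand-in for the $ORGANIZATION_TITLES macro; IGNORECASE mimics the
# -tokensregex.caseInsensitive flag used in the command above.
ORGANIZATION_TITLES = re.compile(r"inc\.|corp\.", re.IGNORECASE)


def find_organizations(tagged_tokens):
    """Return each run of NNP tokens that ends in an organization title."""
    spans = []
    for i, (word, _pos) in enumerate(tagged_tokens):
        if not ORGANIZATION_TITLES.fullmatch(word):
            continue
        # Walk backwards over the NNP tokens that precede the title.
        start = i
        while start > 0 and tagged_tokens[start - 1][1] == "NNP":
            start -= 1
        if start < i:  # the rule needs at least one NNP before the title
            spans.append(" ".join(w for w, _ in tagged_tokens[start:i + 1]))
    return spans


# Hand-tagged tokens (an assumption, not real tagger output).
tagged = [
    ("Matrix", "NNP"), ("Inc.", "NNP"), ("has", "VBZ"), ("come", "VBN"),
    ("out", "RP"), ("a", "DT"), ("winner", "NN"),
]
print(find_organizations(tagged))
```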
I am trying to get the page title and transform it as in the following example.
Page name: My tools
Final string: my_tools (so that I can use it through markers in my CSS classes)
I know how to get page title using:
HEADERTITLE = TEXT
HEADERTITLE.data = page : title
But how can I transform this string?
Thank you for your help!
To convert the case and replace some characters, you can use these TypoScript settings:
HEADERTITLE = TEXT
HEADERTITLE {
data = page:title
### replace whitespace
replacement.10 {
search = #\s#i
replace = _
useRegExp = 1
}
/*
### replace all special characters
replacement.10 {
search = #\W#i
replace =
useRegExp = 1
}
*/
### transform string to lowercase
case = lower
}
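To check what the active settings should produce, the same two steps (replace each whitespace character with an underscore, then lowercase) can be sketched in Python. This only illustrates the transformation and is not TYPO3 code; page_title_to_marker is a made-up name.

```python
import re


def page_title_to_marker(title: str) -> str:
    """Replace every whitespace character with '_' and lowercase the result."""
    return re.sub(r"\s", "_", title).lower()


print(page_title_to_marker("My tools"))  # my_tools
```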
{
education = (
{
school = {
id = 108102169223234;
name = psss;
};
type = College;
year = {
id = 142833822398097;
name = 2010;
};
}
);
}
<!-- 1.2398s -->
The above gives me the error "NSLocalizedDescription=Unrecognised leading character".
That is not even close to valid JSON. Check it at http://www.jsonlint.com/
Are you in charge of generating the feed? If so, I think it would be much better to fix the problem at the source than to refactor your code to accommodate whatever is being returned.
Are you using a JSON framework in Xcode to parse that string?
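A quick way to see the difference is to run both shapes through a strict JSON parser. The sketch below uses Python's json module. The text in the question appears to be in the old property-list style (key = value; with parentheses for arrays), and the JSON version is only my guess at what the feed is meant to contain (for instance, whether the ids are strings is an assumption).

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Shortened version of the text from the question.
plist_style = "{ education = ( { type = College; } ); }"

# Hypothetical JSON equivalent of the full structure.
json_style = (
    '{"education": [{"school": {"id": "108102169223234", "name": "psss"},'
    ' "type": "College",'
    ' "year": {"id": "142833822398097", "name": "2010"}}]}'
)

print(is_valid_json(plist_style))
print(is_valid_json(json_style))
```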