Replace characters in C# - character

I have a requirement.
I have a text which can contain any characters.
a) I have to retain only alphanumeric characters.
b) If the word "The" is found with a space before or after it, the word needs to be removed.
e.g.
CASE 1:
Input: The Company Pvt Ltd.
Output: Company Pvt Ltd
But
Input: TheCompany Pvt Ltd.
Output: TheCompany Pvt Ltd
because there is no space between The & Company words.
CASE 2:
Similarly, Input: Company Pvt Ltd. The
Output: Company Pvt Ltd
But Input: Company Pvt Ltd.The
Output: Company Pvt Ltd
Case 3:
Input: Company#234 Pvt; Ltd.
Output: Company234 Pvt Ltd
No , or . or any other special characters.
I am basically setting the data to some variable like
_company.ShortName = _company.CompanyName.ToUpper();
So I cannot do anything at save time. I need to apply this filter only when I read the data from the database. The data arrives in _company.CompanyName, and I have to apply the filter to that.
So far I have done:
public string ReplaceCharacters(string words)
{
words = words.Replace(",", " ");
words = words.Replace(";", " ");
words = words.Replace(".", " ");
words = words.Replace("THE ", " ");
words = words.Replace(" THE", " ");
return words;
}
private void button1_Click(object sender, EventArgs e)
{
MessageBox.Show(ReplaceCharacters(textBox1.Text.ToUpper()));
}
Thanks in advance. I am using C#

Here is a basic regex that matches your supplied cases. With the caveat that as Kobi says, your supplied cases are inconsistent, so I've taken the periods out of the first four tests. If you need both, please add a comment.
This handles all the cases you require, but the rapid proliferation of edge cases makes me think that maybe you should reconsider the initial problem?
[TestMethod]
public void RegexTest()
{
Assert.AreEqual("Company Pvt Ltd", RegexMethod("The Company Pvt Ltd"));
Assert.AreEqual("TheCompany Pvt Ltd", RegexMethod("TheCompany Pvt Ltd"));
Assert.AreEqual("Company Pvt Ltd", RegexMethod("Company Pvt Ltd. The"));
Assert.AreEqual("Company Pvt LtdThe", RegexMethod("Company Pvt Ltd.The"));
Assert.AreEqual("Company234 Pvt Ltd", RegexMethod("Company#234 Pvt; Ltd."));
// Two new tests for new requirements
Assert.AreEqual("CompanyThe Ltd", RegexMethod("CompanyThe Ltd."));
Assert.AreEqual("theasdasdatheapple", RegexMethod("the theasdasdathe the the the ....apple,,,, the"));
// And the case where you have THETHE at the start
Assert.AreEqual("CCC", RegexMethod("THETHE CCC"));
}
public string RegexMethod(string input)
{
// Old method before new requirement
//return Regex.Replace(input, @"The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
// New method that anchors the first the
//return Regex.Replace(input, @"^The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
// And a third method that does look behind and ahead for the last test
return Regex.Replace(input, @"^(The)+\s|\s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])| The$|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
}
I've also added a test method to my example that exercises the RegexMethod that contains the regular expression. To use this in your code you just need the second method.

string company = "Company; PvtThe Ltd.The . The the.the";
company = Regex.Replace(company, @"\bthe\b", "", RegexOptions.IgnoreCase);
company = Regex.Replace(company, @"[^\w ]", "");
company = Regex.Replace(company, @"\s+", " ");
company = company.Trim();
// company == "Company PvtThe Ltd"
These are the steps. Steps 1 and 2 can be combined, but this is clearer.
1. Remove "the" as a whole word (also works for ".the").
2. Remove anything that isn't a word character (letter, digit or underscore) or a space.
3. Collapse runs of adjacent spaces into one.
4. Remove spaces from the edges.
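The steps listed above can be sketched as one small function. This is written in JavaScript rather than C#, purely as an illustration. The name cleanCompanyName is hypothetical, and the character class is narrowed to ASCII letters and digits on the assumption that "alphanumeric" means exactly that (the \w used above also lets underscores through).

```javascript
// Hypothetical helper mirroring the four steps; not part of the original answer.
function cleanCompanyName(name) {
  return name
    .replace(/\bthe\b/gi, "")       // 1. drop "the" as a whole word, any case
    .replace(/[^A-Za-z0-9 ]/g, "")  // 2. keep only ASCII letters, digits and spaces
    .replace(/\s+/g, " ")           // 3. collapse runs of spaces into one
    .trim();                        // 4. strip spaces from both ends
}

console.log(cleanCompanyName("Company; PvtThe Ltd.The . The the.the"));
// prints "Company PvtThe Ltd"
```

Because the match is on word boundaries, "TheCompany" is left alone while "Ltd.The" loses its trailing "The", which is the behaviour the question asks for in its second case.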

Google Contacts Fields to fill variables in email template

First of all, thank you for your time.
I have been looking for a while for a program, a script or anything else that could help me automate a task that is otherwise going to take very long.
I'm a French computer technician working almost exclusively for doctors here in France.
The doctors receive results by email, and the results are then imported automatically from the email into the patient's folder.
But for them to receive that information, we have to communicate an email address on a special domain plus the doctor's ID, which is like your driver's ID.
We use Google Contacts as an address book because it's convenient: whenever we sign a new maintenance contract with a doctor we enter everything into Google Contacts, so the info is already there. Sometimes we have up to 20 doctors in the same cabinet (practice) to set up.
Link to a Google Sheet Contact Sample
The fields are the following :
Structure's Name : {{contact company name}} (all the doctors share the same structure)
Structure's Address : {{contact full address}} (all the doctors share the same structure)
First doctor
Last Name : {{last_name}}
First Name : {{first_name}}
eMail Address : {{email_address}} (this one is tagged MSSANTE in ggC)
Doc's ID : {{custom_field}} (this is a custom field tagged RPPS in ggC)
Second doctor
Last Name : {{last_name}}
First Name : {{first_name}}
eMail Address : {{email_address}} (this one is tagged MSSANTE in ggC)
Doc's ID : {{custom_field}} (this is a custom field tagged RPPS in ggC)
So on and so on.
Then this has to be sent to many laboratories, all in BCC, with the customers/doctors usually in CC.
I was thinking of using Google Sheets or Google's People API somehow...
Can someone give me a strategy or some code to start with?
Again, thanks to anyone who can help even a bit.
Try:
function email() {
  const ss = SpreadsheetApp.getActiveSpreadsheet()
  // Collect every non-empty address in column C of the 'LABS mails' sheet
  const emails = ss.getSheetByName('LABS mails').getRange('C2:C').getValues().flat().filter(r => r != '').join(',')
  MailApp.sendEmail({
    to: emails,
    subject: 'titre du mail', // i.e. "email subject"
    htmlBody: body()
  })
}
function body() {
  const ss = SpreadsheetApp.getActiveSpreadsheet()
  const template = ss.getSheetByName('Mail Template (Exemple)')
  const docteurs = ss.getSheetByName('Doctors')
  // The first row holds the headers; each remaining row is one doctor
  let [headers, ...data] = docteurs.getDataRange().getDisplayValues()
  // Column A: opening text, column B: per-doctor block, column C: closing text
  let debut = template.getRange('A2:A').getValues().flat().filter(r => r != '').join('<br>')
  let variable = template.getRange('B2:B').getValues().flat().filter(r => r != '').join('<br>')
  let fin = template.getRange('C2:C').getValues().flat().filter(r => r != '').join('<br>')
  // Placeholders in the per-doctor block and the 1-based 'Doctors' columns that fill them
  const liste = ['{CABINET}', '{NOM}', '{PRENOM}', '{EMAIL}', '{RPPS}']
  const colonnes = [1,4,3,8,7]
  let message = debut
  data.forEach((r, row) => {
    var texte = variable
    for (var i = 0; i < liste.length; i++) {
      // replace() with a string pattern substitutes only the first occurrence
      texte = texte.replace(liste[i], r[+colonnes[i] - 1])
    }
    message += texte + '<br><br>'
  })
  message += fin
  return (message)
}
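Put the text in the 'Mail Template (Exemple)' sheet as follows (you will need a few HTML tags): the opening text in column A, the per-doctor block with the {CABINET}, {NOM}, {PRENOM}, {EMAIL} and {RPPS} placeholders in column B, and the closing text in column C, each starting at row 2.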
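The placeholder substitution can be tried outside Apps Script with plain JavaScript. The sketch below is only an illustration: fillTemplate, the block text and the two doctor records are made up for the example, and it replaces every occurrence of a placeholder rather than only the first.

```javascript
// Hypothetical helper: replaces each {KEY} placeholder with values[KEY].
// Placeholders with no matching key are left untouched.
function fillTemplate(template, values) {
  return template.replace(/\{([A-Z]+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

// Made-up per-doctor block and records, standing in for the sheet contents.
const block = "Dr {PRENOM} {NOM} ({CABINET})<br>{EMAIL}<br>RPPS: {RPPS}";
const doctors = [
  { CABINET: "Cabinet A", NOM: "Durand", PRENOM: "Alice", EMAIL: "alice@example.test", RPPS: "00000000001" },
  { CABINET: "Cabinet A", NOM: "Martin", PRENOM: "Bruno", EMAIL: "bruno@example.test", RPPS: "00000000002" },
];

// One filled block per doctor, separated the same way as in body().
const html = doctors.map(d => fillTemplate(block, d)).join("<br><br>");
console.log(html);
```

Keying the values by placeholder name avoids keeping a separate list of column numbers in sync with the list of placeholders.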

Regular Expression

I need to extract a few values from the string below with a PowerShell regex.
Request ID = 1234 { &quot;EMO&quot;: &quot;123456-U&quot;, &quot;Terminated Account&quot;: &quot;Test User&quot;, &quot;Description&quot;: &quot;&quot;, &quot;Last Day&quot;: &quot;2019-06-26&quot;, &quot;Terminated User Mail&quot;: &quot;Test.User@gmail.com&quot; } Location : UK ,London
I need to get Test.User@gmail.com, Test User and 2019-06-26. Please help me find a PowerShell regex that extracts these values from the string above.
Thank you.
I tried the following:
$description = "Request ID = 1234 { &quot;EMO&quot;: &quot;123456-U&quot;, &quot;Terminated Account&quot;: &quot;Test User&quot;, &quot;Description&quot;: &quot;&quot;, &quot;Last Day&quot;: &quot;2019-06-26&quot;, &quot;Terminated User Mail&quot;: &quot;Test.User@gmail.com&quot; } Location : UK ,London"
$formatdesc = $description -replace ' ?(&)?quot;','"'
$formatdesc
Request ID = 1234 {"EMO":"123456-U","Terminated Account":"Test User","Description":"","Last Day":"2019-06-26","Terminated User Mail":"Test.User@gmail.com" } Location : UK ,London
With the above, how would I extract Terminated User Mail, Terminated Account and Last Day? The values are not static; they are dynamic. Please help.
First break down, logically, the pattern you are looking for. It looks like you are looking for:
Test.User@gmail.com and Test User: use a simple -match, e.g. $Myvariablename = [Your string] -match 'Test\.User@gmail\.com'. Note that -match returns $true or $false; the matched text is stored in the automatic $Matches variable.
2019-06-26: for a date like this, break it down into its parts: 4 digits, a hyphen, 2 digits, a hyphen and then 2 digits. That (written quickly, so not perfect without testing) comes out to a -match like $Myvariablename = [Your string] -match '\d{4}-\d{2}-\d{2}'
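Since the values are dynamic, another way to look at the problem is that the part between the braces is JSON once the quotes are unescaped, so it can be parsed instead of matched field by field. The sketch below shows that idea in JavaScript, not PowerShell. It assumes the quotes arrive HTML-escaped as &quot;, that the text contains a single brace-delimited object, and that no value contains a brace. extractFields is a hypothetical name.

```javascript
// Sample input from the question, assuming the quotes are escaped as &quot;.
const description =
  'Request ID = 1234 { &quot;EMO&quot;: &quot;123456-U&quot;, ' +
  '&quot;Terminated Account&quot;: &quot;Test User&quot;, ' +
  '&quot;Description&quot;: &quot;&quot;, ' +
  '&quot;Last Day&quot;: &quot;2019-06-26&quot;, ' +
  '&quot;Terminated User Mail&quot;: &quot;Test.User@gmail.com&quot; } Location : UK ,London';

// Hypothetical helper: unescape the quotes, cut out the {...} part, parse it.
function extractFields(text) {
  const unescaped = text.replace(/&quot;/g, '"');
  const match = unescaped.match(/\{[^{}]*\}/); // first brace-delimited span
  if (!match) return null;                     // no object found
  return JSON.parse(match[0]);                 // throws if the span is not valid JSON
}

const fields = extractFields(description);
console.log(fields["Terminated User Mail"], fields["Terminated Account"], fields["Last Day"]);
```

Looking the fields up by key means the code does not depend on what the values look like, only on the key names staying the same.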

Search removing comma using Entity Framework

I want to search the database for a text that contains a comma, but my search reference has no comma.
For example. In database I have the following value:
"Development of computer programs, including electronic games"
So, I try to search the data using the following string as reference:
"development of computer programs including electronic games"
NOTE that the only difference is that the text in the database has a comma, while my search reference does not.
Here is my code:
public async Task<ActionResult>Index(string nomeServico)
{
using (MyDB db = new MyDB())
{
// 1st We receive the following string:"development-of-computer-programs-including-electronic-games"
// but we remove all "-" characters
string serNome = nomeServico.RemoveCaractere("-", " ");
// we search the service that contains (in the SerName field) the value equal to the parameter of the Action.
Servicos servico = db.Servicos.FirstOrDefault(c => c.SerNome.ToLower().Equals(serNome, StringComparison.OrdinalIgnoreCase));
}
}
The problem is that the data in the database contains a comma, while the search value does not.
In your code you replace "-" with " ", and only in the search string. To meet your requirement you also need to replace "," with "" in the database value.
Try doing something like this:
// Turn the hyphens into spaces and lower-case the search string
string serNome = nomeServico.Replace("-", " ").ToLower();
// Strip the commas from the stored value before comparing
Servicos servico = db.Servicos.FirstOrDefault(c => c.SerNome.Replace(",","").ToLower() == serNome);
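The comparison works once both sides are brought to the same normal form. The sketch below shows that normalisation on plain in-memory strings in JavaScript. It says nothing about how Entity Framework translates the query to SQL, and normalizeServiceName is a hypothetical name.

```javascript
// Hypothetical normaliser: lower-case, hyphens to spaces, commas removed,
// runs of whitespace collapsed, edges trimmed.
function normalizeServiceName(text) {
  return text
    .toLowerCase()
    .replace(/-/g, " ")
    .replace(/,/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

const stored = "Development of computer programs, including electronic games";
const slug = "development-of-computer-programs-including-electronic-games";

console.log(normalizeServiceName(stored) === normalizeServiceName(slug)); // prints true
```

Applying the same function to both the stored value and the incoming parameter keeps the two sides from drifting apart when another character (a period, say) needs the same treatment later.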

Getting a string which ends with a string "lngt" in Lex

I am writing a lex script to tokenize C ASTs. I want to write a regex in lex to get a string that ends with a specific string "lngt" but does not include "lngt" in the final string returned by lex. So basically the string form would be (.*lngt), but I haven't been able to figure out how to do this in lex. Any advice/direction would be really helpful
Example: I have this line in my file:
@65 string_cst type: @71 strg: Reverse order of the given number is : %d lngt: 42
I want to retrieve the string after strg: and before lngt:, i.e. "Reverse order of the given number is : %d" (NOTE: this string could be composed of any characters).
Thanks.
This question needs an answer similar to the one I wrote here. It can be done by writing your own state machine in lex. It could also be done by writing some C code, as shown in the cited answer or in the other texts cited below.
If we assume that the string you want is always between "strg" and "lngt" then this is the same as any other non-symmetric string delimiters.
%x STRG LETTERL LN LNG LNGT
ws [ \t\r\n]+
%%
<INITIAL>"strg: " {
BEGIN(STRG);
}
<STRG>[^l]*l {
yymore();
BEGIN(LETTERL);
}
<LETTERL>n {
yymore();
BEGIN(LN);
}
<LN>g {
yymore();
BEGIN(LNG);
}
<LNG>t {
yymore();
BEGIN(LNGT);
}
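<LNGT>":" {
/* yymore() has accumulated the closing "lngt:" too; cut those 5 characters off */
yytext[yyleng - 5] = '\0';
printf("String is '%s'\n", yytext);
BEGIN(INITIAL);
}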
<LETTERL>[^n] {
BEGIN(STRG);
yymore();
}
<LN>[^g] {
BEGIN(STRG);
yymore();
}
<LNG>[^t] {
BEGIN(STRG);
yymore();
}
<LNGT>[^:] {
BEGIN(STRG);
yymore();
}
<INITIAL>{ws} /* skip */ ;
<INITIAL>. /* skip anything not in the string */
%%
To quote my other answer:
Several university compiler courses suggest solutions. The one that explains it well is here (at Manchester); it cites a couple of good books which also cover the problem:
J.Levine, T.Mason & D.Brown: Lex and Yacc (2nd ed.)
M.E.Lesk & E.Schmidt: Lex - A Lexical Analyzer Generator
The two techniques described are to use Start Conditions to explicitly specify the state machine, or manual input to read characters directly.
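Outside lex, the same "text between two different delimiters" logic can be checked with a plain string scan. The JavaScript sketch below is not a lex rule and is only meant to make the intended result concrete. extractBetween is a hypothetical name, and like the scanner it stops at the first closing delimiter after the opening one.

```javascript
// Hypothetical helper: returns the text between `open` and the first `close`
// that follows it, or null when either delimiter is missing.
function extractBetween(line, open, close) {
  const start = line.indexOf(open);
  if (start === -1) return null;
  const from = start + open.length;
  const end = line.indexOf(close, from);
  if (end === -1) return null;
  return line.slice(from, end).trimEnd(); // drop the space left before the closing delimiter
}

// Tail of the example line from the question.
const line = "strg: Reverse order of the given number is : %d lngt: 42";
console.log(extractBetween(line, "strg: ", "lngt:"));
// prints "Reverse order of the given number is : %d"
```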

What tag set is used in OpenNLP's german maxent model?

Currently I am using the OpenNLP tools to PoS-tag German sentences, with the maxent model listed on their download site:
de POS Tagger Maxent model trained on tiger corpus. de-pos-maxent.bin
This works very well and I got results as:
Diese, Community, bietet, Teilnehmern, der, Veranstaltungen, die, Möglichkeit ...
PDAT, FM, VVFIN, NN, ART, NN, ART, NN ...
With the tagged sentences I want to do some further processing, for which I have to know the meaning of the individual tags. Unfortunately, searching the OpenNLP wiki for the tag sets isn't very helpful, as it says:
TODO: Add more tag sets, also for non-english languages
Does anyone know where I can find the tag set used in the German maxent model?
I created an enum containing the German tags (reverse lookup is possible):
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
public enum POSGermanTag {
ADJA("Attributives Adjektiv"),
ADJD("Adverbiales oder prädikatives Adjektiv"),
ADV("Adverb"),
APPR("Präposition; Zirkumposition links"),
APPRART("Präposition mit Artikel"),
APPO("Postposition"),
APZR("Zirkumposition rechts"),
ART("Bestimmter oder unbestimmter Artikel"),
CARD("Kardinalzahl"),
FM("Fremdsprachliches Material"),
ITJ("Interjektion"),
KOUI("unterordnende Konjunktion mit zu und Infinitiv"),
KOUS("unterordnende Konjunktion mit Satz"),
KON("nebenordnende Konjunktion"),
KOKOM("Vergleichskonjunktion"),
NN("normales Nomen"),
NE("Eigennamen"),
PDS("substituierendes Demonstrativpronomen"),
PDAT("attribuierendes Demonstrativpronomen"),
PIS("substituierendes Indefinitpronomen"),
PIAT("attribuierendes Indefinitpronomen ohne Determiner"),
PIDAT("attribuierendes Indefinitpronomen mit Determiner"),
PPER("irreflexives Personalpronomen"),
PPOSS("substituierendes Possessivpronomen"),
PPOSAT("attribuierendes Possessivpronomen"),
PRELS("substituierendes Relativpronomen"),
PRELAT("attribuierendes Relativpronomen"),
PRF("reflexives Personalpronomen"),
PWS("substituierendes Interrogativpronomen"),
PWAT("attribuierendes Interrogativpronomen"),
PWAV("adverbiales Interrogativ- oder Relativpronomen"),
PAV("Pronominaladverb"),
PTKZU("zu vor Infinitiv"),
PTKNEG("Negationspartikel"),
PTKVZ("abgetrennter Verbzusatz"),
PTKANT("Antwortpartikel"),
PTKA("Partikel bei Adjektiv oder Adverb"),
TRUNC("Kompositions-Erstglied"),
VVFIN("finites Verb, voll"),
VVIMP("Imperativ, voll"),
VVINF("Infinitiv"),
VVIZU("Infinitiv mit zu"),
VVPP("Partizip Perfekt"),
VAFIN("finites Verb, aux"),
VAIMP("Imperativ, aux"),
VAINF("Infinitiv, aux"),
VAPP("Partizip Perfekt, aux"),
VMFIN("finites Verb, modal"),
VMINF("Infinitiv, modal"),
VMPP("Partizip Perfekt, modal"),
XY("Nichtwort, Sonderzeichen"),
UNDEFINED("Nicht definiert, zb. Satzzeichen");
private final String desc;
private static final Map<String, POSGermanTag> nameToValueMap = new HashMap<String, POSGermanTag>();
static {
for (POSGermanTag value : EnumSet.allOf(POSGermanTag.class)) {
nameToValueMap.put(value.name(), value);
}
}
public static POSGermanTag forName(String name) {
return nameToValueMap.get(name);
}
private POSGermanTag(String desc) {
this.desc = desc;
}
public String getDesc() {
return this.desc;
}
}
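To show how such a lookup is used on tagger output, here is a minimal JavaScript sketch of the same idea with a fallback for tags that are not in the table. It covers only a handful of tags, with descriptions copied from the enum above, and describeTag is a hypothetical name.

```javascript
// A few STTS tags only; the enum above has the fuller list.
const STTS_DESCRIPTIONS = new Map([
  ["PDAT", "attribuierendes Demonstrativpronomen"],
  ["VVFIN", "finites Verb, voll"],
  ["NN", "normales Nomen"],
  ["ADV", "Adverb"],
  ["KON", "nebenordnende Konjunktion"],
]);

// Hypothetical helper: description for a tag, or a fallback for unknown tags
// (for example the punctuation tag "$.", which cannot be an enum constant name).
function describeTag(tag) {
  return STTS_DESCRIPTIONS.get(tag) ?? "Nicht definiert";
}

const tokens = ["Diese", "bietet", "Teilnehmern", "."];
const tags = ["PDAT", "VVFIN", "NN", "$."];
tokens.forEach((token, i) => {
  console.log(`${token}\t${tags[i]}\t${describeTag(tags[i])}`);
});
```

The fallback matters because the enum's forName returns null for any tag it does not list, so callers need to handle that case one way or another.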
It seems very likely that the STTS tag set is used. This tag set is said to be the most common tag set for the German language, e.g. in this question or in this Wikipedia entry.
It is my understanding that the OpenNLP POS tagger for German was trained on the Tiger corpus. This corpus does indeed use the STTS tag set, with minor modifications. I found the following helpful: A Brief Introduction to the Tiger Sample Corpus