Extract text from email then send text

I have alerts set up with my bank for whenever a transaction occurs. I have been trying to extract only the date and the amount and forward that as a text message to myself.
Here is what the alert email looks like:
FIRSTNAME LAST NAME
A transaction has been posted to your BANKNAME ACCOUNTNAME, and is within the parameters you set for triggering this alert.
The transaction was on 06/20/2014 in the amount of ($40.00). For recent account history, including transaction descriptions and running balances, sign on to BANKNAME Account Manager (online banking) and click on the account name.
BANKNAME Disclaimer: This transmittal is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If the reader of this transmittal is not the intended recipient, or the employee or agent responsible for delivering the transmittal to the intended recipient, you are notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender by e-mail and delete this message from your computer.
I have tried grep, awk, and sed, but I can only get the entire line to display.
:~# nawk '/The transaction was on/,/For recent account history/' alert.txt
The transaction was on 06/20/2014 in the amount of ($40.00). For recent account history, including transaction descriptions and running balances, sign on to BANKNAME Account Manager (online banking) and click on the account name.
What can I do to change the command to extract only the date and the amount so that the result would look something like this:
06/20/2014 $40.00
The plan is to take that output and send it to myself as a text message.

You could try the grep command below to get the date and the amount:
$ grep -oP '\d{2}\/\d{2}\/\d{4}|\$[^\)]*' file | paste -d' ' - -
06/20/2014 $40.00
You could also do it in GNU sed:
$ sed -nr 's~^.*([0-9]{2}\/[0-9]{2}\/[0-9]{4}).*\((\$[^)]*)\).*$~\1 \2~p' file
06/20/2014 $40.00
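Where GNU extensions (grep -P, sed -r) are not available, the same capture-group idea can be sketched with POSIX basic regular expressions. The function name extract_txn is made up for illustration, and the pattern assumes the amount always appears as ($...) after the date on the same line.

```shell
#!/bin/sh
# extract_txn is a hypothetical helper name; it reads the alert text from
# the files given as arguments, or from standard input when none are given.
extract_txn() {
    # \1 captures the MM/DD/YYYY date, \2 the dollar amount found inside
    # the parentheses (the parentheses themselves are left out).
    sed -n 's|.*\([0-9][0-9]/[0-9][0-9]/[0-9]\{4\}\).*(\(\$[0-9.,]*\)).*|\1 \2|p' "$@"
}

printf '%s\n' 'The transaction was on 06/20/2014 in the amount of ($40.00). For recent account history, sign on.' | extract_txn
```

For the sample line this prints 06/20/2014 $40.00. Lines without both a date and a parenthesized amount produce no output because of -n and the p flag.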

Try
awk -v RS=' ' '/[0-9]+\/[0-9]+\/[0-9]+/ {d=$0} /\$[0-9]+\.[0-9]+/ {print d, substr($0, 2, length - 3); exit}' alert.txt
Explanation:
/[0-9]+\/[0-9]+\/[0-9]+/
Matches 1 or more digits, a slash, 1 or more digits, a slash, and 1 or more digits.
[0-9] matches a single digit character in 0, 1, 2, ..., 9
+ causes the previous entity to be matched 1 or more times
\/ is a literal slash (the backslash "escapes" it so it doesn't terminate the pattern)
/\$[0-9]+\.[0-9]+/
Matches a dollar sign, 1 or more digits, a period, and 1 or more digits.
\$ matches a literal dollar sign (a dollar sign is otherwise an anchor matching the end of the string)
\. matches a literal period (a period otherwise matches any character)
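To see what the two patterns select, here is a small sketch that splits a line into space-separated records (RS set to a single space) and labels the records each pattern matches. The helper name show_matches and the labels are invented for this illustration.

```shell
#!/bin/sh
# show_matches is a hypothetical helper: it labels each space-separated
# record that matches the date pattern or the amount pattern.
show_matches() {
    awk -v RS=' ' '
        /[0-9]+\/[0-9]+\/[0-9]+/ { print "date:   " $0 }
        /\$[0-9]+\.[0-9]+/       { print "amount: " $0 }
    '
}

printf '%s\n' 'The transaction was on 06/20/2014 in the amount of ($40.00). For recent account history' | show_matches
```

The amount record still carries the surrounding "(" and ")." characters, which is why the one-liner trims it with substr before printing.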

Related

Extracting substring from inside bracketed string, where the substring may have spaces

I've got an application that has no useful api implemented, and the only way to get certain information is to parse string output. This is proving to be very painful...
I'm trying to achieve this in bash on SLES12.
Given I have the following strings:
QMNAME(QMTKGW01) STATUS(Running)
QMNAME(QMTKGW01) STATUS(Ended normally)
I want to extract the STATUS value, ie "Ended normally" or "Running".
Note that the line structure can move around, so I can't count on the "STATUS" being the second field.
The closest I have managed to get so far is to extract a single word from inside STATUS like so
echo "QMNAME(QMTKGW01) STATUS(Running)" | sed "s/^.*STATUS(\(\S*\)).*/\1/"
This works for "Running" but not for "Ended normally"
I've tried switching the \S* for [\S\s]* in both "grep -o" and "sed" but it seems to corrupt the entire regex.

This is purely a regex issue. \S matches only non-whitespace characters inside (..), but the failing case contains a space, so the pattern does not match. Keep it simple by explicitly listing the characters to match inside (..) as [a-zA-Z ]*, i.e. zero or more upper- and lower-case letters and spaces.
sed 's/^.*STATUS(\([a-zA-Z ]*\)).*/\1/'
Or use character classes [:alnum:] if you want numbers too
sed 's/^.*STATUS(\([[:alnum:] ]*\)).*/\1/'

sed 's/.*STATUS(\([^)]*\)).*/\1/' file
Output:
Running
Ended normally

Extracting a substring matching a given pattern is a job for grep, not sed. We should use sed when we must edit the input string. (A lot of people use sed and even awk just to extract substrings, but that's wasteful in my opinion.)
So, here is a grep solution. We need to make some assumptions (in any solution) about your input - some are easy to relax, others are not. In your example the word STATUS is always capitalized, and it is immediately followed by the opening parenthesis (no space, no colon etc.). These assumptions can be relaxed easily. More importantly, and not easy to work around: there are no nested parentheses. You will want the longest substring of non-closing-parenthesis characters following the opening parenthesis, no matter what they are.
With these assumptions:
$ grep -oP '\bSTATUS\(\K[^)]*(?=\))' << EOF
> QMNAME(QMTKGW01) STATUS(Running)
> QMNAME(QMTKGW01) STATUS(Ended normally)
> EOF
Running
Ended normally
Explanation:
Command options: o to return only the matched substring; P to use Perl extensions (the \K marker and the lookahead). The regexp: we look for a word boundary (\b) - so the word STATUS is a complete word, not part of a longer word like SUBSTATUS; then the word STATUS and opening parenthesis. This is required for a match, but \K instructs that this part of the matched string will not be returned in the output. Then we seek zero or more non-closing-parenthesis characters ([^)]*) and we require that this be followed by a closing parenthesis - but the closing parenthesis is also not included in the returned string. That's a "lookahead" (the (?= ... ) construct).
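Since the question targets bash, the same extraction can also be sketched with shell parameter expansion alone, without starting grep or sed. This is a different technique from the answers above; status_of is a hypothetical name, and unlike the \b version it does not check for a word boundary, so SUBSTATUS(...) would also match.

```shell
#!/bin/sh
# status_of is a hypothetical helper name. It prints the text between
# "STATUS(" and the next ")" using parameter expansion only.
status_of() {
    case $1 in
        *"STATUS("*) ;;       # continue only if the marker is present
        *) return 1 ;;
    esac
    rest=${1#*"STATUS("}      # drop everything up to and including "STATUS("
    rest=${rest%%")"*}        # drop the first ")" and everything after it
    printf '%s\n' "$rest"
}

status_of 'QMNAME(QMTKGW01) STATUS(Running)'
status_of 'STATUS(Ended normally) QMNAME(QMTKGW01)'
```

The two calls print Running and Ended normally. The position of STATUS on the line does not matter.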

Joining specific lines in file

I have a text file (snippet below) containing some public-domain corporate earnings report data, formatted as follows:
Current assets:
Cash and cash equivalents
$ 21,514 $ 21,120
Short-term marketable securities
33,769 20,481
Accounts receivable
12,229 16,849
Inventories
2,281 2,349
and what I'm trying to do (with sed) is the following: if the current line starts with a capital letter, and the next line starts with whitespace, copy the last N characters from the next line into the last N columns of the current line, then delete the next line. I'm doing it this way, because there are other lines in the files that begin with whitespace that I want to ignore. The results should look like the following:
Current assets:
Cash and cash equivalents $ 21,514 $ 21,120
Short-term marketable securities 33,769 20,481
Accounts receivable 12,229 16,849
Inventories 2,281 2,349
The closest I've come to getting what I want is:
sed -i -r ':a;N;$!ba;s/[^A-Z]*\n([[:space:]])/\1/g' file.txt
and I believe I've got the pattern matching ok, but the subsequent substitution really messes up the alignment of the columns of numbers. When I first started this, this seemed like a simple operation, but hours of searching and experimenting haven't helped. I'm open to any solutions that use something else other than sed, but would prefer to keep it strictly bash. Thank you much!

This might work for you (GNU sed):
sed -r '/^[[:upper:]]/{N;/\n\s/{h;x;s/\n.*//;s/./ /g;x;G;s/(\n *)(.*)\1$/\2/};P;D}' file
This solution only processes two consecutive lines that start with an upper-case letter and a white space respectively. All other lines are printed as is.
Having gathered the above two lines into the pattern space (PS), a copy is made and stored in the hold space (HS). Processing now swaps to the HS. The second line is removed and the contents of the first turned into spaces. Processing now swaps back to the PS. The HS is appended to the PS and using matching and back references the length of the first line in spaces is subtracted from the combined lines.
The line(s) are printed and then deleted. If the second line did not begin with a space, by use of the P and D commands, it is not deleted but re-appraised by virtue of the regexp at the start of the sed script.
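For readers who prefer awk, the overlay idea can be sketched as follows: hold a line that starts with a capital letter, and if the next line starts with whitespace, print the held label followed by the part of the numbers line that lies to its right. The name join_overlay is hypothetical, and the sketch assumes the numbers line is padded with at least as many leading spaces as the label is long.

```shell
#!/bin/sh
# join_overlay is a hypothetical helper name.
join_overlay() {
    awk '
        # Print a held label line that was not followed by a numbers line.
        function emit_held() { if (held != "") { print held; held = "" } }

        held != "" && /^[ \t]/ {
            # Keep the label, then append the part of the numbers line that
            # lies to the right of it, so the columns stay where they were.
            print held substr($0, length(held) + 1)
            held = ""
            next
        }
        { emit_held() }
        /^[A-Z]/ { held = $0; next }   # possible label: wait for the next line
        { print }
        END { emit_held() }
    ' "$@"
}

# The numbers line is right-aligned in a 20-column field for the demo.
printf 'Current assets:\nInventories\n%20s\n' '2,281' | join_overlay
```

Here "Current assets:" is printed unchanged because the line after it does not start with whitespace, while "Inventories" is merged with its numbers line and 2,281 stays in columns 16 to 20.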

conditional find and replace with sed

I have a text file with thousands of bank transactions in it, and I need to search and replace text based on text found on another line of the transaction. Each transaction is listed as such...
2016/01/08 * POS DEBIT LOWES #02793* SPOKANE VALLE WA #7522
Expenses:Unknown $289.78
Assets:INB Checking
I need to be able to search the top line for 'LOWES' and, if it matches, change the expenses line to Expenses:Building Materials.
So the whole transaction would look like this...
2016/01/08 * POS DEBIT LOWES #02793* SPOKANE VALLE WA #7522
Expenses:Building Materials $289.78
Assets:INB Checking
I know that I can use sed to do find and replace, but how can I do so based on a pattern match on the top line?

You can use a starting address to select the LOWES line, append the next line with the N command and, if that line does not start with a digit, replace only the category after Expenses: so that the amount is kept:
sed '/LOWES/{N;/\n[^0-9]/{s/\(Expenses:\)[^ ]*/\1Building Materials/;P;D;}}' file
Try the -i flag to edit the file in place:
sed -i '/LOWES/{N;/\n[^0-9]/{s/\(Expenses:\)[^ ]*/\1Building Materials/;P;D;}}' file

If you can assume that every LOWES expense starts with a line matching LOWES and ends with a line matching Expenses, you can just do this:
sed -e '/LOWES/,/Expenses/s/Unknown/Building Materials/'
This says that in every range of lines from /LOWES/ to /Expenses/ replace 'Unknown' with 'Building Materials'.
If you use a simpler pattern like this instead of the more robust solution suggested by SLePort, it's essential that you diff the results and ensure you didn't mangle something else.
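A minimal sketch of that review step, assuming the transaction layout from the question: write the result to standard output instead of editing in place, then diff it against the original. The helper name recategorize and the temporary sample file are made up for the example.

```shell
#!/bin/sh
# recategorize is a hypothetical helper; it prints the edited ledger to
# standard output and leaves the input file untouched.
recategorize() {
    sed -e '/LOWES/,/Expenses/s/Unknown/Building Materials/' "$1"
}

# Build a throwaway sample ledger and review the change with diff.
tmp=$(mktemp)
printf '%s\n' \
    '2016/01/08 * POS DEBIT LOWES #02793* SPOKANE VALLE WA #7522' \
    'Expenses:Unknown $289.78' \
    'Assets:INB Checking' > "$tmp"
recategorize "$tmp" | diff "$tmp" - || true   # diff exits 1 when the files differ
rm -f "$tmp"
```

Only the Expenses line should show up in the diff. Any other changed line means the range matched more than intended.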

SED search and replace substring in a database file

To all,
I have spent a lot of time searching for a solution to this but cannot find it.
Just for a background, I have a text database with thousands of records. Each record is delineated by :
"0 #nnnnnn# Xnnn" // no quotes
Each record has many fields, each on a line of its own, but the field in which I want to search and replace a substring looks like this (notice the spaces):
" 1 X94 User1.faculty.ventura.ca" // no quotes
I want to use sed to change the substring ".faculty.ventura.ca" to ".students.moorpark.ut", changing nothing else on the line, globally for ALL records.
I have tested many things with negative results.
How can this be done ?
Thank You for the assistance.
Bob Perez (robertperez1957#gmail.com)

If I understand you correctly, you want this:
sed 's/1 X94 \(.*\).faculty.ventura.ca/1 X94 \1.students.moorpark.ut/' mydatabase.file
This will replace all records of the form 1 X94 XXXXX.faculty.ventura.ca with 1 X94 XXXXX.students.moorpark.ut.
Here are the details of what it all does:
The '' let you have spaces and other messes in your script.
s/ means substitute
1 X94 \(.*\).faculty.ventura.ca is what you'll be substituting. The \(.*\) stores anything in that regular expression for use in the replacement
1 X94 \1.students.moorpark.ut is what to replace the thing you found with. \1 is filled in with the first thing that matched \(.*\). (You can have multiple of those in one line, and the next one would then be \2.)
The final / just tells sed that you're done. If your database doesn't have linefeeds to separate its records, you'll want to end with /g, to make this change multiple times per line.
mydatabase.file should be the filename of your database.
Note that this will output to standard out. You'll probably want to add
> mynewdatabasefile.name
to the end of your line, to save all the output in a file. (It won't do you much good on your terminal.)
Edit, per your comments
If you want to replace 1 X94 bperez.students.Napvil.NCC with 1 X94 bperez.JohnSmith.customer, you can use another set of \(.*\), as:
sed 's/1 X94 \(.*\).\(.*\).Napvil.NCC/1 X94 \1.JohnSmith.customer/' 251-2.txt
This is similar to the above, except that it matches two stored parameters. In this example, \1 evaluates to bperez and \2 evaluates to students. We match \2, but don't use it in the replace part of the expression.
You can do this with any number of stored parameters. (Sed probably has some limit, but I've never hit a sufficiently complicated string to hit it.) For example, we could make the sed script be 's/\(.\) \(...\) \(.*\).\(.*\).\(.*\).\(.*\)/\1 \2 \3.JohnSmith.customer/', which is meant to make \1 = 1, \2 = X94, \3 = bperez, \4 = students, \5 = Napvil and \6 = NCC, and we'd ignore \4 through \6. This is actually not the best answer though - just showing it can be done. It's not the best because it's uglier, and also because it's more accepting. It would then do a find and replace on a line like 2 Z12 bperez.a.b.c, which is presumably not what you want. The find query I put in the edit is as specific as possible while still being general enough to suit your tasks.
Another edit!
You know how I said "be as specific as possible"? Due to the . character being special, I wasn't. In fact, I was very generic. The . means "match any character at all," instead of "match a period". Regular expressions are "greedy", matching the most they could, so \(.*\).\(.*\) will always fill the first \(.*\) (which says, "take 0 to many of any character and save it as a match for later") as far as it can.
Try using:
sed 's/1 X94 \(.*\)\.\(.*\).Napvil.NCC/1 X94 \1.JohnSmith.customer/' 251-2.txt
That extra \ acts as an escape sequence, and changes the . from "any character" to "just the period". FYI, since I don't (but should) escape the other periods, technically sed would consider 1 X94 XXXX.StdntZNapvilQNCC as a valid match. Since . means any character, a Z or a Q there would be considered a fit.
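Applied to the original question, a fully escaped version might look like the sketch below. The function name move_domain is hypothetical, and [^.]* is used in place of .* so the user name cannot run past its first period.

```shell
#!/bin/sh
# move_domain is a hypothetical helper name; it reads the files given as
# arguments, or standard input when none are given.
move_domain() {
    # [^.]* stops the captured user name at the first period, and every
    # literal period in the pattern is escaped as \.
    sed 's/\(1 X94 [^.]*\)\.faculty\.ventura\.ca/\1.students.moorpark.ut/' "$@"
}

printf '%s\n' ' 1 X94 User1.faculty.ventura.ca' ' 1 X94 User1ZfacultyQventuraQca' | move_domain
```

The first line becomes " 1 X94 User1.students.moorpark.ut". The second is left alone, whereas a pattern with unescaped periods would have rewritten it too.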

The following tutorial helped me
sed - replace substring in file
Try the same with the -i option to replace in the file directly:
sed -i 's/unix/linux/' file.txt

Are single quotes legal in the name part of an email address?

For example:
jon.o'conner@example.com ?

Yes, jon.o'conner@example.com is a valid email address according to RFC 5322.
From the Email address article at wikipedia (Syntax section):
The local-part of the email address may use any of these ASCII characters:
Uppercase and lowercase English letters (a–z, A–Z)
Digits 0 to 9
Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively (e.g. John..Doe@example.com is not allowed).
(The syntax is formally defined in RFC 5322 section 3.4.1 and RFC 5321.)

Although the answer is correct according to RFC 5322, using the single quote (') in practice has pitfalls.
Since it is a string delimiter, many automation and integration services fail when this character is used.
You will note that professional mail services like Gmail do not allow it.
I strongly suggest that you use the alternate quote (`) if you need it, but in practice it should be avoided.

The format for email addresses is defined in RFC 5322. The local part (i.e. recipient) may use any of these ASCII characters:
Uppercase and lowercase English letters (a–z, A–Z)
Digits 0 to 9
Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively (e.g. John..Doe@example.com is not allowed).
From this, you can see that single quotes are valid for the recipient address.
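The rules in that list can be turned into a rough shell check. This is only a sketch: the name is_valid_local_part is invented, it covers just the unquoted characters listed above, and it ignores quoted local parts, length limits, and anything else the RFCs define.

```shell
#!/bin/sh
# is_valid_local_part is a hypothetical helper name. It assumes a single-line,
# unquoted local part and checks only the rules listed above.
is_valid_local_part() {
    # The dot may not be first, last, or doubled; empty input is rejected too.
    case $1 in
        '' | .* | *. | *..*) return 1 ;;
    esac
    # Letters, digits, the dot, and the listed printable characters.
    allowed="A-Za-z0-9.!#\$%&'*+/=?^_\`{|}~-"
    printf '%s\n' "$1" | LC_ALL=C grep -Eq "^[$allowed]+\$"
}

for candidate in "jon.o'conner" "John..Doe" ".jon" "jon smith"; do
    if is_valid_local_part "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

Of the four candidates, only jon.o'conner is reported as valid. The others fail on the doubled dot, the leading dot, and the space.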
