conditional find and replace with sed - sed

I have a text file with thousands of bank transaction in it and I need to search and replace text based on text found on another line of the transaction. Each transaction is listed as such...
2016/01/08 * POS DEBIT LOWES #02793* SPOKANE VALLE WA #7522
Expenses:Unknown $289.78
Assets:INB Checking
I need to be able to search the top line for 'LOWES' and if text matches it will change the expenses column to Expenses:Building Materials
So the whole transaction would like like this...
2016/01/08 * POS DEBIT LOWES #02793* SPOKANE VALLE WA #7522
Expenses:Building Materials $289.78
Assets:INB Checking
I know that I can use sed to do find and replace but how can I do so based of a pattern match on the top line?

You can use a starting address and loop with the t command to find and replace your pattern up to next line not starting with numbers:
sed '/LOWES/{:a;N;/\n[^0-9]/{s/\(Expenses:\).*/\1Building Materials/;P;D;}}' file
Try the -i flag to edit the file in place:
sed -i '/LOWES/{:a;N;/\n[^0-9]/{s/\(Expenses:\).*/\1Building Materials/;P;D;}}' file

If you can assume that every LOWES expense starts with a line matching LOWES and ends with a line matching Expenses, you can just do this:
sed -e '/LOWES/,/Expenses/s/Unknown/Building Materials/'
This says that in every range of lines from /LOWES/ to /Expenses/ replace 'Unknown' with 'Building Materials'.
If you use a simpler pattern like this instead of the more robust solution suggested by SLePort, it's essential that you diff the results and ensure you didn't mangle something else.

Related

Joining specific lines in file

I have a text file (snippet below) containing some public-domain corporate earnings report data, formatted as follows:
Current assets:
Cash and cash equivalents
$ 21,514 $ 21,120
Short-term marketable securities
33,769 20,481
Accounts receivable
12,229 16,849
Inventories
2,281 2,349
and what I'm trying to do (with sed) is the following: if the current line starts with a capital letter, and the next line starts with whitespace, copy the last N characters from the next line into the last N columns of the current line, then delete the next line. I'm doing it this way, because there are other lines in the files that begin with whitespace that I want to ignore. The results should look like the following:
Current assets:
Cash and cash equivalents $ 21,514 $ 21,120
Short-term marketable securities 33,769 20,481
Accounts receivable 12,229 16,849
Inventories 2,281 2,349
The closest I've come to getting what I want is:
sed -i -r ':a;N;$!ba;s/[^A-Z]*\n([[:space:]])/\1/g' file.txt
and I believe I've got the pattern matching ok, but the subsequent substitution really messes up the alignment of the columns of numbers. When I first started this, this seemed like a simple operation, but hours of searching and experimenting haven't helped. I'm open to any solutions that use something else other than sed, but would prefer to keep it strictly bash. Thank you much!
This might work for you (GNU sed):
sed -r '/^[[:upper:]]/{N;/\n\s/{h;x;s/\n.*//;s/./ /g;x;G;s/(\n *)(.*)\1$/\2/};P;D}' file
This solution only processes two consecutive lines that start with an upper-case letter and a white space respectively. All other lines are printed as is.
Having gathered the above two lines into the pattern space (PS), a copy is made and stored in the hold space (HS). Processing now swaps to the HS. The second line is removed and the contents of the first turned into spaces. Processing now swaps back to the PS. The HS is appended to the PS and using matching and back references the length of the first line in spaces is subtracted from the combined lines.
The line(s) are printed and then deleted. If the second line did not begin with a space, by use of the P and D commands, it is not deleted but re-appraised by virtue of the regexp at the start of the sed script.

How to use sed to isolate only the first part of a file

I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the part that's added, and I'd like to use Gnu sed to isolate only the part of the file that's before the unique string - i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like is to see as output:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel
This should work for you:
sed -n '/unique string/q;p' file
It quits processing at unique string. Other lines get printed.
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that sed includes the range border. We need to exclude unique string from printing.
Furthermore I'm using the -n option which makes sed suppress the output of input lines by default.
One thing, if unique string can contain characters which are also syntax characters in the regex like ...
test*
... sed might not be the right tool for the job any more since it can only match regular expressions but not fixed strings.
In that case awk might be the tool of choice:
awk 'index("*unique string*"){exit}1' file
index("string") returns a non zero value (the position) if the string has been found. We cancel further processing of input lines in that case and don't print that line as well.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.

Sed to replace last character on condition

I have a file which has following lines
172XI207 X123955 1
412XE401 XE05689 1
412XI402 XI9515 1
412XI403 XI06702 1
412XE404 XE75348 1
I want to replace last column to 2 if the first two characters in the second column matches to XE.
The result should be like below
172XI207 X123955 1
412XE401 XE05689 2
412XI402 XI9515 1
412XI403 XI06702 1
412XE404 XE75348 2
I wanted to use sed (not awk). Can someone please let me know how this can be acheived using sed?
many sed commands take an address or address range (see the man page for the gory details). Probably the most common command is s of course, but it is among those that take an address range, meaning it doesn't need to apply to every line. An address range xan be a regular expression. The s command is:
{address}s/pattern/replacement/
For you the address - matching RE - is / XE/ (assuming your columns are space separarated; change that to a tab if necessary), the pattern is 1$ and the replacement 2. Therefore:
/ XE/s/1$/2/
or as a command line
sed -e '/ XE/s/1$/2/' < oldfile > newfile
EDIT: oops, second column, not start of line.
This command should do the trick (providing you are looking at myfile.txt)
sed -e '/ XE/ s/1$/2//' myfile.txt
You can make sure your replacement is acted by adding the -i option which will modify the file in-place, make sure it's exactly what you are expecting before though.
Edit: based on question in comments, here is a command that matches on 3rd column and replaces on fifth.
sed -e 's/^\(\(\w\+\W\+\)\{2\}XE\(\w\+\W\+\)\{2\}\)1/\12/'
Or, as an alternative, you can first select the line and then substitute:
sed -e '/^\(\w\+\W\+\)\{2\}XE/ s/^\(\(\w\+\W\+\)\{4\}\)1/\12/'

Alternatives to grep/sed that treat new lines as just another character

Both grep and sed handle input line-by-line and, as far as I know, getting either of them to handle multiple lines isn't very straightforward. What I'm looking for is an alternative or alternatives to these two programs that treat newlines as just another character. Is there any tool that fits such a criteria
The tool you want is awk. It is record-oriented, not line-oriented, and you can specify your record-separator by setting the builtin variable RS. In particular, GNU awk lets you set RS to any regular expression, not just a single character.
Here is an example where awk uses one blank line to separate every record. If you show us what data you have, we can help you with it.
cat file
first line
second line
third line
fourth line
fifth line
sixth line
seventh line
eight line
more data
Running awk on this and reconstruct data using blank line as new record.
awk -v RS= '{$1=$1}1' file
first line second line third line
fourth line fifth line sixth line
seventh line eight line
more data
PS RS is not equal to file, is set to RS= blank, equal to RS=""
1) Sed can handle a block lines together, not always line by line.
In sed, normally I use :loop; $!{N; b loop}; to get all the lines available in pattern space delimited by newline.
Sample:
Productivity
Google Search\
Tips
"Web Based Time Tracking,
Web Based Todo list and
Reduce Key Stores etc"
result (remove the content between ")
sed -e ':loop; $!{N; b loop}; s/\"[^\"]*\"//g' thegeekstuff.txt
Productivity
Google Search\
Tips
You should read this URL (Unix Sed Tutorial: 6 Examples for Sed Branching Operation), it will give you detail how it works.
http://www.thegeekstuff.com/2009/12/unix-sed-tutorial-6-examples-for-sed-branching-operation/
2) For grep, check if your grep support -z option, which needn't handle input line by line.
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.

SED search and replace substring in a database file

To all,
I have spent alot of time searching for a solution to this but cannot find it.
Just for a background, I have a text database with thousands of records. Each record is delineated by :
"0 #nnnnnn# Xnnn" // no quotes
The records have many fields on a line of their own, but the field I am interested in to search and replace a substring (notice spaces) :
" 1 X94 User1.faculty.ventura.ca" // no quotes
I want to use sed to change the substring ".faculty.ventura.ca" to ".students.moorpark.ut", changing nothing else on the line, globally for ALL records.
I have tested many things with negative results.
How can this be done ?
Thank You for the assistance.
Bob Perez (robertperez1957#gmail.com)
If I understand you correctly, you want this:
sed 's/1 X94 \(.*\).faculty.ventura.ca/1 X94 \1.students.moorpark.ut/' mydatabase.file
This will replace all records of the form 1 X94 XXXXXX.faculty.ventura.ca with 1 X94 XXXXX.students.moorpark.ut.
Here's details on what it all does:
The '' let you have spaces and other messes in your script.
s/ means substitute
1 X94 \(.*\).faculty.ventura.ca is what you'll be substituting. The \(.*\) stores anything in that regular expression for use in the replacement
1 X94 \1.students.moorpark.ut is what to replace the thing you found with. \1 is filled in with the first thing that matched \(.*\). (You can have multiple of those in one line, and the next one would then be \2.)
The final / just tells sed that you're done. If your database doesn't have linefeeds to separate its records, you'll want to end with /g, to make this change multiple times per line.
mydatabase.file should be the filename of your database.
Note that this will output to standard out. You'll probably want to add
> mynewdatabasefile.name
to the end of your line, to save all the output in a file. (It won't do you much good on your terminal.)
Edit, per your comments
If you want to replace 1 F94 bperez.students.Napvil.NCC to 1 F94 bperez.JohnSmith.customer, you can use another set of \(.*\), as:
sed 's/1 X94 \(.*\).\(.*\).Napvil.NCC/1 X94 \1.JohnSmith.customer/' 251-2.txt
This is similar to the above, except that it matches two stored parameters. In this example, \1 evaluates to bperez and \2 evaluates to students. We match \2, but don't use it in the replace part of the expression.
You can do this with any number of stored parameters. (Sed probably has some limit, but I've never hit a sufficiently complicated string to hit it.) For example, we could make the sed script be '\(.\) \(...\) \(.*\).\(.*\).\(.*\).\(.*\)/\1 \2 \3.JohnSmith.customer/', and this would make \1 = 1, \2 = X94, \3 = bperez, \4 = Napvil and \5 = NCC, and we'd ignore \4 and \5. This is actually not the best answer though - just showing it can be done. It's not the best because it's uglier, and also because it's more accepting. It would then do a find and replace on a line like 2 Z12 bperez.a.b.c, which is presumably not what you want. The find query I put in the edit is as specific as possible while still being general enough to suit your tasks.
Another edit!
You know how I said "be as specific as possible"? Due to the . character being special, I wasn't. In fact, I was very generic. The . means "match any character at all," instead of "match a period". Regular expressions are "greedy", matching the most they could, so \(.*\).\(.*\) will always fill the first \(.*\) (which says, "take 0 to many of any character and save it as a match for later") as far as it can.
Try using:
sed 's/1 X94 \(.*\)\.\(.*\).Napvil.NCC/1 X94 \1.JohnSmith.customer/' 251-2.txt
That extra \ acts as an escape sequence, and changes the . from "any character" to "just the period". FYI, since I don't (but should) escape the other periods, technically sed would consider 1 X94 XXXX.StdntZNapvilQNCC as a valid match. Since . means any character, a Z or a Q there would be considered a fit.
The following tutorial helped me
sed - replace substring in file
try the same using a -i prefix to replace in the file directly
sed -i 's/unix/linux/' file.txt