Understanding sed lines in a csh script

I "inherited" an old csh script, which runs fine, but now I have been asked to improve it, so I am trying to understand what was programmed some years ago...
In some places sed is used to extract file names or directory names, and I cannot work out in detail what happens there. Perhaps someone is able and kind enough to explain it to me.
The code lines are:
set File = `echo $Dirnames[$i] |sed 's/.*\///'`".bdf"
set Dir = `echo $Dirnames[$i] | sed 's/\(.*\)\/.*/\1/'`
I understand all the code except the sed parts...

Assuming that Dirnames[$i] holds a file path:
sed 's/.*\///': deletes everything up to and including the last slash. Because .* is greedy, .*\/ matches as far as the final /. What remains is the last path component (the file name), and the script appends the .bdf extension to it.
sed 's/\(.*\)\/.*/\1/': outputs the directory part of the path. Everything before the last / is captured by \(.*\) and written back with the backreference \1; the last slash and whatever follows it are dropped.
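To see both substitutions in isolation, here is a minimal sketch in POSIX sh rather than csh (the sed parts are identical). The path /data/models/wing is a made-up example value, not something from the original script.

```shell
#!/bin/sh
# Hypothetical sample value standing in for $Dirnames[$i]
path='/data/models/wing'

# Strip everything up to and including the last slash, then append .bdf
file=$(printf '%s\n' "$path" | sed 's/.*\///')".bdf"

# Keep only what precedes the last slash
dir=$(printf '%s\n' "$path" | sed 's/\(.*\)\/.*/\1/')

printf 'File: %s\nDir:  %s\n' "$file" "$dir"
```

For simple paths like this one, the two sed calls give the same results as basename and dirname.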

Related

Use sed for Mixed Case Tags

Trying to reformat tags in an XML file with GNU sed v4.7 on Windows 10 (shoot me). sed is in the path and is run from the Command Prompt, so some characters have to be escaped with ^ for the Windows command line.
sourcefile
BEGIN
...
<trn:description>V7906 03/11 ALFREDOCAMEL HATSWOOD 74564500125</trn:description>
...
END
(There are three spaces at the start of the line.)
Expected output:
BEGIN
...
<trn:description>V7906 03/11 Alfredocamel Hatswood 74564500125</trn:description>
...
END
I want Title Case, but this command converts the text to lower case in place:
sed -i 's/^<trn:description^>\(.*\)^<\/trn:description^>$/^<trn:description^>\L\1^<\/trn:description^>/g' sourcefile
This command changes to Title Case:
sed 's/.*/\L^&/; s/\w*/\u^&/g' sourcefile
Can this be brought together as a one-liner to edit the original sourcefile in-place?
I want to use sed because it is available on the system and the code is consistently structured. I'm aware I should use a tool like xmlstarlet as explained:
sed ... code can't distinguish a comment that talks about sessionId tags from a real sessionId tag; can't recognize element encodings; can't deal with unexpected attributes being present on your tag; etc.
Thanks to Whirlpool Forum members for the answer and discussion.
It was too hard to achieve pattern matching "within the tags" in sed and the file was well formed so the required lines were changed:
sed -i.bak '/^<trn:description^>/s/\w\+/\L\u^&/g; s/^&.*;\^|Trn:Description/\L^&/g' filename
Explanation
in-place edit saving original file with .bak extension
select lines containing <trn:description>
for one or more words
replace first character with uppercase and rest with lowercase
select strings starting with & and ending with ; or Trn:Description
restore codes by replacing characters with lowercase
source/target filename
Note: ^ is the escape character of the Windows Command Prompt and is not required in other shells
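For comparison, this is roughly how the same substitution looks in a POSIX shell, where the ^ escapes are dropped. It is only a sketch: it assumes GNU sed (for \w, \+, \L and \u) and pipes the sample line through sed instead of editing a file in place.

```shell
#!/bin/sh
# Sample line from the question (three leading spaces)
line='   <trn:description>V7906 03/11 ALFREDOCAMEL HATSWOOD 74564500125</trn:description>'

# 1st command: title-case every word on lines containing the tag
# 2nd command: lower-case entities (&...;) and the tag name again
result=$(printf '%s\n' "$line" |
    sed '/<trn:description>/s/\w\+/\L\u&/g; s/&.*;\|Trn:Description/\L&/g')

printf '%s\n' "$result"
```

To edit a file in place as in the answer, replace the pipe with -i.bak and a file name.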

How to use sed to isolate only the first part of a file

I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the part that's added, and I'd like to use Gnu sed to isolate only the part of the file that's before the unique string - i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like is to see as output:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel
This should work for you:
sed -n '/unique string/q;p' file
It quits processing at unique string. Other lines get printed.
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that sed includes the range border. We need to exclude unique string from printing.
Furthermore I'm using the -n option which makes sed suppress the output of input lines by default.
One thing, if unique string can contain characters which are also syntax characters in the regex like ...
test*
... sed might not be the right tool for the job any more since it can only match regular expressions but not fixed strings.
In that case awk might be the tool of choice:
awk 'index($0, "unique string"){exit}1' file
index($0, "unique string") returns a non-zero value (the position) if the string is found in the current line. In that case we stop processing further input, and the matching line itself is not printed.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.
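A small sketch of both approaches, using the sample data from the question fed through a pipe instead of a file. The second call uses a marker containing * (a made-up value) to show the fixed-string match with awk's index().

```shell
#!/bin/sh
# Sample input from the question
input=$(printf '%s\n' \
    'line 1 of fixed portion' \
    'line 2 of fixed portion' \
    'unique string' \
    'line 1 of appended portion')

# sed: quit at the marker without printing it, print everything before it
fixed=$(printf '%s\n' "$input" | sed -n '/unique string/q;p')

# awk: literal (non-regex) match, here with a hypothetical marker "test*"
fixed_awk=$(printf '%s\n' 'keep me' 'test* marker' 'drop me' |
    awk -v s='test*' 'index($0, s) { exit } 1')

printf '%s\n' "$fixed" "$fixed_awk"
```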

Insert specific lines from file before first occurrence of pattern using Sed

I want to insert a range of lines from a file, say something like 210,221r before the first occurrence of a pattern in a bunch of other files.
As I am clearly not a GNU sed expert, I cannot figure how to do this.
I tried
sed '0,/pattern/{210,221r file
}' bunch_of_files
But apparently file is read from line 210 to EOF.
Try this:
sed -r 's/(FIND_ME)/PUT_BEFORE\1/' test.text
-r enables extended regular expressions
the string you are looking for ("FIND_ME") is inside parentheses, which creates a capture group
\1 puts the captured text into the replacement.
About your second question: You can read the replacement from a file like this*:
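sed -r "s/(FIND_ME)/$(cat REPLACEMENT.TXT)\1/" test.text
If you escape the special characters inside REPLACEMENT.TXT (/, &, \ and newlines) beforehand with sed, you are golden.
*= this relies on the shell's command substitution, which is expanded inside double quotes but not inside single quotes. It works in bash.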
In https://stackoverflow.com/a/11246712/4328188 CodeGnome gave some "sed black magic" :
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed '/pattern/ {
h
r file
g
N
}' in
However, to read specific lines from file, one may have to use a two-calls solution similar to dummy's answer. I'd enjoy knowing of a one-call solution if it is possible though.
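One possible single-call approach swaps sed for awk. This is a sketch, not a sed solution: the file names, the line range 2-3 and the sample contents are all made up for illustration. awk reads the source file first, buffers the wanted lines, then prints them before the first line of the target that matches the pattern.

```shell
#!/bin/sh
# Hypothetical sample files in a scratch directory
workdir=$(mktemp -d) || exit 1
printf '%s\n' s1 s2 s3 s4 > "$workdir/source.txt"
printf '%s\n' a pattern b pattern > "$workdir/target.txt"

result=$(awk -v first=2 -v last=3 '
    # First file: buffer lines first..last, print nothing
    NR == FNR { if (FNR >= first && FNR <= last) buf = buf $0 "\n"; next }
    # Second file: emit the buffer once, before the first matching line
    !done && /pattern/ { printf "%s", buf; done = 1 }
    { print }
' "$workdir/source.txt" "$workdir/target.txt")

rm -rf "$workdir"
printf '%s\n' "$result"
```

For the range in the question, first and last would be 210 and 221. Handling several target files in one call would additionally need the done flag reset at the start of each file.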

Using command line to lowercase all text in all files?

I've been trying out several commands such as
dd if=*.xml of=*.xml conv=lcase
to convert the content of all the XML files in my folder to lowercase. The file names in the folder are already lowercase; I'm trying to change the actual content to lowercase as well.
Can someone post the command to do this or tell me what I'm doing wrong? Thanks!
Use sed to edit files in place which will save you from writing a loop.
sed -ri 's/.+/\L\0/' *.xml
for i in *.xml; do tr A-Z a-z < $i > tmp && mv tmp $i; done
If your file names contain unusual characters (whitespace, newlines, control characters, etc), you may have to quote "$i", but since you say the names are all lowercase, I'm assuming that is not necessary.
I would go for:
sed -i -e 's/\(.*\)/\L\1/' *.xml
Keep -i and -e separate: GNU sed treats -ie as -i with the backup suffix e. I see that you've tagged your question with ssh. You didn't specify it, but does this mean that you want to run this command at the end of an ssh command? In that case, you will need to escape the asterisk, as it is supposed to be interpreted remotely, like this:
sed -i -e 's/\(.*\)/\L\1/' \*.xml
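A sketch of the tr loop with quoting and a proper temporary file, so that file names with spaces are handled. The scratch directory and the sample file are made up for the demonstration; in real use the loop would run over the existing *.xml files.

```shell
#!/bin/sh
# Hypothetical scratch directory with one sample file (note the space in the name)
workdir=$(mktemp -d) || exit 1
printf '%s\n' '<Root><ITEM>Some Text</ITEM></Root>' > "$workdir/sample one.xml"

for f in "$workdir"/*.xml; do
    tmp=$(mktemp) || exit 1
    # Write the lower-cased copy first, then overwrite the original's content
    tr '[:upper:]' '[:lower:]' < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
done

lowered=$(cat "$workdir/sample one.xml")
rm -rf "$workdir"
printf '%s\n' "$lowered"
```

Writing back with cat instead of mv keeps the original file's permissions, at the cost of one extra copy.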

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//g'
But that removed all the newlines and put everything on one line. Is there some way I can tell Perl not to include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
@sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).
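The two commands can be compared side by side on a small sample. This sketch uses an ASCII-only variant of the question's input (accented characters depend on the locale) and assumes GNU sed for \W.

```shell
#!/bin/sh
# ASCII-only variant of the sample from the question
sample=$(printf '%s\n' 'my line - some words & text' "oh look i've got some characters")

# tr deletes blanks and punctuation; newlines are in neither class, so they stay
with_tr=$(printf '%s\n' "$sample" | tr -d '[:blank:][:punct:]')

# sed works line by line, so \W never sees the newline
with_sed=$(printf '%s\n' "$sample" | sed 's/\W//g')

printf '%s\n' "$with_tr" "$with_sed"
```

The two are not exactly equivalent: the underscore counts as punctuation for tr but as a word character for \W, so tr removes it and sed keeps it.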