I've been trying out several commands such as
dd if=*.xml of=*.xml conv=lcase
to mass-convert the content of all the xml files in my folder to lowercase. The folder's filenames are already lowercase; I'm trying to change all the actual content to lowercase as well.
Can someone post the command to do this or tell me what I'm doing wrong? Thanks!
Use sed to edit the files in place, which will save you from writing a loop.
sed -ri 's/.+/\L\0/' *.xml
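If you want to sanity-check it on a scratch file first (GNU sed assumed, since \L and \0 in the replacement are GNU extensions; sample.xml is just a throwaway name):
$ printf 'Hello World\nFOO BAR\n' > sample.xml
$ sed -ri 's/.+/\L\0/' sample.xml
$ cat sample.xml
hello world
foo bar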
for i in *.xml; do tr A-Z a-z < $i > tmp && mv tmp $i; done
If your file names contain unusual characters (whitespace, newlines, control characters, etc), you may have to quote "$i", but since you say the names are all lowercase, I'm assuming that is not necessary.
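If you'd rather be defensive about the file names anyway, a quoted sketch of the same loop (the .tmp suffix is just my own choice of temporary name) could look like:
for i in *.xml; do tr '[:upper:]' '[:lower:]' < "$i" > "$i.tmp" && mv -- "$i.tmp" "$i"; done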
I would go for:
sed -i -e 's/\(.*\)/\L\1/' *.xml
I see that you've tagged your question with ssh. You didn't specify it, but does this mean that you want to run this command at the end of an ssh command? In that case, you will need to escape the asterisks so that they're interpreted remotely, like this:
sed -i -e 's/\(.*\)/\L\1/' \*.xml
I have the following output:
junos-vmx-x86-64-21.1R1.11.qcow2 metadata-usb-fpc0.img metadata-usb-fpc10.img
metadata-usb-fpc11.img metadata-usb-fpc1.img metadata-usb-fpc2.img metadata-usb-fpc3.img
metadata-usb-fpc4.img metadata-usb-fpc5.img metadata-usb-fpc6.img metadata-usb-fpc7.img
metadata-usb-fpc8.img metadata-usb-fpc9.img metadata-usb-re0.img metadata-usb-re1.img
metadata-usb-re.img metadata-usb-service-pic-10g.img metadata-usb-service-pic-2g.img
metadata-usb-service-pic-4g.img vFPC-20210211.img vmxhdd.img
The output came from the following script:
images_fld=$(for i in $(ls "$DIRNAME_IMG"); do echo ${i%%/}; done)
The previous output is saved in a variable called images_fld.
Problem:
I need to extract the values junos-vmx-x86-64-21.1R1.11.qcow2,
vFPC-20210211.img, and vmxhdd.img. By values I mean the entire file name.
The problem is that this directory containing all the files is always being updated, and new files are added constantly, which means that I can not rely on the line number ($N) to extract the name of those files.
I am trying to use awk or sed to achieve this.
Is there a way to:
match all files ending with .qcow2 and then extract the full file name? Like: junos-vmx-x86-64-21.1R1.11.qcow2
match all files starting with vFPC and then extract the full file name? Like: vFPC-20210211.img
match all files starting with vmxhdd and then extract the full file name? Like: vmxhdd.img
I am using those patterns because the file names tend to change with each version I am deploying. But patterns like .qcow2, vFPC, or vmxhdd always remain the same regardless, so for that reason I need to extract the entire string only by matching partial patterns. Is it possible? Thanks!
Note: I can not rely on files ending with .img as there are quite a lot of them, so it would make it more difficult to extract the specific file names :/
This might work for you (GNU sed):
sed -nE '/\<\S+\.qcow2\>|\<(vFPC|vmxhdd)\S+\>/{s//\n&\n/;s/[^\n]*\n//;P;D}' file
If a string matches the required criteria, delimit it by newlines.
Delete up to and including the first newline.
Print/delete the first line and repeat.
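For example, on a single line holding several of the names from the listing above, it pulls each match out onto its own line (a quick check, GNU sed assumed):
$ echo "metadata-usb-re.img vFPC-20210211.img vmxhdd.img" | sed -nE '/\<\S+\.qcow2\>|\<(vFPC|vmxhdd)\S+\>/{s//\n&\n/;s/[^\n]*\n//;P;D}'
vFPC-20210211.img
vmxhdd.img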
Thanks to KamilCuk I was able to solve the problem. Thank you! For anyone who may need this in the future, instead of using sed or awk, the solution was to use tail.
echo $images_fld | tail -f | tr ' ' '\n' | grep '\.qcow2$\|vFPC\|vmxhdd'
Basically, the problem that I was having was only to extract the names of the files ending with .qcow2 | and starting with vFPC & vmxhdd
Thank you KamilCuk
Another solution, given by potong, is to use
echo $images_fld | sed -nE '/\<\S+\.qcow2\>|\<(vFPC|vmxhdd)\S+\>/{s//\n&\n/;s/[^\n]*\n//;P;D}'
which gives the same output as KamilCuk's! Thanks both
I am trying to use 'sed' to replace a list of paths in a file with another path.
An example string to process is:
/path/to/file/block
I want to replace /path/to/file with something else.
I have tried:
sed -r '/\s(\S+)\/block/s/\1/new_path/'
I know it's finding the matching string but I'm getting an invalid back reference error.
How can I do this?
This may do:
echo "/path/to/file/block" | sed -r 's|/\S*/(block)|/newpath/\1|'
/newpath/block
Test
echo "test=/path/file test2=/path/to/file/block test3=/home/root/file" | sed -r 's|/\S*/(block)|/newpath/\1|'
test=/path/file test2=/newpath/block test3=/home/root/file
Back-references always refer to the pattern of the s command, not to any address (before the command).
However, in this case, there's no need for addressing: we can apply the substitution to all lines (and it will change only lines where it matches), so we can write:
s,\s\S+(/block), new_path\1,
(I added a space to the RHS, as I'm guessing you didn't mean to overwrite that; also used a different separator to reduce the need for backslashes.)
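A quick check on a made-up line (using -r as in your own attempt, GNU sed assumed):
$ echo "entry /path/to/file/block" | sed -r 's,\s\S+(/block), new_path\1,'
entry new_path/block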
I have 10,000 text files that I need to make changes to.
The first line of every file contains a url.
By mistake, for a few files the url is missing 'com'.
eg:
1) http://www.supersonic./psychology
2) http://www.supersonic./social
3) http://www.supersonic.com/science
My task is to check and add 'com' if it is missing.
eg:
1) http://www.supersonic.com/psychology
2) http://www.supersonic.com/social
3) http://www.supersonic.com/science
All urls are of the same domain (supersonic.com).
Can you suggest a fast and easy approach?
I tried this, replacing supersonic./ with supersonic.com:
sed -e '1s/supersonic.//supersonic.com/' *
No change in the output.
Use -i to change the files instead of just outputting the changed lines.
Use a different delimiter than / if you want to use / in the regex (or use \/ in the regex).
Use \. to match a dot literally, . matches anything.
sed -i~ -e '1s=supersonic\./=supersonic.com/=' *
Some versions of sed don't support -i.
You are very close with your code, but you need to account for the trailing / char after the . char.
Assuming you are using a modern sed with the -i (inplace-edit) option you can do
sed -i '1s#supersonic\./#supersonic.com/#' *
Note that rather than having to escape / inside of s/srchpat\/withSlash/replaceStr/, you can use another char after the s command as the delimiter; here I use s#...#...#. If your search pattern had a # char, then you would have to use a different char.
Some older versions of sed need you to escape the alternate delimiter at its first use, so
sed 's\#srchStr#ReplStr#' file
for those cases.
If you're using a sed that doesn't support the -i option, then
you'll need to loop over your files and manage the tmp files, i.e.
for f in *.html ; do
sed '1s#supersonic\./#supersonic.com/#' "$f" > /tmp/"$f".fix \
&& /bin/mv /tmp/"$f".fix "$f"
done
Warning
But as you're talking about 10,000+ files, you'll want to do some testing before using either of these solutions. Copy a good random set of those files to a /tmp/mySedTest/ dir and run one of these solutions there to make sure there are no surprises.
And you're likely to blow past the maximum command-line length (ARG_MAX) with 10,000+ files, so read about find and xargs. There are many posts here about [sed] find xargs. Check them out if needed.
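A rough sketch of that route (assuming GNU find, xargs and sed, and that the files live in the current directory):
find . -maxdepth 1 -type f -name '*.html' -print0 \
  | xargs -0 -r sed -i '1s#supersonic\./#supersonic.com/#'
The -print0/-0 pair keeps odd file names from being split, and -r tells xargs not to run sed at all if nothing matches.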
IHTH
I have a list of usernames and I would like to add possible combinations to it.
Example: let's say this is the list I have
johna
maryb
charlesc
Is there a way to use sed to edit it so that it looks like
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone assist me in getting the commands to do so? Any explanation will be awesome as well. I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean... anyway, thanks in advance.
EDIT ... I changed the usernames
sed 's/\(user\)\(.\)/\2\1/'
Breakdown:
sed s/string/replacement/ will replace the first instance of string on each line with replacement (add a trailing g, as in s/string/replacement/g, to replace every instance).
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two
parts: \(user\) and \(.\). Each of these is a capture group - bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to use any string you want - ie. \2_\1 will give a_user, b_user, c_user.
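For instance, checking that variant against one of the lines from the example above:
$ echo "usera" | sed "s/\(user\)\(.\)/\2_\1/"
a_user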
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
Here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: It is finding the first instance of "user" on each line and replacing it with "user_"
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."
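A quick check of both variants on one of the original names:
$ echo "usera" | sed 's/user/user_/'
user_a
$ echo "usera" | sed 's/user/user./'
user.a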
Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username beforehand.
Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
By setting FS to nothing, every letter is a field in awk. You can then easily manipulate it.
And there's no need for capturing groups etc, just plain field swapping.
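To see what setting FS to nothing does (splitting every character into its own field is an extension found in GNU awk and some other awks, not POSIX), a quick check:
$ echo "johna" | awk 'BEGIN{FS=""} {print NF, $1, $NF}'
5 j a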
This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any characters other than underscores (in the first back reference (\1)), then a possible underscore and the last character (in the second back reference (\2)), and swaps them around.
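A quick check (GNU sed assumed) shows it handles both the plain and the underscored forms:
$ printf 'johna\njohn_a\n' | sed -r 's/^([^_]*)_?(.)$/\2\1/'
ajohn
ajohn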
I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//g'
But that removed all the newlines and put everything on one line. Is there some way I can tell Perl to not include newlines in \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
@sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).