How to print partial text from a given text using sed

How can I print partial text from the following input file using sed?
Input data file
ABC mobile - ABC, Inc
XYZ123 mobile - XYZ123 company
TD Ameritrade Mobile Trader - TD Ameritrade Mobile, LLC
Expected Output
ABC, Inc
XYZ123 company
TD Ameritrade Mobile, LLC

Just delete everything up to a dash and space:
sed 's/.*- //'

Using a back-reference (note the space after the dash, so that it is not captured):
sed 's/.*- \(.*\)/\1/'

Apparently you want to print everything after the dash.
For this you can use cut:
$ cut -d'-' -f2 file
 ABC, Inc
 XYZ123 company
 TD Ameritrade Mobile, LLC
But this keeps the leading space after the dash, so awk is a better fit:
$ awk -v FS="- " '{print $2}' file
ABC, Inc
XYZ123 company
TD Ameritrade Mobile, LLC
-v FS="- " sets the field separator to a dash followed by a space (note the space), and {print $2} then prints the second field.
Or sed:
$ sed 's/^[^-]*- //' file
ABC, Inc
XYZ123 company
TD Ameritrade Mobile, LLC
^[^-]*- matches everything from the beginning of the line (^) up to the first dash and the space after it. The match is replaced with an empty string, so in effect everything up to and including the dash and space is deleted.
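The sed commands in this thread differ when a line contains more than one dash-and-space separator. A small sketch of that difference, using a made-up input line (an assumption, not part of the question's data):

```shell
# Hypothetical line with two "- " separators (not from the original data).
line='Foo - Bar - Baz, Inc'

# Greedy .* runs to the LAST "- ", so only the final part survives.
after_last=$(printf '%s\n' "$line" | sed 's/.*- //')

# [^-]* cannot cross a dash, so the match stops at the FIRST "- ".
after_first=$(printf '%s\n' "$line" | sed 's/^[^-]*- //')

printf '%s\n%s\n' "$after_last" "$after_first"
```

For the question's data, where each line has a single separator, both forms give the same result.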

Related

Sed ignoring potential characters at the end of the match group

I have the following text:
<h2 id="title"> ABC A BBBBB0 </h2>
<h2 id="title">ABC A BBBBB1 </h2>
<h2 id="title">ABC A BBBBB2</h2>
<h2 id="title"> ABC A BBBBB3 </h2>
and want to extract the following from it:
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
I am currently running the following command:
sed -n "s/.*\"title\">[[:space:]]*\(.*\)<.*/\1/p" ./file.txt
but get lines with spaces at the end:
ABC A BBBBB0[space][space][space][space]
ABC A BBBBB1[space]
ABC A BBBBB2
ABC A BBBBB3[space]
I understand how to skip possible spaces at the beginning of the match, but I cannot work out how to ignore possible spaces at the end. Can somebody give me a clear example of this?
The last character in the group must not be a space; after it there may be any number of spaces.
's/.*"title">[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*<.*/\1/p'
Regarding "I can not understand the concept":
.* matches everything up to the end of the whole line. The regex engine then needs a <, so it steps back from right to left until it finds one, and continues matching from there.
You have to put something in the pattern so that, when the engine steps back from the end of the string, it ends up at the place you want. "Not a space" is one example. This process of stepping back is called "backtracking".
I can recommend https://www.regular-expressions.info/engine.html
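To see the effect of backtracking on the captured group, the sketch below wraps the capture in square brackets so that trailing spaces become visible. The brackets and the extra trailing spaces in the sample line are additions for illustration only.

```shell
# One line in the style of the question, with extra trailing spaces before </h2>.
line='<h2 id="title"> ABC A BBBBB0    </h2>'

# Greedy group: \(.*\) gives back only as far as the "<", so it keeps the spaces.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*"title">[[:space:]]*\(.*\)<.*/[\1]/p')

# Group forced to end in a non-space: the trailing spaces fall outside the group.
trimmed=$(printf '%s\n' "$line" | sed -n 's/.*"title">[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*<.*/[\1]/p')

printf '%s\n%s\n' "$greedy" "$trimmed"
```

The first result still contains the four trailing spaces inside the brackets; the second does not.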
Using sed
$ sed 's/[^>]*>[[:space:]]*\([[:alnum:][:space:]]*[[:alnum:]]\)[[:space:]]*<.*/\1/' file
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
$ sed -E 's/[^>]*> *([A-Z0-9 ]*[A-Z0-9]) *<.*/\1/' file
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
When using sed's grouping and back-referencing, you can exclude any character, including spaces, by leaving it outside the grouping parentheses.
[^>]*> - Skips everything up to and including the next >. As this is not within the parentheses, it is excluded.
 * - Skips any leading spaces, which are likewise outside the parentheses.
([A-Z0-9 ]*[A-Z0-9]) - Everything within the parentheses is captured: capitals, digits and spaces, ending in a non-space character so that trailing spaces are left out.
 *<.* - Matches any trailing spaces, the < and the rest of the line, all of which are excluded.
I'd just use awk:
$ awk -F'> *| *<' '{print $3}' file
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
This might work for you (GNU sed):
sed -nE 's/<h2 id="title">\s*(.*\S)\s*<\/h2>/\1/p' file
Use pattern matching to return the required strings.
N.B. \s matches whitespace and \S is its complement (any non-whitespace character). Thus (.*\S) captures text that ends in a non-whitespace character.

Sed Pattern Match then Append to Line

I have some lines down below and I'm trying to append "Check" to the line that starts with Apples. Does someone know how I can get "Check" on the same line as Apples, not a new one and print the output? I wasn't getting anywhere on my own.
Thanks
What I have:
Grocery store bank and hardware store
Apples Bananas Milk
What I want:
Grocery store bank and hardware store
Apples Bananas Milk Check
What I tried:
sed -i '/^Apples/a Check' file
What I got:
Grocery store bank and hardware store
Apples Bananas Milk
Check
This might work for you (GNU sed):
sed '/^Apples/s/$/ Check/' file
If a line starts with Apples, append the string Check to it. Here $ is an anchor that matches the end of the line.
The problem is that you are appending a line with the a command; see this reference:
The "a" command appends a line after the range or pattern.
What you want is a plain substitution. There may be some further tweaks you would like to make, so here are some suggestions:
sed -i 's/Apples/& Check/g' file # Adds ' Check' after each 'Apples'
sed -i 's/\<Apples\>/& Check/g' file # Only adds ' Check' after 'Apples' as whole word
sed -i -E 's/\<Apples(\s+Check)?\>/Apples Check/g' file # Adds ' Check', replacing an existing ' Check' instead of doubling it
Note these suggestions are for GNU sed only. \< and \> in GNU sed patterns are word boundaries, \s+ matches one or more whitespace characters in GNU sed POSIX ERE patterns, and -E enables the POSIX ERE pattern syntax.
See the online demo:
#!/bin/bash
s='Grocery store bank and hardware store
Apples Bananas Milk'
sed 's/Apples/& Check/g' <<< "$s"
sed 's/\<Apples\>/& Check/g' <<< "$s"
sed -E 's/\<Apples(\s+Check)?\>/Apples Check/g' <<< "$s"
Output in each case is:
Grocery store bank and hardware store
Apples Check Bananas Milk
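In a sed replacement, & expands to the entire match, including anything matched by an optional group. The sketch below, assuming GNU sed (for \<, \> and \s), shows why a variant meant to be safe to re-run needs the replacement spelled out. The input line is a made-up example of a line that has already been processed once.

```shell
# Hypothetical line that already carries " Check" (not from the question).
line='Apples Check Bananas'

# & is the whole match "Apples Check", so a second " Check" is appended.
with_amp=$(printf '%s\n' "$line" | sed -E 's/\<Apples(\s+Check)?\>/& Check/g')

# Spelling out the replacement swaps "Apples Check" for itself: no duplicate.
explicit=$(printf '%s\n' "$line" | sed -E 's/\<Apples(\s+Check)?\>/Apples Check/g')

printf '%s\n%s\n' "$with_amp" "$explicit"
```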
Using sed
$ sed '/^Apples/s/.*/& Check/' input_file
Grocery store bank and hardware store
Apples Bananas Milk Check
This matches lines that begin with Apples and replaces the whole line (.*) with itself (&) followed by Check.
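To compare the a command from the question with the substitution approach side by side, here is a minimal sketch. It assumes GNU sed, which accepts the one-line form of a used in the question.

```shell
# Sample input from the question.
input='Grocery store bank and hardware store
Apples Bananas Milk'

# GNU sed one-line form of "a": emits Check as a separate line after the match.
appended=$(printf '%s\n' "$input" | sed '/^Apples/a Check')

# Substitution: $ matches the end of the line, so the text lands on the same line.
substituted=$(printf '%s\n' "$input" | sed '/^Apples/s/$/ Check/')

printf '%s\n---\n%s\n' "$appended" "$substituted"
```

The first result has three lines, the second keeps two with Check at the end of the Apples line.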

How to exclude end of lines of textfiles via terminal?

Given a file ./wordslist.txt with lines of the form <word> <number_of_occurrences>, such as:
aš toto 39626
ir 35938
tai 33361
tu 28520
kad 26213
...
How can I strip the trailing digits from each line and collect the result in output.txt, like this:
aš toto
ir
tai
tu
kad
...
Note:
sed, find, cut or grep preferred. I cannot use anything that only keeps [a-z], since my data can contain ASCII letters, non-ASCII letters, Chinese characters, digits, etc.
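I suggest (this keeps only the first space-separated field, so it fits single-word entries):
cut -d " " -f 1 wordslist.txt > output.txt
Or, to remove just the trailing number:
sed -E 's/ [0-9]+$//' wordslist.txt > output.txt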
Use awk to print the first word (note that a multi-word entry such as "aš toto" is cut down to its first word):
awk '{print $1}' your_file > your_new_file
An awk solution that prints each input line without its last column:
$ awk '{NF--; print}' wordslist.txt
aš toto
ir
tai
tu
kad
Note:
This will only work in some awks. Per POSIX, incrementing NF adds a null field, but decrementing NF is undefined behavior (thanks @EdMorton for the info).
This doesn't check whether the last column is numeric, and fields in the output are separated by a single space only.
If there can be empty lines in input file, use awk 'NF{NF--}1'
The following works:
sed -r 's/ [0-9]+$//g' wordslist.txt
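The suggestions in this thread behave differently on a multi-word entry such as "aš toto". A short sketch using two lines from the question's data compares the first-field approach with removing only the trailing number:

```shell
# Two sample lines from the question, including the multi-word entry.
input='aš toto 39626
ir 35938'

# cut keeps only the first space-separated field, so "toto" is lost.
first_field=$(printf '%s\n' "$input" | cut -d ' ' -f 1)

# sed removes only a trailing " <digits>", leaving the rest of the line alone.
no_count=$(printf '%s\n' "$input" | sed -E 's/ [0-9]+$//')

printf '%s\n---\n%s\n' "$first_field" "$no_count"
```

Because the sed pattern only touches a space and digits at the end of the line, it does not depend on which characters the words themselves contain.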

adding a comma in the text file

I have a text file in the following format
group1: 2010EL-1749 2010EL-1749_00001 3554-08 3554-08_01855 2010EL-1749_00002
group2: 2010EL-1749 2010EL-1749_00002 3554-08 3554-08_01856 2010EL-1749_00001
group7: 3554-08 2010EL-1749_00001 3554-08_01855
I would like to add a comma between the IDs, as shown below:
group1: 2010EL-1749,2010EL-1749_00001,3554-08,3554-08_01855,2010EL-1749_00002
group2: 2010EL-1749,2010EL-1749_00002,3554-08,3554-08_01856,2010EL-1749_00001
group7: 3554-08,2010EL-1749_00001,3554-08_01855
In awk, replace all spaces with commas and then change the first comma back to a space:
awk 'gsub(/ /,",") && sub(/,/," ")' testfile
or using gensub (GNU awk only):
awk '$0=gensub(/([^:]) /,"\\1,","g")' testfile
$ sed 's/ /,/g; s/,/ /' textfile
group1: 2010EL-1749,2010EL-1749_00001,3554-08,3554-08_01855,2010EL-1749_00002
group2: 2010EL-1749,2010EL-1749_00002,3554-08,3554-08_01856,2010EL-1749_00001
group7: 3554-08,2010EL-1749_00001,3554-08_01855
This works by changing all spaces to commas: s/ /,/g. It then changes the first comma back to a space: s/,/ /.
s/,/ / is an example of a substitute command. The form is s/old/new/ where old is a regular expression and the first match for old is replaced with new. If we add a g to the end of the command, like s/ /,/g, then not just the first is replaced: all non-overlapping matches are replaced.
This approach assumes that no ID contains a space and that the group name contains neither a space nor a comma.
To change the file in place:
sed -i.bak 's/ /,/g; s/,/ /' textfile
Alternatives
As suggested by sp asic in the comments, if we can assume that all IDs end with a number, then:
sed 's/\([0-9]\) /\1,/g' textfile
Or, if instead we can assume that only groups, not IDs, end with a colon (Hat tip: James Brown):
sed 's/\([^:]\) /\1,/g' testfile
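The two-step substitution can be traced one step at a time. The sketch below runs each step separately on one line from the question and also runs the colon-based alternative for comparison; the variable names are illustrative only.

```shell
# One line from the question's input.
line='group7: 3554-08 2010EL-1749_00001 3554-08_01855'

# Step 1: every space becomes a comma, including the one after the colon.
step1=$(printf '%s\n' "$line" | sed 's/ /,/g')

# Step 2: without the g flag only the first comma is turned back into a space.
step2=$(printf '%s\n' "$step1" | sed 's/,/ /')

# Alternative: replace a space only when the character before it is not a colon.
colon_variant=$(printf '%s\n' "$line" | sed 's/\([^:]\) /\1,/g')

printf '%s\n%s\n%s\n' "$step1" "$step2" "$colon_variant"
```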

how to strip (remove) a word with perl or on the unix command line

I want to remove the domain name from my device list file. Some of the devices on my list look like device.marketing.company.com and others like device.company.com. I only need the device name and the IP address, so how can I strip the domain and subdomain names from the list and keep just the device name and IP address?
device1.marketing.company.com 10.1.100.12
device2.marketing.company.com 10.1.100.13
device3.marketing.company.com 10.1.100.14
device4.marketing.company.com 10.1.100.15
device5.company.com 10.1.100.16
device6.company.com 10.1.100.17
device7.company.com 10.1.100.18
The result I am looking for
device1 10.1.100.12
device2 10.1.100.13
device3 10.1.100.14
device4 10.1.100.15
device5 10.1.100.16
device6 10.1.100.17
device7 10.1.100.18
thanks,
Update 2
Use sed '/\..* /s/\.[^ ]*//' ./devlist
Input
$ cat ./devlist
device1.marketing.company.com 10.1.100.12
device2.marketing.company.com 10.1.100.13
device3.marketing.company.com 10.1.100.14
device4.marketing.company.com 10.1.100.15
device5.company.com 10.1.100.16
device6.company.com 10.1.100.17
device7.company.com 10.1.100.18
device8 10.1.100.19
Output
$ sed '/\..* /s/\.[^ ]*//' ./devlist
device1 10.1.100.12
device2 10.1.100.13
device3 10.1.100.14
device4 10.1.100.15
device5 10.1.100.16
device6 10.1.100.17
device7 10.1.100.18
device8 10.1.100.19
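The address /\..* / in front of the substitution restricts it to lines where a period is followed, somewhere later, by a space, that is, lines whose first field contains a domain. A short sketch of what happens to the device8 line with and without that guard:

```shell
# A line without a domain part, taken from the sample input above.
line='device8 10.1.100.19'

# Without the address, the first period found is inside the IP address.
unguarded=$(printf '%s\n' "$line" | sed 's/\.[^ ]*//')

# With the address, the line is skipped because no period precedes a space.
guarded=$(printf '%s\n' "$line" | sed '/\..* /s/\.[^ ]*//')

# A line with a domain still has everything from the first period to the space removed.
with_domain=$(printf '%s\n' 'device1.marketing.company.com 10.1.100.12' | sed '/\..* /s/\.[^ ]*//')

printf '%s\n%s\n%s\n' "$unguarded" "$guarded" "$with_domain"
```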
sed 's/\([^.]*\)\..* \(.*\)/\1 \2/'
Explanation below
\( #Start remembering
[^.]* #A collection of things that don't include periods
\) #Stop remembering
\..* #A period, anything else, and finally a space
\(.*\) #Remember everything after the space
/\1 \2/ #Print out the first section remembered, a space, and the second section
Using awk instead of sed/cut is a better choice.
$ cat devicefile
device1.marketing.com
device2.marketing.company.com 10.1.100.1
device3.marketing.company.com 10.1.100.2
device4.marketing.company.com 10.1.100.3
$ awk '{split($1,a,".");$1=a[1]}1' devicefile
device1
device2 10.1.100.1
device3 10.1.100.2
device4 10.1.100.3
You could use a regex to extract the device names into a separate list. Something to the effect of "^(\w*)[.].*$"
For example, if you wanted to use sed:
sed 's/^\(\w*\)[.].*$/\1/' listfile.txt > output.txt
Breaking down the expression:
s/ Starts sed's substitute command
^ Anchors the match at the start of the line
\( \) Escaped parentheses. The text matched between these can be retrieved with the placeholder \1
\w* Zero or more word characters [A-Za-z0-9_]
[.] A literal period
.* Any remaining characters
$ Marks the end of the line
/ Separates the find pattern from the result pattern
\1 Gets the placeholder for the matched pattern between the parentheses \( and \)
/ Ends the sed expression
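Note that the expression broken down above keeps only the device name, since .*$ consumes the rest of the line, IP address included. The sketch below runs it next to the two-group expression from the earlier answer on one sample line. It assumes GNU sed, because \w is a GNU extension.

```shell
# One line from the question's device list.
line='device2.marketing.company.com 10.1.100.13'

# One capture group: everything after the first period is dropped.
name_only=$(printf '%s\n' "$line" | sed 's/^\(\w*\)[.].*$/\1/')

# Two capture groups: the text after the space is kept as \2.
name_and_ip=$(printf '%s\n' "$line" | sed 's/\([^.]*\)\..* \(.*\)/\1 \2/')

printf '%s\n%s\n' "$name_only" "$name_and_ip"
```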