Insert new line after any amount of numeric characters - sed

I need to insert a new line, or a delimiter, in a text file after a "numeric" string consisting of 10 digits, then a "-", then 1 to 4 digits.
Example:
randomtext,1234567890-1234blahblah
Should be:
randomtext,1234567890-1234, blahblah
Or:
randomtext,1234567890-1234
blahblah
Note that the first set of digits will always be 10 characters long; the digits after the - will be 1, 2, 3 or 4 characters long.
I've used sed a lot for similar tasks, but can't find a way to work with the last set of numbers which vary from 1 to 4 characters....
I really hope someone can help!
Many thanks!

$ echo randomtext,1234567890-1234blahblah |
sed -E 's/[0-9]{10}-[0-9]{1,4}/&\n/'
randomtext,1234567890-1234
blahblah
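The question also asks for a delimiter instead of a newline. A minimal sketch of that variant follows, assuming a sed that supports -E; the function name insert_delimiter is hypothetical and only there for illustration.

```shell
# Hypothetical helper: append ", " after a run of 10 digits, a hyphen,
# and 1 to 4 digits. & in the replacement stands for the matched text.
insert_delimiter() {
  sed -E 's/[0-9]{10}-[0-9]{1,4}/&, /'
}

# prints: randomtext,1234567890-1234, blahblah
echo 'randomtext,1234567890-1234blahblah' | insert_delimiter
```

Because {1,4} is greedy, it takes as many of the trailing digits as it can, up to four, before the delimiter is inserted.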

Related

Using sed to replace a number located between two other numbers

I need to replace a numeric value, that occurs in a specific line of a series of config files in a pattern like this:
string number_1 number_to_replace number_2
I want to obtain something like this:
string number_1 number_replaced number_2
The difficulties I encountered are:
number_1 or number_2 can be equal to number_to_replace, so a simple replacement is not possible.
number_1 and number_2 vary between config files so I don't know them in advance.
The closest attempt I got until now is:
echo "field 4 4 4" | sed 's/\s4\s/3/'
Which outputs:
field34 4
This is close. Since I want to replace the middle number, I added another "\s" to try to use the known fact that the line starts with a character.
echo "field 4 4 4" | sed 's/\s\s4\s/3/'
Which gives:
field 4 4 4
So, nothing is replaced this time. How can I proceed? A somewhat detailed explanation would be ideal, because my knowledge of replacing expressions that involve patterns is nearly zero.
Thanks.
You can do something like below, which matches your exact sequence of digits as in the example. You could replace 3 with any digit of your choice.
sed 's/\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)/\1 3 \3/'
Notice that I've used the POSIX bracket expression to match the whitespace character which should be supported in any variant of sed you are using. Note that \s is supported in only the GNU variants.
The regex matches a run of one or more digits, a whitespace character, a second run of digits, another whitespace character, and a third run of digits. The captured groups are numbered from \1. Since your intention is to replace the 2nd number, you keep \1 and \3 and put the value of your choice between them.
If the extra escapes make it hard to read, use the -E flag for extended regex support. I've used the default BRE syntax here.
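For comparison, here is a sketch of the same substitution in extended (ERE) syntax, assuming your sed accepts -E. The function name replace_middle is hypothetical, and the sketch assumes the new value is a plain number (a value containing /, & or \ would need escaping).

```shell
# Hypothetical helper: replace the middle of three whitespace-separated
# numbers with the value given as $1. The double quotes let the shell
# expand $1; \1 and \3 are passed through to sed unchanged.
replace_middle() {
  sed -E "s/([0-9]+)[[:space:]]([0-9]+)[[:space:]]([0-9]+)/\1 $1 \3/"
}

# prints: field 4 3 4
echo 'field 4 4 4' | replace_middle 3
```

With -E the grouping parentheses and the + quantifier no longer need backslashes, which is the only difference from the BRE command above.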

Extracting Specific Variables from a | delimited .txt and extracting them to a new .txt

I have a .txt that is say 100,000 rows (observations) by 50 columns (variables), and the variables are | delimited. I would like to extract the 8th and 9th variables (or 7 and 8 if the indexing were to start at 0). In doing so, I'd like to create a new .txt that is 100,000 rows (the same observations) by 2 columns (these 2 variables) in which these 2 variables remain | delimited.
For example, the data in one row is formatted as:
var1|var2|var3|var4|var5|var6|var7|var8|var9|var10|var11 .........
I'd like to create a .txt with this row being:
var7|var8
I've tried:
$ perl -wplaF'|' -e'$_ = join "|", @F[7, 8]' fileoriginal.txt > filenew.txt
This output is just kind of gibberish, however.
Any help would be greatly appreciated!
The argument to -F is compiled into a regular expression, and | is a special character in regular expressions. To use a literal | char, you need to escape it on the command line.
One of
perl -F\\\| -wlape ...
perl -F'\|' -wlape ...
does the trick on Unix.

Append different number of spaces at the end of a string

I want to add different number of spaces after a string:
I have used
echo "444rrrr" | sed 's/$/     /'
This adds 5 spaces after "444rrrr". However, I do not know the number of spaces that I have to add beforehand. Is there a way to tell the "sed" command to vary the number of spaces that I want to append at the end of each string?
Thank you in advance.
See this. Note that the _ is used just for the example, since spaces are not easy to see here; you can change it to a space.
kent$ n=5
kent$ echo "444rrr"|awk -vn="$n" '{for(i=1;i<=n;i++)$0=$0 "_"}1'
444rrr_____
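If a loop feels heavy, the padding can also be done with the shell's printf instead of awk. This is a different technique from the answer above, sketched under the assumption that your printf supports a * field width; the function name pad_right is hypothetical.

```shell
# Hypothetical helper: print string $1 followed by $2 spaces.
# %*s takes its width from the next argument and pads the empty string.
pad_right() {
  printf '%s%*s\n' "$1" "$2" ''
}

n=5
# tr swaps spaces for underscores only to make the padding visible.
# prints: 444rrrr_____
pad_right '444rrrr' "$n" | tr ' ' '_'
```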

sed delete remaining characters in line except first 5

What would be the sed command to delete all characters in a line except the first 5?
I've tried going 'backwards' on this (reversing the line and deleting), but it's not the most elegant solution.
This might work for you (GNU sed):
echo '1234567890' | sed 's/.//6g'
12345
Or:
echo '1234567890' | cut -c-5
12345
Try this. It captures the first 5 characters of the line in the first group, matches whatever follows, and replaces the whole match with the first group:
sed 's/^\(.\{5\}\).*/\1/'
Or the alternative suggested by mouviciel:
sed 's/^\(.....\).*/\1/'
(it is more readable as long as the number of first characters you want does not grow too large)

regular expression in sed for masking credit card

We need to mask credit card numbers, masking all but the last 4 digits. I am trying to use sed. As credit card number length varies from 12 to 19 digits, I am trying to write a regular expression. The following code receives the string. If it contains a string of the form "CARD_NUMBER=3737291039299199", it masks the first 12 digits.
The problem is how to write a regular expression for card numbers 12 to 19 digits long. If I write another expression for 12 digits, it doesn't work. That is, for a 12-digit card number the first 8 digits should be masked; for a 15-digit card number, the first 11 digits should be masked.
while read data; do
var1=${#data}
echo "Length is "$var1
echo $data | sed -e "s/CARD_NUMBER=\[[[:digit:]]\{12}/CARD_NUMBER=\[\*\*\*\*\*\*\*\*/g"
done
How about
sed -e :a -e "s/[0-9]\([0-9]\{4\}\)/\*\1/;ta"
(This works in my shell, but you may have to add or remove a backslash or two.) The idea is to replace a digit followed by four digits with a star followed by the four digits, and repeat this until it no longer triggers.
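Here is a sketch of that loop run against the sample input from the question. The function name mask_card is hypothetical. Note the assumption built into this approach: it masks every run of more than four digits on the line, not only the one after CARD_NUMBER=.

```shell
# Hypothetical helper: repeatedly turn "digit followed by four digits"
# into "* followed by those four digits" until no such match is left.
# :a defines a label; ta jumps back to it whenever the s command matched.
mask_card() {
  sed -e :a -e 's/[0-9]\([0-9]\{4\}\)/*\1/' -e ta
}

# prints: CARD_NUMBER=************9199
echo 'CARD_NUMBER=3737291039299199' | mask_card
```

Each pass masks one more leading digit, so a 12-digit number ends up with 8 stars and a 15-digit number with 11, with no need to know the length in advance.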
This does it in one sed command without an embedded newline:
sed -r 'h;s/.*([0-9]{4})/\1/;x;s/CARD_NUMBER=([0-9]*)([0-9]{4})/\1/;s/./*/g;G;s/\n//'
If your sed doesn't have -r:
sed 'h;s/.*\([0-9]\{4\}\)/\1/;x;s/CARD_NUMBER=\([0-9]*\)\([0-9]\{4\}\)/\1/;s/./*/g;G;s/\n//'
If your sed needs -e:
sed -e 'h' -e 's/.*\([0-9]\{4\}\)/\1/' -e 'x' -e 's/CARD_NUMBER=\([0-9]*\)\([0-9]\{4\}\)/\1/' -e 's/./*/g' -e 'G' -e 's/\n//'
Here's what it's doing:
duplicate the number so it's in pattern space and hold space
grab the last four digits
swap them into hold space and the whole number into pattern space
grab all but the last four digits
replace each digit with a mask character
append the last four digits from hold space to the end of the masked digits in pattern space (a newline comes along for free)
get rid of the newline
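The steps above can be sketched as a multi-line script with one comment per command. This assumes a sed that accepts -E and comment lines inside the script; the function name mask_hold_space is hypothetical.

```shell
# Hypothetical helper: the hold-space approach, one step per line.
mask_hold_space() {
  sed -E '
    # copy the line so it is in both pattern space and hold space
    h
    # keep only the last four digits
    s/.*([0-9]{4})/\1/
    # swap: last four go to hold space, the full line comes back
    x
    # keep only the digits before the last four
    s/CARD_NUMBER=([0-9]*)([0-9]{4})/\1/
    # replace each remaining character with a mask character
    s/./*/g
    # append the last four digits from hold space, after a newline
    G
    # remove that newline
    s/\n//
  '
}

# prints: ************9199
echo 'CARD_NUMBER=3737291039299199' | mask_hold_space
```

As written, the CARD_NUMBER= prefix is dropped from the output; only the masked number is printed.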
Try this; you don't have to create a complicated regex:
var1="CARD_NUMBER=3737291039299199"
IFS="="
set -- $var1
cardnumber=$2
echo $cardnumber | awk 'BEGIN{OFS=FS=""}{for(i=1;i<=NF-4 ;i++){ $i="*"} }1'
output
$ ./shell.sh
************9199
I'm not much of a sed guru, and thus I cannot manage to do it in only one command, though there surely are ways. But with two sed commands, here is what I got:
sed -e 's/CARD_NUMBER=\([0-9]*\)\([0-9]\{4\}\)/\1\
\2/' | sed -e '1s/./x/g ; N ; s/\n//'
Please note the embedded newline.
Because sed works by lines, I first break the card number into the initial part and the last four digits, separating them by a newline (the first sed command). Then, I mask the initial part (1s/./x/g), and remove the new line (N ; s/\n//).
Good luck!