How to find specific number patterns in a data file [closed] - perl

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a data file that looks like this
15105021
15105043
15106013
15106024
15106035
15105024
15105042
15106015
15106021
15106034
and I need to grep lines that have sequence numbers like 1510603, 1510504
I tried this awk command
awk /[1510603,1510504]/ soursefile.txt
but it does not work.

Using egrep with a word boundary on the left-hand side, since the OP wants to match any digits that follow on the right-hand side:
egrep '\b(1510603|1510504)' file
15105043
15106035
15105042
15106034

A shorter awk:
awk '/1510603|1510504/' file

Based on the contents of your file the following should suffice
grep -E '^1510603|^1510504' file
If your grep version does not support the -E flag, try egrep instead of grep.
If you insist on awk:
awk '/^1510603/ || /^1510504/' file
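To see why the attempt in the question "does not work", here is a small sketch that rebuilds the sample data in a temporary file (the variable names are my own, not from the question) and compares the bracket expression with an anchored alternation. It assumes the goal is "lines that start with either prefix".

```shell
# Recreate the sample data from the question in a temporary file.
data_file=$(mktemp)
printf '%s\n' 15105021 15105043 15106013 15106024 15106035 \
    15105024 15105042 15106015 15106021 15106034 > "$data_file"

# [1510603,1510504] is a bracket expression: it matches any ONE of the
# characters 0, 1, 3, 4, 5, 6 or a comma, so every sample line matches.
bracket_count=$(awk '/[1510603,1510504]/ { n++ } END { print n + 0 }' "$data_file")

# Anchored alternation keeps only lines that start with either prefix.
prefix_matches=$(grep -E '^(1510603|1510504)' "$data_file")

echo "lines matched by the bracket expression: $bracket_count"
echo "$prefix_matches"
rm -f "$data_file"
```

The bracket expression therefore prints the whole file, while the alternation prints only the four lines wanted.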

I think this works:
egrep '1510603|1510504' source

Your question is very poorly stated, but if you want to print all numbers in the file that begin with either 1510603 or 1510504, then you can write this in Perl
perl -ne 'print if /^1510(?:603|504)/' sourcefile.txt

Related

Linux shell script, parsing each line [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am facing a problem with my shell script (I'm using SH):
I have a file with multiple lines, some of which are mail addresses, for example:
abcd
plm
name_aA.2isurnamec#Text.com -> this is a line that checks the correct condition
random efgh
aaaaaa
naaame_aB.3isurnamec#Text.ro ->same (this is not part of the file)
I have used grep to filter the correct mail addresses like this:
grep -E '^[a-z][a-zA-Z_]*.[0-9][a-zA-Z0-9]+#[A-Z][A-Z0-9]{,12}.(ro|com|eu)$' file.txt
I have to write a shell script that checks the file and prints the following (for the above example it would look like this):
"Incorrect:" abcd
"Incorrect:" plm
"Correct:" name_aA.2isurnamec#Text.com
"Incorrect:" random efgh
"Incorrect:" aaaaaa
"Correct:" naaame_aB.3isurnamec#Text.ro
I want to solve this problem using grep or sed, while, if, pipes, etc. I don't want to use lists or other things.
I have tried using something like this
grep condition abc.txt | while read -r line ; do
echo "Processing $line"
# your code goes here
done
but it only prints the correct lines. I know that I can also print the lines that don't match the grep condition using -v on grep, but I want to print the lines in the order they appear in the text file.
I'm having trouble trying to parse each line of the file, or maybe I don't need to parse the lines one by one; I really don't know how to solve it.
If you could help me I would appreciate it.
Thanks
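#!/bin/bash
# Dots are escaped and the domain part also accepts lowercase letters.
pattern='^[a-z][a-zA-Z_]*\.[0-9][a-zA-Z0-9]+#[A-Z][A-Za-z0-9]{,12}\.(ro|com|eu)$'
# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line; do
    if [ "$line" ]; then    # skip empty lines
        # Quote the pattern so the shell does not glob or split it.
        if echo "$line" | grep -E -q "$pattern"; then
            echo "\"Correct:\" $line"
        else
            echo "\"Incorrect:\" $line"
        fi
    fi
done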
Invoke like this, assuming the bash script is called filter and the text file, text.txt: ./filter < text.txt.
Note that the full stops in the regular expression are escaped and that the domain name can contain lowercase letters (although, I think that your regex is too restrictive). Other characters are not escaped because the string is in single quotes.
while reads the standard input line by line into $line; the first if skips the empty lines; the second one checks $line against $pattern (-q suppresses grep output).
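If spawning one grep per line is a concern, the same labelling can probably be done in a single sed pass. This is only a sketch under a few assumptions: label_lines is a hypothetical helper name, your sed accepts -E (GNU and BSD sed do), the interval is written as {0,12} because that is the portable ERE spelling of {,12}, and the sample lines are copied exactly as they appear in the question. Unlike the loop above, it does not skip empty lines; they would be labelled "Incorrect:".

```shell
pattern='^[a-z][a-zA-Z_]*\.[0-9][a-zA-Z0-9]+#[A-Z][A-Za-z0-9]{0,12}\.(ro|com|eu)$'

# label_lines: read lines on stdin and prefix each with its verdict.
label_lines() {
    # Matching lines get the "Correct:" prefix and branch to the end of
    # the script; every other line falls through to "Incorrect:".
    sed -E -e "/$pattern/{" \
           -e 's/^/"Correct:" /' \
           -e 'b' \
           -e '}' \
           -e 's/^/"Incorrect:" /'
}

labelled=$(printf '%s\n' 'abcd' 'plm' 'name_aA.2isurnamec#Text.com' \
    'random efgh' 'aaaaaa' 'naaame_aB.3isurnamec#Text.ro' | label_lines)
echo "$labelled"
```

The lines come out in file order because sed handles each line once, whichever branch it takes.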

Find and replace a string in Perl [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 2 years ago.
I have the following command line:
perl -i -pe 's/_GSV*//g' file.fasta
My goal is to change sequences that have the following pattern:
GSVIVG01006342001_GSVIVT01006342001
I want to find everything that starts with _GSV and ends with anything (that's why I put the '*') and replace it with nothing.
When I run my command it only matches the _GSV and returns this:
GSVIVG01006342001IVT01006342001
and I want this:
GSVIVG01006342001
Can anybody tell me what's wrong with my command line?
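In a regex, * repeats the preceding item, so V* means "zero or more V characters"; it is not a wildcard. Before the *, include a dot, which matches any character:
perl -i -pe 's/_GSV.*//g' file.fasta
You can also add $ to anchor the match at the end of the line, although the greedy .* already runs to the end of the line:
perl -i -pe 's/_GSV.*$//g' file.fasta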
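A quick way to compare the two patterns without touching file.fasta is to pipe the sample header through a substitution. The sketch below uses sed instead of perl only to keep it dependency-free; the * and .* quantifiers behave the same way in both tools for this input. The variable names are illustrative.

```shell
header='GSVIVG01006342001_GSVIVT01006342001'

# V* means "zero or more V", so only the literal _GSV is removed.
with_star=$(printf '%s\n' "$header" | sed 's/_GSV*//g')

# .* means "any run of characters", so everything from _GSV on is removed.
with_dot_star=$(printf '%s\n' "$header" | sed 's/_GSV.*//g')

echo "$with_star"
echo "$with_dot_star"
```

The first line printed reproduces the unwanted result from the question and the second is the desired one.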

How to remove a string between a pattern using sed / awk command in linux [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
How can I remove the letter f from the below mentioned string in a file:
a;b;c;d;e;f;g;h;i;j;k;l;m
This needs to be done only by using delimiter ; using sed or awk.
The output will be:
a;b;c;d;e;g;h;i;j;k;l;m
This might work for you (GNU sed):
sed 's/[^;];//6' file
$ echo 'a;b;c;d;e;f;g;h;i;j;k;l;m' | sed 's/;*f;*/;/'
a;b;c;d;e;g;h;i;j;k;l;m
This is easier with Perl's in-place edit (perl -p -i -e) than with sed, unless your sed has an in-place flag (GNU sed provides -i):
perl -p -i -e 's/;f;/;/' fileName.txt
sed 's/f;//' YourFile
Be careful if f is only a placeholder in the example: a real pattern may contain characters that are special in a regex and would need escaping.
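If the real requirement is "drop the sixth ;-separated field" rather than "drop the letter f", a position-based version avoids the pattern-escaping problem. This is a sketch only; remove_field is a hypothetical helper name and the field number is passed as its first argument.

```shell
# remove_field N: read ;-separated lines on stdin and drop field N.
remove_field() {
    awk -F';' -v n="$1" '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == n) continue      # skip the unwanted field
            out = out sep $i
            sep = ";"                 # separator only between kept fields
        }
        print out
    }'
}

trimmed=$(echo 'a;b;c;d;e;f;g;h;i;j;k;l;m' | remove_field 6)
echo "$trimmed"
```

Because it works on field positions, it does not matter what the field contains.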

Extract E-mail Addresses from Text File (SED? AWK?) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a file of e-mail addresses harvested from Outlook so that the addresses in the harvested form show up like this:
-A#b.com
-C#d.com
-A#b.com,JOHN DOE, RICHARD ROE,"\O=USERS:SAM"
etc.
What I would like to end up with is a text file that has one validly formed address on each line. So A#b.com would be OK, but "RICHARD ROE" and the "\O=USERS,etc." would not be. Perhaps this could be done with SED or AWK?
Here's one way with GNU awk given your posted input file:
$ gawk -v RS='[[:alnum:]_.]+#[[:alnum:]_]+[.][[:alnum:]]+' 'RT{print RT}' file
A#b.com
C#d.com
A#b.com
It just finds simple email addresses, e.g. "bob#the_moon.net" or "Joe.Brown#google.com". Feel free to change the setting of RS if you can figure out an appropriate RE to capture the more esoteric email addresses that are allowed, or post a more representative input file if you have examples. Here's another RE that works by specifying which characters cannot be in the parts of an email address rather than those that can:
$ gawk -v RS='[^[:space:][:punct:]]+#[^[:space:][:punct:]]+[.][^[:space:][:punct:]]+' 'RT{print RT}' file
A#b.com
C#d.com
A#b.com
Again it works with your posted sample, but may not with others. Massage to suit...
With other awks you can do the same by setting FS or using match() and looping.
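For the match()-and-loop variant, something along these lines should work in any POSIX awk. It is a sketch, not a full address validator: extract_addresses is a hypothetical helper name, the character classes are spelled out as ranges for older awks, and the separator is written as # because that is how the sample addresses appear in this post (swap in the separator your real data uses).

```shell
# extract_addresses: print every simple address found on stdin, one per line.
extract_addresses() {
    awk '{
        line = $0
        # match() sets RSTART/RLENGTH to the leftmost-longest match.
        while (match(line, /[A-Za-z0-9_.]+#[A-Za-z0-9_]+[.][A-Za-z0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            # Continue searching in the text after this match.
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

addresses=$(printf '%s\n' '-A#b.com' '-C#d.com' \
    '-A#b.com,JOHN DOE, RICHARD ROE,"\O=USERS:SAM"' | extract_addresses)
echo "$addresses"
```

The loop is what lets it find more than one address on the same line, which a single match() call would miss.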
You can try:
awk -F, '{
    for (i=1; i<=NF; i++)
        if ($i ~ /#/)
            print $i
}' file
or like this:
awk -F, -f e.awk file
where e.awk is:
{
    for (i=1; i<=NF; i++)
        if ($i ~ /#/)
            print $i
}

How to double quote all fields in a text file? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm looking for a quick and efficient way to double quote all fields in tab delimited or comma separated text files.
Ideally, this would be a Perl one-liner that I can run from the command-line, but I'm open to any kind of solution.
Use Text::CSV:
perl -MText::CSV -e'
my $c = Text::CSV->new({always_quote => 1, binary => 1, eol => "\n"}) or die;
$c->print(\*STDOUT, $_) while $_ = $c->getline(\*ARGV)' <<'END'
foo,bar, baz qux,quux
apple,"orange",spam, eggs
END
Output:
"foo","bar"," baz qux","quux"
"apple","orange","spam"," eggs"
The always_quote option is the important one here.
If your file does not contain any double quoted strings containing the delimiter, you can use
perl -laF, -ne '$" = q(","); print qq("@F")'
awk -F, -v OFS='","' -v q='"' '{$0=q$0q;$1=$1}7' file
For example, comma-separated:
kent $ echo "foo,bar,baz"|awk -F, -v OFS='","' -v q='"' '{$0=q$0q;$1=$1}7'
"foo","bar","baz"
Tab-separated input works the same way.
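A sketch of how one helper could cover both delimiters, assuming (as above) that no field already contains quotes or the delimiter itself; for such input a real CSV parser is the safer route. quote_fields is a hypothetical name and the delimiter is passed as its first argument.

```shell
# quote_fields DELIM: wrap every DELIM-separated field on stdin in double quotes.
quote_fields() {
    awk -F"$1" -v OFS="$1" '{
        # Assigning to a field makes awk rebuild the record with OFS.
        for (i = 1; i <= NF; i++) $i = "\"" $i "\""
        print
    }'
}

tab=$(printf '\t')
csv_out=$(echo 'foo,bar,baz' | quote_fields ',')
tsv_out=$(printf 'foo\tbar\tbaz\n' | quote_fields "$tab")
echo "$csv_out"
echo "$tsv_out"
```

Passing the delimiter as an argument keeps the awk program identical for both file types.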