print the names which are in both files [closed]

Closed 10 years ago.
I have two files. I would like to print the names which are in both files.
file1
1dfg
4rte
aabd
hjgf
file2
4rte
2fgh
1dfg
desired output
1dfg
4rte

One way:
$ comm -12 <(sort file1) <(sort file2)
1dfg
4rte

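grep can do that, although without -F and -x each line of file2 is treated as a regular expression that may match anywhere in a line of file1: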
grep -f file2 file1
Results:
1dfg
4rte
However, awk may be more appropriate depending on the significance of whitespace:
awk 'FNR==NR { a[$0]; next } $0 in a' file2 file1
or, to compare only the first whitespace-separated field of each line:
awk 'FNR==NR { a[$1]; next } $1 in a' file2 file1
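To see how these variants differ, here is a minimal sketch on hypothetical data: the question's two files plus one extra line, x4rtey, added to file1 purely for illustration. It assumes a POSIX shell with mktemp available.

```shell
# Hypothetical data: the question's files, plus an extra line "x4rtey"
# in file1 that merely contains one of the names.
tmp=$(mktemp -d)
printf '%s\n' 1dfg 4rte aabd hjgf x4rtey > "$tmp/file1"
printf '%s\n' 4rte 2fgh 1dfg > "$tmp/file2"

# Each line of file2 is a regex that may match anywhere in a line.
loose=$(grep -f "$tmp/file2" "$tmp/file1")
# -F: fixed strings, -x: the whole line must match.
exact=$(grep -Fxf "$tmp/file2" "$tmp/file1")
# Whole-line lookup in an awk array gives the same result as -Fx.
by_awk=$(awk 'FNR==NR { a[$0]; next } $0 in a' "$tmp/file2" "$tmp/file1")

printf 'grep -f:\n%s\n\ngrep -Fxf:\n%s\n\nawk:\n%s\n' "$loose" "$exact" "$by_awk"
rm -rf "$tmp"
```

Here grep -f also reports x4rtey, while grep -Fxf and the awk version print only 1dfg and 4rte.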

Try:
grep -Fxf file2 file1
and if you want it printed without grep's match highlighting:
grep -Fxf file2 file1 | awk '{print $1}'

Another way, using awk:
awk 'FNR==NR{a[$1]+=1;next} a[$1]' file1.txt file2.txt

Related

Linux shell script, parsing each line [closed]

Closed 2 years ago.
I am facing a problem with my shell script (I'm using sh):
I have a file with multiple lines, some of which contain mail addresses, for example:
abcd
plm
name_aA.2isurnamec#Text.com -> this line satisfies the condition
random efgh
aaaaaa
naaame_aB.3isurnamec#Text.ro -> same (the annotations after -> are not part of the file)
I have used grep to filter the correct mail addresses like this:
grep -E '^[a-z][a-zA-Z_]*.[0-9][a-zA-Z0-9]+#[A-Z][A-Z0-9]{,12}.(ro|com|eu)$' file.txt
I have to write a shell script that checks the file and prints the following (for the above example it would look like this):
"Incorrect:" abcd
"Incorrect:" plm
"Correct:" name_aA.2isurnamec#Text.com
"Incorrect:" random efgh
"Incorrect:" aaaaaa
"Correct:" naaame_aB.3isurnamec#Text.ro
I want to solve this problem using grep or sed, while, if, pipes and so on; I don't want to use lists or other constructs.
I have tried something like this:
grep condition abc.txt | while read -r line ; do
echo "Processing $line"
# your code goes here
done
but it only prints the correct lines. I know that I can also print the lines that don't match the grep condition using -v, but I want to print the lines in the order they appear in the text file.
I'm having trouble parsing each line of the file, or maybe I don't need to parse the lines one by one; I really don't know how to solve it.
If you could help me I would appreciate it.
Thanks

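#!/bin/bash
# Anchored ERE; {0,12} is the portable spelling of {,12}.
pattern='^[a-z][a-zA-Z_]*\.[0-9][a-zA-Z0-9]+#[A-Z][A-Za-z0-9]{0,12}\.(ro|com|eu)$'
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r line; do
    # Skip empty lines.
    if [ -n "$line" ]; then
        if printf '%s\n' "$line" | grep -E -q "$pattern"; then
            printf '"Correct:" %s\n' "$line"
        else
            printf '"Incorrect:" %s\n' "$line"
        fi
    fi
done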
Invoke like this, assuming the bash script is called filter and the text file, text.txt: ./filter < text.txt.
Note that the full stops in the regular expression are escaped and that the domain name can contain lowercase letters (although, I think that your regex is too restrictive). Other characters are not escaped because the string is in single quotes.
while reads the standard input line by line into $line; the first if skips the empty lines; the second one checks $line against $pattern (-q suppresses grep output).
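The loop above starts one grep per input line. As an alternative, the same labelling can be sketched as a single sed process, which also keeps the original line order. This is only a sketch under assumptions: classify is a made-up helper name, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do).

```shell
# classify (hypothetical helper): read lines on stdin and label each
# non-empty one. The regex is the one used in the loop above.
classify() {
    sed -E \
        -e '/^$/d' \
        -e 's/^[a-z][a-zA-Z_]*\.[0-9][a-zA-Z0-9]+#[A-Z][A-Za-z0-9]{0,12}\.(ro|com|eu)$/"Correct:" &/' \
        -e 't' \
        -e 's/^/"Incorrect:" /'
}

# Example with a few lines from the question.
printf '%s\n' abcd 'name_aA.2isurnamec#Text.com' 'random efgh' | classify
```

The bare t command jumps to the end of the script when the first substitution succeeded, so only lines that did not match get the "Incorrect:" prefix.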

Should I use cut or awk to extract fields and field substrings?

I have a file with pipe-separated fields. I want to print a subset of field 1 and all of field 2:
cat tmpfile.txt
# 10 chars.|variable length num|text
ABCDEFGHIJ|99|U|HOMEWORK
JIDVESDFXW|8|C|CHORES
DDFEXFEWEW|73|B|AFTER-HOURS
I'd like the output to look like this:
# 6 chars.|variable length num
ABCDEF|99
JIDVES|8
DDFEXF|73
I know how to get fields 1 & 2:
cat tmpfile.txt | awk -F'|' '{print $1"|"$2}'
And know how to get the first 6 characters of field 1:
cat tmpfile.txt | cut -c 1-6
I know this is fairly simple, but what I can't figure out is how to combine the awk and cut commands.
Any suggestions would be greatly appreciated.

You could use awk. Use the substr() function to trim the first field:
awk -F'|' '{print substr($1,1,6),$2}' OFS='|' inputfile
For your input, it'd produce:
ABCDEF|99
JIDVES|8
DDFEXF|73
Using sed, you could say:
sed -r 's/^(.{6})[^|]*([|][^|]*).*/\1\2/' inputfile
to produce the same output.

You could use cut and paste, but then you have to read the file twice, which is a big deal if the file is very large:
paste -d '|' <(cut -c 1-6 tmpfile.txt ) <(cut -d '|' -f2 tmpfile.txt )

Just for another variation: awk -F\| -vOFS=\| '{print $1,$2}' t.in | cut -c 1-6,11-
Also, as tripleee points out, two cuts can do this too: cut -c 1-6,11- t.in | cut -d\| -f 1,2
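One caveat with the cut -c variations: they work by character position, so they rely on field 1 always being exactly 10 characters wide. A minimal sketch of the difference, using one row from the question and one hypothetical row whose first field has only 8 characters:

```shell
# Hypothetical input: the first row is from the question; the second has
# only 8 characters in field 1.
rows=$(printf '%s\n' 'ABCDEFGHIJ|99|U|HOMEWORK' 'ABCDEFGH|5|X|Y')

# Field-aware: trim field 1 to 6 characters, keep field 2 as is.
by_awk=$(printf '%s\n' "$rows" | awk -F'|' -v OFS='|' '{ print substr($1, 1, 6), $2 }')

# Position-based: assumes the first delimiter always sits in column 11.
by_cut=$(printf '%s\n' "$rows" | cut -c 1-6,11- | cut -d'|' -f 1,2)

printf 'awk:\n%s\n\ncut:\n%s\n' "$by_awk" "$by_cut"
```

For the second row the awk version prints ABCDEF|5, whereas the cut pipeline prints ABCDEF|X, because column 11 is no longer the first delimiter.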

I like a combination of cut and sed, but that's just a preference:
cut -f1-2 -d"|" tmpfile.txt|sed 's/\([A-Z]\{6\}\)[A-Z]\{4\}/\1/g'
Result:
# 10-digits|variable length num
ABCDEF|99
JIDVES|8
DDFEXF|73

removing columns with perl on tab delimited file [closed]

Closed 9 years ago.
I am trying to remove three columns with perl on a tabulated file.
Input file:
A B C D
Expected/new file:
A B C
I saw in other question how to remove only one column, the answer being:
perl.exe -na -e "print qq{$F[3]\n}" < input
How could I rewrite this to remove three columns?
Thanks

Does this work for you?
perl.exe -na -e "print qq{@F[0..2]\n}" < input > newfile

Use perl in awk-mode:
$ cat -T f1
a^Ib^Ic^Id^Ie^If
a^Ib^Ic^Id^Ie^If
a^Ib^Ic^Id^Ie^If
$ perl -F'\t' -lane 'print $F[0],"\t",$F[1],"\t",$F[2]' input
a b c
a b c
a b c
or space-separated:
$ perl -F'\t' -lane 'print qq{@F[0..2]}' input
a b c
a b c
a b c
or to print the first three columns, tab-separated in awk
$ awk 'BEGIN{OFS="\t"}{print $1, $2, $3}' input
a b c
a b c
a b c

perl -lane "pop @F; print qq(@F)" input

Here's another option (Perl v5.14+):
perl -lne "print s/.+\K\s+\S$//r" inFile
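If Perl is not a requirement, cut alone may be enough for tab-delimited input, since TAB is its default delimiter. A minimal sketch, where first_three is a made-up helper name and the choice of columns 1-3 follows the expected output in the question:

```shell
# first_three (hypothetical helper): keep the first three tab-separated
# columns of stdin or of the named files. TAB is cut's default delimiter.
first_three() {
    cut -f 1-3 "$@"
}

# Example: a single row with four tab-separated fields.
printf 'A\tB\tC\tD\n' | first_three
```

Unlike the whitespace-splitting Perl one-liners, this keeps empty fields and fields containing spaces intact, because only tabs separate columns.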

How to find specific number patterns in a data file [closed]

Closed 8 years ago.
I have a data file that looks like this
15105021
15105043
15106013
15106024
15106035
15105024
15105042
15106015
15106021
15106034
and I need to grep lines that have sequence numbers like 1510603, 1510504
I tried this awk command
awk /[1510603,1510504]/ soursefile.txt
but it does not work.

Using egrep and word boundary on LHS since OP wants to match all matching numbers on RHS:
egrep '\b(1510603|1510504)' file
15105043
15106035
15105042
15106034

A shorter awk:
awk '/1510603|1510504/' file

Based on the contents of your file the following should suffice
grep -E '^1510603|^1510504' file
If your grep version does not support the -E flag, try egrep instead of grep.
If you insist on awk:
awk '/^1510603/ || /^1510504/' file

I think this works:
egrep '1510603|1510504' source

Your question is very poorly stated, but if you want to print all numbers in the file that begin with either 1510603 or 1510504, then you can write this in Perl:
perl -ne 'print if /^1510(?:603|504)/' sourcefile.txt
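For completeness, the original attempt fails because square brackets form a bracket expression: [1510603,1510504] matches any single character out of 0, 1, 3, 4, 5, 6 or a comma, so every line of the file matches. A small sketch on the sample data from the question, comparing it with an anchored alternation:

```shell
# The ten sample lines from the question.
data=$(printf '%s\n' 15105021 15105043 15106013 15106024 15106035 \
                     15105024 15105042 15106015 15106021 15106034)

# Bracket expression: matches any line containing at least one of the
# characters 0, 1, 3, 4, 5, 6 or a comma.
class_count=$(printf '%s\n' "$data" | grep -c '[1510603,1510504]')

# Alternation anchored at the start of the line.
alt_count=$(printf '%s\n' "$data" | grep -Ec '^(1510603|1510504)')

printf 'bracket expression: %s lines, alternation: %s lines\n' "$class_count" "$alt_count"
```

The bracket expression selects all 10 lines, while the alternation selects the 4 lines that start with one of the two prefixes.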

Perl Pattern matching and appending [closed]

Closed 10 years ago.
In perl I want to achieve the following translation:
stmt1; gosub xyz;
to
stmt1; xyz();
How can I do this?

The answers already given provide an approximate solution; this one also deals with your edge cases (missing semicolons, additional clauses after semicolons).
perl -plwe 's/\bgosub\s+([^;]+)/$1()/g'
It matches the gosub keyword followed by whitespace, captures the run of non-semicolon characters that follows, and replaces the whole match with the captured text plus (). I also added the /g global modifier, as it seems likely that you'd want to do all possible replacements on a single line. Note the use of the word boundary \b to prevent partial matches, e.g. so that legosub is not replaced.
If the word boundary is not sufficient (for example, it will still replace 1.gosub, because . is not a word character and so a word boundary exists between . and g), you can use a negative lookbehind instead:
perl -plwe 's/(?<![^;\s])gosub\s+([^;]+)/$1()/g'
This requires that the character before gosub, if there is one, is a semicolon or whitespace. Because the lookbehind is a negated check on a negated class, it also succeeds when there is no preceding character at all, i.e. at the beginning of the line.
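The same idea can be sketched without Perl. sed has no lookbehind, so this version captures the preceding character (or the start of the line) and puts it back in the replacement. The helper name gosub_to_call is made up for illustration, and a sed that accepts -E is assumed.

```shell
# gosub_to_call (hypothetical helper): rewrite "gosub name" as "name()"
# when gosub starts the line or follows a semicolon or whitespace.
gosub_to_call() {
    sed -E 's/(^|[;[:space:]])gosub[[:space:]]+([^;]+)/\1\2()/g'
}

# The first line is rewritten; the second is left alone.
printf '%s\n' 'stmt1; gosub xyz;' 'legosub abc;' | gosub_to_call
```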

Run from the command line on the file you want to edit (replacing file.ext); -i.bk edits the file in place and keeps a backup with the .bk suffix:
perl -i.bk -pe 's/gosub (.*?);/$1();/g' file.ext

my $str = 'stmt1; gosub xyz;';
# \w+ forces the name to be captured; a lazy .*? before an optional ; would match nothing.
$str =~ s/gosub\s+(\w+);?/$1();/;
print $str;
