how to find a duplicate value between two files and print

I have two files: one with a single IP address (I have already used Perl to strip out the IP) and one that has IPs with more info. I need to compare them, or use Perl, to find the IP that appears in both files. The second file, with the extra info, must remain intact, and when a duplicate is found the entire line of the second file should be printed.
file1 content example (just ip no comma etc)
114.42.141.131
file2 content example (need all this info to print when match found)
114.42.141.131,Host TW,Taipei,25.0391998291,121.525001526
This is a little beyond my skills. Any help would be greatly appreciated!!!
Thank you!

To match on the first field, all you need is:
awk -F, 'FNR==NR { a[$1]; next } $1 in a' file1 file2
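For anyone new to the idiom, here is the same command spelled out against the sample line from the question. This is only a sketch: the temporary directory and the 8.8.8.8 line are made up for illustration.

```shell
# Scratch copies of the two files (the 8.8.8.8 line is invented sample data).
tmp=$(mktemp -d)
printf '%s\n' '114.42.141.131' > "$tmp/file1"
printf '%s\n' \
    '114.42.141.131,Host TW,Taipei,25.0391998291,121.525001526' \
    '8.8.8.8,Example host,Nowhere,0,0' > "$tmp/file2"

# FNR==NR is true only while awk reads the first file: remember each IP as an
# array key. For the second file, print the whole line when field 1 is a key.
matches=$(awk -F, 'FNR==NR { seen[$1]; next } $1 in seen' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"
rm -r "$tmp"
```

Only the 114.42.141.131 line is printed, because the comparison is an exact match on the first comma-separated field.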

I assume you have shell access.
If the first file contains only the IP, then you can do something like:
REF_IP=$(cat file1)
Then, you can use grep on the second file (-F makes grep treat the dots in the address literally instead of as "any character"):
grep -F -- "$REF_IP" file2
The result should be the line with the duplicated address.
Note: The actual syntax might be slightly different (I don't have access to a shell right now)
HTH
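If file1 ever holds more than one address, grep can read all of them as patterns with -f. A rough sketch with made-up data: -F stops the dots from acting as wildcards, and -w (a common extension rather than POSIX) keeps 1.1.1.1 from matching inside 11.1.1.10.

```shell
# Invented sample data: two wanted addresses and one lookalike.
tmp=$(mktemp -d)
printf '%s\n' '1.1.1.1' '2.2.2.2' > "$tmp/file1"
printf '%s\n' '1.1.1.1,first' '11.1.1.10,lookalike' '2.2.2.2,second' > "$tmp/file2"

# -f reads one fixed-string pattern per line of file1.
matches=$(grep -F -w -f "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"
rm -r "$tmp"
```

Unlike the awk version, this matches the address anywhere on the line, not just in the first field.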

Take a look at this one-liner and see if it is what you want.
Note: it only reports IPs that occur more than once in file1, it prints the matching file2 line only once, and it assumes there are no duplicated IPs in file2.
awk -F, 'NR==FNR{p[$1]=$0;next}{a[$0]++}END{for(x in a)if (a[x]>1)print p[x]}' file2 file1
little test:
kent$ head f1 f2
==> f1 <==
1.1.1.1
1.1.1.1
1.1.1.1
2.2.2.2
==> f2 <==
1.1.1.1,Host TW,Taipei,25.0391998291,121.525001526
2.2.2.2,this is for 2.2.
kent$ awk -F, 'NR==FNR{p[$1]=$0;next}{a[$0]++}END{for(x in a)if (a[x]>1)print p[x]}' f2 f1
1.1.1.1,Host TW,Taipei,25.0391998291,121.525001526

how to replace with sed when source contains $

I have a file that contains:
$conf['minified_version'] = 100;
I want to increment that 100 with sed, so I have this:
sed -r 's/(.*minified_version.*)([0-9]+)(.*)/echo "\1$((\2+1))\3"/ge'
The problem is that this strips the $conf from the original, along with any indentation spacing. What I have been able to figure out is that it's because it's trying to run:
echo " $conf['minified_version'] = $((100+1));"
so of course it's trying to replace the $conf with a variable which has no value.

Here is an awk version:
$ awk '/minified_version/{$3+=1} 1' file
$conf['minified_version'] = 101
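This looks for lines that contain minified_version. Any time such a line is found, the third field, $3, is incremented by 1. Note that the trailing semicolon is dropped, because awk converts "100;" to the number 100 before adding, and any leading indentation is lost, because assigning to a field makes awk rebuild the line.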
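A variant that keeps the semicolon and the indentation, offered as a sketch rather than a drop-in fix: it uses match() to locate the digits and rebuilds only that part of the line. The file name settings.php and the second line are invented for the example.

```shell
tmp=$(mktemp -d)
cat > "$tmp/settings.php" <<'EOF'
    $conf['minified_version'] = 100;
    $conf['other'] = 5;
EOF

result=$(awk '
/minified_version/ && match($0, /[0-9]+;/) {
    # RSTART/RLENGTH describe the matched digits plus the semicolon
    n = substr($0, RSTART, RLENGTH - 1) + 1
    # keep everything before the digits and everything from the semicolon on
    $0 = substr($0, 1, RSTART - 1) n substr($0, RSTART + RLENGTH - 1)
}
{ print }
' "$tmp/settings.php")
printf '%s\n' "$result"
rm -r "$tmp"
```

Because $0 is assigned as a whole, awk does not re-join the fields, so the original spacing survives.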

My suggested approach to this would be to have a file on-disk that contained nothing but the minified_version number. Then, incrementing that number would be as simple as:
minified_version=$(< minified_version)
printf '%s\n' "$(( minified_version + 1 ))" >minified_version
...and you could just put a sigil in your source file where that needs to be replaced. Let's say you have a file named foo.conf.in that contains:
$conf['minified_version'] = #MINIFIED_VERSION#
...then you could simply run, in your build process:
sed -e "s/#MINIFIED_VERSION#/$(<minified_version)/g" <foo.conf.in >foo.conf
This has the advantage that you never have code changing foo.conf.in, so you don't need to worry about bugs overwriting the file's contents. It also means that if you're checking your files into source control, so long as you only check in foo.conf.in and not foo.conf you avoid potential merge conflicts due to context near the version number changing.
Now, if you did want to do the native operation in-place, here's a somewhat overdesigned approach written in pure native bash (reading from infile and writing to outfile; just rename outfile back over infile when successful to make this an in-place replacement):
target='$conf['"'"'minified_version'"'"'] = '
suffix=';'
while IFS= read -r line; do
    if [[ $line = "$target"* ]]; then
        value=${line##*=}
        value=${value%$suffix}
        new_value=$(( value + 1 ))
        printf '%s\n' "${target}${new_value}${suffix}"
    else
        printf '%s\n' "$line"
    fi
done <infile >outfile

Match a string, skip if it has a . (DOT) in front of the result

Here's what I use to match a string in a variable and delete the line where the match exists:
sed -i '/'"$domainAndSuffix.cfg"'/d' /etc/file
I'd like to know how to match a string in a variable, but if the match in the file has a . adjacent to it on the immediate left, then it will NOT delete this line and keep going through the file until it finds a match without a .
Sample file Contents:
happy.domain.com
pappy.domain.com
domain.com
String to match:
domain.com
Desired File Output:
happy.domain.com
pappy.domain.com
*Edit:
Actual File Contents:
cfg_file=/etc/nagios/objects/http_url/bob.ca.cfg
cfg_file=/etc/nagios/objects/http_url/therecord.com.cfg
cfg_file=/etc/nagios/objects/http_url/events.therecord.com.cfg
cfg_file=/etc/nagios/objects/http_url/read.therecord.com.cfg
cfg_file=/etc/nagios/objects/http_url/wheels.ca.cfg
cfg_file=/etc/nagios/objects/http_url/used-vehicle-search.autos.ca.msn.com.cfg
cfg_file=/etc/nagios/objects/http_url/womensweekendshow.com.cfg
cfg_file=/etc/nagios/objects/http_url/yorkregion.com.cfg
cfg_file=/etc/nagios/objects/http_url/yourclassifieds.ca.cfg

If the preceding substring is fixed, you can try the following:
PREFIX='cfg_file=\/etc\/nagios\/objects\/http_url\/'
DOMAIN='therecord.com'
sed -i "/^${PREFIX}${DOMAIN}/d" file
If it is not fixed, it would be nice to use a negative lookbehind, but sed can't do that. You can use ssed or GNU grep:
ssed -Ri '/(?<!\.)'"$DOMAIN"'.cfg/d' file
or
grep -vP '(?<!\.)'"$DOMAIN" file > file1 && mv file1 file
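Without lookbehind support, the same effect can be approximated by requiring either the start of the line or a non-dot character before the domain. A sketch, assuming a sed that accepts -E (GNU and BSD sed do) and a domain with no regex metacharacters other than dots; the scratch file stands in for the real config.

```shell
DOMAIN='therecord.com'
tmp=$(mktemp -d)
printf '%s\n' \
    'cfg_file=/etc/nagios/objects/http_url/therecord.com.cfg' \
    'cfg_file=/etc/nagios/objects/http_url/events.therecord.com.cfg' \
    'cfg_file=/etc/nagios/objects/http_url/read.therecord.com.cfg' > "$tmp/file"

# Escape the dots so they only match literal dots.
escaped=$(printf '%s\n' "$DOMAIN" | sed 's/\./\\./g')

# Delete lines where the domain starts the line or follows a non-dot character.
kept=$(sed -E "/(^|[^.])${escaped}\.cfg\$/d" "$tmp/file")
printf '%s\n' "$kept"
rm -r "$tmp"
```

The events. and read. lines survive because the character before the domain is a dot.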

Sed/Awk script to append/insert?

I have a configuration file that looks like the example below. There is a series of definitions grouped by hostname. I just added the "cpu-service" definition to one host, "mothership". Now I need to do the same for 100+ more hosts in the same file. I have already scraped all the existing host names (100+) from the config file, so I have a file listing the servers that need the cpu-service definition. They already have ping-service, so I just want to add cpu-service to each one. Obviously, doing this by hand would be tedious.
Is there a sed/awk script I could use to do this type of work? Basically I think I need to write a skeleton file with the define part and host_name left blank, then feed the host.txt file into that. I could maybe hack this with some vi trickery as well. Not sure?
Thanks in advance!
define{
use cpu-service
host_name mothership
contact_groups systems manager
}
define{
use ping-service
host_name mothership
contact_groups systems manager
}

Although I have the slight feeling that I am doing your work for you, try the script below:
awk '
BEGIN {
RS = ORS = "}\n"
FS = "\n"
}
NF > 0 {
print
if (sub(/ping-service/, "cpu-service")) print
}
' file
One caveat: I get a stray trailing "}" at the end of the output. It is not worth worrying about unless you have to do this every day; just remove it with an editor.
As always with awk: this script relies on a multi-character RS, which POSIX leaves unspecified (gawk and mawk support it). If your vendor ships a historic version of awk, you may want to use nawk.
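If a multi-character RS is not available, a similar result can be had by collecting lines until the closing brace. This is a sketch under the assumption that every definition ends with a line containing only "}"; the host name alpha and the file name hosts.cfg are invented.

```shell
tmp=$(mktemp -d)
cat > "$tmp/hosts.cfg" <<'EOF'
define{
use ping-service
host_name alpha
contact_groups systems manager
}
EOF

result=$(awk '
{ block = block $0 "\n" }        # collect the lines of the current definition
$0 == "}" {                      # closing brace: the definition is complete
    printf "%s", block           # print it unchanged
    if (sub(/ping-service/, "cpu-service", block))
        printf "%s", block       # print a cpu-service copy right after it
    block = ""
}
' "$tmp/hosts.cfg")
printf '%s\n' "$result"
rm -r "$tmp"
```

As with the script above, a host that already has a cpu-service definition (mothership in the question) would end up with two, so those are best removed from the input first.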

Three steps mister:
1: Host name file (hostnames.txt)
mothership
motherload
motherofpeal
mothersbaugh
2: script (hostup.sh)
#!/bin/bash
HOSTNAME=$1
TEMPLATE="
define{
use cpu-service
host_name ${HOSTNAME}
contact_groups systems manager
}
define{
use ping-service
host_name ${HOSTNAME}
contact_groups systems manager
}"
echo "${TEMPLATE}"
3: command line
chmod +x hostup.sh
while read -r name; do ./hostup.sh "$name"; done < hostnames.txt
while read -r name; do ./hostup.sh "$name"; done < hostnames.txt >> hosts.conf
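The same idea also fits in a single script with a here-document. This sketch emits only the cpu-service block, on the assumption that the ping-service blocks already exist; make_cpu_service is a made-up helper name and the host list is shortened.

```shell
tmp=$(mktemp -d)
printf '%s\n' mothership motherload > "$tmp/hostnames.txt"

# Hypothetical helper: print one cpu-service definition for the host in $1.
make_cpu_service() {
    cat <<EOF
define{
use cpu-service
host_name $1
contact_groups systems manager
}
EOF
}

directives=$(while IFS= read -r name; do
    make_cpu_service "$name"
done < "$tmp/hostnames.txt")
printf '%s\n' "$directives"
rm -r "$tmp"
```

Review the output first, then append it to the real config file with >> once it looks right.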

Sed can insert newlines, just backslash escape them - e.g. the following will go through each line in your 'hosts' file, and replace it with a full definition for the cpu-service. I'm not sure if this is exactly what you want.
sed -e 's/^\(.*\)$/define{\
use cpu-service\
host_name \1\
contact_groups systems manager\
}/' hosts.txt > new_directives
If you're happy with new_directives, you can then just run:
cat new_directives >> config_file
NOTE you may get issues with blank/trailing newlines.

Substituting environment variables in a file: awk or sed?

I have a file of environment variables that I source in shell scripts, for example:
# This is a comment
ONE=1
TWO=2
THREE=THREE
# End
In my scripts, I source this file (assume it's called './vars') into the current environment, and change (some of) the variables based on user input. For example:
#!/bin/sh
# Read variables
source ./vars
# Change a variable
THREE=3
# Write variables back to the file??
awk 'BEGIN{FS="="}{print $1=$$1}' <./vars >./vars
As you can see, I've been experimenting with awk for writing the variables back, sed too. Without success. The last line of the script fails. Is there a way to do this with awk or sed (preferably preserving comments, even comments with the '=' character)? Or should I combine 'read' with string cutting in a while loop or some other magic? If possible, I'd like to avoid perl/python and just use the tools available in Busybox. Many thanks.
Edit: perhaps a use case might make clear what my problem is. I keep a configuration file consisting of shell environment variable declarations:
# File: network.config
NETWORK_TYPE=wired
NETWORK_ADDRESS_RESOLUTION=dhcp
NETWORK_ADDRESS=
NETWORK_ADDRESS_MASK=
I also have a script called 'setup-network.sh':
#!/bin/sh
# File: setup-network.sh
# Read configuration
source network.config
# Setup network
NETWORK_DEVICE=none
if [ "$NETWORK_TYPE" == "wired" ]; then
NETWORK_DEVICE=eth0
fi
if [ "$NETWORK_TYPE" == "wireless" ]; then
NETWORK_DEVICE=wlan0
fi
ifconfig -i $NETWORK_DEVICE ...etc
I also have a script called 'configure-network.sh':
#!/bin/sh
# File: configure-network.sh
# Read configuration
source network.config
echo "Enter the network connection type:"
echo " 1. Wired network"
echo " 2. Wireless network"
read -p "Type:" -n1 TYPE
if [ "$TYPE" == "1" ]; then
# Update environment variable
NETWORK_TYPE=wired
elif [ "$TYPE" == "2" ]; then
# Update environment variable
NETWORK_TYPE=wireless
fi
# Rewrite configuration file, substituting the updated value
# of NETWORK_TYPE (and any other updated variables already existing
# in the network.config file), so that later invocations of
# 'setup-network.sh' read the updated configuration.
# TODO
How do I rewrite the configuration file, updating only the variables already existing in the configuration file, preferably leaving comments and empty lines intact? Hope this clears things up a little. Thanks again.

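Part of your problem is that you can't have awk read from and write to the same file in one command: the shell truncates ./vars when it sets up the > redirection, before awk has read anything.
I prefer to rename the file before I rewrite it (but you can also write to a temporary file and then rename that).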
/bin/mv file file.tmp
awk '.... code ...' file.tmp > file
If your env file gets bigger, you'll see that it is getting truncated at the buffer size of your OS.
Also, don't forget that awk has a built-in array, ENVIRON. It is part of POSIX awk, so it is not limited to gawk (the standard on most Linux installations). You can create what you want from that:
awk 'END {
for (key in ENVIRON) {
print key "=" ENVIRON[key]
}
}' /dev/null
Of course you get everything in your environment, so maybe more than you want. But probably a better place to start with what you are trying to accomplish.
Edit
Most specifically
awk -F"=" '{
if ($1 in ENVIRON) {
printf("%s=%s\n", $1, ENVIRON[$1])
}
# else line not printed or add code to meet your situation
}' file > file.tmp
/bin/mv file.tmp file
Edit 2
I think your var=value assignments might need to be exported so that they are visible in awk's ENVIRON array.
AND
echo PATH=xxx| awk -F= '{print ENVIRON[$1]}'
prints the existing value of PATH.
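Putting those pieces together, here is a sketch that rewrites only the variables already present in the file and passes comments and blank lines through untouched. It assumes the values contain no whitespace, quotes, or other characters that would need shell quoting, and it works on a scratch copy of network.config.

```shell
tmp=$(mktemp -d)
cat > "$tmp/network.config" <<'EOF'
# File: network.config
NETWORK_TYPE=wired
NETWORK_ADDRESS_RESOLUTION=dhcp

NETWORK_ADDRESS=
EOF

set -a                      # export every variable assigned while sourcing
. "$tmp/network.config"
set +a
NETWORK_TYPE=wireless       # stays exported, so awk sees the new value

awk -F= '
/^[A-Za-z_][A-Za-z0-9_]*=/ && ($1 in ENVIRON) {
    print $1 "=" ENVIRON[$1]    # known variable: write its current value
    next
}
{ print }                       # comments, blank lines, anything else: unchanged
' "$tmp/network.config" > "$tmp/network.config.new" &&
    mv "$tmp/network.config.new" "$tmp/network.config"
cat "$tmp/network.config"
```

The set -a / set +a pair is what makes the sourced variables visible to awk without an export line per variable.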
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.

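I don't know exactly what you are trying to do, but if you are trying to change the value of the variable THREE (writing to a temporary file first, because printing to FILENAME while awk is still reading it is not safe):
awk -F"=" -v t="$THREE" '$1=="THREE" {$2=t} {print}' OFS="=" vars > vars.tmp && mv vars.tmp vars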

You can do this with just bash:
rewrite_config() {
    local filename="$1"
    local tmp
    tmp=$(mktemp) || return
    # if you want the header
    echo "# File: $filename" >> "$tmp"
    while IFS='=' read -r var value; do
        # skip blank lines and comments: they are not variable names
        case $var in ''|'#'*) continue ;; esac
        declare -p "$var" | cut -d ' ' -f 3-
    done < "$filename" >> "$tmp"
    mv "$tmp" "$filename"
}
Use it like
source network.config
# manipulate the variables
rewrite_config network.config
I use a temp file so that the existing config file stays in place for as long as possible.

Best way to parse this particular string using awk / sed?

I need to get a particular version string from a file (call it version.lst) and use it to compare another in a shell script. For example sake, the file contains lines that look like this:
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
.. and so on. Let's say I am trying to grab the first version (in this case, V1.000) from APP1. Obviously, the versions can change and I want this to be dynamic. What I have right now works:
var=`cat version.lst | grep " -- APP1" | grep -Eo 'V[0-9]\.[0-9]{3}'`
Pipe to grep will get the line containing APP1 and the second pipe to grep will get the version string. However, I hear grep is not the way to do this so I'd like to learn the best way using awk or sed. Any ideas? I am new to both and haven't found a tutorial easy enough to learn the syntax of it. Do they support egrep? Thanks!

Try this to get the complete version:
#!/bin/sh
app=APP1
var=$(awk -v "app=$app" '$NF == app {print $1}' version.lst)
or to get only the major version number, the last line could be:
var=$(awk -v "app=$app" '$NF == app {split($1,a,"."); print a[1]}' version.lst)
Using sed to get the complete version:
var=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" version.lst)
or this to get only the major version number:
var=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" version.lst)
Explanations:
The second AWK command:
-v "app=$app" - set an AWK variable equal to a shell variable
$NF == app - if the last field is equal to the contents of the variable (NF is the number of fields, so $NF is the contents of the NFth field)
{split($1,a,".") - then split the first field at the dot
print a[1] - and print the first part of the result of the split
The sed commands:
-n - don't print any output unless directed to
"/ $app\$/ - for any line that ends with (\$) a space followed by the contents of the shell variable $app (note that double quotes are used to allow the variable to be expanded, and that it's a good idea to escape the second dollar sign)
s/^\([^ ]*\).*/\1/p" - starting at the beginning of the line (^), capture (\( \)) the sequence of zero or more (*) non-space characters ([^ ]) (or non-dots in the second version), then match but don't capture the rest of the characters on the line (.*); replace the matched text (the whole line in this case) with the string that was captured (\1 refers to the first, and here only, capture group: the version number), and print it (p)
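To check the four commands explained above, they can be run against the sample lines from the question. A small sketch using a scratch copy of version.lst, with APP3 chosen so that the result differs from the other two entries:

```shell
tmp=$(mktemp -d)
printf '%s\n' \
    'V1.000 -- build date and other info here -- APP1' \
    'V1.000 -- build date and other info here -- APP2' \
    'V1.500 -- build date and other info here -- APP3' > "$tmp/version.lst"

app=APP3
full_awk=$(awk -v "app=$app" '$NF == app {print $1}' "$tmp/version.lst")
major_awk=$(awk -v "app=$app" '$NF == app {split($1,a,"."); print a[1]}' "$tmp/version.lst")
full_sed=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" "$tmp/version.lst")
major_sed=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" "$tmp/version.lst")

# Prints V1.500, V1, V1.500, V1 (one per line).
printf '%s\n' "$full_awk" "$major_awk" "$full_sed" "$major_sed"
rm -r "$tmp"
```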

If I understood correctly: egrep "APP1$" version.lst | awk '{print $1}'

$ awk '/^V1\.00.* APP1$/{print $NF}' version.lst
APP1
That regular expression matches lines that start with "V1.00", followed by any number of any other characters, ending with " APP1". The backslash in the middle there might be really important--it matches only ".", and so it excludes (probably corrupt) lines that might begin with, say, "V1a00". The space before "APP1" excludes things like "APP2_APP1".
"NF" is an automatically generated variable that contains the number of fields in the input line. It's also the number of the last field, so $NF is the contents of the last field, which here is the application name.
There are a couple of ways to prune off the "V1". Here's one way, although you and I might not be talking about quite the same thing.
$ awk '/^V1\.00.* APP1$/{print substr($1, 1, index($1, ".") - 1), $NF}' version.lst
V1 APP1
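To see what NF and $NF hold for one of the sample lines, a quick check (fields are split on whitespace by default, so each "--" counts as a field):

```shell
# Print the field count, the first field, and the last field of one sample line.
summary=$(printf '%s\n' 'V1.000 -- build date and other info here -- APP1' |
    awk '{ print NF, $1, $NF }')
printf '%s\n' "$summary"    # 10 V1.000 APP1
```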