Can awk help detect sequence in a file? - sed

OK, this is an easy one for the pros here, yet the answer has eluded me thus far...
Contents of my file contain comma separated values from which I need to extract a number. My problem is the sequence of surrounding values is important to return the correct number.
Example file contents:
car, 00, tar
foo, 01, bar
bar, 02, foo
foo, 04, car
Perhaps using awk or sed, help me to assign variable var to 01 based on the fact that foo appears before bar in the line. Assigning it to 02 would be wrong since bar appears before foo.
Apologies in advance if this is duplicate... I did search here and several other places online; thanks in advance! Also, I'm still trying to get the formatting correct using the various code and tag parms.

Not sure if I follow your question, but searching for a pattern and returning a specific column value can be done easily with awk.
You can play around with the following idea:
$ cat file
car, 00, tar
foo, 01, bar
bar, 02, foo
$ var=$(awk -F', ' '$1=="foo"{print $2}' file)
$ echo "$var"
01
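Note that the command above only tests the first field, so with the extra line from the question (foo, 04, car) it would return two values. A minimal sketch that also requires bar in the third field is shown below; the temporary file is only there to make the example self-contained.

```shell
# Sample data from the question, written to a temporary file for the demo
file=$(mktemp)
printf '%s\n' 'car, 00, tar' 'foo, 01, bar' 'bar, 02, foo' 'foo, 04, car' > "$file"

# Match only lines where foo is the first field AND bar is the third,
# print the middle field and stop at the first hit
var=$(awk -F', ' '$1 == "foo" && $3 == "bar" { print $2; exit }' "$file")
echo "$var"    # prints 01

rm -f "$file"
```

If the fields can contain irregular spacing, a regex separator such as -F' *, *' may be safer; that is an assumption about the data, not something the question states.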

Related

use perl to extract specific output lines

I'm endeavoring to create a system to generalize rules from input text. I'm using reVerb to create my initial set of rules. Using the following command[*], for instance:
$ echo "Bananas are an excellent source of potassium." | ./reverb -q | tr '\t' '\n' | cat -n
To generate output of the form:
1 stdin
2 1
3 Bananas
4 are an excellent source of
5 potassium
6 0
7 1
8 1
9 6
10 6
11 7
12 0.9999999997341693
13 Bananas are an excellent source of potassium .
14 NNS VBP DT JJ NN IN NN .
15 B-NP B-VP B-NP I-NP I-NP I-NP I-NP O
16 bananas
17 be source of
18 potassium
I'm currently piping the output to a file, which includes the preceding white space and numbers as depicted above.
What I'm really after is just the simple rule at the end, i.e. lines 16, 17 & 18. I've been trying to create a script to extract just that component and put it to a new file in the form of a Prolog clause, i.e. be source of(bananas, potassium).
Is that feasible? Can Prolog rules contain white space like that?
I think I'm locked into getting all that output from reVerb so, what would be the best way to extract the desirable component? With a Perl script? Or maybe sed?
*Later I plan to replace this with a larger input file as opposed to just single sentences.
This seems wasteful. Why not leave the tabs as they are, and use:
$ echo "Bananas are an excellent source of potassium." \
| ./reverb -q | cut --fields=16,17,18
And yes, you can have rules like this in Prolog. See the answer by @mat. You need to know a bit of Prolog before you move on, I guess.
It is easier, however, to just make the string a valid name for a predicate:
be_source_of with underscores instead of spaces
or 'be source of' with spaces, and enclosed in single quotes.
You can probably use awk to do what you want with the three fields. See for example the printf command in awk. Or, you can parse it again from Prolog directly. Both are beyond the scope of your current question, I feel.
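As a rough sketch of the awk printf idea: the line below is a hand-built stand-in for one line of tab-separated reVerb output (the 18 fields listed in the question), not real tool output, and the choice of underscores for the predicate name is just one of the two options mentioned above.

```shell
# Hypothetical stand-in for one line of tab-separated reVerb output
# (the 18 fields shown in the question, joined with tabs)
line=$(printf '%s\t' stdin 1 Bananas 'are an excellent source of' potassium \
  0 1 1 6 6 7 0.9999999997341693 \
  'Bananas are an excellent source of potassium .' \
  'NNS VBP DT JJ NN IN NN .' \
  'B-NP B-VP B-NP I-NP I-NP I-NP I-NP O' \
  bananas 'be source of' potassium)

# Field 17 is the normalized relation, fields 16 and 18 are its arguments
clause=$(printf '%s\n' "$line" | awk -F'\t' '{
  rel = $17
  gsub(/ /, "_", rel)    # be source of -> be_source_of
  printf "%s(%s, %s).\n", rel, $16, $18
}')
echo "$clause"    # prints be_source_of(bananas, potassium).
```

Arguments that contain spaces or start with an uppercase letter would still need single quotes to be valid Prolog atoms; this sketch does not handle that.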
sed -n ':cycle
$!{N
4,$D
b cycle
}
s/\(.*\)\n\(.*\)\n\(.*\)/\2 (\1,\3)/p' YourFile
If the numbers are really in the output and not just there for reference, change the last sed action to
s/^[[:space:]]*[0-9]\{1,\}[[:space:]]\{1,\}\(.*\)\n[[:space:]]*[0-9]\{1,\}[[:space:]]\{1,\}\(.*\)\n[[:space:]]*[0-9]\{1,\}[[:space:]]\{1,\}\(.*\)/\2 (\1,\3)/p
assuming the last 3 lines are the source of your "rules"
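For comparison, here is a sketch of the same "last three lines" idea split into small steps with tail, sed and awk. The numbered input is a hand-made imitation of the saved cat -n output, so the exact spacing is an assumption.

```shell
# Hypothetical stand-in for the end of the saved, numbered reVerb output
numbered=$(printf '%6d\t%s\n' \
  15 'B-NP B-VP B-NP I-NP I-NP I-NP I-NP O' \
  16 bananas \
  17 'be source of' \
  18 potassium)

# Keep the last three lines, drop the leading line numbers,
# then reorder them as: relation (arg1,arg2)
rule=$(printf '%s\n' "$numbered" |
  tail -n 3 |
  sed 's/^[[:space:]]*[0-9]\{1,\}[[:space:]]\{1,\}//' |
  awk 'NR == 1 { a = $0 }
       NR == 2 { rel = $0 }
       NR == 3 { printf "%s (%s,%s)\n", rel, a, $0 }')
echo "$rule"    # prints be source of (bananas,potassium)
```

This only works for a single sentence per file; with a larger input, each record would have to be processed separately.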
Regarding the Prolog part of the question:
Yes, Prolog facts can contain whitespace like this, with suitable operator declarations present.
For example:
:- op(700, fx, be).
:- op(650, fx, source).
:- op(600, fx, of).
Example query and its result, to let you see the shape of terms that are created with this syntax:
?- write_canonical(be source of(a, b)).
be(source(of(a,b))).
Therefore, with these operator declarations, a fact like:
be source of(a, b).
is exactly the same as stating:
be(source(of(a,b))).
Depending on use cases and other definitions, it may even be an advantage to create this kind of facts (i.e., facts of the form be/1 instead of source_of/2). If this is the only kind of facts you need, you can simply write:
source_of(a, b).
This creates no redundant wrappers and is easier to use.
Or, as Boris suggested, you can use single quotes as in 'be source of'/2.

use identifying symbols to identify and edit line/string, then append line/string to previous line in file

Using standard linux utilities (sed and awk, I am guessing)
Sorry about the vague title, I don't really know how to describe the request much better. An easier way to do so is to provide a simple example. I have a file with the following content:
www.example.com
johnsmith#gmail.com
fredflintstone#gmail.com
bettyboop#gmail.com
www.example2.com
kylejohnson#gmail.com
www.example3.com
chadbrown#gmail.com
joshbeck#gmail.com
www.example4.com
tomtom#gmail.com
jeffjeffries#gmail.com
billnorman#gmail.com
stankubrick#gmail.com
andrewanders#gmail.com
So, what I want to do is convert the above to:
www.example.com,johnsmith#gmail.com,fredflintstone#gmail.com,bettyboop#gmail.com
www.example2.com,kylejohnson#gmail.com
www.example3.com,chadbrown#gmail.com,joshbeck#gmail.com
www.example4.com,tomtom#gmail.com,jeffjeffries#gmail.com,billnorman#gmail.com,stankubrick#gmail.com,andrewanders#gmail.com
I am thinking that the easiest thing to do would be to execute something along the lines of: if the line contains an "#" symbol, input a comma at the beginning of the line/string and then append that line/string to the preceding line. Anyone have any ideas? It would be simpler, I think, if there were a uniform number of email addresses associated with each website, but this is not the case.
Thanks in advance!
A simple approach
awk '{s=/#/?",":"\n";printf s"%s",$0}' file
www.example.com,johnsmith#gmail.com,fredflintstone#gmail.com,bettyboop#gmail.com
www.example2.com,kylejohnson#gmail.com
www.example3.com,chadbrown#gmail.com,joshbeck#gmail.com
s=/#/?",":"\n" sets s to "," if the line contains #, otherwise to "\n" (newline).
printf s"%s",$0 prints $0 with s prepended to the format. If the line contains #, it prints a comma and then $0; if not, it prints a newline and then $0.
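The one-liner above starts its output with a newline and ends without one. A small variant that avoids both might look like this; the input is a shortened copy of the sample data from the question.

```shell
# Lines containing # are appended to the current line with a comma;
# every other line starts a new output line (no leading blank line)
out=$(printf '%s\n' \
    'www.example.com' 'johnsmith#gmail.com' 'fredflintstone#gmail.com' \
    'www.example2.com' 'kylejohnson#gmail.com' |
  awk '/#/ { printf ",%s", $0; next }
       { printf "%s%s", (NR > 1 ? "\n" : ""), $0 }
       END { if (NR) print "" }')
echo "$out"
```

Passing the separator as a printf argument rather than as part of the format string also keeps any % characters in the data from being interpreted.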
Try this awk program:
/^[[:space:]]*www\./ {
    if (f) {print line}
    f=1; line=$0
    next
}
f {
    line=(line "," $0)
}
END {
    if (f) {print line}    # print the last group
}

How do I parse this file and store it in a table?

I have to parse a file and store it in a table. I was asked to use a hash to implement this. Give me simple means to do that, only in Perl.
-----------------------------------------------------------------------
L1234| Archana20 | 2010-02-12 17:41:01 -0700 (Mon, 19 Apr 2010) | 1 line
PD:21534 / lserve<->Progress good
------------------------------------------------------------------------
L1235 | Archana20 | 2010-04-12 12:54:41 -0700 (Fri, 16 Apr 2010) | 1 line
PD:21534 / Module<->Dir,requires completion
------------------------------------------------------------------------
L1236 | Archana20 | 2010-02-12 17:39:43 -0700 (Wed, 14 Apr 2010) | 1 line
PD:21534 / General Page problem fixed
------------------------------------------------------------------------
L1237 | Archana20 | 2010-03-13 07:29:53 -0700 (Tue, 13 Apr 2010) | 1 line
gTr:SLC-163 / immediate fix required
------------------------------------------------------------------------
L1238 | Archana20 | 2010-02-12 13:00:44 -0700 (Mon, 12 Apr 2010) | 1 line
PD:21534 / Loc Information Page
------------------------------------------------------------------------
I want to read this file and I want to perform a split or whatever to extract the following fields in a table:
the id that starts with L should be the first field in a table
Archana20 must be in the second field
timestamp must be in the third field
PD must be in the fourth field
Type (content preceding / must be in the last field)
My questions are:
How to ignore the --------… (separator line) in this file?
How to extract the above?
How to split since the file has two delimiters (|, /)?
How to implement it using a hash and what is the need for this?
Please provide some simple means so that I can understand since I am a beginner to Perl.
My questions are:
How to ignore the --------… (separator line) in this file?
How to extract the above?
How to split since the file has two delimiters (|, /)?
How to implement it using a hash and what is the need for this?
You will probably be working through the file line by line in a loop. Take a look at perldoc -f next. You can use regular expressions or a simpler match in this case, to make sure that you only skip appropriate lines.
You need to split first and then handle each field as needed after, I would guess.
Split on the primary delimiter (which appears to be ' | ' - more on that in a minute), then split the final field on its secondary delimiter afterwards.
I'm not sure if you are asking whether you need a hash or not. If so, you need to pick which item will provide the best set of (unique) keys. We can't do that for you since we don't know your data, but the first field (at a glance) looks about right. As for how to get something like this into a more complex data structure, you will want to look at perldoc perldsc eventually, though it might only confuse you right now.
One other thing, your data above looks like it has a semi-important typo in the first line. In that line only, there is no space between the first field and its delimiter. Everywhere else it's ' | '. I mention this only because it can matter for split. I nearly edited this, but maybe the data itself is irregular, though I doubt it.
I don't know how much of a beginner you are to Perl, but if you are completely new to it, you should think about a book (online tutorials vary widely and many are terribly out of date). A reasonably good introductory book is freely available online: Beginning Perl. Another good option is Learning Perl and Intermediate Perl (they really go together).
When you say "This is not a homework...to mean this will be a start to assess me in perl", I assume you mean that this is perhaps the first assignment you have at a new job or something, in which case it seems that if we just give you the answer it will actually harm you later, since they will assume you know more about Perl than you do.
However, I will point you in the right direction.
A. Don't use split, use regular expressions. You can learn about them by googling "perl regex"
B. Google "perl hash" to learn about perl hashes. The first result is very good.
Now to your questions:
regular expressions will help you ignore lines you don't want
regular expressions will extract items. Look up "capture variables"
Don't split, use regex
See point B above.
If this file is line based then you can do a line by line based read in a while loop. Then skip those lines that aren't formatted how you wish.
After that, you can use regex as indicated in the other answer. I'd use that to split it up, get an array, and build a hash of lists for the record. Either after that or before, clean up each record by trimming whitespace etc. If you use regex, then use the capture expressions to add to your list in that fashion. It's up to you.
The hash key is the first column, the list contains everything else. If you are just doing a direct insert, you can get away with a list of lists and just put everything in that instead.
The key for the hash would allow you to look at particular records for fast lookup. But if you don't need that, then an array would be fine.
You can try this one.
Points to know:
read the file line by line
use a regular expression to skip the '----' lines
after that, use the split function to populate a hash of arrays.
#!/usr/bin/perl
use strict;
use warnings;

my $test_file = 'test.txt';
open(my $in, '<', $test_file) or die "Cannot open $test_file: $!";
my (%seen, $id, $name, $timestamp, $PD, $type);
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^-/ or $line !~ /\S/;    # skip '---' separators and blank lines
    if ($line =~ /\|/) {
        # header line: id | name | timestamp | line count
        ($id, $name, $timestamp) = split /\s*\|\s*/, $line, 4;
    } else {
        # detail line: PD / type
        next unless defined $id;               # ignore details without a header
        ($PD, $type) = split m{\s*/\s*}, $line, 2;
        $seen{$id} = [$name, $timestamp, $PD, $type];    # hash of arrays, keyed by id
    }
}
close($in);
for my $key (sort keys %seen) {
    print "$key: @{ $seen{$key} }\n";
}

zsh filename globbing/substitution

I am trying to create my first zsh completion script, in this case for the command netcfg.
Lame as it may sound I have stuck on the first hurdle, disclaimer, I know how to do this crudely, however I seek the "ZSH WAY" to do this.
I need to list the files in /etc/network.d, but only the file names, not the directory component, so I do the following:
echo $(ls /etc/network.d/*(.))
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
What I wanted was:
ethernet-dhcp wireless-wpa-config
So I try (excuse my naivety):
echo ${(s/*\/)$(ls /etc/network.d/*(.))}
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
It seems that this doesn't work. I'm sure there must be some clever way of doing this by splitting into an array and getting the last part, but as I say, I'm a complete noob at this.
Any advice gratefully received.
General note: There is no need to use ls to generate the filenames. You might as well use echo some*glob. But if you want to protect the possible embedded newline characters even that is a bad idea. The first example below globs directly into an array to protect embedded newlines. The second one uses printf to generate NUL terminated data to accomplish the same thing without using a variable.
It is easy to do if you are willing to use a variable:
typeset -a entries
entries=(/etc/network.d/*(.)) # generate the list
echo ${entries#/etc/network.d/} # strip the prefix from each one
You can also do it without a variable, but the extra stuff to isolate individual entries is a bit ugly:
# From the inside, to the outside:
# * glob the entries
# * NUL terminate them into a single string
# * split at NUL
# * strip the prefix from each one
echo ${${(0)"$(printf '%s\0' /etc/network.d/*(.))"}#/etc/network.d/}
Or, if you are going to use a subshell anyway (i.e. the command substitution in the previous example), just cd to the directory so it is not part of the glob expansion (plus, you do not have to repeat the directory name):
echo ${(0)"$(cd /etc/network.d && printf '%s\0' *(.))"}
Chris Johnsen's answer is full of useful information about zsh, however it doesn't mention the much simpler solution that works in this particular case:
echo /etc/network.d/*(:t)
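This uses the t history modifier (tail, i.e. the last path component) as a glob qualifier. To keep only regular files, as in the question, it can be combined with the . qualifier: /etc/network.d/*(.:t)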
Thanks for your suggestions guys, having done yet more reading of ZSH and coming back to the problem a couple of days later, I think I've got a very terse solution which I would like to share for your benefit.
echo ${$(print /etc/network.d/*(.)):t}
I'm used to seeing basename(1) stripping off directory components; also, you can use echo /etc/network/* to get the file listing without running the external ls program. (Running external programs can slow down completion more than you'd like; I didn't find a zsh-builtin for basename, but that doesn't mean that there isn't one.)
Here's something I hope will help:
haig% for f in /etc/network/* ; do basename $f ; done
if-down.d
if-post-down.d
if-pre-up.d
if-up.d
interfaces
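If the only goal is to avoid the external basename call, the standard ${f##*/} parameter expansion strips everything up to the last slash and works in zsh, bash and plain sh. A small sketch, using a temporary directory with made-up contents in place of /etc/network.d:

```shell
# Temporary directory standing in for /etc/network.d (names taken from the question)
dir=$(mktemp -d)
touch "$dir/ethernet-dhcp" "$dir/wireless-wpa-config"
mkdir "$dir/subdir"

names=""
for f in "$dir"/*; do
  [ -f "$f" ] || continue                    # regular files only
  names="${names:+$names }${f##*/}"          # strip the directory part, no basename needed
done
echo "$names"    # prints ethernet-dhcp wireless-wpa-config

rm -rf "$dir"
```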

BASH: How do you "split" the date command?

Cygwin user here (though if there's a suitable solution I will carry it over to K/Ubuntu, which I also use).
I have a Welcome message in my .bashrc that looks like the following:
SAD=(`date +%A-%B-%d-%Y`)
DUB=(`date -u +%k:%M`)
printf "Today's Date is: ${SAD}.\n"
printf "Dublin time is now ${DUB}. (24-Hour Clock)\n"
After numerous attempts to use whitespaces in the variable SAD above, I gave in and used hyphens. But I am, understandably, not satisfied with this band-aid solution. The problem, as I recall, was that every time I tried using quoted space, \s or some similar escape tag, along with the variables listed in the appropriate section of the GNU manpage for date, the variable for Year was either ignored or returned an error. What I don't want to have to do is resort to the full string as returned by date, but rather to keep the order in which I have things in the code above.
As I write this, it occurs to me that setting the IFS around this code for the Welcome message may work, provided I return it to defaults afterwards (the above appears at lines 13-17 of a 68-line .bashrc). However, I can't recall how to do that, nor am I sure that it would work.
Generally-speaking, .bashrc files are in valid BASH syntax, aren't they? Mine certainly resemble the scripts I've either written myself or tested from other sources. All I'm missing, I suppose, is the code for setting and unsetting the field separators around this message block.
No doubt anyone who comes up with a solution will be doing a favor not only to me, but to any other relative newbies on the Net taking their first (or thirteenth through seventeenth) steps in the world of shell scripting.
BZT
Putting
SAD=$(date "+%A %B %d %Y")
DUB=$(date -u +%k:%M)
echo "Today's Date is: ${SAD}."
echo "Dublin time is now ${DUB}. (24-Hour Clock)"
in my .bash_profile prints
Today's Date is: Thursday February 18 2010.
Dublin time is now 12:55. (24-Hour Clock)
I think that's what you want.
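A slightly more defensive sketch of the same idea is below. It passes the values to printf as arguments instead of embedding them in the format string, and it asks for Dublin time through the TZ variable, since date -u gives UTC, which differs from Dublin time during summer. The zone name Europe/Dublin assumes the system has the usual time zone database installed.

```shell
# Quote the format so date receives it as a single argument, spaces included
SAD=$(date "+%A %B %d %Y")
# Assumption: the tz database provides Europe/Dublin on this system
DUB=$(TZ=Europe/Dublin date "+%H:%M")

# Values go in as printf arguments, so a stray % in them cannot break the format
printf "Today's Date is: %s.\n" "$SAD"
printf 'Dublin time is now %s. (24-Hour Clock)\n' "$DUB"
```

Note that %H zero-pads the hour, whereas the %k used above pads with a space.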
the problem is your array declaration.
SAD=(date +%A-%B-%d-%Y) just means you are putting the string "date" as element 0, and "+%A-%B-%d-%Y" as element 1. see for yourself
$ SAD=(date +%A-%B-%d-%Y) #<-- this is an array declaration
$ echo ${SAD[0]}
date
$ echo ${SAD[1]}
+%A-%B-%d-%Y
if you want the value of "date" command to be in a variable, use $(..), eg
$ SAD=$(date +%A-%B-%d-%Y)
$ echo ${SAD}
Thursday-February-18-2010
To get spaces, you need to quote the argument to date so that it's a single string. You're also erroneously declaring SAD and DUB as arrays, when what you really meant to do was evaluate them. Try this:
[/tmp]> echo "$(date "+%A %B %d, %Y")"
Thursday February 18, 2010
date +%A\ %B\ %d\ %Y
I found the combination that works:
SAD=$(date "+%A %B %d %Y")
echo $SAD
Thursday February 18 2010
Yet another instance when:
It pays to ask a question
It helps to know where to put your double quotes.
date obviously does not know from quoted space, but Bash does, so
it's a matter of "whispering in the right ear."
Thank you ghostdog74.
BZT