Read from csv file and write output to a file - perl

I am new to Perl and would like your help with the following scenario.
I have a CSV file with the following content, and I am trying to build key-value pairs from it.
Line 1: List,ID
Line 2: 1,2,3
Line 3: 4,5,6
Line 4: List,Name
Line 5: Tom, Peter, Joe
Line 6: Jim, Harry, Tim
I need to format the above CSV file to get an output in a new file like below:
Line 1: ID:1,2,3 4,5,6
Line 2: Name:Tom,Peter,Joe Jim, Harry, Tim
Can you please direct me on how I can use Perl functions for this scenario.

You're in luck, this is extremely easy in Perl.
There's a great library called Text::CSV which is available on CPAN, docs are here: https://metacpan.org/pod/Text::CSV
The synopsis at the top of the page gives a really good example which should let you do what you want with minor modifications.

I don't think the issue here is the CSV format so much as the fact that you have different lists broken up with header lines. I haven't tried this code yet, but I think you want something like the following:
while (<>) {                                    # Loop over stdin one line at a time
    chomp;                                      # Strip off trailing newline
    my ($listToken, $listName) = split(',');
    next unless $listToken;                     # Skip over blank lines
    if ($listToken =~ /^List/) {                # This is a header row
        print "\n$listName: ";                  # End previous list, start new one
    } else {                                    # The current list continues
        print "$_ ";                            # Append the entire row to the output
    }
}
print "\n";                                     # Terminate the last line
Note that this file format is a little dubious, as there is no way to have a data row where the first value is the literal "List". However, I'm assuming that either you have no choice in file format or you know that List is not a legal value.
(Note - I fixed a mistake where I used $rest as a variable; that was caused by my renaming them as I went along and missing one)
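For comparison, here is a minimal sketch of the same idea that collects the rows first and then emits one line per list, which avoids the leading blank line. The helper name group_lists is made up for illustration, and it assumes, as above, that any row starting with "List," is a header.

```perl
use strict;
use warnings;

# Hypothetical helper: group data rows under the most recent "List,<name>" header.
sub group_lists {
    my (@lines) = @_;
    my (@order, %rows, $current);
    for my $line (@lines) {
        chomp(my $row = $line);
        next unless length $row;                  # skip blank lines
        if ($row =~ /^List,\s*(.+?)\s*$/) {       # header row starts a new list
            $current = $1;
            push @order, $current unless exists $rows{$current};
            $rows{$current} //= [];
        }
        elsif (defined $current) {                # data row belongs to the current list
            push @{ $rows{$current} }, $row;
        }
    }
    # One output line per list, rows separated by a single space
    return map { $_ . ':' . join(' ', @{ $rows{$_} }) } @order;
}

my @input = ("List,ID\n", "1,2,3\n", "4,5,6\n",
             "List,Name\n", "Tom, Peter, Joe\n", "Jim, Harry, Tim\n");
print "$_\n" for group_lists(@input);
```

To work with real files, the lines could come from a filehandle opened on the CSV file, and the print could go to a second filehandle opened for writing.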

Related

Using perl to split over multiple lines

I'm trying to write a perl script to process a log4net log file. The fields in the log file are separated by a semi-colon. My end goal is to capture each field and populate a mysql table.
Usually I have lines that look a little like this (all on a single line)
DEBUG;2017-06-13T03:56:38,316-05:00;2017-06-13 08:56:38,316;79ab0b95-7f58-44a8-a2c6-1f8feba1d72d;(null);WorkerStartup 1;"Starting services."
These are easy to process. I can simply split by semicolon to get the information I need.
However, occasionally the "message" field at the end may span several lines, especially if there is a stack trace. I want to capture the entire message as a single column. I cannot simply split by semicolon, because the following lines typically look like:
at some.random.classname
at another.classname
...
Can someone give some tips how to solve this problem?
The following solution relies on a complete record containing an even number of " characters. The test $p=~y/"//%2 is true when the count is odd, which means a quoted field is still open; it can be replaced by any other condition that indicates the record is not complete.
The number of columns produced by split is fixed at 7 (to allow ; in the last field). This can be changed, for example to @array = map {s/;$//r} $p=~/\G(?:"[^"]*"|[^;])*;/g;.
The file is read line by line, but a record is only passed to the process sub once it is complete. The $p variable stores the record accumulated so far, and the last record is processed in the END block.
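perl -ne '
sub process {
    return unless length $p;        # nothing accumulated yet (first line)
    @array = split /;/, $p, 7;
    # do something with @array
    print((join "\n---\n", @array), "\n");
}
if ($p =~ y/"// % 2) {              # odd number of quotes: record is incomplete
    $p .= $_;
    next;
}
process;
$p = $_;
END { process }
' < logfile.txt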
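The same quote-counting idea can also be written as a small script, which may be easier to extend towards the MySQL insert. This is only a sketch: the name merge_records is hypothetical, and it assumes that a multi-line message is always enclosed in double quotes and contains no unbalanced quotes of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: join physical lines into logical records.
# A record is considered complete once it holds an even number of " characters.
sub merge_records {
    my (@lines) = @_;
    my @records;
    my $pending = '';
    for my $line (@lines) {
        $pending .= $line;
        # tr/"// counts the quotes; an odd count means a quoted field is still open
        next if ($pending =~ tr/"//) % 2;
        push @records, $pending;
        $pending = '';
    }
    push @records, $pending if length $pending;   # keep an unterminated tail
    return @records;
}

my @lines = (
    qq{ERROR;t1;t2;id;(null);Worker 1;"Something failed\n},
    qq{   at some.random.classname\n},
    qq{   at another.classname"\n},
    qq{DEBUG;t1;t2;id;(null);Worker 1;"Starting services."\n},
);
for my $record (merge_records(@lines)) {
    chomp $record;
    my @field = split /;/, $record, 7;   # limit of 7 keeps any ; inside the message
    print scalar(@field), " fields, level $field[0]\n";
}
```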

Iteration to Match Line Patterns from Text File and Then Parse out N Lines

I have a text file that contains three columns. Using perl, I'm trying to loop through the text file and search for a particular pattern...
Logic: IF column2 = 00z24aug2016 & column3 = e01. When this pattern is matched, I need to write the matched line and the next 3 lines to new files.
Text File:
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02
Desired Output...
New File 1:
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
New File 2:
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02
Based on your comment in response to zdim and Borodin, it appears that you're asking for pointers on how to do this with Perl rather than actual working code, so I am answering on that basis.
What you describe in the "logic" portion of your question is extremely simple and straightforward to do in Perl - the actual code would be far shorter than this description of it:
Start your program with use strict; use warnings; - this will catch most common errors and make debugging vastly easier!
Open your input file for reading (open(my $fh, '<', $file_name) or die "Failed to open $file_name: $!")
Read in each line of the file in a loop (while (my $line = <$fh>) { ... })
Optionally use chomp to remove line endings
Use split to break the line into fields (my @column = split /,/, $line;)
Check the values of the second and third fields (note that arrays start counting from 0, not from 1, so these will be $column[1] and $column[2] rather than 2 and 3)
If the field values match your criteria, set a counter to 4 (the total number of lines to output)
If the counter is greater than zero, output the original $line and decrement the counter
The logic mentions "new files" but does not specify when a new output file should be created and when output should continue to be sent to the same file. Since this was not specified, I have ignored it and described all output going to a single destination.
Note, however, that your sample desired output does not match the described logic. According to the specified logic, the output should include the first seven lines of your example data, but not the final line (because none of the three lines preceding it include "e01").
So. Take this information, along with whatever you may already know about Perl, and try to write a solution. If you reach a point where you can't figure out how to make any further progress, post a new question (or update this one) containing a copy of your code and input data, so that we can run it ourselves, and a description of how it fails to work properly, then we'll be much more able to help you with that information (and more people will be willing to help if you can show that you made an effort to do it yourself first).
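For reference, the steps listed above can be sketched roughly as follows. The name select_lines is made up for illustration, the input is passed in as a list of lines rather than read from a file, and all output goes to a single destination as described.

```perl
use strict;
use warnings;

# Hypothetical helper: return each matching line plus the 3 lines after it.
sub select_lines {
    my ($date, $member, @lines) = @_;
    my $remaining = 0;                 # how many more lines to output
    my @out;
    for my $line (@lines) {
        chomp(my $text = $line);
        my @column = split /,/, $text;
        # Second and third fields are $column[1] and $column[2]
        if (@column >= 3 && $column[1] eq $date && $column[2] eq $member) {
            $remaining = 4;            # matched line plus the next three
        }
        if ($remaining > 0) {
            push @out, $text;
            $remaining--;
        }
    }
    return @out;
}

my @data = (("site1,00z24aug2016,e01\n") x 4, ("site2,00z24aug2016,e02\n") x 4);
print "$_\n" for select_lines('00z24aug2016', 'e01', @data);
```

With the sample data from the question this selects the first seven lines, which matches the behavior described above rather than the desired output shown in the question.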

If Line Number Exists

I'm trying to write code that does something if a certain line number exists in a text file, for example "Test.txt".
Ex.
if line "x" exists in test.txt msg $chan working
Thanks @denny for being the one to help me out.
if ($read(test.txt, n, x)) {
    msg $chan working
}
You have a couple of options.
1. Check whether the line number is lower than or equal to the total number of lines, e.g. $lines(filename)
2. Extract the line and test whether it has any content, e.g. $read(filename, LINE-NUMBER)
Notes for each method:
1. Only tells you whether that line number exists, NOT whether there is anything on the line.
2. Only returns the line if it exists and has content; if the line is empty or there is no such line, the result is empty in both cases.

use identifying symbols to identify and edit line/string, then append line/string to previous line in file

Using standard linux utilities (sed and awk, I am guessing)
Sorry about the vague title, I don't really know how to describe the request much better. An easier way to do so is to provide a simple example. I have a file with the following content:
www.example.com
johnsmith@gmail.com
fredflintstone@gmail.com
bettyboop@gmail.com
www.example2.com
kylejohnson@gmail.com
www.example3.com
chadbrown@gmail.com
joshbeck@gmail.com
www.example4.com
tomtom@gmail.com
jeffjeffries@gmail.com
billnorman@gmail.com
stankubrick@gmail.com
andrewanders@gmail.com
So, what I want to do is convert the above to:
www.example.com,johnsmith@gmail.com,fredflintstone@gmail.com,bettyboop@gmail.com
www.example2.com,kylejohnson@gmail.com
www.example3.com,chadbrown@gmail.com,joshbeck@gmail.com,
www.example4.com,tomtom@gmail.com,jeffjeffries@gmail.com,billnorman@gmail.com,stankubrick@gmail.com,andrewanders@gmail.com
I am thinking that the easiest thing to do would be something along the lines of: if the line contains an "@" symbol, insert a comma at the beginning of the line/string and then append that line/string to the preceding line. Anyone have any ideas? It would be simpler, I think, if there were a uniform number of email addresses associated with each website, but this is not the case.
Thanks in advance!
A simple approach
awk '{s=/@/?",":"\n";printf s"%s",$0}' file
www.example.com,johnsmith@gmail.com,fredflintstone@gmail.com,bettyboop@gmail.com
www.example2.com,kylejohnson@gmail.com
www.example3.com,chadbrown@gmail.com,joshbeck@gmail.com
s=/@/?",":"\n" Does the line contain @? If yes, set s="," and if no, set s="\n" (newline).
printf s"%s",$0 prints s followed by $0. If the line contains @, a comma is printed and then $0; if not, a newline is printed and then $0, which starts a new output line.
Try this awk program:
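/^[[:space:]]*www\./ {
    if (f) { print line }    # flush the previous site before starting a new one
    f = 1; line = $0
    next
}
f {
    line = (line "," $0)
}
END {
    if (f) { print line }    # flush the last site
}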

Perl: pattern match a string and then print next line/lines

I am using Net::Whois::Raw to query a list of domains from a text file and then parse through this to output relevant information for each domain.
It was all going well until I hit Nominet results as the information I require is never on the same line as that which I am pattern matching.
For instance:
Name servers:
ns.mistral.co.uk 195.184.229.229
So what I need to do is pattern match for "Name servers:" and then display the next line or lines but I just can't manage it.
I have read through all of the answers on here but they either don't seem to work in my case or confuse me even further as I am a simple bear.
The code I am using is as follows:
while ($record = <DOMAINS>) {
    $domaininfo = whois($record);
    if ($domaininfo=~ m/Name servers:(.*?)\n/){
        print "Nameserver: $1\n";
    }
}
I have tried an example of Stackoverflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
EDIT: Forgot to say thanks!
how rude.
So, the $domaininfo string contains your domain?
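What you probably need is a pattern that matches across the line break. The m modifier at the end of your regular expression treats your string as a multi-line string (which is what it is), so that ^ and $ match at the start and end of each line; a literal \n in the pattern matches the newline character with or without it. This works for me: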
my $domaininfo =<<DATA;
Name servers:
ns.mistral.co.uk 195.184.229.229
DATA
$domaininfo =~ m/Name servers:\n(\S+)\s+(\S+)/m;
print "Server name = $1\n";
print "IP Address = $2\n";
Now, I can match the \n at the end of the Name servers: line and capture the name and IP address which is on the next line.
This might have to be munged a bit to get it to work in your situation.
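If there can be more than one name server, one possible extension is to capture every line that follows the header up to the next blank line. This is only a sketch under that assumption (that the block ends at a blank line); the name name_servers and the second server in the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines directly following "Name servers:",
# stopping at the first blank line.
sub name_servers {
    my ($whois_text) = @_;
    # Capture all consecutive non-blank lines after the header line
    return () unless $whois_text =~ /Name servers:[ \t]*\r?\n((?:[ \t]*\S[^\n]*\n?)+)/;
    my $block = $1;
    # Split into lines and trim surrounding whitespace
    return grep { length } map { s/^\s+|\s+$//gr } split /\n/, $block;
}

my $domaininfo = <<'DATA';
    Name servers:
        ns.mistral.co.uk 195.184.229.229
        ns2.example.net 192.0.2.1

    WHOIS lookup made at some later point
DATA

print "Nameserver: $_\n" for name_servers($domaininfo);
```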
This is half a question and perhaps half an answer (the question's in here as I am not yet allowed to write comments...). Okay, here we go:
Name servers:
ns.mistral.co.uk 195.184.229.229
Is this what an entry in the file you're parsing looks like? What will follow immediately afterwards - more domain names and IP addresses? And will there be blank lines in between?
Anyway, I think your problem may (in part?) be related to your reading the file line by line. Once you get to the IP address line, the info about 'Name servers:' having been present will be gone. Multiline matching will not help if you're looking at your file line by line. Thus I'd recommend switching to paragraph mode:
{
    local $/ = ''; # one paragraph instead of one line constitutes a record
    while ($record = <DOMAINS>) {
        # $record will now contain all consecutive lines that were NOT separated
        # by blank lines; once there are >= 1 blank lines $record will have a
        # new value
        # do stuff, e.g. pattern matching
    }
}
But then you said
I have tried an example of Stackoverflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
so maybe you've already tried what I have just suggested? An alternative would be to just add another variable ($indicator or whatever) which you'll set to 1 once 'Name servers:' has been read, and as long as it's equal to 1 all following lines will be treated as containing the data you need. Whether this is feasible, however, depends on you always knowing what else your data file contains.
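The indicator-variable alternative could look roughly like the following sketch. The name lines_after_header is hypothetical, and the code assumes that a blank line ends the block of data lines, which may or may not hold for your data.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the lines that follow a header line,
# using a flag that is switched on by the header and off by a blank line.
sub lines_after_header {
    my ($header, @lines) = @_;
    my $indicator = 0;
    my @found;
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^\s*\Q$header\E\s*$/) {   # header seen: start collecting
            $indicator = 1;
            next;
        }
        next unless $indicator;
        if ($text =~ /^\s*$/) {                 # blank line: stop collecting
            $indicator = 0;
            next;
        }
        $text =~ s/^\s+//;                      # drop leading indentation
        push @found, $text;
    }
    return @found;
}

my @lines = ("Name servers:\n", "    ns.mistral.co.uk 195.184.229.229\n", "\n", "Other data\n");
print "Nameserver: $_\n" for lines_after_header('Name servers:', @lines);
```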
I hope something in here has been helpful to you. If there are any questions, please ask :)