How to edit tar files in Unix using SED without untarring them?
I want to change the timestamp that appears inside each of the files in the .tar, as well as the one in the file names.
Unless you are a ninja, I think you'll need more than sed. For starters, you need to do more than replace dates in the tar block headers: you also need to update the header checksums and the size field, and deal with all the other intricacies of the header spec.
That being said, I would suggest checking out Perl's Archive::Tar::Stream module, which handles all of the above for you:
http://metacpan.org/pod/Archive::Tar::Stream
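Here's an example script that updates the mtime of every file in the tar to the current time and changes the file names, all without untarring:
use strict;
use warnings;
use Archive::Tar::Stream;
my ($infile, $outfile) = @ARGV;
open( my $infh, "<", $infile ) or die "Cannot open $infile: $!";
open( my $outfh, ">", $outfile ) or die "Cannot open $outfile: $!";
binmode($infh);    # tar files are binary
binmode($outfh);
my $ts = Archive::Tar::Stream->new(infh => $infh, outfh => $outfh);
while (my $header = $ts->ReadHeader()) {
    $header->{'mtime'} = time();           # new modification time
    $header->{'name'} = "whatever.txt";    # placeholder: compute the new name here
    $ts->WriteHeader($header);             # write the modified header
    $ts->CopyBytes($header->{size});       # copy the file data unchanged
}
$ts->FinishTar();                          # write the end-of-archive blocks
close($outfh) or die "Cannot close $outfile: $!";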
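To see why a plain sed substitution breaks the archive, here is a minimal sketch of the ustar header checksum mentioned above: the sum of all 512 header bytes, with the 8-byte checksum field at offset 148 counted as spaces. The helper names tar_header_checksum and stored_checksum are hypothetical, and only the plain ustar layout is assumed (no GNU or pax extensions).

```perl
use strict;
use warnings;

# Checksum of a 512-byte ustar header: the sum of all byte values,
# with the 8-byte checksum field (offset 148) treated as eight spaces.
sub tar_header_checksum {
    my ($header) = @_;
    die "header must be 512 bytes\n" unless length($header) == 512;
    my $copy = $header;
    substr($copy, 148, 8) = ' ' x 8;
    return unpack('%32C*', $copy);    # sum of unsigned byte values
}

# The checksum stored in the header: octal digits padded with spaces/NULs.
sub stored_checksum {
    my ($header) = @_;
    my ($octal) = substr($header, 148, 8) =~ /([0-7]+)/;
    return defined $octal ? oct($octal) : undef;
}
```

Any byte that sed changes, such as a date inside a file name, changes the computed sum, so it no longer matches the stored value and tar will typically reject that header as corrupt.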
I want to perform the same calculations on two similar files, but I don't want to duplicate the code for each file or create two scripts for this.
use strict;
use warnings;
use File::Copy;   # provides copy()
my $file = "file1.txt";
my $tempfile = "file1_temp.txt";
if (not -e $file) {
die "Input file not found";
}
open(my $inputFileHandler, '<:encoding(UTF-8)', $file)
or die "Could not open file '$file' $!";
open(my $outs, '>', $tempfile) or die $!;
# Calculations made
close($outs);
close($inputFileHandler);
copy($tempfile,$file) or die "Copy failed: $!";
unlink($tempfile) or die "Could not delete the file!\n";
So I want to do exactly the same calculations for file2.txt (using a temporary file, as above) and copy the result back into file2.txt. Is there a way to do it without writing the code again for file2?
Thank you very much.
Write your code as a Unix filter. Read the data from STDIN and write it to STDOUT. Your code will be simpler and your program will be more flexible.
#!/usr/bin/perl
use strict;
use warnings;
while (<STDIN>) {
    # do a calculation using the data that is in $_
    my $output_data = $_;   # placeholder: replace with your real calculation
    print $output_data;
}
The cleverness is in how you call the program:
$ ./my_clever_filter < file1.txt > file1_out.txt
$ ./my_clever_filter < file2.txt > file2_out.txt
See "The Unix Filter Model: What, Why and How?" for far more information.
Assuming your code is well written (not manipulating any globals, ...), you could use a for loop:
foreach my $prefix ('file1', 'file2') {
my $file = $prefix . ".txt";
my $tempfile = $prefix . "_temp.txt";
...
}
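Filling in that loop with the question's own steps, a minimal sketch moves the per-file work into a subroutine and calls it once per prefix given on the command line. The subroutine name process_file and the upper-casing used as the "calculation" are hypothetical placeholders for your real code.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical per-file calculation: upper-cases every line of $file,
# writing through $tempfile and then copying the result back.
sub process_file {
    my ($file, $tempfile) = @_;
    open(my $in,  '<:encoding(UTF-8)', $file)     or die "Could not open '$file': $!";
    open(my $out, '>:encoding(UTF-8)', $tempfile) or die "Could not open '$tempfile': $!";
    while (my $line = <$in>) {
        print {$out} uc $line;    # replace with your real calculations
    }
    close($in);
    close($out) or die "Could not close '$tempfile': $!";
    copy($tempfile, $file) or die "Copy failed: $!";
    unlink($tempfile)      or die "Could not delete '$tempfile': $!";
}

# Usage: perl process.pl file1 file2   (handles file1.txt, file2.txt)
foreach my $prefix (@ARGV) {
    process_file("$prefix.txt", "${prefix}_temp.txt");
}
```

Because the loop only passes names into the subroutine, adding a third file is just another command-line argument.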
Perl has a feature designed especially for cases like this:
$ perl -pi -e'/*Calculations made*/' file1.txt file2.txt ... fileN.txt
This is loosely referred to as "in-place editing", and it basically does what your code does: it writes to a temp file and then overwrites the original. It applies your calculations to each of the files named as arguments. If your calculations are complex, you can put them in a file and skip the -e'....' part:
$ perl -pi foo.pl file1.txt ...
Say, for example, that your "calculations" consist of incrementing both numbers in each space-separated pair by 1:
s/(\d+) (\d+)/($1 + 1) . " " . ($2 + 1)/ge
You would then run either
$ perl -pi -e's/(\d+) (\d+)/($1 + 1) . " " . ($2 + 1)/ge' file1.txt file2.txt
$ perl -pi foo.pl file1.txt file2.txt
Where foo.pl contains the code.
Be aware that the -i switch is destructive, so make backups before running the command. You can supply a backup extension, e.g. -i.bak, to keep a backup of each original, but that backup is overwritten if you run the command again.
-p wraps your code in a while (<>) loop and prints each line after your code has run.
-i edits the files in place; if you give it an extension, as in -i.bak, it also saves a backup of each original file with that extension.
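Following the expansion described in perlrun, a command like perl -pi.bak -e'...' behaves roughly like the script below: it sets $^I (the variable that -i.bak sets) and wraps the code in the while/continue loop that -p adds. This is a sketch for understanding, not Perl's exact internals; the subroutine name increment_pairs_in_place is hypothetical, and its substitution keeps the space between the two numbers.

```perl
use strict;
use warnings;

# Roughly what  perl -pi.bak -e 's/(\d+) (\d+)/($1 + 1) . " " . ($2 + 1)/ge' FILES  does.
sub increment_pairs_in_place {
    my @files = @_;
    local $^I   = '.bak';   # what -i.bak sets: edit in place, keep a .bak copy
    local @ARGV = @files;   # <> reads these files one after another
    local $_;
    while (<>) {
        s/(\d+) (\d+)/($1 + 1) . " " . ($2 + 1)/ge;
    } continue {
        print or die "-p destination: $!\n";   # the print that -p adds
    }
}

increment_pairs_in_place(@ARGV) if @ARGV;
```

While $^I is set, print writes to the new copy of the file currently being read, which is what makes the edit happen "in place".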
I have .gz files inside a directory and I am reading them with Perl. Everything works, but what I don't understand is the order in which these files are being read. I can tell for sure that it is not alphabetical. So my question is: in what order does Perl read files from a directory by default?
Below is a snippet of my code:
# Directory containing the .gz files
my $dir = "/home/myname/mydir";
# Open directory and loop through
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
while (my $file = readdir($dh)) {
    # We only want files
    next unless (-f "$dir/$file");
    # Use a regular expression to find files ending in .gz
    next unless ($file =~ m/\.gz$/);
    my $gzip_file = "$dir/$file";
    # List form of open avoids passing the file name through the shell
    open(my $gunzip_stream, "-|", "gzip", "-dc", $gzip_file) or die $!;
    while (my $line = <$gunzip_stream>) {
        print $line;    # $line already ends with a newline
    }
    close($gunzip_stream);
}
closedir($dh);
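readdir returns the entries in whatever order the system returns them. I'm not aware of any guarantee of order from any OS, and I imagine different drives (file systems) might even behave differently. If you need a specific order, such as alphabetical, read all the names and sort them yourself.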
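A minimal sketch of sorting explicitly: collect the .gz names first, then sort them, so the processing order no longer depends on the file system. The subroutine name sorted_gz_files is hypothetical, and sort here uses plain string order.

```perl
use strict;
use warnings;

# Returns the plain files in $dir whose names end in .gz, in string order.
sub sorted_gz_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @gz = grep { /\.gz\z/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    my @sorted = sort { $a cmp $b } @gz;
    return @sorted;
}

if (@ARGV) {   # e.g. perl list_gz.pl /home/myname/mydir
    print "$_\n" for sorted_gz_files($ARGV[0]);
}
```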
I need some help with file manipulations and need some expert advice.
It looks like I am making a silly mistake somewhere but I can't catch it.
I have a directory that contains files with a .txt suffix, for example file1.txt, file2.txt, file3.txt.
I want to add a revision string, say rev0, to the name of each of those files and then open the renamed files. For instance rev0_file1.txt, rev0_file2.txt, rev0_file3.txt.
I can prepend rev0 to the names, but my program fails to open the files.
Here is the relevant portion of my code:
my $dir = "path to my directory";
my @::tmp = ();
opendir(my $DIR, "$dir") or die "Can't open directory, $!";
@::list = readdir($DIR);
@::list2 = sort grep(/^.*\.txt$/, @::list);
foreach (@::list2) {
my $new_file = "$::REV0" . "_" . "$_";
print "new file is $new_file\n";
push(@::tmp, "$new_file\n");
}
closedir($DIR);
foreach my $cur_file (<@::tmp>) {
$cur_file = $_;
print "Current file name is $cur_file\n"; # This debug print shows nothing
open($fh, '<', "$cur_file") or die "Can't open the file\n"; # Fails to open file;
}
Your problem is here:
foreach my $cur_file(<@::tmp>) {
$cur_file = $_;
You are using the loop variable $cur_file, but you then overwrite it with $_, which is never set in this loop. To fix this, just remove the second line.
Your biggest issue is that you are using $cur_file in your loop for the file name, but then reassign it from $_, even though $_ won't have a value at that point. Also, as Borodin pointed out, $::REV0 was never defined.
You can use the move function from File::Copy to move (rename) the files, and you can use File::Find to find the files you want to move:
use strict;
use warnings;
use feature qw(say);
use autodie;
use File::Copy;       # Provides the move command
use File::Find;       # Gives you a way to find the files you want
use File::Basename;   # Provides fileparse, to split a path into name and directory
use constant {
    DIRECTORY => '/path/to/directory',
    PREFIX    => 'rev0_',
};
my @files_to_rename;
find (
    sub {
        return unless -f $_ && /\.txt$/;   # Only interested in ".txt" files
        push @files_to_rename, $File::Find::name;
    }, DIRECTORY );
for my $file ( @files_to_rename ) {
    my ($name, $path) = fileparse($file);
    my $new_name = $path . PREFIX . $name;   # Prefix the file name, not the directory
    move($file, $new_name) or die "Can't move $file -> $new_name: $!";
    $file = $new_name;                       # Updates @files_to_rename with new name
    open my $file_fh, "<", $new_name;        # Open the file here?
    ...
    close $file_fh;
}
for my $file ( @files_to_rename ) {
    open my $file_fh, "<", $file;            # Or, open the file here?
    ...
    close $file_fh;
}
See how using Perl modules can make your task much easier? Perl comes with hundreds of pre-installed packages to handle zip files, tarballs, time, email, etc. You can find a list at the Perldoc page (make sure you select the version of Perl you're using!).
In a foreach loop the loop variable is an alias for the current array element, so $file = $new_name actually changes the file name stored inside the @files_to_rename array. This way, your array still refers to the file even though it has been renamed.
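A tiny sketch of that aliasing on its own, with hypothetical file names and no files touched on disk:

```perl
use strict;
use warnings;

# The foreach loop variable is an alias for each element of @files,
# so assigning to $file rewrites the array element itself.
my @files = ('a.txt', 'b.txt');
for my $file (@files) {
    $file = "rev0_$file";
}
print "@files\n";    # prints: rev0_a.txt rev0_b.txt
```

This only works when looping over real array elements; looping directly over a literal list such as ('a.txt', 'b.txt') and assigning to the loop variable dies with "Modification of a read-only value attempted".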
You have two choices of where to open the file for reading: you can rename all of your files first and then loop through them again to open each one, or you can open each file right after renaming it. I've shown both places.
Don't use $:: at all. It is very bad form, since fully qualified names like $::tmp bypass use strict; (that is, if you're using use strict to begin with). The standard is not to use package variables (aka global variables) unless you have to. Instead, use lexically scoped variables declared with my.
One advantage of my variables is that I don't really need the close call: the variable goes out of scope at the end of each iteration of the loop and disappears entirely once the loop is complete. When the variable that holds a filehandle goes out of scope, the filehandle is closed automatically.
Always include use strict; and use warnings; at the top of EVERY script, and add use autodie; whenever you're doing file or directory processing.
There is no reason to prefix your variables with ::, so simplify your code like this:
use strict;
use warnings;
use autodie;
use File::Copy;
my $dir = "path to my directory";
chdir($dir); # Work inside $dir so file names need no path prefix
foreach my $file (glob('*.txt')) {
my $newfile = 'rev0_'.$file;
copy($file, $newfile) or die "Can't copy $file -> $newfile: $!";
open my $fh, '<', $newfile;
# File processing
}
What you've stored in @::tmp is the updated name of each file, but the files themselves haven't been renamed, so it's little surprise that the code died: it couldn't find the renamed files.
Since it's just renaming, consider the following code:
use strict;
use warnings;
use File::Copy 'move';
for my $file ( glob( "file*.txt" ) ) {
move( $file, "rev0_$file" )
or die "Unable to rename '$file': $!";
}
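From a command line/terminal, consider the rename utility if it is available. The syntax below is for the util-linux rename; the Perl-based rename shipped by some distributions, such as Debian, takes a regex instead, e.g. rename 's/^/rev0_/' file*.txt: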
$ rename file rev0_file file*.txt
I have placed the text file "FilenameKeyword.txt" in the E:/Test folder. In my Perl script I traverse the folder, look for a file whose name contains the string "Keyword", and then print the content of that file.
Now I want to do the same thing for a file that is inside a compressed tar file.
Hypothetical file from which I am trying to extract the details:
E:\test.tar.gz
I want to know whether it is possible in Perl to search and read the file without decompressing/unzipping the archive. If that is not possible, I can also allocate some temporary storage to decompress the file, which should be deleted after extracting the content of that particular text file.
While searching the internet I found that it is possible to extract and read a gzip/tar file using Archive::Extract, but being new to Perl, I am really confused about how I should actually use it. Could you please help with this?
Input file: FilenameKeyword.txt
Script:
use warnings;
use strict;
my @dirs = ("E:\\Test\\");
my %seen;
while (my $pwd = shift @dirs) {
opendir(DIR,"$pwd") or die "Cannot open $pwd\n";
my @files = readdir(DIR);
closedir(DIR);
foreach my $file (@files)
{
if (-d $file and ($file !~ /^\.\.?$/) and !$seen{$file})
{
$seen{$file} = 1;
push #dirs, "$pwd/$file";
}
next if ($file !~ /Keyword/i);
my $mtime = (stat("$pwd/$file"))[9];
print "$pwd$file";
print "\n";
open (MYFILE, "$pwd$file");
while (my $line = <MYFILE>){
#print $line;
my ($date) = split(/,/,$line,2);
if ($line =~ s!<messageText>(.+?)</messageText>!!is){
print "$1";
}
}
}
}
Output (in the test program, the file is placed under E:\Test):
E:\Test\FilenameKeyword.txt
1311 messages Picked from the Queue.
Looking for help to retrieve the content of the file that is placed under
E:\test.tar.gz
Desired Output:
E:\test.tar.gz\FilenameKeyword.txt
1311 messages Picked from the Queue.
I was stuck using CPAN modules. They didn't work for me because I have Oracle 10g Enterprise Edition on the same machine; due to some software conflict, ActiveState Perl was unable to compile and find the Perl library for the CPAN module. I uninstalled Oracle on my machine to make this work:
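#!/usr/local/bin/perl
use strict;
use warnings;
use Archive::Tar;
my $tar = Archive::Tar->new;
# read() detects gzip compression automatically (via IO::Zlib)
$tar->read("test.tar.gz") or die "Cannot read test.tar.gz: " . $tar->error;
# extract() writes every file in the archive to the current directory
$tar->extract();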
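Since the goal is to read the file without unpacking the archive to disk, a minimal sketch can use Archive::Tar's list_files and get_content instead of extract: read() decompresses the .tar.gz in memory, so nothing is written to disk. The subroutine name find_members is hypothetical, and because the whole archive is held in memory this suits small to medium archives.

```perl
use strict;
use warnings;
use Archive::Tar;

# Return (name => content) for every member of a .tar or .tar.gz whose name
# matches $pattern. The archive is decompressed in memory, not on disk.
sub find_members {
    my ($archive, $pattern) = @_;
    my $tar = Archive::Tar->new;
    $tar->read($archive) or die "Cannot read $archive: " . $tar->error . "\n";
    my %found;
    for my $name ($tar->list_files) {
        next unless $name =~ $pattern;
        $found{$name} = $tar->get_content($name);
    }
    return %found;
}

if (@ARGV) {   # e.g. perl find_in_tar.pl E:/test.tar.gz
    my %found = find_members($ARGV[0], qr/Keyword/i);
    for my $name (sort keys %found) {
        print "$ARGV[0]/$name\n";
        # Same extraction as the question's script: print each <messageText> body.
        while ($found{$name} =~ m!<messageText>(.+?)</messageText>!gis) {
            print "$1\n";
        }
    }
}
```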
If your file were only gzipped, you could read its contents in a "streamed" manner as outlined in the article "Piping to/from a child process without system or backtick - gzipped tar files". The article illustrates how to use open and fork to decompress the file and make it available to a Perl while() loop, so that you can iterate over it.
Since a tar file is basically a concatenation of files, each preceded by a header block, it might be possible to adapt this to your scenario.
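A different approach from the article's fork-based technique, shown here as a minimal sketch: Archive::Tar's iter class method walks a gzipped tarball one member at a time, so apart from any matches you keep, only the current member is held in memory. The subroutine name stream_matching_members is hypothetical.

```perl
use strict;
use warnings;
use Archive::Tar;

# Walk a gzipped tarball member by member with Archive::Tar->iter and
# return (path => content) for regular files whose path matches $pattern.
sub stream_matching_members {
    my ($archive, $pattern) = @_;
    my $next = Archive::Tar->iter($archive, 1)   # 1: the input is gzip-compressed
        or die "Cannot open $archive\n";
    my %found;
    while (my $member = $next->()) {
        next unless $member->is_file;
        my $path = $member->full_path;
        $found{$path} = $member->get_content if $path =~ $pattern;
    }
    return %found;
}

if (@ARGV) {   # e.g. perl stream_tar.pl E:/test.tar.gz
    my %found = stream_matching_members($ARGV[0], qr/Keyword/i);
    print "$ARGV[0]/$_\n$found{$_}" for sort keys %found;
}
```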
Basically, I'm trying to create a new directory with today's date, then create a new file and save it in that folder.
I can get each step working separately; however, the file isn't saved inside the directory. Here is what I'm working with:
mkdir($today);
opendir(DIR, $today) or die "Error in opening dir $today\n";
open SAVEPAGE, ">>", $savepage
or die "Unable to open $savepage for output - $!";
print SAVEPAGE $data;
close(SAVEPAGE);
closedir(DIR);
I've done a lot of searches to try to find an appropriate example, but unfortunately every word in the queries I've tried (open, save, file, directory, etc.) gets millions of hits. I realise I could handle errors etc. better; that'll be the next step once I get the functionality working. Any pointers would be appreciated, cheers.
Just prefix the file name with the directory name when you open it; there's no need for opendir:
mkdir($today) unless -d $today;
open(SAVEPAGE, ">>", "$today/$savepage") or die "Unable to open $today/$savepage for output - $!";
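Building on that, a minimal sketch that also creates the date-named directory, checks for errors and joins the path portably with File::Spec. The YYYY-MM-DD directory format and the subroutine name save_in_dated_dir are assumptions; the question does not say how $today is formatted.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Spec;

# Create a directory named after today's date (assumed YYYY-MM-DD) under $base
# and append $data to $filename inside it. Returns the path of the file.
sub save_in_dated_dir {
    my ($base, $filename, $data) = @_;
    my $today = strftime('%Y-%m-%d', localtime);
    my $dir   = File::Spec->catdir($base, $today);
    unless (-d $dir) {
        mkdir $dir or die "Cannot create $dir: $!";
    }
    my $path = File::Spec->catfile($dir, $filename);
    open(my $fh, '>>', $path) or die "Unable to open $path for output - $!";
    print {$fh} $data;
    close($fh) or die "Unable to close $path - $!";
    return $path;
}
```

Returning the path makes it easy to log or reopen the file later without rebuilding the name.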
Rather than using the fully-qualified filename all the time you may prefer to use chdir $today before you open the file. This will change the current working directory and force a file specified using a relative path or no path at all to be opened relative to the new directory.
In addition, using the autodie pragma will avoid the need to check the status of open, close etc.; and using lexical filehandles is preferable for a number of reasons, including implicit closing of files when the filehandle variable goes out of scope.
This is how your code would look.
use strict;
use warnings;
use autodie;
my $today = 'today';
my $savepage = 'savepage';
my $data = 'data';
mkdir $today unless -d $today;
{
chdir $today;
open my $fh, '>>', $savepage;
print $fh $data;
}
However, if your program deals with files in multiple directories, it is awkward to chdir back and forth between them, and the original directory has to be saved explicitly or it will be lost. In this case the File::chdir module may be helpful. It provides the $CWD package variable, which changes the current working directory when its value is changed. It can also be localized like any other package variable, so that the original value is restored at the end of the localizing block.
Here is an example.
use strict;
use warnings;
use File::chdir;
use autodie;
my $today = 'today';
my $savepage = 'savepage';
my $data = 'data';
mkdir $today unless -d $today;
{
local $CWD = $today; # Change working directory to $today
open my $fh, '>>', $savepage;
print $fh $data;
}
# Working directory restored to its previous value