Using GNU find, I can use the -maxdepth option to limit how deep it searches for files. Unfortunately, my command needs to run on HP-UX, AIX, and Solaris as well, which don't support the -maxdepth option.
I have found that I can run find /some/path/* -prune to get only the files in a single folder, but I want to recurse down n levels, just like the -maxdepth argument allows. Can this be done in a cross-platform way?
Edit: I found I can use the -path option to do a similar filter, like so:
find ./ ! -path "./*/**"
Unfortunately, AIX find does not support the -path option. I'm at least a little bit closer.
This may not be the most performant solution, but it should be quite portable. I tested it on Solaris in addition to OS X and Linux. In essence, it is a recursive depth-first tree walk using ls. Feel free to tweak and sanitize it to your needs. Hopefully it works on AIX too.
#!/bin/bash
path="$(echo "$1" | sed -e 's%/*$%%')"   # remove trailing slashes
maxDepth="$2"                            # maximum search depth
currDepth="$3"                           # current depth
[ -z "$currDepth" ] && currDepth=0       # initialize
[ "$currDepth" -lt "$maxDepth" ] && {    # are we allowed to go deeper?
    echo "D: \"$path\""                  # show where we are
    IFS=$'\n'                            # split the "ls" output on newlines instead of spaces
    for entry in $(ls "$path"); do       # scan directory
        [ -d "$path/$entry" ] && {       # recursively descend if it is a child directory
            "$0" "$path/$entry" "$maxDepth" $((currDepth+1))
            continue
        }
        echo "F: \"$path/$entry\""       # show it if it is not a directory (symlink, file, whatever)
    done
}
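If Perl happens to be installed on those boxes (it usually is), another portable route is to let the core File::Find module do the walking and prune once a depth limit is reached. This is only a rough sketch of the idea (maxdepth.pl is a made-up name, and I have not tried it on HP-UX or AIX):
#!/usr/bin/perl
# maxdepth.pl -- rough emulation of "find DIR -maxdepth N" using core File::Find
use strict;
use warnings;
use File::Find;

my ($start, $max) = @ARGV;
$start = '.' unless defined $start;
$max   = 1   unless defined $max;
$start =~ s{/+$}{} unless $start eq '/';    # strip trailing slashes

find(sub {
    # depth = number of path components below the starting directory
    (my $rel = $File::Find::name) =~ s{^\Q$start\E/?}{};
    my @parts = split m{/}, $rel;
    my $depth = scalar @parts;

    print "$File::Find::name\n";

    # once we are at the maximum depth, do not descend any further
    $File::Find::prune = 1 if $depth >= $max && -d $_;
}, $start);
Invoked as perl maxdepth.pl /some/path 2, it should print everything down to two levels, much like -maxdepth 2 would.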
I have a problem with detecting symbolic links under Windows 10, which supports them. First I tried this:
if(! -l $import_filename) {
print "$0: $import_filename is not a symlink";
}
That doesn't work. It gets executed when $import_filename is a symlink. Then I tried this:
use File::stat;
use Fcntl;
my $statbuf = lstat($import_filename);
if(!($statbuf->mode & S_ISLNK)) {
print "$0: $import_filename is not a symlink";
}
And it seems to be a different way to say the same thing, as expected. Is there any blessed way to do this under Windows versions with symlink/junction support? If there isn't, a command-line tool is also an acceptable answer.
Given
>mklink file_symlink file
symbolic link created for file_symlink <<===>> file
>mklink /d dir_symlink dir
symbolic link created for dir_symlink <<===>> dir
>mklink /h file_hardlink file
Hardlink created for file_hardlink <<===>> file
>mklink /j dir_hardlink dir
Junction created for dir_hardlink <<===>> dir
>dir
...
2018-05-09 12:59 AM <JUNCTION> dir_hardlink [C:\...\dir]
2018-05-09 12:58 AM <SYMLINKD> dir_symlink [dir]
2018-05-09 12:56 AM 6 file_hardlink
2018-05-09 12:58 AM <SYMLINK> file_symlink [file]
...
You can use the following to detect file_symlink, dir_symlink and dir_hardlink (but not file_hardlink) as a link:
use Win32API::File qw( GetFileAttributes FILE_ATTRIBUTE_REPARSE_POINT );
my $is_link = GetFileAttributes($qfn) & FILE_ATTRIBUTE_REPARSE_POINT;
I don't know how to distinguish between hard links and symlinks (though differentiating between files and dirs can be done using & FILE_ATTRIBUTE_DIRECTORY).
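For what it's worth, a sketch of how the two attribute checks might be combined (same module and imports as above plus FILE_ATTRIBUTE_DIRECTORY, with $qfn as in the snippet above; the 0xFFFFFFFF failure value is my assumption for INVALID_FILE_ATTRIBUTES, and this is untested):
use Win32API::File qw(
    GetFileAttributes
    FILE_ATTRIBUTE_REPARSE_POINT
    FILE_ATTRIBUTE_DIRECTORY
);

my $attrs = GetFileAttributes($qfn);
if ($attrs == 0xFFFFFFFF) {                      # INVALID_FILE_ATTRIBUTES (assumed)
    warn "GetFileAttributes failed for $qfn: $^E\n";
}
elsif ($attrs & FILE_ATTRIBUTE_REPARSE_POINT) {
    if ($attrs & FILE_ATTRIBUTE_DIRECTORY) {
        print "$qfn is a directory link (symlink or junction)\n";
    }
    else {
        print "$qfn is a file symlink\n";
    }
}
else {
    print "$qfn is not a reparse point\n";
}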
There seems to be not much Perl support for working with symlinks on Windows (if any). Neither of the related builtins is implemented, according to the perlport pages for symlink and for readlink.
Most importantly for your direct question, lstat isn't implemented either, so one can't use a file test for a symlink. The perlport entry for -X says that
-g, -k, -l, -u, -A are not particularly meaningful.
I haven't found anything on CPAN, other than using the Windows API.
Then you can go to the Windows command line and look for <SYMLINK> in the dir output:
# Build $basename and $path from $import_filename if needed
if ( not grep { /<SYMLINK>.*$basename/ } qx(dir $path) ) {
say "$0: $import_filename is not a symlink";
}
where $basename needs to be used since dir doesn't show the path. The components of a filename with a full path can be obtained, for example, with the core module File::Spec:
use File::Spec;
my ($vol, $path, $basename) = File::Spec->splitpath($import_filename);
If the filename has no path then $vol and $path are empty strings, which is OK for dir as it needs no argument for the current directory. If $import_filename by design refers to the current directory (has no path) then use it in the regex, with qx(dir) (no argument).
The dir output shows the target name as well, which can come in handy.
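Putting those pieces together, a small helper could also pull out the target (symlink_target is a hypothetical name; it leans on the dir output format shown earlier and on dir accepting an empty $path for the current directory):
use File::Spec;

# Hypothetical helper: return the target that "dir" shows for a file
# symlink, or undef if the name is not listed as a <SYMLINK>.
sub symlink_target {
    my ($import_filename) = @_;
    my ($vol, $path, $basename) = File::Spec->splitpath($import_filename);
    my ($line) = grep { /<SYMLINK>\s+\Q$basename\E\b/ } qx(dir $path);
    return undef unless defined $line;
    return $line =~ /\[(.+)\]\s*$/ ? $1 : undef;
}
Like the grep above, this only matches <SYMLINK> entries; directory symlinks are listed as <SYMLINKD> and would need their own pattern.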
Background
This is an optimization problem. Oracle Forms XML files have elements such as:
<Trigger TriggerName="name" TriggerText="SELECT * FROM DUAL" ... />
Where the TriggerText is arbitrary SQL code. Each SQL statement has been extracted into uniquely named files such as:
sql/module=DIAL_ACCESS+trigger=KEY-LISTVAL+filename=d_access.fmb.sql
sql/module=REP_PAT_SEEN+trigger=KEY-LISTVAL+filename=rep_pat_seen.fmb.sql
I wrote a script to generate a list of exact duplicates using a brute force approach.
Problem
There are 37,497 files to compare against each other; it takes 8 minutes to compare one file against all the others. Logically, if A = B and A = C, then there is no need to check if B = C. So the problem is: how do you eliminate the redundant comparisons?
The script will complete in approximately 208 days.
Script Source Code
The comparison script is as follows:
#!/bin/bash
echo Loading directory ...
for i in $(find sql/ -type f -name \*.sql); do
    echo "Comparing $i ..."
    for j in $(find sql/ -type f -name \*.sql); do
        if [ "$i" = "$j" ]; then
            continue;
        fi
        # Case insensitive compare, ignore spaces
        diff -iEbwBaq "$i" "$j" > /dev/null
        # 0 = no difference (i.e., duplicate code)
        if [ $? = 0 ]; then
            echo "$i :: $j" >> clones.txt
        fi
    done
done
Question
How would you optimize the script so that checking for cloned code is a few orders of magnitude faster?
Idea #1
Move the matching files into another directory so that they don't need to be examined twice.
System Constraints
Using a quad-core CPU with an SSD; trying to avoid using cloud services if possible. The system is a Windows-based machine with Cygwin installed -- algorithms or solutions in other languages are welcome.
Thank you!
Your solution and sputnick's solution both take O(n^2) time. This can be done in O(n log n) time by sorting the files and using a list merge. It can be sped up further by comparing MD5 digests (or any other cryptographically strong hash) of the files instead of the files themselves.
Assuming you're in the sql directory:
md5sum * | sort > ../md5sums
perl -lane 'print if $F[0] eq $lastMd5; $last = $_; $lastMd5 = $F[0]' < ../md5sums
Using the above code will report only exact byte-for-byte duplicates. If you want to consider two non-identical files to be equivalent for the purposes of this comparison (e.g. if you don't care about case), first create a canonicalised copy of each file (e.g. by converting every character to lower case with tr A-Z a-z < infile > outfile).
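For example, here is a Perl sketch of that canonicalise-then-hash idea (Digest::MD5 is a core module; lower-casing and collapsing whitespace only roughly matches the diff flags in the question, so adjust the canonicalisation as needed):
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Group the SQL files by the MD5 of a canonicalised copy of their
# contents, so each file is read exactly once.
my %groups;
for my $file (glob 'sql/*.sql') {
    open my $fh, '<', $file or die "cannot open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    $text = lc $text;            # ignore case
    $text =~ s/\s+/ /g;          # ignore whitespace differences
    push @{ $groups{ md5_hex($text) } }, $file;
}

# Print each group of equivalent files, similar to the clones.txt format.
for my $files (values %groups) {
    print join(' :: ', @$files), "\n" if @$files > 1;
}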
The best way to do this is to hash each file with something like SHA-1 and then use a set. I'm not sure bash can do this, but Python can. Although if you want the best performance, C++ is the way to go.
To optimize the comparison of your files:
#!/bin/bash
for i; do
    for j; do
        [[ "$i" != "$j" ]] &&
            if diff -iEbwBaq "$i" "$j" > /dev/null; then
                echo "$i & $j are the same"
            else
                echo "$i & $j are different"
            fi
    done
done
USAGE
./script /dir/*
How can I search and replace all files recursively to remove some rogue code injected into php files on a wordpress installation? The hacker added some code (below) to ALL of the .php files in my wordpress installation, and it happens fairly often to many sites, and I spend hours manually removing the code.
Today I tried a number of techniques I found online, but had no luck due to the long code snippet and the many special characters in it that mess up the delimiters. I tried using different delimiters with perl:
perl -p -i -e 's/rogue_code//g' *
to
perl -p -i -e 's{rogue_code}{}g' *
and tried using backslashes to escape the slashes in the code, but nothing seems to work. I'm working on a shared server, so I don't have full access to all the directories outside my own.
Thanks a lot...here's the code:
< ?php /**/ eval(base64_decode("aWYoZnVuY3
... snip tons of this ...
sgIH1lbHNleyAgICB9ICB9"));? >
Without having a chance to poke around the files myself, it's hard to be sure; but it sounds like you need:
find -name '*.php' -exec perl -i -pe 's{<\?php /\*\*/ eval\(base64_decode\("[^"]+"\)\);\?>}{}g' '{}' ';'
(That said, I agree with the commenters above that trying to undo the damage, piecemeal, after it happens is not the best strategy.)
and it happens fairly often to many sites, and I spend hours manually
removing the code....
Sounds like you need to do a better job of cleaning the hack or change hosts. Replace all WP core files and folders, and all plugins; then all you have to do is search the theme files and wp-config.php for the injected scripts.
See How to completely clean your hacked wordpress installation and How to find a backdoor in a hacked WordPress and Hardening WordPress « WordPress Codex and Recommended WordPress Web Hosting
I have the same problem (Dreamhost?) and first run this clean.pl script:
#!/usr/bin/perl
$file0 = $ARGV[0];
open F0, $file0 or die "error opening $file0 : $!";
$t = <F0>;
$hacked = 0;
if($t =~ s#.*base64_decode.*?;\?>##) {
    $hacked = 1;
}
print "# $file0: " . ($hacked ? "HACKED" : "CLEAN") . "\n";
if(! $hacked) {
    close F0;
    exit 0;
}
$file1 = $file0 . ".clean";
open F1, ">$file1" or die "error opening $file1 for write : $!";
print F1 $t;
while(<F0>) {
    print F1;
}
close F0;
close F1;
print "mv -f $file0 $file0.bak\n"; # comment this if you don't want backup files.
print "mv -f $file1 $file0\n";
with find . -name '*.php' -exec perl clean.pl '{}' \; > cleanfiles.sh
and then I run . cleanfiles.sh
I also found that there were other, differently infected files ("bootstrap" infectors, the ones which triggered the other infection), which instead of the base64_decode call had some hex-escaped command... To detect them, I use this suspicious_php.sh:
#!/bin/sh
# prints the filename if the first 2 lines have more than 5000 bytes
file="$1"
bytes=`head -n 2 "$file" | wc -c`
if [ "$bytes" -gt 5000 ]
then
    echo "$file"
fi
And then: find . -name '*.php' -type f -exec ./suspicious_php.sh '{}' \;
Of course, all this is not foolproof at all.
I have a file of environment variables that I source in shell scripts, for example:
# This is a comment
ONE=1
TWO=2
THREE=THREE
# End
In my scripts, I source this file (assume it's called './vars') into the current environment, and change (some of) the variables based on user input. For example:
#!/bin/sh
# Read variables
source ./vars
# Change a variable
THREE=3
# Write variables back to the file??
awk 'BEGIN{FS="="}{print $1=$$1}' <./vars >./vars
As you can see, I've been experimenting with awk for writing the variables back, sed too. Without success. The last line of the script fails. Is there a way to do this with awk or sed (preferably preserving comments, even comments with the '=' character)? Or should I combine 'read' with string cutting in a while loop or some other magic? If possible, I'd like to avoid perl/python and just use the tools available in Busybox. Many thanks.
Edit: perhaps a use case might make clear what my problem is. I keep a configuration file consisting of shell environment variable declarations:
# File: network.config
NETWORK_TYPE=wired
NETWORK_ADDRESS_RESOLUTION=dhcp
NETWORK_ADDRESS=
NETWORK_ADDRESS_MASK=
I also have a script called 'setup-network.sh':
#!/bin/sh
# File: setup-network.sh
# Read configuration
source network.config
# Setup network
NETWORK_DEVICE=none
if [ "$NETWORK_TYPE" == "wired" ]; then
NETWORK_DEVICE=eth0
fi
if [ "$NETWORK_TYPE" == "wireless" ]; then
NETWORK_DEVICE=wlan0
fi
ifconfig -i $NETWORK_DEVICE ...etc
I also have a script called 'configure-network.sh':
#!/bin/sh
# File: configure-network.sh
# Read configuration
source network.config
echo "Enter the network connection type:"
echo " 1. Wired network"
echo " 2. Wireless network"
read -p "Type:" -n1 TYPE
if [ "$TYPE" == "1" ]; then
# Update environment variable
NETWORK_TYPE=wired
elif [ "$TYPE" == "2" ]; then
# Update environment variable
NETWORK_TYPE=wireless
fi
# Rewrite configuration file, substituting the updated value
# of NETWORK_TYPE (and any other updated variables already existing
# in the network.config file), so that later invocations of
# 'setup-network.sh' read the updated configuration.
# TODO
How do I rewrite the configuration file, updating only the variables already existing in the configuration file, preferably leaving comments and empty lines intact? Hope this clears things up a little. Thanks again.
You can't have awk read from and write to the same file (that is part of your problem).
I prefer to rename the file before I rewrite (but you can save to a tmp and then rename too).
/bin/mv file file.tmp
awk '.... code ...' file.tmp > file
If your env file gets bigger, you'll see that it is getting truncated at the buffer size of your OS.
Also, don't forget that gawk (the standard on most Linux installations) has a built-in array, ENVIRON. You can create what you want from that:
awk 'END {
for (key in ENVIRON) {
print key "=" ENVIRON[key]
}
}' /dev/null
Of course you get everything in your environment, so maybe more than you want. But probably a better place to start with what you are trying to accomplish.
Edit
Most specifically
awk -F"=" '{
if ($1 in ENVIRON) {
printf("%s=%s\n", $1, ENVIRON[$1])
}
# else line not printed or add code to meet your situation
}' file > file.tmp
/bin/mv file.tmp file
Edit 2
I think your var=value assignments might need to be export-ed so they are visible to the awk ENVIRON array.
AND
echo PATH=xxx| awk -F= '{print ENVIRON[$1]}'
prints the existing value of PATH.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
I don't exactly know what you are trying to do, but if you are trying to change the value of the variable THREE,
awk -F"=" -vt="$THREE" '$1=="THREE" {$2=t}{print $0>FILENAME}' OFS="=" vars
You can do this with just bash:
rewrite_config() {
    local filename="$1"
    local tmp=$(mktemp)
    while IFS='=' read -r var value; do
        # pass comments and blank lines through untouched,
        # rewrite everything else from the variable's current value
        if [[ -z "$var" || "$var" == \#* ]]; then
            echo "${var}${value:+=$value}"
        else
            declare -p "$var" | cut -d ' ' -f 3-
        fi
    done < "$filename" > "$tmp"
    mv "$tmp" "$filename"
}
Use it like
source network.config
# manipulate the variables
rewrite_config network.config
I use a temp file to maintain the existence of the config file for as long as possible.
I'm trying to create a process that renames all my filenames to Camel/Capital Case. The closest I have to getting there is this:
perl -i.bak -ple 's/\b([a-z])/\u$1/g;' *.txt # or similar .extension.
Which seems to create a backup file (which I'll remove once I've verified this does what I want); but instead of renaming the file, it rewrites the text inside the file. Is there an easier way to do this? The theory is that I have several office documents in various formats, as I'm a bit anal-retentive, and would like them to look like this:
New Document.odt
Roffle.ogg
Etc.Etc
Bob Cat.flac
Cat Dog.avi
Is this possible with perl, or do I need to change to another language/combination of them?
Also, is there any way to make this recursive, such that /foo/foo/documents has all files renamed, as does /foo/foo/documents/foo?
You need to use rename.
Here is its signature:
rename OLDNAME,NEWNAME
To make it recursive, use it along with File::Find
use strict;
use warnings;
use File::Basename;
use File::Find;

# default: search just the current directory
my @directories = (".");

find(\&wanted, @directories);

sub wanted {
    # renaming goes here
}
The snippet above will run the code inside wanted against all the files that are found. You have to complete some of the code inside wanted to do what you want to do.
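For instance, one way to fill in wanted() with the same capitalisation rule used in the question might look like this (an untested sketch; it renames plain files only, so the traversal itself is not disturbed):
use strict;
use warnings;
use File::Find;

my @directories = (".");
find(\&wanted, @directories);

sub wanted {
    return unless -f $_;                   # plain files only; skip directories
    (my $new = $_) =~ s/\b([a-z])/\u$1/g;  # same rule as in the question
    return if $new eq $_;
    # File::Find chdir()s into each directory, so bare names work here
    rename $_, $new
        or warn "Could not rename '$File::Find::name': $!\n";
}
Renaming the directories themselves from inside wanted is trickier, which is what the edit below works around by recursing manually.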
EDIT: I tried to accomplish this task using File::Find, and I don't think you can easily achieve it. You can succeed by following these steps:
if the parameter is a dir, capitalize it and obtain all the files
for each file, if it's a dir, go back at the beginning with this file as argument
if the file is a regular file, capitalize it
Perl just got in my way while writing this script. I wrote this script in Ruby:
require "rubygems"
require "ruby-debug"
# camelcase files
class File
class << self
alias :old_rename :rename
end
def self.rename(arg1,arg2)
puts "called with #{arg1} and #{arg2}"
self.old_rename(arg1,arg2)
end
end
def capitalize_dir_and_get_files(dir)
if File.directory?(dir)
path_c = dir.split(/\//)
#base = path_c[0,path_c.size-1].join("/")
path_c[-1].capitalize!
new_dir_name = path_c.join("/")
File.rename(dir,new_dir_name)
files = Dir.entries(new_dir_name) - [".",".."]
files.map! {|file| File.join(new_dir_name,file)}
return files
end
return []
end
def camelize(dir)
files = capitalize_dir_and_get_files(dir)
files.each do |file|
if File.directory?(file)
camelize(file.clone)
else
dir_name = File.dirname(file)
file_name = File.basename(file)
extname = File.extname(file)
file_components = file_name.split(/\s+/)
file_components.map! {|file_component| file_component.capitalize}
new_file_name = File.join(dir_name,file_components.join(" "))
#if extname != ""
# new_file_name += extname
#end
File.rename(file,new_file_name)
end
end
end
camelize(ARGV[0])
I tried the script on my PC and it capitalizes all dirs, subdirs and files by the rule you mentioned. I think this is the behaviour you want. Sorry for not providing a Perl version.
Most systems have the rename command ....
NAME
rename - renames multiple files
SYNOPSIS
rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]
DESCRIPTION
"rename" renames the filenames supplied according to the rule specified as the first argument. The perlexpr argument is a Perl expression which
is expected to modify the $_ string in Perl for at least some of the filenames specified. If a given filename is not modified by the expression,
it will not be renamed. If no filenames are given on the command line, filenames will be read via standard input.
For example, to rename all files matching "*.bak" to strip the extension, you might say
rename 's/\.bak$//' *.bak
To translate uppercase names to lower, you’d use
rename 'y/A-Z/a-z/' *
OPTIONS
-v, --verbose
Verbose: print names of files successfully renamed.
-n, --no-act
No Action: show what files would have been renamed.
-f, --force
Force: overwrite existing files.
AUTHOR
Larry Wall
DIAGNOSTICS
If you give an invalid Perl expression you’ll get a syntax error.
Since Perl runs just fine on multiple platforms, let me warn you that FAT (and FAT32, etc) filesystems will ignore renames that only change the case of the file name. This is true under Windows and Linux and is probably true for other platforms that support the FAT filesystem.
Thus, in addition to Geo's answer, note that you may have to actually change the file name (by adding a character to the end, for example) and then change it back to the name you want with the correct case.
If you will only rename files on NTFS filesystems or only on ext2/3/4 filesystems (or other UNIX/Linux filesystems) then you probably don't need to worry about this. I don't know how the Mac OSX filesystem works, but since it is based on BSDs, I assume it will allow you to rename files by only changing the case of the name.
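If you do have to deal with such a filesystem, here is a small sketch of that two-step rename in Perl (rename_case_safe is just an illustrative name):
# Go through an intermediate name when only the case changes, so the
# rename is not silently ignored on FAT-style filesystems.
sub rename_case_safe {
    my ($old, $new) = @_;
    return rename($old, $new) if lc($old) ne lc($new);   # ordinary rename
    my $tmp = "$new.tmp$$";                               # temporary intermediate name
    return rename($old, $tmp) && rename($tmp, $new);
}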
I'd just use the find command to recur the subdirectories and mv to do the renaming, but still leverage Perl to get the renaming right.
find /foo/foo/documents -type f \
-execdir bash -c 'mv "$0" \
"$(echo "$0" \
| perl -pe "s/\b([[:lower:]])/\u\$1/g; \
s/\.(\w+)$/.\l\$1/;")"' \
{} \;
Cryptic, but it works.
Another one:
find . -type f -exec perl -e'
map {
( $p, $n, $s ) = m|(.*/)([^/]*)(\.[^.]*)$|;
$n =~ s/(\w+)/ucfirst($1)/ge;
rename $_, $p . $n . $s;
} @ARGV
' {} +
Keep in mind that on case-remembering filesystems (FAT/NTFS), you'll need to rename the file to something else first, then to the case change. A direct rename from "etc.etc" to "Etc.Etc" will fail or be ignored, so you'll need to do two renames: "etc.etc" to "etc.etc~" then "etc.etc~" to "Etc.Etc", for example.