How to cluster based on ssdeep? - perl

Hi I am trying to find the groups out of files based on ssdeep.
I have generated ssdeep of files and kept it in csv file.
I am parsing the file in perl script as follows:
foreach( #all_lines )
{
chomp;
my $line = $_;
my #split_array = split(/,/, $line);
my $md5 = $split_array[1];
my $ssdeep = $split_array[4];
my $blk_size = (split(/:/, $ssdeep))[0];
if( $blk_size ne "")
{
my $cluster_id = check_In_Cluster($ssdeep);
print WFp "$cluster_id,$md5,$ssdeep\n";
}
}
This also checks whether the ssdeep is present in previously clustered group and if not creates new group.
Code for chec_In_Cluster
my $ssdeep = shift;
my $cmp_result;
if( $cluster_cnt > 0 ) {
$cmp_result = ssdeep_compare( $MRU_ssdeep, $ssdeep );
if( $cmp_result > 85 ) {
return $MRU_cnt;
}
}
my $d = int($cluster_cnt/4);
my $thr1 = threads->create(\&check, 0, $d, $ssdeep);
my $thr2 = threads->create(\&check, $d, 2*$d, $ssdeep);
my $thr3 = threads->create(\&check, 2*$d, 3*$d, $ssdeep);
my $thr4 = threads->create(\&check, 3*$d, $cluster_cnt, $ssdeep);
my ($ret1, $ret2, $ret3, $ret4);
$ret1 = $thr1->join();
$ret2 = $thr2->join();
$ret3 = $thr3->join();
$ret4 = $thr4->join();
if($ret1 != -1) {
$MRU_ssdeep = $ssdeep;
$MRU_cnt = $ret1;
return $MRU_cnt;
} elsif($ret2 != -1) {
$MRU_ssdeep = $ssdeep;
$MRU_cnt = $ret2;
return $MRU_cnt;
} elsif($ret3 != -1) {
$MRU_ssdeep = $ssdeep;
$MRU_cnt = $ret3;
return $MRU_cnt;
} elsif($ret4 != -1) {
$MRU_ssdeep = $ssdeep;
$MRU_cnt = $ret4;
return $MRU_cnt;
} else {
$cluster_base[$cluster_cnt] = $ssdeep;
$MRU_ssdeep = $ssdeep;
$MRU_cnt = $cluster_cnt;
$cluster_cnt++;
return $MRU_cnt;
}
and the code for chech:
sub check($$$) {
my $from = shift;
my $to = shift;
my $ssdeep = shift;
for( my $icnt = $from; $icnt < $to; $icnt++ ) {
my $cmp_result = ssdeep_compare( $cluster_base[$icnt], $ssdeep );
if( $cmp_result > 85 ) {
return $icnt;
}
}
return -1;
}
But this process takes very much time( for 20-30MB csv file it takes 8-9Hours).
I have also tried using multithreading while checking in Cluster but not much help i got from this.
Since their is no need of csv parser like Text::CSV (because of less operation on csv) i didn't used it.
can anybody please solve my issue? Is it possible to use hadoop or some other frameworks for grouping based on ssdeep?

There is a hint from Optimizing ssDeep for use at scale (2015-11-27).
Depends on your purpose, loop and match SSDEEP in different chunk size will create a N x (N-1) hash comparison. Unless you need to find partial contents, otherwise, avoid it.
It is possible to breakdown of the hash index in step 1 as suggested in the article. This is a better way for partial contents match with different chunk size.
It is possible to reduce SSDEEP hash by grouping similar hash by generate a "distance cousin" hash.

Related

Parsing a config file with nested braces

I put this script together to get what is inside the member's brackets.
This is not JSON - is it just a bracketed config file. I am trying to think of a better way to do it.
This shop is very limited on the modules that they use, which is kind of why I had to use
the on-off tags. If this could really done by a module, then I suppose I could get a requisition
to add a new module.
Be that as it may, is there a way to just get what is inside the member { ... } like using a while loop instead of tagging each bracket with an on or off tag.
#!/sbcimp/dyn/data/EVT/GSD/scripts/perl/bin/perl
use strict;
use warnings;
my %penny_hash = ();
my $penny_file = "penny_config.cfg";
open(FOOO, "<$penny_file") or die "Can't open $penny_file for reading: $!";
foreach my $line (<FOOO>) {
chomp($line);
if ($line =~ /^\s*TickSize\s+=\s+\{/) {
$ticksize_flag = 1;
}
elsif (($ticksize_flag == 1) && ($line =~ /^\s*penny.*\s+=\s+\{/)) {
$penny_flag = 1;
}
elsif (($penny_flag == 1) && ($line =~ /^\s*members\s+=\s+\{/)) {
$member_flag = 1;
}
elsif (($member_flag == 1) && ($line =~ /\}/)) {
$member_flag = 0;
}
elsif (($member_flag == 1) && ($line =~ /\s*(\S+)\s*$/)) {
print $line ;
print "\n";
}
}
This is the config file
# Tick Size
# tr -d \ | sort -u
TickSizePostOpen = 1.00
TickSize = {
penny1 = {
TickSize = 0.00
members = {
IWM
QQQ
SPY
SPY_TEST
}
}
penny = {
TickSize = </0.000/0.00/0.00
members = {
A
AA
AAL
AAPL
ABT
ABT_SPIN
ZNGA
}
}
notpenny = {
TickSize = </9/9.99/9.99
}
}
BIPP.QuoterOx = {
Cup1and2 = {
BIPP.QuoterOx = BIPP-ox-1
members = {
Cup_1
Cup_2
}
}
TRAP.PxByGroup = 1
TRAP.Px = {
group1 = {
TRAP.Px = OPTxxxxxx
members = {
px.TRAP.1
}
}
group2 = {
TRAP.Px = OPTxxxxxx
members = {
px.TRAP.2
}
}
TRAP.QuoterOx = {
Cup0 = {
TRAP.QuoterOx = QESxxxxx
members = {
Cup_0
Cup_99
}
}
Cup1and4and10 = {
TRAP.QuoterOx = ise-ox-1-4-10-dti
members = {
Cup_1
Cup_4
Cup_10
}
}
Cup56 = {
TRAP.AuctionOx = ise-ox-56-ecl
members = {
Cup_56
}
}
}
TRAP.RotateQuote = {
rotatenames = {
TRAP.RotateQuote = 1
members = {
AAPL
ADY
AEIS
AFAM
AGP
ALNY
ZINC
}
}
}
Underlying = {
Cup0 = {
Underlying = MHR
members = {
Cup_0
}
}
g1 = {
Underlying = CEL
members = {
Cup_1
}
}
}
BIPP.Px = {
group1 = {
BIPP.Px = BOXPXMHR1
members = {
px.BIPP.1
}
}
group2 = {
BIPP.Px = BOXPXMHR2
members = {
px.BIPP.2
}
}
}
TWIG.Px = {
AB = {
TWIG.Px = TWIGPXMHR1
members = {
A
B
}
}
CD = {
TWIG.Px = TWIGPXMHR2
members = {
C
D
}
}
NOPQR = {
TWIG.Px = TWIGPXMHR6
members = {
N
O
P
Q
R
}
}
STUV = {
TWIG.Px = TWIGPXMHR7
members = {
S
T
U
V
}
}
WXYZ = {
TWIG.Px = TWIGPXMHR8
members = {
W
X
Y
Z
}
}
}
There are the results that I get:
IWM
QQQ
SPY
SPY_TEST
A
AA
AAL
AAPL
ABT
ZNGA
ZNGA
Bin_1
Bin_2
px.xisx.1
px.xisx.2
Bin_0
Bin_99
Bin_56
AAPL
ADY
AEIS
AFAM
AGP
ALNY
ZINC
Bin_0
Bin_1
px.xbox.1
px.xbox.2
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
What you have written just strips the comments and field names from the original file. There is no need to go to such lengths to do that.
This program produces the same output
use strict;
use warnings;
use autodie;
open my $fh, '<', 'penny_config.cfg';
while (<$fh>) {
next unless /\S/;
print unless /[#{}=]/;
}
Someone already comment that you should look at Config::General or YAML. You should look at Config::General and YAML.
YAML is JSON compatible, but offers more features than JSON. Since your data already looks a little bit like JSON, converting it to JSON would allow you to use existing JSON or YAML tools for interpreting it.
Config::General is modelled after the Apache Configuration File Format and has a better feature set than YAML. Once you become familiar enough with Config::General it would be just as easy to migrate your data to it as JSON. For one of my projects I wrote a configuration DSL in on top of Config::General. Config::General supports Comments, HereDocs, and multiple values for an item.
YAML and JSON are widely implemented in languages other than Perl. Config::General is Perl only.
If you can get modules installed I recommend that regardless of which you decide to use that you request installation of Config::General, YAML::XS, JSON, and or whichever alternate JSON modules you like. If that is a problem there are a number of solutions for installing Perl Modules either in your home directory or in your application.

Perl using Win32::PerfLib

I'm trying to understand Win32::PerfLib better, and I mustn't use Win32::PerfMon.
Two example I have questions about:
First example, is the classic from CPAN:
use Win32::PerfLib;
my $server = "";`enter code here`
Win32::PerfLib::GetCounterNames($server, \%counter);
%r_counter = map { $counter{$_} => $_ } keys %counter;
# retrieve the id for process object
$process_obj = $r_counter{Process};
# retrieve the id for the process ID counter
$process_id = $r_counter{'ID Process'};
# create connection to $server
$perflib = new Win32::PerfLib($server);
$proc_ref = {};
# get the performance data for the process object
$perflib->GetObjectList($process_obj, $proc_ref);
$perflib->Close();
$instance_ref = $proc_ref->{Objects}->{$process_obj}->{Instances};
foreach $p (sort keys %{$instance_ref})
{
$counter_ref = $instance_ref->{$p}->{Counters};
foreach $i (keys %{$counter_ref})
{
if($counter_ref->{$i}->{CounterNameTitleIndex} == $process_id)
{
printf( "% 6d %s\n", $counter_ref->{$i}->{Counter},
$instance_ref->{$p}->{Name}
);
}
}
}
Could someone explain in depth the 4th line?
I didn't understand why we use $_ for and
what it represents, although I read about it
but in this case I don't know. In addition
what's the $counter{$_} => $_ meaning?
Second question is from this code, which gets the cpu %
from perfmon:
use Win32::PerfLib;
($server) = #ARGV;
# only needed for PrintHash subroutine
#Win32::PerfLib::GetCounterNames($server, \%counter);
$processor = 238;
$proctime = 6;
$perflib = new Win32::PerfLib($server);
$proc_ref0 = {};
$proc_ref1 = {};
$perflib->GetObjectList($processor, $proc_ref0);
sleep 5;
$perflib->GetObjectList($processor, $proc_ref1);
$perflib->Close();
$instance_ref0 = $proc_ref0->{Objects}->{$processor}->{Instances};
$instance_ref1 = $proc_ref1->{Objects}->{$processor}->{Instances};
foreach $p (keys %{$instance_ref0})
{
$counter_ref0 = $instance_ref0->{$p}->{Counters};
$counter_ref1 = $instance_ref1->{$p}->{Counters};
foreach $i (keys %{$counter_ref0})
{
next if $instance_ref0->{$p}->{Name} eq "_Total";
if($counter_ref0->{$i}->{CounterNameTitleIndex} == $proctime)
{
$Numerator0 = $counter_ref0->{$i}->{Counter};
$Denominator0 = $proc_ref0->{PerfTime100nSec};
$Numerator1 = $counter_ref1->{$i}->{Counter};
$Denominator1 = $proc_ref1->{PerfTime100nSec};
$proc_time{$p} = (1- (($Numerator1 - $Numerator0) /
($Denominator1 - $Denominator0 ))) * 100;
printf "Instance $p: %.2f\%\n", $proc_time{$p};
}
}
}
Why does the programmer had to use the method "GetObjectList"
Two times and put the sleep method between them?
And why we can't just take the cpu percent like perfmon shows
and we have to make all those calculations?
Thanks in advance,
Fam Pam.
In this code:
Win32::PerfLib::GetCounterNames($server, \%counter);
%r_counter = map { $counter{$_} => $_ } keys %counter;
You are stroing the perfdata in %counter hash. The map in this case creates a reverse hash where the earlier values becomes keys.
Example:
from apple => 'fruit' to fruit => 'apple

socks 5 proxy on Perl

I am new in Perl and trying to understand this cod in Link : http://codepaste.ru/1374/but I have some problem in understanding this part of code:
while($client || $target) {
my $rin = "";
vec($rin, fileno($client), 1) = 1 if $client;
vec($rin, fileno($target), 1) = 1 if $target;
my($rout, $eout);
select($rout = $rin, undef, $eout = $rin, 120);
if (!$rout && !$eout) { return; }
my $cbuffer = "";
my $tbuffer = "";
if ($client && (vec($eout, fileno($client), 1) || vec($rout, fileno($client), 1))) {
my $result = sysread($client, $tbuffer, 1024);
if (!defined($result) || !$result) { return; }
}
if ($target && (vec($eout, fileno($target), 1) || vec($rout, fileno($target), 1))) {
my $result = sysread($target, $cbuffer, 1024);
if (!defined($result) || !$result) { return; }
}
if ($fh && $tbuffer) { print $fh $tbuffer; }
while (my $len = length($tbuffer)) {
my $res = syswrite($target, $tbuffer, $len);
if ($res > 0) { $tbuffer = substr($tbuffer, $res); } else { return; }
}
while (my $len = length($cbuffer)) {
my $res = syswrite($client, $cbuffer, $len);
if ($res > 0) { $cbuffer = substr($cbuffer, $res); } else { return; }
}
}
can any body explain to me exactly whats happen in these lines:
vec($rin, fileno($client), 1) = 1 if $client;
vec($rin, fileno($target), 1) = 1 if $target;
and
select($rout = $rin, undef, $eout = $rin, 120);
Basically, the select operator is used to find which of your file descriptors are ready (readable, writable or there's an error condition). It will wait until one of the file descriptors is ready, or timeout.
select RBITS, WBITS, EBITS, TIMEOUT
RBITS is a bit mask, usually stored as a string, representing a set of file descriptors that select will wait for readability. Each bit of RBITS represent a file descriptor, and the offset of the file descriptor in this bit mask should the file descriptor number in system. Thus, you could use vec to generate this bit mask.
vec EXPR, OFFSET, BITS
The vec function provides storage of lists of unsigned integers. EXPR is a bit string, OFFSET is the offset of bit in EXPR, and BITS specifies the width of each element you're reading from / writing to EXPR.
So these 2 lines:
vec($rin, fileno($client), 1) = 1;
vec($rin, fileno($target), 1) = 1;
They made up a bit mask string $rin with setting the bit whose offset equals the file descriptor number of $client, as well as the one of $target.
Put it into select operator:
select($rout = $rin, undef, $eout = $rin, 120);
Then select will monitor the readability of the two file handlers ($client and $target), if one of them is ready, select will return. Or it will return after 120s if no one is ready.
WBITS, EBITS use the same methodology. So you could infer that the above select line will also return when the two file handler have any exceptions.

COnverting atime from LDAP to Perl

I have created a script in Perl to connect to LDAP, retrieve values and post them to a CSV file. The values I am retrieving via a query are d"distinguished name, userAccountControl & pwdLastSet. I can pull and parse the first two results correctly and post them to the CSV file, but the pwdLastSet is returning WIN32::OLE=HASH(0x.......). I have tired sprintf, hex(), and the results are either the WIN32 value or 0. I am expecting something 18 digits in length. Thanks for the help.
#!/usr/bin/perl
use xSV;
use Win32;
use Win32::OLE;
# use strict;
.
.
.
.
while ($line = <GROUPS>) {
chomp($line);
if ($line =~ m/^ user .*/) {
$line =~ s/^ user.\s//;
my ($objRootDSE, $strDomain, $strUsername, $objConnection, $objCommand, $objRecordSet, $strDN, $arrSplitResponse, $strLName, $strFName, $strUserType);
use constant ADS_SCOPE_SUBTREE => 2;
# Get domain components
$objRootDSE = Win32::OLE->GetObject('LDAP://RootDSE');
$strDomain = $objRootDSE->Get('DefaultNamingContext');
# Get username to search for
$strUsername = $line;
# Set ADO connection
$objConnection = Win32::OLE->new('ADODB.Connection');
$objConnection->{Provider} = 'ADsDSOObject';
$objConnection->Open('Active Directory Provider');
# Set ADO command
$objCommand = Win32::OLE->new('ADODB.Command');
$objCommand->{ActiveConnection} = $objConnection;
$objCommand->SetProperty("Properties", 'Searchscope', ADS_SCOPE_SUBTREE);
$objCommand->{CommandText} = 'SELECT distinguishedName, userAccountControl, pwdLastSet FROM \'LDAP://' . $strDomain . '\' WHERE objectCategory=\'user\' AND samAccountName = \'' . $strUsername . '\'';
# Set recordset to hold the query result
$objRecordSet = $objCommand->Execute;
# If a user was found - Retrieve the distinguishedName
if (!$objRecordSet->EOF) {
$strDN = $objRecordSet->Fields('distinguishedName')->Value;
$strAcctControl = $objRecordSet->Fields('userAccountControl')->Value;
$strpwdLS = sprintf($objRecordSet->Fields('pwdLastSet')->Value);
#arrSplitResponse = split(/,/, $strDN);
$strLName = substr($arrSplitResponse[0],3);
if ($strLName =~ m/\\$/) {
$strLName = substr($strLName,0,-1);
}
$strFName = $arrSplitResponse[1];
if ($strFName =~ m/OU=/) {
$strUserType = $strFName;
$strFName = "";
$strUserType = substr($strUserType,3);
} else {
$strUserType = substr($arrSplitResponse[2],3);
}
if ($strAcctControl == 512) {
$strAcctControl = "Active";
} else {
$strAcctControl = "Disabled";
}
} else {
print "No user found";
}
&debug("Match!: $line in $group\n");
$csv->print_data(
AccountName => $line,
LastName => $strLName,
FirstName => $strFName,
SYSGenericAcct => $strUserType,
AccessLevel => $group,
AccessCapability => "User",
Description => $desc,
Status => $strAcctControl,
LastPwdChange => $strpwdLS
);
} else {
$group = $line;
chomp($desc = <GROUPS>);
chomp($group2 = <GROUPS>);
&debug("$group\n$desc\n$group\n");
}
}
Use Net::Ldap to search AD server. It is fast and it is portable. It is possible to search AD server from other hosts, even from linux. It is a fast and mature module.
You could also do some debug, using Data::Dumper.
use Data::Dumper;
...
print Dumper($strpwdLS);
I found this thread: http://code.activestate.com/lists/pdk/3876/
# Calculate password age in days
my $PWage;
my $LastPW = $item->{pwdLastSet};
my $fRef = ref ($LastPW);
my ($Hval, $Lval);
if ($fRef eq 'Win32::OLE' )
{
$Hval = $LastPW->HighPart;
$Lval = $LastPW->LowPart;
my $Factor = 10000000; # convert to seconds
my $uPval = pack("II",$Lval,$Hval);
my ($bVp, $aVp) = unpack("LL", $uPval);
$uPval = ($aVp*2**32+$bVp)/$Factor;
if ($uPval != 0)
{
$uPval -= 134774*86400; #Adjust for perl time!
my $EpochSeconds = time;
$PWage = ($EpochSeconds - int($uPval))/(60*60*24) ;
$PWage =~ s/\..*$//;
}
}

Does any one have a working script of LoadRunner Automation API?

Currently I am writing Perl script that creates LoadRunner scenario, execute the test, collect the result, recover the environment and repeat the cycle again with different scenario variables.
I don't have a problem creating new scenario, adding generator, adding 2 groups + script + the run-time settings. But I am having a problem with:
Setting scenario schedule from "Scenario" to "Group".
Setting schedule per group
This the snippet of the code:
use strict;
use v5.10;
use Win32::OLE;
use Win32::OLE::Enum;
use Win32::OLE::Variant;
use Data::Dumper;
use Win32::OLE::Const 'LoadRunner Automation Library';
use constant False => Variant(VT_BOOL,'');
use constant True => Variant(VT_BOOL,1);
my $lrEngine = Win32::OLE->new('wlrun.LrEngine') or die "oops\n";
my $lrScenario = $lrEngine->Scenario();
my $rc = $lrScenario->new(0, 1); # do not save previous, Regular vusers based scenario
if ($rc != 0) {
print "Win32::OLE::LastError: ".Win32::OLE::LastError()."\n";
print "lrScenario->new(0, 1):rc: $rc\n";
}
# snip-snipped - add generator
# snip-snipped - add #groups definition
foreach my $group (#groups) {
print "scriptName: $group->{scriptName}\n";
my $scriptLocation = $group->{scriptLocation};
my $scriptName = Variant(VT_BSTR|VT_BYREF, $group->{scriptName});
{ # add $group->{scriptName} script
$rc = $lrScenario->Scripts->Add($scriptLocation, $scriptName);
if ($rc != 0) {
print "Win32::OLE::LastError: ".Win32::OLE::LastError()."\n";
print "lrScenario->Scripts->Add($scriptLocation, $scriptName):rc: $rc\n";
}
}
#############################################################################
my $groupName = Variant(VT_BSTR|VT_BYREF, $group->{groupName});
{ # add $group->{groupName} group
$rc = $lrScenario->Groups->Add($groupName);
if ($rc != 0) {
print "Win32::OLE::LastError: ".Win32::OLE::LastError()."\n";
print "lrScenario->Groups->Add:rc: $rc\n";
}
$rc = $lrScenario->Groups->Item($groupName)->AddVusers($scriptName, $hostname, 3);
if ($rc != 0) {
print "Win32::OLE::LastError: ".Win32::OLE::LastError()."\n";
print "lrScenario->Groups->Item($groupName)->AddVusers:rc: $rc\n";
}
}
#############################################################################
# snip-snipped - change group script run time setting
}
my $scheduleName = Variant(VT_BSTR|VT_BYREF, 'Schedule123');
my $lrManualScheduleData = $lrScenario->ManualScheduler->AddSchedule($scheduleName, lrGroupSchedule); # Scenario schedule mode
if (!$lrManualScheduleData) {
say "Win32::OLE::LastError: ".Win32::OLE::LastError();
say "lrScenario->ManualScheduler->AddSchedule:rc: $rc";
}
$rc = $lrScenario->ManualScheduler->SetCurrentSchedule($scheduleName);
if ($rc != 0) {
say "Win32::OLE::LastError: ".Win32::OLE::LastError();
say "lrScenario->ManualScheduler->SetCurrentSchedule:rc: $rc";
}
print "\$lrScenario->ManualScheduler->SetScheduleMode($scheduleName, lrGroupSchedule):";
$lrScenario->ManualScheduler->SetScheduleMode($scheduleName, lrGroupSchedule);
#LrManualScheduleMode -> lrGroupSchedule = 1, lrScenarioSchedule = 0
say "Win32::OLE::LastError: ".Win32::OLE::LastError();
$lrManualScheduleData->{'InitAllBeforeRun'} = 'True';
$lrManualScheduleData->{'DurationMode'} = 1;
$lrManualScheduleData->{'Duration'} = 60 * 60;
$lrManualScheduleData->{'RampupBatchSize'} = 1;
$lrManualScheduleData->{'RampupMode'} = lrRampupByGroupBatches;
$lrManualScheduleData->{'RampupTimeInterval'} = 5;
$lrManualScheduleData->{'RampdownBatchSize'} = 1;
$lrManualScheduleData->{'RampdownMode'} = lrRampupByGroupBatches;
$lrManualScheduleData->{'RampdownTimeInterval'} = 5;
$rc = $lrScenario->ManualScheduler->{'ScenarioStartTimeMode'} = 0; # Start scenario without delay
#test
say "$scheduleName: ".$lrScenario->ManualScheduler->Schedule($scheduleName)->{'Duration'}; # returns 300
I have the same problem. Setting those properties and then calling either setschedulemode or setcurrentschedule doesn't seem to work. The only workaround I have found is to use the setscheduledata method passing in xml. You will need to get the current xml for the scheduledata and then change the xml, passing in the modified xml to the setscheduledata method. Hopefully this helps
lrManualScheduleData data = engine.Scenario.ManualScheduler.get_Schedule("Schedule 1");
String scheduleXML,errStr;
int returncode = data.getScheduleData(out scheduleXML, out errStr);
// Manipulate the XML to set whatever schedule you want
data.SetScheduleData(scheduleXML, out errStr);