Best data structure for searching record - perl

I have a lot of IDs that I want to store temporarily and then search when performing some operation. Which data structure is best suited to this in Perl? Should I use a hash or an array, or is there a module that does this efficiently?
The records are 4343, 34343, 34343, 2323, 232, ....

A little more information about your record layout would go a long way in helping people help you. If your records are linked to ID numbers, you can use a hash with the ID as the key and store the record as a string or an array reference in the hash value:
my %records;
$records{ $id_number } = "Record for $id_number";
## OR
$records{ $id_number } = ['Record', 'for', $id_number];
This lets you look up IDs in O(1) time on average and easily manipulate the corresponding record.
# Assuming the records are stored in @records
my %recStore;
for my $record (@records) {
    $recStore{$record}++;
}
# To search for a record
my $recToFind = 4343;
my $recExists = $recStore{$recToFind} || 0;
The keys of the hash are the IDs retrieved from your database and the corresponding values are the number of times each ID was found, so for repeating records $recExists will be greater than 1, and for non-existent records it will be 0. To get a list of all IDs sorted numerically you could write
my @sortedID = sort { $a <=> $b } keys %recStore;
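
For readers comparing languages, the same counting-hash idea can be sketched in Python. This is only an illustration, not part of the Perl answer: the `ids` list reuses the sample values from the question, and the variable names are made up for the example.

```python
from collections import Counter

# Sample IDs taken from the question; real data would come from your own source.
ids = [4343, 34343, 34343, 2323, 232]

# Map each ID to the number of times it occurs.
id_counts = Counter(ids)

# Looking up a count is O(1) on average; an ID that was never seen yields 0.
id_to_find = 4343
occurrences = id_counts[id_to_find]

# Distinct IDs in ascending numeric order.
sorted_ids = sorted(id_counts)
```

As in the Perl version, a count above 1 marks a repeated ID and a count of 0 marks an ID that is absent.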

Related

How to retrieve a key in a hash table and then update the value in Powershell?

I have a hash table where keys represent email addresses and the values represent a count. The if check sees if the email address is in the hash table, if it is not contained, then add it to the hash table and increment the count.
If the email address is present in the hash table, how do I retrieve the key and then update the value counter?
Thank you!
$targeted_hash = @{}
$count = 0
foreach ($group in $targeted)
{
    if (!$targeted_hash.ContainsKey($group.ManagerEmail))
    {
        $targeted_hash.Add($group.ManagerEmail, $count + 1)
    }
    else
    {
        #TODO
    }
}
PowerShell offers two convenient shortcuts:
Assigning to an entry by key updates a preexisting entry, if present, or creates an entry for that key on demand.
Applying ++, the increment operator, to an entry that does not exist yet implicitly treats its value as 0 and therefore initializes the entry to 1.
Therefore:
$targeted_hash = @{}
foreach ($group in $targeted)
{
$targeted_hash[$group.ManagerEmail]++
}
After the loop, the hash table contains one entry per distinct manager email address, and each value holds the number of times that address occurs in the input array $targeted.

Sorting hash table (of hash tables)

I have hit a wall trying to get a hash table of hash tables to sort. It seems like the act of sorting is turning the hash table into something else, and I am unable to walk the new structure.
$mainHashTable = @{}
$mainHashTable.Add('B', @{'one'='B1'; 'two'='B2'; 'three'='B3'})
$mainHashTable.Add('D', @{'one'='D1'; 'two'='D2'; 'three'='D3'})
$mainHashTable.Add('A', @{'one'='A1'; 'two'='A2'; 'three'='A3'})
$mainHashTable.Add('C', @{'one'='C1'; 'two'='C2'; 'three'='C3'})
CLS
$mainHashTable
foreach ($hashtable in $mainHashTable.keys) {
foreach ($itemKey in $mainHashTable.$hashtable.keys) {
Write-Host "$itemKey $($mainHashTable.$hashtable.$itemKey)!"
}
}
Write-Host
$sortedHashTable = $mainHashTable.GetEnumerator() | sort-object -property name
$sortedHashTable
foreach ($hashtable_S in $sortedHashTable.keys) {
foreach ($itemKey_S in $sortedHashTable.$hashtable_S.keys) {
Write-Host "$itemKey_S $($sortedHashTable.$hashtable_S.$itemKey_S)!"
}
}
The two lines that dump $mainHashTable & $sortedHashTable to the console look like everything is fine. But the second loop set does nothing. I have tried casting like this
$sortedHashTable = [hashtable]($mainHashTable.GetEnumerator() | sort-object -property name)
and that just throws the error
Cannot convert the "System.Object[]" value of type "System.Object[]"
to type "System.Collections.Hashtable".
So, is there some way to convert the system object to a hash table, so the loop structure works on the sorted results? Or am I better off learning to walk the structure of the System.Object? And (academically perhaps), is there a way to sort a hash table and get back a hash table?
What you're currently doing is splitting the hashtable into a list of separate entries (via .GetEnumerator()) and then sorting that - so you end up with $sortedHashTable being just an array of key/value pair objects, sorted by key, not an actual [hashtable].
is there a way to sort a hash table and get back a hash table?
No - you can't "sort a hash table" inline, because hashtables do not retain any guaranteed key order.
The way to go here is to copy the entries to an ordered dictionary, in order:
$sortedHashTable = [ordered]@{}
# Sort the keys, then insert into our ordered dictionary
foreach ($key in $mainHashTable.Keys | Sort-Object) {
$sortedHashTable[$key] = $mainHashTable[$key]
}
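
The same "sort the keys, then copy into an order-preserving dictionary" approach can be sketched in Python for comparison. This is an illustration rather than a PowerShell equivalent: it relies on Python dicts keeping insertion order (guaranteed since Python 3.7), and the names `main_table`, `sorted_table` and `lines` are made up for the example.

```python
# Nested table mirroring the question's data, deliberately inserted out of order.
main_table = {
    "B": {"one": "B1", "two": "B2", "three": "B3"},
    "D": {"one": "D1", "two": "D2", "three": "D3"},
    "A": {"one": "A1", "two": "A2", "three": "A3"},
    "C": {"one": "C1", "two": "C2", "three": "C3"},
}

# Sort the outer keys, then insert the entries into a new dict in that order.
# The inner dicts are reused, not copied.
sorted_table = {key: main_table[key] for key in sorted(main_table)}

# The result is still a dict, so the original nested loop works unchanged.
lines = [
    f"{item_key} {value}!"
    for inner in sorted_table.values()
    for item_key, value in inner.items()
]
```

The point carried over from the answer is that the sorted result stays a dictionary, so the nested walk needs no changes.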

Fetch data with one row and one column from table using Perl DBI

I am trying to fetch data like (Select 1 from table) which returns data with one row and one column.
I don't want to use the $sth->fetchrow_array method to retrieve the data into an array. Is there any way to collect the data into a scalar variable directly?
fetchrow_array returns a list (a Perl sub cannot return an array), and you can assign that to anything list-like, such as a my() declaration.
my $sth = $dbh->prepare($stmt);
$sth->execute();
my ($var) = $sth->fetchrow_array()
and $sth->finish();
Or you could simply use
my ($var) = $dbh->selectrow_array($stmt);
my ($value) = @{$dbh->selectcol_arrayref("select 1 from table")};
or better
my ($value) = $dbh->selectrow_array($statement);

pymongo - ensureIndex and upserts

I have a simple dict that defines a base record as shown below:
record = {
'h': site_hash, #combination of date (below) and site id hashed with md5
'dt': d, # date - YYYYMMDD
'si': data['site'], # site id
'cl': data['client'], # client id
'nt': data['type'], # site type
}
Then I call the following to upsert the record (insert it if it doesn't exist, otherwise update it):
collection.update(
record,
{'$inc':updates}, # updates contain some values that increase such as events: 1, actions:1, etc
True # do upsert
);
I was wondering whether changing the above to the following would give better performance, since the code below only looks up existing 'h' values instead of h/dt/si/cl/nt, and I'd only need ensureIndex on the 'h' field. However, $set would obviously execute every time, causing more writes to the record than $inc alone.
record = {
'h': site_hash, #combination of date (below) and site id hashed with md5
}
values = {
'dt': d, # date - YYYYMMDD
'si': data['site'], # site id
'cl': data['client'], # client id
'nt': data['type'], # site type
}
collection.update(
record,
{'$inc':updates,'$set':values},
True # do upsert
);
Does anyone have any tips or suggestions on best practice here?
If 'h' is already unique then you can just create an index on h, there's no need to index 'dt', 'si', etc. In that case I expect your first example to be a little more performant under very heavy load, for the somewhat obscure reason that it will create smaller entries in the journal.
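
One option the question does not mention is MongoDB's `$setOnInsert` operator, which writes its fields only when the upsert actually inserts a document. The sketch below only builds the filter and update documents, so it runs without a server. The helper name `build_upsert` and the sample values are hypothetical, and it assumes 'h' really is unique per record.

```python
def build_upsert(site_hash, date, data, updates):
    """Return (filter, update) documents for an upsert keyed on 'h' only."""
    query = {"h": site_hash}
    update = {
        # Counters are incremented on every call.
        "$inc": dict(updates),
        # Descriptive fields are written only when the document is first inserted.
        "$setOnInsert": {
            "dt": date,
            "si": data["site"],
            "cl": data["client"],
            "nt": data["type"],
        },
    }
    return query, update


# Made-up sample values, for illustration only.
query, update = build_upsert(
    "abc123",
    "20120101",
    {"site": 1, "client": 2, "type": "blog"},
    {"events": 1, "actions": 1},
)

# With a live collection (not run here), current PyMongo would use:
#   collection.create_index("h", unique=True)
#   collection.update_one(query, update, upsert=True)
```

Whether this beats the first example under load would need measuring; it only avoids rewriting the static fields on every update.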

Access nested hash in Perl HoH without using keys()?

Consider the following HoH:
$h = {
a => {
1 => x
},
b => {
2 => y
},
...
}
Is there a way to check whether a hash key exists on the second nested level without calling keys(%$h)? For example, I want to say something like:
if ( exists($h->{*}->{1}) ) { ...
(I realize you can't use * as a hash key wildcard, but you get the idea...)
I'm trying to avoid using keys() because it will reset the hash iterator and I am iterating over $h in a loop using:
while ( (my ($key, $value) = each %$h) ) {
...
}
The closest language construct I could find is the smart match operator (~~) mentioned here (and no mention in the perlref perldoc), but even if ~~ was available in the version of Perl I'm constrained to using (5.8.4), from what I can tell it wouldn't work in this case.
If it can't be done I suppose I'll copy the keys into an array or hash before entering my while loop (which is how I started), but I was hoping to avoid the overhead.
Not really. If you need to do that, I think I'd create a merged hash listing all the second level keys (before starting your main loop):
my $h = {
a => {
1 => 'x'
},
b => {
2 => 'y'
},
};
my %all = map { %$_ } values %$h;
Then your exists($h->{*}->{1}) becomes exists($all{1}). Of course, this won't work if you're modifying the second-level hashes inside the loop (unless you update %all appropriately). The code also assumes that all values in $h are hashrefs, but that would be easy to fix if necessary.
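
For comparison, the merged-lookup idea can be sketched in Python. As with the Perl one-liner, it assumes every value in `h` is itself a dict and that the inner dicts are not modified after the merge. The name `all_second_level` is made up for the example.

```python
# Two-level structure mirroring the question's hash of hashes.
h = {
    "a": {1: "x"},
    "b": {2: "y"},
}

# Merge every second-level dict into one flat lookup table, built once
# before the main loop. Later duplicates overwrite earlier ones.
all_second_level = {}
for inner in h.values():
    all_second_level.update(inner)

# "Does any second-level dict contain key 1?" is now a single lookup.
has_key_1 = 1 in all_second_level
has_key_3 = 3 in all_second_level
```

Building the merged table costs one pass over the data; each later check is a single average-case O(1) lookup.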
No. each uses the hash's iterator, and you cannot iterate over a hash without using its iterator, not even in the C API. (That means smart match wouldn't help anyway.)
Since each hash has its own iterator, you must be calling keys on the same hash that you are already iterating over using each to run into this problem. Since you have no problem calling keys on that hash, could you just simply use keys instead of each? Or maybe call keys once, store the result, then iterate over the stored keys?
You will almost certainly find that the 'overhead' of aggregating the second-level hashes is less than that of any other solution. A simple hash lookup is far faster than iterating over the entire data structure every time you want to make the check.
Are you trying to do this without any while loop? You can test for existence in a nested hash with exists, without generating an error, even when the key is missing:
while ( my ($key, $value) = each %{$h} ) {
    if (exists $value->{1}) { ... }
}
Why not do this in Sybase itself instead of Perl?
You are trying to do a set operation which is what Sybase is built to do in the first place.
Assuming you retrieved the data from a table with columns "key1", "key2", "value" using "select *", simply do:
-- Make sure mytable has index on key1
SELECT key1
FROM mytable t1
WHERE NOT EXISTS (
SELECT 1 FROM mytable t2
WHERE t1.key1=t2.key1
AND t2.key2 = 1
)
-----------
-- OR
-----------
SELECT DISTINCT key1
INTO #t
FROM mytable
CREATE INDEX idx1_t on #t (key1)
DELETE #t
FROM mytable
WHERE #t.key1=mytable.key1
AND mytable.key2 = 1
SELECT key1 from #t
Either query returns a list of first-level keys that have no key2 equal to 1.
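
The NOT EXISTS query can be tried out with a minimal sketch that swaps in Python's built-in sqlite3 module for Sybase, purely so the example runs anywhere. The table contents are made-up sample rows, and DISTINCT and ORDER BY are added here so that each key appears once and in a predictable order.

```python
import sqlite3

# In-memory SQLite database standing in for the Sybase table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (key1 TEXT, key2 INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("a", 1, "x"), ("b", 2, "y"), ("c", 1, "z"), ("c", 3, "w")],
)

# First-level keys for which no row has key2 = 1.
rows = conn.execute(
    """
    SELECT DISTINCT key1
    FROM mytable t1
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable t2
        WHERE t1.key1 = t2.key1
          AND t2.key2 = 1
    )
    ORDER BY key1
    """
).fetchall()

keys_without_1 = [row[0] for row in rows]
conn.close()
```

With the sample rows above, only "b" has no row with key2 = 1, so it is the only key returned.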