Is there a way to define a Set data structure in PowerShell?
In computer science, a set is an abstract data type that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.
I need to use a data structure as a keystore that:
ensures there are no repetitions;
minimizes the computational effort needed to retrieve and remove an element.
You can use the .NET HashSet<T> class, found in the System.Collections.Generic namespace:
$set = New-Object System.Collections.Generic.HashSet[int]
The collection guarantees unique items and the Add, Remove, and Contains methods all operate with O(1) complexity on average.
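For comparison, here is a minimal Python sketch of the same set semantics using the built-in `set` type, which is also a hash set. The `add_unique` helper is hypothetical; it only mimics the way `HashSet<T>.Add` reports whether the item was new.

```python
def add_unique(s: set, item) -> bool:
    """Add item to s; return True only if it was not already present
    (mirrors the bool returned by .NET's HashSet<T>.Add)."""
    if item in s:      # average O(1) membership test
        return False
    s.add(item)        # average O(1) insertion
    return True


ids = set()
print(add_unique(ids, 42))  # True: first insertion
print(add_unique(ids, 42))  # False: duplicate rejected
print(42 in ids)            # True
ids.discard(42)             # average O(1) removal; no error if absent
print(42 in ids)            # False
```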
If you prefer to stick with native PowerShell types, you can use a Hashtable and simply ignore the values:
# Initialize the set
$set = @{}
# Add an item
$set.Add("foo", $true)
# Or, if you prefer add/update semantics
$set["foo"] = $true
# Check if item exists
if ($set.Contains("foo"))
{
echo "exists"
}
# Remove item
$set.Remove("foo")
For more information see: https://powershellexplained.com/2016-11-06-powershell-hashtable-everything-you-wanted-to-know-about/#removing-and-clearing-keys
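The same "ignore the values" idea can be sketched in Python with a `dict` whose values are all `True`. This is only an illustration of the pattern; in Python the built-in `set` is the more direct choice.

```python
# Use a dict as a set: only the keys matter, the values are placeholders.
seen = {}

seen["foo"] = True             # add/update semantics, like $set["foo"] = $true
seen.setdefault("bar", True)   # add only if the key is absent

if "foo" in seen:              # membership tests the keys, average O(1)
    print("exists")

seen.pop("foo", None)          # remove; no error if the key is missing
print(sorted(seen))            # ['bar']
```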
HashSet is what you are looking for if you want to store only unique values with fast add, remove, and lookup operations. It can be created as follows:
$set = [System.Collections.Generic.HashSet[int]]@()
Not sure if this is science fiction, but would it be possible to create a type that represents an Array that matches a certain condition, such as being always sorted?
Or a 2-tuple where the first element is always bigger than the second?
What you're describing is called a dependent type (https://en.wikipedia.org/wiki/Dependent_type). Swift does not have these, and I'm not aware of any mainstream (non-research) language that does. You can of course create a special kind of collection that is indexed like an array and sorts itself whenever it is modified, and you can create a struct with greater and lesser properties that always reorders itself. But these criteria cannot be attached to the existing Array or tuple types.
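Without dependent types, the usual workaround is the one described above: enforce the invariant at runtime inside a wrapper type. A minimal Python sketch (not Swift), with hypothetical `SortedList` and `OrderedPair` names; note that Python cannot stop a caller from mutating `_items` directly, which is exactly the guarantee a dependent type would add.

```python
import bisect


class SortedList:
    """Array-like collection that keeps itself sorted on every insertion."""

    def __init__(self, items=()):
        self._items = sorted(items)

    def insert(self, value):
        # bisect.insort puts value at its sorted position (O(n) shift).
        bisect.insort(self._items, value)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


class OrderedPair:
    """2-tuple whose first element is always >= the second (ties allowed)."""

    def __init__(self, a, b):
        # Reorder on construction so the invariant always holds.
        self.greater, self.lesser = (a, b) if a >= b else (b, a)

    def as_tuple(self):
        return (self.greater, self.lesser)


xs = SortedList([5, 1, 3])
xs.insert(2)
print([xs[i] for i in range(len(xs))])  # [1, 2, 3, 5]
print(OrderedPair(3, 7).as_tuple())     # (7, 3)
```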
RFC 6020 says:
The "key" statement [...] takes as an argument a
string that specifies a space-separated list of leaf identifiers of
this list. [...] Each such leaf identifier MUST refer to a child leaf of the
list. The leafs can be defined directly in substatements to the
list, or in groupings used in the list.
Despite this, the following grouping validates successfully in pyang:
grouping my-grouping {
list my-list-in-a-grouping {
key there-is-no-such-leaf;
}
}
If the list is outside of a grouping, or if I use the grouping without any augmentations, then I get an error (which is expected):
error: the key "there-is-no-such-leaf" does not reference an existing leaf
What is the point of having groupings that require augmentations in order to be used?
According to Martin Björklund, an author of the related RFCs, this is not valid YANG. Pyang fails to detect it due to a bug in its implementation. The RFC text you quoted in your question does not permit any other interpretation, and that wording appears to be intentional. Groupings were never meant to be used in such a way.
Could it be because grouping is not a data definition node and pyang validates only such nodes?
The grouping statement is not a data
definition statement and, as such, does not define any nodes in
the schema tree.
RFC6020
I have a LinkedHashSet which was created from a Seq. I used a LinkedHashSet because I need to keep the order of the Seq, but also ensure uniqueness, like a Set. I need to check this LinkedHashSet against another sequence to verify that various properties within them are the same. I assumed that I could loop through using an index, i, but it appears not. Here is an example of what I would like to accomplish.
var s: Seq[Int] = { 1 to mySeq.size }
return s.forall { i =>
myLHS.indexOf(i).something == mySeq.indexOf(i).something &&
myLHS.indexOf(i).somethingelse == mySeq.indexOf(i).somethingelse
}
So how do I access individual elements of the LHS?
Consider using the zip method on collections to create a collection of pairs (tuples). The details depend on your situation. You may want mySeq.zip(myLHS) or myLHS.zip(mySeq), which create differently ordered pairs. You probably want mySeq.zip(myLHS), but I'm guessing. Also, if the collections are very large, you may want to take a view first, e.g. mySeq.view.zip(myLHS), so that the collection of pairs is also non-strict.
Once you have this combined collection, you can use a for-comprehension (or directly, myZip.foreach) to traverse it.
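A Python sketch of the same zip-then-traverse idea. It assumes the ordered, de-duplicated collection is a `dict` used as an insertion-ordered set (Python dicts preserve insertion order), and the hypothetical `same_properties` compares whole elements rather than the unspecified `.something` fields from the question.

```python
def same_properties(ordered_unique, seq):
    """Pair elements positionally and check that every pair matches.
    zip stops at the shorter input, so compare lengths first."""
    if len(ordered_unique) != len(seq):
        return False
    # zip is lazy in Python 3, similar to zipping a Scala view.
    return all(a == b for a, b in zip(ordered_unique, seq))


my_seq = [3, 1, 2]
my_lhs = dict.fromkeys([3, 1, 3, 2])    # ordered keys: 3, 1, 2
print(same_properties(my_lhs, my_seq))  # True
```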
A LinkedHashSet is not necessary in this situation. Since the data starts as a Seq, it is already ordered, so I do not have to convert it to a LinkedHashSet just to make it unique: Seq has a distinct method that removes duplicates from the sequence. From there, I can access the items by index.
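In Python, the usual counterpart of Scala's `Seq.distinct` is `dict.fromkeys`, which drops duplicates while keeping first-occurrence order; converting the result back to a list restores index access. A minimal sketch, where `distinct` is just a local helper name:

```python
def distinct(items):
    """Remove duplicates, keeping the first occurrence of each item in order."""
    return list(dict.fromkeys(items))


unique = distinct([4, 2, 4, 7, 2])
print(unique)     # [4, 2, 7]
print(unique[1])  # 2: plain index access works again
```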
I want to store a collection of data in an ArrayList or a Hashtable, but data retrieval should be efficient and fast. I want to know which data structure hides behind ArrayList and Hashtable (e.g. linked list, doubly linked list).
An ArrayList is a dynamic array that grows when new items are added beyond the current capacity of the list. Items in an ArrayList are accessed by index, much like an array.
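A simplified Python model of how a dynamic array grows: when the fixed-size backing array is full, its contents are copied into one twice as large. The starting capacity of 4 and the doubling factor are assumptions for illustration; the exact growth policy of .NET's ArrayList is an implementation detail.

```python
class DynamicArray:
    """Toy dynamic array: a fixed-size backing list plus an item count."""

    def __init__(self, capacity=4):
        self._data = [None] * capacity  # backing "array"
        self._count = 0

    def add(self, item):
        if self._count == len(self._data):
            # Full: allocate a larger array and copy. Doubling keeps the
            # amortized cost of add at O(1).
            bigger = [None] * max(1, len(self._data) * 2)
            bigger[:self._count] = self._data[:self._count]
            self._data = bigger
        self._data[self._count] = item
        self._count += 1

    def __getitem__(self, index):
        # O(1) access by index, just like an array.
        if not 0 <= index < self._count:
            raise IndexError(index)
        return self._data[index]

    @property
    def capacity(self):
        return len(self._data)


arr = DynamicArray()
for n in range(5):
    arr.add(n * 10)
print(arr[4], arr.capacity)  # 40 8
```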
The Hashtable is a hash table behind the scenes. The underlying data structure is typically an array, but instead of accessing it via an index, you access it via a key, which is mapped to a location in the table by calling the key object's GetHashCode() method.
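To illustrate "an array accessed via a key", here is a minimal separate-chaining hash table in Python, with Python's `hash()` standing in for `GetHashCode()`. Separate chaining is only one collision strategy, and this sketch never resizes; it is not a description of how .NET's Hashtable works internally.

```python
class ChainedHashTable:
    """Toy hash table: an array of buckets, each a list of (key, value)."""

    def __init__(self, size=8):
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Map the key's hash code to a slot in the backing array.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key exists: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (possibly a collision)

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable()
table.put("examplenum", 5)
table.put("examplenum2", 7)
print(table.get("examplenum"))  # 5
```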
In general, ArrayList and Hashtable are discouraged in .NET 2.0 and above in favor of List<T> and Dictionary<TKey, TValue>, which are type-safe generic versions that perform better and avoid boxing costs for value types.
I've got a blog post that compares the various benefits of each of the generic containers here that may be useful:
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
While it talks about the generic collections in particular, ArrayList has similar complexity costs to List<T>, and Hashtable to Dictionary<TKey, TValue>.
A Hashtable maps keys (here, strings) to values. An ArrayList stores a bunch of items in numbered order.
using System.Collections;
Hashtable ht = new Hashtable();
ht["examplenum"] = 5;
ht["examplenum2"] = 7;
//Then to retrieve (values are stored as object, so cast back to int)
int i = (int)ht["examplenum"]; //value of 5
ArrayList al = new ArrayList();
al.Add(2);
al.Add(3);
//Then to retrieve
int j = (int)al[0]; //value of 2
As its name implies, an ArrayList (or a List<T>) is implemented with an array, and a Hashtable is also backed by an array. So both of them have constant access cost (the best possible): by index for the ArrayList, and on average by key for the Hashtable.
What you have to think about is what kind of key you need. If your data must be accessed with an arbitrary key (for example, a string), you will not be able to use an ArrayList. A Hashtable should also be your preferred choice if the keys are not (more or less) consecutive.
Hope it helps.
I'm just trying to get a grip on when you would need to use a hash and when it might be better to use an array. What kind of real-world object would a hash represent, say, in the case of strings?
I believe sometimes a hash is referred to as a "dictionary", and I think that's a good example in itself. If you want to look up the definition of a word, it's nice to just do something like:
definition['pernicious']
Instead of trying to figure out the correct numeric index that the definition would be stored at.
This answer assumes that by "hash" you're basically just referring to an associative array.
I think you're looking at things from the wrong direction. It is not the object that determines whether you should use a hash but the manner in which you access it. A common use of a hash is a lookup table. If your objects are strings and you want to check whether they exist in a dictionary, looking them up will (assuming the hash works properly) be O(1) on average. With a sorted structure, the lookup would instead be O(log n), which may not be acceptable.
Thus, hashes are ideal for use with Dictionaries (hashmaps), sets (hashsets), etc.
They are also a useful way of representing an object without storing the object itself (for example, storing a hash of a password instead of the password).
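A minimal Python sketch of the password case using only the standard library: store a random salt and a PBKDF2 digest instead of the password, then recompute and compare on login. The iteration count is an illustrative assumption; choose it according to current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value; tune to current recommendations


def hash_password(password: str):
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest for the candidate and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("wrong", salt, digest))    # False
```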
The phone book - key = name, value = phone number.
I also think of the old World Book Encyclopedias (actual books). Each article is "hashed" into a single book (cat goes in the "C" volume).
Any time you have data that is well served by a 1-to-1 map.
For example, grades in a class:
"John Smith" => "B+"
"Jacob Jenkens" => "C"
etc
In general, hashes are used to find things fast: a hash map can associate one thing with another quickly, and a hash set simply stores things so they can be found quickly.
Please also consider the complexity and cost of the hash function when deciding whether a hash container or an ordinary ordered (less-than based) container is better. The extra size of the stored hash value, the time needed to compute a "perfect" hash, and the time needed for a final 1:1 comparison when hashes collide may in fact add up to much more than walking a tree structure with logarithmic complexity using the less-than operator.
When you need to associate one variable with another. There isn't a "type limit" on what can be a key or value in a hash (in most languages, keys only need to be hashable).
Hashes have many uses. Aside from cryptographic uses, they are commonly used for quick lookups of information. To get similarly quick lookups using an array, you would need to keep the array sorted and then use a binary search. With a hash, you get the fast lookup without having to sort. This is why most scripting languages implement hashing under one name or another (dictionaries, etc.).
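The two approaches can be compared in a short Python sketch: `bisect` gives O(log n) lookups but requires the list to stay sorted, while a `set` gives average O(1) lookups with no ordering requirement. The word list is made-up sample data.

```python
import bisect

words = ["pear", "apple", "fig", "kiwi"]

# Array approach: sort once, then binary-search each lookup (O(log n)).
sorted_words = sorted(words)


def in_sorted(xs, target):
    i = bisect.bisect_left(xs, target)
    return i < len(xs) and xs[i] == target


# Hash approach: no sorting needed, average O(1) lookups.
word_set = set(words)

print(in_sorted(sorted_words, "fig"), "fig" in word_set)      # True True
print(in_sorted(sorted_words, "grape"), "grape" in word_set)  # False False
```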
I use one often for a "dictionary" of settings for my app.
Setting | Value
I load them from the database or a config file into a hashtable for use by my app.
It works well and is simple.
One example could be a zip code associated with an area, city, or postal address.
A good example is a cache with lots of elements in it. You have some identifier by which you want to look up a value (say, a URL, and you want to find the corresponding cached web page). You want these lookups to be as fast as possible and don't want to search through all the stored pages every time a URL is requested. A hash table is a great data structure for a problem like this.
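A minimal Python sketch of such a cache: a `dict` keyed by URL that is consulted before doing the expensive work. `fetch_page` is a hypothetical stand-in for the real download, and the sketch has no eviction policy, which a real cache would need.

```python
calls = []


def fetch_page(url):
    """Hypothetical slow fetch; here it just records that it was called."""
    calls.append(url)
    return f"<html>content of {url}</html>"


cache = {}


def get_page(url):
    # Average O(1) lookup by URL instead of scanning every stored page.
    if url not in cache:
        cache[url] = fetch_page(url)
    return cache[url]


get_page("https://example.com/a")
get_page("https://example.com/a")  # second request is served from the cache
print(len(calls))                  # 1
```

For caching function results, Python's `functools.lru_cache` packages the same idea together with a size limit.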
One real-world example I just wrote was adding up the amount people spent on meals when filing expense reports. I needed a daily total with no idea how many items would exist on a particular day and no idea what the date range of the expense report would be. There are restrictions on how much a person can expense, with many variables (which city, weekend, etc.).
The hash table was the perfect tool to handle this. The key was the date and the value was the receipt amount (converted to USD). The receipts could come in any order; I just kept getting the value for that date and adding to it until the job was done. Displaying the totals was easy as well.
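The daily-total approach can be sketched in Python with `collections.defaultdict`, so a date that has not been seen yet starts at zero. The receipts are invented sample data, assumed to be already converted to USD; `Decimal` is used to avoid floating-point rounding on money.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# (date, amount in USD) pairs arriving in any order. Sample data.
receipts = [
    (date(2024, 3, 2), Decimal("12.50")),
    (date(2024, 3, 1), Decimal("8.25")),
    (date(2024, 3, 2), Decimal("20.00")),
]

daily_totals = defaultdict(Decimal)  # missing dates start at Decimal("0")
for day, amount in receipts:
    daily_totals[day] += amount      # key = date, value = running total

for day in sorted(daily_totals):
    print(day.isoformat(), f"{daily_totals[day]:.2f}")
# 2024-03-01 8.25
# 2024-03-02 32.50
```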
(PHP code)
$david = new stdClass();
$david->name = "david";
$david->age = 12;
$david->id = 1;
$david->title = "manager";
$joe = new stdClass();
$joe->name = "joe";
$joe->age = 17;
$joe->id = 2;
$joe->title = "employee";
// Option 1: store users by numeric index
$usersByIndex = [];
$usersByIndex[] = $david;
$usersByIndex[] = $joe;
// Option 2: store users keyed by title
$usersByTitle = [];
$usersByTitle[$david->title] = $david;
$usersByTitle[$joe->title] = $joe;
Now the question: who is the manager?
Answer:
$usersByTitle["manager"]