I want to store data in a variable with three columns and then process it. I looked at an example with hash tables, which seemed great, but I need three columns, and I want to be able to run queries against the data when it has, say, 100 rows.
What's the best way of doing this?
Example
You can create custom objects, each with three properties; that will give you the three-column output. If you are on V3 you can create custom objects from a hashtable like so:
$obj = [pscustomobject]@{Name='John';Age=42;Hobby='Music'}
PS> $obj | ft -auto
Name Age Hobby
---- --- -----
John 42 Music
If you are on V2 you can create these objects with New-Object:
$obj = new-object psobject -Property @{Name='John';Age=42;Hobby='Music'}
I would create an array or collection of custom PS objects, each having three properties, then use the PowerShell comparison operators on that array/collection to do my queries (see the sketch after the help topics below).
see:
Get-Help about_object_creation
Get-Help about_comparison_operators
Get-Help Where-Object
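As a minimal sketch of that approach (the property names and sample values here are invented for illustration), build the collection once, then query it:
$rows = foreach ($i in 1..100) {
    [pscustomobject]@{
        Name  = "Person$i"
        Age   = Get-Random -Minimum 18 -Maximum 80
        Hobby = 'Music'
    }
}
# Query the 100-row collection with Where-Object and a comparison operator
$rows | Where-Object { $_.Age -gt 40 } | Format-Table -AutoSize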
Related
I feel like this shouldn't be that hard, but I'm having trouble settling on a data structure that will give me what I want. I have a large amount of data and I need to find the instances where there are multiple Secondary Identifiers, as defined below.
Primary Identifier,Secondary Identifier,Fruit
11111,1,apple
11111,1,pear
22222,1,banana
22222,1,grapefruit
33333,1,apple
33333,1,pear
33333,2,apple
33333,2,orange
That might not be a great example to use, but basically only two of the columns matter. What I'd really like is to return the Primary Identifiers where the unique count of Secondary Identifiers is greater than 1. I'm thinking maybe a hashtable would be my best bet, but I tried doing something in a pipeline-oriented way and failed, so I'm wondering if there is an easier method or cmdlet that I haven't tried.
The final array (or hashtable) would be something like this:
ID Count of Secondary ID
----- ---------------------
11111 1
22222 1
33333 2
At that point, getting the instances of multiple would be as easy as $array | Where-Object {$_."Count of Secondary ID" -gt 1}
If this example sucks or what I'm after doesn't make sense, let me know and I can rewrite it; but it's almost like I need a version of Select-Object -Unique that lets you use two or more input objects/columns. Basically the same as Excel's Remove Duplicates where you select which headers to include, except there are too many rows to open in Excel.
Use Group-Object twice: first to group the objects by common Primary Identifier, then again to count the number of distinct Secondary Identifiers within each group:
$data = @'
Primary Identifier,Secondary Identifier,Fruit
11111,1,apple
11111,1,pear
22222,1,banana
22222,1,grapefruit
33333,1,apple
33333,1,pear
33333,2,apple
33333,2,orange
'@ |ConvertFrom-Csv
$data |Group-Object 'Primary Identifier' |ForEach-Object {
    [pscustomobject]@{
        # The Primary Identifier value will be the name of the group, since that's what we grouped by
        'Primary Identifier' = $_.Name
        # Use `Group-Object -NoElement` to count unique values - you could also use `Sort-Object -Unique`
        'Count of distinct Secondary Identifiers' = @($_.Group |Group-Object 'Secondary Identifier' -NoElement).Count
    }
}
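To get just the rows the question asks for, append the Where-Object test to the same pipeline; with the sample data above, only 33333 comes back:
$data |Group-Object 'Primary Identifier' |ForEach-Object {
    [pscustomobject]@{
        'Primary Identifier' = $_.Name
        'Count of distinct Secondary Identifiers' = @($_.Group |Group-Object 'Secondary Identifier' -NoElement).Count
    }
} |Where-Object { $_.'Count of distinct Secondary Identifiers' -gt 1 }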
I have two CSV files. The first CSV is Card Data, which holds about 30,000 records and contains the card's name, UUID, and price (which is currently empty). The second CSV is Pricing Data, which holds around 50,000 records and contains UUID and some pricing information for that specific UUID.
These are two separate CSV files that are generated elsewhere.
For each record in Card Data CSV I am taking the UUID and finding the corresponding UUID in the Pricing Data CSV using the Where-Object function in PowerShell. This is so I can find the pricing information for the respective card and run that through a pricing algorithm to generate a price for each record in the Card Data CSV.
At the moment it seems to take around 1 second per record in the Card Data CSV file, and with 30,000 records to process it would take over 8 hours to run through. Is there a better, more efficient way to perform this task?
Code:
Function Calculate-Price ([float]$A, [float]$B, [float]$C) {
    # Pricing algorithm
    ....
    $Card.'Price' = $CCPrice
}
$PricingData = Import-Csv "$Path\Pricing.csv"
$CardData = Import-Csv "$Update\Cards.csv"
Foreach ($Card In $CardData) {
    $PricingCard = $PricingData | Where-Object { $_.UUID -eq $Card.UUID }
    . Calculate-Price -A $PricingCard.'A-price' -B $PricingCard.'B-price' -C $PricingCard.'C-price'
}
$CardData | Select "Title","Price","UUID" |
Export-Csv -Path "$Update\CardsUpdated.csv" -NoTypeInformation
The first CSV is Card Data, which holds about 30,000 records
The second CSV is Pricing Data, which holds around 50,000 records
No wonder it's slow: you're evaluating the expression $_.UUID -eq $Card.UUID roughly 30,000 × 50,000 = 1,500,000,000 (that's 1.5 billion) times. That alone is compute-heavy, and we haven't even considered the overhead of the pipeline having to bind input arguments to Where-Object the same number of times.
Instead of using the array of objects returned by Import-Csv directly, use a hashtable to "index" the records in the data set you need to search, by the property that you're joining on later!
$PricingData = Import-Csv "$Path\Pricing.csv"
$CardData = Import-Csv "$Update\Cards.csv"
$PricingByUUID = @{}
$PricingData |ForEach-Object {
    # Let's index the price cards using their UUID value
    $PricingByUUID[$_.UUID] = $_
}
Foreach ($Card In $CardData) {
    # No need to search through the whole set anymore
    $PricingCard = $PricingByUUID[$Card.UUID]
    . Calculate-Price -A $PricingCard.'A-price' -B $PricingCard.'B-price' -C $PricingCard.'C-price'
}
Under the hood, hashtables (and most other dictionary types in .NET) are implemented so that they have extremely fast, constant-time lookup/retrieval performance, which is exactly the kind of thing you want in this situation!
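If you want to see the difference for yourself, here is a rough, purely illustrative benchmark against synthetic data (the UUIDs and prices are made up): one linear Where-Object scan versus one hashtable lookup.
# Build 50,000 fake pricing rows
$PricingData = 1..50000 | ForEach-Object { [pscustomobject]@{ UUID = $_; 'A-price' = 1.0 } }

# Linear scan: the whole set is filtered for every lookup
(Measure-Command { $PricingData | Where-Object { $_.UUID -eq 25000 } }).TotalMilliseconds

# Indexed: build the hashtable once, then each lookup is constant-time
$index = @{}
foreach ($row in $PricingData) { $index[$row.UUID] = $row }
(Measure-Command { $null = $index[25000] }).TotalMilliseconds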
I'm mostly just looking to be pointed in the right direction so I can piece it together myself. I have a decent amount of batch file scripting experience. I'm a PS noob but I think PS would be better for the project below.
We have software which requires the client ID to be part of the install string (along with switches, usr/pass, other switches, logging paths, etc).
I've created a batch file (hundreds, actually) which I execute with PSEXEC on remote machines. It does work, but it's unwieldy to maintain: the only change in each is the client ID.
What I'm attempting to do is have a CSV with 2 columns as input (so I just have to maintain the CSV): machine name (as presented by %hostname%) and client ID. I want to create a script which matches %hostname% to a corresponding row in column 1, reads the data in column 2 of the same row, and can then use that as a variable in the install string.
E.G.
If my CSV has bobs-pc in column 1, row 6, then insert the data from column 2, row 6 (let's call it 0006) in the following install string:
install.exe /client_ID=0006
No looping: I don't want it to install on all machines simultaneously, due to the multiple time zones we operate in.
Something like this would be really useful for many projects I have so I'm more interested in learning than having anyone write it for me.
I understand I should be using Import-Csv. I've created a sample csv and can get certain fields to print out in PS. What I need is for a script to be able to insert those fields as variables in the install string.
Sounds like you want something along the lines of this (assumes your CSV has a header row of col1 and col2):
$hostname = 'server1'
$value = Import-CSV myfile.csv | where { $_.col1 -eq $hostname } | select -expandproperty col2
Install.exe /client_id=$value
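Putting that together for the install scenario, a sketch might look like the following (the CSV filename, and the installer name and switches, are placeholders; col1/col2 match the assumed headers above):
$hostname = $env:COMPUTERNAME
$clientId = Import-Csv 'clients.csv' |
    Where-Object { $_.col1 -eq $hostname } |
    Select-Object -ExpandProperty col2
# Invoke the installer with the looked-up client ID
& .\install.exe "/client_ID=$clientId"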
I am trying to read in a large CSV with millions of rows for testing. I know that I can treat the CSV as a database using the provider Microsoft.ACE.OLEDB.12.0
Using a small data set I am able to read the row contents positionally using .GetValue(int). I am having a tough time finding a better way to read the data (assuming there even is one). If I know the column names beforehand this is easy; however, if I didn't know them, I would have to read in the first line of the file to get that data, which seems silly.
#"
id,first_name,last_name,email,ip_address
1,Edward,Richards,erichards0#businessweek.com,201.133.112.30
2,Jimmy,Scott,jscott1#clickbank.net,103.231.149.144
3,Marilyn,Williams,mwilliams2#chicagotribune.com,52.180.157.43
4,Frank,Morales,fmorales3#google.ru,218.175.165.205
5,Chris,Watson,cwatson4#ed.gov,75.251.1.149
6,Albert,Ross,aross5#abc.net.au,89.56.133.54
7,Diane,Daniels,ddaniels6#washingtonpost.com,197.156.129.45
8,Nancy,Carter,ncarter7#surveymonkey.com,75.162.65.142
9,John,Kennedy,jkennedy8#tumblr.com,85.35.177.235
10,Bonnie,Bradley,bbradley9#dagondesign.com,255.67.106.193
"# | Set-Content .\test.csv
$conn = New-Object System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='C:\Users\Matt';Extended Properties='Text;HDR=Yes;FMT=Delimited';")
$cmd=$conn.CreateCommand()
$cmd.CommandText="Select * from test.csv where first_name like '%n%'"
$conn.open()
$data = $cmd.ExecuteReader()
$data | ForEach-Object{
    [pscustomobject]@{
        id         = $_.GetValue(0)
        first_name = $_.GetValue(1)
        last_name  = $_.GetValue(2)
        ip_address = $_.GetValue(4)
    }
}
$cmd.Dispose()
$conn.Dispose()
Is there a better way to deal with the output from $cmd.ExecuteReader()? I'm finding it hard to get information about importing from CSV; most of the web deals with exporting to CSV from a SQL database using this provider. The logic here would be applied to a large CSV, so that I don't need to read the whole thing in just to ignore most of the data.
I should have looked closer on TechNet at the OleDbDataReader class. There are a few methods and properties that help make sense of the data returned from the SQL statement.
FieldCount: Gets the number of columns in the current row.
So if nothing else you know how many columns your rows have.
Item[Int32]: Gets the value of the specified column in its native format given the column ordinal.
Which I can use to pull back the data from each row. This appears to work the same as GetValue().
GetName(Int32): Gets the name of the specified column.
So if you don't know what the column is named this is what you can use to get it from a given index.
There are many other methods and some properties, but those are enough to shed light if you are not sure what data is contained within a CSV (assuming you don't want to manually verify beforehand). So, knowing that, a more dynamic way to get the same information would be...
$data | ForEach-Object{
    # Save the current row as its own variable so that it can be used in other scopes
    $dataRow = $_
    # Blank hashtable that will be built into a "row" object
    $properties = @{}
    # For every field that exists, add its name and value to the hashtable
    0..($dataRow.FieldCount - 1) | ForEach-Object{
        $properties.($dataRow.GetName($_)) = $dataRow.Item($_)
    }
    # Send the newly created object down the pipeline
    [pscustomobject]$properties
}
$cmd.Dispose()
$conn.Dispose()
The only downside of this is that the columns will likely not be output in the same order as the originating CSV. That can be addressed by saving the column names in a separate variable and using a Select-Object at the end of the pipe; this answer was mostly about making sense of the column names and values returned.
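Alternatively, a small tweak (assuming PowerShell 3.0 or later): an [ordered] dictionary preserves insertion order, so the columns come out in the same order as the originating CSV with no trailing Select needed:
$data | ForEach-Object{
    $dataRow = $_
    # [ordered] keeps keys in the order they are added
    $properties = [ordered]@{}
    0..($dataRow.FieldCount - 1) | ForEach-Object{
        $properties[$dataRow.GetName($_)] = $dataRow.Item($_)
    }
    [pscustomobject]$properties
}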
I created a table in a PowerShell script with some data in it. I need to find a way to do something like a replace in a column: for every value (x) I find in the column, change it to (y). How is this possible in PowerShell?
Thanks in advance. Also, I could not find anything like this on Google, and it has to be done after the table is already built, not while building the table's columns and rows. Thanks!
I'm not sure exactly what you mean by a table. However, assuming you are referring to a collection of objects, it's simple:
$collectionToUpdate | Where-Object { $_.PropertyToCheck -eq $valueToCheck } | ForEach-Object { $_.PropertyToCheck = $replacementValue }
Obviously, replace the names of the variables and property with the correct values from your code.
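For example, with some made-up sample data (the property names here are invented for illustration):
$table = @(
    [pscustomobject]@{ Name = 'Row1'; Status = 'x' }
    [pscustomobject]@{ Name = 'Row2'; Status = 'ok' }
    [pscustomobject]@{ Name = 'Row3'; Status = 'x' }
)
# Rewrite the matching property in place on the existing objects
$table | Where-Object { $_.Status -eq 'x' } | ForEach-Object { $_.Status = 'y' }
$table | Format-Table -AutoSize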