Copy block of text from webpage using PowerShell - powershell

I've extracted a whole web page as text and that text is assigned to a variable. Now I need to select a portion of that text and assign it to another variable. Let's say, the text I have is:
Note: Your feedback is very important to us, however, we do not
respond to individual submissions through this channel. If you require
support, please visit the Safety & Security Center. Follow: Change log
for version 1.211.2457.0 This page shows you what's changed in the
most recent definitions update for Microsoft antimalware and
antispyware software.
You can also see changes in the last 20 updates from the Change
definition version menu on the right.
The latest update is:
1.211.2457.0
Download the latest update.
 New definitions (?)
Antimalware (Antivirus + Antispyware)
I would like the following text to be assigned to a variable
1.211.2457.0
The code I have for now is
$URI = "http://www.example.com/mynewpage"
$HTML = Invoke-WebRequest -Uri $URI
$WebPageText = ($HTML.ParsedHtml.getElementsByTagName("div") | Where-Object{$_.className -eq "span bp0-col-1-1 bp1-col-1-1 bp2-col-1-1 bp3-col-1-1"}).innerText
I tried Select-String -SimpleMatch "The latest update is:*Download the latest update." -InputObject $WebPageText, but I'm pretty sure that's wrong.
I'm new to PowerShell scripting. So please pardon me if I'm missing something obvious.
Thank you in advance!

SimpleMatch would ignore any regex metacharacters. It would not allow any wildcards either. From TechNet:
Uses a simple match rather than a regular expression match. In a simple match, Select-String searches the input for the text in the Pattern parameter. It does not interpret the value of the Pattern parameter as a regular expression statement
What you could do is use regex to find a string where the line only contains digits and periods: "^[\d\.]+$".
$version = ($WebPageText | Select-String "^[\d\.]+$").Matches.Value
It is possible that more than one match could be returned, so you might need to account for that.
If you wanted a more targeted (but not guaranteed unique) result, you could just use the -match operator.
If(($WebPageText | out-string) -match "(?sm)The latest update is:\s+(.*?)\s+Download the latest update"){
$version = $Matches[1]
}
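Both approaches can be tried offline against a sample string. This is only a sketch: $sampleText is a made-up stand-in for $WebPageText, assumed here to be a single multi-line string, which is why the text is split into lines before the line-anchored pattern is applied.

```powershell
# Hypothetical stand-in for $WebPageText (assumed to be one multi-line string).
$sampleText = @'
definition version menu on the right.
The latest update is:
1.211.2457.0
Download the latest update.
'@

# Approach 1: split into lines, then keep the lines made up only of digits and dots.
$candidates = @(($sampleText -split '\r?\n') -match '^[\d.]+$')

# Approach 2: capture whatever sits between the two marker phrases.
if ($sampleText -match '(?s)The latest update is:\s+(.*?)\s+Download the latest update') {
    $version = $Matches[1]
}

$candidates
$version
```

Approach 1 may return several lines if the page contains other digits-and-dots lines, so the second approach is usually the safer one when the marker phrases are stable.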

Related

In PowerShell I have a variable which contains a number. I want to add some dots (.) to the number. How can I do this?

here is a part of my script in powershell
$Uri = 'https://codecguide.com/download_k-lite_codec_pack_basic.htm'
$web = Invoke-WebRequest -UseBasicParsing -Uri $uri
( $downloadurl=$web.Links |Where-Object href -Like "*.exe" |Select-Object -First 1 -expand href )
$downloadurl -match "(\d{4,})"
( $latestversion = "$($Matches[1])" )
and $latestversion is 1730. How can I change this number to look like 17.3.0?
thanks in advance
What you are asking is not possible to do in a reliable way. When the file is renamed in such a way that version number dots are simply removed, one cannot recover the lost information.
As Mathias commented, one cannot tell stripped-dot version number of 1730 apart from 17.3.0, 1.7.30, 1.7.3.0, 173.0, 17.30, or 1.73.0.
When you stuff 5 or more characters into a string that's only four characters long, you are going to have a collision. It's a mathematical fact; see the pigeonhole principle for further explanation.
What might be possible is to recover the dots, iff you know that the version follows a certain pattern. For example, if there always are two digits for the minor version and one for the build number, you can insert the lost dots. But as said, that requires information that one needs to have in advance.
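As a sketch of that last case, suppose (and this is purely an assumption, not something the download URL guarantees) that the version is always a two-digit major, a one-digit minor and a one-digit build number. The dots can then be reinserted with a regex replace:

```powershell
# Assumption (hypothetical layout): 2-digit major, 1-digit minor, 1-digit build.
$latestversion = '1730'

# Each capture group is one version component; '$1.$2.$3' joins them with dots.
$dotted = $latestversion -replace '^(\d{2})(\d)(\d)$', '$1.$2.$3'

$dotted   # 17.3.0
```

If the input does not fit the assumed pattern (for example a five-digit number), -replace returns it unchanged, so it is worth checking the result before relying on it.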

How to sort the output of winget list by column in powershell?

I'm not getting the expected output when trying to sort the output from winget list in powershell. The Id column is not sorted.
# winget list | Sort-Object -Property Id
ScreenToGif NickeManarin.ScreenToGif 2.37.1 winget
Microsoft Visual C++ 2015-2019 Redist… Microsoft.VCRedist.2015+.x64 14.28.29325.2 14.34.318… winget
paint.net {28718A56-50EF-4867-B4C8-0860228B5EC9} 4.3.8
Python 3.10.0 (64-bit) {21b42743-c8f9-49d7-b8b6-b5855317c7ed} 3.10.150.0
Microsoft Support and Recovery Assist… 0527a644a4ddd31d 17.0.7018.4
-----------------------------------------------------------------------------------------------------------------------
Name Id Version Available Source
Paint 3D Microsoft.MSPaint_8wekyb3d8bbwe 6.2009.30067.0
Microsoft .NET SDK 6.0.402 (x64) Microsoft.DotNet.SDK.6 6.0.402 winget
3D Viewer Microsoft.Microsoft3DViewer_8wekyb3d8… 7.2010.15012.0
Microsoft Sticky Notes Microsoft.MicrosoftStickyNotes_8wekyb… 3.8.8.0
Q: How can I sort the output of winget list by the Id column in powershell?
I would like to see a powershell solution similar to the Bash sort -k <column-number>, to sort on any column. I fail to see why this obvious function is not available in powershell?
It outputs text, not an object with properties like "Id". This program's output isn't very smart. It looks like it outputs some special characters as well like … (U+2026 HORIZONTAL ELLIPSIS). The first thing that occurs to me is to cut off the first 39 characters and then sort it by column 40 onward, where Id starts. That should be like sort -k in unix. I believe a powershell version of winget is coming in the future. Replacing non-ascii with spaces and skipping the first 4 lines.
# or -creplace '\P{IsBasicLatin}'
(winget list) -replace '[^ -~]',' ' | select-object -skip 4 |
sort-object { $_.substring(39) }
Python 3.10.0 (64-bit) {21b42743-c8f9-49d7-b8b6-b5855317c7ed} 3.10.150.0
paint.net {28718A56-50EF-4867-B4C8-0860228B5EC9} 4.3.8
Microsoft Support and Recovery Assist 0527a644a4ddd31d 17.0.7018.4
Name Id Version Available Source
ScreenToGif NickeManarin.ScreenToGif 2.37.1 winget
Microsoft .NET SDK 6.0.402 (x64) Microsoft.DotNet.SDK.6 6.0.402 winget
3D Viewer Microsoft.Microsoft3DViewer_8wekyb3d8 7.2010.15012.0
Microsoft Sticky Notes Microsoft.MicrosoftStickyNotes_8wekyb 3.8.8.0
Paint 3D Microsoft.MSPaint_8wekyb3d8bbwe 6.2009.30067.0
Microsoft Visual C++ 2015-2019 Redist Microsoft.VCRedist.2015+.x64 14.28.29325.2 14.34.318 winget
Trying out the Cobalt module, which uses Crescendo to parse winget output. There's no Name property, and Version is just a string (apparently these things are more of a challenge). There are a lot of GUIDs at the top.
install-module cobalt -scope currentuser
get-wingetpackage | sort id
ID Version Available Source
-- ------- --------- ------
{04F3299A-F322-45A6-8281-046777B9C736} 21.0.3
{0E8670B8-3965-4930-ADA6-570348B67153} 11.0.2100.60
{0EDB70B6-EEA7-413B-BBC4-89E2CD36EFDE} 11.5.18
#...
7zip.7zip 21.07 22.01 winget
Acrylic Suite
Acrylic Wi-Fi Home
To complement the existing, helpful answers:
I would like to see a powershell solution similar to the Bash sort -k <column-number>, to sort on any column.
I fail to see why this obvious function is not available in powershell?
The sort utility does not sort by columns with -k (--key); it sorts by fields, with any non-empty run of whitespace acting as the field separator by default.
Given that a field-based solution isn't possible here - the fields have fixed width, so there's no separator (-t, --field-separator) you can specify - you'd have to use -k 1.40 to achieve column-based sorting, which is (a) far from obvious and (b) is the equivalent of passing { $_.substring(39) } to Sort-Object's -Property parameter, as in js2010's answer.
winget list | Sort-Object -Property Id
While -Property Id would indeed be wonderful if it worked, it cannot be expected to work with the text representations that the external program winget.exe outputs: what PowerShell then sees in the pipeline are strings, about whose content nothing is known, so they can't be expected to have an .Id property.
Should the functionality provided by winget.exe ever be exposed in a PowerShell-native way,[1] i.e. via cmdlets, they would indeed produce (non-string) objects with properties that would allow you to use Sort-Object -Property Id.
Dealing with winget.exe directly comes with the following challenges, owing to its nonstandard behavior (see also the bottom section):
It doesn't respect the current console's code page and instead invariably outputs UTF-8-encoded output.
To compensate for that, [Console]::OutputEncoding must (temporarily) be set to [System.Text.UTF8Encoding]::new()
It doesn't modify its progress-display behavior based on whether its stdout stream is connected directly to a console (terminal) or not; that is, it should suppress progress information when its output is being captured or redirected, but it currently isn't.
To compensate for that, the initial output lines that are the result of winget.exe's progress display must be filtered out.
Thus, an adapted version of js2010's answer would look like this:
# Make PowerShell interpret winget.exe's output as UTF-8.
# You may want to restore the original [Console]::OutputEncoding afterwards.
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
(winget list) -match '^\p{L}' | # filter out progress-display and header-separator lines
Select-Object -Skip 1 | # skip the header line
Sort-Object { $_.Substring(39) }
Parsing winget.exe list output into objects:
The textual output from winget.exe list reveals an inherent limitation that PowerShell-native commands with their separation of data output from its presentation do not suffer from: truncating property values with … represents omission of information that cannot be recovered.
Thus, the following solution is limited by whatever information is present in winget.exe's textual output.
Assuming that helper function ConvertFrom-FixedColumnTable (source code below) is already defined, you can use it to transform the fixed-width-column textual output into objects ([pscustomobject] instances) whose properties correspond to the table's columns, which then allows you to sort by properties (columns), and generally enables OOP processing of the output.
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
(winget list) -match '^(\p{L}|-)' | # filter out progress-display lines
ConvertFrom-FixedColumnTable | # parse output into objects
Sort-Object Id | # sort by the ID property (column)
Format-Table # display the objects in tabular format
ConvertFrom-FixedColumnTable source code:
# Note:
# * Accepts input only via the pipeline, either line by line,
# or as a single, multi-line string.
# * The input is assumed to have a header line whose column names
# mark the start of each field
# * Column names are assumed to be *single words* (must not contain spaces).
# * The header line is assumed to be followed by a separator line
# (its format doesn't matter).
function ConvertFrom-FixedColumnTable {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline)] [string] $InputObject
)
begin {
Set-StrictMode -Version 1
$lineNdx = 0
}
process {
$lines =
if ($InputObject.Contains("`n")) { $InputObject.TrimEnd("`r", "`n") -split '\r?\n' }
else { $InputObject }
foreach ($line in $lines) {
++$lineNdx
if ($lineNdx -eq 1) {
# header line
$headerLine = $line
}
elseif ($lineNdx -eq 2) {
# separator line
# Get the indices where the fields start.
$fieldStartIndices = [regex]::Matches($headerLine, '\b\S').Index
# Calculate the field lengths.
$fieldLengths = foreach ($i in 1..($fieldStartIndices.Count-1)) {
$fieldStartIndices[$i] - $fieldStartIndices[$i - 1] - 1
}
# Get the column names
$colNames = foreach ($i in 0..($fieldStartIndices.Count-1)) {
if ($i -eq $fieldStartIndices.Count-1) {
$headerLine.Substring($fieldStartIndices[$i]).Trim()
} else {
$headerLine.Substring($fieldStartIndices[$i], $fieldLengths[$i]).Trim()
}
}
}
else {
# data line
$oht = [ordered] @{} # ordered helper hashtable for object construction.
$i = 0
foreach ($colName in $colNames) {
$oht[$colName] =
if ($fieldStartIndices[$i] -lt $line.Length) {
if ($fieldLengths[$i] -and $fieldStartIndices[$i] + $fieldLengths[$i] -le $line.Length) {
$line.Substring($fieldStartIndices[$i], $fieldLengths[$i]).Trim()
}
else {
$line.Substring($fieldStartIndices[$i]).Trim()
}
}
++$i
}
# Convert the helper hashtable to an object and output it.
[pscustomobject] $oht
}
}
}
}
Optional reading: potential winget.exe improvements:
The fact that winget.exe doesn't honor the console code page (as reported by chcp / [Console]::OutputEncoding) and instead invariably outputs UTF-8 is problematic, but somewhat justifiable nowadays, given that UTF-8 has become the most widely used character encoding, across all platforms, and is capable of encoding all Unicode characters, whereas the legacy Windows code pages are limited to 256 characters. Other utilities have made a similar decision, notably node.exe, the NodeJS CLI (Python is non-standard too, but has chosen the legacy ANSI code page as its default, though can be configured to use UTF-8).
In fact, it is the use of UTF-8 that enables use of … (the horizontal ellipsis character U+2026) in the output, which is a space-efficient way to indicate omission of data (the ASCII alternative would be to use ..., i.e. three (.) characters).
winget.exe's encoding behavior isn't a problem if you've configured your (Windows 10 and above) system to use UTF-8 system-wide, which, however, has far-reaching consequences - see this answer.
Now that PowerShell (Core) itself consistently defaults to UTF-8, you could argue that even if the system as a whole doesn't use UTF-8 PowerShell console windows should - see GitHub issue #7233.
winget.exe should test whether its stdout stream is connected to a console (terminal) and only then output progress information, so as to avoid polluting its stdout data output.
The currently unavoidable truncation of column values that exceed the fixed column width could be avoided with an opt-in mechanism to provide output in a structured text format that is suitable for programmatic processing, such as CSV, similar to what the (now deprecated) wmic.exe utility has always offered with its /format option.
As noted, if in the future PowerShell cmdlets that provide the same functionality as winget.exe are made available, the problem wouldn't even arise there, given PowerShell's fundamental separation between (strongly typed) data and its - selectable - for-display representation.
[1] WinGet for PackageManagement is an example of a third-party module aimed at that.
Here is my take on the problem, which avoids hardcoding the position of the ID column. At least on my German-localized system, the column is one character off to the left. Search for the word ID in the header row to determine how much to chop off for sorting.
# Strip two blank lines and split into header string and items array
$wgHdr, $null, $wgItems = winget list | Select-Object -skip 2
# Get the position of the 'ID' column.
$idPos = [regex]::Match( $wgHdr,'\bID\b' ).Index
# Sort beginning at the position of the 'ID' column
$wgItems | Sort-Object { $_.Substring( $idPos ) }
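The same idea can be tried without winget installed. The sketch below uses a few made-up fixed-width lines ($sampleLines is hypothetical data, not real winget output) and matches the header word case-insensitively, on the assumption that the column may be labelled "Id" or "ID" depending on the locale.

```powershell
# Hypothetical fixed-width table: header, separator, then data rows.
$sampleLines = @(
    'Name        Id              Version'
    '-----------------------------------'
    'ScreenToGif NickeManarin.X  2.37.1'
    'paint.net   dotPDN.PaintNet 4.3.8'
    '7-Zip       7zip.7zip       21.07'
)

# Split into header line, separator line (discarded) and data rows.
$header, $null, $items = $sampleLines

# Locate the Id column, ignoring case.
$idPos = [regex]::Match($header, '\bid\b', 'IgnoreCase').Index

# Sort on the text from the Id column onward.
$sorted = @($items | Sort-Object { $_.Substring($idPos) })
$sorted
```

Note that Match() reports an Index of 0 when nothing matches, so a robust version would check the Success property before sorting.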
There's a request in the official repo for a proper PS module, which has partially been completed, except it has to be built from source:
https://github.com/microsoft/winget-cli/tree/master/src/PowerShell/Microsoft.WinGet.Client
Although someone has packaged it using Scoop (another package manager).
https://github.com/microsoft/winget-cli/issues/221#issuecomment-1403206756
For now, it can be installed using the winget-ps Scoop package:
Install Scoop: irm get.scoop.sh | iex
Install winget-ps: scoop install winget-ps
Reload PS or import module: Import-Module Microsoft.WinGet.Client
However, there appears to be a bug in version 1.4.10173 of winget-ps, although I wouldn't rule out another issue with my troubled Windows 11 (22H2) environment. It was me.
In case my edit to mklement0's post gets removed, I am adding my fix (at the $colNames creation) to the ConvertFrom-FixedColumnTable source code here:
Update: I removed the try-catch and added an if-else check instead.
Update: The source blog is fixed at last. You can get the code for ConvertFrom-FixedColumnTable here. You may also take a look at my winget_list_OrderBy function here (which uses ConvertFrom-FixedColumnTable) to take it a step further.

How to read / determine / check the foreground color of pipeline output?

Question
I would like to suppress Yellow host output. How, if at all, can I do the following?
some-cli.exe | Where-Object { $_.ForegroundColor -ne Yellow }
The $_.ForegroundColor -ne Yellow is not supported. What, if anything, is supported?
The some-cli.exe could be anything that produces multicolored output. For instance, it could be choco, nuget, msbuild...
I don't believe this is possible if the foreground color wasn't set in PowerShell itself (and the method of detection in this case escapes me anyway). When you assign the output of an external command which produces color output to a variable for later processing (this would extend to passing the output down the pipeline), the color information is lost. For example, if you have Chocolatey installed (this is one example program I know which produces colorized output) running:
$output = choco install -y nonexistent-package
$output
loses the color which is normally red for the error text provided by choco.exe.
Note: As Jeroen pointed out in comments, "Console applications can explicitly set the color used for console output, but unless they use ANSI terminal sequences for it (supported in the new W10 console only, so not many do) there is no way for hosts to detect it, whether PowerShell or anything else."
However, based on your comments and specific use-case you should be able to pipe the output of nuget restore to a Where-Object clause and filter out any lines starting with WARNING:
nuget restore | Where-Object {
$_ -notmatch '^WARNING'
}
I'm not sure if you tried filtering the way I showed above but based on your comment it sounds like perhaps all of the output might not be separated per line, but retrieved as a raw string for some reason. If that is the case you can make one modification for this to work, split the output on new lines:
( nuget restore ) -split "`r?`n" | Where-Object { ....
The downside to this approach is that if the warning spans multiple lines, it won't catch the subsequent line. The warning would have to be localized to one line for this to work.
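To see the filter in isolation, here is a small sketch that runs it against a made-up raw string instead of real nuget output ($rawOutput and its contents are hypothetical):

```powershell
# Hypothetical raw output, delivered as one string with CRLF line endings.
$rawOutput = "Restoring packages...`r`nWARNING: Unable to find version`r`nRestore completed."

# Split into lines, then drop every line that starts with WARNING.
$filtered = @($rawOutput -split '\r?\n' | Where-Object { $_ -notmatch '^WARNING' })

$filtered
```

Only the lines that themselves begin with WARNING are removed, which is exactly the multi-line limitation described above.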

Edit and save changes in file with powershell script

Please tell me how to edit variable content in an XML file with a PowerShell script.
<application>
<name>My Application</name>
<platforms>
<platform>android</platform>
<icon gap:density="ld" src="/icon-1.png" />
<icon gap:density="md" src="/icon-2.png" />
</platforms>
</application>
I tried this, but it's not what I want. I want to edit based on the name of the variable (name, platform, ...), but I don't know how to do that in PowerShell.
$editfiles=get-childitem . *.xml -rec
foreach ($file in $editfiles)
{
(get-content $file.pspath) |
foreach-object {$_ -replace "My Application", "My New App"} | set-content $file.pspath }
Tks
Many tks for your help
It is better to edit XML documents using an XML Api rather than text search/replace. Try this:
[xml]$xml = Get-Content foo.xml
$xml.application.name = "new name"
$xml.Save("$pwd\foo.xml")
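For a version that can be run without a file on disk, the sketch below parses an inline document and selects the elements by XPath. It is a simplified, hypothetical sample: the icon elements are left out because their gap: prefix would need a namespace declaration before the XML parser accepts it.

```powershell
# Simplified, hypothetical sample document (no gap: namespace involved).
[xml]$doc = @'
<application>
  <name>My Application</name>
  <platforms>
    <platform>android</platform>
  </platforms>
</application>
'@

# Address elements by name via XPath and change their text content.
$doc.SelectSingleNode('/application/name').InnerText = 'My New App'
$doc.SelectSingleNode('/application/platforms/platform').InnerText = 'ios'

# Serialize the modified document; use $doc.Save(<full path>) to write it to a file.
$doc.OuterXml
```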
$newitem = $xml.CreateElement("Value")
$newitem.set_InnerXML("111")
$xml.Items.ReplaceChild($newitem, $_)
something like this i think.. i didn't try it so i could be off track
Please use Keith Hill's answer, I'm only leaving mine here for reference. He's right, it's better to modify it through an XML API. I never use XML, I'm not familiar with it, so I didn't even think of it.
I gotta ask, did you try anything to do this? Did you look for an answer? This is pretty basic stuff that just a minute or two on Google probably would have gotten you an answer for.
(Get-Content "C:\Source\SomeFile.XML") -replace "My Application","Shiny New App"|Set-Content "C:\Source\SomeFile.XML"
Or if you wanted to change something less specific, such as the word "android" for the platform tag you could just include the tags to make sure it gets the right thing. (some shorthand used in this example)
(GC "C:\Source\SomeFile.XML") -replace "<platform>android</platform>","<platform>tacos</platform>"|SC "C:\Source\SomeFile.XML"
Seriously though, at least try and help yourself before coming and asking to be spoon-fed answers. I just hit up Google and searched for "powershell replace text in file" and the very first link would have given you the answer.
Edit:
Without knowing what you are looking for and going based solely off tags you will need to perform a RegEx (Regular Expression) search.
(GC "C:\Source\SomeFile.XML") -replace "(?<=`<platform`>).*?(?=`</platform`>)", 'New Platform'
That will pull the content of the file, look for any length of text that is preceded by <platform> and followed by </platform>, and replace that text with 'New Platform'. Note that the Greater Than and Less Than symbols are escaped with a grave character (to the left of the 1, and above the Tab on your keyboard). Here's a breakdown of the RegEx:
(?<=<platform>) Checks that immediately preceding the string that we're looking for is the string <platform>. This will not be replaced, it just makes sure we have the right starting point.
.*? searches for any number of characters except a new line, and accepts the possibility that it may be blank. This is our match that will be replaced.
(?=</platform>) Checks that immediately following the string it just found should be the string </platform>. This will not be replaced, it just makes sure our match ends at the correct place.
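The breakdown can be checked against a single sample line. This is only a sketch with a made-up input string; the pattern is written in single quotes here, where the angle brackets can be used without any escaping.

```powershell
# Hypothetical single line taken from the XML file.
$line = '<platform>android</platform>'

# The lookbehind and lookahead anchor the match but are not part of it,
# so only the text between the tags is replaced.
$result = $line -replace '(?<=<platform>).*?(?=</platform>)', 'New Platform'

$result   # <platform>New Platform</platform>
```

Because . does not match a newline by default, this only works when the opening and closing tag are on the same line.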

Powershell -like syntax with hyphens

This is probably a newb question, but I've spent a couple hours on this now.
I am creating a powershell script, trying to determine if an entry already exists.
(This is for an Exchange multi-tenant solution)
This works:
Get-GlobalAddressList | Where{$_.Identity -Like "\MyCompany_com*"}
But this fails:
Get-GlobalAddressList | Where{$_.Identity -Like "\MyCompany_com - GAL"}
For some reason I can't fathom, the spaces in the entry won't match.
Yes, I am certain that the entry \MyCompany_com - GAL exists.
I have tried every combination I can think of using -match, -eq, -contains
Any help is appreciated!
---- Edit ----------------------
Tried a new tack, still failing miserably:
$NewVal = "\MyCompany_com - GAL"
$Prop = Get-GlobalAddressList | Select Name
foreach($PropVal in $Prop.Name){
write-output "comparing: $NewVal to $PropVal"
if($NewVal -like $PropVal){write-output "MATCH"} else {write-output "no-match"}
}
The write-output 'shows' a match character for character.
I have scripted in many languages for over 3 decades, but this PowerShell crap has me baffled. #frustrated#
---- Edit #2 (showing output) ----------------------
comparing: MyCompany_com - GAL to MyCustomer_com - GAL
no-match
comparing: MyCompany_com - GAL to MyCompany_com - GAL
no-match
comparing: MyCompany_com - GAL to Default Global Address List
no-match
Any way to force a string comparison?
Are the space characters still messing me up?
---- Edit #3 (still trying) ----------------------
I created a new GlobalAddressList: "MCC-GAL" purposely with no spaces.
This still does not work:
Get-GlobalAddressList | Where{$_.Name -Like "MCC-GAL"}
However, this DOES match:
Get-GlobalAddressList | Where{$_.Name -Like "MCC?GAL"}
So in addition to the space characters, the hyphen (-) is also causing match problems. I did try to escape the hyphen: "\-", but still no match.
Is there ANY WAY to force a simple string comparison?
The method I am using to build the compared string will be what I need to match with.
While I had a work-around for GetAddressList, the next part of my script I was forced to figure out this issue.
Determination: I shared the same concern as @user2460798, but felt safe to disregard it since I had copied and pasted the GlobalAddressList name, with the "normal" dash, into the PowerShell command line. As it turns out, the sample commands we copied contained the en-dash. Ouch, lesson learned. Over 14 hours wasted on this for my co-worker and me. #crying#
I finally stumbled upon a script, "How to convert a ascii string into a decimal representative string in a powershell script?", that with some slight modification let me reveal that the value stored in AD was in fact an "en-dash", [char]8211 (a normal dash is [char]045).
So, here is the command that will match the entry I was trying to retrieve:
Get-GlobalAddressList | Where{$_.Name.replace([convert]::ToChar(8211),"-") -eq "MyCompany_com - GAL"}
As it turns out, I was able to match on a different object in the GlobalAddressList object:
Get-GlobalAddressList | Where{$_.ConditionalCustomAttribute1 -eq "MMC"}
With this being a multi-tenant solution, we have been adding a value to the ConditionalCustomAttribute1 object, that (in our situation) will be unique and therefore suitable for testing the existence of.
This doesn't answer the original question, but it solves my scripting task.
What do you get if you just type in:
"\MyCompany_com - GAL" -like "\MyCompany_com - GAL"
If that works then it strongly suggests that the $_.Identity property has some characters in it that are different than what you think. Here is one way you can see exactly what characters a string contains:
[char[]]$stringWithOddCharacters # breaks the string into characters
[byte[]][char[]]$stringWithOddCharacters # converts each character into a byte
So you could do something like
Get-GlobalAddressList | foreach-object {
if ($_.Identity -Like "\MyCompany_com*") {
[char[]]($_.Identity.ToString())
[byte[]][char[]]($_.Identity.ToString())
}
}
to see exactly what's in the Identity property. Note that if Identity contains Unicode characters that don't convert to ASCII you'll need to change [byte[]] to [int16[]]
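The en-dash case described earlier in this thread can be reproduced without Exchange. In this sketch, $suspect is a made-up string that looks like the expected name but contains U+2013 (decimal 8211) instead of an ASCII hyphen:

```powershell
# Hypothetical value that displays like 'MyCompany_com - GAL' but holds an en-dash.
$suspect = "MyCompany_com $([char]0x2013) GAL"

# Numeric code of every character; the ASCII hyphen would show up as 45.
$codes = [int[]][char[]]$suspect

# The visually identical strings do not compare equal...
$equalBefore = $suspect -eq 'MyCompany_com - GAL'

# ...until the en-dash is replaced with a plain hyphen.
$normalized = $suspect.Replace([char]0x2013, [char]'-')
$equalAfter = $normalized -eq 'MyCompany_com - GAL'

$codes -join ' '
"$equalBefore / $equalAfter"
```

Using [int[]] rather than [byte[]] avoids a conversion error for characters above 255, such as the en-dash.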
I ran across this post while searching for the -Filter switch syntax. Your first script example was correct because the -like switch should be used with a wildcard character. Try replacing -Like with -Match (no wildcard) in your second script example.