Getting null values in PowerShell

Using PowerShell, I have code that will count the number of times a value appears anywhere in a .csv file. If I put "\bhello\b", it will count the times "hello" appears anywhere in the .csv. The problem is that it doesn't work for counting the times null appears in the CSV. It gives me a number bigger than the number of values in the entire CSV file.
(select-string -Path 'D:\AaronR\Desktop\Book.csv' -Pattern "\b$null\b" -AllMatches | Select-Object -ExpandProperty Matches).Count

There are 3 problems with your regular expression:
You defined your pattern as a double quoted string \b$null\b, so PowerShell automatically expands the variable $null to a null value and casts that to an empty string (to fit it into the string). Because of this you're effectively matching a pattern \b\b.
The character $ has a special meaning in regular expressions (the end of a string), so it must be escaped (\$) in order to match a literal $ character.
The character $ isn't a word character, so even with the $ escaped, the pattern \b\$null\b will only match if there is a word character right before the $ (e.g. something$null).
If you want to match the literal string $null in your CSV, you need to escape the $ and put the first word boundary marker between $ and n. Also, I'd recommend using single quotes for regular expression strings, unless you want variables expanded in them.
Select-String -Pattern '\$\bnull\b' ...
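The difference between the two quoting styles can be checked without a file. The following is a minimal sketch; the sample line and the variable names are made up for illustration and are not taken from the question's CSV.

```powershell
# Double quotes: $null expands to an empty string, so only the two anchors remain.
$expanded = "\b$null\b"

# Single quotes keep the text literal; \$ matches a literal dollar sign.
$literal = '\$\bnull\b'

# Hypothetical CSV line with four fields, one of which is the text $null.
$sample = 'a,$null,b,null'

# \b\b matches at every word boundary, which is why the count explodes.
$emptyPatternHits = [regex]::Matches($sample, $expanded).Count

# The corrected pattern matches only the literal $null field.
$literalHits = [regex]::Matches($sample, $literal).Count

"Expanded pattern: $expanded"
"Hits with expanded pattern: $emptyPatternHits"
"Hits with literal pattern: $literalHits"
```

The expanded pattern reports more hits than there are fields, which matches the symptom described in the question.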

Related

Editing Powershell Object

I'm using powershell to run a command like so:
$getlist=rclone sha1sum remote:"\My Pictures\2009\03" --dry-run
Write-Output $getlist
that outputs an object with the results. The problem is that I only want the first column of those results. I've tried things like Format-Custom -Depth 1 and the other Format-* commands, but they don't work on this object.
"that outputs an object with the results"
While that is technically true, it is more specifically an [object[]]-typed array of lines ([string] instances): assigning the stream of output lines produced by the external rclone program to a PowerShell variable implicitly creates that array. (Arrays created by PowerShell are [object[]]-typed, even if all the elements are of the same type, such as [string] in this case.)
PowerShell fundamentally only "speaks text" when communicating with external programs.
Therefore, to extract substrings from these lines you must perform text parsing, as implied by AdminOfThings' comment on the question.
A simplified approach is to use the unary form of the -split operator:
# Simulate input lines whose first whitespace-separated token is to
# be extracted.
$getlist = 'foo bar baz', 'more stuff here'
$getlist.ForEach({ (-split $_)[0] })
The above yields:
foo
more
zett42's helpful answer shows a simpler alternative that relies on the -replace operator's (among others) ability to operate directly on each element of an array-valued LHS.
However, the -split approach is useful if you want to extract multiple column values.
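As a rough sketch of the multi-column case, the first two tokens of each line could be collected into objects. The property names First and Second are invented for this illustration.

```powershell
# Simulated input lines, as in the example above.
$getlist = 'foo bar baz', 'more stuff here'

# Split each line on whitespace and keep the first two tokens.
$rows = $getlist.ForEach({
    $first, $second = (-split $_)[0, 1]
    [pscustomobject]@{ First = $first; Second = $second }
})

$rows | ForEach-Object { "$($_.First) / $($_.Second)" }
```

Indexing the split result with [0, 1] returns both tokens at once, so extending this to more columns only means listing more indices.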
If you don't need / want to capture all of the external program's (rclone's) output in memory first, you can use streaming processing in the pipeline, via the ForEach-Object cmdlet:
'foo bar baz', 'more stuff here' | ForEach-Object { (-split $_)[0] }
Note: While slightly slower than collecting all lines in memory up front, the advantage of a pipeline-based approach is reduced memory load: only the extracted substrings are kept in memory (if assigned to a variable).
You can use a regular expression to remove the undesired parts of the output:
$getlist = $getlist -replace '\s.*'
When a PowerShell operator such as -replace is applied to a collection, it will be applied to each element individually, creating a new array that stores the results (see Substitution in a collection).
The regular expression removes everything from the first whitespace up to the end of the string.
RegEx breakdown:
\s - a single whitespace character like space and tab
.* - any character, zero or more times
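To see the element-wise behavior without running rclone, here is a small sketch. The hash values and file names are invented sample data, not real rclone output.

```powershell
# Hypothetical lines in the shape "hash, whitespace, file name".
$getlist = 'da39a3ee  photo1.jpg', '0beec7b5  photo two.jpg'

# -replace is applied to each element; everything from the first
# whitespace character onward is removed.
$hashes = $getlist -replace '\s.*'

$hashes
```

Because the pattern starts at the first whitespace, file names that themselves contain spaces do not affect the result.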

Rename files with Powershell if file has certain structure

I am trying to rename files in multiple folder with same name structure. I got the following files:
(1).txt
(2).txt
(3).txt
I want to add the following text in front of it: "Subject is missing"
I only want to rename these files all other should remain the same
Tip of the hat to LotPings for suggesting the use of a look-ahead assertion in the regex.
Get-ChildItem -File | Rename-Item -NewName {
$_.Name -replace '^(?=\(\d+\)\.)', 'Subject is missing '
} -WhatIf
-WhatIf previews the renaming operation; remove it to perform actual renaming.
Get-ChildItem -File enumerates files only, without a name filter. While you could try to apply a wildcard-based filter up front (e.g., -Filter '([0-9]).*'), you couldn't ensure that multi-digit names (e.g., (13).txt) are properly matched.
You can, however, pre-filter the results with -Filter '(*).*'.
The Rename-Item call uses a delay-bind script block to derive the new name.
It takes advantage of the fact that (a) -replace returns the input string unmodified if the regex doesn't match, and (b) Rename-Item does nothing if the new filename is the same as the old.
In the regex passed to -replace, the positive look-ahead assertion (?=...), which is matched at the start of the input string (^), looks for a match for subexpression \(\d+\)\. without considering what it matches to be part of what should be replaced. In effect, only the start position (^) of an input string is matched and "replaced".
Subexpression \(\d+\)\. matches a literal ( (escaped as \(), followed by 1 or more (+) digits (\d), followed by a literal ) and a literal . (\.), which marks the start of the filename extension. (Replace \. with $, the end-of-input assertion, if you want to match filenames that have no extension.)
Therefore, replacement operand 'Subject is missing ' is effectively prepended to the input string so that, e.g., (1).txt becomes Subject is missing (1).txt.
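The effect of the look-ahead can be tried on plain strings before touching any files. This is only a sketch; the names other than (1).txt are made-up test cases.

```powershell
# Hypothetical file names: two that should be renamed, two that should not.
$names = '(1).txt', '(13).txt', 'notes.txt', 'a(2).txt'

# Only names that start with "(digits)." get the prefix; the rest pass through unchanged.
$renamed = $names -replace '^(?=\(\d+\)\.)', 'Subject is missing '

$renamed
```

Names that come back unchanged are the ones Rename-Item would leave alone.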

What constitutes a "line" for Select-String method in Powershell?

I would expect that Select-String consider \r\n (carriage-return + newline) the end of a line in Powershell.
However, as can be seen below, abc matches the whole input:
PS C:\Tools\hashcat> "abc`r`ndef" | Select-String -Pattern "abc"
abc
def
If I break the string up into two parts, then Select-String behaves as I would expect:
PS C:\Tools\hashcat> "abc", "def" | Select-String -Pattern "abc"
abc
How can I give Select-String a string whose lines are terminated by \r\n, and then make this cmdlet return only those lines that contain a match?
Select-String operates on each (stringified on demand[1]) input object.
A multi-line string such as "abc`r`ndef" is a single input object.
By contrast, "abc", "def" is a string array with two elements, passed as two input objects.
To ensure that the lines of a multi-line string are passed individually, split the string into an array of lines using PowerShell's -split operator: "abc`r`ndef" -split "`r?`n"
(The ? makes the `r optional so as to also correctly deal with `n-only (LF-only, Unix-style) line endings.)
In short:
"abc`r`ndef" -split "`r?`n" | Select-String -Pattern "abc"
The equivalent, using a PowerShell string literal with regular-expression (regex) escape sequences (the RHS of -split is a regex):
"abc`r`ndef" -split '\r?\n' | Select-String -Pattern "abc"
It is somewhat unfortunate that the Select-String documentation talks about operating on lines of text, given that the real units of operations are input objects - which may themselves comprise multiple lines, as we've seen.
Presumably, this comes from the typical use case of providing input objects via the Get-Content cmdlet, which outputs a text file's lines one by one.
Note that Select-String doesn't return the matching strings directly, but wraps them in [Microsoft.PowerShell.Commands.MatchInfo] objects containing helpful metadata about the match.
Even there the line metaphor is present, however, as it is the .Line property that contains the matching string.
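The difference between one multi-line input object and several single-line ones shows up in the .Line property. The following is a small sketch with invented variable names.

```powershell
$text = "abc`r`ndef"

# One input object: the match info wraps the entire two-line string.
$whole = @($text | Select-String -Pattern 'abc')

# Two input objects after splitting: only the matching line is wrapped.
$perLine = @($text -split '\r?\n' | Select-String -Pattern 'abc')

"Unsplit .Line spans both lines: $($whole[0].Line -eq $text)"
"Split .Line: $($perLine[0].Line)"
```

Reading .Line (rather than relying on how the MatchInfo object prints) gives a plain [string] that can be passed on to other commands.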
[1] Optional reading: How Select-String stringifies input objects
If an input object isn't a string already, it is converted to one, though possibly not in the way you might expect:
Loosely speaking, the .ToString() method is called on each non-string input object[2], which for non-strings is not the same as the representation you get with PowerShell's default output formatting (the latter is what you see when you print an object to the console or use Out-File, for instance); by contrast, it is the same representation you get with string interpolation in a double-quoted string (when you embed a variable reference or command in "...", e.g., "$HOME" or "$(Get-Date)").
Often, .ToString() just yields the name of the object's type, without containing any instance-specific information; e.g., $PSVersionTable stringifies to System.Management.Automation.PSVersionHashTable.
# Matches NOTHING, because Select-String sees
# 'System.Management.Automation.PSVersionHashTable' as its input.
$PSVersionTable | Select-String PSVersion
In case you do want to search the default output format line by line, use the following idiom:
... | Out-String -Stream | Select-String ...
However, note that for non-string input it is more robust and preferable for subsequent processing to filter the input by querying properties with a Where-Object condition.
That said, there is a strong case to be made for Select-String needing to implicitly apply Out-String -Stream stringification, as discussed in this GitHub feature request.
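The two approaches for non-string input can be compared side by side. This is a sketch with made-up objects; the property names Name and Size are assumptions for the example.

```powershell
# Hypothetical non-string input objects.
$items = @(
    [pscustomobject]@{ Name = 'alpha'; Size = 10 }
    [pscustomobject]@{ Name = 'beta';  Size = 20 }
)

# Property-based filtering: the result is still the original object.
$byProperty = @($items | Where-Object Name -match 'alp')

# Text-based filtering: the result is a MatchInfo wrapping one formatted line.
$byText = @($items | Out-String -Stream | Select-String 'alp')

"Where-Object kept Size as a number: $($byProperty[0].Size + 1)"
"Select-String returned a line of text: $($byText[0].Line.Trim())"
```

With Where-Object the Size property is still available for arithmetic or sorting, whereas the Out-String route only yields display text.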
[2] More accurately, .psobject.ToString() is called, either as-is, or - if the object's ToString method supports an IFormatProvider-typed argument - as .psobject.ToString([cultureinfo]::InvariantCulture) so as to obtain a culture-invariant representation - see this answer for more information.
"abc`r`ndef"
is a single string which, if you echo it (Write-Output) to the console, results in:
PS C:\Users\gpunktschmitz> echo "abc`r`ndef"
abc
def
Select-String outputs every input string that contains "abc". As "abc" is part of this string, the whole string is selected.
"abc", "def"
is a list of two strings. Select-String first tests "abc" and then "def" against the pattern "abc". As only the first one matches, only it is selected.
Use the following to split the string into a list and select only the elements containing "abc"
"abc`r`ndef".Split("`r`n") | Select-String -Pattern "abc"
Basically, Mr. Guenther Schmitz explained the correct usage of Select-String, but I want to add some points to support his answer.
I did some reverse engineering on the Select-String cmdlet, which lives in Microsoft.PowerShell.Utility.dll. Some relevant snippets follow; note that this is decompiled code shown for reference, not the actual source code.
string text = inputObject.BaseObject as string;
...
matchInfo = (inputObject.BaseObject as MatchInfo);
object operand = ((object)matchInfo) ?? ((object)inputObject);
flag2 = doMatch(operand, out matchInfo2, out text);
We can see that it treats the inputObject as one whole string; it doesn't do any splitting.
I couldn't find the actual source code of this cmdlet on GitHub; probably this utility part is not open source yet. But I did find the unit tests for Select-String.
$testinputone = "hello","Hello","goodbye"
$testinputtwo = "hello","Hello"
The test inputs used in the unit tests are lists of strings. This suggests that your use case was not considered, and that the cmdlet was most likely designed to accept a collection of strings as input.
However, the official Microsoft documentation for Select-String talks about lines a lot, even though the cmdlet can't recognize a line within a string. My personal guess is that the concept of a line is only meaningful when the cmdlet accepts a file as input; in that case the file is treated like a list of strings, where each item represents a single line.
Hope it can make things more clear.

Add quotes to each column in a CSV via Powershell

I am trying to create a PowerShell script which wraps quotes around each column of the file on export to CSV. However, the Export-Csv cmdlet only places these where they are needed, i.e. where the text has a space or similar within it.
I have tried to use the following to wrap the quotes on each line but it ends up wrapping three quotes on each column.
$r.SURNAME = '"'+$r.SURNAME+'"';
Is anyone able to share how to force these on each column of the file? So far I can only find info on stripping them out.
Thanks
Perhaps a better approach would be to convert to CSV (not export), add the quotes with a simple regular expression, and then pipe the result out to a file.
Assuming you are exporting the whole object $r:
$r | ConvertTo-Csv -NoTypeInformation `
| % { $_ -replace ',(.*?),',',"$1",' } `
| Select -Skip 1 | Set-Content C:\temp\file.csv
The Select -Skip 1 removes the header. If you want the header just take it out.
To clarify what the regular expression is doing:
Match: ,(.*?),
Explanation: This matches the section of each line that has a comma, followed by any number of characters (.*) matched non-greedily (?, which means it matches only the minimum number of characters needed to complete the match), and finally ends with a comma. The parentheses capture everything between the two commas in a group to be used later in the replacement.
Replace: ,"$1",
Explanation: $1 holds the text captured between the two parentheses mentioned above. I surround it with quotes and re-add the commas, since the match consumed them and they would otherwise be lost. Please note that while the match portion of the -replace can use double quotes without an issue, the replacement operand must be in single quotes, or the $1 gets interpreted by PowerShell as a PowerShell variable and not as a capture-group reference.
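One caveat is worth checking on a sample line: because each match consumes its trailing comma, the pattern above cannot start the next match on that same comma, and it never touches the first or last field. The sketch below compares it with a look-around variant that is not part of the original answer. It assumes fields contain no embedded commas or quotes, and the sample line is invented.

```powershell
# Hypothetical unquoted CSV line.
$line = '1,Smith,25,London'

# Pattern from the answer: only fields enclosed by two "fresh" commas get quoted.
$original = $line -replace ',(.*?),', ',"$1",'

# Assumed alternative: look-arounds check for the delimiters without consuming them,
# so every unquoted field (including the first and last) is wrapped.
$lookaround = $line -replace '(?<=^|,)([^",]*)(?=,|$)', '"$1"'

"Original pattern:   $original"
"Look-around variant: $lookaround"
```

The variant is a simplification, not a CSV parser: a quoted field containing several commas would still be mangled, so it only suits data known to be simple.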
You can also use the following code:
$r.SURNAME = "`"$($r.SURNAME)`""
I have cheated to get what I want by re-parsing the file through the following - guess that it acts as a simple find and replace on the file.
Get-Content C:\Data\Downloads\file2.csv |
ForEach-Object { $_ -replace '"""', '"' } |
Set-Content C:\Data\Downloads\file3.csv
Thanks for the help on this.

How to remove double quotes on specific column from CSV file using Powershell script

"ID","Full Name","Age"
"1","Jone Micale","25"
Here is a sample from a CSV file that I created. I want to remove the double quotes from only the ID and Age column values.
I tried different ways but I don't want to create a new file out of it. I just want to update the file with changes using PowerShell v1.
Export-Csv will always put all fields in double quotes, so you have to remove the undesired quotes the hard way. Something like this might work:
$csv = 'C:\path\to\your.csv'
(Get-Content $csv) -replace '^"(.*?)",(.*?),"(.*?)"$', '$1,$2,$3' |
Set-Content $csv
Regular expression breakdown:
^ and $ match the beginning and end of a string respectively (Get-Content returns an array with the lines from the file).
"(.*?)" matches text between two double quotes and captures the match (without the double quotes) in a group.
,(.*?), matches text between two commas and captures the match (including double quotes) in a group.
$1,$2,$3 replaces a matching string with the comma-separated first, second and third group from the match.
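The replacement can be tried on the sample rows from the question before rewriting the file in place. This is a sketch that uses in-memory strings instead of Get-Content.

```powershell
# The two sample lines from the question.
$lines = '"ID","Full Name","Age"', '"1","Jone Micale","25"'

# Strip the quotes from the first and third fields; the middle field keeps its quotes.
$result = $lines -replace '^"(.*?)",(.*?),"(.*?)"$', '$1,$2,$3'

$result
```

Note that the pattern is tied to exactly three columns; a file with a different column count would need an adjusted expression.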