How would I test that a PowerShell function properly streams input from the pipeline?

I know how to write a function that streams input from the pipeline. I can reasonably tell by reading the source for a function if it will perform properly. However, is there any method for actually testing for the correct behavior?
I accept any definition of "testing"... be that some manual test that I can run or something more automated.
If you need an example, let's say I have a function that splits text into words.
PS> Get-Content ./warandpeace.txt | Split-Text
How would I check that it streams input from the pipeline and begins splitting immediately?

You can write a helper function which gives you some indication of when pipeline items are passed to it and processed by the next command:
function Print-Pipeline {
    param($Name, [ConsoleColor]$Color)
    begin {
        $ColorParameter = if($PSBoundParameters.ContainsKey('Color')) {
            @{ ForegroundColor = $Color }
        } else {
            @{ }
        }
    }
    process {
        Write-Host "${Name}|Before|$_" @ColorParameter
        ,$_
        Write-Host "${Name}|After|$_" @ColorParameter
    }
}
Suppose you have some functions to test:
$Text = 'Some', 'Random', 'Text'
function CharSplit1 { $Input | % GetEnumerator }
filter CharSplit2 { $Input | % GetEnumerator }
And you can test them like this:
PS> $Text |
>>> Print-Pipeline Before` CharSplit1 |
>>> CharSplit1 |
>>> Print-Pipeline After` CharSplit1
Before CharSplit1|Before|Some
Before CharSplit1|After|Some
Before CharSplit1|Before|Random
Before CharSplit1|After|Random
Before CharSplit1|Before|Text
Before CharSplit1|After|Text
After CharSplit1|Before|S
S
After CharSplit1|After|S
After CharSplit1|Before|o
o
After CharSplit1|After|o
After CharSplit1|Before|m
m
After CharSplit1|After|m
After CharSplit1|Before|e
e
After CharSplit1|After|e
After CharSplit1|Before|R
R
After CharSplit1|After|R
After CharSplit1|Before|a
a
After CharSplit1|After|a
After CharSplit1|Before|n
n
After CharSplit1|After|n
After CharSplit1|Before|d
d
After CharSplit1|After|d
After CharSplit1|Before|o
o
After CharSplit1|After|o
After CharSplit1|Before|m
m
After CharSplit1|After|m
After CharSplit1|Before|T
T
After CharSplit1|After|T
After CharSplit1|Before|e
e
After CharSplit1|After|e
After CharSplit1|Before|x
x
After CharSplit1|After|x
After CharSplit1|Before|t
t
After CharSplit1|After|t
PS> $Text |
>>> Print-Pipeline Before` CharSplit2 |
>>> CharSplit2 |
>>> Print-Pipeline After` CharSplit2
Before CharSplit2|Before|Some
After CharSplit2|Before|S
S
After CharSplit2|After|S
After CharSplit2|Before|o
o
After CharSplit2|After|o
After CharSplit2|Before|m
m
After CharSplit2|After|m
After CharSplit2|Before|e
e
After CharSplit2|After|e
Before CharSplit2|After|Some
Before CharSplit2|Before|Random
After CharSplit2|Before|R
R
After CharSplit2|After|R
After CharSplit2|Before|a
a
After CharSplit2|After|a
After CharSplit2|Before|n
n
After CharSplit2|After|n
After CharSplit2|Before|d
d
After CharSplit2|After|d
After CharSplit2|Before|o
o
After CharSplit2|After|o
After CharSplit2|Before|m
m
After CharSplit2|After|m
Before CharSplit2|After|Random
Before CharSplit2|Before|Text
After CharSplit2|Before|T
T
After CharSplit2|After|T
After CharSplit2|Before|e
e
After CharSplit2|After|e
After CharSplit2|Before|x
x
After CharSplit2|After|x
After CharSplit2|Before|t
t
After CharSplit2|After|t
Before CharSplit2|After|Text

Add some Write-Verbose statements to your Split-Text function, and then call it with the -Verbose parameter. You should see output in real-time.
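For example, a minimal sketch of what that could look like (the Split-Text body here is just a placeholder for your real implementation):
function Split-Text {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Line
    )
    process {
        # Runs once per pipeline item, so the verbose message appears as each line arrives.
        Write-Verbose "Splitting: $Line"
        $Line -split '\s+'
    }
}
Get-Content ./warandpeace.txt | Split-Text -Verbose
If the verbose messages (and the words) start appearing before Get-Content has finished reading the file, the function is streaming.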

Ah, I've got a very simple solution. The concept is to insert your own step into the pipeline with obvious side-effects before the function that you're testing. For example...
PS> 1..10 | %{ Write-Host $_; $_ } | function-under-test
If your function-under-test is "bad" (i.e. it collects all of its pipeline input before processing), you will see all of the input echoed first, followed by the function's own output, like this (abbreviated to the first three items):
1
2
3
1
2
3
If the function-under-test is processing items lazily from the pipeline, you'll see the output interleaved.
1
1
2
2
3
3
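To make the two behaviors concrete, here are two hypothetical functions you could push through that test (the names are made up for illustration):
# Collects everything first: a function body without a process block runs as the end
# block, so $input isn't enumerated until upstream has finished.
function Out-Buffered {
    foreach ($item in $input) { $item }
}
# Streams: the process block runs once per incoming item.
function Out-Streaming {
    process { $_ }
}
1..10 | %{ Write-Host $_; $_ } | Out-Buffered    # numbers appear in two separate runs
1..10 | %{ Write-Host $_; $_ } | Out-Streaming   # numbers appear interleaved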

Related

Dataflow job doesn't emit messages after GroupByKey()

I have a streaming Dataflow pipeline that writes to BQ, and I want to window all the failed rows and do some further analysis. The pipeline looks like this: I'm getting all the error messages in the 2nd step, but all the messages get stuck at the beam.GroupByKey(). Nothing moves downstream after that. Does anyone have any idea how to fix this?
data = (
    | "Read PubSub Messages" >> beam.io.ReadFromPubSub(subscription=options.input_subscription,
                                                       with_attributes=True)
    ...
    | "write to BQ" >> beam.io.WriteToBigQuery(
        table=f"{options.bq_dataset}.{options.bq_table}",
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        method='STREAMING_INSERTS',
        insert_retry_strategy=beam.io.gcp.bigquery_tools.RetryStrategy.RETRY_NEVER
    )
)

(
    data[beam.io.gcp.bigquery.BigQueryWriteFn.FAILED_ROWS]
    | f"Window into: {options.window_size}m" >> GroupWindowsIntoBatches(options.window_size)
    | f"Failed Rows for " >> beam.ParDo(BadRows(options.bq_dataset, 'table'))
)
and
class GroupWindowsIntoBatches(beam.PTransform):
    """A composite transform that groups Pub/Sub messages based on publish
    time and outputs a list of dictionaries, where each contains one message
    and its publish timestamp.
    """

    def __init__(self, window_size):
        # Convert minutes into seconds.
        self.window_size = int(window_size * 60)

    def expand(self, pcoll):
        return (
            pcoll
            # Assigns window info to each Pub/Sub message based on its publish timestamp.
            | "Window into Fixed Intervals" >> beam.WindowInto(window.FixedWindows(10))
            # If the windowed elements do not fit into memory please consider using `beam.util.BatchElements`.
            | "Add Dummy Key" >> beam.Map(lambda elem: (None, elem))
            | "Groupby" >> beam.GroupByKey()
            | "Abandon Dummy Key" >> beam.MapTuple(lambda _, val: val)
        )
also, I don't know if it's relevant but the beam.DoFn.TimestampParam inside my GroupWindowsIntoBatches has invalid timestamp (negative)
OK, so the issue was that the messages coming from the BigQuery FAILED_ROWS output were not timestamped. Adding | 'Add Timestamps' >> beam.Map(lambda x: beam.window.TimestampedValue(x, time.time())) seems to fix the group-by.
class GroupWindowsIntoBatches(beam.PTransform):
    """A composite transform that groups Pub/Sub messages based on publish
    time and outputs a list of dictionaries, where each contains one message
    and its publish timestamp.
    """

    def __init__(self, window_size):
        # Convert minutes into seconds.
        self.window_size = int(window_size * 60)

    def expand(self, pcoll):
        return (
            pcoll
            | 'Add Timestamps' >> beam.Map(lambda x: beam.window.TimestampedValue(x, time.time()))  # <-- added this line
            | "Window into Fixed Intervals" >> beam.WindowInto(window.FixedWindows(30))
            | "Add Dummy Key" >> beam.Map(lambda elem: (None, elem))
            | "Groupby" >> beam.GroupByKey()
            | "Abandon Dummy Key" >> beam.MapTuple(lambda _, val: val)
        )

Edit column in Tab-delimited Text file using Powershell

I have a very large (~250k rows and 171 columns) tab-delimited text file that I need to edit. I need to add the letter "H" to the third column of every row.
So I need it to go from 03/20/2020 09:00 03/20/2020 10:00 1269805 ......
to 03/20/2020 09:00 03/20/2020 10:00 H1269805 .....
I actually have this working with the following code:
$source = Get-ChildItem "C:\test\input\*.txt"
$target = "C:\test\test.txt"
$data = Get-Content -Path $source | ConvertFrom-Csv -Delimiter "`t" -Header Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120,
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171
$data | % {
    If ($_.Column3) {
        #import ID
        $_.Column3 = "H$($_.Column3)"
    }
}
$data | Select Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120,
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | % { $_ -replace '"', "" } | Select-Object -Skip 1 | Set-Content -Path $target
The problem I have is that it takes a long time. I understand it is a large file, but is there any other way to do this faster? I feel like converting to and from CSV is what takes the longest, but I may be wrong. The whole process takes roughly 25 minutes to complete. Any help would be great.
To speed up processing, avoid the pipeline, use .NET types for file I/O and use plain-text operations:
# Create the output file.
$outFile = [IO.File]::CreateText($target)

# Loop over all input files.
foreach ($file in Get-ChildItem C:\test\input\*.txt) {
    # Loop over a given file's lines.
    foreach ($line in [IO.File]::ReadLines($file.FullName)) {
        # Prepend 'H' to the 3rd column and append to the output file.
        $outFile.WriteLine(($line -replace '^.*?\t.*?\t', '$&H'))
    }
}

$outFile.Close()
Note:
Be sure to always pass full file paths to .NET methods, because .NET's working directory usually differs from PowerShell's.
.NET file I/O methods default to BOM-less UTF-8 encoding.
The H is inserted in front of the 3rd tab-separated column using PowerShell's regex-based -replace operator.
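As a quick illustration of that replacement on a made-up three-column line ($& in the substitution refers to the whole match, i.e. the first two columns including their trailing tabs):
PS> "col1`tcol2`tcol3`tcol4" -replace '^.*?\t.*?\t', '$&H'
col1    col2    Hcol3   col4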

Powershell Subinacl.exe Double-Spaced Output, inability to capture summary information (statistics)

I try to run the following script on a remote machine. From the first line I get the normal output, "Command was successful" or something like that. For the second one it seems to be working, but the output is double-spaced and incomplete; there are about 4 lines of output missing.
# This works as expected.
$output = Invoke-Command -ComputerName ServerName -ScriptBlock {auditpol /set /subcategory:"Registry" /success:enable /failure:enable}
# This creates double-spaced output and is missing the last 3 output lines.
$output = Invoke-Command -ComputerName ServerName -ScriptBlock {Subinacl.exe /verbose=1 /keyreg "HKEY_LOCAL_MACHINE\SYSTEM\Path" /sallowdeny="everyone"=SCD}
I want this output for the second code line:
SYSTEM\Path : delete Audit ACE 0 \everyone
SYSTEM\Path : new ace for \everyone
HKEY_LOCAL_MACHINE\SYSTEM\Path : 2 change(s)
Elapsed Time: 00 00:00:00
Done: 1, Modified 1, Failed 0, Syntax errors 0
Last Done : HKEY_LOCAL_MACHINE\SYSTEM\Path
But instead I get:
S Y S T E M \ P a t h : d e l e t e A u d i t A C E 0 \ e v e r y o n e
S Y S T E M \ P a t h : n e w a c e f o r \ e v e r y o n e
H K E Y _ L O C A L _ M A C H I N E \ S Y S T E M \ P a t h : 2 c h a n g e ( s )
Without the last 3 lines, which I want to see. I tried changing the output encoding to Unicode or UTF8, but that isn't working. Any other solutions?
There are two unrelated problems:
(a) subinacl.exe produces UTF-16LE-encoded output.
(b) Its on-by-default /statistic option seems to write directly to the console, bypassing stdout, and therefore cannot be captured - or at least not easily; do tell us if you know how.
Therefore, the last block of lines containing statistics (summary information), which starts with Elapsed Time: ..., always prints to the console.
Related question subinacl get full output was prompted by the same problem.
(a), as stated, can be remedied by telling PowerShell what character encoding to expect when capturing output from external programs, via [Console]::OutputEncoding
(b) cannot be remedied if you do want to capture the statistics lines too; the next best thing is to suppress statistics output altogether with /nostatistic, which at least doesn't produce unwanted console output (but, obviously, you won't have the information at all).
Putting it all together:
$output = Invoke-Command -ComputerName ServerName -ScriptBlock {
    # Tell PowerShell what character encoding to expect in subinacl's output.
    [Console]::OutputEncoding = [Text.Encoding]::Unicode # UTF-16LE
    # Note the addition of /nostatistic to suppress the direct-to-console summary info.
    Subinacl.exe /nostatistic /verbose=1 /keyreg "HKEY_LOCAL_MACHINE\SYSTEM\Path" /sallowdeny="everyone"=SCD
}
Note: Normally, you'd restore the previous value of [Console]::OutputEncoding afterward, but since the session on the remote computer in which the script block runs ends right after, it isn't necessary here.
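If you did need it, the save-and-restore pattern would look something like this:
$prevEncoding = [Console]::OutputEncoding
try {
    [Console]::OutputEncoding = [Text.Encoding]::Unicode # UTF-16LE, as subinacl emits
    Subinacl.exe /nostatistic /verbose=1 /keyreg "HKEY_LOCAL_MACHINE\SYSTEM\Path" /sallowdeny="everyone"=SCD
}
finally {
    # Restore whatever encoding was in effect before.
    [Console]::OutputEncoding = $prevEncoding
}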
These tools often don't return proper objects, hence the string output you're seeing from the latter.
You can handle that output differently than its default and/or parse the returned string to get the format you are after, using the string cmdlets...
Get-Command -Name '*string*' | Format-Table -AutoSize
CommandType Name Version Source
----------- ---- ------- ------
Function ConvertFrom-SddlString 3.1.0.0 Microsoft.PowerShell.Utility
...
Function Format-String 1.3.6 PowerShellCookbook
...
Cmdlet ConvertFrom-String 3.1.0.0 Microsoft.PowerShell.Utility
Cmdlet ConvertFrom-StringData 3.1.0.0 Microsoft.PowerShell.Utility
Cmdlet Convert-String 3.1.0.0 Microsoft.PowerShell.Utility
...
Cmdlet Out-String 3.1.0.0 Microsoft.PowerShell.Utility
...
Since Subinacl is used to display or modify Access Control Entries (ACEs) for file and folder permissions, ownership and domain, which is the same thing that the native cmdlets...
Get-Command -Name '*acl*' | Format-Table -AutoSize
CommandType Name Version Source
----------- ---- ------- ------
...
Cmdlet Get-Acl 3.0.0.0 Microsoft.PowerShell.Security
...
Cmdlet Set-Acl 3.0.0.0 Microsoft.PowerShell.Security
...
Application cacls.exe 10.0.17134.1 C:\WINDOWS\system32\cacls.exe
Application icacls.exe 10.0.17134.1 C:\WINDOWS\system32\icacls.exe
...
... provide, why not just use them instead, since they return proper objects, unlike Subinacl?
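For instance, here is a rough sketch of the native-cmdlet approach against the registry key from the question; the specific rights below are only an illustration, since mapping subinacl's SCD flags to RegistryRights is up to you:
$key = 'HKLM:\SYSTEM\Path'   # example path from the question
$acl = Get-Acl -Path $key
# Illustrative rule only; pick the RegistryRights and Allow/Deny that match your intent.
$rule = New-Object System.Security.AccessControl.RegistryAccessRule -ArgumentList 'Everyone', 'SetValue, CreateSubKey, Delete', 'Deny'
$acl.AddAccessRule($rule)
Set-Acl -Path $key -AclObject $acl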
As for encoding: are you saying you tried this answer, from this discussion, and it did not work for you?
Double spacing of output from SubInACL called from PowerShell
#set output encoding to unicode
[Console]::OutputEncoding = [Text.Encoding]::Unicode

$func_filePath = "G:\test\func.txt"

#use subinacl
[string]$SubInACLCommand = @"
subinacl.exe /file "$func_filePath" /setowner="hostname\Administrators"
"@
Invoke-Expression $SubInACLCommand

#revert output encoding back to default
[Console]::OutputEncoding = [Text.Encoding]::Default
Update for OP
Using regex and string methods to clean this up on your side: remove the extra spaces and empty lines from the string.
('S Y S T E M \ P a t h : d e l e t e A u d i t A C E 0 \ e v e r y o n e
S Y S T E M \ P a t h : n e w a c e f o r \ e v e r y o n e
H K E Y _ L O C A L _ M A C H I N E \ S Y S T E M \ P a t h : 2 c h a n g e ( s )').replace(' ','|').Replace(' ','').Replace('|',' ') -creplace('(?m)^\s*\r?\n','')
# Results
SYSTEM\Path : delete Audit ACE 0 \everyone
SYSTEM\Path : new ace for \everyone
HKEY_LOCAL_MACHINE\SYSTEM\Path : 2 change(s)
Update for OP
Try this on your machine and see if the full results are actually coming back as you'd expect.
$SubinaclResults = Invoke-Command -ComputerName ServerName -ScriptBlock {Subinacl.exe /verbose=1 /keyreg "HKEY_LOCAL_MACHINE\SYSTEM\Path" /sallowdeny="everyone"=SCD}
$SubinaclResults
If the above does not bring back the full result set, my final suggestion would be to write the output to a temp file on the remote machine and read it back to your workstation with Get-Content.
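A rough sketch of that fallback (the temp-file path is just an example; since subinacl writes UTF-16LE, read the file back with -Encoding Unicode):
Invoke-Command -ComputerName ServerName -ScriptBlock {
    # Redirect subinacl's output to a file on the remote machine.
    cmd /c 'Subinacl.exe /verbose=1 /keyreg "HKEY_LOCAL_MACHINE\SYSTEM\Path" /sallowdeny="everyone"=SCD > C:\Temp\subinacl.log'
}
$SubinaclResults = Invoke-Command -ComputerName ServerName -ScriptBlock {
    Get-Content -Path C:\Temp\subinacl.log -Encoding Unicode
}
$SubinaclResults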

Collecting output from Apache Beam pipeline and displaying it to console

I have been working with Apache Beam for a couple of days. I wanted to quickly iterate on the application I am working on and make sure the pipeline I am building is error-free. In Spark we can use sc.parallelize, and when we apply some action we get a value that we can inspect.
Similarly, when I was reading about Apache Beam, I found that we can create a PCollection and work with it using the following syntax:
with beam.Pipeline() as pipeline:
    lines = pipeline | beam.Create(["this is test", "this is another test"])
    word_count = (lines
                  | "Word" >> beam.ParDo(lambda line: line.split(" "))
                  | "Pair of One" >> beam.Map(lambda w: (w, 1))
                  | "Group" >> beam.GroupByKey()
                  | "Count" >> beam.Map(lambda (w, o): (w, sum(o))))
    result = pipeline.run()
I actually wanted to print the result to console. But I couldn't find any documentation around it.
Is there a way to print the result to console instead of saving it to a file each time?
You don't need the temp list. In Python 2.7 the following should be sufficient:
def print_row(row):
    print row

(pipeline
 | ...
 | "print" >> beam.Map(print_row)
)
result = pipeline.run()
result.wait_until_finish()
In Python 3.x, print is a function, so the following is sufficient:
(pipeline
 | ...
 | "print" >> beam.Map(print)
)
result = pipeline.run()
result.wait_until_finish()
After exploring further and understanding how I can write test cases for my application, I figured out a way to print the result to the console. Please note that right now I am running everything on a single-node machine, trying to understand the functionality Apache Beam provides and how I can adopt it without compromising industry best practices.
So, here is my solution. At the very last stage of our pipeline we can introduce a map function that prints the result to the console, or accumulates the result in a variable so we can print the variable later to see its value:
import apache_beam as beam

# lets have a sample string
data = ["this is sample data", "this is yet another sample data"]

# create a pipeline
pipeline = beam.Pipeline()
counts = (pipeline | "create" >> beam.Create(data)
          | "split" >> beam.ParDo(lambda row: row.split(" "))
          | "pair" >> beam.Map(lambda w: (w, 1))
          | "group" >> beam.CombinePerKey(sum))

# lets collect our result with a map transformation into an output array
output = []
def collect(row):
    output.append(row)
    return True

counts | "print" >> beam.Map(collect)

# Run the pipeline
result = pipeline.run()

# lets wait until the result is available
result.wait_until_finish()

# print the output
print output
Maybe logging info instead of print?
import logging

def _logging(elem):
    logging.info(elem)
    return elem

P | "logging info" >> beam.Map(_logging)
Here is an example that follows the PyCharm Edu course:
import apache_beam as beam


class LogElements(beam.PTransform):
    class _LoggingFn(beam.DoFn):
        def __init__(self, prefix=''):
            super(LogElements._LoggingFn, self).__init__()
            self.prefix = prefix

        def process(self, element, **kwargs):
            print self.prefix + str(element)
            yield element

    def __init__(self, label=None, prefix=''):
        super(LogElements, self).__init__(label)
        self.prefix = prefix

    def expand(self, input):
        return input | beam.ParDo(self._LoggingFn(self.prefix))


class MultiplyByTenDoFn(beam.DoFn):
    def process(self, element):
        yield element * 10


p = beam.Pipeline()

(p | beam.Create([1, 2, 3, 4, 5])
   | beam.ParDo(MultiplyByTenDoFn())
   | LogElements())

p.run()
Output
10
20
30
40
50
Out[10]: <apache_beam.runners.portability.fn_api_runner.RunnerResult at 0x7ff41418a210>
I know it isn't what you asked for, but why don't you store it in a text file? It's always better than printing it via stdout, and it isn't volatile.
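A minimal sketch of that suggestion, reusing the word-count example from above (the output prefix "counts" is just an example; Beam writes shard files like counts-00000-of-00001):
import apache_beam as beam

data = ["this is sample data", "this is yet another sample data"]

with beam.Pipeline() as pipeline:
    (pipeline
     | "create" >> beam.Create(data)
     | "split" >> beam.FlatMap(lambda row: row.split(" "))
     | "pair" >> beam.Map(lambda w: (w, 1))
     | "group" >> beam.CombinePerKey(sum)
     # Format each (word, count) pair as a line of text before writing.
     | "format" >> beam.Map(lambda kv: "{0}: {1}".format(kv[0], kv[1]))
     | "write" >> beam.io.WriteToText("counts"))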

Re-Use of a line fails as "`r" doesn't move the 'line-pointer' ("`b" fails as well)

As C# does not have the option of PowerShell's Write-Progress, I started trying to 're-use' the same line:
function ReUseFails {
    $x = 0
    $nDec = 10
    while ($true) {
        $x++
        $a = $x.ToString().PadLeft($nDec)
        $z = $x.ToString().PadLeft($nDec)
        Write-Host "`r$a $z" -noNewLine
        Start-Sleep -s 1
    }
}
ReUseFails
ReUseFails
Because of `r (carriage return) I expected to see everything on the same line (each new value overwriting the previous one):
1 1 # and after 1 Second in that line:
2 2 # after the 2nd second
3 3 # (and so on)
but what I get is
1 1 2 2 3 3 4 4 ...
So the carriage return `r has no effect?
Is there a way to 'enable' this `r?
Even when I try to use `b I see extra characters but not a backspace.
Can it be that the DOS console can do that (to show a spinning wheel) but not the PowerShell console?
Regards