Apache Beam pipeline freezing on Windows

I've tried running a pipeline on Google Cloud Dataflow (therefore with the DataflowRunner), as well as with the DirectRunner on a Unix machine, and in both cases it seems to have a 100% success rate.
However, when running the same pipeline on Windows with the DirectRunner, it occasionally gets completely stuck. If I press Ctrl + C in the Windows CMD, the execution continues perfectly fine.
The freezes can seemingly occur at any step of the pipeline, but they happen much more frequently during a ParDo step that performs an upload to an API, similar to this example. When the freeze happens at this step, pressing Ctrl + C prints the upload responses, meaning the uploads had already completed and were stuck for no apparent reason. The problem also happens when uploading data to a different API, and most of the uploads are successful.
I've tried setting network timeouts and limiting the execution to a single worker, with no success.
For reference, the pipeline is:
data = (
    pipeline
    | 'Read CSV File' >> fileio.MatchFiles(dataflow_options.input_file)
    | fileio.ReadMatches()
    | beam.Reshuffle()
    | beam.FlatMap(
        lambda rf: csv.DictReader(io.TextIOWrapper(rf.open(), encoding='utf-8')))
)
batches = (
    data
    | 'Batch Data' >> beam.util.BatchElements()
)
transformed = (
    data
    | 'Transform Data' >> beam.Map(transformFn)
)
uploaded = (
    transformed
    | 'Upload Data' >> beam.ParDo(UploadDoFn())
)
What could be the cause of the freezing? Could it be a library incompatibility on Windows? Running the logging library in debug mode wasn't particularly helpful, so I'm unsure how to proceed.
Any help would be appreciated.

I found a solution. It turns out the Windows CMD was actually at fault, not Beam: Why is my command prompt freezing on Windows 10?
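For anyone else who hits this: the linked answer points at the console's QuickEdit Mode, which pauses a running program whenever text gets selected in the window (a stray click is enough) until a key is pressed, which matches the "Ctrl + C un-freezes it" symptom. A minimal sketch of turning it off for the current user from PowerShell, assuming default console settings:
# Disable QuickEdit Mode for the current user's default console settings.
# (A sketch; per-shortcut settings can override this, and the change only
# applies to console windows opened after it is made.)
Set-ItemProperty -Path 'HKCU:\Console' -Name QuickEdit -Value 0 -Type DWord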

Related

Trying to get a PowerShell script that will run in a 2nd window and monitor other running scripts in real time / report all errors / exit codes

I am fairly new to writing code in PowerShell. For my job I have to write multiple PowerShell scripts that change hardware and software settings, as well as the Registry and Group Policy, to keep some older applications running. Upgrading these applications, or the hardware they run on, is NOT an option. As an example, when Microsoft releases new patches on Patch Tuesday, there is a high probability that an applied patch changes something, which is where I come in and write a script to fix the issue. I have multiple scripts that I run. When those scripts run, they may terminate because of an error code or an exit code, and a large part of the time I do not know immediately that a script has failed.
I am trying to figure out a script that I can run in a 2nd PowerShell console window. Its only purpose would be to sit there on the screen, wait, and monitor. Then, when I execute a script or application (the only file extensions I am worried about are EXE, BAT, CMD, and PS1), if the script/application I just ran ends with an exit code or an error code, output that to the screen, in REAL TIME.
Below is a small piece of code that kind of works, but it is not what I want.
I have researched online and read tons of material, but I just can't seem to find what I am looking for.
Could someone please help me with getting a script that will do what I described?
Thank you for your help!
# Path to the program to monitor; this could be an EXE, CMD, BAT, or PS1.
$ExitErrorCode = "C:\ThisFolder\ThatFolder\AnotherFolder\SomeApplication.EXE"

$proc = Start-Process $ExitErrorCode -PassThru
$handle = $proc.Handle # cache proc.Handle so ExitCode is populated after exit
$proc.WaitForExit()
if ($proc.ExitCode -ne 0) {
    Write-Warning "$ExitErrorCode exited with status code $($proc.ExitCode)"
}
Possible duplicate of the approaches shown here:
Monitoring jobs in a PowerShell session from another PowerShell session
PowerShell script to monitor a log and output progress to another
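In the same vein as those links, here is a minimal event-driven sketch for the second console, using WMI process-stop events. This assumes an elevated console (Win32_ProcessStopTrace requires admin rights), and the process names below are hypothetical placeholders for whatever you launch:
# Run in a second, elevated PowerShell console. Watches for process exits
# and reports non-zero exit codes in real time. Names are hypothetical.
$watched = 'SomeApplication.exe', 'SomeScriptRunner.exe'

Register-CimIndicationEvent -ClassName Win32_ProcessStopTrace -SourceIdentifier ProcExit

while ($true) {
    $evt = Wait-Event -SourceIdentifier ProcExit
    $p = $evt.SourceEventArgs.NewEvent
    if ($watched -contains $p.ProcessName -and $p.ExitStatus -ne 0) {
        Write-Warning ("{0} (PID {1}) exited with code {2}" -f $p.ProcessName, $p.ProcessID, $p.ExitStatus)
    }
    Remove-Event -EventIdentifier $evt.EventIdentifier
}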

Shell scripting in controlled web environment

I'm working on some web-based tools for our IT department, and need to launch a PowerShell script to do remote maintenance queries prior to loaning out a laptop.
I have an existing web-based registry of the laptop inventory; when one is loaned out, we get the borrower to log in before they leave to ensure their profile is on the machine. At that point, I would like to query the laptop using a PowerShell script, collecting information from it which would feed the database and show on the display page.
The page itself is restricted to IT personnel, and our environments are very homogeneous -- everyone is running Windows (7 or 10) and IE (10 or 11).
I can launch a PowerShell script server-side easily enough from the VB code.
[ CommandToRun is a textbox, PStext an output span ]
Dim app = New Process()
Dim psi As ProcessStartInfo = app.StartInfo
psi.FileName = "powershell.exe"
psi.Arguments = CommandToRun.Text
psi.RedirectStandardOutput = True
psi.RedirectStandardError = True
psi.UseShellExecute = False
PStext.InnerHtml &= "<br/>Launching app: " & psi.FileName & " " & psi.Arguments
app.Start()
' Read the redirected streams before waiting; waiting first can deadlock
' once the output buffer fills.
Dim stdOut As String = app.StandardOutput.ReadToEnd()
Dim stdErr As String = app.StandardError.ReadToEnd()
app.WaitForExit(5000)
PStext.InnerHtml &= "<br/>OUTPUT<br/><pre>" & stdOut & "</pre>"
PStext.InnerHtml &= "<br/>ERROR<br/><pre>" & stdErr & "</pre>"
But it runs as the local system account, which does not have the authority to run remote PS commands. Lacking administrator status, the server cannot execute anything on remote computers, and therefore cannot query the laptop being loaned out.
I found a JavaScript client-side solution that looks easy enough:
function tryLaunch(commandtoRun, commandParms) {
    // Instantiate the Shell object and invoke its execute method.
    var oShell = new ActiveXObject("Shell.Application");
    // Invoke the execute method.
    oShell.ShellExecute(commandtoRun, commandParms, "", "open", "1");
}
But it gets a Permission denied error launching the shell.
Other sources have the same JavaScript with "WScript.Shell" as the ActiveXObject parameter; that version fails at construction, whereas the "Shell.Application" version fails on the ShellExecute call.
I know what I'm doing is intentionally disallowed, as it's usually a pretty dangerous idea to allow web servers to launch code on clients, and portability issues kill even the most well-meaning ideas. But like I said, this is a very controlled and homogeneous environment.
So ... my questions:
Am I missing an obviously better way to do this?
Should I focus on trying to make PS scripts run on the client? (If so, any clues?)
Should I focus on running the script server-side with different access rights? (Can I shell as the web site user? I have their "user" object from AD; see the sketch after these questions.)
Am I ignoring some capability of IIS to install a PS script as an application using an Administrator account? (Seems like this should exist, which means I'm probably looking right at it and not seeing it.)
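For the server-side option, one route I'm considering is PowerShell Remoting with explicit credentials, so the query runs as an authorized account rather than the local system. A minimal sketch, assuming WinRM is enabled on the laptops; the account, variable names, and query here are placeholders:
# Runs an inventory query on the loaned laptop as a dedicated IT account.
# $LaptopName, DOMAIN\it-service and $PlainTextPass are hypothetical; in
# production the password should come from a secured store, not plain text.
$secure = ConvertTo-SecureString $PlainTextPass -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential('DOMAIN\it-service', $secure)

Invoke-Command -ComputerName $LaptopName -Credential $cred -ScriptBlock {
    # Collect whatever the loan page needs; this is just an example query.
    Get-CimInstance Win32_ComputerSystem | Select-Object Name, UserName, Model
}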

PowerShell hangs on launch

When I run PowerShell ISE, I can execute commands/scripts without issue. When I launch either the 32- or 64-bit command line, the window takes about 30 seconds to load and then it is frozen: it will not accept input of any kind (keyboard or copy/paste). I have tried running a system check, with no errors. I even tried updating to PowerShell 4.0; the install was successful, but the command line still locks on launch. Can anyone advise how to fix this? I am using Windows Server 2008 R2.
Update
It appears that PowerShell is in fact accepting input, just at a glacial speed. I left the window open while writing this post initially and then grabbed a coffee. Upon my return I found that what I had tried to copy/paste and type was now in the PowerShell command line. I have since attempted to execute $PSVersionTable.PSVersion, and going on 3 minutes now I still have no response. My guess is it will come back at some point, but this is obviously not acceptable. Any ideas on how to debug/fix this?
Update2
As far as I can tell, none of the locations listed by $PROFILE | Select * exist. I also tried launching powershell.exe -noprofile, but this did not help.
After reading this post I decided to try that tool, to see if I had a similar problem, and discovered that there were literally hundreds of writes per second happening to the Fusion log while the PowerShell command line was running. Disabling Fusion logging fixed the issue completely (it had been enabled a while ago to debug a different issue with an app, and I must have forgotten to disable it). Everything else on the machine seemed to hum along just fine with Fusion logging in the background, but PowerShell was horribly crippled. Hope this helps someone some day.
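For anyone who needs to do the same: Fusion (assembly bind) logging is driven by a handful of registry values, so a sketch like the following can confirm it is on and switch it off. This assumes logging was enabled through the usual values and that the prompt is elevated; restart the affected process afterwards:
# Inspect the Fusion logging switches (assembly bind logging).
$fusion = 'HKLM:\SOFTWARE\Microsoft\Fusion'
Get-ItemProperty $fusion | Select-Object EnableLog, ForceLog, LogFailures, LogPath

# Turn logging off; removing ForceLog stops the per-bind writes entirely.
Set-ItemProperty $fusion -Name EnableLog -Value 0 -Type DWord
Remove-ItemProperty $fusion -Name ForceLog -ErrorAction SilentlyContinue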

Unable to print PDFs or Office documents via scheduled task

I have a scheduled task set to run on a machine overnight. The task iterates through a folder, printing all the files found therein. I can run the process without issue while logged in; however, it does not print when run via the scheduled task.
More Info:
The scheduled task executes a PowerShell script that performs multiple functions, such as generating and emailing reports, copying files across network folders, and finally printing the contents of a folder. All of these tasks are performed without error if the executing account is currently logged in. If the account is not logged in and the script runs via the scheduled task, everything except the printing of Office and PDF documents works correctly (text documents print fine).
Here is the function I am using to print the documents.
Function Print-File($file)
{
    begin
    {
        function internal-printfile($thefile)
        {
            # Resolve the file name whether we were given a string or a FileInfo.
            if ($thefile -is [string])
            {
                $filename = $thefile
            }
            else
            {
                if ($thefile.FullName -is [string])
                {
                    $filename = $thefile.FullName
                }
            }
            # Launch the file type's registered handler with the "print" verb.
            $start = new-object System.Diagnostics.ProcessStartInfo $filename
            $start.Verb = "print"
            [System.Diagnostics.Process]::Start($start)
        }
        if ($file -ne $null)
        {
            $filespecified = $true
            internal-printfile $file
        }
    }
    process
    {
        # No direct argument given: print each file arriving on the pipeline.
        if (!$filespecified)
        {
            write-Host process
            internal-printfile $_
        }
    }
}
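For context, I call the function either with a direct argument or from the pipeline, along these lines (paths are placeholders):
# Direct argument:
Print-File 'C:\PrintFolder\report.pdf'
# Or over the pipeline, one print job per file:
Get-ChildItem 'C:\PrintFolder' | Print-File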
When running from the scheduled task I can see the process start (WinWord or AcroRd32), as I am dumping output to a text file, but I do not see anything print. One other difference I noticed: when I use this function while logged in, the applications other than Adobe Reader (the Office apps) start to print the document and then close; when run from the scheduled task, the applications do not close on their own.
I would appreciate any feedback, suggestions or pointers at this point, as I have hit a wall as far as knowing what else to check. I would also take suggestions for an alternative way to accomplish the printing of the files. (NOTE: I cannot predict the file types in the folder.)
NOTE: These symptoms are present on two machines, Windows server 2008 and Windows 7, both running Office 2007 and Adobe Reader 10.1.7
I'm trying to do the same thing that you are attempting. I'm pretty sure what you're running into is session 0 isolation. You can read more about it at this MSDN site and this Windows blog post.
I haven't tried the suggestions in the following answer to another question on SO, but it might be worth a try.
Creating Desktop-Folders for session 0
Here is another guy who is trying to print without having a user logged into the machine. There is an answer from someone who claims to know how to do what we're all trying to do, but he doesn't actually post the answer.
Too late for OP, but for future readers... I had the same problem, but with a Windows shell .bat file, not PowerShell. As a scheduled task, the script would launch AcroRd32.exe /t, but it wouldn't print anything. Then, after a delay, Acrobat was stopped and the file was moved to the "Printed" folder as if everything were fine. It printed fine standalone, just not as a scheduled task.
(Background: I'm running Windows 10 x86 on one older computer so that we can use our two bulletproof HP LaserJet 1000 printers. However, the program we used for this on Win 7, batchdocprint, is incompatible with Win 10, and the company behind it is gone. Between the arcane syntax and the workarounds, I've spent far more in hours getting the few lines of code below working right than the program cost, but I couldn't find a suitable replacement. The programs I found either printed incorrectly or had options for only one printer.)
The problem for me did seem to be Session 0 isolation blocking GDI+. I went with the seemingly "spammy" suggestion of getting Foxit Reader. It worked, and like Acrobat, the reader is free. I just replaced the path to AcroRd32.exe with the path to FoxitReader.exe.
I don't think this will ever be possible with Acrobat Reader. The CLI is not officially supported, so the likelihood of Adobe ever changing it to print without launching the GUI is minimal.
As far as other file types, it depends on what you're using to print them, and whether it can open and print without a GUI. I haven't decided whether to implement this for other common file types so that we can just drag-and-drop, or to keep forcing the users to use the Acrobat PDF printers that are set up to save PDFs in the hot folders. Right now, it's the latter.
My code is below, for reference, to hopefully save someone else my headache. I only changed/shortened names and removed the duplicate code for the second folder/printer. Note that taskkill requires administrative privileges. Also, you probably need to have a folder named "Printed" in your hot folder, since I don't check for its existence:
@ECHO OFF
REM Monitors a folder for PDFs and prints them.
REM Use PING for the delay - batch has no built-in sleep functionality.
REM Using START backgrounds the viewer so the script can move on.
:LOOP
cd "C:\Hot Folder\"
for %%a in ("*.pdf") do (
    start "" "C:\Path\To\FoxitReader.exe" /t "C:\Hot Folder\%%a" "HP 1000"
    ping 1.1.1.1 -n 2 -w 5000 >NUL
    taskkill /IM FoxitReader.exe /F
    move /Y "%%a" ".\Printed\%%a"
)
ping 1.1.1.1 -n 2 -w 5000 >NUL
goto LOOP
Not sure if you ever found the solution to this, but it turns out that a printer used by a Task Scheduler job has to be registered under:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Print\Printers (local printers)
rather than
HKEY_CURRENT_USER\Printers\Connections (session printers)
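A quick way to check which of the two locations a printer is registered in from PowerShell (a sketch using the key paths above; note the per-user key only exists in the profile of the account the task runs under):
# Per-machine printers: visible to Task Scheduler jobs.
Get-ChildItem 'HKLM:\SYSTEM\CurrentControlSet\Control\Print\Printers' |
    Select-Object PSChildName

# Per-user (session) printers: only present while that user is logged in.
Get-ChildItem 'Registry::HKEY_CURRENT_USER\Printers\Connections' |
    Select-Object PSChildName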

PowerShell output inconsistencies

I have a PowerShell script that works just fine when I open PowerShell manually and run the script. It produces output like this:
10.52.30.131 BALL-AIRKYYCP0 Not installed Ping successful Windows
10.52.30.133 BALL-4FNRAMLOD Not installed Ping successful Windows
10.52.30.134 BALL-5UU20W8E2 Not installed Ping successful Windows
If I right-click the script file and then click Run in PowerShell, the script runs fine and does everything it needs to do, but the output returned is different; see below:
10.52.30.131 BALL-AIRKYYCP0 Not installed Ping successful Wind
ows
10.52.30.133 BALL-4FNRAMLOD Not installed Ping successful Wind
ows
10.52.30.134 BALL-5UU20W8E2 Not installed Ping successful Wind
ows
For some reason, running it via right-click 'Run in PowerShell' causes the output to be messy: cells are cut off and continued on the next row. The above is a small sample.
Any ideas why the output would be different when running the script this way?
It appears the console windows were different sizes depending on how you were launching it.
You can set your console window size from within your PowerShell script if you want, using Get-Host.
For example, this will set the width of the console to 120:
$ws = (get-host).UI.RawUI.WindowSize
$ws.Width = 120
(get-host).UI.RawUI.WindowSize = $ws
Got this technique from here: http://technet.microsoft.com/en-ca/library/ee156814.aspx
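One related detail: console output wraps at the screen buffer width, not the window width, so a fuller sketch of the same Get-Host technique widens the buffer first (the window cannot be wider than the buffer):
$raw = (Get-Host).UI.RawUI

# Widen the buffer first; lines wrap at the buffer width.
$bs = $raw.BufferSize
$bs.Width = 120
$raw.BufferSize = $bs

# Then the window can grow to match.
$ws = $raw.WindowSize
$ws.Width = 120
$raw.WindowSize = $ws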