Office 365 Powershell Logs Got Very Large - powershell

I have many scheduled jobs running on a server in our environment, but this morning a monitoring team contacted me about a service account profile eating up all the disk space. I dug around and found logs in the C:\Users\service_account\AppData\Local\Microsoft\Office365\Powershell directory, and 3 of them, from different dates, were over 10 GB in size.
I narrowed it down to one script we have that checks the AD dir sync every hour to make sure it doesn't take longer than an hour, but I'm not sure what the best practice is here. Also, most logs in this location are only 4 KB in size and contain typical tracking info like connection times, cmdlets initializing, etc.
I'm not sure how to open such a huge file to see what the problem may be. Has anyone else run into this kind of thing? I can't find much online. Also, I do run Remove-PSSession * to close my sessions. Thx in advance...

You have a couple of options.
Read your file from disk one line or a group of lines at a time, work with each line, and then write each line back out to disk, if needed.
Instead of caching the entire file in RAM as plain Get-Content does, you read it off disk a bit at a time. The below is a very simplistic example of how to do this.
$file = New-Object System.IO.StreamReader -Arg "test.txt"
while ($null -ne ($line = $file.ReadLine())) {   # explicit $null check so empty lines don't end the loop
    # $line has your line
}
$file.Close()
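If you also need to write lines back out as you go, a StreamWriter can be paired with the reader. A minimal sketch along those lines, assuming you only want to keep lines matching some pattern (the file names and the pattern are illustrative):
$reader = New-Object System.IO.StreamReader -Arg "test.txt"
$writer = New-Object System.IO.StreamWriter -Arg "filtered.txt"
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line -match 'ERROR') {      # keep only the lines you care about
            $writer.WriteLine($line)
        }
    }
}
finally {
    $reader.Close()
    $writer.Close()
}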
Or use ...
Get-Content -ReadCount 100
... to process chunks of lines at a time; this will give you arrays of 100 lines each.
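For instance, a rough sketch of scanning one of those oversized logs in 100-line chunks (the log file name and the pattern to look for are illustrative):
Get-Content "C:\Users\service_account\AppData\Local\Microsoft\Office365\Powershell\big.log" -ReadCount 100 |
    ForEach-Object {
        # $_ is an array of up to 100 lines; filter each chunk instead of loading the whole file
        $_ | Where-Object { $_ -match 'error|exception' } |
            Add-Content "C:\temp\suspect_lines.log"
    }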
There are recent posts on this forum about folks working with large files in PoSH and having to resort to the System.IO.StreamReader / System.IO.StreamWriter .NET approach to handle the use case.
For example:
Unable to Find and Replace a string in 2GB XML file using Powershell
But there are lots more on this forum; just search for them using [powershell] large file.

Related

add-content, stream not readable, root cause?

I have a script that is using Add-Content to log progress. The log files are written to a network share, and on some networks I have gotten stream not readable errors. Rare, but often enough to want to address the issue.
I found [this thread][1] that seems to offer an answer. And I initially implemented this loop
$isWritten = $false
do {
    try {
        Add-Content -Path $csv_file -Value $newline -ErrorAction Stop
        $isWritten = $true
    }
    catch {
    }
} until ( $isWritten )
but I added a 1 second wait between tries and limited myself to 20 tries. I figured no network could be such crap that it would time out for longer than that. But on one network I still have problems, so I bumped the count to 60, and STILL have failures to write to the log. I tried [System.IO.File]::AppendAllText($file, $line) and that seems to solve all the timeouts; in 20-some-odd tries it hasn't failed, where before I would get one or two failures in 10 tries. But the formatting is off; likely I need to set the encoding.
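For reference, here is a sketch of the retry loop as described, plus the AppendAllText alternative with an explicit encoding (the 1 second delay and 20-try limit come from the description above; the UTF-8 encoding is an assumption and may need to match whatever Add-Content writes on your system):
$maxTries  = 20
$tries     = 0
$isWritten = $false
do {
    try {
        Add-Content -Path $csv_file -Value $newline -ErrorAction Stop
        $isWritten = $true
    }
    catch {
        Start-Sleep -Seconds 1   # wait a moment before retrying after "stream not readable"
    }
    $tries++
} until ( $isWritten -or $tries -ge $maxTries )

# Alternative that avoided the timeouts in testing; the encoding argument is an assumption.
[System.IO.File]::AppendAllText($csv_file, $newline + [Environment]::NewLine, [System.Text.Encoding]::UTF8)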
But more importantly, I wonder what is actually the SOURCE of the issue in Add-Content, and why does [System.IO.File]::AppendAllText() not have the issue, and is this a sign of potentially other problems with the network, or with the machines at this one location? Or just a bug in PowerShell that I need to work around. FWIW, it's PS 5.1 on Windows 10 21H2.
Also, FWIW, the logs can get to a few hundred lines long, but I often see the error in the first 10 lines.
[1]: add-content produces stream not readable

Has anyone managed to extract a list of all Azure Active Directory users through Powershell within Azure Function App?

I have a Powershell script (below) that extracts a list of 'AccountEnabled' Azure Active Directory (AAD) users (approx. 35k users and 13 MB in file size). It runs fine on my local machine, although it takes a while (approx. 10 minutes).
However, when I deploy the code to an Azure Function App using Durable Functions (tried both the Consumption & Premium EP1 plans) and do a test run, it authenticates successfully, but the log keeps repeating "no traces within the x minutes". When I inspect the blob storage location where the file is meant to be exported to, the file size is initially around 1.4 MB, but after a few minutes it seems to get replaced with a 700 kB file and then a 0 kB file, which is weird. Left it running for hours and nothing happened.
I then modified the script to only list users where their UserPrincipalName starts with the letter 'a' and it ran fine on Azure Functions; i.e: exported approx. 3k users in about 2 minutes and the Azure Function log shows it ran successfully too.
I'm kinda new to Azure Functions and not sure where the bottleneck/issue is, so it would be much appreciated if someone could shed some light on it. Thanks in advance.
Get-AzureADUser -All $true -Filter 'AccountEnabled eq true' |
    Select-Object DeletionTimestamp, ObjectId, ObjectType, AccountEnabled, AgeGroup, City, CompanyName,
        Country, CreationType, Department, DirSyncEnabled, DisplayName, FacsimileTelephoneNumber, GivenName,
        IsCompromised, JobTitle, LastDirSyncTime, LegalAgeGroupClassification, Mail, MailNickName, Mobile,
        OtherMails, PhysicalDeliveryOfficeName, PostalCode, PreferredLanguage, RefreshTokensValidFromDateTime,
        ShowInAddressList, SignInNames, State, StreetAddress, Surname, TelephoneNumber, UsageLocation,
        UserPrincipalName, UserState, UserStateChangedOn, UserType,
        @{Name='Licensed';        Expression={ if ($_.AssignedLicenses) { $true } else { $false } }},
        @{Name='Plan';            Expression={ if ($_.AssignedPlans)    { $true } else { $false } }},
        @{Name='EmployeeId';      Expression={ $_.ExtensionProperty["employeeId"] }},
        @{Name='CreatedDateTime'; Expression={ $_.ExtensionProperty["createdDateTime"] }} |
    Export-Csv $FilePath -NoTypeInformation # stores results in a CSV
Most likely, your function times out. Have you tried to configure functionTimeout?
If your code requires more than 60 minutes on Premium or more than 10 minutes on Consumption, and you can split processing into smaller parts (e.g. by the first letter), consider using Durable Functions: you can use either Function chaining or Fan out/fan in pattern. The timeout will be applied to each activity invocation individually, but the entire orchestration execution time is unlimited.
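As a rough illustration of the fan out/fan in idea in PowerShell, here is a minimal orchestrator sketch (cmdlet names per the Durable Functions PowerShell SDK; the activity name 'ExportUsersByLetter' and the per-letter split are assumptions, not part of the original script):
param($Context)

# Fan out: start one activity per starting letter without waiting on each one
$letters = [char[]](97..122)   # 'a'..'z'
$tasks = foreach ($letter in $letters) {
    Invoke-DurableActivity -FunctionName 'ExportUsersByLetter' -Input "$letter" -NoWait
}

# Fan in: wait for all activities; functionTimeout applies to each activity run,
# not to the orchestration as a whole
Wait-DurableTask -Task $tasks
Inside the hypothetical 'ExportUsersByLetter' activity, the existing query can be narrowed with a filter such as startswith(UserPrincipalName,'a'), which the question already confirmed completes well within the time limit.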

powershell write to file

I've written a GUI for making changes in AD and I need every action logged. This GUI is used by multiple users at once and writes to one file, but only the first person that writes to the log file can actually write to it. Everyone else gets access denied.
I'm using StreamWriter like this.
$File = "$LogPath\$LogDate.log"
$stream = [System.IO.StreamWriter] $File
$stream.WriteLine("----------------------------------------------------")
$stream.WriteLine("$LogTime $ExecUser | Set expire date for user $setenddateuser to $usernewenddate")
$stream.close()
What am I doing wrong here that the handle for this file is not released for someone else to use?
Creating a stream basically blocks access to, or the creation of other streams for, a file unless it's opened with a share mode; what I do with similar logs is use Add-Content.
I.e:
Add-content -value "test" -path $logRoute
Sorry for any typos, I'm on my cell atm. But that should fix your issue
Have you tried using a fine-grained .NET mutex? Each time you need to log a message, lock the mutex, use Add-Content to append the message, then release the mutex. That should make sure the file is never opened multiple times simultaneously, which is almost certainly the cause of your problem.
Here's an article on almost exactly this issue.
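A minimal sketch of that mutex approach (the mutex name is illustrative; the Global\ prefix makes the lock visible across sessions on the same machine):
$mutex = New-Object System.Threading.Mutex($false, 'Global\ADGuiLogMutex')
try {
    [void]$mutex.WaitOne()   # block until no other user/process holds the lock
    Add-Content -Path $File -Value "$LogTime $ExecUser | Set expire date for user $setenddateuser to $usernewenddate"
}
finally {
    $mutex.ReleaseMutex()    # always release, even if Add-Content throws
    $mutex.Dispose()
}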
Using ACL commands helped to fix the issue, as the newly created file was lacking write permissions for the other users.
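For example, a sketch of granting write access on the newly created log file (the group name is illustrative):
$acl  = Get-Acl -Path $File
$rule = New-Object System.Security.AccessControl.FileSystemAccessRule('DOMAIN\AD-GUI-Users', 'Modify', 'Allow')
$acl.AddAccessRule($rule)    # allow the group that runs the GUI to modify the log
Set-Acl -Path $File -AclObject $acl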

preplog.exe ran in foreach log file

I have a folder with x amount of web log files and I need to prep them for bulk import to SQL.
For that I have to run preplog.exe on each one of them.
I want to create a PowerShell script to do this for me; the problem I'm having is that preplog.exe has to be run in CMD and I need to enter the input path and the output path.
For Example:
D:>preplog c:\blah.log > out.log
I've been playing with Foreach but I haven't had any luck.
Any pointers will be much appreciated
I would guess...
Get-ChildItem "C:\Folder\MyLogFiles" | Foreach-Object { preplog $_.FullName | Out-File "preplog.log" -Append }
FYI it is good practice on this site to post your non-working code so at least we have some context. Here I assume you're logging to the current directory into one file.
Additionally you've said you need to run in CMD but you've tagged PowerShell - it pays to be specific. I've assumed PowerShell because it's a LOT easier to script.
I've also had to assume that the folder contains ONLY your log files, otherwise you will need to include a Where statement to filter the items.
In short I've made a lot of assumptions that means this may not be an accurate answer, so keep all this in mind for your next question =)
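If the folder contains other files as well, or you want one prepped output per input file, a variant sketch (the -prepped suffix is illustrative):
Get-ChildItem "C:\Folder\MyLogFiles" -Filter *.log | ForEach-Object {
    # run preplog on each log and write a matching output file next to it
    & preplog $_.FullName | Out-File "$($_.DirectoryName)\$($_.BaseName)-prepped.log"
}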

Get-Content -wait not working as described in the documentation

I've noticed that when running Get-Content path/to/logfile -Wait, the output is actually not refreshed every second as the documentation says it should be. If I go in Windows Explorer to the folder where the log file is and refresh the folder, then Get-Content outputs the latest changes to the log file.
If I try tail -f with Cygwin on the same log file (not at the same time as Get-Content), it tails as one would expect, refreshing in real time without me having to do anything.
Does anyone have an idea why this happens?
Edit: Bernhard König reports in the comments that this has finally been fixed in Powershell 5.
You are quite right. The -Wait option on Get-Content waits until the file has been closed before it reads more content. It is possible to demonstrate this in Powershell, but it can be tricky to get right, as loops such as:
while (1) {
    get-date | add-content c:\testfiles\test1.txt
    Start-Sleep -Milliseconds 500
}
will open and close the output file every time round the loop.
To demonstrate the issue open two Powershell windows (or two tabs in the ISE). In one enter this command:
PS C:\> 1..30 | % { "${_}: Write $(Get-Date -Format "hh:mm:ss")"; start-sleep 1 } >C:\temp\t.txt
That will run for 30 seconds writing 1 line into the file each second, but it doesn't close and open the file each time.
In the other window use Get-Content to read the file:
get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
With the -Wait option you need to use Ctrl+C to stop the command, so running that command 3 times, waiting a few seconds after each of the first two and longer after the third, gave me this output:
PS C:\> get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
8: Write 12:15:09 read at 12:15:09
PS C:\> get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
13: Write 12:15:14 read at 12:15:15
PS C:\> get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
19: Write 12:15:20 read at 12:15:20
20: Write 12:15:21 read at 12:15:32
21: Write 12:15:22 read at 12:15:32
22: Write 12:15:23 read at 12:15:32
23: Write 12:15:24 read at 12:15:32
24: Write 12:15:25 read at 12:15:32
25: Write 12:15:26 read at 12:15:32
26: Write 12:15:27 read at 12:15:32
27: Write 12:15:28 read at 12:15:32
28: Write 12:15:29 read at 12:15:32
29: Write 12:15:30 read at 12:15:32
30: Write 12:15:31 read at 12:15:32
From this I can clearly see:
Each time the command is run it gets the latest line written to the file, i.e. there is no problem with caching and no buffers needing to be flushed.
Only a single line is read and then no further output appears until the command running in the other window completes.
Once it does complete all of the pending lines appear together. This must have been triggered by the source program closing the file.
Also when I repeated the exercise with the Get-Content command running in two other windows one window read line 3 then just waited, the other window read line 6, so the line is definitely being written to the file.
It seems pretty conclusive that the -Wait option is waiting for a file close event, not waiting for the advertised 1 second. The documentation is wrong.
Edit:
I should add, as Adi Inbar seems insistent that I'm wrong, that the examples I gave here use Powershell only, as that seemed most appropriate for a Powershell discussion. I did also verify using Python that the behaviour is exactly as I described:
Content written to a file is readable by a new Get-Content -Wait command immediately provided the application has flushed its buffer.
A Powershell instance using Get-Content -Wait will not display new content in the file that is being written even though another Powershell instance, started later, sees the later data. This proves conclusively that the data is accessible to Powershell and Get-Content -Wait is not polling at 1 second intervals but waiting for some trigger event before it next looks for data.
The size of the file as reported by dir is updating while lines are being added, so it is not a case of Powershell waiting for the directory entry size to be updated.
When the process writing the file closes it, Get-Content -Wait displays the new content almost instantly. If it were waiting until the data was flushed to disk, there could be a delay until Windows flushed its disk cache.
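(For anyone who wants to reproduce the first of those points in PowerShell rather than Python, here is a sketch of a writer that flushes each line but keeps the file open; the path is illustrative:)
$writer = [System.IO.StreamWriter]::new('C:\temp\t.txt')
try {
    1..30 | ForEach-Object {
        $writer.WriteLine("${_}: Write $(Get-Date -Format 'hh:mm:ss')")
        $writer.Flush()          # the data is in the file, but the handle stays open
        Start-Sleep -Seconds 1
    }
}
finally {
    $writer.Close()              # only now does a waiting Get-Content catch up
}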
@AdiInbar, I'm afraid you don't understand what Excel does when you save a file. Have a closer look. If you are editing test.xlsx then there is also a hidden file ~test.xlsx in the same folder. Use dir ~test.xlsx -hidden | select CreationTime to see when it was created. Save your file and now test.xlsx will have the creation time from ~test.xlsx. In other words, saving in Excel saves to the ~ file, then deletes the original, renames the ~ file to the original name, and creates a new ~ file. There's a lot of opening and closing going on there.
Before you save, it has the file you are looking at open, and after that a file is open, but it's a different file. I think Excel is too complex a scenario to say exactly what triggers Get-Content to show new content, but I'm sure you misinterpreted it.
It looks like Powershell is monitoring the file's Last Modified property. The problem is that "for performance reasons" the NTFS metadata containing this property is not automatically updated except under certain circumstances.
One circumstance is when the file handle is closed (hence @Duncan's observations). Another is when the file's information is queried directly, hence the Explorer refresh behaviour mentioned in the question.
You can observe the correlation by having Powershell monitoring a log with Get-Content -Wait and having Explorer open in the folder in details view with Last Modified column visible. Notice that Last Modified doesn't update automatically as the file is modified.
Now get the properties of the file in another window. E.g. at a command prompt, type the file. Or open another Explorer window in the same folder, and right-click the file and get its properties (for me, just right-clicking is enough). As soon as you do that, the first Explorer window will automatically update the Last Modified column and Powershell will notice the update and catch up with the log. In Powershell, touching the LastWriteTime property is enough:
(Get-Item file.log).LastWriteTime = (Get-Item file.log).LastWriteTime
or
(Get-Item file.log).LastWriteTime = Get-Date
So this is now working for me:
Start-Job {
    $f = Get-Item full\path\to\log
    while (1) {
        $f.LastWriteTime = Get-Date   # touch the timestamp so the waiting Get-Content notices
        Start-Sleep -Seconds 10
    }
}
Get-Content path\to\log -Wait
Can you tell us how to reproduce that?
I can start this script on one PS session:
get-content c:\testfiles\test1.txt -wait
and this in another session:
while (1) {
    get-date | add-content c:\testfiles\test1.txt
    Start-Sleep -Milliseconds 500
}
And I see the new entries being written in the first session.
It appears that get-content -wait only picks up the change for some ways of writing through the Windows API, and that different ways of appending to a file behave differently.
program.exe > output.txt
And then
get-content output.txt -wait
Will not update. But
program.exe | add-content output.txt
will work with.
get-content output.txt -wait
So I guess it depends on how the application does output.
I can assure you that Get-Content -Wait does refresh every second, and shows you changes when the file changes on the disk. I'm not sure what tail -f is doing differently, but based on your description I'm just about certain that this issue is not with PowerShell but with write caching. I can't rule out the possibility that log4net is doing the caching, but I strongly suspect that OS-level caching is the culprit, for two reasons:
The documentation for log4j/log4net says that it flushes the buffer after every append operation by default, and I presume that if you had explicitly configured it not to flush after every append, you'd be aware of that.
I know for a fact that refreshing Windows Explorer triggers a write buffer flush if any files in the directory have changed. That's because it actually reads the file contents, not just the metadata, in order to provide extended information such as thumbnails and previews, and the read operation causes the write buffer to flush. So, if you're seeing the delayed updates every time you refresh the logfile's directory in Windows Explorer, that points strongly in this direction.
Try this: Open Device Manager, expand the Disk Drives node, open the Properties of the disk on which the logfile is stored, switch to the Policies tab, and uncheck Enable write caching on the device. I think you'll find that Get-Content -Wait will now show you the changes as they happen.
As for why tail -f is showing you the changes immediately as it is, I can only speculate. Maybe you're using it to monitor a logfile on a different drive, or perhaps Cygwin requests frequent flushes while you're running tail -f, to address this very issue.
UPDATE:
Duncan commented below that it is an issue with PowerShell, and posted an answer contending that Get-Content -Wait doesn't output new results until the file is closed, contrary to the documentation.
However, based on information already established and further testing, I've confirmed conclusively that it does not wait for the file to be closed, but outputs new data added to the file as soon as it's written to disk, and that the issue the OP is seeing is almost definitely due to write buffering.
To prove this, let the facts be submitted to a candid world:
I created an Excel spreadsheet, and ran Get-Content -Wait against the .xlsx file. When I entered new data into the spreadsheet, the Get-Content -Wait did not produce new output, which is expected while the new information is only in RAM and not on disk. However, whenever I saved the spreadsheet after adding data, new output was produced immediately.
Excel does not close the file when you save it. The file remains open until you close the Window from Excel, or exit Excel. You can verify this by trying to delete, rename, or otherwise modify the .xlsx file after you've saved it, while the window is still open in Excel.
The OP stated that he gets new output when he refreshes the folder in Windows Explorer. Refreshing the folder listing does not close the file. It does flush the write buffer if any of the files have changed. That's because it has to read the file's attributes, and this operation flushes the write buffer. I'll try to find some references for this, but as I noted above, I know for a fact that this is true.
I verified this behavior by running the following modified version of Duncan's test, which runs for 1,000 iterations instead of 30, and displays progress at the console so that you can track exactly how the output in your Get-Content -Wait window relates to the data that the pipeline has added to the file:
1..1000 | %{"${_}: Write $(Get-Date -Format "hh:mm:ss")"; Write-Host -NoNewline "$_..."; Start-Sleep 1} > .\gcwtest.txt
While this was running, I ran Get-Content -Wait .\gcwtest.txt in another window, and opened the directory in Windows Explorer. I found that if I refresh, more output is produced any time the file size in KB changes, and sometimes but not always even if nothing visible has changed. (More on the implications of that inconsistency later...)
Using the same test, I opened a third PowerShell window, and observed that all of the following trigger an immediate update in the Get-Content -Wait listing:
Listing the file's contents with plain old Get-Content .\gcwtest.txt
Reading any of the file's attributes. However, for attributes that don't change, only the first read triggers an update.
For example, (gi .\gcwtest.txt).lastwritetime triggers more output multiple times. On the other hand, (gi .\gcwtest.txt).mode or (gi .\gcwtest.txt).directory trigger more output the first time each, but not if you repeat them. Also note the following:
» This behavior is not 100% consistent. Sometimes, reading Mode or Directory doesn't trigger more output the first time, but it does if you repeat the operation. All subsequent repetitions after the first one that triggers updated output have no effect.
» If you repeat the test, reading attributes that are the same does not trigger output, unless you delete the .txt file before running the pipeline again. In fact, sometimes even (gi .\gcwtest.txt).lastwritetime doesn't trigger more output if you repeat the test without deleting gcwtest.txt.
» If you issue (gi .\gcwtest.txt).lastwritetime multiple times in one second, only the first one triggers output, i.e. only when the result has changed.
Opening the file in a text editor. If you use an editor that keeps the file handle open (notepad does not), you'll see that closing the file without saving does not cause Get-Content -Wait to output the lines added by the pipeline since you opened the file in the editor.
Tab-completing the file's name
After you try any of the tests above a few times, you may find that Get-Content -Wait outputs more lines periodically for the remainder of the pipeline's execution, even if you don't do anything. Not one line at a time, but in batches.
The inconsistency in behavior itself points to buffer flushing, which occurs according to variable criteria that are hard to predict, as opposed to closing, which occurs under clear-cut and consistent circumstances.
Conclusion: Get-Content -Wait works exactly as advertised. New content is displayed as soon as it's physically written to the file on disk*.
It should be noted that my suggestion to disable write caching on the drive did not help for the test above, i.e. it did not result in Get-Content -Wait displaying new lines as soon as they're added to the text file by the pipeline, so perhaps the buffering responsible for the output latency is occurring at the filesystem or OS level rather than in the disk's write cache. However, write buffering is clearly the explanation for the behavior observed in the OP's question.
* I'm not going to get into this in detail, since it's out of the scope of the question, but Get-Content -Wait does behave oddly if you add content to the file not at the end. It displays data from the end of the file equal in size to the amount of data added. The newly displayed data generally repeats data that was previously displayed, and may or may not include any of the new data, depending on whether the size of the new data exceeds the size of the data that follows it.
I ran into the same issue while trying to watch WindowsUpdate.log in real time. While not ideal, the code below allowed me to monitor the progress. -Wait didn't work due to the same file-writing limitations discussed above.
It displays the last 10 lines, sleeps for 10 seconds, clears the screen and then displays the last 10 again. Ctrl+C to stop the stream.
while (1) {
    Get-Content C:\Windows\WindowsUpdate.log -Tail 10
    Start-Sleep -Seconds 10
    Clear-Host
}