Batch or Powershell to find lines equal to value and remove ones that are not - powershell

I am attempting to automate the manual validation of a file that I receive daily. Each line is supposed to contain exactly 42 characters (mixed characters), but occasionally the file arrives with a missing space or a field of the wrong length. I am lost on how to check each line's length, remove the invalid lines from the master file, and write them to their own output file. I have made some headway with line length validation:
Get-Content dailyfile.txt | ForEach-Object { $_ | Measure-Object -Character } >> output.txt
But I can't wrap my head around how to use that output to find the specific lines whose length doesn't equal 42. I may be biting off more than I can chew, but I can't even see the light at the end of the tunnel on this one.

So something like this then.
Get-Content dailyfile.txt | Where-Object{$_.Length -lt 42} | Set-Content output.txt
Get-Content returns an array of strings. We use Where-Object to pass along only the lines in the text file whose length is less than 42. If there is a chance a line could also be longer, then -ne would work as well.
Mostly because I could not resist, I wanted to help with the code you had in your original post. While it is less efficient and longer, this is how you could have completed your original code:
$TheAnswertotheUltimateQuestionofLifeTheUniverseandEverything = 42
Get-Content C:\temp\data.log | Where-Object{($_ | Measure-Object -Character | Select-Object -ExpandProperty Characters) -lt $TheAnswertotheUltimateQuestionofLifeTheUniverseandEverything} | Set-Content output.txt
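The question also asks for the bad lines to be removed from the master file and written to a file of their own. The following is a minimal sketch of one way to do that, assuming PowerShell 4 or later (for the .Where() method with its 'Split' mode). The temp folder, the file names, and the sample lines are made up for illustration.

```powershell
# Hypothetical sample data: two valid 42-character lines and two invalid ones.
$expectedLength = 42
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'linecheck-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$master  = Join-Path $dir 'dailyfile.txt'
$rejects = Join-Path $dir 'invalid.txt'
Set-Content -Path $master -Value @(('A' * 42), ('B' * 41), ('C' * 42), ('D' * 43))

# Read everything first so the master file is closed before it is rewritten.
$lines = @(Get-Content -Path $master)

# 'Split' returns two collections: the lines that pass the test, then the lines that fail it.
$valid, $invalid = $lines.Where({ $_.Length -eq $expectedLength }, 'Split')

Set-Content -Path $rejects -Value $invalid   # bad lines go to their own file
Set-Content -Path $master  -Value $valid     # master keeps only the good lines
```

Reading the whole file into memory is fine for a small daily file; for very large files a streaming approach that writes to a temporary file would be a better fit.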

Related

Powershell - Return Line or Row number from input file

I found an answer to a previous question incredibly helpful, but I can't quite figure out how Get-Content is able to store the 'line number' from the input.
Basically, I'm wondering whether PSObjects store information such as a line number or row number. In the example below, Get-Content seems to store the line number as a property you can use later in the pipeline, where it is read as $_.psobject.Properties.value[5].
A bit of that seems redundant to me since $_ is an object (I think), but it is still very cool that .value[5] appears to be the line number or row number. The same is not true of Import-Csv, and while I'm looking for a similar option for Import-Csv, I'd also like to better understand why this works the way it does.
https://stackoverflow.com/a/23119235/15243610
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if(!($_.split("|").Count -eq 210)){"Process stopped at line number $($_.psobject.Properties.value[5]), incorrect column count of: $($_.split("|").Count).";break}}
The answer in the other question works because Get-Content does indeed include the line number when it reads in the strings. When you run Get-Content each line will have a $_.ReadCount property as the 6th property on the object, which in my old answer I referenced in the PSObject for it as $_.psobject.Properties.value[5] (it was 7 years ago and I didn't know better yet, sorry). Mind you, if you use the -ReadCount parameter it will send that many lines through at a time, so Get-Content $file -readcount 5 | Select -first 1 | ForEach-Object{ $_.ReadCount } will come out as 5. Also -Raw sends everything through at once so it won't work with that.
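As a small illustration of the ReadCount property described above, here is a sketch that reports the line numbers of rows with an unexpected column count. The temp file, its contents, and the expected count of 2 are assumptions chosen for the demo.

```powershell
# Hypothetical pipe-delimited sample: lines 2 and 4 have the wrong number of columns.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $file -Value 'a|b', 'a|b|c', 'a|b', 'a'

# Each string emitted by Get-Content carries a ReadCount note property (1-based line number).
$badLines = @(
    Get-Content -Path $file |
        Where-Object { $_.Split('|').Count -ne 2 } |
        ForEach-Object { "Line $($_.ReadCount): $($_.Split('|').Count) column(s)" }
)
$badLines
```

Reading the property by name ($_.ReadCount) avoids depending on its position in the property list.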
Honestly, this isn't that hard to adapt to Import-Csv; we just increment a variable that is initialized in the -Begin block of ForEach-Object.
Import-Csv C:\Path\To\SomeFile.csv | ForEach-Object -Begin {$x=1} -Process {
    # $x counts data rows; the header row is not included in the count
    If($_.Something -eq $SomethingElse){
        Write-Warning "Somethin' bad happened on line $x!"
        break
    }else{$_}
    $x++
}
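One caveat worth knowing: break inside a ForEach-Object script block is not a plain loop exit. It looks for an enclosing loop and, if there is none, can end the whole script. A hedged alternative is the foreach statement, where break leaves only that loop. The sketch below uses a made-up CSV file and a made-up Status column.

```powershell
# Hypothetical CSV: the second data row is the first bad one.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'rownumber-demo.csv'
Set-Content -Path $csv -Value 'Name,Status', 'alpha,ok', 'beta,bad', 'gamma,ok'

$row = 1                 # counts data rows; the header is not included
$firstBadRow = $null
foreach ($record in Import-Csv -Path $csv) {
    if ($record.Status -eq 'bad') {
        $firstBadRow = $row
        break            # inside a foreach statement, break only leaves this loop
    }
    $row++
}
"First bad data row: $firstBadRow"
```

Add 1 to the counter if you want the physical line number in the file, since the header occupies line 1.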

Delete a file, if it is empty except for a header row

I am trying to write a PowerShell script to delete a file if its empty, apart from the header.
postanote's answer provides some useful background information on the use of the Measure-Object cmdlet.
In the case at hand, however, it's simpler and faster to use the following:
$file = 'C:\path\to\FileOfInterest'
if ((Get-Content -First 2 $file).Count -le 1) {
    Remove-Item $file
}
Get-Content -First 2 $file returns up to 2 lines from the start of file $file, as an array.
Note: -First is a more descriptive alias for the -TotalCount parameter; in PowerShell v2, use the latter.
(...).Count counts the elements of that array, i.e., the number of lines actually read.[1]
-le 1 (-le meaning less-than-or-equal) returns $true if, despite asking for 2 lines, only 0 or 1 are returned.
The Remove-Item call then removes file $file.
[1] Up to PowerShell version 2, .Count would return $null if only 1 line had been read, because PowerShell returns a single output object as-is instead of wrapping it in a single-element array. However, since $null is coerced to 0 in a numerical comparison such as with -le, this solution works in v2 as well. PowerShell versions 3 and higher implicitly implement a .Count property even on scalars (single objects), which, sensibly, returns 1.
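To try the approach out safely, here is a small sketch that runs the same check against two throwaway files. The temp folder, file names, and contents are invented for the demo; wrapping the call in @(...) is an extra precaution so that .Count behaves the same whether zero, one, or two lines come back.

```powershell
# Hypothetical test files: one with only a header, one with a header plus a data row.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'headeronly-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$headerOnly = Join-Path $dir 'header-only.csv'
$withData   = Join-Path $dir 'with-data.csv'
Set-Content -Path $headerOnly -Value 'Id,Name'
Set-Content -Path $withData   -Value 'Id,Name', '1,alpha'

foreach ($file in $headerOnly, $withData) {
    # Read at most 2 lines; a count of 0 or 1 means there is nothing beyond the header.
    if (@(Get-Content -TotalCount 2 -Path $file).Count -le 1) {
        Remove-Item -Path $file
    }
}
```

Only the header-only file should be removed; the file with a data row is left in place.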
Agreed Olaf...
Khader - What did you search for? There are samples of how to count lines in a file all over the web.
Just search for 'powershell count lines in file'
Example hits.
Use a PowerShell Cmdlet to Count Files, Words, and Lines
How to count number of lines and words in a file using Powershell?
If I want to know how many lines are contained in the file, I use the
Measure-Object cmdlet with the line switch. This command is shown
here:
Get-Content C:\fso\a.txt | Measure-Object -Line
If I need to know the number of characters, I use the character
switch:
Get-Content C:\fso\a.txt | Measure-Object -Character
There is also a words switched parameter that will return the number
of words in the text file. It is used similarly to the character or
line switched parameter. The command is shown here:
Get-Content C:\fso\a.txt | Measure-Object -Word
In the following figure, I use the Measure-Object cmdlet to count
lines; then lines and characters; and finally lines, characters, and
words. These commands illustrate combining the switches to return
specific information.
Update for OP.
You should have updated your original question for context vs putting your code in the comment
As for …
Is there any way I can return just the count and use it with an if
statement to check if it is equal to 1, and then del the file
Just use an if statement that checks whether the 'Lines' count is 1 or less (header only, or empty), since that is the case in which you want the file deleted:
If (Get-Content $_.FullName | Measure-Object -Line | Where-Object -Property Lines -le 1)
{
    'Count is one or less'
    Remove-Item ...
}
Again, this is very basic PowerShell overview stuff, so it's prudent you take Olaf's suggestion to limit future confusion, frustrations, misconceptions and errors you are going to encounter.

Whitespace and truncation with ellipsis on Select-Object

I'm trying to figure out why Select-Object
adds a lot of whitespace at the start of its output; and
truncates long properties with ellipsis.
Here's a repro of what I mean. Suppose you run these commands on C:\:
New-Item "MyTest" -Type Directory
cd MyTest
"Some very long lorem ipsum like text going into a certain file, bla bla bla and some more bla." | Out-File test.txt
Get-ChildItem | Select-String "text" | Select-Object LineNumber,Line
This will show output like this:
The ellipsis I can understand, that would be just the way the command ends up getting formatted when the result is written to the console host. However, the whitespace at the start still confuses me in this case.
Things get weirder for me though when I pipe the result to either clip or Out-File output.txt. I get similarly formatted output, with a lot of whitespace at the start and truncated Line properties.
Which command is causing this behavior, and how can I properly solve this? Most importantly: how can I get the full results into a file or onto my clipboard?
The default behavior of outputting the data is to use Format-Table without any modifiers, and the default behavior of Format-Table is to split the viewport into columns of equal width. This makes no assumption on the output width, and is faster in that the cmdlet doesn't need to process any string data from the pipeline prior to output.
To reduce the whitespace, use Format-Table -AutoSize as the output method. The -AutoSize switch first measures the widths of the data and then sizes the columns accordingly. If you want to avoid the ellipsis and always display the full data, add the -Wrap switch to Format-Table; long values are then wrapped across several lines. You can still copy them by selecting a rectangular area in the PowerShell window and stripping the newlines from the copied text.
Get-ChildItem | Select-String "text" | Select-Object LineNumber,Line | Format-Table -AutoSize -Wrap
I'd say the best way to get the full output into a file would be to export the result as a CSV:
Get-ChildItem |
Select-String "text" |
Select-Object LineNumber,Line |
Export-Csv 'out.csv'
You could also build a string from the selected properties, which might be better for copying the data to the clipboard:
Get-ChildItem |
Select-String "text" |
ForEach-Object { '{0}:{1}' -f $_.LineNumber, $_.Line } |
Tee-Object 'out.txt' | clip
The behavior you observed is caused by the way PowerShell displays output. Basically, it looks at the first object and counts the properties. Objects with fewer than 5 properties are sent to Format-Table, otherwise to Format-List. The columns of tabular output are spread evenly across the available space. As @Vesper already mentioned, you can enforce proportional column width by using the -AutoSize parameter, and wrapping of long lines by using the -Wrap parameter. Format-List wraps long strings by default.
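To make the point concrete that truncation happens only at display time, here is a hedged sketch using a made-up object in place of the Select-String result. It shows two ways to get the full text out: building the string yourself, or rendering the table through Out-String with a generous -Width (4096 is an arbitrary choice).

```powershell
# Hypothetical object standing in for one Select-String match with a long Line value.
$result = [pscustomobject]@{
    LineNumber = 1
    Line       = ('lorem ipsum ' * 20).Trim()
}

# Option 1: build the text yourself; no formatter is involved, so nothing is cut off.
$text = $result | ForEach-Object { '{0}:{1}' -f $_.LineNumber, $_.Line }

# Option 2: keep the table layout, but render it with a width large enough for the data.
$wide = $result | Format-Table -AutoSize | Out-String -Width 4096

$text
$wide
```

Either string can then be piped to Set-Content or clip; the object itself always holds the full value regardless of how the console displays it.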
See this blog post from Jeffrey Snover for more information.

Replace lines with specific string and save with the same name

I'm working with an application that creates a log file. Due to an error in the software itself, it keeps producing three errors I'm not interested in. Each line has a unique identifier so I can't just replace the line since each one is different.
I have two main issues with this: I need to save it with the same name, and the file should remain available while the cleanup runs (in case the logger needs to write something).
I can't hard-code the original app to prevent it from writing that part of the log.
I have tried so far:
Get-Content log.log | Where-Object {$_ -notmatch 'ERROR1' -And $_ -notmatch 'ERROR2' -And $_ -notmatch 'ERROR3' } | Set-Content log_stripped.log
^ It only works if the output file has a different name.
Get-Content error.log | foreach-object { Where-Object {$_-notmatch 'ERROR1' -And $_-notmatch 'ERROR2' -And $_-notmatch 'ERROR3' } } | Set-Content error.log
^ This one froze my PS session.
I also tried reading the file to a variable:
$logcontent = ${h:error.log}
but I got System.OutOfMemoryException.
Ideally, what I need is something that reads the log file, takes away all the lines I don't want, and then save it with its original name.
Ideas? (Keep in mind that the log file is about 900 MB with the unnecessary data and 45 MB once I strip the data with the first method, but I need it to save the file with its original name.)
You can't save the file back to the same name while you're still reading from it, which means you'd have to read the whole 900MB into memory before you start writing. Not a good idea.
Try this:
Remove-Item log_stripped.log
Get-Content log.log -ReadCount 1000 |
foreach {$_ -notmatch 'ERROR1|ERROR2|ERROR3' | Add-Content log_stripped.log }
Remove-item log.log
Rename-Item log_stripped.log log.log
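The snippet above relies on a detail that is easy to miss: with -ReadCount 1000, $_ is an array of up to 1000 lines, and comparison operators such as -notmatch act as filters when the left-hand side is an array. A minimal sketch with made-up log lines:

```powershell
# Hypothetical batch of log lines, as Get-Content -ReadCount would hand them over.
$batch = 'INFO started', 'ERROR1 id=17', 'WARN disk', 'ERROR3 id=99'

# Scalar on the left: -notmatch returns a Boolean.
$single = 'ERROR1 id=17' -notmatch 'ERROR1|ERROR2|ERROR3'

# Array on the left: -notmatch returns the elements that do NOT match the pattern.
$kept = @($batch -notmatch 'ERROR1|ERROR2|ERROR3')

$single
$kept
```

Processing the file in batches this way keeps memory use bounded while avoiding the per-line overhead of Where-Object.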
I know you said you want to save to the same filename, but if the reason you want that is that you want the log to be continuously updated, then you could do the following:
Get-Content -Wait log.log |
? {$_ -notmatch 'ERROR1|ERROR2|ERROR3' } |
Out-File log_stripped.log
Note the -Wait on the Get-Content.
log_stripped.log will be continuously updated as log.log is updated.

PowerShell Out-file manipulation

I hope someone can help.
I am trying to manipulate a file created by PowerShell.
I managed to get the end result that I want, but I am sure it would be easier if it were only one command.
# Load the Exchange snap-in (make sure you are an Exchange admin to do so)
add-pssnapin Microsoft.Exchange.Management.PowerShell.E2010
# Create a file with the list of DLs in the organization
Get-DistributionGroup | Select-Object Name | Out-File C:\Pre_DLGroups.txt
$content = Get-Content C:\Pre_DLGroups.txt
# Remove the first 3 lines of the file, which are not needed
$content | Select-Object -Skip 3 | Out-file C:\DLGroups.txt
# Trim the trailing spaces and create the final file
Get-Content C:\DLGroups.txt | Foreach {$_.TrimEnd()} | Set-Content c:\FinalDLGroup.txt
Is there a way to produce the end result in a single file rather than creating 3?
Cheers
Elton
You can send your content across the pipeline without writing it out to files. You can use parentheses to group the output of certain sets of cmdlets and/or functions, and then pipe that output through to the intended cmdlets.
This can all be applied on a single line, but I've written it here on multiple lines for formatting reasons. The addition of Out-String is something of a safety measure to ensure that whatever output you're intending to trim can actually be trimmed.
Since we're not getting this content from a text file anymore, powershell could possibly return an object that doesn't understand TrimEnd(), so we need to be ready for that.
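# Out-String -Stream emits one string per rendered line; without -Stream the output is a single string and -Skip 3 would discard it
(Get-DistributionGroup | Select-Object Name) |
Out-String -Stream |
Select-Object -Skip 3 |
Foreach {$_.TrimEnd()} |
Set-Content c:\FinalDLGroup.txt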
However, an even smaller solution would involve just pulling each name and manipulating it directly. I'm using % here as an alias for Foreach-Object. This example uses Get-ChildItem, where I have some files named test in my current directory:
(Get-ChildItem test*) |
% { $_.Name.TrimEnd() } |
Set-Content c:\output.txt
# -ExpandProperty emits the bare name strings, so there are no header lines to skip
Get-DistributionGroup |
Select-Object -ExpandProperty Name |
Set-Content c:\FinalDLGroup.txt
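To see why expanding the property avoids the header cleanup altogether, here is a small sketch that compares the two routes. The objects are invented stand-ins for Get-DistributionGroup output, so it runs without Exchange.

```powershell
# Made-up objects standing in for Get-DistributionGroup output.
$groups = [pscustomobject]@{ Name = 'Sales' }, [pscustomobject]@{ Name = 'Support' }

# Formatted route: the rendered table, line by line, including blank and header rows.
$formatted = @($groups | Select-Object Name | Out-String -Stream)

# Direct route: -ExpandProperty emits the bare strings; nothing to skip or trim.
$names = @($groups | Select-Object -ExpandProperty Name)

"Formatted lines: $($formatted.Count)"
"Names: $($names -join ', ')"
```

The formatted route produces more lines than there are groups, which is exactly the padding the original script had to strip.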