PowerShell - Return line or row number from input file

I found an answer to a previous question incredibly helpful, but I can't quite figure out how Get-Content is able to store the 'line number' from the input.
Basically, I'm wondering whether PSObjects store information such as the line number or row number. In the example below, Get-Content seems to store the line number as a value you can use later; in the pipeline, it is available as $_.psobject.Properties.value[5].
Part of that seems redundant to me, since $_ is an object (I think), but it is still very cool that .value[5] seems to be the line or row number. The same is not true of Import-Csv, and while I'm looking for a similar option with Import-Csv, I'd like to better understand why this works the way it does.
https://stackoverflow.com/a/23119235/15243610
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if(!($_.split("|").Count -eq 210)){"Process stopped at line number $($_.psobject.Properties.value[5]), incorrect column count of: $($_.split("|").Count).";break}}

The answer in the other question works because Get-Content does indeed include the line number when it reads in the strings. When you run Get-Content, each line gets a ReadCount note property (alongside PSPath, PSParentPath, PSChildName, PSDrive and PSProvider), and ReadCount is the 6th property on the object. In my old answer I referenced it through the PSObject as $_.psobject.Properties.value[5] (it was 7 years ago and I didn't know better yet, sorry); $_.ReadCount is the direct way to get it. Mind you, if you use the -ReadCount parameter, Get-Content sends that many lines through at a time, so Get-Content $file -ReadCount 5 | Select -First 1 | ForEach-Object { $_.ReadCount } will come out as 5. Also, -Raw sends everything through at once, so this won't work with it.
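To illustrate, here is a minimal sketch of the original one-liner rewritten to read $_.ReadCount directly instead of indexing into psobject.Properties. The function name Find-BadColumnCount and its parameters are hypothetical; the 210-column check and the | delimiter from the question become parameters. Instead of break, it uses Select-Object -First 1 to stop reading at the first bad line and returns an object, which is easier to reuse.

```powershell
function Find-BadColumnCount {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $ExpectedColumns,
        [string] $Delimiter = '|'
    )
    $pattern = [regex]::Escape($Delimiter)   # -split takes a regex, so escape '|'
    Get-Content -LiteralPath $Path |
        Where-Object { $_ } |                 # drop blank lines; ReadCount still holds the original line number
        Select-Object -Skip 1 |               # skip the header line
        Where-Object { ($_ -split $pattern).Count -ne $ExpectedColumns } |
        Select-Object -First 1 |              # stop reading the file at the first bad line
        ForEach-Object {
            [pscustomobject]@{
                Line    = $_.ReadCount        # note property that Get-Content attached to this line
                Columns = ($_ -split $pattern).Count
            }
        }
}
```

Usage for the original case would be Find-BadColumnCount -Path $colCnt -ExpectedColumns 210.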
Honestly, this isn't that hard to adapt to Import-Csv: we just increment a counter variable that is initialized in the -Begin block of ForEach-Object.
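Import-Csv C:\Path\To\SomeFile.csv | ForEach-Object -Begin {$x=1} -Process {
    # $x counts data rows; the header is physical line 1, so data row $x is line $x+1 of the file
    If($_.Something -eq $SomethingElse){
        Write-Warning "Somethin' bad happened on line $x!"
        break   # no enclosing loop here, so this ends the pipeline (and the running script)
    }else{$_}
    $x++
}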
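If you would rather keep the row number with each record than check it inside the loop, one option is to attach it as a note property. This is a minimal sketch assuming no quoted field spans several lines and that the CSV has no column already called LineNumber; Import-CsvWithLineNumber is a hypothetical helper name, not a built-in cmdlet.

```powershell
function Import-CsvWithLineNumber {
    param([Parameter(Mandatory)] [string] $Path)
    Import-Csv -LiteralPath $Path | ForEach-Object -Begin { $line = 1 } -Process {
        $line++   # the header occupies line 1, so the first data row is line 2
        # Add the physical line number to the record and pass the record on
        $_ | Add-Member -NotePropertyName LineNumber -NotePropertyValue $line -PassThru
    }
}
```

Because the number travels with the object, you can filter later, e.g. Import-CsvWithLineNumber .\SomeFile.csv | Where-Object { $_.Something -eq $SomethingElse } and read .LineNumber from whatever comes out.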

Related

How does PowerShell lazily evaluate this statement?

I was searching for a way to read only the first few lines of a CSV file and came across this answer. The accepted answer suggests using
Get-Content "C:\start.csv" | select -First 10 | Out-File "C:\stop.csv"
Another answer suggests using
Get-Content C:\Temp\Test.csv -TotalCount 3
Because my CSV is fairly large, I went with the second option. It worked fine. Out of curiosity, I decided to try the first option, assuming I could press Ctrl+C if it took forever. I was surprised to see that it returned just as quickly.
Is it safe to use the first approach when working with large files? How does PowerShell achieve this?
Yes, Select-Object -First n is "safe" for large files (provided you want to read only a small number of lines, so that the pipeline overhead is insignificant; otherwise Get-Content -TotalCount n will be more efficient).
It works like break in a loop, exiting the pipeline early once the given number of items has been processed. Internally, it throws a special exception that the PowerShell pipeline machinery recognizes.
Here is a demonstration that "abuses" Select-Object to break out of a ForEach-Object "loop", which is not possible using a normal break statement.
1..10 | ForEach-Object {
Write-Host $_ # goes directly to console, so is ignored by Select-Object
if( $_ -ge 3 ) { $true } # "break" by outputting one item
} | Select-Object -First 1 | Out-Null
Output:
1
2
3
As you can see, Select-Object -First n actually breaks the pipeline instead of first reading all input and then selecting only the specified number of items.
Another, more common use case is when you want to find only a single item in the output of a pipeline. Then it makes sense to exit from the pipeline as soon as you have found that item:
Get-ChildItem -Recurse | Where-Object { SomeCondition } | Select-Object -First 1
According to Microsoft, the Get-Content cmdlet has a parameter called -ReadCount. Their documentation states:
Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time.
This parameter does not change the content displayed, but it does affect the time it takes to display the content. As the value of ReadCount increases, the time it takes to return the first line increases, but the total time for the operation decreases. This can make a perceptible difference in large items.
Since -ReadCount defaults to 1, Get-Content effectively acts as a generator, reading a file line by line; when Select-Object -First stops the pipeline, Get-Content stops reading as well.
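The same early exit applies to any command that streams its output, not just Get-Content. In this sketch, Get-Numbers is a hypothetical function that would emit a million numbers, and the $script:produced counter records how far it got before Select-Object -First 3 stopped the pipeline (assuming PowerShell v3 or later, where Select-Object stops upstream commands).

```powershell
function Get-Numbers {
    # Hypothetical streaming producer: each number goes down the pipeline as soon as it is made
    for ($i = 1; $i -le 1000000; $i++) {
        $script:produced = $i   # remember how far the producer got
        $i
    }
}

$script:produced = 0
$firstThree = Get-Numbers | Select-Object -First 3
"Produced $script:produced items, kept $($firstThree -join ', ')"
```

Only three numbers should be produced, which is the same reason Get-Content | select -First 10 returns immediately on a large file.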

From a CSV file get the file header and a portion of the file based on starting and ending line number parameters using PowerShell

So I have a very large CSV file whose first line has the column headers. I want to keep the first line as a header and add a portion of the file from the file's mid-section or perhaps the end. I'm also trying to select only a few of the columns from the file. And finally, it would be great if the solution also changed the file delimiter from a comma to a tab.
I'm aiming for a solution that's a one-liner or perhaps 2?
Non-working Code version 30 ...
Get-Content -Tail 100 filename.csv | Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\filename_out.csv
I'm trying to get a better grip on PowerShell. So far, so good, but I'm not quite there yet. But trying to solve such challenges is helping me (and hopefully others) build a good collection of coding idioms. (FYI - the boss is trying PowerShell due to our efforts so.)
OK, thanks to iRon's tip. Import-Csv defaults to comma-separated input, Select-Object -Property gets the columns I want, select -Last gets the last 200 rows, and Export-Csv changes the delimiter to a tab:
Import-Csv iarf.csv |
Select-Object -Property Id,Name,RecordTypeId,CreatedDate |
select -Last 200 |
Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\iarf100props6.csv
iRon provided the crucial pointer: Using Import-Csv rather than Get-Content allows you to retrieve arbitrary ranges from the original file as objects, if selected via Select-Object, and exporting these objects again via Export-Csv automatically includes a header line whose column names are the input objects' property names, as initially derived from the input file's header line.
In order to select an arbitrary range of rows, combine Select-Object's -Skip and -First parameters:
To get only rows from the beginning, use just -First $count.
To get only rows from the end, use just -Last $count.
To get rows in a given range, use -Skip $startRowMinus1 -First $rangeRowCount.
For instance, the following command extracts rows 10 through 29 (20 rows):
Import-Csv iarf.csv |
Select-Object -Property Id,Name,RecordTypeId,CreatedDate -Skip 9 -First 20 |
Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\iarf100props6.csv
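Since the question asked for starting and ending line number parameters, here is a minimal sketch wrapping the command above in a function. Export-CsvRowRange and its parameters are hypothetical names; row numbers count data rows only (the header is not counted), and -First is computed as EndRow - StartRow + 1 so that both ends are inclusive.

```powershell
function Export-CsvRowRange {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        [Parameter(Mandatory)] [int] $StartRow,   # 1-based data row number (the header is not counted)
        [Parameter(Mandatory)] [int] $EndRow,     # inclusive
        [string[]] $Property = @('*'),
        [char] $Delimiter = "`t"
    )
    # Import-Csv turns rows into objects; Export-Csv writes a header from their property names
    Import-Csv -LiteralPath $Path |
        Select-Object -Property $Property -Skip ($StartRow - 1) -First ($EndRow - $StartRow + 1) |
        Export-Csv -LiteralPath $Destination -Delimiter $Delimiter -NoTypeInformation
}
```

For example, Export-CsvRowRange -Path .\iarf.csv -Destination .\iarf100props6.csv -StartRow 10 -EndRow 29 -Property Id,Name,RecordTypeId,CreatedDate should produce the same output as the command above.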

Delete a file, if it is empty except for a header row

I am trying to write a PowerShell script to delete a file if it's empty, apart from the header.
postanote's answer provides some useful background information on the use of the Measure-Object cmdlet.
In the case at hand, however, it's simpler and faster to use the following:
$file = 'C:\path\to\FileOfInterest'
if ((Get-Content -First 2 $file).Count -le 1) {
Remove-Item $file
}
Get-Content -First 2 $file returns up to 2 lines from the start of file $file, as an array.
Note: -First is a more descriptive alias for the -TotalCount parameter; in PowerShell v2, use the latter.
(...).Count counts the elements of that array, i.e., the number of lines actually read.[1]
-le 1 (-le meaning less-than-or-equal) returns $true if, despite asking for 2 lines, only 0 or 1 are returned.
The Remove-Item call then removes file $file.
[1] Up to PowerShell version 2, .Count would return $null if only 1 line had been read, because PowerShell returns a single output object as-is instead of wrapping it in a single-element array. However, since $null is coerced to 0 in a numerical comparison such as with -le, this solution works in v2 as well. PowerShell versions 3 and higher implicitly implement a .Count property even on scalars (single objects), which - sensibly - returns 1.
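If you need to do this for every file in a folder, a minimal sketch applying the same -TotalCount 2 test to each file could look like this. Remove-HeaderOnlyFile is a hypothetical function name, the *.csv filter is an assumption, and -File on Get-ChildItem requires PowerShell v3 or later.

```powershell
function Remove-HeaderOnlyFile {
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [string] $Filter = '*.csv'
    )
    # Collect the candidates first so nothing is deleted while the folder is still being enumerated
    $files = @(Get-ChildItem -LiteralPath $Folder -Filter $Filter -File)
    foreach ($file in $files) {
        # Read at most 2 lines; getting back 0 or 1 line means "empty except for the header"
        if ((Get-Content -LiteralPath $file.FullName -TotalCount 2).Count -le 1) {
            Remove-Item -LiteralPath $file.FullName
        }
    }
}
```

Adding -WhatIf to the Remove-Item call is an easy way to preview which files would be deleted before running it for real.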
Agreed Olaf...
Khader - What did you search for? There are samples of how to count lines in a file all over the web.
Just search for 'powershell count lines in file'.
Example hits:
Use a PowerShell Cmdlet to Count Files, Words, and Lines
How to count number of lines and words in a file using Powershell?
If I want to know how many lines are contained in the file, I use the Measure-Object cmdlet with the line switch. This command is shown here:
Get-Content C:\fso\a.txt | Measure-Object -Line
If I need to know the number of characters, I use the character switch:
Get-Content C:\fso\a.txt | Measure-Object -Character
There is also a words switched parameter that will return the number of words in the text file. It is used similarly to the character or line switched parameter. The command is shown here:
Get-Content C:\fso\a.txt | Measure-Object -Word
In the following figure, I use the Measure-Object cmdlet to count lines; then lines and characters; and finally lines, characters, and words. These commands illustrate combining the switches to return specific information.
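The figure the quoted article refers to is not reproduced here. As a stand-in, this sketch combines all three switches in a single Measure-Object call on two hypothetical lines of text instead of C:\fso\a.txt; the result object carries Lines, Words and Characters properties.

```powershell
# Hypothetical sample lines standing in for the contents of C:\fso\a.txt
$text  = @('hello world', 'foo')
# One call with all three switches returns Lines, Words and Characters together
$stats = $text | Measure-Object -Line -Word -Character
"$($stats.Lines) lines, $($stats.Words) words, $($stats.Characters) characters"
```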
Update for OP.
You should have updated your original question for context rather than putting your code in a comment.
As for …
Is there any way I can return just the count and use it with an if statement to check if it is equal to 1, and then del the file
Just use the if statement, checking whether the 'Lines' count is 1 or less (only the header, or nothing at all):
If (Get-Content $_.FullName | Measure-Object -Line | Where-Object -Property Lines -le 1)
{
'Count is one or less'
Remove-Item ...
}
Again, this is very basic PowerShell overview stuff, so it's prudent you take Olaf's suggestion to limit future confusion, frustrations, misconceptions and errors you are going to encounter.

Import-Csv include empty fields in end of row

Edit
I'll conclude that Import-Csv is not ideal for incorrectly formatted CSV files and will use Get-Content and split. Thanks for all the answers.
Example CSV:
"SessionID","ObjectName","DatabaseName",,,,,,,,
"144","","AC"
Using Import-Csv none of the empty fields at the end will be counted - it will simply stop after "DatabaseName".
Is there any way to include the empty fields?
Edit:
I simply need to count the fields and make sure there are fewer than X of them. It is not only the header that might contain empty fields but also the content. These files are often made manually and are not properly formatted. Since the files can also get very large, I would prefer not to also use Get-Content and split, since I'm already using Import-Csv and its properties.
It looks like it's missing its headers. If you add some, it will work fine.
You could do something like
Get-Content My.CSV | Select -skip 1 | ConvertFrom-Csv -Header "SessionID","ObjectName","DatabaseName",'Whatnot1', 'Whatnot2', 'Whatnot3'
As dbso suggested, split and Length will help you. I was in the middle of coding a header routine, which is now obsolete. Nevertheless, here it is:
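$FileIn = "Q:\test\2017-01\06\SO_41505840.csv"
# Split the header line on commas (the quotes around named columns are kept)
$Header = (Get-Content $FileIn | Select-Object -First 1) -split ","
"Fieldcount for $FileIn is $($Header.Length)"
# Replace every empty header field with a quoted placeholder name "Column<n>"
for ($i = 0; $i -lt $Header.Length; $i++) {
    if ($Header[$i] -eq "") { $Header[$i] = "`"Column$($i+1)`"" }
}
$Header -join ","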
This returns the following output:
Fieldcount for Q:\test\2017-01\06\SO_41505840.csv is 11
"SessionID","ObjectName","DatabaseName","Column4","Column5","Column6","Column7","Column8","Column9","Column10","Column11"

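Since the stated goal is only to count fields and flag rows with too many, a minimal sketch that does this per line, reporting the line number through the ReadCount property that Get-Content attaches, could look like the following. Find-CsvWideRow and -MaxFields are hypothetical names, and the plain -split ',' assumes no commas appear inside quoted values.

```powershell
function Find-CsvWideRow {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $MaxFields
    )
    Get-Content -LiteralPath $Path | ForEach-Object {
        # Naive split: assumes no commas inside quoted values; trailing empty fields are kept
        $fieldCount = ($_ -split ',').Count
        if ($fieldCount -gt $MaxFields) {
            # ReadCount is the line number Get-Content attached to this line
            [pscustomobject]@{ Line = $_.ReadCount; Fields = $fieldCount }
        }
    }
}
```

On the example file from the question, the header line reports 11 fields, matching the header routine's output above.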
Batch or Powershell to find lines equal to value and remove ones that are not

I am attempting to automate the manual validation of a file that I get daily. The file is supposed to have 42 characters (of mixed types) in each line. But at random, a line comes in missing a space or with an invalid data length in a field. I am lost on how to check each line's length, remove the invalid lines from the master file, and insert them into their own output file. I have made some headway with line-length validation.
Get-Content dailyfile.txt | ForEach-Object { $_ | Measure-Object -Character } >> output.txt
But I can't wrap my head around how to use the output to find the specific lines whose length doesn't equal 42. I may be asking more than a mouthful, but I can't even see light at the end of the tunnel on this one.
So something like this then.
Get-Content dailyfile.txt | Where-Object{$_.Length -lt 42} | Set-Content output.txt
Get-Content returns an array of strings. We use Where-Object to pass along only the lines of the text file whose length is less than 42. If a line could also be longer than 42 characters, -ne 42 would work as well.
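The question also asked to remove the invalid lines from the master file and move them into their own file. Here is a minimal sketch of that, assuming a hypothetical function name Split-FileByLineLength and that every valid line is exactly 42 characters long. It reads the whole file into memory first so the master file can be safely overwritten afterwards.

```powershell
function Split-FileByLineLength {
    param(
        [Parameter(Mandatory)] [string] $Path,         # master file, rewritten with valid lines only
        [Parameter(Mandatory)] [string] $RejectPath,   # receives the invalid lines
        [int] $ExpectedLength = 42
    )
    # Read everything first; the file is closed before it is overwritten below
    $lines   = @(Get-Content -LiteralPath $Path)
    $valid   = @($lines | Where-Object { $_.Length -eq $ExpectedLength })
    $invalid = @($lines | Where-Object { $_.Length -ne $ExpectedLength })
    $invalid | Set-Content -LiteralPath $RejectPath
    $valid   | Set-Content -LiteralPath $Path
}
```

For the daily file this would be Split-FileByLineLength -Path dailyfile.txt -RejectPath output.txt; keep a copy of the original until you trust the result, since the master file is overwritten.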
Mostly because I could not resist, I wanted to help you with the code you had in your OP. While it is inefficient and longer, this is what you could have done to complete your original code.
$TheAnswertotheUltimateQuestionofLifeTheUniverseandEverything = 42
Get-Content C:\temp\data.log | Where-Object{($_ | Measure-Object -Character | Select-Object -ExpandProperty Characters) -lt $TheAnswertotheUltimateQuestionofLifeTheUniverseandEverything} | Set-Content output.txt