Extracting columns from text file using PowerShell

I have to extract columns from a text file, as explained in this post:
Extracting columns from text file using Perl one-liner: similar to Unix cut
but I also have to do this on a Windows Server 2008 machine that does not have Perl installed. How could I do this using PowerShell? Any ideas or resources? I'm a PowerShell noob...

Try this:
Get-Content test.txt | Foreach {($_ -split '\s+',4)[0..2]}
And if you want the data in those columns printed on the same line:
Get-Content test.txt | Foreach {"$(($_ -split '\s+',4)[0..2])"}
Note that this requires PowerShell 2.0 for the -split operator. Also, the ,4 tells the split operator the maximum number of substrings you want, but keep in mind that the last substring will always contain all of the remaining text, unsplit.
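To make the effect of the limit concrete, here is a small sketch on a literal string. The sample line is made up for illustration; only the -split usage comes from the answer above.

```powershell
# Hypothetical sample line; any whitespace-delimited text works.
$line = 'alpha  beta gamma delta epsilon'

# Split into at most 4 pieces; the 4th piece keeps the unsplit remainder.
$parts = $line -split '\s+', 4

# First three columns only.
$firstThree = $parts[0..2]

# Expanding the array inside a string joins the items with a space.
$joined = "$firstThree"
```

Here $parts[3] holds 'delta epsilon', which is why the answer indexes [0..2] to drop it.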
For fixed width columns, here's one approach for column width equal to 7 ($w=7):
$res = Get-Content test.txt | Foreach {
$i=0;$w=7;$c=0
# Take $w characters at a time until the line or the column limit runs out
while($i+$w -le $_.length -and $c++ -lt 2) {
$_.Substring($i,$w);$i=$i+$w}}
$res will contain each column for all rows. To set the maximum number of columns, change the 2 in $c++ -lt 2 to something else. There is probably a more elegant solution, but I don't have time right now to ponder it. :-)
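The same fixed-width idea can be wrapped in a small function. This is only a sketch: the function name Get-FixedWidthColumn and its parameters are hypothetical, and unlike the loop above it also returns a final column that is shorter than the width.

```powershell
# Hypothetical helper: cut a line into fixed-width columns.
function Get-FixedWidthColumn {
    param(
        [string]$Line,
        [int]$Width = 7,
        [int]$MaxColumns = 2
    )
    $columns = @()
    for ($start = 0; $start -lt $Line.Length -and $columns.Count -lt $MaxColumns; $start += $Width) {
        # Clamp the length so a short final column does not throw.
        $length = [Math]::Min($Width, $Line.Length - $start)
        $columns += $Line.Substring($start, $length)
    }
    # The leading comma keeps the result an array even for a single column.
    , $columns
}

$cols = Get-FixedWidthColumn -Line 'AAAAAAABBBBBBBCCC' -Width 7 -MaxColumns 3
```

To process a file, the function could be called once per line, for example from Get-Content test.txt | Foreach { Get-FixedWidthColumn -Line $_ }.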

Assuming it's white space delimited this code should do.
$fileName = "someFilePath.txt"
$columnToGet = 2
$columns = gc $fileName |
%{ $_.Split(" ",[StringSplitOptions]"RemoveEmptyEntries")[$columnToGet] }

The ordinary way:
type foo.bar | % { $_.Split(" ") | select -first 3 }

Try this. This will help to skip initial rows if you want, extract/iterate through columns, edit the column data and rebuild the record:
$header3 = @("Field_1","Field_2","Field_3","Field_4","Field_5")
Import-Csv $fileName -Header $header3 -Delimiter "`t" | select -skip 3 | Foreach-Object {
$record = $indexName
foreach ($property in $_.PSObject.Properties){
#doSomething $property.Name, $property.Value
if($property.Name -like '*CUSIP*'){
$record = $record + "," + '"' + $property.Value + '"'
}
else{
$record = $record + "," + $property.Value
}
}
$array.add($record) | out-null
#write-host $record
}

Related

How to separate CSV values within a CSV into new rows in PowerShell

I'm receiving an automated report from a system that cannot be modified as a CSV. I am using PowerShell to split the CSV into multiple files and parse out the specific data needed. The CSV contains columns that may contain no data, 1 value, or multiple values that are comma separated within the CSV file itself.
Example(UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542, 340668, 292196"
"Approval","AA-334454, 344366, 323570, 322827, 360225, 358850, 345935"
"ITS","345935, 358850"
"Services",""
I want the data to have one entry per line like this (UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542"
"Risk","340668"
"Risk","292196"
#etc.
I've tried splitting the data and I just get an unknown number of columns at the end.
I tried a foreach loop, but can't seem to get it right (pseudocode below):
Import-CSV $Groups
ForEach ($line in $Groups){
If($_.'Members'.count -gt 1, add-content "$_.Group,$_.Members[2]",)}
I appreciate any help you can provide. I've searched all the stackexchange posts and used Google but haven't been able to find something that addresses this exact issue.
Import-Csv .\input.csv | ForEach-Object {
ForEach ($Member in ($_.Members -Split ',')) {
[PSCustomObject]@{Group = $_.Group; Member = $Member.Trim()}
}
} | Export-Csv .\output.csv -NoTypeInformation
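The Import-Csv pipeline above can be tried without any files. The sketch below inlines a few of the sample rows from the question and uses ConvertFrom-Csv and ConvertTo-Csv in place of the file-based cmdlets; the variable names are assumptions.

```powershell
# Sample rows copied from the question, inlined so no file is needed.
$csvText = @'
"Group","Members"
"Event","362403"
"Risk","324542, 340668, 292196"
"Services",""
'@

$rows = $csvText | ConvertFrom-Csv | ForEach-Object {
    $group = $_.Group
    # Splitting an empty string yields one empty element,
    # so "Services" is kept with a blank member.
    foreach ($member in ($_.Members -split ',')) {
        [PSCustomObject]@{ Group = $group; Member = $member.Trim() }
    }
}

# Convert to CSV text instead of writing a file.
$rows | ConvertTo-Csv -NoTypeInformation
```

Each member ends up on its own row while the group name is repeated, which is the layout the question asks for.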
# Get the raw text contents
$CsvContents = Get-Content "\path\to\file.csv"
# Convert it to a table object
$CsvData = ConvertFrom-CSV -InputObject $CsvContents
# Iterate through the records in the table
ForEach ($Record in $CsvData) {
# Split the members value at commas and trim whitespace
$Record.Members -Split "," | % {
$Member = $_.Trim()
# Skip empty values (groups without members)
if($Member) {
# Create our output string
$OutputString = "$($Record.Group), $Member"
# Write our output string to a file
Add-Content -Path "\path\to\output.txt" -Value $OutputString
}
}
}
This should work, you had the right idea but I think you may have been encountering some syntax issues. Let me know if you have questions :)
Revised the code as per your updated question,
$List = Import-Csv "\path\to\input.csv"
foreach ($row in $List) {
$Group = $row.Group
$Members = $row.Members -split ","
# Process for each value in Members
foreach ($MemberValue in $Members) {
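# PS v3 and above (Export-Csv needs objects; a plain string would only export its Length)
[PSCustomObject]@{Group = $Group; Member = $MemberValue.Trim()} | Export-Csv "\path\to\output.csv" -NoTypeInformation -Append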
# PS v2
# $Group + "," + $MemberValue | Out-File "\path\to\output.csv" -Append
}
}

How to set as variable csv column using powershell?

I have csv file like this
ID Name
4 James
6 John
1 Cathy
I want to save those IDs to a .cmd file in this format
SET NUMBER1=4
SET NUMBER2=6
SET NUMBER3=1
The number of IDs in the CSV file is not always 3. If there are more than 3 IDs, my cmd file should look like this
SET NUMBER1=4
SET NUMBER2=6
SET NUMBER3=1
SET NUMBERN=N
Can anyone help, please? I'm really new to PowerShell and would appreciate any help and advice. Thanks
$ID = Import-Csv .\Data.csv | Select-Object -ExpandProperty ID
$ID.Count
ForEach ( $id in $ID ) {
}
I am stuck here
An alternative approach is below if your headers are always present in the file. It doesn't matter what the delimiter is as long as it isn't a number. Your delimited data in the sample is not consistent. Otherwise, Import-Csv would be a safer option.
$fileData = Get-Content file.csv
$output = for ($i = 1; $i -lt $fileData.count; $i++) {
# \D makes the match start at the first non-digit, so multi-digit IDs stay intact
"SET NUMBER{0}={1}" -f $i,($fileData[$i] -replace "(?<=^\d+)\D.*")
}
$output | Out-File file.cmd
Explanation:
The format operator (-f) is used to help construct the output strings. The ID numbers are selected using regex by replacing everything that comes after the beginning digits on each line.
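As a sketch of how the two pieces fit together, the snippet below runs on an in-memory array instead of a file. The sample lines are assumptions, and the pattern is a variant that requires a non-digit after the leading digits so that a multi-digit ID such as 16 is kept whole.

```powershell
# Hypothetical file contents: a header line followed by data lines.
$fileData = 'ID Name', '4 James', '16 John', '1 Cathy'

$output = for ($i = 1; $i -lt $fileData.Count; $i++) {
    # Drop everything from the first non-digit onward, keeping only the ID.
    $id = $fileData[$i] -replace '(?<=^\d+)\D.*'
    # {0} is the running index, {1} is the extracted ID.
    'SET NUMBER{0}={1}' -f $i, $id
}

$output
```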
Try this:
# set current directory to script directory
Set-Location $PSScriptRoot
# import csv-file, delimiter = space
$content = Import-Csv 'test.csv' -Delimiter ' '
$output = ''
# create output lines
for( $i = 1; $i -le $content.Count; $i++ ) {
$output += 'SET NUMBER' + $i.ToString() + '=' + $content[$i-1].ID.ToString() + [environment]::NewLine
}
# output to file
$output | Out-File 'result.bat' -Force

Replace first duplicate without regex and increment

I have a text file and I have 3 of the same numbers somewhere in the file. I need to add incrementally to each using PowerShell.
Below is my current code.
$duped = Get-Content $file | sort | Get-Unique
while ($duped -ne $null) {
$duped = Get-Content $file | sort | Get-Unique | Select -Index $dupecount
$dupefix = $duped + $dupecount
echo $duped
echo $dupefix
(Get-Content $file) | ForEach-Object {
$_ -replace "$duped", "$dupefix"
} | Set-Content $file
echo $dupecount
$dupecount = [int]$dupecount + [int]"1"
}
Original:
12345678
12345678
12345678
Intended Result:
123456781
123456782
123456783
$filecontent = (get-content C:\temp\pos\bart.txt )
$output = $null
[int]$increment = 1
foreach($line in $filecontent){
if($line -match '12345679'){
$line = [int]$line + $increment
$line
$output += "$line`n"
$increment++
}else{
$output += "$line`n"
}
}
$output | Set-Content -Path C:\temp\pos\bart.txt -Force
This works in my test of 5 lines being
a word
12345679
a second word
12345679
a third word
the output would be :
a word
12345680
a second word
12345681
a third word
Let's see if I understand the question correctly:
You have a file with X-amount of lines:
a word
12345678
a second word
12345678
a third word
You want to catch each instance of 12345678 and add 1 increment to it so that it would become:
a word
12345679
a second word
12345679
a third word
Is that what you are trying to do?
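If the goal is instead the "Intended Result" shown in the question, where a running counter is appended to each occurrence, a minimal sketch could look like the following. The sample lines and the $target value are assumptions, and the comparison uses -eq rather than a regex.

```powershell
# Hypothetical input lines and target value.
$lines  = 'a word', '12345678', 'a second word', '12345678', '12345678'
$target = '12345678'

$counter = 0
$result = foreach ($line in $lines) {
    if ($line -eq $target) {
        # Append the running counter as text: 123456781, 123456782, ...
        $counter++
        "$line$counter"
    }
    else {
        # Leave all other lines unchanged.
        $line
    }
}

$result
```

Writing $result back with Set-Content would then replace the file contents in the same way the answer above does.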

Powershell to count columns in a file

I need to test the integrity of file before importing to SQL.
Each row of the file should have the exact same amount of columns.
These are "|" delimited files.
I also need to ignore the first line as it is garbage.
If every row does not have the same number of columns, then I need to write an error message.
I have tried using something like the following with no luck:
$colCnt = "c:\datafeeds\filetoimport.txt"
$file = (Get-Content $colCnt -Delimiter "|")
$file = $file[1..($file.count - 1)]
Foreach($row in $file){
$row.Count
}
Counting rows is easy. Columns is not.
Any suggestions?
Yep, read the file, skipping the first line. Split each line on the pipe and count the results. If the count isn't the same as for the previous line, output an error and stop.
$colCnt = "c:\datafeeds\filetoimport.txt"
[int]$LastSplitCount = $Null
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if($LastSplitCount -and !($_.split("|").Count -eq $LastSplitCount)){"Process stopped at line number $($_.ReadCount) for column count mis-match.";break}elseif(!$LastSplitCount){$LastSplitCount = $_.split("|").Count}}
That should do it, and if it finds a bad column count it will stop and output something like:
Process stopped at line number 5 for column count mis-match.
Edit: Added a Where catch to skip blank lines ( ?{$_} )
Edit2: Ok, if you know what the column count should be then this is even easier.
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if(!($_.split("|").Count -eq 210)){"Process stopped at line number $($_.ReadCount), incorrect column count of: $($_.split("|").Count).";break}}
If you want it to return all lines that don't have 210 columns just remove the ;break and let it run.
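The same check can be written as a plain loop that collects every bad line instead of stopping at the first one. This is a sketch on in-memory sample lines, which are assumptions; with a real file, $lines would come from Get-Content.

```powershell
# Hypothetical sample: a garbage first line, then pipe-delimited rows.
$lines = 'garbage header', 'a|b|c', 'd|e|f', 'g|h', 'i|j|k'

$expected = $null
$lineNumber = 0
$badLines = foreach ($line in $lines) {
    $lineNumber++
    # Skip the first line and any blank lines.
    if ($lineNumber -eq 1 -or -not $line) { continue }
    $count = $line.Split('|').Count
    # The first data line sets the expected column count.
    if ($null -eq $expected) { $expected = $count }
    if ($count -ne $expected) {
        [PSCustomObject]@{ Line = $lineNumber; Columns = $count }
    }
}

$badLines
```

An empty $badLines means every data row has the same number of columns as the first data row.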
A more generic approach, including a RegEx filter:
$path = "path\to\folder"
$regex = "regex"
$expValue = 450
$files= Get-ChildItem $path | Where-Object {$_.Name -match $regex}
Foreach( $f in $files) {
$filename = $f.Name
echo $filename
$a = Get-Content $f.FullName;
$i = 1;
$e = 0;
echo "Starting...";
foreach($line in $a)
{
if ($line.length -ne $expValue){
echo $filename
$a | Measure-Object -Line
echo "Long:"
echo $line.Length;
echo "Line Nº: "
echo $i;
$e = $e + 1;
}
$i = $i+1;
}
echo "Finished";
if ($e -ne 0){
echo $e "errors found";
}else{
echo "No errors"
echo ""
}
}
echo "All files examined"
Another possibility:
$colCnt = "c:\datafeeds\filetoimport.txt"
$DataLine = (Get-Content $colCnt -TotalCount 2)[1]
$DelimCount = ([char[]]$DataLine -eq '|').count
$MatchString = '^[^|]*' + ('\|[^|]*' * $DelimCount ) + '$'
$test = Select-String -Path $colCnt -Pattern $MatchString -NotMatch |
where { $_.linenumber -ne 1 }
That will find the number of delimiter characters in the second line, and build a regex pattern that can be used with Select-String.
The -NotMatch switch will make it return any lines that don't match that pattern as MatchInfo objects that will have the filename, line number and content of the problem lines.
Edit: Since the first line is "garbage" you probably don't care if it didn't match so I added a filter to the result to drop that out.
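As a sketch of the same idea on in-memory lines (the sample data is assumed), with the delimiter escaped so the regex treats it literally and the pattern anchored so that extra or missing delimiters are detected:

```powershell
# Hypothetical data line with 2 delimiters (3 columns).
$dataLine = 'a|b|c'
$delimCount = ([char[]]$dataLine -eq '|').Count

# '|' must be escaped in a regex; [^|]* matches one column without delimiters.
$pattern = '^[^|]*' + ('\|[^|]*' * $delimCount) + '$'

$lines = 'a|b|c', 'd|e', 'f|g|h|i', 'j||k'

# Keep only the lines whose delimiter count differs from the reference line.
$bad = $lines | Where-Object { $_ -notmatch $pattern }
```

The line 'j||k' passes because it still has two delimiters; an empty column counts as a column.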

Remove Duplicate Group of Data in Text file

I have a text file formatted similar to the following:
Description1: Data-123
Description2: Data-ABC
Description3: Data-789
Description4: Data-EFG
Description5: Data-XYZ
Description1: Data-123
Description2: Data-ABC
Description3: Data-789
Description4: Data-EFG
Description5: Data-XYZ
Description1: Data-123
Description2: Data-ABC
Description3: Data-789
Description4: Data-EFG
Description5: Data-584
I need PowerShell to compare each group (5 lines of data) as a whole and remove any duplicate groups, leaving only the unique groups of data. I can get it to remove single duplicate lines with the code below, but no luck comparing each group.
get-content TextFile.txt | sort-object | get-unique > NewTextFile.txt
This might work. It assumes the groups are separated by empty lines, and the last line of code writes the unique groups to the output file. I give no further explanation because you haven't shown us any code you have so far.
$a = gc mylist.txt
$b = [string]::Empty
$c = @()
$a | % {if ( $_ -ne [string]::Empty )
{ $b += "$_`n" }
else
{ $c += $b
$b = [string]::Empty
}
}
$c += $b
$c | select -Unique | out-file .\mynew.txt
Split the file content on double new line characters (that should match the end of the line right before the empty line + the empty line right after it), split each object returned (remove the empty line) and then join it back, add new line and write the results to a new file.
(Get-Content TextFile.txt | Out-String) -split "`r`n`r`n" | ForEach-Object{
($_.Split("`r`n",[System.StringSplitOptions]::RemoveEmptyEntries) -join "`r`n") + "`n"
} | Select-Object -Unique | Out-File NewTextFile.txt
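The split-then-deduplicate step can be checked on a small in-memory string. In this sketch the sample text is an assumption: three two-line groups separated by empty lines, with the first two groups identical.

```powershell
# Hypothetical content: three groups separated by empty lines.
$text = "D1: Data-123`r`nD2: Data-ABC`r`n`r`n" +
        "D1: Data-123`r`nD2: Data-ABC`r`n`r`n" +
        "D1: Data-123`r`nD2: Data-584"

# Split on the empty line, then keep the first occurrence of each group.
$groups = $text -split "`r`n`r`n" | Select-Object -Unique

$groups
```

Because each group is compared as one multi-line string, two groups count as duplicates only if every line in them is identical.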