Powershell skip first 2 lines of txt file when importing it - powershell

I have a PowerShell script designed to read a txt file on a remote server and import it into SQL.
I want to be able to skip the first 2 lines of the txt file. I am currently using the code below to import the file. The txt file is delimited.
$datatable = New-Object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($empFile)
$columns = (Get-Content $empFile -First 1).Split($empFileDelimiter)
if ($FirstRowColumnNames -eq $true)
{
    $null = $reader.ReadLine()
}
foreach ($column in $columns)
{
    $null = $datatable.Columns.Add()
}
# Read in the data, line by line, not column by column
while (($line = $reader.ReadLine()) -ne $null)
{
    $null = $datatable.Rows.Add($line.Split($empFileDelimiter))
}
The $columns variable takes the first line of the txt file and creates the columns for the PS datatable.
The problem is that the first two lines of the txt file are not needed; I need to skip them and use the third line of the txt file for the columns. I have the following line of code which will do this, but I am uncertain how to integrate it into my code.
get-content $empFile | select-object -skip 2

Create an array for the $empfile without the first two lines, then use the first item of the array for the Columns, like this:
$Content = Get-Content $empFile | Select-Object -Skip 2
$columns = $Content[0].Split($empFileDelimiter)
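To fold that into the original import, a minimal sketch (reusing the variable names from the question) could build the columns from the third line and the rows from everything after it:
# a minimal sketch, assuming $empFile and $empFileDelimiter are set as in the question
$datatable = New-Object System.Data.DataTable
$content = Get-Content $empFile | Select-Object -Skip 2   # drop the two unwanted lines
$columns = $content[0].Split($empFileDelimiter)           # third line of the file holds the column names
foreach ($column in $columns)
{
    $null = $datatable.Columns.Add()
}
# every line after the header line becomes a data row
foreach ($line in ($content | Select-Object -Skip 1))
{
    $null = $datatable.Rows.Add($line.Split($empFileDelimiter))
}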

Just a quick one-liner:
(Get-Content $empFile| Select-Object -Skip 2) | Set-Content $empFile
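Note that this rewrites $empFile in place, permanently discarding the first two lines from the source file. If the original must stay intact, write the trimmed copy to a different path instead (the '.trimmed' suffix here is just a hypothetical example):
Get-Content $empFile | Select-Object -Skip 2 | Set-Content -Path ($empFile + '.trimmed')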

Put in two unused calls to ReadLine(). Something like this:
$datatable = new-object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($empFile)
$null = $reader.ReadLine()   # skip unwanted line 1
$null = $reader.ReadLine()   # skip unwanted line 2
$columns = ($reader.ReadLine()).Split($empFileDelimiter)
...
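For completeness, a sketch of the whole loop with the two throwaway reads folded in (same variable names as the question):
$datatable = New-Object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($empFile)
$null = $reader.ReadLine()   # skip unwanted line 1
$null = $reader.ReadLine()   # skip unwanted line 2
$columns = ($reader.ReadLine()).Split($empFileDelimiter)   # line 3 holds the column names
foreach ($column in $columns)
{
    $null = $datatable.Columns.Add()
}
while (($line = $reader.ReadLine()) -ne $null)
{
    $null = $datatable.Rows.Add($line.Split($empFileDelimiter))
}
$reader.Dispose()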

Related

How can I copy a column value from excel to txt file

I'm new to PowerShell. I have a number of Excel files (500+) with a column Animal Count whose value I would like to save in a new .txt file. Can anyone give me tips to achieve this?
Looking at the image you provided, the count value is not in a column called 'Animal count', but in the cell next to a label with that text.
As for the type of output, I would recommend not using a .txt file, but outputting the found info as a CSV file, to be able to keep the file names and the animal count values in a structured way.
Try:
$Source = 'D:\Test' # the path to where the Excel files are
# create an Excel COM object
$excel = New-Object -ComObject Excel.Application
# find Excel files in the Source path and loop through.
# you may want to add the -Recurse switch here if the code should also look inside subfolders
$result = Get-ChildItem -Path $Source -Filter '*.xlsx' -File | ForEach-Object {
    $workBook = $excel.Workbooks.Open($_.FullName)
    $workSheet = $workBook.Sheets.Item(1)
    $count = 0
    $label = $workSheet.Cells.Find('*Animal count*')
    if ($label) {
        # get the numeric value for the cell next to the label
        # empty cells will translate to 0
        $count = [int]$workSheet.Cells.Item($label.Row, $label.Column + 1).Value()
    }
    # output a PSObject with the full filename and the animal count value
    [PsCustomObject]@{
        'File'        = $_.FullName
        'AnimalCount' = $count
    }
    $workBook.Close()
}
# quit Excel and clean up the used COM objects
$excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workSheet) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workBook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
# output on screen
$result | Format-Table -AutoSize
#output to CSV file
$result | Export-Csv -Path 'D:\Test\AnimalCount.csv' -UseCulture -NoTypeInformation
The result on screen will look something like this:
File AnimalCount
---- -----------
D:\Test\File1.xlsx 165
D:\Test\File2.xlsx 0
D:\Test\File3.xlsx 87596
Edit
Since you've commented that the labels are in merged cells, you need to use this to find the value for Animal count:
$label = $workSheet.Range('$A:$B').Find('*Animal count*')
if ($label) {
    # get the numeric value for the cell next to the label
    # empty cells will translate to 0
    $count = [int]$workSheet.Cells.Item($label.Row, $label.Column + 2).Value()
}
That is assuming there are two cells merged into one.
P.S. If the animal count value can ever exceed 2147483647, cast to [int64] instead of [int]
You could use Import-Csv to turn the excel file into a PS object, and the columns would be the new object's properties.
$excel = Import-Csv $excelPath
$excel.Animals | out-file $txtPath
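Bear in mind this only works if the file is really plain-text CSV; Import-Csv cannot read a binary .xlsx workbook, so a real Excel file would first have to be saved as CSV (for instance with the COM approach above). The property name after the dot must also match the column header exactly.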
Try this; it converts one file and saves it as a .txt with the same base name.
This can be done for more files with a foreach loop.
$FileName = "C:\temp\test.xlsx"
$Excel = New-Object -ComObject Excel.Application
$Excel.visible = $false
$Excel.DisplayAlerts = $false
$WorkBook = $Excel.Workbooks.Open($FileName)
$NewFilePath = [System.IO.Path]::ChangeExtension($FileName,".txt")
$Workbook.SaveAs($NewFilepath, 42) # xlUnicodeText
# cleanup
$Excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($WorkBook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

Stuck with this PS Script

I have a text file that contains millions of records
I want to find every line that does not start with the string, plus that line's number (the string is a double quote followed by 01/01/2019).
Can you help me modify this code?
Get-Content "(path).txt" | Foreach { if ($_.Split(',')[-1] -inotmatch "^01/01/2019") { $_; } }
Thanks
Based on your comments, the content will look something like the array below.
So you want to read the content, filter it, and get the resulting line from that content:
# Get the content
# $content = Get-Content -Path 'pathtofile.txt'
$content = @('field1,field2,field3', '01/01/2019,b,c')
# Convert from csv
$csvContent = $content | ConvertFrom-Csv
# Add your filter based on the field
$results = $csvContent | Where-Object { $_.field1 -notmatch '01/01/2019' }
# Convert your results back to csv if needed
$results | ConvertTo-Csv
If performance is an issue, then .NET can handle millions of records with CsvHelper, just like Power BI does.
# install CsvHelper
nuget install CsvHelper
# import csvhelper
import-module CsvHelper.2.16.3.0\lib\net45\CsvHelper.dll
# write the content to the file just for this example
@('field1,field2,field3', '01/01/2019,b,c') | Set-Content -Path "c:\temp\text.csv"
# an array created with @() has a fixed size, so use an ArrayList to get a working .Add()
$results = New-Object System.Collections.ArrayList
# open the file for reading
try {
    $stream = [System.IO.File]::OpenRead("c:\temp\text.csv")
    $sr = [System.IO.StreamReader]::new($stream)
    $csv = [CsvHelper.CsvReader]::new($sr)
    # read in the records
    while ($csv.Read()) {
        # add in the result
        $result = @{}
        [string] $value = ""
        for ($i = 0; $csv.TryGetField($i, [ref] $value); $i++) {
            $result.Add($i, $value)
        }
        # add your filter here for the results
        $null = $results.Add($result)
    }
}
finally {
    # dispose of everything once we are done
    $csv.Dispose()
    $sr.Dispose()
    $stream.Dispose()
}
My .txt file looks like this...
date,col2,col3
"01/01/2019 22:42:00", "column2", "column3"
"01/02/2019 22:42:00", "column2", "column3"
"01/01/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
This command does exactly what you are asking...
Get-Content -Path C:\myFile.txt | ? {$_ -notmatch "01/01/2019"} | Select -Skip 1
The output is:
"01/02/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
I skipped the top row. If you want to deal with particular columns, change myFile.txt to a .csv and import it.
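If you do convert it to a CSV with headers, the filter can target the date column directly instead of matching anywhere in the line; a small sketch assuming the first column is named date as in the sample above (file paths are illustrative):
Import-Csv -Path C:\myFile.csv |
    Where-Object { $_.date -notmatch '^01/01/2019' } |
    Export-Csv -Path C:\myFileFiltered.csv -NoTypeInformation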
Looking at the question and comments, it seems you are dealing with a headerless CSV file. Because the file contains millions of records, I think using Get-Content or Import-Csv would be too slow. Using [System.IO.File]::ReadLines() would be faster.
If indeed each line starts with a quoted date, you could use various methods of figuring out if the line start with "01/01/2019 or not. Here, I use the -notlike operator:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
    if ($line -notlike '"01/01/2019*') {
        # write to a NEW file
        Add-Content -Path $fileOut -Value $line
    }
}
Update
Judging from your comment, you are apparently using an older .NET Framework, as [System.IO.File]::ReadLines() only became available in version 4.0.
In that case, the below code should work for you:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
$reader = New-Object System.IO.StreamReader($fileIn)
$writer = New-Object System.IO.StreamWriter($fileOut)
while (($line = $reader.ReadLine()) -ne $null) {
    if ($line -notlike '"01/01/2019*') {
        # write to a NEW file
        $writer.WriteLine($line)
    }
}
$reader.Dispose()
$writer.Dispose()
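Since the question also asks for the line number of each match, the same loop can carry a counter; a minimal sketch of that variant:
$reader = New-Object System.IO.StreamReader($fileIn)
$lineNumber = 0
while (($line = $reader.ReadLine()) -ne $null) {
    $lineNumber++
    if ($line -notlike '"01/01/2019*') {
        # emit the line number together with the line itself
        '{0}: {1}' -f $lineNumber, $line
    }
}
$reader.Dispose()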

Powershell Mass Rename files with an Excel reference list

I need help with PowerShell.
I will have to start renaming files on a weekly basis, more than 100 a week, each with a dynamic name.
The files I want to rename are in a folder named Scans, located at "C:\Documents\Scans", and they are in order, say by time scanned.
I have an Excel file located at "C:\Documents\Mapping\New File Name.xlsx".
The workbook has only one sheet; the new names are in column A, with x rows, and as mentioned above each cell will have a different value.
Please comment your suggestions so that I can understand what is going on, since I'm new to coding.
Thank you all for your time and help.
Although I agree with Ad Kasenally that it would be easier to use CSV files, here's something that may work for you.
$excelFile = 'C:\Documents\Mapping\New File Name.xlsx'
$scansFolder = 'C:\Documents\Scans'
########################################################
# step 1: get the new filenames from the first column in
# the Excel spreadsheet into an array '$newNames'
########################################################
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
$workbook = $excel.Workbooks.Open($excelFile)
$worksheet = $workbook.Worksheets.Item(1)
$newNames = @()
$i = 1
while ($worksheet.Cells.Item($i, 1).Value() -ne $null) {
    $newNames += $worksheet.Cells.Item($i, 1).Value()
    $i++
}
$excel.Quit()   # note the parentheses; without them the method is not actually invoked
# IMPORTANT: clean-up used Com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($worksheet) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
########################################################
# step 2: rename the 'scan' files
########################################################
$maxItems = $newNames.Count
if ($maxItems) {
    $i = 0
    Get-ChildItem -Path $scansFolder -File -Filter 'scan*' |    # get a list of FileInfo objects in the folder
        Sort-Object { [int]($_.BaseName -replace '\D+', '') } | # sort by the numeric part of the filename
        Select-Object -First $maxItems |                        # select no more than there are items in the $newNames array
        ForEach-Object {
            try {
                Rename-Item -Path $_.FullName -NewName $newNames[$i] -ErrorAction Stop
                Write-Host "File '$($_.Name)' renamed to '$($newNames[$i])'"
                $i++
            }
            catch {
                throw
            }
        }
}
else {
    Write-Warning "Could not get any new filenames from the $excelFile file."
}
You may want to have 2 columns in the excel file:
original file name
target file name
From there you can save the file as a csv.
Use Import-Csv to pull the data into Powershell and a ForEach loop to cycle through each row with a command like move $item.original $item.target.
There are abundant threads describing using import-csv with forEach.
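A rough sketch of that approach (the mapping CSV path and the original/target column names here are assumptions, not taken from the question):
# a rough sketch; assumes a mapping CSV with 'original' and 'target' columns
$scansFolder = 'C:\Documents\Scans'
$list = Import-Csv -Path 'C:\Documents\Mapping\renames.csv'
foreach ($item in $list) {
    # rename each file listed in the 'original' column to its 'target' name
    Rename-Item -Path (Join-Path $scansFolder $item.original) -NewName $item.target
}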
Good luck.

How can I count the number of CSV columns when the file has multiline data and no header

My CSV files have no headers and multi line entries like this:
11;"multi line
col12";13;foobar;foobar
21;22;23;24;25
And I'd like to count the number of columns. So 5 in this example. How do I do that?
What I tried:
Import-CSV doesn't work without the header parameter due to duplicate entries on the first line.
(Import-Csv .\bad.csv -Delimiter ";" | get-member -type NoteProperty).count
Adding a header parameter skews the count.
(Import-Csv .\bad.csv -Delimiter ";" -Header (1..99) | get-member -type NoteProperty).count
I had to abort reading the file manually via Get-Content because of all the parsing I would have to handle manually. Escaping characters and multi line entries...
My version of PowerShell is 3 and I have to port my script to version 2 later on.
If you are willing to accept the caveat that this could miscount the number of columns when there are quoted delimiters inside strings, this could be good enough for you.
$path = "c:\temp\test.txt"
$delimiter = ";"
$numberOfColumns = Get-Content $path |
    ForEach-Object { ($_.split($delimiter)).Count } |
    Measure-Object -Maximum |
    Select-Object -ExpandProperty Maximum
Import-Csv $path -Header (1..$numberOfColumns) -Delimiter $delimiter
Read in the file with Get-Content and isolate the maximum number of columns by splitting each line on its delimiter, then use that value to import the CSV. If the file is large, you can read it in once with Get-Content and then use ConvertFrom-Csv once you know your column count.
If any line contains an embedded line break, the above logic would fail. Still, we can temporarily scrub the data by removing those line breaks in order to get an accurate count.
$delimiter = ";"
$fileData = (Get-Content $path | Out-String)
$numberOfColumns = ((($fileData -replace "(`"[^;]+?)`r`n",'$1') -split "`r`n" | Select -First 1).split($delimiter)).Count
$fileData | ConvertFrom-Csv -Header (1..$numberOfColumns) -Delimiter $delimiter
What this does is find lines that end with a double quote followed by data that does not contain the delimiter. We also match the newline that follows, but drop that same newline in the replacement. Once that is done, we know the first line is proper, and we use that same line to split and count just like before.
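As a quick demonstration of that scrubbing replace on the sample data from the question (a hedged sketch):
# the sample data from the question, with an embedded CRLF in the quoted field
$raw = "11;`"multi line`r`ncol12`";13;foobar;foobar`r`n21;22;23;24;25"
# joining the quoted multi-line field back onto one line makes the first line proper
($raw -replace "(`"[^;]+?)`r`n", '$1') -split "`r`n" | Select-Object -First 1
# -> 11;"multi linecol12";13;foobar;foobar  (splitting this on ';' now yields 5 columns)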
Since Excel knows, let's ask it:
$path = "path\to\bad.csv"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($path)
$sheet = $workbook.ActiveSheet
$columnIndex = 1
while ($sheet.Cells.Item(1, $columnIndex).Text -ne "") {
    $columnIndex++
}
"There are $($columnIndex - 1) columns in CSV file $path"
Start-Sleep -Seconds 1
Get-Process excel | Stop-Process -Force
As pointed out by Ansgar Wiechers in the comments, there is a much shorter solution:
$path = "path\to\bad.csv"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($path)
$sheet = $workbook.ActiveSheet
$columnCount = $sheet.UsedRange.Columns.Count
"There are $columnCount columns in CSV file $path"
Start-Sleep -Seconds 1
Get-Process excel | Stop-Process -Force
(I know my way of killing Excel is dirty, but IIRC doing it cleanly takes too much code.)
I know this is very old, but I came across a similar situation today (I did not have rows of varying column counts) and found my own solution, so I thought I would share it for anyone else coming into this situation. My solution was to use Get-Content for the first row of the CSV, -split on the delimiter (,) to create an array, and then return the count of the array. As mentioned in the replies above, this will not account for delimiters existing within quotations.
((Get-Content $PathToCsv)[0] -split ",").count
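If the file is huge, reading only the first line avoids pulling everything into memory first; -TotalCount has been on Get-Content since PowerShell 2.0, so this variant should also survive the back-port to v2 mentioned in the question:
((Get-Content $PathToCsv -TotalCount 1) -split ",").Count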
I had the same issue and went with AAgent's suggestion.
$CommaCount = ((Get-Content $PathToCsv)[0] -split ",").count
$SemicolonCount = ((Get-Content $PathToCsv)[0] -split ";").count
if ($CommaCount -gt $SemicolonCount) {
    $CMSlist = Import-Csv ($PathToCsv) -Delimiter ","
}
else {
    $CMSlist = Import-Csv ($PathToCsv) -Delimiter ";"
}

Powershell .csv merge with column remove

Using the code below I am able to merge several .csv files in 5 seconds.
$getFirstLine = $true
Get-ChildItem "C:\my\dir\*.csv" | ForEach-Object {
    $filePath = $_
    $lines = Get-Content $filePath
    $linesToWrite = switch ($getFirstLine) {
        $true  { $lines }
        $false { $lines | Select -Skip 1 }
    }
    $getFirstLine = $false
    Add-Content "C:\my\dir\output_code2.csv" $linesToWrite
}
I would like to take this one step further, preferably using piping to remove several of the columns, using a command like:
select DateAndTime,DG1_KW,DG2_KW,WT_KW,HTR1_KW,POSS_Load_KW,INV1_KW,INV2_SOC|Export-csv output_test.csv -Notypeinformation
those being the variables in the header of each file.
How would I modify this code to make this work? The idea here is that I am going to be working with hundreds up to thousands of files.
I have other code which can do this, but it is nowhere near as fast.
For instance, using 10 .csv files that are 450 KB each, the code below takes 20 seconds to process and spit out a .csv file, removing 48 of the 56 columns and leaving the variables I need. If I remove the part of the code that trims the columns, it still takes 12+ seconds.
# Directory containing csv files, include *.*
$directory = "C:\my\dir\*.*";
# Get the csv files
$csvFiles = Get-ChildItem -Path $directory -Filter *.csv;
#$content = $null;
$content = @();
# Process each file
foreach ($csv in $csvFiles)
{
    $content += Import-Csv $csv;
}
# Write a datetime stamped csv file
$datetime = Get-Date -Format "yyyyMMddhhmmss";
$content |Export-Csv -Path "C:\my\dir\output_code2_$datetime.csv" -NoTypeInformation;
The code I would like to modify runs those same 10 files in 5 seconds but does not remove the 48 columns.
Any ideas, guys?
Ok, you want an example... Let's say your CSVs always look like this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10
data1,data2,data3,data4,data5,data6,data7,data8,data9,data10
dataA,dataB,dataC,dataD,dataE,dataF,dataG,dataH,dataI,dataJ
Now let's say you only want Col1, Col2, Col6, Col9, and Col10. You could do a RegEx replace something like:
$Files = Get-ChildItem "C:\my\dir\*.csv" | Select -Expand FullName
ForEach ($File in $Files) {
    If ($SkipFirst) {
        Get-Content $File | Select -Skip 1 | ForEach { $_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3' } | Add-Content "C:\my\dir\output_code2.csv"
    }
    Else {
        Get-Content $File | ForEach { $_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3' } | Add-Content "C:\my\dir\output_code2.csv"
    }
}
That would extract just the columns noted above. See https://regex101.com/r/jY4oO6/1 for a detailed breakdown of the RegEx string. Effective output would be (skipping the first line if so dictated):
Col1,Col2,Col6,Col9,Col10
data1,data2,data6,data9,data10
dataA,dataB,dataF,dataI,dataJ
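As a middle ground between the slow Import-Csv accumulation and the regex, you could also stream the objects straight through the pipeline instead of collecting them with +=; a hedged sketch using the column names from the question (output path is illustrative, speed untested):
# streams every row through Select-Object instead of growing an array with +=
$datetime = Get-Date -Format "yyyyMMddhhmmss"
Get-ChildItem -Path "C:\my\dir" -Filter *.csv |
    ForEach-Object { Import-Csv $_.FullName } |
    Select-Object DateAndTime,DG1_KW,DG2_KW,WT_KW,HTR1_KW,POSS_Load_KW,INV1_KW,INV2_SOC |
    Export-Csv "C:\my\dir\output_trimmed_$datetime.csv" -NoTypeInformation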