Edit column in Tab-delimited Text file using Powershell - powershell

I have a very large (~250k row and 171 Column) Tab delimited text file that I need to edit. I need to add the letter "H" to the third column on every row.
So I need it to go from 03/20/2020 09:00 03/20/2020 10:00 1269805 ......
to 03/20/2020 09:00 03/20/2020 10:00 H1269805 .....
I actually have this working with the following code:
$source = Get-ChildItem "C:\test\input\*.txt"
$target = "C:\test\test.txt"
$data = Get-Content -Path $source | ConvertFrom-Csv -Delimiter "`t" -Header Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120,
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171
$data | % {
If ($_.Column3) {
#import ID
$_.Column3 = "H$($_.Column3)"
} }
$data | Select Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120,
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | % { $_ -replace '"', "" } | Select-Object -Skip 1 | Set-Content -Path $target
The problem I have is it takes a long time. I understand it is a large file, but is there any other way to do this faster? I feel like the converting to and from CSV is what is taking the longest, but I may be wrong. The whole process takes roughly 25 minutes to complete. Any help would be great.

To speed up processing, avoid the pipeline, use .NET types for file I/O and use plain-text operations:
# Create the output file.
$outFile = [IO.File]::CreateText($target)
# Loop over all input files
foreach ($file in Get-ChildItem C:\test\input\*.txt) {
# Loop over a given file's lines.
foreach ($line in [IO.File]::ReadLines($file.FullName)) {
# Prepend 'H' to the 3rd column and append to the output file.
$outFile.WriteLine(($line -replace '^.*?\t.*?\t', '$&H'))
}
}
$outFile.Close()
Note:
Be sure to always pass full file paths to .NET methods, because .NET's working directory usually differs from PowerShell's.
.NET file I/O methods default to BOM-less UTF-8 encoding.
The H is inserted in front of the 3rd tab-separated column using PowerShell's regex-based -replace operator.

Related

Powershell script to get the metadata field "writing application"

I am using a modified version of the GetMetaData script originally written by Ed Wilson at Microsoft (https://devblogs.microsoft.com/scripting/hey-scripting-guy-how-can-i-find-files-metadata/) and then modified by user wOxxOm here https://stackoverflow.com/a/42933461/5061596 . I'm trying to analyze all my DVD and BluRay rips and see what tool was used to create them. Mainly I want to check which ones I compressed with Handbrake and which ones came directly from MakeMKV. The problem is I can't find this field.
If I use the "stock" scrip and change the number of properties it looks for from 0 - 266 up to 0 - 330 I find the extra file info like movie length, resolution, etc. But I can't find the tool used. For example here is what the MediaInfo Lite tool reports:
But looking through the meta data I get something like this with no "Writing application" property:
Name : Ad Astra (2019).mkv
Size : 44.1 GB
Title : Ad Astra
Length : 02:03:02
Frame height : 2160
Frame rate : ‎23.98 frames/second
Frame width : 3840
Total bitrate : ‎51415kbps
Audio tracks : TrueHD S24 7.1 [Eng]
Contains chapters : Yes
Subtitle tracks : PGS [Eng], PGS [Eng]
Video tracks : HEVC (H265 Main 10 #L5.1)
How do I go about finding that property or is it not something that I can pull through PowerShell?
Edit: The info I'm looking for IS in Windows Explorer looking at the properties of the file and the details tab so if Explorer can see it I would think I should be able to:
edit: actually, this seems more reliable. So far any file that mediainfo can read, this also works with.
$FILE = "C:\test.mkv"
$content = (Get-Content -Path $FILE -First 100) + (Get-Content -Path $FILE -Tail 100)
if(($content -match '\*data')[0] -match '\*data\W*([\w\n\s\.]*)'){
write-host "Writing Application:" $Matches[1]
exit
}elseif(($content -match 'M€.*WA(.*)s¤')[0] -match 'M€.*WA(.*)s¤'){
write-host "Writing Application:" $Matches[1]
}
It looks like the last bytes in the file after *data that specify the writer, so try this:
(Get-Content -Path "c:\video.mkv" -Tail 1) -match '\*data\W*(.*)$' | out-null
write-host "Writing Application:" $Matches[1]
On my test file that resulted in "HandBrake 1.5.1 2022011000"
I'm not sure what standard specifies this sorry. There's also a host of useful info on the first line of data in the file as well, e.g:
ftypmp42 mp42iso2avc1mp41 free6dÊmdat ôÿÿðÜEé½æÙH·–,Ø Ù#îïx264 - core 164 r3065 ae03d92 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=1 deblock=1:0:0 analyse=0x1:0x111 me=hex subme=2 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=0 cqm=0 deadz
one=21,11 fast_pskip=1 chroma_qp_offset=0 threads=18 lookahead_threads=5 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=1 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=10 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin
=0 qpmax=69 qpstep=4 vbv_maxrate=14000 vbv_bufsize=14000 crf_max=0.0 nal_hrd=none filler=0 ip_ratio=1.40 aq=1:1.00
I couldn't replicate your success viewing the info with Windows Explorer, the field is invisible for me even though I can view it with MediaInfo etc

Converting Output to CSV and Out-Grid

I have a file as below.
I want it to convert it to CSV and want to have the out grid view of it for items Drives,Drive Type,Total Space, Current allocation and Remaining space only.
PS C:\> echo $fileSys
Storage system address: 127.0.0.1
Storage system port: 443
HTTPS connection
1: Name = Extreme Performance
Drives = 46 x 3.8T SAS Flash 4
Drive type = SAS Flash
RAID level = 5
Stripe length = 13
Total space = 149464056594432 (135.9T)
Current allocation = 108824270733312 (98.9T)
Remaining space = 40639785861120 (36.9T)
I am new to Powershell but I have tried below code for two of things but it's not even getting me desired output.
$filesys | ForEach-Object {
if ($_ -match '^.+?(?<Total space>[0-9A-F]{4}\.[0-9A-F]{4}\.[0-9A-F]{4}).+?(?<Current allocation>\d+)$') {
[PsCustomObject]#{
'Total space' = $matches['Total space']
'Current allocation' = $matches['Current allocation']
}
}
}
First and foremost, the named capture groups cannot contain spaces.
From the documentation
Named Matched Subexpressions
where name is a valid group name, and subexpression is any valid
regular expression pattern. name must not contain any punctuation
characters and cannot begin with a number.
Assuming this is a single string since your pattern attempts to grab info from multiple lines, you can forego the loop. However, even with that corrected, your pattern does not appear to match the data. It's not clear to me what you are trying to match or your desired output. Hopefully this will get you on the right track.
$filesys = #'
Storage system address: 127.0.0.1
Storage system port: 443
HTTPS connection
1: Name = Extreme Performance
Drives = 46 x 3.8T SAS Flash 4
Drive type = SAS Flash
RAID level = 5
Stripe length = 13
Total space = 149464056594432 (135.9T)
Current allocation = 108824270733312 (98.9T)
Remaining space = 40639785861120 (36.9T)
'#
if($filesys -match '(?s).+total space\s+=\s(?<totalspace>.+?)(?=\r?\n).+allocation\s+=\s(?<currentallocation>.+?)(?=\r?\n)')
{
[PsCustomObject]#{
'Total space' = $matches['totalspace']
'Current allocation' = $matches['currentallocation']
}
}
Total space Current allocation
----------- ------------------
149464056594432 (135.9T) 108824270733312 (98.9T)
Edit
If you just want the values in the parenthesis, modifying to this will achieve it.
if($filesys -match '(?s).+total space.+\((?<totalspace>.+?)(?=\)).+allocation.+\((?<currentallocation>.+?)(?=\))')
{
[PsCustomObject]#{
'Total space' = $matches['totalspace']
'Current allocation' = $matches['currentallocation']
}
}
Total space Current allocation
----------- ------------------
135.9T 36.9T
$unity=[Regex]::Matches($filesys, "\(([^)]*)\)") -replace '[(\)]','' -replace "T",""
$UnityCapacity = [pscustomobject][ordered] #{
Name = "$Display"
"Total" =$unity[0]
"Used" = $unity[1]
"Free" = $unity[2]
'Used %' = [math]::Round(($unity[1] / $unity[0])*100,2)
}``

How to add a new line after every integer

I am trying to figure out a way to make a new variable from another to output to a GUI. When I try to just display the variable through a lable it loses its line breaks.
I managed to figure out a solution when working with text but when it comes to numbers it does not work.
Here is what I have tried:
$ActiveUnits = #(Get-MsolAccountSku | Select-Object -ExpandProperty ActiveUnits)
$ActiveUnitsFix = "`n"
foreach ($Unit in $ActiveUnits) {
$ActiveUnitsFix += #($Unit + "`n")
}
The output that I am getting is this:
31425220100002521100001000000100000002137328420
When it should be something like this:
3
14
25
220
10000
25
21
10000
1000000
10000000
213
7
3
28
4
20
You could use the -join parameter for adding the new line if you receive an int array from (Get-MsolAccountSku).ActiveUnits.
[System.Int32[]]$ActiveUnits = (Get-MsolAccountSku).ActiveUnits
[System.String]$ActiveUnitsFix = $ActiveUnits -join [System.Environment]::NewLine
$ActiveUnitsFix

How would I test that a PowerShell function properly streams input from the pipeline?

I know how to write a function that streams input from the pipeline. I can reasonably tell by reading the source for a function if it will perform properly. However, is there any method for actually testing for the correct behavior?
I accept any definition of "testing"... be that some manual test that I can run or something more automated.
If you need an example, let's say I have a function that splits text into words.
PS> Get-Content ./warandpeace.txt | Split-Text
How would I check that it streams input from the pipeline and begins splitting immediately?
You can write a helper function, which would give you some indication as pipeline items passed to it and processed by next command:
function Print-Pipeline {
param($Name, [ConsoleColor]$Color)
begin {
$ColorParameter = if($PSBoundParameters.ContainsKey('Color')) {
#{ ForegroundColor = $Color }
} else {
#{ }
}
}
process {
Write-Host "${Name}|Before|$_" #ColorParameter
,$_
Write-Host "${Name}|After|$_" #ColorParameter
}
}
Suppose you have some functions to test:
$Text = 'Some', 'Random', 'Text'
function CharSplit1 { $Input | % GetEnumerator }
filter CharSplit2 { $Input | % GetEnumerator }
And you can test them like that:
PS> $Text |
>>> Print-Pipeline Before` CharSplit1 |
>>> CharSplit1 |
>>> Print-Pipeline After` CharSplit1
Before CharSplit1|Before|Some
Before CharSplit1|After|Some
Before CharSplit1|Before|Random
Before CharSplit1|After|Random
Before CharSplit1|Before|Text
Before CharSplit1|After|Text
After CharSplit1|Before|S
S
After CharSplit1|After|S
After CharSplit1|Before|o
o
After CharSplit1|After|o
After CharSplit1|Before|m
m
After CharSplit1|After|m
After CharSplit1|Before|e
e
After CharSplit1|After|e
After CharSplit1|Before|R
R
After CharSplit1|After|R
After CharSplit1|Before|a
a
After CharSplit1|After|a
After CharSplit1|Before|n
n
After CharSplit1|After|n
After CharSplit1|Before|d
d
After CharSplit1|After|d
After CharSplit1|Before|o
o
After CharSplit1|After|o
After CharSplit1|Before|m
m
After CharSplit1|After|m
After CharSplit1|Before|T
T
After CharSplit1|After|T
After CharSplit1|Before|e
e
After CharSplit1|After|e
After CharSplit1|Before|x
x
After CharSplit1|After|x
After CharSplit1|Before|t
t
After CharSplit1|After|t
PS> $Text |
>>> Print-Pipeline Before` CharSplit2 |
>>> CharSplit2 |
>>> Print-Pipeline After` CharSplit2
Before CharSplit2|Before|Some
After CharSplit2|Before|S
S
After CharSplit2|After|S
After CharSplit2|Before|o
o
After CharSplit2|After|o
After CharSplit2|Before|m
m
After CharSplit2|After|m
After CharSplit2|Before|e
e
After CharSplit2|After|e
Before CharSplit2|After|Some
Before CharSplit2|Before|Random
After CharSplit2|Before|R
R
After CharSplit2|After|R
After CharSplit2|Before|a
a
After CharSplit2|After|a
After CharSplit2|Before|n
n
After CharSplit2|After|n
After CharSplit2|Before|d
d
After CharSplit2|After|d
After CharSplit2|Before|o
o
After CharSplit2|After|o
After CharSplit2|Before|m
m
After CharSplit2|After|m
Before CharSplit2|After|Random
Before CharSplit2|Before|Text
After CharSplit2|Before|T
T
After CharSplit2|After|T
After CharSplit2|Before|e
e
After CharSplit2|After|e
After CharSplit2|Before|x
x
After CharSplit2|After|x
After CharSplit2|Before|t
t
After CharSplit2|After|t
Before CharSplit2|After|Text
Add some Write-Verbose statements to your Split-Text function, and then call it with the -Verbose parameter. You should see output in real-time.
Ah, I've got a very simple solution. The concept is to insert your own step into the pipeline with obvious side-effects before the function that you're testing. For example...
PS> 1..10 | %{ Write-Host $_; $_ } | function-under-test
If your function-under-test is "bad", you will see all of the output from 1..10 twice, like this
1
2
3
1
2
3
If the function-under-test is processing items lazily from the pipeline, you'll see the output interleaved.
1
1
2
2
3
3

Powershell remove timestamp from filename

if I have some files with the name pippo_yyyymmdd.txt, pluto_yyyymmdd.txt etc etc, how can I remove the timestamp and rename the files as pippo.txt, pluto.txt ?
Give this a try, it will remove any underscore followed by 8 digits. This is tha basicmidea, it doesn't take care of files end up having the same name:
Get-ChildItem *.txt |
Where {$_.Name -match '_\d{8}\.txt' } |
Rename-Item -NewName {$_.Name -replace '_\d{8}'}
I was facing the same problem but unfortunately, PowerShell did not help me to solve it. So I then resorted to using Python to solve this problem and it worked like a bomb. You only need to copy the python script into the affected folder and run the script to remove the timestamp from all the files in the folder and subfolders.
import os, sys, re
fileNameList =[]
patho = os.path.dirname(os.path.realpath(sys.argv[0]))
def renamer(fpath):
for path, subdirs, files in os.walk(fpath):
for name in files:
if re.findall(r"(?ix)(\.ini)", name, re.I) == None:
if re.search(r"(?ix)(\(\d{4}\_\d{2}_\d{2}\s\d{2}\_\d{2}\_\d{2}\sutc\))", name, re.I) != None:
old_name = os.path.join(path,name)
print(old_name)
new_name = re.sub(r"(?ix)(\s\(\d{4}\_\d{2}_\d{2}\s\d{2}\_\d{2}\_\d{2}\sutc\))", "", old_name, re.I)
print(new_name)
try:
os.replace(old_name,new_name)
except:
print(old_name)
fileNameList.append(old_name)
def log_errors_directories(fileNameList):
filename = "Log of error filenames.txt"
if len(fileNameList) != 0:
log = open(filename, "a")
log.write("###########################################################\n")
i = 1
for line in fileNameList:
nr_line = str(i) + " " + str(line)
log.write(str(nr_line))
log.write("\n")
i += 1
log.write("###########################################################\n\n")
log.close()
renamer(patho)
log_errors_directories(fileNameList)
This is my first time posting code online. I hope it works for you as it did for me. :)