Using regex in a key/value lookup table in powershell? - powershell

I am creating the script below to search through and replace data in a set of files. The problem I'm running into is that I need to match ONLY at the beginning of a line, and I'm not sure how/where I would use regex in the example below (e.g. ^A, ^B) when doing the comparison. I tried putting the caret in front of the name values in the table, but that didn't work...
$lookupTable = @{
'A'='1';
'B'='2'
#etc
}
Get-ChildItem 'c:\windows\system32\dns' -Filter *.dns |
Foreach-Object {
$file = $_
Write-Host "$file"
(Get-Content -Path $file -Raw) | ForEach-Object {
$line = $_
$lookupTable.GetEnumerator() | ForEach-Object {
$line = $line -replace $_.Name, $_.Value
}
$line
} | Set-Content -Path $file
}

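The -replace operator accepts a regex, so you can put the caret in the pattern: $line = $line -replace "^$($_.Name)", $_.Value. (Inside double quotes, "$_.Value" would expand $_ and then append the literal text .Value, so pass $_.Value unquoted.) Because the script reads each file with -Raw, ^ only matches at the very start of the file; either drop -Raw so each line is processed separately, or use "(?m)^$($_.Name)" so that ^ matches at the start of every line.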
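A minimal sketch of the anchored replacement, assuming the lines are processed one at a time. The sample lines are made up for illustration, and the call to [regex]::Escape is an addition that is not in the answer above; it only matters if a key contains regex metacharacters.

```powershell
# Hypothetical sample data; the real script would read lines with Get-Content.
$lookupTable = @{
    'A' = '1'
    'B' = '2'
}
$lines = 'A record', 'B record', 'C has A inside'

$result = foreach ($line in $lines) {
    foreach ($entry in $lookupTable.GetEnumerator()) {
        # ^ anchors the match to the start of the line;
        # [regex]::Escape guards against metacharacters in the key.
        $pattern = '^' + [regex]::Escape($entry.Name)
        $line = $line -replace $pattern, $entry.Value
    }
    $line
}
$result
```

Only the first two sample lines change, because the A in the third line is not at the start of the line.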

the way that regex works makes getting a proper "start of line" marker into the regex pattern along with the $VarName a tad iffy. so i broke it out into its own line and used the -f string format operator to build the regex pattern.
then i used the way that -replace works on an array of strings, which is what one usually gets from Get-Content, to work on the whole array at each pass.
note that the strings have lowercase items where they ought to be replaced, and uppercase items where the item should NOT be replaced. [grin]
$LookUpTable = @{
A = 'Wizbang Shadooby'
Z = '666 is the number of the beast'
}
$LineList = @(
'a sdfq A er Z xcv'
'qwertyuiop A'
'z xcvbnm'
'z A xcvbnm'
'qwertyuiop Z'
)
$LookUpTable.GetEnumerator() |
ForEach-Object {
$Target = '^{0}' -f $_.Name
$LineList = $LineList -replace $Target, $_.Value
}
$LineList
output ...
Wizbang Shadooby sdfq A er Z xcv
qwertyuiop A
666 is the number of the beast xcvbnm
666 is the number of the beast A xcvbnm
qwertyuiop Z

# Here is a complete, working script that beginners can read.
# This thread
# Using regex in a key/value lookup table in powershell?
# https://stackoverflow.com/questions/57277282/using-regex-in-a-key-value-lookup-table-in-powershell
# User-modifiable variables.
# substitutions
# We need to specify what we're looking for (keys).
# We need to specify our substitutions (values).
# Example: Looking for A and substituting 1 in its place.
# Add as many pairs as you like.
# Here I use an array of objects instead of a Hashtable so that I can specify upper- and lowercase matches.
# Use the regular expression caret (^) to match the beginning of a line.
$substitutions = @(
[PSCustomObject]@{ Key = '^A'; Value = '1' },
[PSCustomObject]@{ Key = '^B'; Value = '2' },
[PSCustomObject]@{ Key = '^Sit'; Value = '[Replaced Text]' }, # Example for my Latin placeholder text.
[PSCustomObject]@{ Key = 'nihil'; Value = '[replaced text 2]' }, # Lowercase example.
[PSCustomObject]@{ Key = 'Nihil'; Value = '[Replaced Text 3]' } # Omit comma for the last array item.
)
# Folder where we are looking for files.
$inputFolder = 'C:\Users\Michael\PowerShell\Using regex in a key value lookup table in powershell\input'
# Here I've created some sample files using Latin placeholder text from
# https://lipsum.com/
# Folder where we are saving the modified files.
# This can be the same as the input folder.
# I'm creating this so we can test without corrupting the original files.
$outputFolder = 'C:\Users\Michael\PowerShell\Using regex in a key value lookup table in powershell\output'
#$outputFolder = $inputFolder
# We are only interested in files ending with .dns
$filterString = '*.dns'
# Here is an example for text files.
#$filterString = '*.txt'
# For all files.
#$filterString = '*.*'
# More info.
# https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-childitem?view=powershell-6#parameters
# Search on the page for -Filter
# You won't need to update any variables after this line.
# ===================================================================
# Generate a list of files to look at.
$fileList = Get-ChildItem $inputFolder -Filter $filterString
# Simple example.
# get-content .\apple.dns | % { $_ -replace "sit", "michael" } | set-content "C:\output\apple.dns"
# input file substitutions output
# Set up loops.
# For each file.
#{
# For each key-value pair.
#}
# "For each key-value pair."
# Create a function.
# Pipe in a string.
# Specify a list of substitutions.
# Make the substitutions.
# Output a modified string.
filter find_and_replace ([object[]] $substitutions)
{
# The automatic variable $_ will be a line from the file.
# This comes from the pipeline.
# Copy the input string.
# This avoids modifying a pipeline object.
$myString = $_
# Look at each key-value pair passed to the function.
# In practice, these are the ones we defined at the top of the script.
foreach ($pair in $substitutions)
{
# Modify the strings.
# Update the string after each search.
# case-sensitive -creplace instead of -replace
$myString = $myString -creplace $pair.Key, $pair.Value
}
# Output the final, modified string.
$myString
}
# "For each file."
# main
# Do something with each file.
foreach ($file in $fileList)
{
# Where are we saving the output?
$outputFile = Join-Path -Path $outputFolder -ChildPath $file.Name
# Create a pipeline.
# Pipe strings to our function.
# Let the function modify the strings.
# Save the output to the output folder.
# This mirrors our simple example but with dynamic files and substitutions.
# find_and_replace receives strings from the pipeline and we pass $substitutions into it.
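# Use .FullName: in Windows PowerShell a FileInfo object may stringify to just the file name.
Get-Content $file.FullName | find_and_replace $substitutions | Set-Content $outputFile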
# The problem with piping files into a pipeline is that
# by the time the pipeline gets to Set-Content,
# we only have modified strings
# and we have no information to create the path for an output file.
# ex [System.IO.FileInfo[]] | [String[]] | [String] | Set-Content ?
#
# Instead, we're in a loop that preserves context.
# And we have the opportunity to create and use the variable $outputFile
# ex foreach ($file in [System.IO.FileInfo[]])
# ex $outputFile = ... $file ...
# ex [String[]] | [String] | Set-Content $outputFile
# Quote
# (Get-Content -Path $file -Raw)
# By omitting -Raw, we get: one string for each line.
# This is instead of getting: one string for the whole file.
# This keeps us from having to use
# the .NET regular expression multiline option (and the subexpression \r?$)
# while matching.
#
# What it is.
# Multiline Mode
# https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options#Multiline
#
# How you would get started.
# Miscellaneous Constructs in Regular Expressions
# https://learn.microsoft.com/en-us/dotnet/standard/base-types/miscellaneous-constructs-in-regular-expressions
}
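For comparison, here is a small sketch of what the multiline option mentioned in the comments above looks like when the file is read as one string. The sample content is hypothetical and stands in for what Get-Content -Raw would return for a file with Windows line endings.

```powershell
# Hypothetical file content read as ONE string (as Get-Content -Raw would return it).
$content = "A first`r`nx A middle`r`nB last"

# (?m) turns on multiline mode, so ^ matches at the start of every line,
# not only at the start of the whole string.
$withMultiline = $content -creplace '(?m)^B', '2'

# Without (?m), ^ matches only at the very start of the string, so nothing changes.
$withoutMultiline = $content -creplace '^B', '2'

# In multiline mode $ matches just before `n, so a carriage return can sit
# between the text and the line end; the lookahead (?=\r?$) allows for it.
$endOfLine = $content -creplace '(?m)first(?=\r?$)', 'FIRST'

$withMultiline
$withoutMultiline
$endOfLine
```

Reading line by line, as the script above does, avoids both the (?m) prefix and the carriage return handling.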

Related

Scanning log file using ForEach-Object and replacing text is taking a very long time

I have a PowerShell script that scans log files and replaces text when a match is found. The list is currently 500 lines, and I plan to double or triple it. The log files can range from 400KB to 800MB in size.
Currently, with the code below, a 42MB file takes 29 minutes. Can anyone see a way to make this faster?
I tried replacing ForEach-Object with ForEach-ObjectFast, but that made the script take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 minutes.
$lookupTable = @{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
'bbb:ccc:456'='WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
$line = $_
$lookupTable.GetEnumerator() | ForEach-Object {
if ($line -match $_.Key)
{
$line = $line -replace $_.Key, $_.Value
}
}
$line
} | Set-Content -Path $outputfile
Since you say your input file could be 800MB in size, reading and updating the entire content in memory may not be feasible.
The way to go then is to use a fast line-by-line method, and the fastest I know of is switch:
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder..
$inputfile = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here
# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }
$lookupTable = @{
'aaa:bbb:123'='WORDA:WORDB:NUMBER1'
}
# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')
# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append
switch -Regex -File $inputfile {
$regexLookup {
# do the replacement using the value in the lookup table.
# because in one line there may be multiple matches to replace
# get a System.Text.RegularExpressions.Match object to loop through all matches
$line = $_
$match = [regex]::Match($line, $regexLookup)
while ($match.Success) {
# $match.Value is the literal text found in the line, so escape it before using it
# as a -replace pattern and use it unchanged as the key into the lookup table
$line = $line -replace [regex]::Escape($match.Value), $lookupTable[$match.Value]
$match = $match.NextMatch()
}
$streamWriter.WriteLine($line)
}
default { $streamWriter.WriteLine($_) } # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
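As a possible variation on the per-line replacement step, the combined regex can be compiled once and each match looked up through a MatchEvaluator script block, so every line is scanned a single time. This is a sketch under assumptions, not the answer's method: the sample lines and the second table entry (taken from the question) are only for illustration, and I have not measured its speed on large files.

```powershell
# Lookup table using the sample entries from the question.
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}

# One alternation of all (escaped) keys, compiled once.
$pattern = ($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|'
$regex   = [regex]::new($pattern, 'IgnoreCase')

# The script block is converted to a MatchEvaluator delegate; it receives each
# Match object and returns the replacement text from the table.
[System.Text.RegularExpressions.MatchEvaluator]$evaluator = {
    param($match)
    $lookupTable[$match.Value]
}

# Hypothetical input lines; in the real script these come from switch -File.
$lines  = 'x aaa:bbb:123 y bbb:ccc:456', 'nothing to replace'
$result = foreach ($line in $lines) { $regex.Replace($line, $evaluator) }
$result
```

Because the hashtable literal is case-insensitive, a match that differs from its key only in case still finds its entry.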

Powershell Files fetch

I am looking for some help creating a PowerShell script.
I have a folder with lots of files, and I need only those files that meet both of the following conditions:
The file must contain at least one of the string patterns listed in file1 (file1 contains entries such as -IND 23042528525, INDE 573626236 or DSE3523623, and there can be more strings like these).
The file must also contain a date between 03152022 and 03312022, in the format mmddyyyy.
The files could be old, so the creation time is not relevant.
The result should then be saved as a CSV containing the paths of the files that fulfill the above two conditions.
Currently I am using the command below, which only gives me the files that fulfill the first condition.
$table = Get-Content C:\Users\username\Downloads\ISIN.txt
Get-ChildItem `
-Path E:\data\PROD\server\InOut\Backup\*.txt `
-Recurse |
Select-String -Pattern ($table)|
Export-Csv C:\Users\username\Downloads\File_Name.csv -NoTypeInformation
To test if a file contains a certain keyword from a range of keywords, you can use regex for that. If you also want to find at least one valid date in format 'MMddyyyy' in that file, you need to do some extra work.
Try below:
# read the keywords from the file. Ensure special characters are escaped and join them with '|' (regex 'OR')
$keywords = (Get-Content -Path 'C:\Users\username\Downloads\ISIN.txt' | ForEach-Object {[regex]::Escape($_)}) -join '|'
# create a regex to capture the date pattern (8 consecutive digits)
$dateRegex = [regex]'\b(\d{8})\b' # \b means word boundary
# and a datetime variable to test if a found date is valid
$testDate = Get-Date
# set two variables to the start and end date of your range (dates only, times set to 00:00:00)
$rangeStart = (Get-Date).AddDays(1).Date # tomorrow
$rangeEnd = [DateTime]::new($rangeStart.Year, $rangeStart.Month, 1).AddMonths(1).AddDays(-1) # end of the month
# find all .txt files and loop through. Capture the output in variable $result
$result = Get-ChildItem -Path 'E:\data\PROD\server\InOut\Backup' -Filter '*.txt' -File -Recurse |
ForEach-Object {
$content = Get-Content -Path $_.FullName -Raw
# first check if any of the keywords can be found
if ($content -match $keywords) {
# now check if a valid date pattern 'MMddyyyy' can be found as well
$dateFound = $false
$match = $dateRegex.Match($content)
while ($match.Success -and !$dateFound) {
# we found a matching pattern. Test if this is a valid date and if so
# set the $dateFound flag to $true and exit the while loop
if ([datetime]::TryParseExact($match.Groups[1].Value,
'MMddyyyy',[CultureInfo]::InvariantCulture,
[System.Globalization.DateTimeStyles]::None,
[ref]$testDate)) {
# check if the found date is in the set range
# this tests INCLUDING the start and end dates
$dateFound = ($testDate -ge $rangeStart -and $testDate -le $rangeEnd)
}
$match = $match.NextMatch()
}
# finally, if we also successfully found a date pattern, output the file
if ($dateFound) { $_.FullName }
elseif ($content -match '\bUNKNOWN\b') {
# here you output again, because unknown was found instead of a valid date in range
$_.FullName
}
}
}
# result is now either empty or a list of file fullnames
$result | set-content -Path 'C:\Users\username\Downloads\MatchedFiles.txt'
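The date test at the heart of the loop can be tried in isolation. Below is a minimal sketch that wraps it in a hypothetical helper, Test-DateInRange (the name and parameters are mine, not from the answer), and uses the fixed range from the question, 03152022 to 03312022, instead of a range relative to today.

```powershell
# Hypothetical helper: returns $true if $Text contains an 8-digit MMddyyyy
# date that is valid and lies within the inclusive range.
function Test-DateInRange {
    param(
        [string]$Text,
        [datetime]$RangeStart,
        [datetime]$RangeEnd
    )
    $parsed = [datetime]::MinValue
    foreach ($m in [regex]::Matches($Text, '\b\d{8}\b')) {
        $ok = [datetime]::TryParseExact($m.Value, 'MMddyyyy',
            [cultureinfo]::InvariantCulture,
            [System.Globalization.DateTimeStyles]::None, [ref]$parsed)
        if ($ok -and $parsed -ge $RangeStart -and $parsed -le $RangeEnd) { return $true }
    }
    return $false
}

$start = [datetime]::new(2022, 3, 15)
$end   = [datetime]::new(2022, 3, 31)

# 03202022 is a valid date inside the range; the 11-digit number is ignored.
$inRange = Test-DateInRange -Text 'IND 23042528525 dated 03202022' -RangeStart $start -RangeEnd $end
# 04012022 is outside the range and 13452022 is not a valid date.
$outOfRange = Test-DateInRange -Text 'dated 04012022 and 13452022' -RangeStart $start -RangeEnd $end

$inRange
$outOfRange
```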

Surrounding a string variable with quotes

I am writing an IIS log parser and having trouble wrapping a variable value in quotes while doing some string processing.
Here is a truncated log file, as an example:
#Fields: date time s-ip cs-method ...
2021-08-09 19:00:16.367 0.0.0.0 GET ...
2021-08-09 19:01:42.184 0.0.0.0 POST ...
Here is how I am executing the code below:
.\Analyse.ps1 cs-method -eq `'POST`'
If the line marked with #PROBLEM is executed as is, the output looks like this:
> .\Analyse.ps1 cs-method -eq `'POST`'
"""""G""E""T""""" ""-""e""q"" ""'""P""O""S""T""'""
"""""P""O""S""T""""" ""-""e""q"" ""'""P""O""S""T""'""
But if I replace $quoted with $value, so that the code reads like this:
$thisInstruction = $thisInstruction -replace $key , $value #PROBLEM
The output looks like this:
> .\Analyse.ps1 cs-method -eq `'POST`'
GET -eq 'POST'
POST -eq 'POST'
The problem is that I want the first value on each line of the output (the GET and the POST before the -eq) to be wrapped in quotes.
How can I achieve this?
Here is my code:
# compile cli args into single line instruction
$instruction = $args -join " "
# define key array
$keys = @('date','time','s-ip','cs-method','cs(Host)','cs-uri-stem','cs-uri-query','s-computername','s-port','cs-username','c-ip','s-sitename','cs(User-Agent)','cs(Referer)','sc-status','sc-substatus','sc-win32-status','TimeTakenMS','x-forwarded-for')
# <#
# get current execution folder
$currentFolder = Get-Location
# define string splitter regex https://www.reddit.com/r/PowerShell/comments/2h5elx/split_string_by_spaces_unless_in_quotes/ckpkydh?utm_source=share&utm_medium=web2x&context=3
$splitter = ' +(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)'
# process *.log files in folder
Get-Childitem -Path $currentFolder *.log1 | ForEach-Object {
# process each line in the file
Get-Content $_.Name | ForEach-Object {
# duplicate instruction
$thisInstruction = $instruction
# exclude comment lines
if (!$_.StartsWith('#')) {
# split line into array
$logEntryArr = $_ -Split $splitter
# populate dictionary with contents of array
For ($i=0; $i -le $keys.length; $i++) {
# get key
$key = $keys[$i]
# get value
$value = $logEntryArr[$i]
$quoted = "`""+$value+"`""
# replace mention of key in instruction with dictionary reference
$thisInstruction = $thisInstruction -replace $key , $quoted #PROBLEM
}
# process rule from command line against dictionary
echo $thisInstruction
}
}
}
#>
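I do know why, thanks to a comment from @mathias-r-jessen. With -le, the loop runs one index past the end of $keys, so on the last pass $key is $null, and -replace with an empty pattern matches at every position in the string and inserts the quoted (empty) value there.
I don't know why, but altering the For loop to iterate one fewer time fixed the problem and does not appear to leave out any keys. The only significant change is this: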
For ($i=0; $i -le $keys.length-1; $i++) {
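The effect of the extra iteration can be reproduced in a few lines. This is a small sketch with a shortened, hypothetical key list; it assumes strict mode is not enabled, since with Set-StrictMode at version 3 or later an out-of-range index raises an error instead of returning $null.

```powershell
$keys = @('date', 'time')

# Indexing one past the end of an array returns $null rather than throwing
# (assuming strict mode is off), which is what happens when the loop uses -le.
$pastEnd = $keys[$keys.Length]
$null -eq $pastEnd

# An empty pattern matches at every position in the string, so the
# replacement text is inserted before, between and after all characters.
$spread = 'GET' -replace '', '""'
$spread
```

The second result is the same pattern of doubled quotes seen in the question's output.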
This PowerShell script can be used to query a folder of log files and echo out matching rows, e.g.:
.\Analyse.ps1 cs-method -eq 'GET'
The above would print out all log entries with a cs-method value of GET.
Here's the code:
# compile cli args into single line instruction
$instruction = $args -join " "
# define key array
$keys = @('date','time','s-ip','cs-method','cs(Host)','cs-uri-stem','cs-uri-query','s-computername','s-port','cs-username','c-ip','s-sitename','cs(User-Agent)','cs(Referer)','sc-status','sc-substatus','sc-win32-status','TimeTakenMS','x-forwarded-for')
# get current execution folder
$currentFolder = Get-Location
# define string splitter regex https://www.reddit.com/r/PowerShell/comments/2h5elx/split_string_by_spaces_unless_in_quotes/ckpkydh?utm_source=share&utm_medium=web2x&context=3
$splitter = ' +(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)'
# process *.log files in folder
Get-Childitem -Path $currentFolder *.log | ForEach-Object {
# process each line in the file
Get-Content $_.Name | ForEach-Object {
# duplicate instruction
$thisInstruction = $instruction
# exclude comment lines
if (!$_.StartsWith('#')) {
# split line into array
$logEntryArr = $_ -Split $splitter
# populate dictionary with contents of array
For ($i=0; $i -lt $keys.length; $i++) {
# get key
$key = $keys[$i]
# get value
$quoted = "`'"+$logEntryArr[$i]+"`'"
# replace mention of key in instruction with dictionary reference
$thisInstruction = $thisInstruction -replace $key , $quoted
}
# process rule from command line against dictionary
$answer = Invoke-Expression $thisInstruction
if ($answer) {
echo $_
}
}
}
}
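One thing to watch with the script above: -replace treats $key as a regex, and keys such as cs(Host) contain parentheses. The sketch below, with a made-up instruction and value, shows the difference that [regex]::Escape makes; adding it to the script is my suggestion, not part of the original answer.

```powershell
# Hypothetical instruction and value for illustration.
$instruction = "cs(Host) -eq 'example.com'"
$key   = 'cs(Host)'
$value = "'example.com'"

# Unescaped: the parentheses form a capture group, so the pattern looks for
# the text "csHost" and the key is left in place.
$unescaped = $instruction -replace $key, $value

# Escaped: the parentheses are matched literally and the key is replaced.
$escaped = $instruction -replace [regex]::Escape($key), $value

$unescaped
$escaped
```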

Search and replace a string in PowerShell

I need to search and replace values in a file using the values from another file. For example, A.txt has a string with a value LICENSE_KEY_LOC=test_lic and B.txt contains the string LICENSE_KEY_LOC= or some value in it. Now I need to replace the complete string in B.txt with the value from A.txt. I tried the following but for some reason it does not work.
$filename = "C:\temp\A.txt"
Get-Content $filename | ForEach-Object {
$val = $_
$var = $_.Split("=")[0]
$var1 = Write-Host $var'='
$_ -replace "$var1", "$val"
} | Set-Content C:\temp\B.txt
You may use the following, which assumes LICENSE_KEY_LOC=string is on a line by itself in the file and only exists once:
$filename = Get-Content "c:\temp\A.txt"
$replace = ($filename | Select-String -pattern "(?<=^LICENSE_KEY_LOC=).*$").matches.value
(Get-Content "c:\temp\B.txt") -replace "(?<=^LICENSE_KEY_LOC=).*$","$replace" | Set-Content "c:\temp\B.txt"
For updating multiple single keys/fields in a file, you can use an array and loop through each element by updating the $Keys array:
$filename = Get-Content "c:\temp\A.txt"
$Keys = @("LICENSE_KEY_LOC","DB_UName","DB_PASSWD")
ForEach ($Key in $Keys) {
$replace = ($filename | Select-String -pattern "(?<=^$Key=).*$").matches.value
(Get-Content "c:\temp\B.txt") -replace "(?<=^$Key=).*$","$replace" | Set-Content "c:\temp\B.txt"
}
You can put this into a function as well to make it more modular:
Function Update-Fields {
Param(
[Parameter(Mandatory=$true)]
[Alias("S")]
[ValidateScript({Test-Path $_})]
[string]$SourcePath,
[Parameter(Mandatory=$true)]
[Alias("D")]
[ValidateScript({Test-Path $_})]
[string]$DestinationPath,
[Parameter(Mandatory=$true)]
[string[]]$Fields
)
$filename = Get-Content $SourcePath
ForEach ($Key in $Fields) {
$replace = ($filename | Select-String -pattern "(?<=^$Key=).*$").matches.value
(Get-Content $DestinationPath) -replace "(?<=^$Key=).*$","$replace" | Set-Content $DestinationPath
}
}
Update-Fields -S c:\temp\a.txt -D c:\temp\b.txt -Fields "LICENSE_KEY_LOC","DB_UName","DB_PASSWD"
Explanation - Variables and Regex:
$replace contains the result of a string selection that matches a regex pattern. This is a case-insensitive match, but you can make it case-sensitive using -CaseSensitive parameter in the Select-String command.
(?<=^LICENSE_KEY_LOC=): Performs a positive lookbehind regex (non-capturing) of the string LICENSE_KEY_LOC= at the beginning of a line.
(?<=) is a positive lookbehind mechanism of regex
^ marks the beginning of the string on each line
LICENSE_KEY_LOC= is a string literal of the text
.*$: Matches all characters except newline until the end of the string on each line
.* matches zero or more characters except newline (\n) because we did not specify single line mode. Get-Content already strips the line endings, so no carriage return is left in each string.
$ marks the end of the string on each line
-replace "(?<=^LICENSE_KEY_LOC=).*$","$replace" is the replace operator that does a regex match (first set of double quotes) and replaces the contents of that match with other strings or part of the regex capture (second set of double quotes).
"$replace" becomes the value of the $replace variable since we used double quotes. If we had used single quotes around the variable, then the replacement string would be literally $replace.
Get-Content "c:\temp\A.txt" gets the contents of the file A.txt. It reads each line as a [string] and stores each line in an [array] object.
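The lookbehind can be tried without touching any files. In this sketch the two arrays are hypothetical stand-ins for the lines of A.txt and B.txt, and the key is passed through [regex]::Escape, which is an extra precaution that the answer's code does not take.

```powershell
# Hypothetical stand-ins for the lines of A.txt and B.txt.
$sourceLines = 'LICENSE_KEY_LOC=test_lic', 'DB_UName=admin'
$targetLines = 'LICENSE_KEY_LOC=', 'DB_UName=old', 'OTHER=1'

$key = 'LICENSE_KEY_LOC'
# The lookbehind requires "KEY=" at the start of the line but is not part of
# the match, so only the text after the equals sign is selected.
$pattern = "(?<=^$([regex]::Escape($key))=).*$"

# Value from the source lines (everything after the equals sign).
$replace = ($sourceLines | Select-String -Pattern $pattern).Matches.Value

# Only the matching line in the target changes; the key itself is kept.
$updated = $targetLines -replace $pattern, $replace
$updated
```

Note that an empty value after the equals sign, as in the first target line, still matches, because .* can match zero characters.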
Explanation - Function:
Parameters
$SourcePath represents the path to the source file that you want to read. I added alias S so that -S switch could be used when running the command. It validates that the path exists ({Test-Path $_}) before executing any changes to the files.
$DestinationPath represents the path to the destination file that you want to update. I added alias D so that the -D switch could be used when running the command. It validates that the path exists ({Test-Path $_}) before executing any changes to the files.
$Fields is a string array. You can input a single string or multiple strings in an array format (@("string1","string2") or "string1","string2"). You can create a variable that contains the string array and then just use the variable as the parameter value like -Fields $MyArray.

PowerShell read text file line by line and find missing file in folders

I am a novice looking for some assistance. I have a text file containing two columns of data. One column is the Vendor and one is the Invoice.
I need to scan that text file, line by line, and see if there is a match on Vendor and Invoice in a path. In the path, $Location, the first wildcard is the Vendor number and the second wildcard is the Invoice
I want the non-matches output to a text file.
$Location = "I:\\Vendors\*\Invoices\*"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
foreach ($line in Get-Content $txt) {
if (-not($line -match $location)){$line}
}
set-content $Output -value $Line
Sample Data from txt or csv file.
kvendnum wapinvoice
000953 90269211
000953 90238674
001072 11012016
002317 448668
002419 06123711
002419 06137343
002419 06134382
002419 759208
002419 753087
002419 753069
002419 762614
003138 N6009348
003138 N6009552
003138 N6009569
003138 N6009612
003182 770016
003182 768995
003182 06133429
In the above data, the only matches are on the second line: 000953 90238674
and the 6th line: 002419 06137343
Untested, but here's how I'd approach it:
$Location = "I:\\Vendors\\.+\\Invoices\\.+"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output ="I:\\Vendors\Missing\Missing.txt"
select-string -path $txt -pattern $Location -notMatch |
set-content $Output
There's no need to pick through the file line-by-line; PowerShell can do this for you using select-string. The -notMatch parameter simply inverts the search and sends through any lines that don't match the pattern.
select-string sends out a stream of matchinfo objects that contain the lines that met the search conditions. These objects actually contain far more information than just the matching line, but fortunately PowerShell is smart enough to know how to send the relevant item through to set-content.
Regular expressions can be tricky to get right, but are worth getting your head around if you're going to do tasks like this.
EDIT
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
% {
# extract fields from the line
$lineItems = $_ -split " "
# construct path based on fields from the line
$testPath = $Location -f $lineItems[0], $lineItems[1]
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
I guess we will have to pick through the file line-by-line after all. If there is a more idiomatic way to do this, it eludes me.
Code above assumes a consistent format in the input file, and uses -split to break the line into an array.
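To illustrate why the consistent-format assumption matters, here is a small sketch of how -split " " behaves when the columns are separated by more than one space. The sample line is hypothetical; whether the real AP.txt uses several spaces or tabs is not known from the post.

```powershell
# Hypothetical line with three spaces between the two columns.
$line = '000953   90269211'

# Every single space is a separator, so empty items appear between them.
$bySpace = $line -split ' '

# One or more whitespace characters (spaces or tabs) count as one separator.
$byWhitespace = $line -split '\s+'

$bySpace.Count
$byWhitespace.Count
```

With the single-space split the invoice number is no longer at index 1, so the constructed path would be wrong.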
EDIT - version 3
$Location = "I:\Vendors\{0}\Invoices\{1}.pdf"
$txt = "C:\\Users\sbagford.RECOEQUIP\Desktop\AP.txt"
$Output = "I:\Vendors\Missing\Missing.txt"
get-content -path $txt |
select-string "(\S+)\s+(\S+)" |
%{
# pull vendor and invoice numbers from matchinfo
$vendor = $_.matches[0].groups[1]
$invoice = $_.matches[0].groups[2]
# construct path
$testPath = $Location -f $vendor, $invoice
# for debugging purposes
write-host ( "Line:'{0}' Path:'{1}'" -f $_.line, $testPath )
# test for existence of the path; ignore errors
if ( -not ( get-item -path $testPath -ErrorAction SilentlyContinue ) ) {
# path does not exist, so write the line to pipeline
write-output $_
}
} |
Set-Content -Path $Output
It seemed that the -split " " behaved differently in a running script to how it behaves on the command line. Weird. Anyway, this version uses a regular expression to parse the input line. I tested it against the example data in the original post and it seemed to work.
The regex is broken down as follows
( Start the first matching group
\S+ Greedily match one or more non-white-space characters
) End the first matching group
\s+ Greedily match one or more white-space characters
( Start the second matching group
\S+ Greedily match one or more non-white-space characters
) End the second matching group
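The breakdown above can be checked against one of the sample lines with the -match operator, which fills the automatic $Matches variable. This is only a sketch of the parsing step; it uses the path template from the script and does not test whether the file exists.

```powershell
# One line from the sample data in the question.
$line = '000953 90269211'

if ($line -match '(\S+)\s+(\S+)') {
    # $Matches[1] and $Matches[2] hold the first and second matching groups.
    $vendor  = $Matches[1]
    $invoice = $Matches[2]
    $testPath = 'I:\Vendors\{0}\Invoices\{1}.pdf' -f $vendor, $invoice
    $testPath
}
```

The header line (kvendnum wapinvoice) also matches this pattern, so it will be reported as missing unless it is skipped.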