I have a log file whose data is separated by the "|" symbol, like this:
"Username|servername|access|password|group"
"Username|servername|access|password|group"
I need to validate the data. If the group column is missing or empty, I need to write only that row to another file. Please help me. Thanks in advance.
If you're just checking for missing data, you can run a quick check using a regex of '(\S+\|){4}\S+'. Use Get-Content with the -ReadCount parameter to work in batches of a few thousand records at a time, which keeps disk I/O and memory usage low without processing the file one record at a time.
Get-Content $inputfile -ReadCount 2000 |
foreach {
$_ -notmatch '(\S+\|){4}\S+' |
Add-Content $outputfile
}
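To see what the pattern above accepts and rejects, here is a small sketch on in-memory sample lines (the sample values are made up). Note that the pattern is unanchored and relies on \S, so it assumes fields never contain spaces.

```powershell
# Hypothetical sample records; the second one has an empty group field.
$lines = @(
    'alice|srv01|rw|secret|admins'
    'bob|srv02|ro|secret|'
    'carol|srv03|rw|secret|users'
)

# With an array on the left-hand side, -notmatch returns the elements
# that do NOT match the pattern, so it acts as a filter.
$bad = @($lines -notmatch '(\S+\|){4}\S+')
$bad
```

Only the 'bob' line is returned, because nothing follows its fourth pipe.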
You could use Import-CSV with -Delimiter '|'. If your file doesn't have a header line, you would also need to use -Header to define one. You could then use Where to filter for the lines with an empty Group and Export-CSV with -Delimiter again to create a new file containing just those lines.
For example:
Import-CSV 'YourLog.log' -Delimiter '|' -Header 'Username','Servername','Access','Password','Group' |
Where {$_.'Group' -eq ''} |
Export-CSV 'EmptyGroupLines.log' -Delimiter '|' -NoTypeInformation
If your group column is always in the same place, which it looks like it is, you could use the Split method. You can certainly neaten the code up; the example below just shows how Split could be used.
The foreach statement iterates through each line in your file.
if (!$($item.Split('|')[4])) is true when the fifth field is missing or empty.
$groupstring = 'Username|servername|access|password|group'
$groupstring.Split('|')[4]
# 'YourLog.log' is a placeholder for your own log file
$collection = Get-Content 'YourLog.log'
foreach ($item in $collection)
{
if (!$($item.Split('|')[4]))
{
Write-Host "Group is missing or empty: $item"
}
}
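Building on the Split idea, a minimal sketch that writes only the rows without a group to a second file might look like this. The file names and sample rows are hypothetical, and the sketch assumes the group is always the fifth field.

```powershell
# Hypothetical file names in the temp folder; substitute your own paths.
$inputFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'access_sample.log'
$outputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'access_missing_group.log'

# Sample data: the second row has an empty group,
# the third row has no group field at all.
Set-Content -Path $inputFile -Value @(
    'alice|srv01|rw|secret|admins'
    'bob|srv02|ro|secret|'
    'carol|srv03|rw|secret'
)

# Keep only the rows whose fifth field is missing, empty, or whitespace.
$missing = @(Get-Content -Path $inputFile | Where-Object {
    [string]::IsNullOrWhiteSpace($_.Split('|')[4])
})

# Write just those rows to the second file.
Set-Content -Path $outputFile -Value $missing
```

Indexing past the end of the split array yields $null rather than an error (outside strict mode), which is why the short third row is caught as well.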
Hope this helps.
Thanks, Tim.
Could you please help me with an issue described below?
I wrote a script in PS which tries to split a large CSV file (30 000 rows / 6MB) into smaller ones. New files are named using a mix of the 1st and 2nd column content. If a file already exists, the script only appends new lines.
Main CSV file example:
Site;OS.Type;Hostname;IP address
Amsterdam;Server;AMS_SRVDEV01;10.10.10.12
Warsaw;Workstation;WAR-L4D6;10.10.20.22
Ankara;Workstation;AN-D5G36;10.10.13.22
Warsaw;Workstation;WAR-SRVTST02;10.10.20.33
Amsterdam;Server;LON-SRV545;10.10.10.244
PowerShell Version: 5.1.17134.858
function Csv-Splitter {
$fileName = Read-Host "Pass file name to process: "
$FileToProcess = Import-Csv "$fileName.csv" -Delimiter ';'
$MyList = New-Object System.Collections.Generic.List[string]
foreach ($row in $FileToProcess) {
if ("$($row.'OS.Type')-$($row.Site)" -notin $MyList) {
$MyList.Add("$($row.'OS.Type')-$($row.Site)")
$row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
}
else {
$row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
}
}
}
Basically, the code works fine, but it generates errors from time to time as it processes the loop. This causes some rows to be missing from the new files; the number of missing rows equals the number of errors:
Export-Csv : The process cannot access the file 'C:\xxx\xxx\xxx.csv' because
it is being used by another process.
Export-Csv is synchronous - by the time it returns, the output file has already been closed - so the code in the question does not explain the problem.
As you've since confirmed in a comment, based on a suggestion by Lee_Dailey, the culprit was the AV (anti-virus) McAfee On-Access Scan module, which accessed each newly created file behind the scenes, thereby locking it temporarily and causing Export-Csv to fail intermittently.
The problem should go away if all output files can be fully created with a single Export-Csv call each, after the loop, as also suggested by Lee. This is preferable for performance anyway, but assumes that the entire CSV file fits into memory as a whole.
Here's a Group-Object-based solution that uses a single pipeline to implement write-each-output-file-in-full functionality:
function Csv-Splitter {
$fileName = Read-Host "Pass file name to process: "
Import-Csv "$fileName.csv" -Delimiter ';' |
Group-Object { $_.'OS.Type' + '-' + $_.Site + '.csv' } |
ForEach-Object { $_.Group | Export-Csv -Delimiter ';' -NoTypeInformation $_.Name }
}
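To illustrate what Group-Object produces before anything is written to disk, here is a sketch that parses the question's sample rows from an in-memory string (an assumption made so that no file is needed) and uses the question's OS.Type-Site naming scheme.

```powershell
# Sample rows taken from the question, parsed in memory instead of from a file.
$rows = @'
Site;OS.Type;Hostname;IP address
Amsterdam;Server;AMS_SRVDEV01;10.10.10.12
Warsaw;Workstation;WAR-L4D6;10.10.20.22
Ankara;Workstation;AN-D5G36;10.10.13.22
Warsaw;Workstation;WAR-SRVTST02;10.10.20.33
Amsterdam;Server;LON-SRV545;10.10.10.244
'@ | ConvertFrom-Csv -Delimiter ';'

# One group per target file name; each group carries all of its rows in .Group.
$groups = $rows | Group-Object { $_.'OS.Type' + '-' + $_.Site + '.csv' }
$groups | Select-Object Name, Count
```

Each group's Name is the output file name and its Group property holds the rows for that file, so every file can be written with one Export-Csv call.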
Your own answer shows alternative solutions that eliminate interference from the AV software.
Source of the issue was McAfee On-Access Scan which was scanning every file created. There are 3 ways to bypass the problem:
a) temporarily disable the whole AV / OAS module.
b) add powershell.exe to the OAS policies as a Low Risk process
c) collect all data in memory and create all files with Export-CSV, as a last step, as shown in the other answer.
Edit
I'll conclude that Import-Csv is not ideal for incorrectly formatted CSV and will use Get-Content and split. Thanks for all the answers.
Example CSV:
"SessionID","ObjectName","DatabaseName",,,,,,,,
"144","","AC"
Using Import-Csv none of the empty fields at the end will be counted - it will simply stop after "DatabaseName".
Is there any way to include the empty fields?
Edit:
I simply need to count the fields and make sure there are fewer than X of them. It is not only the header that might contain empty fields but also the content. These files are often made manually and are not properly formatted. Since the files can also get very large, I would prefer not to use Get-Content and split as well, since I'm already using Import-Csv and its properties.
Looks like it's missing its headers. If you would add some, it would work fine.
You could do something like
Get-Content My.CSV | Select -skip 1 | ConvertFrom-Csv -Header "SessionID","ObjectName","DatabaseName",'Whatnot1', 'Whatnot2', 'Whatnot3'
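If only the number of fields per line matters, as the question's edit describes, splitting each raw line is enough. The sketch below uses the two sample lines from the question held in memory; $maxFields is a hypothetical stand-in for the "X" in the question, and the naive split assumes no field contains a quoted comma.

```powershell
# The two sample lines from the question, held in memory for illustration.
$lines = @(
    '"SessionID","ObjectName","DatabaseName",,,,,,,,'
    '"144","","AC"'
)

# Hypothetical limit standing in for the "X" mentioned in the question.
$maxFields = 5

# Naive split on commas; empty trailing fields are kept by -split.
$report = foreach ($line in $lines) {
    $count = ($line -split ',').Count
    [pscustomobject]@{
        Fields      = $count
        WithinLimit = ($count -lt $maxFields)
    }
}
$report
```

For a real file, the in-memory array would be replaced by the output of Get-Content.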
As dbso suggested split and Length will help you. I was on the way to code a header routine which now is obsolete. Nevertheless here it is:
$FileIn = "Q:\test\2017-01\06\SO_41505840.csv"
$Header= (Get-Content $FileIn|select -first 1)-split(",")
"Fieldcount for $FileIn is $($Header.Length)"
for($i=0; $i -lt $Header.Length; $i++){if ($Header[$i] -eq ""){$Header[$i]="`"Column$($i+1)`""}}
$Header -Join(",")
Returning this output
Fieldcount for Q:\test\2017-01\06\SO_41505840.csv is 11
"SessionID","ObjectName","DatabaseName","Column4","Column5","Column6","Column7","Column8","Column9","Column10","Column11"
I have a file I am trying to import from. The first column is an IDnum, Second is Name, Third is Version. What I want is just the IDNum column. The file is tab delimited so I was wondering how do I capture only the first column before the tab? The rest of the text on each line is not needed. The line looks like this:
4809490 WebGoat 5.0
So in this example, I only want 4809490. I do not need the rest of that stuff.
This can be done via Select-String as a one-liner.
Select-String -Path file.txt -Pattern '^\w+' -AllMatches |%{$_.Matches.Value} > newfile.txt
I sat back and thought about it and found a way myself too :)
# assumes $content holds the lines of the file, e.g.:
$content = Get-Content 'file.txt'
ForEach ($line in $content) {
$thisappid = $line.Split("`t")
Write-Host $thisappid.GetValue(0)
}
Those are good answers as well so thank you!
You can either use the Import-CSV with a specified tab delimiter to load the file and select only the first column, or you can use the Get-Content cmdlet to load the file, iterate over it using ForEach-Object and use a regex to extract the ID:
Get-Content 'PATH_TO_YOUR_FILE' | ForEach-Object {
[regex]::Match($_, '\d+').Groups[0].Value
}
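The Import-CSV variant mentioned above is not shown, so here is a minimal sketch of it. The temp file, its second row, and the column names passed to -Header are assumptions for illustration.

```powershell
# Hypothetical sample file in the temp folder with tab-separated, headerless rows.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'apps_sample.txt'
Set-Content -Path $path -Value @(
    "4809490`tWebGoat`t5.0"
    "1234567`tExampleApp`t2.1"
)

# -Header supplies column names because the file has none (names are assumptions).
# Select-Object -ExpandProperty returns the bare ID values instead of objects.
$ids = Import-Csv -Path $path -Delimiter "`t" -Header 'IDNum', 'Name', 'Version' |
    Select-Object -ExpandProperty IDNum
$ids
```

Unlike the regex approach, this keeps working if an ID ever contains non-digit characters, since it takes everything before the first tab.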
I'm far from being an expert in PowerShell, so I'll do my best to explain here.
I was able to add a column, but now I want to add stuff in a column (already there) using a separate script.
For example, the following CSV file:
WhenToBuyThings,ThingsToBuy
BuyNow ,Bed
BuyNow ,Desk
BuyNow ,Computer
BuyNow ,Food
BuyLater ,
BuyLater ,
BuyLater ,
I have the array:
$BuyStuffLater = "Books","Toys","ShinnyStuff"
So the end result of the file should look like this
BuyNow ,Bed
BuyNow ,Desk
BuyNow ,Computer
BuyNow ,Food
BuyLater ,Books
BuyLater ,Toys
BuyLater ,ShinnyStuff
Any help with how to do this in code would be much appreciated. Also, we can't use delimiter ",". Because in the real script some values will have commas.
I got it after a few hours of fiddling...
$myArray = "Books","Toys","ShinnyStuff"
$i = 0
Import-Csv "C:\Temp\test.csv" |
ForEach-Object {if($_.WhenToBuyThings -eq "BuyLater"){$_.ThingsToBuy = $myArray[$i];$i++}return $_} |
Export-Csv C:\Temp\testtemp.csv -NoTypeInformation
All is well now...
I am new to powershell, too. Here's what I found. This searches and returns all lines that fit. I'm not sure it can pipe.
$BuyStuffLater = "Books","Toys","ShinnyStuff"
$x = 0
Get-Content -Path .\mydata.txt | select-string -pattern "BuyLater" #searches and displays
# Im not sure about this piping. (| foreach {$_ + $BuyStuffLater[$x];$x++} | Set-Content .\outputfile.csv)
This filter will work, though I still have to work on the piping. The other answer might be better.
I don't see the point of iterating through each object to check whether WhenToBuyThings is "BuyLater". If anything, what you are doing could be harmful if you run multiple passes adding to the list: it could remove previous things you wanted to buy. If "Kidney Dialysis Machine" was listed as a "BuyLater" under WhenToBuyThings, you would overwrite it, with dire consequences.
What we can do is build two lists and merge into new csv file. First list is your original file minus any entry where a "BuyLater" has a blank ThingsToBuy. The second list is an object array built from your $BuyStuffLater. Add these lists together and export.
Also there is zero need to worry about using a comma delimiter when using Export-CSV. The data is quoted so commas in data do not affect the data structure. If this was still a concern you could use -Delimiter ";". I noticed in your answer that you did not attempt to account for commas either (not that it matters based on what I just said).
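The quoting behaviour described above can be checked with a quick round trip. This sketch uses ConvertTo-Csv and ConvertFrom-Csv, which produce and parse the same text as Export-Csv and Import-Csv but work in memory; the sample value is just an illustration.

```powershell
# One object whose value contains a comma.
$item = [pscustomobject]@{
    WhenToBuyThings = 'BuyLater'
    ThingsToBuy     = 'ShinnyStuff: Big, ShinyStuff'
}

# ConvertTo-Csv emits a header line plus one quoted data line.
$csv = $item | ConvertTo-Csv -NoTypeInformation

# Parsing the text again yields the original value, comma included.
$back = $csv | ConvertFrom-Csv
$back.ThingsToBuy
```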
$path = "C:\Temp\test.csv"
$ListOfItemsToBuy = "Books","Toys","ShinnyStuff: Big, ShinyStuff"
$BuyStuffLater = $ListOfItemsToBuy | ForEach-Object{
[pscustomobject]@{
WhenToBuyThings = "BuyLater"
ThingsToBuy = $_
}
}
$CurrentList = Import-Csv $path | Where-Object{$_.WhenToBuyThings -ne "BuyLater" -or ![string]::IsNullOrWhiteSpace($_.ThingsToBuy)}
$CurrentList + $BuyStuffLater | Export-Csv $path -NoTypeInformation
Since you have 3.0 we can use [pscustomobject]@{} to build the new object very easily. Combine both arrays simply by adding them together and export back to the original file.
You should notice I used slightly different input data. One includes a comma. I did that so you can see what the output file looks like.
"BuyLater","Books"
"BuyLater","Toys"
"BuyLater","ShinnyStuff: Big, ShinyStuff"
I have a pipe-delimited text file with a header row. (I said CSV in the question to make it a bit more immediately understandable ... I imagine most solutions would be applicable to either format.)
The file looks like this:
COLUMN1|COLUMN2|COLUMN3|COLUMN4|...|
Field1|Field2|Field3|Field4|...|
...
I need to obscure the data in (for example) columns 3 and 9, without affecting any of the other entries in the file.
I want to do this using a hashing algorithm like SHA1 or MD5, so that the same strings will resolve to the same hash values anywhere they are encountered.
EDIT - Why I want to do this
I need to send some data to a third party, and certain columns contain sensitive information (e.g. customer names). I need the file to be complete, and where a string is replaced, I need it to be done in the same way every time it is encountered (so that any mapping or grouping remains). It does not need military-grade encryption, just to be difficult to reverse. As I need to do this intermittently, a scripted solution would be ideal.
/EDIT
What is the easiest way to achieve this using a command line tool or script?
By preference, I would like a batch script or PowerShell script, since that does not require any additional software to achieve...
Try
(Import-Csv .\my.csv -delimiter '|' ) | ForEach-Object{
$_.column3 = $_.column3.gethashcode()
$_.column4 = $_.column4.gethashcode()
$_
} | Export-Csv .\myobfuscated.csv -NoTypeInformation -delimiter '|'
$md5 = new-object -TypeName Security.Cryptography.MD5CryptoServiceProvider
$utf8 = new-object -TypeName Text.UTF8Encoding
import-csv original.csv -delimiter '|' |
foreach {
$_.Column3 = [BitConverter]::ToString($md5.ComputeHash($utf8.GetBytes($_.Column3)))
$_.Column9 = [BitConverter]::ToString($md5.ComputeHash($utf8.GetBytes($_.Column9)))
$_
} |
export-csv encrypted.csv -delimiter '|' -noTypeInformation
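As a variation on the MD5 approach, the sketch below swaps in SHA-256 and wraps the hashing in a helper. Get-StringHash is a hypothetical name, and the column names and sample rows are made up. One reason to prefer a cryptographic hash over GetHashCode() is that .NET does not guarantee GetHashCode() values to be stable across versions or processes, whereas a hash such as SHA-256 always maps the same input bytes to the same output.

```powershell
# Hypothetical helper: returns the SHA-256 hash of a string as lowercase hex.
function Get-StringHash {
    param([string]$Text)
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
        # BitConverter yields 'AA-BB-...'; strip the dashes and lowercase it.
        ([System.BitConverter]::ToString($sha.ComputeHash($bytes)) -replace '-', '').ToLower()
    }
    finally {
        $sha.Dispose()
    }
}

# Sample pipe-delimited data held in memory; names and values are made up.
$rows = @'
Id|Customer|City
1|Alice Smith|Oslo
2|Bob Jones|Rome
3|Alice Smith|Oslo
'@ | ConvertFrom-Csv -Delimiter '|'

# Replace the sensitive column in place; equal inputs give equal hashes.
foreach ($row in $rows) { $row.Customer = Get-StringHash $row.Customer }
$rows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
```

Rows 1 and 3 end up with the same Customer value, so grouping on that column still works. Keep in mind that plain hashes of short, guessable values such as names can be reversed by hashing a list of candidates, so this obscures rather than protects the data.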