Powershell: Matching a line of text from one file and adding into another file - date

I have a very large log file. It contains log data for multiple dates. Each line begins with a date (yyyy-mm-dd hh:mm:ss etc).
So, the log looks like this:
2016-02-17 10:15:24 some text that follows
2016-02-17 14:21:46 more text that follows
2016-02-19 11:54:11 other text that follows
2016-02-19 16:37:21 more text that follows
2016-02-19 19:52:17 other text that follows
2016-02-22 06:01:32 more text that follows
etc...
I am trying to write a PowerShell script that will:
Read each line of my file
Identify the date (first 10 characters)
Add this line to another file with the name targetfile-yyyy-mm-dd.log
My first attempt at this problem was to iterate through the whole range of dates in the file and parse the entire file from top to bottom for each date. This required multiple passes through the whole file (40GB of it!), which takes days.
My ideal solution would be to go through the file just one time, line by line, and copy each line into its appropriate target file based on the first ten characters in the line.
How would I do this to make it most efficient? Thank you for your thoughts!

Try this:
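# Use a StreamReader to read the log $file line by line
$streamReader = New-Object System.IO.StreamReader -ArgumentList $file
# Compare against $null so that an empty line does not end the loop early
while ($null -ne ($line = $streamReader.ReadLine())) {
    # Use the first 10 characters (the date) to build the $targetfile name
    $targetfile = "targetfile-$($line.Substring(0,10)).log"
    # Append the $line value, skipping the first 20 characters (date and time)
    $line.Substring(20) | Add-Content $targetfile
}
# Release the file handle once the whole log has been read
$streamReader.Close()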
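For a 40GB file, calling Add-Content once per line reopens the target file every time, which is likely to be slow. A minimal sketch of an alternative keeps one StreamWriter open per date and writes each full line to it. The function name, parameter names, and the rule that lines without a leading date belong to the previous entry are assumptions, not part of the original answer.

```powershell
# Sketch: split a log into per-date files in a single pass.
# Function and parameter names are hypothetical. Pass full paths, because
# .NET resolves relative paths against the process working directory.
function Split-LogByDate {
    param(
        [string]$Path,
        [string]$TargetDirectory
    )
    $writers = @{}    # date string -> open StreamWriter
    $date    = $null  # date of the most recent line that started with one
    $reader  = New-Object System.IO.StreamReader -ArgumentList $Path
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            # Assumption: lines without a leading date belong to the previous entry
            if ($line -match '^\d{4}-\d{2}-\d{2}') { $date = $Matches[0] }
            if ($null -eq $date) { continue }
            if (-not $writers.ContainsKey($date)) {
                $target = Join-Path $TargetDirectory "targetfile-$date.log"
                # Second argument $true opens the file in append mode
                $writers[$date] = New-Object System.IO.StreamWriter -ArgumentList $target, $true
            }
            $writers[$date].WriteLine($line)
        }
    }
    finally {
        # Close every handle even if reading fails part-way through
        $reader.Dispose()
        foreach ($writer in $writers.Values) { $writer.Dispose() }
    }
}
```

It could be called as Split-LogByDate -Path 'C:\logs\big.log' -TargetDirectory 'C:\logs\split' (both paths are placeholders). Unlike the snippet above, this writes the whole line, including the date and time.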

Related

Import length-delimited file with PowerShell and export as csv file

I have a source file which is in .txt format. It looks like a semi-colon separated file:
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
However, it is determined by length. So it is a length-delimited file.
The first column, for example, runs from the first character to the third (100), and then a semi-colon follows.
The second column starts at the 5th position and runs through the 7th position (both inclusive). A string column can contain a semi-colon.
Now I want to import this length-delimited txt file with Powershell and export it as a csv file. This file should be really semi-colon separated. The result should look like
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
But I have simply no idea how to do it? I googled it, but I did not find that much useful code examples for importing length-delimited txt files with PowerShell.
Unfortunately, I cannot use Python. I am not sure, if this task is generally possible using Powershell? Because when exporting, Powershell also needs to recognize that there are string values containing the separator, so it has to pay attention to the quoting: "Thisisa;;ringcolumnB". I think it would be also ok for me, if the whole column is quoted, so every entry in a string column gets quotes added.
You can use regex to describe a string in which the 3rd "column" contains a ; and then inject the quotation marks with the -replace operator:
$lines = Get-Content path\to\file.txt
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
The expression (.{20}(?<=;.{0,19})) is going to match the 20-char 3rd column value only if it contains at least one semi-colon - so lines with no semicolon in that column will be left alone:
# let's try it out with your test data
$lines = @'
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
'@ -split '\r?\n'
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
Which yields the following four strings:
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
To write the output back to file, use Set-Content:
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;' |Set-Content path\to\fixed_output.scsv
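If the regex feels hard to maintain, the same result can be sketched by cutting each line at fixed positions with Substring. The column offsets below (3, 3, 20 and 1 characters, each followed by a semi-colon) are inferred from the four sample lines, and the function name is hypothetical. Doubling any embedded quotation marks is the usual CSV convention and is an addition the question did not ask for.

```powershell
# Sketch: slice one fixed-width line by position and quote the string column
# only when it contains the separator. Offsets are assumptions based on the
# sample data: cols at 0-2, 4-6, 8-27 and 29.
function Convert-FixedWidthLine {
    param([string]$Line)
    $col1 = $Line.Substring(0, 3)
    $col2 = $Line.Substring(4, 3)
    $col3 = $Line.Substring(8, 20)
    $col4 = $Line.Substring(29, 1)
    if ($col3.Contains(';')) {
        # Escape embedded quotes by doubling them, then wrap the value in quotes
        $col3 = '"' + $col3.Replace('"', '""') + '"'
    }
    '{0};{1};{2};{3};' -f $col1, $col2, $col3, $col4
}

$sample = @(
    '100;200;ThisisastringcolumnA;4;'
    '101;400;Thisisastringc;lumnA;5;'
)
$converted = $sample | ForEach-Object { Convert-FixedWidthLine -Line $_ }
$converted
```

The advantage of slicing by position is that adding or resizing a column only means changing an offset, at the cost of a few more lines than the -replace one-liner.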

Detecting selected text with powershell

Is it possible to detect selected text with powershell?
For instance, say I opened a text file that contained an essay and then sent a command to select the first line.
Is there a way to return the first line of that file as a string?
If you are trying to iterate over the lines of a text file you can access them like this:
$file = Get-Content essay.txt
$file[0] # for first line
$file[$file.Length - 1] # for last line

Reading a text file in PowerShell after of a marker

I'm just wondering if it's possible to read the content of text file with specific index?
What I mean is like this, for example:
I have text file like this, 'test1.txt'
12345678900 ## ## readthistext
54321123440 ## ## hellothistext
I just want to read the content of the text file that comes after the hashtags.
To read the text after the # characters you must read the file content up to the # characters first. Also, in PowerShell you normally read files either line by line (via Get-Content) or completely (via Get-Content -Raw). You can discard those parts of the read content that don't interest you, though. For instance:
(Get-Content 'C:\input.txt') -replace '^.*#'
The above will read the file C:\input.txt and for each line remove the text from the beginning of the line up to the last # character.
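As a quick check against the two sample lines from the question, here is a small sketch that uses an in-memory array instead of a file. The trailing \s* is an addition to the pattern above: without it, the space that follows the last # stays at the start of each result.

```powershell
# Sample lines copied from the question; no file is needed for this check
$lines = @(
    '12345678900 ## ## readthistext'
    '54321123440 ## ## hellothistext'
)

# Greedy .* runs up to the LAST '#'; \s* then also drops the whitespace after it
$result = $lines -replace '^.*#\s*'
$result
# readthistext
# hellothistext
```

Lines that contain no # at all do not match and are passed through unchanged.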

powershell - replace line in .txt file

I am using PowerShell and I need replace a line in a .txt file.
The .txt file always has different number at the end of the line.
For example:
...............................txt (first)....................................
appversion= 10.10.1
............................txt (a second time)................................
appversion= 10.10.2
...............................txt (third)...................................
appversion= 10.10.5
I need to replace appversion + number behind it (the number is always different). I have set the required value in variable.
How do I do this?
Part of the issue, as I see from your comments, is that you are trying to replace text in a file and save it back to the same file while you are still reading it.
I will try to show a similar solution while addressing this. Again, we are going to use -replace as an array operator.
$NewVersion = "Awesome"
$filecontent = Get-Content C:\temp\file.txt
$filecontent -replace '(^appversion=.*\.).*',"`${1}$NewVersion" | Set-Content C:\temp\file.txt
This regex will match lines starting with "appversion=" and capture everything up to the last period. Since Get-Content has finished and the text is held in memory before Set-Content runs, we can write it back to the same file. The group is referenced as ${1} rather than $1 so that a replacement value starting with a digit is not read as part of the group number. Change $NewVersion to a number ... unless that is your versioning structure.
I am not sure which part of the number, if any, you are trying to preserve. If you intend to change the whole number, you can just change .*\. to a space. That way you ignore everything after the equals sign.
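To make the "change the whole number" variant concrete, here is a minimal sketch on an in-memory array. The sample lines and the new version value are made up for illustration. The group is written as ${1} because a plain $1 followed directly by a digit (for example $110) would be read as a reference to a group that does not exist and left as literal text.

```powershell
# Hypothetical file content and target version, for illustration only
$NewVersion  = '10.10.6'
$filecontent = @(
    'name= demo'
    'appversion= 10.10.5'
    'other= 1'
)

# Keep "appversion= " (group 1) and replace everything after it
$updated = $filecontent -replace '^(appversion= ).*', "`${1}$NewVersion"
$updated
# name= demo
# appversion= 10.10.6
# other= 1
```

With a real file, $filecontent would come from Get-Content and $updated would be piped to Set-Content, as in the snippet above.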
Yes, you can with regex.
Let's call $myString and $verNumber the variables holding the text and the version number:
$myString = "appversion= 10.10.1";
$verNumber = 7;
You can use the -replace operator to match the version parts and replace only the last number this way (the dots are escaped so that they match a literal period rather than any character):
$myString -replace 'appversion= (\d+)\.(\d+)\.(\d+)', "appversion= `$1.`$2.$verNumber";

powershell remove first characters of line

I am parsing a JBOSS log file using powershell.
A typical line would look like this:
2011-12-08 09:01:07,636 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].etc..
I want to remove all the characters from character 1 until the word ERROR. So I want to remove the date and time, the comma and the number right after it. I want my lines to begin with the word ERROR and delete everything before that.
I looked on google and tried different things I have found but I struggle and can't make it work. I tried with substring and replace but can't find how to delete all characters until the word ERROR.
Any help would be greatly appreciated,
Thanks a lot!
This one-liner will read the contents of your file (in the example jboss.txt) and replace every line containing ERROR by ERROR + whatever follows on that line. Finally it will save the result in processed_jboss.txt
get-content jboss.txt | foreach-object {$_ -replace "^.*?(ERROR.*)",'$1'} | out-file processed_jboss.txt
Assuming the log line is in a variable of type string this should do it:
$line = "2011-12-08 09:01:07,636 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].etc.."
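# IndexOf is case-sensitive, so search for "ERROR" exactly as it appears in the log
$ErrorIndex = $line.IndexOf("ERROR")
# Substring with a single argument returns everything from that index to the end
$CleanLogLine = $line.Substring($ErrorIndex)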
Reference:
http://msdn.microsoft.com/en-us/library/system.string.aspx
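Not every line in a log will contain ERROR, and IndexOf returns -1 in that case, which makes Substring throw. A minimal sketch that guards against this is shown below. The function name and the choice to return such lines unchanged are assumptions.

```powershell
# Sketch: strip everything before a marker word, leaving lines without
# the marker untouched. Function and parameter names are hypothetical.
function Remove-PrefixBeforeMarker {
    param(
        [string]$Line,
        [string]$Marker = 'ERROR'
    )
    # Ordinal comparison: exact, case-sensitive match
    $index = $Line.IndexOf($Marker, [System.StringComparison]::Ordinal)
    if ($index -lt 0) { return $Line }   # marker not found: keep the line as-is
    $Line.Substring($index)
}

$clean = Remove-PrefixBeforeMarker -Line '2011-12-08 09:01:07,636 ERROR [org.apache.catalina.core.ContainerBase'
$clean
# ERROR [org.apache.catalina.core.ContainerBase
```

For a whole file, this could be combined with Get-Content and ForEach-Object in the same way as the one-liner above.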