Convert multiple .txt files into a single file? - powershell

I have a lot of folders.
In each folder there are approximately between 1 and 20 .txt files.
Each .txt file (unique name) contains a title (line 1) followed by HTML-formatted text (line 2).
Example (1) of what a .txt file looks like inside:
Frankfurter tail turkey doner
<p>Bacon ipsum dolor sit amet turkey sausage brisket pork.</p><p>Tongue swine turducken capicola shoulder hamburger pig.<p/><p>Ball tip jerky ham, doner filet mignon ham hock bresaola jowl andouille pig cow</p>
Example (2) of what a .txt file looks like inside:
Batman
<p>You either die a hero or you live long enough to see yourself become the villain.</p>
Simply put, I want to merge the contents of the .txt files into a single file that contains one file's content per row.
Each line shall also be wrapped in quotation marks, with a comma separating them.
From the example above, the output file should look like this:
"Frankfurter tail turkey doner","<p>Bacon ipsum dolor sit amet turkey sausage brisket pork.</p><p>Tongue swine turducken capicola shoulder hamburger pig.<p/><p>Ball tip jerky ham, doner filet mignon ham hock bresaola jowl andouille pig cow</p>"
"Batman","<p>You either die a hero or you live long enough to see yourself become the villain.</p>"
So my question is simply: what is the easiest and fastest way to do this?
I am doing this by hand right now and it's pretty slow; copy-pasting at this volume makes my brain swell.
Edit1: Been doing some light research;
PowerShell, VBA and .bat files seem like candidates, but I still haven't found anything that works.
I don't want the location of the input or output files specified in the code; the launch file for the solution should work when placed in any folder, and the output file should be generated there too.
Try #1:
Created a Windows batch file (.bat) containing this:
for %f in (*.txt) do type "%f" >> combined.txt
Placed it in a folder with a dozen .txt files, but the console just opens and closes. No file created!
Edit2: Now we are cooking!
This:
for %%f in (*.txt) do type "%%f" >> combined.txt
Gives output:
Batman
<p>You either die a hero or you live long enough to see yourself become the villain.</p>
Frankfurter tail turkey doner
<p>Bacon ipsum dolor sit amet turkey sausage brisket pork.</p><p>Tongue swine turducken capicola shoulder hamburger pig.<p/><p>Ball tip jerky ham, doner filet mignon ham hock bresaola jowl andouille pig cow</p>
This is very close to what I want!
Adding the quotation marks and replacing the line break with a comma is still not solved.
Best regards,
Lui Kang

Got external help, this works perfectly.
@echo off
cls
setlocal
set "combined=combined.txt"
(
    for %%a in (*.txt) do (
        if not "%%a" == "%combined%" (
            echo %%a 1>&2
            set "firstLine=true"
            for /f "tokens=1,* delims=: " %%b in ('findstr /n "^" %%a') do (
                if defined firstLine (
                    set /p =""%%c",""
                    set "firstLine="
                ) else if not "%%c" == "" (
                    set /p =%%c
                )
            )
            echo "
        )
    )
) > %combined% <nul
endlocal
goto:eof
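For comparison, here is a rough Python sketch of the same merge. It is not part of the batch solution above, and the function name merge_txt_files, the output name combined.csv and the UTF-8 encoding are all assumptions made for illustration. The output gets a .csv extension so that it is never picked up as one of the *.txt inputs.

```python
import csv
from pathlib import Path


def merge_txt_files(folder: Path, output_name: str = "combined.csv") -> Path:
    """Write one quoted, comma-separated row per .txt file found in `folder`."""
    output_path = folder / output_name
    rows = []
    for txt_path in sorted(folder.glob("*.txt")):
        # Assumption: the files are UTF-8 encoded.
        lines = txt_path.read_text(encoding="utf-8").splitlines()
        if not lines:
            continue  # skip empty files
        title = lines[0]
        # Join everything after the title, in case the HTML spans several lines.
        body = "".join(lines[1:])
        rows.append([title, body])
    with output_path.open("w", encoding="utf-8", newline="") as handle:
        # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled.
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)
    return output_path
```

Calling merge_txt_files(Path.cwd()) from a script placed in the folder would keep the input and output locations out of the code, as the question asks.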

Related

Scala String trick or method that inserts a new line or other character each time an "|" is placed

First of all I would like to apologise for how vague this question may seem, but at this point, I have not found any information on the matter.
Some time ago a colleague was talking to me about monoids and a JSON Parser, then he showed me a way of constructing Strings that looked more or less like this (Based on my poor memory):
strangeMethodOrTrick = {
| Lorem ipsum dolor sit amet
| consectetur adipiscing elit
| sed do eiusmod
| tempor incididunt
}.toString
At one point he mentioned that, every time the "|" character was typed in, it meant a line break was placed at the end of each line once the final String was created.
Apologies again for not providing more information about the subject. Today I asked my colleague about this, but he said he doesn't remember anything like that or anything related to it.
For my part I haven't found any information either. I only remember what the code looked like, and so far I haven't found it in the Scala documentation.
I would appreciate it if anyone who knows about this technique could tell me its name or how I could find more information about it.
For anything else, I wish you guys a very happy new year!
It's not a built-in language feature and it doesn't insert line breaks. The standard library provides an extension method for strings, stripMargin, that strips any leading whitespace followed by | from every line of a string.
val str =
"""|multiple lines
|in one string
|with stripMargin""".stripMargin
Note that the newlines are already part of the string before calling stripMargin because I constructed a multi line string with the help of triple quotes """. All stripMargin does is delete the leading whitespace and the |.
As you can see in the docs there is a stripMargin method that you can call with another character than | if you want to, e.g. stripMargin('#').
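To make the behaviour concrete, here is a small Python imitation of what stripMargin does to each line. It is only a sketch: strip_margin is a hypothetical helper, not the Scala implementation, and it treats only spaces and tabs as leading whitespace.

```python
import re


def strip_margin(text: str, margin_char: str = "|") -> str:
    """Remove leading blanks plus one margin character from each line."""
    # MULTILINE makes ^ match at the start of every line, not just the string.
    pattern = re.compile(r"^[ \t]*" + re.escape(margin_char), flags=re.MULTILINE)
    return pattern.sub("", text)


text = """|multiple lines
          |in one string
          |with strip_margin"""
print(strip_margin(text))
```

The newlines come from the triple-quoted literal itself; the helper only deletes the indentation and the margin character, and lines without a margin character are left untouched.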

Sed command to find lines in which all words start with a capital letter

I'm studying the sed command. I have written a command which replaces the first letter of a word with a capital letter:
sed -e "s/\b\(.\)/\u\1/g"
But I have no idea how to find lines in which all words start with a capital letter.
For example, my text file:
Hello world
Hello World
Lorem Ipsum sample
The command should return one line:
Hello World
I would do this by matching lines that have at least one word starting with a lowercase character and deleting them:
sed '/\b[[:lower:]]/d' infile
\b is a GNU extension, so this requires GNU sed.
sed is for doing s/old/new/, that is all. For anything else just use awk for simplicity, clarity, robustness, portability, performance, etc.
Look:
$ cat file
Hello world
Hello World
Lorem Ipsum sample
Lorem ipsum Foo bar And stuff
Lines where every word starts with an upper case letter:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") == NF' file
Hello World
Lines where 2 words start with an upper case letter:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") == 2' file
Hello World
Lorem Ipsum sample
Lines where more than 1 word starts with an upper case letter:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") > 1' file
Hello World
Lorem Ipsum sample
Lorem ipsum Foo bar And stuff
Lines where the same number of words start with upper case as with lower case letters:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") == gsub(/(^| )[[:lower:]]/,"&")' file
Hello world
Lorem ipsum Foo bar And stuff
Try taking whatever sed script you get in response to your question and building on it for the above (or any other!) cases if/when your requirements change.
The above will work with any awk in any shell on any UNIX box.
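The counting idea behind these awk one-liners can be sketched in Python as well. This is an illustration only: the function names are made up, and [A-Z] is an ASCII-only stand-in for [[:upper:]].

```python
import re

# ASCII-only stand-in for awk's (^| )[[:upper:]]
UPPER_START = re.compile(r"(?:^| )[A-Z]")


def capitalized_word_count(line: str) -> int:
    """Count words that begin with an uppercase letter."""
    return len(UPPER_START.findall(line))


def every_word_capitalized(line: str) -> bool:
    """True when the count equals the number of fields, like `== NF`."""
    return capitalized_word_count(line) == len(line.split())


sample = [
    "Hello world",
    "Hello World",
    "Lorem Ipsum sample",
    "Lorem ipsum Foo bar And stuff",
]
print([line for line in sample if every_word_capitalized(line)])
# ['Hello World']
```

Changing the comparison (== 2, > 1, and so on) gives the other variants, which is the flexibility the awk answer is arguing for.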
For something like this you need to match the whole line, i.e. ^...$. This works for your example:
sed -E '/^ *(([A-Z][^ ]*) +)*[A-Z][^ ]*$/!d'
Explanation
^ * - allow optional spaces at the start of the line
(([A-Z][^ ]*) +)* - match a capital letter followed by any number of non-space characters, followed by one or more spaces. This whole group can be repeated arbitrarily
[A-Z][^ ]*$ - finally, the line must end with a capitalized word (add " *" before the $ if trailing spaces should be allowed)
the !d at the end deletes every line that does not match the regular expression
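The same whole-line pattern can be tried out in Python, whose re module accepts this expression unchanged. This is just a quick check of the regex against the sample lines; keep_line is a hypothetical name.

```python
import re

# Same pattern as in the sed -E command, without the /.../!d wrapper.
ALL_CAPITALIZED = re.compile(r"^ *(([A-Z][^ ]*) +)*[A-Z][^ ]*$")


def keep_line(line: str) -> bool:
    """True for lines that sed would keep, i.e. lines matching the pattern."""
    return ALL_CAPITALIZED.match(line) is not None


lines = ["Hello world", "Hello World", "Lorem Ipsum sample"]
print([line for line in lines if keep_line(line)])
# ['Hello World']
```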

replace line feeds with sed in a certain range of line

Is it possible to replace \n (e.g., with a space) only within a certain range of positions on a line with sed?
Here is a sample without a range filter. Is it somehow possible to set a range in sed?
for f in `find ${_filedir} -type f`
do
#replace all LF with spaces
sed 's/\n/ /g' ${f} > ${f}.noCR
done
OK, here is a sample:
Let's take some lines:
"I want to break free now"
"And friends will be friends"
I want to replace any "n" with an "m" in the range 0 to 16, which results in:
"I wamt to break free now"
"Amd friemds will be friends"
Edited again
Try awk:
awk '{l=substr($0,1,16);r=substr($0,17);gsub(/n/,"m",l);print l r}' file
where l is the left part of the string (the first 16 characters), r is the rest of the line, and gsub() does the global substitution on l only.
Edited
I would probably use Bash parameter expansion (substring extraction and pattern substitution) for that:
#!/bin/bash
while IFS= read -r line
do
    left=${line:0:16}   # Get the first 16 chars (offsets are 0-based)
    right=${line:16}    # Get remainder of line
    left=${left//n/m}   # Do global replacement in left part
    echo "$left$right"  # Show output
done < file
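The split, replace, rejoin idea is the same in any language. As a hedged illustration, here it is in Python, with a hypothetical helper replace_in_prefix and the width of 16 taken from the question:

```python
def replace_in_prefix(line: str, old: str, new: str, width: int = 16) -> str:
    """Replace `old` with `new` only within the first `width` characters."""
    left, right = line[:width], line[width:]
    return left.replace(old, new) + right


for line in ['"I want to break free now"', '"And friends will be friends"']:
    print(replace_in_prefix(line, "n", "m"))
```

Note that the surrounding quotation marks count as characters here, exactly as they do in the sample lines of the question.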
Original answer
Sure, for example only on lines 2-8:
sed '2,8s/foo/bar/' file
Or, between start and end patterns:
sed '/start/,/end/s/foo/bar/' file
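What a numeric address range like 2,8 does can be sketched in Python as follows. This is only an illustration: substitute_in_line_range is a made-up name, and it does a literal text replacement where sed would use a regular expression.

```python
from typing import Iterable, List


def substitute_in_line_range(
    lines: Iterable[str], old: str, new: str, first: int, last: int
) -> List[str]:
    """Replace the first `old` on each line whose 1-based number is in first..last."""
    result = []
    for number, line in enumerate(lines, start=1):
        if first <= number <= last:
            # count=1 mirrors s/old/new/ without the g flag.
            line = line.replace(old, new, 1)
        result.append(line)
    return result


print(substitute_in_line_range(["foo foo", "foo foo", "foo foo"], "foo", "bar", 2, 2))
# ['foo foo', 'bar foo', 'foo foo']
```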

Remove Leading Whitespace from File

My shell has a call to 'fortune' in my .login file, to provide me with a little message of the day. However, some of the fortunes begin with one leading whitespace line, some begin with two, and some don't have any leading whitespace lines at all. This bugs me.
I sat down to wrap fortune in my own shell script, which would remove all the leading whitespace from the input, without destroying any formatting of the actual fortune, which may intentionally have lines of whitespace.
It doesn't appear to be an easy one-liner two-minute fix, and as I read(reed) through the man pages for sed and grep, I figured I'd ask our wonderful patrons here.
Using the same source as Dav:
# delete all leading blank lines at top of file
sed '/./,$!d'
Source: http://www.linuxhowtos.org/System/sedoneliner.htm?ref=news.rdf
Additionally, here's why this works:
The comma separates a "range" of operation. sed can accept regular expressions for range definitions, so /./ matches the first line with "anything" (.) on it and $ specifies the end of the file. Therefore,
/./,$ matches "the first not-blank line to the end of the file".
! then inverts that selection, making it effectively "the blank lines at the top of the file".
d deletes those lines.
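The same "skip until the first non-empty line, then keep everything" logic can be written out in Python. This is a sketch with a hypothetical function name; like /./ in sed, it treats a line as blank only if it contains no characters at all, so a line holding just spaces ends the skipping.

```python
from itertools import dropwhile


def strip_leading_blank_lines(text: str) -> str:
    """Drop empty lines at the top of `text`; keep everything after them."""
    lines = text.splitlines(keepends=True)
    # dropwhile stops skipping at the first line that has any character on it.
    kept = dropwhile(lambda line: line.rstrip("\r\n") == "", lines)
    return "".join(kept)


print(strip_leading_blank_lines("\n\nA fortune.\n\nSecond paragraph.\n"), end="")
```

Blank lines inside the fortune survive, because only the leading run is dropped.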
# delete all leading blank lines at top of file
sed '/./,$!d'
Source: http://www.linuxhowtos.org/System/sedoneliner.htm?ref=news.rdf
Just pipe the output of fortune into it:
fortune | sed '/./,$!d'
How about:
sed "s/^ *//" < fortunefile
I am not sure what your fortune message actually looks like, but here's an illustration:
$ string="   my message of the day"
$ echo $string
my message of the day
$ echo "$string"
   my message of the day
or you could use awk
echo "${string}" | awk '{gsub(/^ +/,"")}1'

Parse quoted text from within batch file

I would like to do some simple parsing within a batch file.
Given the input line:
Foo: Lorem Ipsum 'The quick brown fox' Bar
I want to extract the quoted part (without quotes):
The quick brown fox
Using only the standard command-line tools available on Windows XP.
(I had a look at find and findstr but they don't seem quite flexible enough to return only part of a line.)
Something like this will work, but only if you have one quoted string per line of input:
@echo OFF
SETLOCAL enableextensions enabledelayedexpansion
set TEXT=Foo: Lorem Ipsum 'The quick brown fox' Bar
@echo %TEXT%
for /f "tokens=2 delims=^'" %%A in ("abc%TEXT%xyz") do (
    set SUBSTR=%%A
)
@echo %SUBSTR%
Output, quoted string in the middle:
Foo: Lorem Ipsum 'The quick brown fox' Bar
The quick brown fox
Output, quoted string in the front:
'The quick brown fox' Bar
The quick brown fox
Output, quoted string at the end:
Foo: Lorem Ipsum 'The quick brown fox'
The quick brown fox
Output, entire string quoted:
'The quick brown fox'
The quick brown fox
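Why the abc and xyz padding matters is easier to see in a small model of the tokenizing. The Python sketch below is an approximation, not the real for /f implementation: it assumes that empty tokens are dropped (which is how for /f treats leading and consecutive delimiters), and second_token is a made-up name.

```python
def second_token(text: str, delimiter: str = "'") -> str:
    """Return the second non-empty token of 'abc' + text + 'xyz'."""
    padded = "abc" + text + "xyz"
    # Dropping empty pieces models delimiters at the start or end being skipped.
    tokens = [token for token in padded.split(delimiter) if token]
    return tokens[1] if len(tokens) > 1 else ""


for text in (
    "Foo: Lorem Ipsum 'The quick brown fox' Bar",
    "'The quick brown fox' Bar",
    "Foo: Lorem Ipsum 'The quick brown fox'",
    "'The quick brown fox'",
):
    print(second_token(text))
```

Without the padding, a line that starts with the quote would put the quoted text in token 1 instead of token 2; the padding guarantees there is always something in front of the first quote.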