Parse quoted text from within batch file - command-line

I would like to do some simple parsing within a batch file.
Given the input line:
Foo: Lorem Ipsum 'The quick brown fox' Bar
I want to extract the quoted part (without quotes):
The quick brown fox
Using only the standard command-line tools available on Windows XP.
(I had a look at find and findstr but they don't seem quite flexible enough to return only part of a line.)

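Something like this will work, but only if you have one quoted string per line of input. The abc prefix guards against input that begins with a quote: for /f ignores leading delimiters, so without the prefix the quoted text would land in token 1 instead of token 2.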
@echo OFF
SETLOCAL enableextensions enabledelayedexpansion
set TEXT=Foo: Lorem Ipsum 'The quick brown fox' Bar
@echo %TEXT%
for /f "tokens=2 delims=^'" %%A in ("abc%TEXT%xyz") do (
    set SUBSTR=%%A
)
@echo %SUBSTR%
Output, quoted string in the middle:
Foo: Lorem Ipsum 'The quick brown fox' Bar
The quick brown fox
Output, quoted string in the front:
'The quick brown fox' Bar
The quick brown fox
Output, quoted string at the end:
Foo: Lorem Ipsum 'The quick brown fox'
The quick brown fox
Output, entire string quoted:
'The quick brown fox'
The quick brown fox

Related

Sed command to find lines in which all words start with a capital letter

I'm studying the sed command. I have written a command which replaces the first letter of a word with a capital letter:
sed -e "s/\b\(.\)/\u\1/g"
But I have no idea how to find lines in which all words start with a capital letter.
For example, my text file:
Hello world
Hello World
Lorem Ipsum sample
The command should return one line:
Hello World
I would do this by matching lines that have at least one word starting with a lowercase character and deleting them:
sed '/\b[[:lower:]]/d' infile
\b is a GNU extension, so this requires GNU sed.
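As a quick check, the command can be run against the sample lines from the question. This is only a sketch: the variable names are made up for illustration, and it assumes GNU sed because of \b.

```shell
# Sample input from the question, one sentence per line.
input='Hello world
Hello World
Lorem Ipsum sample'

# Delete every line that contains a word starting with a lowercase letter.
# \b (word boundary) is a GNU sed extension.
result=$(printf '%s\n' "$input" | sed '/\b[[:lower:]]/d')

printf '%s\n' "$result"   # prints: Hello World
```

Deleting the lines that break the rule is simpler here than describing the lines that satisfy it.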
sed is for doing s/old/new/, and that is all. For anything else, just use awk for simplicity, clarity, robustness, portability, performance, etc.
Look:
$ cat file
Hello world
Hello World
Lorem Ipsum sample
Lorem ipsum Foo bar And stuff
Lines where every word starts with an upper case letter:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") == NF' file
Hello World
Lines where 2 words start with an upper case letter:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") == 2' file
Hello World
Lorem Ipsum sample
Lines where more than one word starts with an upper case letter:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") > 1' file
Hello World
Lorem Ipsum sample
Lorem ipsum Foo bar And stuff
Lines where the same number of words start with upper case as with lower case letters:
$ awk 'gsub(/(^| )[[:upper:]]/,"&") == gsub(/(^| )[[:lower:]]/,"&")' file
Hello world
Lorem ipsum Foo bar And stuff
Try taking whatever sed script you get in response to your question and building on it for the above (or any other!) cases if/when your requirements change.
The above will work with any awk in any shell on any UNIX box.
For something like this you need to match the whole line, i.e. ^...$. This works for your example:
sed -E '/^ *(([A-Z][^ ]*) +)*[A-Z][^ ]*$/!d'
Explanation
^ * - allow optional spaces at the start of the line
(([A-Z][^ ]*) +)* - match a capital letter followed by any number of non-space characters, followed by one or more spaces. This whole group can be repeated any number of times
[A-Z][^ ]*$ - finally, the line must end with a capitalized word (as written, the pattern does not allow trailing spaces)
the !d at the end deletes every line that does not match the regular expression
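The two sed approaches are not equivalent on every input. The sketch below runs both on a few made-up lines (the input and variable names are illustrative, and GNU sed is assumed for \b). It suggests that the \b version treats a hyphen as a word break, while the whole-line version treats only spaces as separators and rejects words that start with a digit.

```shell
# Illustrative input: one clean line, one hyphenated word, one word starting with a digit.
input='Hello World
Hello-world Foo
Hello 2nd World'

# Approach 1: delete lines containing a word that starts with a lowercase letter.
by_boundary=$(printf '%s\n' "$input" | sed '/\b[[:lower:]]/d')

# Approach 2: keep only lines made entirely of space-separated capitalized words.
by_whole_line=$(printf '%s\n' "$input" | sed -E '/^ *(([A-Z][^ ]*) +)*[A-Z][^ ]*$/!d')

printf 'boundary version keeps:\n%s\n' "$by_boundary"
printf 'whole-line version keeps:\n%s\n' "$by_whole_line"
```

Which behaviour is right depends on what should count as a word in the input at hand.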

sed: replace only in part of string

I have a simple playlist of song files:
1003 James Brown - The Boss Unknown Artist.mp3
1004 James Brown - Slaughters Theme Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3
...
I would like them in the following format:
1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
1004 James_Brown_-_Slaughters_Theme_Unknown_Artist.mp3
...
Notice that the space after the leading number is NOT replaced. I have the following simple sed script:
sed "s/ /_/g"
but that also replaces the space after the number. I know how to form capture groups, but that will not help either. How can I convince sed to apply the replacement to only a portion of the input string, rather than the whole string?
You could do
sed 's/ /_/g; s/_/ /'
I.e. first turn all spaces into underscores, then turn the first underscore back into a space.
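A small sketch of that two-step substitution on two of the playlist entries from the question (the variable names are only for illustration). It relies on the first underscore in each line being the one that came from the space after the number, so it assumes the number itself contains no underscore.

```shell
# Two playlist entries taken from the question.
playlist='1003 James Brown - The Boss Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3'

# Step 1: replace every space with an underscore.
# Step 2: turn the first underscore on each line back into a space.
result=$(printf '%s\n' "$playlist" | sed 's/ /_/g; s/_/ /')

printf '%s\n' "$result"
# 1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
# 1005 James_Brown_-_Payback(1)_Unknown_Artist.mp3
```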

Capturing matches with sed

I have just started experimenting with sed and don't really understand how match capturing works. If I have a command like this for capturing two words, sed 's/\([a-z]*\).*\([a-z]*\).*/\1 \2/', why isn't the second word captured?
Edit1: Let's say I have this string: "the brown fox jumps over the lazy dog". I want sed to match "the brown", but it only matches the first word.
(Quoting Sundeep, just to make a Q/A pair.)
replace the dot in .* with space character...
sed 's/\([a-z]*\) *\([a-z]*\).*/\1 \2/'
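The reason the original attempt fails is that the first .* is greedy: it swallows the rest of the line, so the second group can only match the empty string. A minimal sketch comparing the two commands on the string from the question (the variable names are made up for illustration):

```shell
s='the brown fox jumps over the lazy dog'

# Original attempt: the first .* consumes everything after "the",
# leaving the second group empty.
greedy=$(printf '%s\n' "$s" | sed 's/\([a-z]*\).*\([a-z]*\).*/\1 \2/')

# Suggested fix: allow only spaces between the two groups.
fixed=$(printf '%s\n' "$s" | sed 's/\([a-z]*\) *\([a-z]*\).*/\1 \2/')

printf '[%s]\n' "$greedy"   # prints: [the ]
printf '[%s]\n' "$fixed"    # prints: [the brown]
```

The brackets in the output make the trailing space in the first result visible.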

How can I cut out a section of a string using Perl?

I need to cut some characters out of the middle of a string; the starting and ending positions of the character sequence to be cut will vary.
For example, say I have the sentence
The quick brown fox jumped over the lazy dog
I need to count forward from the first character until I get to fox, assign the character position of the f to a variable, continue counting forward until I get to 'the' and then cut out the characters between and including the initial f and the final e.
Note
There is an e in jumped, which lies between fox and the. It should be ignored; the cut must end at the e in the.
To remove a section of a string where you're not sure of all the intervening characters, you can use the substitution operator. If there's a match, the position of the beginning of the match (zero-indexed) is stored in $-[0] (or $LAST_MATCH_START[0] if you use English;):
use strict;
use warnings;
use 5.010;
my $string = 'The quick brown fox jumped over the lazy dog';
$string =~ s/fox.*the//;
say "Matched at char $-[0]" if defined $-[0];
say "New string: $string";
Output:
Matched at char 16
New string: The quick brown lazy dog
Which "the"?
Note that the regex I used is greedy, so it will gobble up every the until the last. For the string:
The quick brown fox jumped over the lazy dog and the sleepy cat
you will get:
Matched at char 16
New string: The quick brown sleepy cat
To stop at the first occurrence of the, change the substitution to:
s/fox.*?the//;
Whole words only
Both of the regexes above will still match partial words. The string:
The quick brown foxhole jumped over their lazy dog
gives:
Matched at char 16
New string: The quick brown ir lazy dog
To only match whole words* change the substitution to:
s/(?:^|\s+)\Kfox\s+.*\s+the(?=\s+|\z)//; # greedy
or
s/(?:^|\s+)\Kfox\s+.*?\s+the(?=\s+|\z)//; # non-greedy
* It's hard to define what counts as a whole word in an English sentence. The above expects a word to be surrounded on both sides by one or more spaces or to be at the beginning or end of the string, which excludes things like in-the-know, but also excludes "fox" and the,. This is obviously not a great definition.
I have the sentence
The quick brown fox jumped over the lazy dog
I need to count forward from the first character until I get to 'fox', assign the character position of 'f' to a variable, continue counting forward until I get to 'the' and then cut out the characters, including and between, 'f' and 'e'.
I am quoting your problem description because it indicates the C mindset with which you are approaching Perl. At a slightly higher level than C, your problem is actually to cut out the words between "brown" and "lazy". Perl allows you to express this idea directly:
$ perl -wE 'say join(" ", (split /\s+(?:fox|the)\s+/, "The quick brown fox jumped over the lazy dog")[0, 2])'
The quick brown lazy dog
Or, using the range operator:
$ perl -wE 'say join " ", grep !(/^fox$/ .. /^the$/), split " ", "The quick brown fox jumped over the lazy dog"'
The quick brown lazy dog
which literally reads "take all the words not between 'fox' and 'the', join them together using a single space as the word separator, and print the resulting sentence."
If the original sentence has many, many words, the first one might be more efficient as it will only ever create a three element list.
You can read more about the range operator in perldoc perlop. Since you are just beginning to learn Perl, you should read everything mentioned in perldoc perltoc at least once, including all the FAQ sections.

Convert multiple .txt files into a single file?

I have a lot of folders.
In each folder there are approximately 1 to 20 .txt files.
Each .txt file (unique name) contains a title (line 1) followed by HTML-formatted text (line 2).
Example (1) on how a .txt looks inside:
Frankfurter tail turkey doner
<p>Bacon ipsum dolor sit amet turkey sausage brisket pork.</p><p>Tongue swine turducken capicola shoulder hamburger pig.<p/><p>Ball tip jerky ham, doner filet mignon ham hock bresaola jowl andouille pig cow</p>
Example (2) on how a .txt looks inside:
Batman
<p>You either die a hero or you live long enough to see yourself become the villain.</p>
Simply put, I want to merge the contents of the .txt files into a single file that contains one file's content per row.
Each line shall also be wrapped in quotation marks, with a comma separating them.
From the example above, the output file should look like this:
"Frankfurter tail turkey doner","<p>Bacon ipsum dolor sit amet turkey sausage brisket pork.</p><p>Tongue swine turducken capicola shoulder hamburger pig.<p/><p>Ball tip jerky ham, doner filet mignon ham hock bresaola jowl andouille pig cow</p>"
"Batman","<p>You either die a hero or you live long enough to see yourself become the villain.</p>"
So, my question is simply: what is the easiest and fastest way to do this?
I am doing this by hand right now and it's pretty slow; copy-pasting at this volume makes my brain swell.
Edit1: Been doing some light research;
PowerShell, VBA and .BAT files seem like options, but I still haven't found anything that works.
I don't want the location of the input or output files specified in the code; the launch file for the solution should work when placed in any folder, and the output file should be generated there as well.
Try #1:
Created a Windows Batch Files (.bat) containing this:
for %f in (*.txt) do type "%f" >> combined.txt
Placed in a folder with a dozen .txt-files, but the console just opens and closes. No file created!
Edit2: Now we are cooking!
This:
for %%f in (*.txt) do type "%%f" >> combined.txt
Gives output:
Batman
<p>You either die a hero or you live long enough to see yourself become the villain.</p>
Frankfurter tail turkey doner
<p>Bacon ipsum dolor sit amet turkey sausage brisket pork.</p><p>Tongue swine turducken capicola shoulder hamburger pig.<p/><p>Ball tip jerky ham, doner filet mignon ham hock bresaola jowl andouille pig cow</p>
This is very close to what I want!
What remains unsolved is adding the quotation marks and replacing the line break with a comma.
Best regards,
Lui Kang
Got external help, this works perfectly.
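@echo off
cls
setlocal
set "combined=combined.txt"
(
    for %%a in (*.txt) do (
        rem Skip the output file itself, since it also matches the wildcard.
        if not "%%a" == "%combined%" (
            rem Print the file name to stderr so it stays out of the redirected output.
            echo %%a 1>&2
            set "firstLine=true"
            rem findstr /n prefixes each line with its number; the second token holds the text.
            for /f "tokens=1,* delims=: " %%b in ('findstr /n "^" "%%a"') do (
                if defined firstLine (
                    set /p =""%%c",""
                    set "firstLine="
                ) else if not "%%c" == "" (
                    set /p =%%c
                )
            )
            echo "
        )
    )
) > %combined% <nul
endlocal
goto:eof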