Remove only single spaces in text file with sed, perl, awk, tr or anything

I have a rather large text file where there is an extra space between every character:
I t l o o k s l i k e t h i s .
I'd like to remove those extra spaces so that it reads
It looks like this.
via the Linux terminal.
I can't seem to find any way to do this without removing all of the whitespace. I'm willing to try any solution at this point. I'd appreciate any nudge in the right direction.

$ echo 'I t l o o k s l i k e t h i s . ' | sed 's/\(.\) /\1/g'
It looks like this.

Are you certain that the intermediate characters are spaces? It is most likely that this is a UTF-16 file.
I suggest you use a capable editor to open it as such and convert it to UTF-8.
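If the file does turn out to be UTF-16, the conversion can also be scripted instead of done in an editor. A minimal Python 3 sketch, assuming the file starts with a byte-order mark (or is in the machine's native byte order); the function name and the paths in the usage line are placeholders, not anything from the question:

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8."""
    # The "utf-16" codec uses the byte-order mark to pick the byte order.
    # newline="" keeps the line endings exactly as they are in the source.
    with open(src_path, encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Usage would look like utf16_to_utf8("/path/to/file.txt", "/path/to/new.txt"). If decoding fails, the file is probably not UTF-16 after all and the spaces are real.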

An awk solution (needs an awk that splits the line into single characters when FS is empty, such as gawk):
echo "I t l o o k s l i k e t h i s ." | awk '{for (i=1;i<=NF;i+=2) printf "%s", $i; print ""}' FS=""
It looks like this.

As long as it's every other character you want to get rid of, you can use Python.
>>> s = "I t l o o k s l i k e t h i s ."
>>> print(s[0::2])
It looks like this.
If you want to do this for the text file, do the following:
with open("/path/to/file.txt") as f:
    lines = f.readlines()
# open the output file for writing ("w"); the default mode is read-only
with open("/path/to/new.txt", "w") as g:
    for line in lines:
        # strip the newline first so the slice only sees the text
        g.write(line.rstrip("\n")[0::2] + "\n")

perl -pe 's|( +)| " " x (length($1) > 1) |ge' file

How to create wordlist with custom pattern

I am a newbie. I need to create a wordlist with a specified pattern. The pattern looks like XXXXX00000, where the X are 5 lowercase English letters (they may differ or repeat) and 00000 are 5 digits (0-9). There will be no special characters such as &, $, _ or -.
Can someone help me?
It would be nice if someone could post a terminal command, for example using crunch.
Thank you.
Examples:
aklmj98765
kjgfk11137
hhhhd00110
I made a program for you. Save this as anyname.py, run it, and copy the output.
# lowercase only, as the question asks
letters = "abcdefghijklmnopqrstuvwxyz"
numbers = "0123456789"
for a in range(len(letters)):
    for b in range(len(letters)):
        for c in range(len(letters)):
            for d in range(len(letters)):
                for e in range(len(letters)):
                    for f in range(len(numbers)):
                        for g in range(len(numbers)):
                            for h in range(len(numbers)):
                                for i in range(len(numbers)):
                                    for j in range(len(numbers)):
                                        print(letters[a] + letters[b] + letters[c] + letters[d] + letters[e] + numbers[f] + numbers[g] + numbers[h] + numbers[i] + numbers[j])
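The ten nested loops can also be written with itertools.product. This is only a sketch of the same idea, assuming lowercase letters and digits as in the question; the function name and its parameters are made up for illustration:

```python
import itertools
import string


def wordlist(letters=string.ascii_lowercase, digits=string.digits,
             n_letters=5, n_digits=5):
    """Yield every word made of n_letters letters followed by n_digits digits."""
    for head in itertools.product(letters, repeat=n_letters):
        prefix = "".join(head)
        for tail in itertools.product(digits, repeat=n_digits):
            # words are produced lazily, one at a time
            yield prefix + "".join(tail)
```

To dump the list, loop over wordlist() and print each word. Keep in mind that the full pattern has 26^5 x 10^5 (roughly 1.19 x 10^12) entries, so the output will be enormous whichever tool generates it. A small alphabet is handy for a quick check: list(wordlist("ab", "01", 1, 1)) gives ['a0', 'a1', 'b0', 'b1'].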
With crunch (in a -t pattern, @ stands for a lowercase letter and % for a digit):
crunch 10 10 -t @@@@@%%%%% -o result.txt
github.com/adaywithtape/wlm does just about anything you want and is easy to use. I answered in the wrong place; see the description below.

use sed to change a text report to csv

I have a report that looks like this:
par_a
  .xx
  .yy
par_b
  .zz
  .tt
I'd like to convert this into CSV format, as below, using a sed one-liner:
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
Please help.
With awk:
awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}' File
Or to make it more generic:
awk '/^[^[:blank:]]/{v=$0;next} /^[[:blank:]]/{$0=v","$1;print}' File
When a line starts with par_, save the content to variable v. Now, when a line starts with space, change the line to content of v followed by , followed by the first field.
Output:
AMD$ awk '/^par_/{v=$0}/^ /{$0=v","$1;print}' File
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
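For comparison, the same remember-the-header logic can be sketched in Python. This assumes, like the generic awk version, that header lines start in the first column and detail lines start with a space or tab; the function name is hypothetical:

```python
def report_to_csv(lines):
    """Pair each indented line with the most recent unindented header line."""
    header = ""            # plays the role of the awk variable v
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue       # ignore blank lines
        if line[0] in " \t":
            # indented line: emit "header,first field", as the awk script does
            rows.append(header + "," + line.split()[0])
        else:
            header = line  # new header line: remember it, print nothing
    return rows
```

Calling print("\n".join(report_to_csv(open("File")))) would then print the four CSV lines shown above for the sample report.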
With sed:
sed '/^par_/ { h; d; }; G; s/^[[:space:]]*//; s/\(.*\)\n\(.*\)/\2,\1/' filename
This works as follows:
/^par_/ {                  # if a new paragraph begins
  h                        # remember it
  d                        # but don't print anything yet
}
                           # otherwise:
G                          # fetch the remembered paragraph line to the pattern space
s/^[[:space:]]*//          # remove leading whitespace
s/\(.*\)\n\(.*\)/\2,\1/    # rearrange to desired CSV format
Depending on your actual input data, you may want to replace the /^par_/ with, say, /^[^[:space:]]/. It just has to be a pattern that recognizes the beginning line of a paragraph.
Addendum: Shorter version that avoids regex repetition when using the space pattern to recognize paragraphs:
sed -r '/^\s+/! { h; d; }; s///; G; s/(.*)\n(.*)/\2,\1/' filename
Or, if you have to use BSD sed (as comes with Mac OS X):
sed '/^[[:space:]]\{1,\}/! { h; d; }; s///; G; s/\(.*\)\n\(.*\)/\2,\1/' filename
The latter should be portable to all seds, but as you can see, writing portable sed involves some pain.

How can I use sed to convert $$ blah $$ in TeX to \begin{equation} blah \end{equation}

I have files with entries of the form:
$$
y = x^2
$$
I'm looking for a way (specifically using sed) to convert them to:
\begin{equation}
y = x^2
\end{equation}
The solution should not rely on the form of the equation (which may also span multiple lines) nor on the text preceding the opening $$ or following the closing $$.
Thanks for the help.
sed '
/^\$\$$/ {
    x
    s/begin/&/
    t use_end_tag
    s/^.*$/\\begin{equation}/
    h
    b
    : use_end_tag
    s/^.*$/\\end{equation}/
    h
}
'
Explanation:
sed maintains two buffers: the pattern space (pspace) and the hold space (hspace). It operates in cycles, where during each cycle it reads a line and executes the script for that line. pspace is usually auto-printed at the end of each cycle (unless the -n option is used), and then deleted before the next cycle. hspace holds its contents between cycles.
The idea of the script is that whenever $$ is seen, hspace is first checked to see if it contains the word "begin". If it does, then substitute the end tag; otherwise substitute the begin tag. In either case, store the substituted tag in the hold space so it can be checked next time.
sed '
/^\$\$$/ {                       # if line contains only $$
    x                            # exchange pspace and hspace
    s/begin/&/                   # see if "begin" was in hspace
    t use_end_tag                # if it was, goto use_end_tag
    s/^.*$/\\begin{equation}/    # replace pspace with \begin{equation}
    h                            # set hspace to contents of pspace
    b                            # start next cycle after auto-printing
    : use_end_tag
    s/^.*$/\\end{equation}/      # replace pspace with \end{equation}
    h                            # set hspace to contents of pspace
}
'
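The hold space in this script is effectively a one-bit flag recording whether the last tag written was the begin tag. A rough Python equivalent of that toggle, for illustration only (the function name is made up, and it assumes, as the sed script does, that each marker is a line containing nothing but $$):

```python
def convert_dollars(lines):
    """Replace lines consisting only of $$ with alternating begin/end tags."""
    inside = False   # stands in for the hold space: True after a begin tag
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "$$":
            out.append("\\end{equation}" if inside else "\\begin{equation}")
            inside = not inside   # remember which tag comes next
        else:
            out.append(line)      # every other line passes through unchanged
    return out
```

Because the flag flips on every marker, any number of equations in the same file is handled, regardless of what the equation lines themselves contain.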
This might work for you (GNU sed):
sed -r '1{x;s/^/\\begin{equation}\n\\end{equation}/;x};/\$\$/{g;P;s/(.*)\n(.*)/\2\n\1/;h;d}' file
Prime the hold space with the required strings. On encountering the marker print the first line and then swap the strings in anticipation of the next marker.
I can not help you with sed, but this awk should do:
awk '/\$\$/ && !f {$0="\\begin{equation}";f=1} /\$\$/ && f {$0="\\end{equation}";f=0}1' file
\begin{equation}
y = x^2
\end{equation}
The f=0 is not needed if the $$ ... $$ block is not repeated.

remove all words containing backslash

I've been trying so many different variations to get this right.
I am simply looking to use sed to remove all words beginning with or containing a backslash.
So the string
another test \/ \u7896 \n test ha\ppy
would become
another test test
I've tried so many different options, but it doesn't seem to want to work. Does anybody have an idea how to do this?
And before everyone starts giving me minus 1 for this question, believe me, I have tried to find the answer.
You could use str.split and a list comprehension (note the raw string, so that \u7896 and \n stay literal backslash sequences):
>>> strs = r"another test \/ \u7896 \n test ha\ppy"
>>> [x for x in strs.split() if '\\' not in x]
['another', 'test', 'test']
# use str.join to join the list
>>> ' '.join([x for x in strs.split() if '\\' not in x])
'another test test'
$ echo "another test \/ \u7896 \n test ha\ppy" | sed -r 's/\S*\\\S*//g' | tr -s '[:blank:]'
another test test
This might work for you (GNU sed):
sed 's/\s*\S*\\\S*//g' file
string = r"another test \/ \u7896 \n test ha\ppy"  # raw string keeps the backslashes literal
string_no_slashes = " ".join([x for x in string.split() if "\\" not in x])

Code Golf - Word Scrambler

Please answer with the shortest possible source code for a program that converts an arbitrary plaintext to its corresponding ciphertext, following the sample input and output I have given below. Bonus points* for the least CPU time or the least amount of memory used.
Example 1:
Plaintext: The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!
Ciphertext: eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
Example 2:
Plaintext: 123 1234 12345 123456 1234567 12345678 123456789
Ciphertext: 312 4213 53124 642135 7531246 86421357 975312468
Rules:
Punctuation is defined to be included with the word it is closest to.
The center of a word is defined to be ceiling((strlen(word)+1)/2).
Whitespace is ignored (or collapsed).
Odd words move to the right first. Even words move to the left first.
You can think of it as reading every other character backwards (starting from the end of the word), followed by the remaining characters forwards. Corporation => XoXpXrXtXoX => niaorCoprto.
Thank you to those who pointed out the inconsistency in my description. This has led many of you down the wrong path, for which I apologize. Rule #4 should clear things up.
*Bonus points will only be awarded if Jeff Atwood decides to do so. Since I haven't checked with him, the chances are slim. Sorry.
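For reference, the "every other character backwards, then the remaining characters forwards" reading of the rules can be sketched ungolfed in Python. The function names are arbitrary and this is just one way to spell the rule, not a golf entry:

```python
def scramble_word(word):
    # every other character, read backwards starting from the end of the word
    backwards = word[::-2]
    # then the characters that were skipped, read forwards
    forwards = word[len(word) % 2::2]
    return backwards + forwards


def scramble(text):
    # split() with no argument collapses runs of whitespace, as the rules allow
    return " ".join(scramble_word(w) for w in text.split())
```

With this, scramble("Corporation") gives "niaorCoprto", and both sample plaintexts map to the ciphertexts listed in the examples.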
Python, 50 characters
For input in i:
' '.join(x[::-2]+x[len(x)%2::2]for x in i.split())
Alternate version that handles its own IO:
print ' '.join(x[::-2]+x[len(x)%2::2]for x in raw_input().split())
A total of 66 characters if including whitespace. (Technically, the print could be omitted if running from a command line, since the evaluated value of the code is displayed as output by default.)
Alternate version using reduce:
' '.join(reduce(lambda x,y:y+x[::-1],x) for x in i.split())
59 characters.
Original version (both even and odd go right first) for an input in i:
' '.join(x[::2][::-1]+x[1::2]for x in i.split())
48 characters including whitespace.
Another alternate version which (while slightly longer) is slightly more efficient:
' '.join(x[len(x)%2-2::-2]+x[1::2]for x in i.split())
(53 characters)
J, 58 characters
>,&.>/({~(,~(>:#+:#i.#-#<.,+:#i.#>.)#-:)#<:##)&.><;.2,&' '
Haskell, 64 characters
unwords.map(map snd.sort.zip(zipWith(*)[0..]$cycle[-1,1])).words
Well, okay, 76 if you add in the requisite "import List".
Python - 69 chars
(including whitespace and linebreaks)
This handles all I/O.
for w in raw_input().split():
 o=""
 for c in w:o=c+o[::-1]
 print o,
Perl, 78 characters
For input in $_. If that's not acceptable, add six characters for either $_=<>; or $_=$s; at the beginning. The newline is for readability only.
for(split){$i=length;print substr$_,$i--,1,''while$i-->0;
print"$_ ";}print $/
C, 140 characters
Nicely formatted:
main(c, v)
    char **v;
{
    for( ; *++v; )
    {
        char *e = *v + strlen(*v), *x;
        for(x = e-1; x >= *v; x -= 2)
            putchar(*x);
        for(x = *v + (x < *v-1); x < e; x += 2)
            putchar(*x);
        putchar(' ');
    }
}
Compressed:
main(c,v)char**v;{for(;*++v;){char*e=*v+strlen(*v),*x;for(x=e-1;x>=*v;x-=2)putchar(*x);for(x=*v+(x<*v-1);x<e;x+=2)putchar(*x);putchar(32);}}
Lua
130 char function, 147 char functioning program
Lua doesn't get enough love in code golf -- maybe because it's hard to write a short program when you have long keywords like function/end, if/then/end, etc.
First I write the function in a verbose manner with explanations, then I rewrite it as a compressed, standalone function, then I call that function on the single argument specified at the command line.
I had to format the code with <pre></pre> tags because Markdown does a horrible job of formatting Lua.
Technically you could get a smaller running program by inlining the function, but it's more modular this way :)
t = "The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!"
T = t:gsub("%S+", -- for each word in t...
  function(w) -- argument: current word in t
    W = "" -- initialize new Word
    for i = 1,#w do -- iterate over each character in word
      c = w:sub(i,i) -- extract current character
      -- determine whether letter goes on right or left end
      W = (#w % 2 ~= i % 2) and W .. c or c .. W
    end
    return W -- swap word in t with inverted Word
  end)
-- code-golf unit test
assert(T == "eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos")
-- need to assign to a variable and return it,
-- because gsub returns a pair and we only want the first element
f=function(s)c=s:gsub("%S+",function(w)W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end return W end)return c end
--       1         2         3         4         5         6         7         8         9        10        11        12        13
--34567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
-- 130 chars, compressed and written as a proper function
print(f(arg[1]))
--34567890123456
-- 16 (+1 whitespace needed) chars to make it a functioning Lua program,
-- operating on command line argument
Output:
$ lua insideout.lua 'The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!'
eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
I'm still pretty new at Lua so I'd like to see a shorter solution if there is one.
For a minimal cipher that writes all args to stdout, we can do 111 chars:
for _,w in ipairs(arg)do W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end io.write(W ..' ')end
But this approach does output a trailing space like some of the other solutions.
For an input in s:
f=lambda t,r="":t and f(t[1:],len(t)&1and t[0]+r or r+t[0])or r
" ".join(map(f,s.split()))
Python, 90 characters including whitespace.
TCL
125 characters
set s set f foreach l {}
$f w [gets stdin] {$s r {}
$f c [split $w {}] {$s r $c[string reverse $r]}
$s l "$l $r"}
puts $l
Bash - 133, assuming input is in $w variable
Pretty
for x in $w; do
    z="";
    for l in `echo $x|sed 's/\(.\)/ \1/g'`; do
        if ((${#z}%2)); then
            z=$z$l;
        else
            z=$l$z;
        fi;
    done;
    echo -n "$z ";
done;
echo
Compressed
for x in $w;do z="";for l in `echo $x|sed 's/\(.\)/ \1/g'`;do if ((${#z}%2));then z=$z$l;else z=$l$z;fi;done;echo -n "$z ";done;echo
Ok, so it outputs a trailing space.