Find and replace text in a 47GB large file [closed] - command-line

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
I have to do some find-and-replace tasks on a rather big file, about 47 GB in size.
Does anybody know how to do this? I tried tools like TextCrawler, EditPad Lite and more, but nothing supports a file this large.
I'm assuming this can be done via the command line.
Do you have an idea how this can be accomplished?

Sed (stream editor for filtering and transforming text) is your friend.
sed -i 's/old text/new text/g' file
Sed performs text transformations in a single pass.

I use FART - Find And Replace Text by Lionello Lunesu.
It works very well on Windows 7 x64.
You can find and replace the text using this command:
fart -c big_filename.txt "find_this_text" "replace_to_this"

On Unix or Mac:
sed 's/oldstring/newstring/g' oldfile.txt > newfile.txt
fast and easy...
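A minimal sketch of the same streaming approach wrapped in a function, with a quick sanity check afterwards. The function name replace_stream and its arguments are hypothetical; it assumes OLD and NEW contain no / characters and that OLD is meant as a basic regular expression. Since the result goes to a new file, you need roughly as much free disk space as the input occupies.

```shell
# replace_stream OLD NEW SRC DST   (hypothetical helper, not a standard tool)
# OLD is a basic regular expression, NEW is sed replacement text.
# Assumption: neither OLD nor NEW contains a '/' character.
replace_stream() {
    old=$1 new=$2 src=$3 dst=$4
    # sed reads SRC line by line, so memory use stays small whatever the file size
    sed "s/$old/$new/g" "$src" > "$dst" || return 1
    # Sanity check: count the lines of DST that still match OLD
    remaining=$(grep -c -- "$old" "$dst" || true)
    echo "lines still matching '$old': $remaining"
}
```

For example, replace_stream oldstring newstring oldfile.txt newfile.txt writes the converted copy and reports how many lines still contain the old text.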

I solved the problem by first using split to break the large file into smaller files of 100 MB each.
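The split approach could be sketched roughly as follows. Note one deliberate difference: this sketch splits by line count rather than by size, so that no match is cut in half at a chunk boundary. The function name chunked_replace and the default chunk size are assumptions, and OLD and NEW must not contain / characters.

```shell
# chunked_replace OLD NEW SRC DST [LINES_PER_CHUNK]   (hypothetical helper)
chunked_replace() {
    old=$1 new=$2 src=$3 dst=$4 lines=${5:-1000000}
    workdir=$(mktemp -d) || return 1
    # Split on line boundaries; -a 4 allows far more chunks than the default suffix length
    split -a 4 -l "$lines" "$src" "$workdir/chunk." || return 1
    : > "$dst"
    # The glob expands in sorted order, which matches the order split wrote the chunks
    for chunk in "$workdir"/chunk.*; do
        [ -e "$chunk" ] || continue          # empty input: nothing was split off
        sed "s/$old/$new/g" "$chunk" >> "$dst" || return 1
        rm -f "$chunk"                       # free disk space as we go
    done
    rmdir "$workdir"
}
```

Because sed already streams its input, splitting is mainly useful when the chunks are to be processed by a tool that cannot handle the whole file at once.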

If you are using a Unix-like system, you can use cat | sed to do this:
cat hosted_domains.txt | sed 's/com/net/g'
This example replaces com with net in a list of domain names; you can then redirect the output to a file.

For me none of the tools suggested here work well. Textcrawler ate all my computer's memory, SED didn't work at all, Editpad complained about memory...
The solution is: create your own script in python, perl or even C++.
Or use the tool PowerGrep, this is the easiest and fastest option.
I haven't tried fart; it's command-line only and maybe not very user-friendly.
Some hex editors, such as UltraEdit, also work well.

I used
sed 's/[nN]//g' oldfile.fasta > newfile.fasta
to remove all instances of n and N from my 7 GB file.
If I omitted the > newfile.fasta part, it took ages, because it scrolled every line of the file up the screen.
With the > newfile.fasta redirect, it ran in a matter of seconds on an Ubuntu server.


How to reset my PATH after breaking it accidentally? [closed]

Closed 1 year ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
I think I ran something incorrectly while trying to add a directory to PATH in fish. Perhaps it was this:
set -g PATH my_foobar_directory "$PATH"
From the fish tutorial I now understand that I shouldn't have added the double quotes.
Better yet, I should've used fish_add_path my_foobar_directory.
Lesson learned; however, the change has persisted somewhere, and nothing I try seems to recover the previous state. I also cannot find the previous PATH value — the console logs with it were washed away by copious fish: Unknown command: python etc, from fish_prompt bells & whistles.
Falling back to bash gives me bogus PATH as well — even after set -e PATH.
What do? How do I start over?
So for myself, I solved it like this.
In the process tree, I found a sufficiently long-running process. In my case, cinnamon-session worked — though any not-so-distant fish parent would do.
The idea being that in that process's environment, the previous PATH value could still be intact. It was.
Then basically — let's say the pid was 661 — print environment of pid 661 in fish format:
/bin/tr '\0' ' ' < /proc/661/environ
# copy output
Then just pick that output, and feed it into the "universal" variant (fish-specific) of the PATH variable, taking care to erase all other variants:
set -e PATH
set -eg PATH
set -Ux PATH <paste>
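The tr output above contains the whole environment of that process, not just PATH. A small sketch (POSIX sh rather than fish, so run it from bash or sh) that pulls out only the PATH value; the function name path_from_environ is hypothetical, and the /proc layout is Linux-specific:

```shell
# path_from_environ FILE   (hypothetical helper)
# FILE is a NUL-separated environment dump such as /proc/661/environ.
path_from_environ() {
    # Turn the NUL separators into newlines, then print only the value of PATH
    tr '\0' '\n' < "$1" | sed -n 's/^PATH=//p'
}
```

Calling path_from_environ /proc/661/environ prints the colon-separated PATH of that process, which is the part to paste back.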

Double exclamation in fish shell [closed]

Closed 1 year ago.
In zsh, I can execute them.
$ sleep 1
$ echo !$ # !$ equals 1
$ echo !! # !! equals sleep 1
But I can't execute them in fish shell.
Could you tell me why, and where the zsh documentation for this is?
This is history expansion, which has a lot more to it than those simple examples.
Fish supports none of it (and probably never will). The usual workaround is to use keybindings. By default, alt-up and alt-down should go through the history token-wise, so you can press alt-up once to get what is effectively !$.
If you wish to prepend something to a command from history, recall that command, go to the beginning (e.g. with ctrl-a) and insert what you want.
Other possibilities are to bind e.g. !! to a function that inserts the previous command, or to define a command called !!.
This is still being discussed in fish issue #288, though consensus seems to be against adding history expansion.

Why is case insensitive search and replace not working in Solaris sed? [closed]

Closed 9 years ago.
I tried the following commands in Solaris sed for case insensitive find and replace
sed s/TOFIND/REPLACE/gi fileName
sed s/TOFIND/REPLACE/gi fileName
/usr/xpg4/bin/sed s/TOFIND/REPLACE/gi fileName
/usr/xpg4/bin/sed s/TOFIND/REPLACE/gi fileName
but none of them worked. I got a "command garbled" error for each. Is there no support for case-insensitive search in Solaris sed?
The i flag is a non-standard GNU sed extension.
You can use GNU sed if installed. It might be in /usr/sfw/bin/gsed or /usr/gnu/bin/sed depending on the version.
Otherwise, the standard way is
sed 's/[Tt][Oo][Ff][Ii][Nn][Dd]/REPLACE/g' fileName
You can automate building that pattern as follows (note that \U and \L are GNU extensions as well, so the inner sed has to be GNU sed):
pattern="tofind"
sed "s/$(printf "%s" "$pattern"|sed 's/./\[\U&\L&\]/g')/REPLACE/g" fileName
Another alternative is to replace each alphabetic character of the search pattern with its bracketed equivalent, so that this becomes [tT][hH][iI][sS] (using a preliminary sed/awk pass over the pattern to keep it generic):
printf "%s\n" "SearchPattern" | sed 's/[aA]/[aA]/g;s/[bB]/[bB]/g; ..... ;s/[zZ]/[zZ]/g' | read -r CaseSearchPattern
/usr/xpg4/bin/sed "s/${CaseSearchPattern}/REPLACE/g" fileName
You may need an additional check (and corrective action) if the pattern contains special characters such as \, because the double quotes around the sed expression let the shell interpret them.
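As a sketch of the same idea without any GNU extensions, the bracket pattern can be built with awk's toupper and tolower. The function name ci_pattern is hypothetical; it assumes an awk that provides toupper/tolower (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk), and it passes non-letters through unescaped, so regex metacharacters in the pattern keep their special meaning.

```shell
# ci_pattern WORD   (hypothetical helper)
# Prints WORD with every letter replaced by a two-character bracket expression,
# e.g. t becomes [Tt]. Non-letters are copied through unchanged (not escaped).
ci_pattern() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            u = toupper(c); l = tolower(c)
            if (u != l) out = out "[" u l "]"   # a letter: match either case
            else        out = out c             # anything else: copy as-is
        }
        print out
    }'
}
```

It would then be used as sed "s/$(ci_pattern tofind)/REPLACE/g" fileName.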

What is a good way to count source lines of code (SLOC) in a CoffeeScript project? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Is there a common way of counting source lines of code (SLOC) in a CoffeeScript project?
I'm hoping for something that will traverse all of the directories in my project during the count. I found a few projects online but they seemed kind of overkill for the task. I would love a simple utility or even just some command-line-fu.
If you're on UNIX, I would go with the wc tool. I usually use wc -l *.coffee */*.coffee etc. because it is easy to remember. However, a recursive version would be
wc -l `find <proj-dir> -type f | grep '\.coffee$'`
This runs find, which recursively lists regular files (-type f); grep filters that list down to just the CoffeeScript files, and the result is passed as command-line arguments to wc (-l requests a line count).
Edit: Now we don't want to count blank or comment lines (we're only catching single-line comments here). We lose the per-file counts, but here goes:
cat `find <proj-dir> -type f | grep '\.coffee$'` | sed '/^\s*#/d;/^\s*$/d' | wc -l
We find the CoffeeScript files and cat them. Then sed deletes lines that consist only of whitespace or that start with optional whitespace followed by a #. Finally, our friend wc counts the remaining lines.
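A variant of the same count, sketched as a function that also copes with spaces in file names and avoids the GNU-only \s. The name coffee_sloc is hypothetical, and like the command above it only recognises single-line # comments, so the contents of ### block comments are still counted.

```shell
# coffee_sloc DIR   (hypothetical helper)
# Prints the number of lines in *.coffee files under DIR that are neither
# blank nor single-line comments (first non-blank character is #).
coffee_sloc() {
    # find hands the file names straight to cat, so odd file names are safe
    find "$1" -type f -name '*.coffee' -exec cat {} + |
        # -v inverts the match (keep code lines), -c prints only the count
        grep -Evc '^[[:space:]]*(#|$)'
}
```

For example, coffee_sloc . counts the CoffeeScript source lines below the current directory.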
This will do what you want: https://github.com/blackducksw/ohcount
It correctly excludes comments and blank lines and also supports many other languages.

Syntax Highlighting Pager [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Right now, I use most as my pager. While it has helpful syntax highlighting for man pages, it lacks colored syntax highlighting for anything else (I'm specifically looking for diff/C++).
Meanwhile, pygments is a wonderful program. I can easily make colorized output with it:
# ./pygmentize -f console256 ${file}
hg diff | ./pygmentize -f console256 -l diff
Now, I would like to be able to page the output, so I just use:
# ./pygmentize -f console256 ${file} | most
hg diff | ./pygmentize -f console256 -l diff | most
At this point, most dumps all the colorizing control characters to my screen like so:
^[[38;5;28;01mclass^[[39;00m ^[[38;5;21;01mheap_allocator^[[39;00m
{
^[[38;5;28;01mpublic^[[39;00m^[[38;5;241m:^[[39m
This is, of course, unreadable. I looked through the man page for most, but I couldn't find any "Hey, show those control characters as colors instead of printing them" options. less has the same garbage behavior as most, but more shows the colors perfectly fine, with the obvious limitations of being more.
Is there a pager that supports syntax highlighting or some crazy combination of parameters and programs I can string together to make this work? Ultimately, I would like to get diffs and logs from Mercurial to be highlighted, so maybe there is a shortcut in there...
Might I suggest vimpager?
First off, recent vim distributions (I believe 6.0 and above) come with a pager-esque-mode script. It's quite simple and functional, and operates similarly to less. Try: vim '+help less' +only.
Even better, however, Rafael Kitover has written a much more robust and powerful script called vimpager. It's available on GitHub (or vimscripts). If you are on OS X and using Homebrew, it's as easy as brew install vimpager.
At that point, you can simply set PAGER=vimpager (for example with export PAGER=vimpager), or even alias less=vimpager. It works excellently.
less -R shows ANSI color sequences as-is (instead of expanding to caret notation). That'll make syntax highlighting work!
You can also create an environment variable LESS=-R to make this the default behavior. Similarly for other options; see man less.
Take a look at bat: A cat(1) clone with wings.
bat supports syntax highlighting for a large number of programming and markup languages.
It is not a pager, but it automatically redirects output to less if needed.
You might try using jed. Yes, it's a text editor, not a pager, but it's quite lightweight and the default install contains excellent colorschemes for a wide variety of file types and languages.
Jed has syntax highlighting modes for different languages, similar to the Emacs ones. For example, if you piped a C program to it, you can turn on the highlighting by pressing 'ESC', then 'x', then typing 'c-mode'. If it is a PHP program, change the last part to 'php-mode', and so on.