Sed operations only work with smaller files - sed

OS: Ubuntu 14.04
I have 12 large json files (2-4 gb each) that I want to perform different operations on. I want to remove the first line, find "}," and replace it with "}" and remove all "]".
I am using sed to do the operations and my command is:
sed -i.bak -e '1d' -e 's/},/}/g' -e '/]/d' file.json
When I run the command on a small file (12.7 KB) it works fine: file.json contains the content with the changes and file.json.bak contains the original content.
But when I run the command on my larger files, the original file is emptied, i.e. file.json is empty and file.json.bak contains the original content. The run time is also what I consider to be "too fast", about 2-3 seconds.
What am I doing wrong here?

Are you sure your input file contains newlines as recognized by the platform you are running your commands on? If it doesn't then deleting one line would delete the whole file. What does wc -l < file tell you?
If it's not that then you probably don't have enough file space to duplicate the file so sed is doing something internally like
mv file backup && sed '...' backup > file
but doesn't have space to create the new file after moving the original to backup. Check your available file space and if you don't have enough and can't get more then you'll need to do something like:
while [ -s oldfile ]
do
copy first N bytes of oldfile into tmpfile &&
remove first N bytes from oldfile using real inplace editing &&
sed 'script' tmpfile >> newfile &&
rm -f tmpfile
done
mv newfile oldfile
See https://stackoverflow.com/a/17331179/1745001 for how to remove the first N bytes inplace from a file. Pick the largest value for N that does fit in your available space.
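Both checks suggested above can be scripted before touching a multi-gigabyte file. The following is only a sketch and not part of the original answer: the function name check_inplace_edit is made up for illustration, and it assumes a df that supports the POSIX -P output format, where the free space is the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: report the two conditions discussed above
# for a file you intend to edit with sed -i.bak.
check_inplace_edit() {
    file=$1

    # Number of newline characters (tr strips padding some wc versions add).
    lines=$(wc -l < "$file" | tr -d ' ')

    # File size in KiB, rounded up.
    bytes=$(wc -c < "$file" | tr -d ' ')
    size_kb=$(( (bytes + 1023) / 1024 ))

    # Free space in KiB on the filesystem holding the file.
    # Assumes the filesystem name printed by df contains no spaces.
    free_kb=$(df -Pk "$(dirname "$file")" | awk 'NR == 2 { print $4 }')

    echo "lines: $lines"
    echo "size_kb: $size_kb"
    echo "free_kb: $free_kb"

    if [ "$lines" -le 1 ]; then
        echo "warning: fewer than 2 newlines; '1d' would delete most or all of the file"
    fi
    if [ "$free_kb" -lt "$size_kb" ]; then
        echo "warning: not enough free space for a second copy"
    fi
}
```

Calling check_inplace_edit file.json prints the three numbers and any warning that applies, without modifying the file.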

Related

sed creates empty file after replacement operation

sed -i '' 's/|/ /g' largefile.tsv > outfile.tsv
I've got a rather large 37 GB file in which I'm trying to replace '|' with '\t', but after running for a long time, sed only outputs an empty file (0 bytes).
I'm running on macOS. What am I missing?
With -i, the input file changes "in place", and there's no output to redirect to a file.
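If the goal is to keep the original and write the result to a second file, one option is to drop -i and redirect standard output. A minimal sketch, assuming a tab is the intended replacement character: it uses tr instead of sed because not every sed interprets \t in the replacement text (worth checking on macOS), and pipes_to_tabs is a made-up name.

```shell
#!/bin/sh
# Stream the input to a separate output file, turning every '|' into a tab.
# There is no in-place editing here: the input file is left untouched.
pipes_to_tabs() {
    tr '|' '\t' < "$1" > "$2"
}
```

For example, pipes_to_tabs largefile.tsv outfile.tsv. This needs enough free space for a second copy of the file.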

Using sed and mv to add characters to files

First off, I'd like to say that I know this is almost an exact duplicate of some posts that I've read, but have not had any luck with referencing.
I have 100+ files that all follow a very strict naming convention of 5_##_<name>.ext My issue was that when originally making these files I failed to realise that 5_100_ and above would mess up my ordering.
I am now trying to append a 0 in front of every number between 01 and 99. I've written a bash script using sed that works for the file contents (the file name is in the file as well):
#!/bin/bash
for fl in *.tcl; do
echo Filename: $fl
#sed -i 's/5_\(..\)_/5_0\1_/g' $fl
done
However, this only changes the contents and not the filename itself. I've read that mv is the solution (rename is simpler but I do not have it on my system). My current incarnation of my multiple attempts is:
mv "$fl" $(echo "$file" | sed -e 's/5_\(..\)_/5_0\1_/g') but it gives me an error: mv: missing destination file operand after <filename>
Again, I'm sorry about the duplicate but I wasn't able to solve my issue by reading it. I'm sure I'm just using the combination of mv and sed incorrectly.
Solution was entered in the comments. I was using $file instead of $fl.
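For reference, a corrected version of the rename loop might look like the sketch below. It is not the asker's final script: the function name rename_padded and the optional directory argument are assumptions added for illustration, and the glob is narrowed so that names already at 5_100_ and above are never touched.

```shell
#!/bin/sh
# Rename 5_NN_<name>.tcl to 5_0NN_<name>.tcl in the directory given as $1
# (default: current directory). rename_padded is a hypothetical name.
rename_padded() {
    dir=${1:-.}
    for fl in "$dir"/5_[0-9][0-9]_*.tcl; do
        [ -e "$fl" ] || continue      # the glob matched nothing
        base=${fl##*/}                # file name without the directory
        new=$(printf '%s\n' "$base" | sed 's/^5_\([0-9][0-9]\)_/5_0\1_/')
        mv -- "$fl" "$dir/$new"       # same variable on both sides of the pipeline
    done
}
```

Quoting "$fl" and the computed name keeps file names with spaces intact, and a second run finds nothing left to rename.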
Something like this might be useful:
for n in $(seq 99)
do
prefix2="5_$(printf "%02d" ${n})_"
prefix3="5_$(printf "%03d" ${n})_"
for f in ${prefix2}*.tcl
do
suffix="${f#${prefix2}}"
[[ -r "${prefix3}${suffix}" ]] || mv "${prefix2}${suffix}" "${prefix3}${suffix}"
done
done
Rather than processing every single file, it only looks at the ones that currently have a "5_XX_" prefix, and only renames them if the corresponding "5_XXX_" file doesn't already exist...
#!/bin/bash
for fl in *.tcl
do
NewName="$(echo "${fl}" | sed '/^5_[0-9]\{2\}_/ s/../&0/')"
#echo "Filename: ${fl} -> ${NewName}"
[ ! "${fl}" = "${NewName}" ] && mv "${fl}" "${NewName}"
done
With a bit of safeguarding, this can be run several times on the same folder: only the files that still need it are renamed.
On Linux, where the default sed is not strictly POSIX, use sed --posix instead of a plain sed call.

Grep data and output to file

I'm attempting to extract data from log files and organise it systematically. I have about 9 log files which are ~100mb each in size.
What I'm trying to do is: Extract multiple chunks from each log file, and for each chunk extracted, I would like to create a new file and save this extracted data to it. Each chunk has a clear start and end point.
Basically, I have made some progress and am able to extract the data I need, however, I've hit a wall in trying to figure out how to create a new file for each matched chunk.
I'm unable to use a programming language like Python or Perl, due to the constraints of my environment. So please excuse the messy command.
My command thus far:
find Logs\ 13Sept/Log_00000000*.log -type f -exec \
sed -n '/LRE Starting chunk/,/LRE Ending chunk/p' {} \; | \
grep -v -A1 -B1 "Starting chunk" > Logs\ 13Sept/Chunks/test.txt
The LRE Starting chunk and LRE Ending chunk are my boundaries. Right now my command works, but it saves all matched chunks to one file (whose size is becoming excessive).
How do I go about creating a new file for each match and adding the matched content to it? Keep in mind that each file could hold multiple chunks and is not limited to one chunk per file.
Probably need something more programmable than sed: I'm assuming awk is available.
awk '
/LRE Ending chunk/ {printing = 0}
printing {print > "chunk" n ".txt"}
/LRE Starting chunk/ {printing = 1; n++}
' *.log
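With many chunks, some awk implementations can run out of open file handles unless each output file is closed once its chunk ends. A variation on the script above, as a sketch only: the wrapper name extract_chunks, the outdir variable and the chunkNNNN.txt naming are assumptions, not part of the answer.

```shell
#!/bin/sh
# Write each chunk (the lines strictly between the markers) to its own
# file under the directory given as $1; remaining arguments are log files.
extract_chunks() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        # A chunk left open at the end of the previous file is closed here.
        FNR == 1 && printing  { close(out); printing = 0 }
        /LRE Ending chunk/    { if (printing) close(out); printing = 0 }
        printing              { print > out }
        /LRE Starting chunk/  { printing = 1
                                out = sprintf("%s/chunk%04d.txt", outdir, ++n) }
    ' "$@"
}
```

For the logs in the question this could be called as extract_chunks "Logs 13Sept/Chunks" "Logs 13Sept"/Log_00000000*.log. The chunk counter keeps running across input files, so names never collide.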
Try something like this:
find Logs\ 13Sept/Log_00000000*.log -type f -print | while IFS= read -r file; do
sed -n '/LRE Starting chunk/,/LRE Ending chunk/p' "$file" |
grep -v -A1 -B1 "Starting chunk" > "Logs 13Sept/Chunks/$(basename "$file").chunk.txt"
done
This loops over the find results and creates one <logname>.chunk.txt in the Chunks directory for each of the files. basename strips the leading directory from $file, which would otherwise be repeated inside the output path.
Something like this perhaps?
find Logs\ 13Sept/Log_00000000*.log -type f -exec \
sed -n '/LRE Starting chunk/,/LRE Ending chunk/{;/LRE .*ing chunk/d;w\
'"{}.chunk"';}' {} \;
This uses sed's w command to write to a file named (inputfile).chunk. If that is not acceptable, perhaps you can use sh -c '...' to pass in a small shell script to wrap the sed command with. (Or is a shell script also prohibited for some reason?)
Perhaps you could use csplit to do the splitting, then truncate the output files at the chunk end.
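The csplit idea could be sketched roughly as follows. This assumes GNU csplit (the -z option and the '{*}' repeat count are GNU features), and the function name split_chunks and the piece_ prefix are made up for illustration. Unlike the other answers, each output file here keeps its Starting and Ending marker lines.

```shell
#!/bin/sh
# Split $1 at every "LRE Starting chunk" line into files under $2,
# then cut each piece off after its "LRE Ending chunk" line.
split_chunks() {
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    # csplit fails (and writes nothing) if the start marker never occurs.
    csplit -s -z -f "$outdir/piece_" "$input" '/LRE Starting chunk/' '{*}' || return 1
    for piece in "$outdir"/piece_*; do
        if grep -q 'LRE Starting chunk' "$piece"; then
            # Keep only the span from the start marker to the end marker.
            sed -n '/LRE Starting chunk/,/LRE Ending chunk/p' "$piece" > "$piece.tmp" &&
                mv "$piece.tmp" "$piece"
        else
            rm -f "$piece"    # text before the first chunk
        fi
    done
}
```

It handles one log file per call, so for several logs it would be called once per file with a separate output directory each time.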

SED Delete lines and replace with new from file

Have been looking at SED documentation but need a little pointer in the right direction.
I have 200 files I want to modify in a batch.
Source is html file.
Need to create a new file for the changes.
Want to delete the first part of each file up to the first tag (This is 20 or so lines but can vary slightly).
Then insert the contents of a source file (the same for all files) into the new target file starting at line 1, for 30 or so lines. The number of lines to insert does not match the number that are deleted though.
Hope you can help.
Paul
This can certainly be done with sed(1), but I would probably use the vanilla editor ed(1).
$ cat > bigfix.sh
for i in "$@"; do
ed "$i" << \eof
1,/<tag>/-1d
0r otherfile.html
w
q
eof
done
$ sh bigfix.sh file*.html
This shell script takes arguments and runs ed(1) on each arg. It deletes lines starting from the first and ending on the line right before the one with <tag>. It then puts otherfile.html at the top and writes out the result.
For an individual file:
sed -e '1,/tag/{/tag/r insertfile' -e ';d}' inputfile > outputfile
For many files:
find . -name 'criterion*.ext' -type f -exec sh -c 'sed -e "1,/tag/{/tag/r insertfile" -e ";d}" "{}" > "{}.new"' \;
Edit:
Fixed the find command to use sh because of the redirection. Note the change in quoting from the previous version.
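Substituting {} inside the sh -c string is not guaranteed to work with every find, and it can misbehave on unusual file names. A variant that passes the file names as positional parameters instead, as a sketch: tag, insertfile and criterion*.ext are the same placeholders as above, and batch_replace_header is a hypothetical name.

```shell
#!/bin/sh
# Apply the sed edit to every matching file under directory $1,
# writing the result next to each file as FILE.new.
batch_replace_header() {
    find "$1" -name 'criterion*.ext' -type f -exec sh -c '
        for f in "$@"; do
            # Delete line 1 through the first "tag" line and emit insertfile there.
            sed -e "1,/tag/{/tag/r insertfile" -e ";d}" "$f" > "$f.new"
        done
    ' sh {} +
}
```

Run it from the directory that contains insertfile, for example batch_replace_header . to cover the current tree.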

How to search and replace in text files only?

I have a directory containing a bunch of files, some text some binary, with no consistent naming. I want to search and replace a string in text files only. So I went with:
perl -i -pne 's#/some/text/to/replace#/replacement/text#' *
Remove the -i option and you will see that binary files get caught. How do I modify this one-liner to skip binary files?
ack -n --text --sort -f . | xargs perl -i -pne 's…'
Abusing ack goes much quicker than writing your own solution with -T.
Well, this is all based on what your definition of a text file is. Perl 5 has the -T filetest operator that will tell you if a filename or filehandle is a text file (using Perl 5's definition):
perl -i -pne 'BEGIN{@ARGV=grep-T,@ARGV}s#regex#replacement#' *
The BEGIN block will filter out any files that don't pass the -T test, so they won't even be read (except for their first block because that is what -T uses to determine if they are text).
From perldoc -f -X
The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it's a -B file; otherwise it's a -T file. Also, any file containing a zero byte in the first block is considered a binary file. If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on an empty file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in next unless -f $file && -T $file .