sed creates empty file after replacement operation - sed

sed -i '' 's/|/ /g' largefile.tsv > outfile.tsv
I've got a rather large 37 GB file in which I'm trying to replace '|' with '\t', but after running for a long time, sed only outputs an empty file (0 bytes).
I'm running on macOS. What am I missing?

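With -i, the input file is edited "in place", so sed writes nothing to standard output and the redirection only creates an empty outfile.tsv. Either drop -i and keep the redirection, or keep -i and drop the redirection.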
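A minimal sketch of the non-in-place variant, run on a tiny stand-in file. The names sample.tsv and outfile_tr.tsv are made up for illustration, and the tab is built with printf on the assumption that the sed in use (like BSD sed on macOS) may not understand \t in the replacement:

```shell
# Small stand-in for largefile.tsv (hypothetical sample data).
printf 'a|b|c\n1|2|3\n' > sample.tsv

# A literal tab character, built with printf so it does not depend on
# sed understanding \t in the replacement text.
tab=$(printf '\t')

# Option 1: no -i, so sed writes to stdout and the redirection captures it.
sed "s/|/${tab}/g" sample.tsv > outfile.tsv

# Option 2: for a one-character swap, tr does the same job.
tr '|' '\t' < sample.tsv > outfile_tr.tsv

# (The in-place form would be: sed -i '' "s/|/${tab}/g" sample.tsv on macOS,
#  with no redirection at all.)
```

Both output files should contain the same tab-separated data; the original file is left untouched.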

Related

Just trying to delete lines of several .txt files - but keep getting SED error

So, I am trying to delete the first 23 lines of many .txt files. This is what I am currently doing:
sed -i -e 1,23 * .txt
but it gives me a weird error:
sed: 1: "1,23": command expected
I have no idea what to do.
Like this:
sed -i -e '1,23d' *.txt
#              ^ mandatory 'd'
#                  ^
#                  no space between * and .txt
The original question (before editing) showed:
sed -i -e 1,23d * .txt
but it gives me a weird error:
sed: Applications: in-place editing only works for regular files
The space between * and .txt was a mistake, but it explains the weird error: the shell expands * to every entry in the current directory and passes .txt as a separate (hidden) file name.
One of those entries is the directory Applications, and sed -i cannot edit a directory in place; it only works on regular files.
The directory Applications suggests that the files might come from Windows.
Another question is why echo *.txt doesn't show the txt files Mia was expecting. A logical explanation is that the files originated on Windows and Mia doesn't know that Linux file names are case sensitive. Files like A.TXT and b.Txt don't match *.txt.
When all txt files end with TXT, you can do
sed -i -e '1,23d' *.TXT
When you have a mix of upper- and lower case, the easiest way is
sed -i -e '1,23d' *.[tT][xX][tT]
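To see what the shell actually hands to sed in each case, the patterns can be tried with echo first. This is only a sketch: the directory glob_demo and the file names in it are invented, and it assumes a case-sensitive file system:

```shell
# Hypothetical playground: three text files with differently cased
# extensions, plus a directory like the one named in the error message.
mkdir -p glob_demo/Applications
touch glob_demo/a.txt glob_demo/B.TXT glob_demo/c.Txt

# With the stray space: every directory entry, then the literal word .txt
( cd glob_demo && echo * .txt )

# Without the space: only names ending in lower-case .txt
( cd glob_demo && echo *.txt )

# Case-insensitive extension match via bracket expressions
( cd glob_demo && echo *.[tT][xX][tT] )
```

Running echo with the same pattern before running sed -i is a cheap way to check which files will be modified.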

Removing line breaks from CSV exported from Google Sheets

I have some data in the format:
-e, 's/,Chalk/,Cheese/g'
-e, 's/,Black/,White/g'
-e, 's/,Leave/,Remain/g'
in a file data.csv.
Using Gitbash, I use the file command to discover that this is ASCII text with CRLF terminators. If I also use the command cat -v , I see in Gitbash that each line ends ^M .
I want to remove those terminators, to leave a single line.
I've tried the following:
sed -e 's/'\r\n'//g' < data.csv > output.csv
taking care to put the \r\n in single quotes in order that the backslash is treated literally, but it does not work. No error, just no effect.
I'm using Gitbash for Windows.
Quotes within quotes cancel each other out, so you actually undo the quotes around the sed command for the newline characters. You could escape the quotes like 's|'\''\r\n'\''||g', but that would just include them in the string, which would not match anything in your case.
But that is not the only problem; sed by default only processes strings between newlines.
If you have the GNU version of sed, RAM to spare if the file is huge, and are sure the file does not contain data with null characters, try adding the -z argument, like:
sed -z -e 's|\r\n||g' < data.csv > output.csv
Though I guess you probably also want to replace it with a comma:
sed -z -e 's|\r\n|,|g' < data.csv > output.csv
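For non-GNU versions of sed, you may have an easier time using tr instead. Note that tr reads only standard input and translates single characters, not the two-character sequence, so delete the carriage returns first and then turn each newline into a comma:
tr -d '\r' < data.csv | tr '\n' ',' > output.csv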
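A small check of the tr approach on two lines of the sample data. The file names data_crlf.csv and joined.csv are made up for this sketch; the carriage returns are deleted first because tr maps single characters rather than the \r\n pair:

```shell
# Recreate two lines of the sample data with CRLF terminators.
printf '%s\r\n' "-e, 's/,Chalk/,Cheese/g'" "-e, 's/,Black/,White/g'" > data_crlf.csv

# tr reads stdin only: strip every CR, then turn every LF into a comma.
tr -d '\r' < data_crlf.csv | tr '\n' ',' > joined.csv

cat joined.csv
```

The result is a single line in which each former line ending is a comma, including one trailing comma after the last record, which may need trimming depending on what consumes the file.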

Sed operations only works with smaller files

OS: Ubuntu 14.04
I have 12 large JSON files (2-4 GB each) that I want to perform different operations on. I want to remove the first line, find "}," and replace it with "}", and remove all "]".
I am using sed to do the operations and my command is:
sed -i.bak -e '1d' -e 's/},/}/g' -e '/]/d' file.json
When I run the command on a small file (12.7 KB) it works fine: file.json contains the content with the changes and file.json.bak contains the original content.
But when I run the command on my larger files, the original file is emptied, i.e. file.json is empty and file.json.bak contains the original content. The run time is also what I consider to be "too fast", about 2-3 seconds.
What am I doing wrong here?
Are you sure your input file contains newlines as recognized by the platform you are running your commands on? If it doesn't then deleting one line would delete the whole file. What does wc -l < file tell you?
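The first hypothesis can be reproduced on a toy file. This sketch assumes, purely for illustration, that the large files use CR-only line endings (the same thing happens if the whole file is one long line); cr_only.json is an invented name:

```shell
# Hypothetical JSON-like file whose "lines" end in CR only, with no LF.
printf '[\r{"a":1},\r{"b":2}\r]\r' > cr_only.json

# wc -l counts LF characters, so it reports zero lines here.
wc -l < cr_only.json

# To sed the whole file is a single line, so '1d' deletes everything
# and the byte count of the output is zero.
sed -e '1d' cr_only.json | wc -c
```

If wc -l reports 0 or 1 for a multi-gigabyte file, the line-based commands 1d and /]/d are each acting on the entire content at once.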
If it's not that then you probably don't have enough file space to duplicate the file so sed is doing something internally like
mv file backup && sed '...' backup > file
but doesn't have space to create the new file after moving the original to backup. Check your available file space and if you don't have enough and can't get more then you'll need to do something like:
while [ -s oldfile ]
do
    copy first N bytes of oldfile into tmpfile &&
    remove first N bytes from oldfile using real inplace editing &&
    sed 'script' tmpfile >> newfile &&
    rm -f tmpfile
done
mv newfile oldfile
See https://stackoverflow.com/a/17331179/1745001 for how to remove the first N bytes inplace from a file. Pick the largest value for N that does fit in your available space.

UNIX Replacing a character sequence in either tr or sed

Have a file that has been created incorrectly. There are several space delimited fields in the file but one text field has some unwanted newlines. This is causing a big problem.
How can I remove these characters but not the wanted line ends?
file is:
'Number field' 'Text field' 'Number field'
1 Some text 999999
2 more
text 111111111
3 Even more text 8888888888
EOF
So there is a NL after the word "more".
I've tried sed:
sed 's/.$//g' test.txt > test.out
and
sed 's/\n//g' test.txt > test.out
But none of these work. The newlines do not get removed.
tr -d '\n' does too much - I need to remove ONLY the newlines that are preceded by a space.
How can I delete newlines that follow a space?
SunOS 5.10 Generic_144488-09 sun4u sparc SUNW,Sun-Fire-V440
A sed solution is
sed '/ $/{N;s/\n//}'
Explanation:
/ $/: whenever the line ends in space, then
N: append a newline and the next line of input, and
s/\n//: delete the newline.
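A quick run of this on the sample data from the question. As an assumption for portability, the sketch writes a ; before the closing brace, which some non-GNU seds require; test.txt here is a recreated stand-in with a trailing space after "more":

```shell
# Recreate the sample; note the space before the unwanted newline on line 2.
printf '1 Some text 999999\n2 more \ntext 111111111\n3 Even more text 8888888888\n' > test.txt

# On lines ending in a space, pull in the next line and drop the newline
# that now sits between them.
sed '/ $/{N;s/\n//;}' test.txt > test.out

cat test.out
```

This joins one continuation line per match; if the joined line itself ends in a space (two broken lines in a row), it is not rescanned, so such input would need a loop with a label and a branch.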
It might be simplest with Perl:
perl -p0 -e 's/ \n/ /g'
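The -0 flag sets Perl's input record separator to the NUL character, so a text file that contains no NUL bytes is read as a single record (use -0777 to slurp the file unconditionally). Then we can substitute using s in the usual way. You can, of course, also add the -i option to edit the file in-place.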
How can I delete newlines that follow a space?
If you want every occurrence of $' \n' in the original file to be replaced by a space ($' '), and if you know of a character (e.g. a control character) that does not appear in the file, then the task can be accomplished quite simply using sed and tr (as you requested). Let's suppose, for example, that control-A is a character that is not in the file. For the sake of simplicity, let's also assume we can use bash. Then the following script should do the job:
#!/bin/bash
A=$'\01'
tr '\n' "$A" | sed "s/ $A/ /g" | tr "$A" '\n'

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts on Stack Overflow for this, but couldn't get my code to work, hence I am posting a new question.
Below is the content of file temp.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64-encoded contents of two files named test.txt and test1.txt. I want to extract the base64-encoded content of each file to separate files, test.txt and test1.txt respectively.
To achieve this, I have to remove the xml tags around the base64 contents. I am trying below commands to achieve this. However, it is not working as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
Below command:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>
However, in the output I am expecting only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i. Where am I making a mistake?
When i run below command, I am getting expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces below string
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to test1.txt file.
UPDATE
I have edited the question by updating the source file content. The source file doesn't contain any newline characters. The current solution will not work in that case; I have tried it and failed. wc -l temp should output 1.
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
I added \1 -> to show the link from file name to content; for the content only, just remove this part.
This is the POSIX version, so on GNU sed use --posix.
It assumes that the base64-encoded content is on the same line as the surrounding tag (if it is spread over several lines, some modification is needed).
Thanks to JID for the full explanation below.
How it works
sed -n
The -n means no printing so unless explicitly told to print, then there will be no output from sed
's_
This is to substitute the following regex using _ to separate regex from the replacement.
<dp:file name=
Regular text
"\([^"]*\)"
The escaped parentheses are a capture group; they must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside the parentheses is captured. [^"]* means 0 or more occurrences of any character that is not a quote. So really this just captures anything between the two quotes.
>\([^<]*\)
Again uses a capture group, this time to capture everything between the > and the next <
.*
Everything else on the line
_\1 -> \2
This is the replacement, so replace everything in the regex before with the first capture group then a -> and then the second capture group.
_p
Means print the line
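The pieces above can be tried on a simplified input. This sketch assumes each dp:file element sits on its own line (unlike the single-line file in the updated question); temp_sample.xml and pairs.txt are invented names:

```shell
# Simplified stand-in for the file temp: one dp:file element per line.
printf '%s\n' \
  '<dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file>' \
  '<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file>' \
  > temp_sample.xml

# \1 is the name attribute, \2 is the text up to the next '<';
# -n plus the p flag prints only lines where the substitution happened.
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp_sample.xml > pairs.txt

cat pairs.txt
```

Because the s command replaces only the first match on a line and the trailing .* consumes the rest, a line holding several dp:file elements yields only the first one, and any text before the first match is kept in the output.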
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected when the file contains just one line.
The command below works for a file containing only a single line:
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null, this sed command outputs the warning sed: Missing newline at end of file.
The reason is as follows:
The default Solaris sed ignores a last line that has no trailing newline, so as not to break existing scripts, because the original Unix implementation required every line to be terminated by a newline.
GNU sed is more relaxed, and the POSIX implementation (/usr/xpg4/bin/sed) processes the line but outputs a warning.