I have a CSV file that is causing me serious headaches going into Tableau. Some of the rows in the CSV are wrapped in quotes (" ") and some are not. I would like them all to be imported without the quotes (i.e. strip them from the rows that have them).
Some data:
"1;2;Red;3"
1;2;Green;3
1;2;Blue;3
"1;2;Hello;3"
Do you have any suggestions?
If you have a bash prompt hanging around...
You can use cat to output the file contents so you can make sure you're working with the right data:
cat filename.csv
Then, pipe it through sed so you can visually check that the quotes were deleted:
cat filename.csv | sed 's/"//g'
If the output looks good, use the -i flag to edit the file in place:
sed -i 's/"// g' filename.csv
All quotes should now be missing from filename.csv
If your data has quotes in it, and you want to only strip the quotes that appear at the beginning and end of each line, you can use this instead:
sed -i 's/^"\(.*\)"$/\1/' filename.csv
It's not the most elegant way to do it in Tableau but if you cannot remove it in the source file, you could create a calculated field for the first and last column that strips the quotation marks.
Right-click on the field for the first column and choose Create / Calculated Field
Use this formula: INT(REPLACE([FirstColumn],'"',''))
Name the column accordingly
Do the same for the last column
This assumes the data you provided matches the data you're working on, in particular that these fields are integer fields (hence the INT() usage). If they are string fields, you would want to make sure you don't remove quotation marks that belong to the field value.
I am trying to reorganise images based on keywords found in the IPTC metadata. More specifically, I need to sort images into directories based on the species name in the subject pseudo tag of exiftool.
To do this, I have compiled the keywords in a .txt file (species_ls.txt), with each keyword on a new line, as such:
Asian Tortoise
Banded Civet
Banded Linsang
...
To sort the images I have created the following for loop, which iterates through each line of the document, with sed pulling out the keyword. Here, $line_no is the number of lines in the species_ls.txt file, and image_raw is the directory containing the images.
for i in `seq 1 $line_no`; do
    sp_name=$(sed -n "${i}p" < species_ls.txt)
    exiftool -r -if '$subject=~/${sp_name}/i' \
        '-Filename=./${sp_dir}/%f%+c%E' image_raw
done
Although the for loop runs, no conditions are being met in the -if flag in exiftool. I am assuming this is because the variable sp_name is not being passed into the condition properly.
Any suggestions, or a better way of doing this, would be appreciated.
For the line with the condition, rather than using single quotes (' '), it would be better to use double quotes (" ").
The single quotes mean that the content is passed literally, meaning your variable won't get expanded.
To keep $subject from being expanded by the shell as well (which I presume you don't want, since exiftool needs to see it literally), you can just put a \ in front of the $ to escape it.
This line should now look like:
exiftool -r -if "\$subject=~/${sp_name}/i"
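Putting it together, the whole loop might look like this (a sketch of your original loop with the quoting fixed; as in your snippet, ${sp_dir} is assumed to be set elsewhere, and the -Filename argument gets double quotes too since it also references a shell variable):
for i in $(seq 1 "$line_no"); do
    sp_name=$(sed -n "${i}p" < species_ls.txt)
    # double quotes let the shell expand ${sp_name}; \$subject stays literal for exiftool
    exiftool -r -if "\$subject=~/${sp_name}/i" \
        "-Filename=./${sp_dir}/%f%+c%E" image_raw
done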
Hope this helps you!
I know this should be straightforward but I'm stuck, sorry.
I have two files that both contain the same parameters but with different values. I'm trying to read one file a line at a time, get the parameter name, use it to match the corresponding line in the second file, and replace that whole line with the line from file 1.
e.g. rw_2.core.fvbCore.Param.isEnable 1 (FVB_Params)
becomes
rw_2.core.fvbCore.Param.isEnable true (FVB_Boolean)
The lines are not always the same length but I always want to replace the whole line.
The code I have is as follows but it doesn't make the substitutions and I can't work out why not.
while read line; do
ParamName=`awk '{print $1}'`
sed -i 's/$ParamName.*/$line/g' FVB_Params.txt
done < FVB_Boolean.txt
You need your sed command within double quotes if you want those variables to be replaced with their values. You have single quotes, so sed is literally looking for the string $ParamName (dollar sign and all) and replacing matches with the literal string $line, not with whatever your shell has in those variables.
In short, sed's not seeing the values you want. Switch to double quotes.
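A minimal corrected sketch of your loop (besides the quoting, note that the awk call needs explicit input; as written it would silently consume the rest of the loop's stdin):
while read -r line; do
    # grab the first whitespace-separated field (the parameter name)
    ParamName=$(echo "$line" | awk '{print $1}')
    # double quotes let the shell expand $ParamName and $line
    # (dots in the parameter name act as regex wildcards here, which is harmless for names like these)
    sed -i "s/$ParamName.*/$line/" FVB_Params.txt
done < FVB_Boolean.txt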
I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the part that's added, and I'd like to use Gnu sed to isolate only the part of the file that's before the unique string - i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like is to see as output:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel
This should work for you:
sed -n '/unique string/q;p' file
It quits processing at unique string. Other lines get printed.
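With the example file from the question:
$ sed -n '/unique string/q;p' file
line 1 of fixed portion
line 2 of fixed portion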
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that sed includes the range border. We need to exclude unique string from printing.
Furthermore I'm using the -n option which makes sed suppress the output of input lines by default.
One thing: if the unique string can contain characters which are also syntax characters in the regex, like ...
test*
... sed might not be the right tool for the job any more since it can only match regular expressions but not fixed strings.
In that case awk might be the tool of choice:
awk 'index($0, "unique string"){exit}1' file
index("string") returns a non zero value (the position) if the string has been found. We cancel further processing of input lines in that case and don't print that line as well.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.
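Again with the example file from the question:
$ awk 'index($0, "unique string"){exit}1' file
line 1 of fixed portion
line 2 of fixed portion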
I have a list of usernames and I would like to add possible combinations to it.
Example: let's say this is the list I have
johna
maryb
charlesc
Is there a way to use sed to edit it so it looks like this:
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone assist me in getting the commands to do so? Any explanation will be awesome as well. I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean... anyway, thanks in advance.
EDIT ... I changed the usernames
sed 's/\(user\)\(.\)/\2\1/'
Breakdown:
sed s/string/replacement/ will replace the first instance of string on each line with replacement.
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two
parts: \(user\) and \(.\). Each of these is a capture group - bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to use any string you want - e.g. \2_\1 will give a_user, b_user, c_user.
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
Here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: it finds the first instance of "user" on each line and replaces it with "user_".
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."
Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username before hand.
Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
By setting FS to nothing, every letter is a field in awk. You can then easily manipulate it.
And there's no need to use capturing groups etc. - just plain field swapping.
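To see the per-character splitting in action (treating an empty FS as one field per character is GNU awk behavior; POSIX leaves it undefined):
$ echo johna | awk '{print $1, $2, $NF}' FS=""
j o a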
This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any characters other than underscores (in the first back reference (\1)), an optional underscore, and the last character (in the second back reference (\2)), and swaps them around.
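For example, it gives the same result whether or not the underscore is present:
$ printf '%s\n' johna john_a | sed -r 's/^([^_]*)_?(.)$/\2\1/'
ajohn
ajohn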
I am trying to export data from PostgreSQL to CSV files, but when there are newlines in text in the database, the exported data is broken across several lines. This makes the CSV file much harder to read, not to mention that most applications will fail to load it properly.
Here is how I export the data now:
PRESQL="\pset format unaligned
\pset fieldsep \",\"
\pset footer off
\o 'out.csv'
"
cat <(echo "$PRESQL") "$QUERYFILE" | psql …
So far, so good, unless you have newlines in the text fields. Is there any hack that would allow me to generate a very simple-to-parse CSV file (with one record per line)?
It was a mistake to assume that a CSV can be forced to have one line per record. RFC 4180 clearly states that fields containing newlines are to be enclosed in double quotes.
You can try the replace() or regexp_replace() functions.
The answer to the following SO question should give you an idea: How to remove carriage returns and new lines in Postgresql?
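For example, inside the query file you could wrap the affected column in regexp_replace() to collapse the newlines before export (a sketch; description and mytable are placeholder names):
SELECT regexp_replace(description, E'[\n\r]+', ' ', 'g') AS description
FROM mytable;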