How do I use \ (backslash) in a here-doc and have it show up when I print? - perl

print <<EOTEXT;
(`-') (`-') _<-. (`-')_ <-. (`-')
_(OO ) ( OO).-/ \( OO) ) .-> \(OO )_
,--.(_/,-.\(,------.,--./ ,--/ (`-')----. ,--./ ,-.)
\ \ / (_/ | .---'| \ | | ( OO).-. '| `.' |
\ / / (| '--. | . '| |)( _) | | || |'.'| |
_ \ /_) | .--' | |\ | \| |)| || | | |
\-'\ / | `---.| | \ | ' '-' '| | | |
`-' `------'`--' `--' `-----' `--' `--'
EOTEXT
This is my ASCII art that I'd like to show up in the console. However, it seems that " \ " doesn't show up. Is there a way I can make it appear?

In double-quoted string literals, \ is the start of an escape sequence. When followed by a non-word character, it causes that character to be produced. For example, \| and \␠ produce | and a space respectively. And of course, \\ produces \, so we can use \\ wherever we want \ in a double-quoted string literal.
Here-docs (<< string literals) act as double-quoted string literals, unless the token that follows the << is single-quoted, in which case the string produced matches the input exactly.
So we have two options: prepend \ to every special character (\, $ and @), or simply single-quote the token.
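A minimal sketch of the escaping option (illustrating just the escape syntax, not the full art):
print <<"EOTEXT";
A backslash: \\   a dollar sign: \$   an at sign: \@
EOTEXT
The simpler route for a block of ASCII art is to single-quote the token: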
print <<'EOTEXT';
(`-') (`-') _<-. (`-')_ <-. (`-')
_(OO ) ( OO).-/ \( OO) ) .-> \(OO )_
,--.(_/,-.\(,------.,--./ ,--/ (`-')----. ,--./ ,-.)
\ \ / (_/ | .---'| \ | | ( OO).-. '| `.' |
\ / / (| '--. | . '| |)( _) | | || |'.'| |
_ \ /_) | .--' | |\ | \| |)| || | | |
\-'\ / | `---.| | \ | ' '-' '| | | |
`-' `------'`--' `--' `-----' `--' `--'
EOTEXT


How to create new columns from an existing column that contains lists in pyspark dataframe?

I have the below PySpark DataFrame and I would like to create new columns out of the transaction column using PySpark. I'm not sure how this can be done using Python and PySpark functions.
| id | transaction |
| ------- | -------------------------------------------------- |
| 06t84g | [['T_BILL',0.99],['Z_BILL',0.33],['A_BILL',0.77]] |
| 098t1g | [['T_BILL',0.419],['Z_BILL',0.19],['A_BILL',0.137]]|
| 03z94f | [['T_BILL',0.79],['Z_BILL',0.49],['A_BILL',0.317]] |
| 10yw22 | [['T_BILL',0.91],['Z_BILL',0.818],['A_BILL',0.457]]|
| 30r990 | [['T_BILL',0.193],['Z_BILL',0.69],['A_BILL',0.947]]|
Below is the desired DataFrame:
| id | transaction |T_BILL|Z_BILL|A_BILL|
| ------- | -------------------------------------------------- | --- | ---- | ---- |
| 06t84g | [['T_BILL',0.99],['Z_BILL',0.33],['A_BILL',0.77]] |0.99 |0.33 |0.77 |
| 098t1g | [['T_BILL',0.419],['Z_BILL',0.19],['A_BILL',0.137]]|0.419|0.19 |0.137 |
| 03z94f | [['T_BILL',0.79],['Z_BILL',0.49],['A_BILL',0.317]] |0.79 |0.49 |0.317 |
| 10yw22 | [['T_BILL',0.91],['Z_BILL',0.818],['A_BILL',0.457]]|0.91 |0.818 |0.457 |
| 30r990 | [['T_BILL',0.193],['Z_BILL',0.69],['A_BILL',0.947]]|0.193|0.69 |0.947 |
I really appreciate the time and effort you put into this.
This can be achieved using explode() and pivot().
# pyspark.sql.functions is referenced as func throughout
from pyspark.sql import functions as func

data_ls = [
    ('06t84g', [['T_BILL',0.99],['Z_BILL',0.33],['A_BILL',0.77]]),
    ('098t1g', [['T_BILL',0.419],['Z_BILL',0.19],['A_BILL',0.137]])
]
data_sdf = spark.sparkContext.parallelize(data_ls).toDF(['id', 'trans'])

data_sdf. \
    withColumn('trans_explode', func.explode('trans')). \
    select('id',
           func.col('trans_explode').getItem(0).alias('bill_name'),
           func.col('trans_explode').getItem(1).alias('bill_amt')
           ). \
    groupBy('id'). \
    pivot('bill_name', values=['T_BILL', 'Z_BILL', 'A_BILL']). \
    agg(func.first('bill_amt')). \
    show()
# +------+------+------+------+
# |id |T_BILL|Z_BILL|A_BILL|
# +------+------+------+------+
# |06t84g|0.99 |0.33 |0.77 |
# |098t1g|0.419 |0.19 |0.137 |
# +------+------+------+------+
The explode() will create separate rows for each bill list. We can then use that to create 2 columns - one for the name, and another for the amount. Finally, we can just pivot() the name column created in the last step. If the array column is also required, we can add that to the select() and groupBy() statements.
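For that last point, a sketch (assuming Spark can group on the array column directly, which it allows for array types) that keeps the original trans column alongside the pivoted amounts:
data_sdf. \
    withColumn('trans_explode', func.explode('trans')). \
    select('id', 'trans',
           func.col('trans_explode').getItem(0).alias('bill_name'),
           func.col('trans_explode').getItem(1).alias('bill_amt')
           ). \
    groupBy('id', 'trans'). \
    pivot('bill_name', values=['T_BILL', 'Z_BILL', 'A_BILL']). \
    agg(func.first('bill_amt')). \
    show(truncate=False)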
Per OP's comment, if the transaction field is not an array column but an array in string format, we can first parse the string to create an array and then use the aforementioned code on the resulting array column.
from pyspark.sql import functions as func
from pyspark.sql.types import ArrayType, StringType

data_ls = [
    ('06t84g', '''[['T_BILL',0.99],['Z_BILL',0.33],['A_BILL',0.77]]'''),
    ('098t1g', '''[['T_BILL',0.419],['Z_BILL',0.19],['A_BILL',0.137]]''')
]
data_sdf = spark.sparkContext.parallelize(data_ls).toDF(['id', 'trans'])

# parse the string to create an array
def parse_string_array(strarr):
    import json
    return json.loads(strarr.replace('\'', '\"'))

# use a UDF for the pyspark transformation
# this parses the numbers as strings,
# so the user can cast them to the required format later
parse_string_array_udf = func.udf(parse_string_array, ArrayType(ArrayType(StringType())))

data_sdf. \
    withColumn('trans_arr', parse_string_array_udf(func.col('trans'))). \
    withColumn('trans_explode', func.explode('trans_arr')). \
    select('id',
           func.col('trans_explode').getItem(0).alias('bill_name'),
           func.col('trans_explode').getItem(1).alias('bill_amt')
           ). \
    groupBy('id'). \
    pivot('bill_name', values=['T_BILL', 'Z_BILL', 'A_BILL']). \
    agg(func.first('bill_amt')). \
    show()
# +------+------+------+------+
# | id|T_BILL|Z_BILL|A_BILL|
# +------+------+------+------+
# |06t84g| 0.99| 0.33| 0.77|
# |098t1g| 0.419| 0.19| 0.137|
# +------+------+------+------+
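Since the UDF declares its return type as nested arrays of strings, the amounts come back as strings. If numeric columns are needed, one option (assuming all amounts are valid numbers) is to cast at the select step, e.g. replace the bill_amt expression with:
func.col('trans_explode').getItem(1).cast('double').alias('bill_amt')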

Weird behavior of sed's backreference

We have the following line of text:
| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
As you can see, the line of text simply consists of three similar phrases, which can be matched and changed (separately) using the following sed expression:
sed -n 's#| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |#| ![\3](\1\2.\3.\4) |#p'
If we had just one phrase (instead of the given three), the result would be the following:
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) |' | sed -n 's#| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |#| ![\3](\1\2.\3.\4) |#p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |
But when we have two or three phrases, the result always points to the last matched phrase:
Here's an example with two matches:
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) |' | sed -n 's#| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |#| ![\3](\1\2.\3.\4) |#p'
| ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) |
And here's an example with three matches:
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's#| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |#| ![\3](\1\2.\3.\4) |#p'
| ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
Why is this happening?
Is there a way to force sed to print the result only for the very first match?
The expected behavior? I thought the following command would print something similar to this (just the first match):
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's#| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |#| ![\3](\1\2.\3.\4) |#p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |
or this (all matches):
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's#| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |#| ![\3](\1\2.\3.\4) |#p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
What's happening is that the .* in | !\[.*\] is greedy: it matches the longest possible string, stretching from the first phrase up to the beginning of the last phrase. If you want to match only the first phrase, you must be more specific, for instance with:
sed 's#| !\[\]\(([^.]*\.\([^.]*\)\.[^)]*)\) |.*#| ![\2]\1 |#'
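For example, applied to the three-phrase input from the question, this should print only the first rewritten phrase (i.e. the first expected output shown above):
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed 's#| !\[\]\(([^.]*\.\([^.]*\)\.[^)]*)\) |.*#| ![\2]\1 |#'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |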
I do not fully understand the question, but you can try this sed command:
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
This will print all 3 phrases but will substitute only the first match:
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
To target all 3, the g flag can be added
sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#g' input_file
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
You could also target just #2 for example
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#2' input_file
| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |

Escape pipe ( | ) character inside table in GitHub Pages

On my GitHub Pages site, the escape characters are not working correctly:
https://alirezanet.github.io/Gridify/
but they work correctly on GitHub itself.
If I remove these backslashes from the MD file, the entire table breaks on GitHub.
How can I fix this?
| Name | Operator | Usage example |
| --------------------- | -------- | --------------------------------------------------------- |
| Equal | `==` | `"FieldName==Value"` |
| AND - && | `,` | `"FirstName==Value , LastName==Value2"` |
| OR - \|\| | `\|` | `"FirstName==Value \| LastName==Value2"` |
| Parenthesis | `()` | `"( FirstName=*Jo,Age<<30) \| ( FirstName!=Hn,Age>>30 )"` |
You can use <code>|</code> in place of |.

Removing spaces from data in a column of dataframe in scala spark

This is the command I am using to remove "." from data in a DataFrame column in Spark Scala, and it is working fine:
rfm = rfm.select(regexp_replace(col("tagname"),"\\.","_") as "tagname",col("value"),col("sensor_timestamp")).persist()
But this is not working to remove leading spaces from the same column:
rfm = rfm.select(regexp_replace(col("tagname")," ","") as "tagname",col("value"),col("sensor_timestamp")).persist()
There is no error; it just fails to remove the leading spaces that I see in the data.
Input: rfm.show()
+--------------------+-----+----------------+
| tagname |value|timestamp |
+--------------------+-----+----------------+
| P.A |101.5| 1.409643313E12|
| P.A |100.5| 1.409643315E12|
| P.A |100.5| 1.409644709E12|
|P.B | 0.0| 1.40964471E12|
Output:
+--------------------+-----+----------------+
| tagname |value|timestamp |
+--------------------+-----+----------------+
| P_A |101.5| 1.409643313E12|
| P_A |100.5| 1.409643315E12|
| P_A |100.5| 1.409644709E12|
|P_B | 0.0| 1.40964471E12|
You have to provide a whitespace pattern, not just a literal space. Provide it as below.
regexp_replace(col("tagname"),"\\s+"," ")
\s+ matches one or more whitespace characters, and the extra \ escapes the backslash of \s inside the Scala string literal.
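If the goal is strictly to strip leading whitespace rather than collapse whitespace runs, a hedged variant anchors the pattern at the start of the string (using the same columns as in the question):
rfm = rfm.select(regexp_replace(col("tagname"), "^\\s+", "") as "tagname", col("value"), col("sensor_timestamp")).persist()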

org-babel not concatenating strings before sending to code block variable.

I have just started using org-mode and org-babel as a lab notebook. I am trying to use a code block to fill in two columns of a table. The code block seems to work for the first column because those are the right numbers. However, when I try to concatenate a string to the file name in column three so the code block works on a different set of files, it seems to just run the code block on the original files instead, which produces the same output as column one.
#+name: tRNAs
#+begin_src sh :var filename="" :results silent
cd Data/tRNA
grep -c ">" $filename
#+end_src
#+tblname: sequences
| # of Sequences before QC | # after QC | Original File name|
|--------------------------+------------+------------------|
| 681865 | 681865 | read1 |
| 324223 | 324223 | read2 |
| 1014578 | 1014578 | read3 |
| 971965 | 971965 | read4 |
| 931777 | 931777 | read5 |
| 810798 | 810798 | read6 |
| 965134 | 965134 | read7 |
| 718474 | 718474 | read8 |
|--------------------------+------------+------------------|
#+TBLFM: $1='(org-sbe tRNAs (filename (concat "\"" $3 "\"")))
#+TBLFM: $2='(org-sbe tRNAs (filename (concat "\"final_" $3 "\"")))