Print previous line when a "match" found: Pyspark

I would like to display the line before a matched string.
I know how to do this with grep (grep -B), but I want the same result in PySpark.
Could you please let me know whether this can be done in PySpark?
I use the code below to search for the matched string:
df.filter(F.col('string').contains('start'))
In the example below, the input is a dataframe that contains 4 lines.
When I search for the keyword "start", I would like the output to contain the line with "start" and also the previous line.
Example:
**Input:**
2021-08-30 end active
2021-09-01 end inactive
2021-09-02 start active
2021-09-03 end active
**Expected output:**
2021-09-01 end inactive
2021-09-02 start active

Related

Does intersystems cache have a wildcard to search global node?

Sometimes I want to search for characters using a wildcard, without walking through all the global nodes to find them. Is there a wildcard I can use to match specific characters in global nodes? For example, I would like to find ^G("abc") in ^G with something like ^G("*s*").

There is no way to do this using the low-level $order/$query functions, as @kazamatzuri correctly said, but you can use the %Library.Global:Get class query. Its first parameter is the namespace and its second parameter is the pattern string. The pattern syntax is documented in the class itself and here: https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GGBL_managing#GGBL_managing_view
Here is an example using the CALL statement. Let's assume we want to find all nodes of the ^%SYS global in the USER namespace starting with "D":
DEV>d $system.SQL.Shell()
SQL Command Line Shell
----------------------------------------------------
The command prefix is currently set to: <<nothing>>.
Enter <command>, 'q' to quit, '?' for help.
[SQL]DEV>>call %Library.Global_Get('USER','^%SYS("D":"E"')
1. call %Library.Global_Get('USER','^%SYS("D":"E"')
Dumping result #1
Name Value Name Format Value Format Permissions
^%SYS("DBRefByName","CONFIG-ANALYTICS") ^^f:\trakcare\config\db\analytics\ 1
^%SYS("DBRefByName","CONFIG-APPSYS") ^^f:\trakcare\config\db\appsys\ 1 1
^%SYS("DBRefByName","CONFIG-AUDIT0") ^^f:\trakcare\config\db\audit0\ 1 1
^%SYS("DBRefByName","CONFIG-AUDIT1") ^^f:\trakcare\config\db\audit1\ 1 1
^%SYS("DBRefByName","CONFIG-AUDIT2") ^^f:\trakcare\config\db\audit2\ 1 1

No.
You'll have to implement that yourself using $ORDER or $QUERY. There are pattern matching and regex utils though.
Cheers!

PowerShell script to remove extra comma from txt file

We have a process that loads a daily bank file (txt format) into another system, and we have noticed that this process fails if an extra comma appears between the first and last name. We currently fix this manually by finding the offending line, removing the comma and resaving the file. The issue now occurs more frequently and on random lines, so I would like to know whether I can create a PowerShell script that looks for the offending comma and removes it should one appear.
A sample of the text file looks like this:
206985,23034038,1,62,206985,60715093,0098,00000019600,DERBYSHIRE S ,20456461 , , 17195
206985,23034038,1,62,206209,23511456,0005,00000010000,BRYANS C ,20499987 , , 17195
206985,23034038,1,62,203351,83878848,0006,00000005000,JM HARVEY ,20560148 , , 17195
206985,23034038,1,62,202542,43608352,0032,00000010389,INGLIS P E ,21775660 , , 17195
206985,23034038,1,62,209263,30535818,0016,00000018000,MUZONDO F ,22194301 , , 17195
206985,23034038,1,62,205568,90105171,0092,00000010000,ADKIN ,AM ,22363046 , , 17195
As you can see, in the last line there is an extra comma between the last name and the first name. I need a script that removes it before the file gets processed as usual.
Is this possible?
The filename also begins "camt.xxx.barc.stm_" and the remainder is made up of a ref number which changes daily so for example:
camt.xxx.barc.stm_D20170714_R7741261
camt.xxx.barc.stm_D20170720_R8447561
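
The question asks for PowerShell; the logic itself can be sketched in a few lines of Python, and the same idea ports directly. This sketch assumes every valid row has exactly 12 comma-separated fields with the name in the ninth field, as counted from the sample rows above. The names EXPECTED_FIELDS, NAME_INDEX and fix_line are illustrative.

```python
EXPECTED_FIELDS = 12  # assumption: counted from the sample rows
NAME_INDEX = 8        # zero-based position of the name field


def fix_line(line: str) -> str:
    """Merge surplus fields back into the name field; leave valid rows untouched."""
    fields = line.split(",")
    extra = len(fields) - EXPECTED_FIELDS
    if extra <= 0:
        return line
    # The name was split into (extra + 1) pieces; glue them together without the comma.
    merged = "".join(fields[NAME_INDEX:NAME_INDEX + extra + 1])
    return ",".join(fields[:NAME_INDEX] + [merged] + fields[NAME_INDEX + extra + 1:])


good = "206985,23034038,1,62,209263,30535818,0016,00000018000,MUZONDO F ,22194301 , , 17195"
bad = "206985,23034038,1,62,205568,90105171,0092,00000010000,ADKIN ,AM ,22363046 , , 17195"

print(fix_line(good))
print(fix_line(bad))
```

Counting fields avoids guessing what a name looks like, but it only works if no other field can ever contain a comma.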

Using perl to split over multiple lines

I'm trying to write a perl script to process a log4net log file. The fields in the log file are separated by a semi-colon. My end goal is to capture each field and populate a mysql table.
Usually I have lines that look a little like this (all on a single line)
DEBUG;2017-06-13T03:56:38,316-05:00;2017-06-13 08:56:38,316;79ab0b95-7f58-44a8-a2c6-1f8feba1d72d;(null);WorkerStartup 1;"Starting services."
These are easy to process. I can simply split by semicolon to get the information I need.
However, occasionally the "message" field at the end may span several lines, especially if there is a stack trace. I want to capture the entire message as a single column. I cannot simply split by semicolon, because the following lines would typically look like:
at some.random.classname
at another.classname
...
Can someone give some tips how to solve this problem?

The following solution relies on the number of " characters in a complete record being even. The test ($p=~y/"//%2) is true while the count is odd, meaning the record is still incomplete; it may be replaced by any other condition that indicates an incomplete field.
The number of columns produced by split is fixed at 7 (to allow ; in the last field). This may be changed, for example to @array = map {s/;$//} $p=~/\G(?:"[^"]*"|[^;])*;/g;.
The file is read line by line, but a record is passed to sub process only once it is complete. The variable $p stores the previous line(s), and the last record is processed in the END block.
perl -ne '
sub process {
    @array = split /;/, $p, 7;
    # do something with array
    print ((join "\n---\n", @array), "\n");
}
# an odd number of quotes means the record continues on the next line
if ($p =~ y/"// % 2) {
    $p .= $_;
    next;
}
process;
$p = $_;
END { process }
' < logfile.txt
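
For comparison, the same quote-counting idea can be sketched in Python. This is a rough equivalent and not a translation of the Perl above: the function name records and the sample data are illustrative, and it assumes the message field is always wrapped in double quotes that close on the last line of the stack trace.

```python
def records(lines):
    """Yield one list of up to 7 fields per logical record."""
    buffer = ""
    for line in lines:
        buffer += line
        # An even number of quotes means the quoted message field is closed.
        if buffer.count('"') % 2 == 0:
            yield buffer.rstrip("\n").split(";", 6)  # maxsplit=6 keeps ';' inside the message
            buffer = ""
    if buffer:  # trailing, possibly incomplete record
        yield buffer.rstrip("\n").split(";", 6)


sample = [
    'DEBUG;2017-06-13T03:56:38,316-05:00;2017-06-13 08:56:38,316;'
    '79ab0b95-7f58-44a8-a2c6-1f8feba1d72d;(null);WorkerStartup 1;"Starting services."\n',
    'ERROR;t1;t2;id;(null);Worker 1;"Failed; details:\n',
    '   at some.random.classname\n',
    '   at another.classname"\n',
]

parsed = list(records(sample))
for fields in parsed:
    print(fields)
```

Each yielded list can then be passed to a parameterized INSERT statement for the MySQL table.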

Hackerrank stdin only gives me the first line from multiple lines

I am trying to solve this challenge on HackerRank:
https://www.hackerrank.com/challenges/30-operators?h_r=next-challenge&h_v=zen
The way I tried to read stdin is this:
let input = readLine()!
However, the input consists of three lines, e.g.
12.00
20
8
How do I get all three lines, ideally in some separated way so that I can cast them to their respective types?

If you need the 3 lines, call it 3 times :)
The documentation is explicit:
"Returns: Characters read from standard input through the end of the current line or until EOF is reached, or nil if EOF has already been reached."
But it seems they forgot to mention that reading a line advances the current line.

Generally, I would use input() to assign each stdin line to a variable and then pass the variables to functions.
Example:
a = input()  # gives the first line
b = input()  # gives the second line
c = input()  # gives the third line
If you would like to read all the lines, use a for loop:
Example:
import sys
for line in sys.stdin:
    print(line.rstrip("\n"))  # strip the newline that each line already carries
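
To also cover the casting part of the question, here is a small Python sketch that reads the three values and converts them. The function name read_three_values is illustrative, and the types (one float followed by two integers) are assumed from the sample input in the question. An in-memory stream stands in for stdin so the example runs on its own.

```python
import io
import sys


def read_three_values(stream=sys.stdin):
    """Read three whitespace-separated tokens and cast them to float, int, int."""
    first, second, third = stream.read().split()[:3]
    return float(first), int(second), int(third)


# io.StringIO stands in for stdin; call read_three_values() with no argument to read real input.
sample = io.StringIO("12.00\n20\n8\n")
values = read_three_values(sample)
print(values)
```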

Linux command to sort according to second word/character

I have a linux file whose content is as below:
hey this
is just
sample file
I want to:
1. Sort the three lines according to the second word, so the output should be:
sample file
is just
hey this
2. Sort the three lines according to the second character of the second word, so the output would be:
hey this
sample file
is just
Is there any way I can do this with a perl/unix command on the command line (using pipes is fine)?

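I found the answer to both questions:
For sorting by the second word: sort -k2 myfile
For sorting by the second character of the second word: sort -k2.3 myfile (with the default separator, the blank before the second word counts as character 1 of field 2, so the word's second character is at position 3)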
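
For readers who want to check the expected orderings without a shell, the two sorts can be mimicked in Python. This is only an illustration of the sort keys and not a reimplementation of sort: the second key compares just the single character, whereas sort -k2.3 compares from that character to the end of the line, and sort also follows the locale's collation rules.

```python
lines = ["hey this", "is just", "sample file"]

# Key 1: the whole second word.
by_second_word = sorted(lines, key=lambda line: line.split()[1])

# Key 2: only the second character of the second word.
by_second_char = sorted(lines, key=lambda line: line.split()[1][1])

print("\n".join(by_second_word))
print("\n".join(by_second_char))
```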
