Convert XML like String to PySpark Dataframe - pyspark

I'm using azure.storage.queue's receive_messages() function in Databricks to pull messages from an Azure queue. The response looks like XML, but it is really just a string:
<?xml version="1.0" encoding="utf-16"?>
<root>
<col1>123</col1>
<col2>1</col2>
<col3>Unknown</col3>
<col4>Dog</col4>
<col5>Owner</col5>
<col6>-1</col6>
<col7>Owner</col7>
<col8></col8>
</root>
When I write the response to a list, it looks like:
'<root>\r\n <col1>123</col1>\r\n <col2>1</col2>\r\n <col3>Unknown</col3>\r\n <col4>Dog</col4>\r\n <col5>Owner</col5>\r\n <col6>-1</col6>\r\n <col7>Owner</col7>\r\n <col8></col8>\r\n</root>'
I know that I can split on \r\n with something like:
l = [x.strip().split(' ') for x in a[0].split('\r\n')]
l
This gives:
[['<root>'],
 ['<col1>123</col1>'],
 ['<col2>1</col2>'],
 ['<col3>Unknown</col3>'],
 ['<col4>Dog</col4>'],
 ['<col5>Owner</col5>'],
 ['<col6>-1</col6>'],
 ['<col7>Owner</col7>'],
 ['<col8></col8>'],
 ['</root>']]
I'm not sure if this is the best route and I don't want to hard code each value into the spark dataframe, because I need to iterate through all messages in the queue. Looking for a solution that converts each 'col' into a header and then grabs the value between 'tags'.

Here is an answer:
from bs4 import BeautifulSoup

data = []
for message in response:
    # Parse each queue message; the "xml" parser needs lxml installed
    soup = BeautifulSoup(message.content, "xml")
    c = soup.find_all('col1')
    c1 = soup.find_all('col2')
    c2 = soup.find_all('col3')
    c3 = soup.find_all('col4')
    c4 = soup.find_all('col5')
    c5 = soup.find_all('col6')
    c6 = soup.find_all('col7')
    c7 = soup.find_all('col8')
    # Build one row per record found in the message
    for i in range(len(c)):
        rows = [c[i].get_text(),
                c1[i].get_text(),
                c2[i].get_text(),
                c3[i].get_text(),
                c4[i].get_text(),
                c5[i].get_text(),
                c6[i].get_text(),
                c7[i].get_text()]
        data.append(rows)
out_df = spark.createDataFrame(data, schema=['c', 'c1', 'c2', 'c3', 'c4',
                                             'c5', 'c6', 'c7'])

This version was faster, but it requires the elements in every response to appear in the same order, which mine do.
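# Assumption: the original does not show its import; lxml's etree is a guess
from lxml import etree

data = []
for message in response:
    # The XML declaration says utf-16, so encode the string to match before parsing
    root = etree.fromstring(message.content.encode('utf-16'))
    # Collect the text of each child element in document order
    arr = []
    for child in root:
        arr.append(child.text)
    data.append(arr)
out_df = spark.createDataFrame(data, schema=['c', 'c1', 'c2', 'c3', 'c4',
                                             'c5', 'c6', 'c7'])
display(out_df)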
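
The column names do not have to be hard-coded if the tag names themselves are used as headers. A minimal sketch of that idea, assuming the standard-library xml.etree.ElementTree parser and the declaration-free string shown in the question. The helper message_to_row and the shortened sample are hypothetical, and empty elements are mapped to '' so that every column stays a string:

```python
import xml.etree.ElementTree as ET


def message_to_row(xml_text):
    """Map each child tag of the root element to its text ('' when empty)."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


# Shortened, hypothetical sample in the same shape as the question's message
sample = "<root>\r\n <col1>123</col1>\r\n <col2>1</col2>\r\n <col8></col8>\r\n</root>"

rows = [message_to_row(sample)]
# Take the header names from the first message, then line up every row with them
columns = list(rows[0].keys())
data = [[row.get(name, "") for name in columns] for row in rows]

print(columns)
print(data)
```

The resulting columns and data could then be passed to spark.createDataFrame(data, schema=columns), as in the answers above.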

Related

Add element to the xml string in scala

I have the following relatively simple scenario, but I can't get it working.
I need to append an element to my XML string. Here's the scenario:
val xmlStr = "<return> <numberPin> 123456 </numberPin> </return>"
I need some way to add the date element and return the string below. I would prefer a solution that uses a regular expression, if possible:
"<return> <numberPin> 123456 </numberPin> <date> 2019-09-04 00:00:00 </date> </return>"
You can first create an XML template that is filled in at runtime.
You can do something like the following:
def updateXml(xmlStr: String, dateContent: String): String = {
  // Swap the placeholder for the real element
  xmlStr.replace("DATE_DATA", dateContent)
}
val xmlStr = "<return> <numberPin> 123456 </numberPin> DATE_DATA </return>"
val dateData = "<date> 2019-09-04 00:00:00 </date>"
updateXml(xmlStr, dateData)
Another alternative, if the XML content is large, is to keep the template in a file. Read it in your code and insert the required data at runtime, as in the example above, where DATE_DATA sits in the template and is replaced by the method.
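
If the incoming string cannot carry a placeholder, one option is to insert the new element just before the closing tag. This is a different technique from the template approach above, sketched here in Python under the assumption that the closing tag is known in advance. The helper name append_before_closing is hypothetical:

```python
def append_before_closing(xml_str, closing_tag, fragment):
    """Insert fragment just before the last occurrence of closing_tag."""
    head, sep, tail = xml_str.rpartition(closing_tag)
    if not sep:
        raise ValueError("closing tag not found: " + closing_tag)
    return head + fragment + " " + sep + tail


xml_str = "<return> <numberPin> 123456 </numberPin> </return>"
date_data = "<date> 2019-09-04 00:00:00 </date>"

result = append_before_closing(xml_str, "</return>", date_data)
print(result)
```

This is plain string handling, so it only suits small, predictable payloads. For anything less regular, an XML parser is the safer choice.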

Use excel file variables in SOAP request in SoapUI

I've run into a problem. I'm new to SoapUI.
I need to read an Excel file and then put some of its values into a SOAP request. This is what I've done so far:
I've added a Groovy script to get the data from the Excel file:
import jxl.*
Workbook workbook = Workbook.getWorkbook(new File("C:\\PATH\\TestData.xls"))
Sheet sheet1 = workbook.getSheet("Sheet1")
def rows = sheet1.getRows()
def cols = sheet1.getColumns()
log.info "Row Count =" + rows
log.info "Column Count =" + cols
def array = []
for (i = 1; i < rows; i++) {
    for (j = 0; j < cols; j++) {
        Cell cell = sheet1.getCell(j, i)
        def variable = cell.getContents()
        log.info cell.getContents()
        array << variable
    }
}
return array
The array contains 10 and 20.
And this is the SOAP request:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tem="http://tempuri.org/">
<soapenv:Header/>
<soapenv:Body>
<tem:Add>
<tem:intA>10</tem:intA>
<tem:intB>10</tem:intB>
</tem:Add>
</soapenv:Body>
</soapenv:Envelope>
Can I somehow call a Groovy script and put the variables into
<tem:intA>10</tem:intA>
<tem:intB>20</tem:intB>
Instead of the hard-coded 10 and 20, I want to call a Groovy script method and insert the data I've read from the Excel file.
Since your use case is trivial, with just two variables to substitute, you can simply use two test case properties.
Replace the return array in your script with something like the following (in a Groovy Script test step, the test case is reached through testRunner.testCase):
testRunner.testCase.setPropertyValue("intA", array[0].toString())
testRunner.testCase.setPropertyValue("intB", array[1].toString())
And then change your request to:
<tem:intA>${#TestCase#intA}</tem:intA>
<tem:intB>${#TestCase#intB}</tem:intB>

Gatling: check binary response not empty

I'm doing some tests with Gatling using Scala. I'm trying to check whether a response body that is returned is not empty.
I'm doing it like this:
def getImages: ChainBuilder = feed(credentials)
  .exec(http("Get Image")
    .get(GET_MY_URI)
    .queryParam("guid", "${branch}")
    .queryParam("t", "0.458654")
    .check(status.is(200))
    .check(bodyString.transform(_.size > 1).is(true)))
But it's not working. I get:
java.nio.charset.MalformedInputException: Input length = 1
Does somebody know how to achieve what I'm trying?
Replace
.check(bodyString.transform(_.size > 1).is(true)))
with
.check(bodyBytes.exists)
All the DSL is explained here: https://gatling.io/docs/current/cheat-sheet/

How do I pass ## separated values in Scala?

Consider the following scenario:
["123##456","789##101112","131415##161718","192021##222324"]
first-id: 123, second-id: 456...
I get the above as two different sets of ids in the JSON payload of my response.
Saving the values via
.check(jsonPath("$.data[*].Id").findAll.saveAs("Id"))
works perfectly fine for me.
But now I need to pass the above-mentioned ids in the next request of post method, which comes as
["123##456","789##101112","131415##161718","192021##222324"]
How can I achieve that? Could you explain with an example, please?
You could use split, something like:
val data = Array("123##456", "789##101112", "131415##161718", "192021##222324")
for (i <- 0 until data.length) {
  // Each entry holds two ids separated by "##"
  val ids = data(i).split("##")
  println("first id is: " + ids(0))
  println("second id is: " + ids(1))
}
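
For comparison, the same split can be sketched in Python. This assumes every entry contains exactly one "##" separator. The names pairs, first_id, and second_id are illustrative only:

```python
data = ["123##456", "789##101112", "131415##161718", "192021##222324"]

# Split each entry on the "##" separator into a (first, second) pair
pairs = [tuple(item.split("##")) for item in data]

for first_id, second_id in pairs:
    print("first id is: " + first_id)
    print("second id is: " + second_id)
```

Keeping the ids as pairs makes it easy to rebuild the "first##second" strings later with "##".join(pair) when the next request needs them in the original form.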

Groovy script for count value matches with offset

<... count="6" offset="3,2,7,1,4,5"/>
From the above snippet, I want to verify that the number of offset values matches the count value. Please help me write a SoapUI Groovy script (for a REST service) that does this.
Thanks!
Your question isn't clear, so I'll suppose that you have something like:
<myTag count="6" offset="3,2,7,1,4,5"/>
You can use XmlSlurper in groovy script to validate your requirement as follows:
def xmlStr = '<myTag count="6" offset="3,2,7,1,4,5"/>'
def xml = new XmlSlurper().parseText(xmlStr)
// use @ notation to access attributes
def count = xml.@count
def offset = xml.@offset.toString().split(',')
// assert that count matches the length of the array
assert count == offset.length
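
The same check can be sketched outside SoapUI, for example in Python with the standard-library xml.etree.ElementTree. The tag name myTag is the same supposition as above, and the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

xml_str = '<myTag count="6" offset="3,2,7,1,4,5"/>'
elem = ET.fromstring(xml_str)

# Attributes come back as strings, so convert count before comparing
count = int(elem.get("count"))
offsets = elem.get("offset").split(",")

# The declared count should equal the number of comma-separated offsets
assert count == len(offsets)
```

Converting count to an integer makes the comparison explicit instead of relying on how the parser's attribute type compares with a number.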
In any case, consider providing more details and what you have tried, as @Opal suggests in their comment.
Hope it helps,