Creating csv from an XML that has repeating tags - python-3.7

I am trying to normalize an xml file that has a parent tag with repeating child tags. Please see a sample of the data below:
<GeneralQuestions>
<HeaderText>Pain Management Assessment</HeaderText>
<QuestionText>Pain assessment</QuestionText>
<QuestionAnswer>Yes</QuestionAnswer>
<HeaderText>Activities of Daily Living</HeaderText>
<QuestionText>Patient walks</QuestionText>
<QuestionAnswer>With Some Help</QuestionAnswer>
<Score>1</Score>
<HeaderText>Pain Management Assessment</HeaderText>
<QuestionText>Patient consents to having Pain Management Assessment screening completed.</QuestionText>
<QuestionAnswer>Patient accepts</QuestionAnswer>
<HeaderText>Activities of Daily Living</HeaderText>
<QuestionText>Patient gets dressed</QuestionText>
<QuestionAnswer>With Some Help</QuestionAnswer>
<Score>1</Score>
</GeneralQuestions>
You'll observe that the child tags in "GeneralQuestions" are getting repeated with the child tag "Score" being optional. I am trying to convert it to a normalized form in which each set of child tags form a row, like shown below:
HeaderText, QuestionText,QuestionAnswer,Score
HeaderText, QuestionText,QuestionAnswer,Score
HeaderText, QuestionText,QuestionAnswer,Score
If the "Score" is missing, I want null value. I am using python 3.7 and xml.etree.ElementTree.iterparse to parse the data. Please let me know how I can normalize the data.

Here an incomplete answer that can help you using numpy to number the tags. I assumed some structure which is same. Say we have
XML = """<GeneralQuestions>
<HeaderText>Pain Management Assessment</HeaderText>
<QuestionText>Pain assessment</QuestionText>
<QuestionAnswer>Yes</QuestionAnswer>
<HeaderText>Activities of Daily Living</HeaderText>
<QuestionText>Patient walks</QuestionText>
<QuestionAnswer>With Some Help</QuestionAnswer>
<Score>1</Score>
<HeaderText>Pain Management Assessment</HeaderText>
<QuestionText>Patient consents to having Pain Management Assessment screening completed.</QuestionText>
<QuestionAnswer>Patient accepts</QuestionAnswer>
<HeaderText>Activities of Daily Living</HeaderText>
<QuestionText>Patient gets dressed</QuestionText>
<QuestionAnswer>With Some Help</QuestionAnswer>
<Score>1</Score>
</GeneralQuestions>"""
And we create a tree with
import xml.etree.ElementTree as ET
tree = ET.fromstring(XML)
Than with using numpy we can create an index with
import numpy as np
index_of_score = np.cumsum( [ e.tag == 'HeaderText' for e in tree.getchildren() ] )
Now with help of the index_of_score you can create a dictionary with indexed tags, and possible value
{ "{}_{}".format(a.tag,i) : a.text for i,a in zip(index_of_score, tree.getchildren() ) }
which will give you
{'HeaderText_1': 'Pain Management Assessment',
'QuestionText_1': 'Pain assessment',
'QuestionAnswer_1': 'Yes',
'HeaderText_2': 'Activities of Daily Living',
'QuestionText_2': 'Patient walks',
'QuestionAnswer_2': 'With Some Help',
'Score_2': '1',
'HeaderText_3': 'Pain Management Assessment',
'QuestionText_3': 'Patient consents to having Pain Management Assessment screening completed.',
'QuestionAnswer_3': 'Patient accepts',
'HeaderText_4': 'Activities of Daily Living',
'QuestionText_4': 'Patient gets dressed',
'QuestionAnswer_4': 'With Some Help',
'Score_4': '1'}
Depending on your desired output you can cherry pick the values you need. Say the above dictionary is dict_output, python has the nice dict_output.get("Score_1", None) which will give you either the value or in this case None which might help you in processing the data.

I created a context from the XML file:
xmlIter=ET.iterparse('C:\\Users\\ANAND_RA\\Documents\\Project\\XXXXXXX_MA_05042020_0.xml', events=('start','end'))
context=iter(xmlIter)
Then I used a for loop to parse through each tag:
for eachEvent, eachElement in context:
The final step (the most important one) is to process the tag "GeneralQuestions" under the for loop:
if eachElement.tag=='GeneralQuestions' and eachEvent=='start':
GQstart=True
GQcount=0
GQlist=[]
GQDataList=[]
if eachElement.tag=='HeaderText' and GQstart and eachEvent=='start':
if GQcount!=0:
if len(GQlist)<6:
GQlist.extend([None,1,today,today])
else:
GQlist.extend([1,today,today])
GQDataList.append(tuple(GQlist))
GQcount+=1
GQlist=[]
GQlist.extend([nextDLMemberSK,GQcount,eachElement.text])
if eachElement.tag=='QuestionText' and GQstart and eachEvent=='start':
GQlist.append(eachElement.text)
if eachElement.tag=='QuestionAnswer' and GQstart and eachEvent=='start':
GQlist.append(eachElement.text)
if eachElement.tag=='Score' and GQstart and eachEvent=='start':
GQlist.append(eachElement.text)
if eachElement.tag=='GeneralQuestions' and eachEvent=='end' and GQcount!=0:
if len(GQlist)==5:
GQlist.extend([None,1,today,today])
else:
GQlist.extend([1,today,today])
GQDataList.append(tuple(GQlist))
if eachElement.tag=='GeneralQuestions' and eachEvent=='end' and len(GQDataList)>0:
# print(GQDataList)
cur1.executemany(SQL_INS_INOV_GENERALQ,GQDataList)
Processing an XML is quite different from processing a json file. In json, the data inspection is more flexible. However, in xml data you have to inspect the tags as it comes in the for loop.

Related

pyspark does not parse an xml from a file containing multiple xmls

I am using spark 2.3.2 with python 3.7 to parse xml.
In an xml file (sample), I have appended 2 xmls.
When I parse it with:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.7.0 pyspark-shell'
conf = pyspark.SparkConf()
sc = SparkSession.builder.config(conf=conf).getOrCreate()
spark = SQLContext(sc)
dfSample = (spark.read.format("xml").option("rowTag", "xocs:doc")
.load(r"sample.xml"))
I see 2 xmls' data:
However, what I need is to extract the info under "ref-info" tag (along with their corresponding key eids), so my code is:
(dfSample.
withColumn("metaExp", F.explode(F.array("xocs:meta"))).
withColumn("eid", F.col("metaExp.xocs:eid")).
select("eid","xocs:item").
withColumn("xocs:itemExp", F.explode(F.array("xocs:item"))).
withColumn("item", F.col("xocs:itemExp.item")).
withColumn("itemExp", F.explode(F.array("item"))).
withColumn("bibrecord", F.col("item.bibrecord")).
withColumn("bibrecordExp", F.explode(F.array("bibrecord"))).
withColumn("tail", F.col("bibrecord.tail")).
withColumn("tailExp", F.explode(F.array("tail"))).
withColumn("bibliography", F.col("tail.bibliography")).
withColumn("bibliographyExp", F.explode(F.array("bibliography"))).
withColumn("reference", F.col("bibliography.reference")).
withColumn("referenceExp", F.explode(F.array("reference"))).
withColumn("ref-infoExp", F.explode(F.col("reference.ref-info"))).
withColumn("authors", F.explode(F.col("ref-infoExp.ref-authors.author"))).
withColumn("py", (F.col("ref-infoExp.ref-publicationyear._first"))).
withColumn("so", (F.col("ref-infoExp.ref-sourcetitle"))).
withColumn("ti", (F.col("ref-infoExp.ref-title"))).
drop("xocs:item", "xocs:itemExp", "item", "itemExp", "bibrecord", "bibrecordExp", "tail", "tailExp", "bibliography",
"bibliographyExp", "reference", "referenceExp").show())
This extracts the info only from the xml with eid = 85082880163
When I delete this one and only kept the one with eid = 85082880158, it works.
My file is an xml file containing those 2 lines in the link. I have also tried to merge those 2 into one xml but could not manage.
What is wrong with my data/approach? (My ultimate plan is to create such a file containing thousands of different xmls to be parsed)
Your trouble lies in this piece of XML:
<xocs:item>
<item>
<bibrecord>
<head>
<abstracts>
<abstract original="y" xml:lang="eng">
<ce:para>
Apple is often affected by [...] intercellular CO <inf>2</inf> concentration [...]
^^^^^^^^^^^^
While it is correct XML (mixed content), this embedded tag <inf>2</inf> seems to be breaking spark-xml parser. If you remove it (or convert to corresponding HTML entities) you will get correct results.

My google sheets function does the job when run from editor but gives different outcome when trigered by Form submit

I have a google form and a sheet that collects the responses which of course always appear at the bottom. I have been using the following script to copy the last response (which is always on the last row) from the Response sheet (Form Responses 2) to row two of another sheet (All Responses). When run by a trigger on Form Submit the script inserts a blank row into All Responses, then the copied values into another row above the blank row. Please can you help and tell me why and how I might change the script so the blank row is not added:
function CopyLastrowformresponse () {
var ss = SpreadsheetApp.getActive();
var AR = ss.getSheetByName("All Responses");
var FR = ss.getSheetByName("Form responses 2");
var FRlastrow = FR.getLastRow();
AR.insertRowBefore(2);
FR.getRange(FRlastrow, 1, FRlastrow, 22).copyTo(AR.getRange("A2"), SpreadsheetApp.CopyPasteType.PASTE_VALUES, false);
}
A few things could be going on here.
You're getting a number of rows equal to FRlastrow, when I think you only want to be getting 1 row.
Apps Script has buggy behavior with onFormSubmit() triggers, so you may to check duplicate triggers (see this answer).
The script isn't fully exploiting the event object provided by onFormSubmit(). Specifically, rather than getting the last row from one sheet, you could use e.values, which is the same data.
I would change the script to be something like this:
function CopyLastrowformresponse (e) {
if (e.values && e.values[1] != "") { // assuming e.values[1] (the first question) is required
SpreadsheetApp.getActive()
.getSheetByName("All Responses")
.insertRowBefore(2)
.getRange(2, 1, 1, e.values.length)
.setValues([e.values]);
}
}
But, ultimately, if all you want to do is simply reverse the order of the results, then I'd ditch Apps Script altogether and just use the =SORT() function.
=SORT('Form responses 2'!A:V, 'Form responses 2'!A:A, FALSE)

Adding columns to a Web2py table in a form

In my web2py application, in the controller I read from an external DB the names of students I want to take a register for. I loop through the resulting list adding the list elements to a new list.
for student in pupils_query:
attendance_list.insert(counter, [student[0], student[1], student[2], student[3]])
counter += 1
counter = 0
Then for each student I read their attendance codes for the day so far from another table, and append them to attendance_list:
for attendance_code in attendance_result:
attendance_list[counter].append(attendance_code)
Now, I'm going to want to make a form from all this, using a table which will show each students' attendance code in a text input (so they can be updated if wrong), then have a dropdown for input of the current lesson code.
I'm using a FORM and TABLE helper to create the table in the form:
form=FORM(TABLE(*[TR(*rows) for rows in attendance_list]))
but can't seem to be able to add a new 'row' form item with something like:
select = "SELECT("+ main_reg_list +")"
attendance_list[counter].append(select)
where main_reg_list is dictionary of acceptable attendance codes (or of course, any other form input element).
In summary, I'm stuck adding new TDs to a table made with a TABLE helper from a list of lists. I bet I'm not the first person to overcome this problem.
I am still not clear about what you want. I think you want table of student information and in one column you want dropdown. Something similat to following image
Above form is created from following code.
I hope following code will help you:
# controller/default.py
def index():
# Dummy attendance list, list after appending attendance code
attendance_list = [['stud_id_1', 'first_name_1', 'last_name_1', 'attendance_code_1'],
['stud_id_2', 'first_name_2', 'last_name_2', 'attendance_code_2'],
['stud_id_3', 'first_name_3', 'last_name_3', 'attendance_code_5'],
['stud_id_4', 'first_name_4', 'last_name_4', 'attendance_code_4']]
possible_att_code = ['attendance_code_1', 'attendance_code_2', 'attendance_code_3', 'attendance_code_4', 'attendance_code_5']
# initialise form_rows with Table heading
form_rows = [THEAD(TR(TH('ID'), TH('First Name'), TH('Last Name'), TH('Attendence Code')))]
for attendance in attendance_list:
attendance_code_dropdown = _get_dropdown(attendance[0], attendance[3], possible_att_code)
td_list = [TD(attendance[0]), TD(attendance[1]), TD(attendance[2]),
TD(attendance_code_dropdown)]
table_row = TR(td_list, _id='row_' + attendance[0])
form_rows.append(table_row)
# Form submit button
form_rows.append(TR(INPUT(_type='submit')))
form = FORM(TABLE(*form_rows), _name='student_attendance',
_id='student_attendance')
if form.accepts(request, session):
# Write code to update record
pass
return dict(form=form)
def _get_dropdown(stud_id, att_code, possible_att_code):
option_list = []
for pac in possible_att_code:
if pac == att_code:
option_list.append(OPTION(pac, _value=pac, _selected='selected'))
else:
option_list.append(OPTION(pac, _value=pac))
return SELECT(*option_list, _name=stud_id)
<!-- views/default/index.html -->
{{extend 'layout.html'}}
{{=form}}
Are my assumptions correct? or you want any thing else? Comment if didn't understood code.

Tridion Workflow Error on EDA_ITEMS_UPDATE Stored Procedure When Restarting WF Activities

I'm hitting an error whenever 2 or more instances of a workflow is restarted. We have suspended workflows during a network incident in our office(s). It was a script timeout that disrupted the process. I couldn't reproduce this timeout anymore so I attempted to restart the workflow activities to complete the process. However, when I restart more than 1 suspended activity, I get this error (the full log is farther below):
Cannot insert the value NULL into column 'ITEM_ID', table 'tridion_cm.dbo.ITEM_ASSOCIATIONS'; column does not allow nulls. INSERT fails.
The statement has been terminated.
Restarting activities and letting it finish one by one has no problem.
Full Event Log:
An error occurred while executing the Workflow script.
The Script Engine returned the following information:
SOURCE:
Line = 0
Column = 0
Number = -2147220673
Source = Component.Save
Description =
<?xml version="1.0" standalone="yes"?>
<tcm:Error ErrorCode="8004033F" Category="4" Source="Kernel" Severity="1" xmlns:tcm="http://www.tridion.com/ContentManager/5.0">
<tcm:Line ErrorCode="8004033F" Cause="false" MessageID="16137"><![CDATA[Unable to save Component (tcm:5-32795).]]>
<tcm:Token>RESID_4574</tcm:Token>
<tcm:Token>RESID_4418</tcm:Token>
<tcm:Token>tcm:5-32795</tcm:Token>
</tcm:Line>
<tcm:Line ErrorCode="8004033F" Cause="true">CDATA[No data found. ETA_ITEMS, U Cannot insert the value NULL into column 'ITEM_ID', table 'tridion_cm.dbo.ITEM_ASSOCIATIONS'; column does not allow nulls. INSERT fails. The statement has been terminated.
</tcm:Line>
<tcm:Line ErrorCode="8004033F" Cause="false"><![CDATA[A database error occurred while executing Stored Procedure "EDA_ITEMS_UPDATE".]]>
<tcm:Token>EDA_ITEMS_UPDATE</tcm:Token>
</tcm:Line>
<tcm:Details>
<tcm:CallStack>
<tcm:Location>System.Data.SqlClient.SqlConnection.OnError(SqlException,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlInternalConnection.OnError(SqlException,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(</tcm:Location>
<tcm:Location>System.Data.SqlClient.TdsParser.Run(RunBehavior,SqlCommand,SqlDataReader,BulkCopySimpleResultSet,TdsParserStateObject)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader,RunBehavior,String)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior,RunBehavior,Boolean,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior,RunBehavior,Boolean,String,DbAsyncResult)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult,String,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.ExecuteNonQuery()</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.DatabaseUtilities.ExecuteNonQuery(StoredProcedureInvocation)</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.Sql.SqlDatabaseUtilities.ExecuteNonQuery(StoredProcedureInvocation)</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.Sql.SqlDatabaseUtilities.ExecuteNonQuery(StoredProcedureInvocation)</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.IdentifiableObjectDataMapper.Tridion.ContentManager.Data.IIdentifiableObjectDataMapper.Update(IdentifiableObjectData)</tcm:Location>
<tcm:Location>Tridion.ContentManager.IdentifiableObject.Save(SaveEventArgs)</tcm:Location>
<tcm:Location>Tridion.ContentManager.ContentManagement.VersionedItem.Save(Boolean)</tcm:Location>
<tcm:Location>Tridion.ContentManager.ContentManagement.VersionedItem.Save()</tcm:Location>
<tcm:Location>Tridion.ContentManager.BLFacade.ContentManagement.VersionedItemFacade.UpdateAndCheckIn(UserContext,String,Boolean,Boolean)</tcm:Location>
<tcm:Location>XMLState.Save</tcm:Location>
<tcm:Location>Component.Save</tcm:Location>
</tcm:CallStack>
</tcm:Details>
</tcm:Error>
HelpFile =
HelpContext = 1000440
caused by: Component.Save
and description:
<?xml version="1.0" standalone="yes"?>
<tcm:Error ErrorCode="8004033F" Category="4" Source="Kernel" Severity="1" xmlns:tcm="http://www.tridion.com/ContentManager/5.0">
<tcm:Line ErrorCode="8004033F" Cause="false" MessageID="16137"><![CDATA[Unable to save Component (tcm:5-32795).]]>
<tcm:Token>RESID_4574</tcm:Token>
<tcm:Token>RESID_4418</tcm:Token>
<tcm:Token>tcm:5-32795</tcm:Token>
</tcm:Line>
<tcm:Line ErrorCode="8004033F" Cause="true">CDATA[No data found. ETA_ITEMS, U Cannot insert the value NULL into column 'ITEM_ID', table 'tridion_cm.dbo.ITEM_ASSOCIATIONS'; column does not allow nulls. INSERT fails. The statement has been terminated.
</tcm:Line>
<tcm:Line ErrorCode="8004033F" Cause="false"><![CDATA[A database error occurred while executing Stored Procedure "EDA_ITEMS_UPDATE".]]>
<tcm:Token>EDA_ITEMS_UPDATE</tcm:Token>
</tcm:Line>
<tcm:Details>
<tcm:CallStack>
<tcm:Location>System.Data.SqlClient.SqlConnection.OnError(SqlException,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlInternalConnection.OnError(SqlException,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(</tcm:Location>
<tcm:Location>System.Data.SqlClient.TdsParser.Run(RunBehavior,SqlCommand,SqlDataReader,BulkCopySimpleResultSet,TdsParserStateObject)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader,RunBehavior,String)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior,RunBehavior,Boolean,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior,RunBehavior,Boolean,String,DbAsyncResult)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult,String,Boolean)</tcm:Location>
<tcm:Location>System.Data.SqlClient.SqlCommand.ExecuteNonQuery()</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.DatabaseUtilities.ExecuteNonQuery(StoredProcedureInvocation)</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.Sql.SqlDatabaseUtilities.ExecuteNonQuery(StoredProcedureInvocation)</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.Sql.SqlDatabaseUtilities.ExecuteNonQuery(StoredProcedureInvocation)</tcm:Location>
<tcm:Location>Tridion.ContentManager.Data.AdoNet.IdentifiableObjectDataMapper.Tridion.ContentManager.Data.IIdentifiableObjectDataMapper.Update(IdentifiableObjectData)</tcm:Location>
<tcm:Location>Tridion.ContentManager.IdentifiableObject.Save(SaveEventArgs)</tcm:Location>
<tcm:Location>Tridion.ContentManager.ContentManagement.VersionedItem.Save(Boolean)</tcm:Location>
<tcm:Location>Tridion.ContentManager.ContentManagement.VersionedItem.Save()</tcm:Location>
<tcm:Location>Tridion.ContentManager.BLFacade.ContentManagement.VersionedItemFacade.UpdateAndCheckIn(UserContext,String,Boolean,Boolean)</tcm:Location>
<tcm:Location>XMLState.Save</tcm:Location>
<tcm:Location>Component.Save</tcm:Location>
</tcm:CallStack>
</tcm:Details>
</tcm:Error>
Source:
LogScriptError
Workflow Info
The components are newly created by another application through BusinessConnector. It performs revision saves before finishing the activity. The objective of the WF is to
assign the necessary component links based on text values from the other system;
create a page;
and publish it to a target.
I know this could've been done better by other means but it is how it is right now, please bear with the implementation.
Model / Schema
Product
- modelName (text)
- categoryName (text)
- categoryCL (component link)
- statusName (text)
- statusCL (component link)
- blah1 (text)
- blah2 (text)
- blah3 (text)
- blah4 (text)
Automatic Activity Script:
' COMPONENT WORKFLOW for PRODUCT SCHEMA (START)
' Load common functions
ExecuteGlobal oTDSE.GetObject("/webdav/200%20Design/Building%20Blocks/Library/Design/Workflow/Common/Workflow%20Functions.tbbs", 1).Content
' Initialize
Set objComp = CurrentWorkItem.GetItem()
Set oContentPub = objComp.Publication
Set oWebsitePub = oTDSE.GetObject(GetStagingLangPub("EN", "website1"), 1)
modelName = objComp.Fields.Item("modelName").Value.Item(1)
' Search and Assign component links based on some text fields from BC client
Set compListRowFilter = oTDSE.CreateListRowFilter()
Call compListRowFilter.SetCondition("ShowNewItems", TRUE)
Call compListRowFilter.SetCondition("ItemType", 16)
' Assign the first component link (categoryCL)
categoryName = objComp.Fields.Item("categoryName").Value.Item(1)
Set categoryComp = GetObjectFromFolder(categoryName, oTDSE.GetObject(categoryFolderWebDav, 1), compListRowFilter)
If Not categoryComp is Nothing Then
objComp.Fields.Item("categoryCL").Value.RemoveAll
objComp.Fields.Item("categoryCL").Value.Add(categoryComp)
Call objComp.Save(True)
Else
Call SendNoObjectEmail("Category", categoryName, "categoryCL", GetEmailsFromGroup(oContentPub, categoryOwner), "")
End if
' Assign the second component link (statusCL)
statusName = objComp.Fields.Item("statusName").Value.Item(1)
Set statusComp = GetObjectFromFolder(statusName, oTDSE.GetObject(statusFolderWebDav, 1), compListRowFilter)
If Not statusComp is Nothing Then
objComp.Fields.Item("statusCL").Value.RemoveAll
objComp.Fields.Item("statusCL").Value.Add(statusComp)
Call objComp.Save(True)
Else
Call SendNoObjectEmail("Status", statusName, "statusCL", GetEmailsFromGroup(oContentPub, statusOwner), "")
End if
' Create a page with the component in WF
Set oPage = CreateDefaultPage(modelName, oWebSitePub, SaveToFolderWebDav, PageTemplateWebDav)
Call oPage.ComponentPresentations.Add(oWebSitePub, oTDSE.GetObject(ComponentTemplateWebDav, 1))
oPage.Save(True)
' Publish the page
....
' Send email
....
' Finish the Activity
Call CurrentWorkItem.ActivityInstance.FinishActivity("someMsg")
' COMPONENT WORKFLOW for PRODUCT SCHEMA (END)
I haven't zoomed in to where exactly it happens but I would assume it is either the categoryCL or statusCL's objComp.Save(True) line.
I notice that you are calling Call objComp.Save(True) - The "True" parameter forces a "CheckIn", as it means "Done Editing".
It looks like this can occur twice (i.e. If there is a statusCL and a categoryCL) whichmay be causing the problem. I would try changing the behavior so that you only the call Save() method at the end if a change has been made, and not saving it twice.
From what I see the problem is that you are trying to assign nulls as component links. Are you sure that you are not trying to link these 2 components one to another? It might be that the link you are trying to create is to new component in workflow (it has no checked-in version yet), hence you can't create link to it. Try adding some logging to your script, so we could see where exactly does it fail and what are the parameters.
And I agree with Frank that this exception is worth a defect report.

Protovis - dealing with a text source

lets say I have a text file with lines as such:
[4/20/11 17:07:12:875 CEST] 00000059 FfdcProvider W com.test.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I: FFDC Incident emitted on D:/Prgs/testing/WebSphere/AppServer/profiles/ProcCtr01/logs/ffdc/server1_3d203d20_11.04.20_17.07.12.8755227341908890183253.txt com.test.testserver.management.cmdframework.CmdNotificationListener 134
[4/20/11 17:07:27:609 CEST] 0000005d wle E CWLLG2229E: An exception occurred in an EJB call. Error: Snapshot with ID Snapshot.8fdaaf3f-ce3f-426e-9347-3ac7e8a3863e not found.
com.lombardisoftware.core.TeamWorksException: Snapshot with ID Snapshot.8fdaaf3f-ce3f-426e-9347-3ac7e8a3863e not found.
at com.lombardisoftware.server.ejb.persistence.CommonDAO.assertNotNull(CommonDAO.java:70)
Is there anyway to easily import a data source such as this into protovis, if not what would the easiest way to parse this into a JSON format. For example for the first entry might be parsed like so:
[
{
"Date": "4/20/11 17:07:12:875 CEST",
"Status": "00000059",
"Msg": "FfdcProvider W com.test.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I",
},
]
Thanks, David
Protovis itself doesn't offer any utilities for parsing text files, so your options are:
Use Javascript to parse the text into an object, most likely using regex.
Pre-process the text using the text-parsing language or utility of your choice, exporting a JSON file.
Which you choose depends on several factors:
Is the data somewhat static, or are you going to be running this on a new or dynamic file each time you look at it? With static data, it might be easiest to pre-process; with dynamic data, this may add an annoying extra step.
How much data do you have? Parsing a 20K text file in Javascript is totally fine; parsing a 2MB file will be really slow, and will cause the browser to hang while it's working (unless you use Workers).
If there's a lot of processing involved, would you rather put that load on the server (by using a server-side script for pre-processing) or on the client (by doing it in the browser)?
If you wanted to do this in Javascript, based on the sample you provided, you might do something like this:
// Assumes var text = 'your text';
// use the utility of your choice to load your text file into the
// variable (e.g. jQuery.get()), or just paste it in.
var lines = text.split(/[\r\n\f]+/),
// regex to match your log entry beginning
patt = /^\[(\d\d?\/\d\d?\/\d\d? \d\d:\d\d:\d\d:\d{3} [A-Z]+)\] (\d{8})/,
items = [],
currentItem;
// loop through the lines in the file
lines.forEach(function(line) {
// look for the beginning of a log entry
var initialData = line.match(patt);
if (initialData) {
// start a new item, using the captured matches
currentItem = {
Date: initialData[1],
Status: initialData[2],
Msg: line.substr(initialData[0].length + 1)
}
items.push(currentItem);
} else {
// this is a continuation of the last item
currentItem.Msg += "\n" + line;
}
});
// items now contains an array of objects with your data