Parsing a tab-delimited file - detect if the first row value is empty/tab - c#-3.0

Hi all, I have to parse some files to load into a DataSet, and I ran into an issue where the first value in a row is sometimes blank, so when I parse the data the values added to the columns are off because there is no value for row[RouteCode].
Example Data
Columns are in the first line (tab-delimited); data rows are in the following lines (tab-delimited).
RouteCode City EmailAddress FirstName
NULL MyCity My-Email MyFirstName
What I am seeing is that all the columns are added fine, but for each row added the first tab value is not detected, so the columns shift (hope I am making sense): in this case the city data sits in the RouteCode column and the last column somehow ends up with the first row value (the tab).
class TextToDataSet
{
    public TextToDataSet()
    { }

    /// <summary>
    /// Converts a given delimited file into a dataset.
    /// Assumes that the first line
    /// of the text file contains the column names.
    /// </summary>
    /// <param name="File">The name of the file to open</param>
    /// <param name="TableName">The name of the
    /// Table to be made within the DataSet returned</param>
    /// <param name="delimiter">The string to delimit by</param>
    /// <returns></returns>
    public static DataSet Convert(string File,
        string TableName, string delimiter)
    {
        //The DataSet to return
        DataSet result = new DataSet();
        //Open the file in a stream reader.
        using (StreamReader s = new StreamReader(File))
        {
            //Split the first line into the columns
            string[] columns = s.ReadLine().Split(delimiter.ToCharArray());
            //Add the new DataTable to the RecordSet
            result.Tables.Add(TableName);
            //Cycle the columns, adding those that don't exist yet
            //and sequencing the ones that do.
            foreach (string col in columns)
            {
                bool added = false;
                string next = "";
                int i = 0;
                while (!added)
                {
                    //Build the column name and remove any unwanted characters.
                    string columnname = col + next;
                    columnname = columnname.Replace("#", "");
                    columnname = columnname.Replace("'", "");
                    columnname = columnname.Replace("&", "");
                    //See if the column already exists
                    if (!result.Tables[TableName].Columns.Contains(columnname))
                    {
                        //if it doesn't then we add it here and mark it as added
                        result.Tables[TableName].Columns.Add(columnname);
                        added = true;
                    }
                    else
                    {
                        //if it did exist then we increment the sequencer and try again.
                        i++;
                        next = "_" + i;
                    }
                }
            }
            //Read the rest of the data in the file.
            string AllData = s.ReadToEnd();
            //Split off each row at the Carriage Return/Line Feed.
            //Default line ending in most Windows exports.
            //You may have to edit this to match your particular file.
            //This will work for Excel, Access, etc. default exports.
            string[] rows = AllData.Split("\n".ToCharArray());
            //Now add each row to the DataSet
            foreach (string r in rows)
            {
                //Split the row at the delimiter.
                string[] items = r.Split(delimiter.ToCharArray());
                //Add the item
                result.Tables[TableName].Rows.Add(items);
            }
        }
        //Return the imported data.
        return result;
    }
}

If there aren't supposed to be any missing entries anywhere in the file (i.e. there should always be something between the tabs), then you could use:
string[] columns = s.ReadLine().Split(delimiter.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
and then check that columns is not an empty array. If it is, read the next line and carry on processing:
while (columns.Length == 0)
{
    // Row is empty so read the next line out of the file
    columns = s.ReadLine().Split(delimiter.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
}
This will ensure that your data always starts with a filled row. However, it will break down if there is ever an empty entry further down the list.
If there could be empty entries then you'll probably have to check for all columns being empty:
while (columns.All(c => string.IsNullOrEmpty(c)))
{
    // Row is empty so read the next line out of the file
    columns = s.ReadLine().Split(delimiter.ToCharArray());
}
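Putting the two checks together, a minimal sketch might look like the helper below. The null check on ReadLine and the helper name ReadFirstDataLine are my additions (not part of the snippets above), added so the loop stops cleanly at the end of the file; it assumes the same s and delimiter as in the Convert method and needs using System.Linq for Any().
private static string[] ReadFirstDataLine(StreamReader s, string delimiter)
{
    string line;
    while ((line = s.ReadLine()) != null)              // stop cleanly at end of file
    {
        string[] fields = line.Split(delimiter.ToCharArray());
        if (fields.Any(f => !string.IsNullOrEmpty(f)))
            return fields;                             // first row containing real data
    }
    return new string[0];                              // the file contained no data rows
}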

Related

How to filter multi-line text?

My filter does not work for multi-line text, not even for ASCII characters. Only single-line cells are filtered correctly.
Example of a row with 4 columns:
[1, WELD FIL.. ... MCW430
(REP.. ..bL), 各溶接.. ..と。(..430NbL), X]
The (REP.. ..bL) text is still in the 2nd column, but on a new line; the X character is in the 4th column.
Creating the NatTable:
private NatTable createTable(Composite parent, List<TableLine> tLines, String[][] propertyNames,
        PropertyToLabels[] propToLabels, TableParams params, TextMatcherEditor<TableLine> editor, boolean openableParts) {
    // another code: bodyLayerStack, bodyDataLayer
    // another code: set row heights, set column widths
    // another code: columnHeaderDataLayer
    // another code: include table headers
    CompositeLayer composite = null;
    if( propertyNames != null ) {
        ColumnHeaderLayer columnHeaderLayer =
            new ColumnHeaderLayer(
                columnHeaderDataLayer,
                bodyLayerStack,
                (SelectionLayer)null);
        columnHeaderLayer.addConfiguration(NatTableLayerConfigurations.getColumnHeaderLayerConfiguration(false));
        SortHeaderLayer<TableLine> sortHeaderLayer =
            new SortHeaderLayer<TableLine>(
                columnHeaderLayer,
                new GlazedListsSortModel<TableLine>(
                    bodyLayerStack.getSortedList(),
                    getSortingColumnPropAccessor(propertyNames[0]),
                    configRegistry,
                    columnHeaderDataLayer));
        // another code: setChildLayer, add configurations
    }
    natTable.configure();
    editor.setFilterator(new TextFilterator<TableLine>() {
        @Override
        public void getFilterStrings(List<String> baseList, TableLine element) {
            for( int i = 0; i < element.getLength(); i++ )
                baseList.add("" + element.getObjectByColumn(i));
        }
    });
    editor.setMode(TextMatcherEditor.REGULAR_EXPRESSION);
    bodyLayerStack.getFilterList().setMatcherEditor(editor);
    NatTableContentProvider.addNatTableData(natTable, bodyLayerStack.getSelectionLayer(), bodyLayerStack.getBodyDataProvider());
    return natTable;
}
I figured out that in ca.odell.glazedlists.impl.filter.TextMatchers.matches(List<String>, TextFilterator<? super E>, SearchTerm<E>[], TextSearchStrategy[], E) the condition if(filterString != null && textSearchStrategy.indexOf(filterString.toString()) != -1) is not satisfied when filtering on the w character, even though it is clear that the string in the 2nd cell starts with W. See picture.
My search text for filtering on w is (?i).*w.*(?-i), i.e. case insensitive.
Is there any workaround or setting for this? Or do I need to transform the data before sorting, and if so, how? Those GlazedLists classes are final, so I cannot override them.
Thanks for any comment!

Ag-Grid - Pasting Excel Data into Grid - Appending Rows

When pasting clipboard/Excel data into AG-Grid, how do I get the data to append to the current rows?
If my table currently has a single row and I'm trying to paste 10 rows into the table, Ag-Grid only overwrites the single row instead of appending the extra 9 rows. Am I missing a gridOption or is this not possible?
To get what you need, follow these steps:
1) Add a paste event listener:
mounted () {
    window.addEventListener('paste', this.insertNewRowsBeforePaste);
}
2) Create the function that retrieves the data from the clipboard and creates new lines in the grid:
insertNewRowsBeforePaste(event) {
    var self = this;
    // gets data from clipboard and converts it to an array (1 array element for each line)
    var clipboardData = event.clipboardData || window.clipboardData;
    var pastedData = clipboardData.getData('Text');
    var dataArray = self.dataToArray(pastedData);
    // First row is already in the grid, and dataToArray returns an empty row at the end of the array
    // (maybe you want to validate that it is actually empty)
    for (var i = 1; i < dataArray.length - 1; i++) {
        self.addEmptyRow(i);
    }
}
3) dataToArray is a function that ag-Grid uses when pasting new lines; I just needed to adjust the "delimiter" variable. I copied it from the clipboardService.js file.
// From http://stackoverflow.com/questions/1293147/javascript-code-to-parse-csv-data
// This will parse a delimited string into an array of
// arrays. The default delimiter is the comma, but this
// can be overridden in the second argument.
export var dataToArray = function(strData) {
    var delimiter = self.gridOptions.api.gridOptionsWrapper.getClipboardDeliminator();
    // Create a regular expression to parse the CSV values.
    var objPattern = new RegExp((
        // Delimiters.
        "(\\" + delimiter + "|\\r?\\n|\\r|^)" +
        // Quoted fields.
        "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +
        // Standard fields.
        "([^\"\\" + delimiter + "\\r\\n]*))"), "gi");
    // Create an array to hold our data. Give the array
    // a default empty first row.
    var arrData = [[]];
    // Create an array to hold our individual pattern
    // matching groups.
    var arrMatches = null;
    // Keep looping over the regular expression matches
    // until we can no longer find a match.
    while (arrMatches = objPattern.exec(strData)) {
        // Get the delimiter that was found.
        var strMatchedDelimiter = arrMatches[1];
        // Check to see if the given delimiter has a length
        // (is not the start of string) and if it matches the
        // field delimiter. If it does not, then we know
        // that this delimiter is a row delimiter.
        if (strMatchedDelimiter.length &&
            strMatchedDelimiter !== delimiter) {
            // Since we have reached a new row of data,
            // add an empty row to our data array.
            arrData.push([]);
        }
        var strMatchedValue = void 0;
        // Now that we have our delimiter out of the way,
        // let's check to see which kind of value we
        // captured (quoted or unquoted).
        if (arrMatches[2]) {
            // We found a quoted value. When we capture
            // this value, unescape any double quotes.
            strMatchedValue = arrMatches[2].replace(new RegExp("\"\"", "g"), "\"");
        }
        else {
            // We found a non-quoted value.
            strMatchedValue = arrMatches[3];
        }
        // Now that we have our value string, let's add
        // it to the data array.
        arrData[arrData.length - 1].push(strMatchedValue);
    }
    // Return the parsed data.
    return arrData;
}
4) Finally, to add new blank lines in the grid, use the function below:
addEmptyRow(rowIndex) {
    var newItem = {};
    this.gridOptions.api.updateRowData({add: [newItem], addIndex: rowIndex});
}
Basically, what this code does is insert blank rows at the beginning of the grid and let ag-Grid paste the data into those rows. For it to work, the row where the data is pasted must be the first row in the grid. It uses updateRowData from ag-Grid (https://www.ag-grid.com/javascript-grid-data-update/).
You may need to make some adjustments if you need something else.
I have the same question. I originally used a paste event listener to add a number of rows to the grid, based on the difference between available space and clipboard data length. But now the grid will only add the rows and not complete the paste.

Remove Content controls after adding text using open xml

With the help of some very kind community members here, I managed to programmatically create a function to replace text inside content controls in a Word document using Open XML. After the document is generated, the text I replaced has lost its formatting.
Any ideas on how I can keep the formatting in Word and also remove the content control tags?
This is my code:
using (var wordDoc = WordprocessingDocument.Open(mem, true))
{
    var mainPart = wordDoc.MainDocumentPart;
    ReplaceTags(mainPart, "FirstName", _firstName);
    ReplaceTags(mainPart, "LastName", _lastName);
    ReplaceTags(mainPart, "WorkPhoe", _workPhone);
    ReplaceTags(mainPart, "JobTitle", _jobTitle);
    mainPart.Document.Save();
    SaveFile(mem);
}
private static void ReplaceTags(MainDocumentPart mainPart, string tagName, string tagValue)
{
    //grab all the tag fields
    IEnumerable<SdtBlock> tagFields = mainPart.Document.Body.Descendants<SdtBlock>().Where
        (r => r.SdtProperties.GetFirstChild<Tag>().Val == tagName);
    foreach (var field in tagFields)
    {
        //remove all paragraphs from the content block
        field.SdtContentBlock.RemoveAllChildren<Paragraph>();
        //create a new paragraph containing a run and a text element
        Paragraph newParagraph = new Paragraph();
        Run newRun = new Run();
        Text newText = new Text(tagValue);
        newRun.Append(newText);
        newParagraph.Append(newRun);
        //add the new paragraph to the content block
        field.SdtContentBlock.Append(newParagraph);
    }
}
Keeping the style is a tricky problem as there could be more than one style applied to the text you are trying to replace. What should you do in that scenario?
Assuming a simple case of one style (but potentially spread over many Paragraphs, Runs and Texts), you could keep the first Text element you come across per SdtBlock, place your required value in that element, and then delete any further Text elements from the SdtBlock. The formatting from the first Text element will then be maintained. Obviously you can apply this approach to any of the Text elements; you don't have to use the first. The following code should show what I mean:
private static void ReplaceTags(MainDocumentPart mainPart, string tagName, string tagValue)
{
    IEnumerable<SdtBlock> tagFields = mainPart.Document.Body.Descendants<SdtBlock>().Where
        (r => r.SdtProperties.GetFirstChild<Tag>().Val == tagName);
    foreach (var field in tagFields)
    {
        //materialise the list so removing elements doesn't upset the enumeration
        List<Text> texts = field.SdtContentBlock.Descendants<Text>().ToList();
        for (int i = 0; i < texts.Count; i++)
        {
            Text text = texts[i];
            if (i == 0)
            {
                //keep the first Text element (and its formatting) and set the new value
                text.Text = tagValue;
            }
            else
            {
                //drop the remaining Text elements
                text.Remove();
            }
        }
    }
}
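The question also asks about removing the content-control tags themselves, which the code above doesn't cover. A common approach is to promote the children of each SdtContentBlock into the parent and then delete the SdtBlock; a minimal sketch (my addition, assuming block-level content controls and run after the tag values have been replaced, before mainPart.Document.Save()) might look like this:
//Sketch only: strip the content-control wrappers while keeping their (formatted) content.
//Requires System.Linq for ToList().
private static void RemoveContentControls(MainDocumentPart mainPart)
{
    //materialise the list first because we modify the tree while looping
    List<SdtBlock> sdtBlocks = mainPart.Document.Body.Descendants<SdtBlock>().ToList();
    foreach (SdtBlock sdt in sdtBlocks)
    {
        SdtContentBlock content = sdt.SdtContentBlock;
        if (content != null)
        {
            //move each child (e.g. the formatted Paragraphs) in front of the control
            foreach (var child in content.ChildElements.ToList())
            {
                sdt.InsertBeforeSelf(child.CloneNode(true));
            }
        }
        //then remove the now-redundant content control
        sdt.Remove();
    }
}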

How to seek position in DataTable before writing to the file

I have a DataTable that is read from a CSV file. It is iterated through rows and columns, and each value is appended to a StringBuilder before being written to the destination CSV file. I want to split the data of one column on the special character ("/") into two columns. For example, the 'Type' column of the DataTable contains "women/shoes and handbags/guess". I have another column, 'SubType', so I want to split that one column into two columns in the DataTable before writing, and I just want to ignore the third part, "guess". Is there a way to find the position of the first "/" so that the value between it and the second "/" can be inserted into the 'SubType' column of the DataTable?
foreach (DataRow dRow in dtSor.Rows)
{
    for (int i = 0; i < dtSor.Columns.Count; i++)
    {
        if (dRow[i].ToString().Contains(","))
        {
            dest_csv.Append("\"" + dRow[i].ToString() + "\"" + ",");
        }
        else if (dRow[i].ToString() == "")
        {
            dest_csv.Append("NULL" + ",");
        }
        else
        {
            dest_csv.Append(dRow[i].ToString() + ",");
            //dest_csv.Append(dRow[i].ToString());
        }
    }
    dest_csv.Remove(dest_csv.Length - 1, 1);
    dest_csv.Append(Environment.NewLine);
}
File.WriteAllText(destination_file, dest_csv.ToString(), Encoding.Default);
First check whether the special character is in the field with IndexOf, then split the field on that character into an array and take only the parts you're interested in, like so:
const char special = '/';
string maintype = "";
string subtype = "";
string field = dRow[i].ToString();
if (field.IndexOf(special) > -1)
{
    string[] splitted = field.Split(special);
    maintype = splitted[0];   // "women"
    subtype = splitted[1];    // "shoes and handbags"; splitted[2] ("guess") is ignored
}
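To actually put the result back into the DataTable before the rows are written out, a minimal sketch could look like the loop below. It assumes the columns are literally named "Type" and "SubType" (as described in the question) and would run before the export loop above:
//Sketch only: split the Type column and fill SubType before exporting.
foreach (DataRow dRow in dtSor.Rows)
{
    string field = dRow["Type"].ToString();
    string[] parts = field.Split('/');
    if (parts.Length > 1)
    {
        dRow["Type"] = parts[0];       // e.g. "women"
        dRow["SubType"] = parts[1];    // e.g. "shoes and handbags"; any third part ("guess") is ignored
    }
}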

Cannot WriteXML for DataTable because Windows Search Returns String Array for Authors Property

The System.Author Windows property is a multi-value string. Windows Search returns this value as an array of strings in a DataColumn. (The column's data type is string[] or String().) When I call the WriteXML method on the resulting DataTable, I get the following InvalidOperationException.
Is there a way to specify the data-table's xml-serializer to use for specific columns or specific data-types?
Basically, how can I make WriteXML work with this data-table?
System.InvalidOperationException: Type System.String[] does not implement IXmlSerializable interface therefore can not proceed with serialization.
You could easily copy your DataTable, changing the offending Authors column to a String and joining the string[] data with a proper delimiter like "|" or "; ".
DataTable xmlFriendlyTable = oldTable.Clone();
xmlFriendlyTable.Columns["Author"].DataType = typeof(String);
xmlFriendlyTable.Columns["Author"].ColumnMapping = MappingType.Element;
foreach (DataRow row in oldTable.Rows)
{
    object[] rowData = row.ItemArray;
    object[] cpyRowData = new object[rowData.Length];
    for (int i = 0; i < rowData.Length; i++)
    {
        if (rowData[i] != null && rowData[i].GetType() == typeof(String[]))
        {
            cpyRowData[i] = String.Join("; ", (rowData[i] as String[]));
        }
        else
        {
            cpyRowData[i] = rowData[i];
        }
    }
    //add the copied row once all of its columns have been converted
    xmlFriendlyTable.Rows.Add(cpyRowData);
}
xmlFriendlyTable.WriteXml( ... );
NOTE Wrote the above in the web browser, so there may be syntax errors.