Counting lines of a file in Scala - scala

I am studying Scala nowadays and this is my code snippet to count the number of lines in a text file.
//returns line number of a file
def getLineNumber(fileName: String): Integer = {
val src = io.Source.fromFile(fileName)
try {
src.getLines.size
} catch {
case error: FileNotFoundException => -1
case error: Exception => -1
}
finally {
src.close()
}
}
I am using Source.fromFile method as explained in Programming in Scala book. Here is the problem: If my text file is like this:
baris
ayse
deneme
I get the correct result 6. If I press Enter after the word deneme I still get 6, although I expect 7 in this case. If I press Space after pressing Enter I get 7, which is correct again. Is this a bug in the Scala standard library or, more likely, am I missing something?
Finally, here is my basic main method, if it helps:
def main(args: Array[String]): Unit = {
println(getLineNumber("C:\\Users\\baris\\Desktop\\bar.txt"))
}

It uses java.io.BufferedReader to readLine. Here is the source of that method:
/**
* Reads a line of text. A line is considered to be terminated by any one
* of a line feed ('\n'), a carriage return ('\r'), or a carriage return
* followed immediately by a linefeed.
*
* @return A String containing the contents of the line, not including
* any line-termination characters, or null if the end of the
* stream has been reached
*
* @exception IOException If an I/O error occurs
*
* @see java.nio.file.Files#readAllLines
*/
public String readLine() throws IOException {
return readLine(false);
}
Which calls this:
...
* @param ignoreLF If true, the next '\n' will be skipped
...
String readLine(boolean ignoreLF) ...
...
/* Skip a leftover '\n', if necessary */
if (omitLF && (cb[nextChar] == '\n'))
nextChar++;
skipLF = false;
omitLF = false;
So basically that's how it's implemented: a line terminator ends a line, it does not start a new one. I guess it depends on what a line means to you. Are you counting lines that contain something, or newline characters? Those are different things.

If you press Enter after the word deneme, you simply add an end-of-line sequence (CR+LF, in your case) to the 6th line. You see the cursor move to a new line, but you have not created a new line: you have only marked the sixth line as finished. To create a new line you have to put a character after the end-of-line sequence, as you do when you press Space.
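The behaviour described in both answers can be reproduced without touching the file system. The following is a minimal Java sketch (the class and method names are made up for illustration) that feeds in-memory strings to java.io.BufferedReader, the reader mentioned above, and counts how often readLine() returns a non-null value. The three-word sample is an assumption standing in for the asker's file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the Scala or Java standard library.
class LineCountDemo {

    // Counts lines the way BufferedReader sees them: a trailing line
    // terminator ends the last line but does not start a new one.
    static int countLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines("baris\nayse\ndeneme"));      // no trailing terminator
        System.out.println(countLines("baris\nayse\ndeneme\r\n"));  // Enter pressed after "deneme"
        System.out.println(countLines("baris\nayse\ndeneme\r\n ")); // Enter, then Space
    }
}
```

The first two calls report the same count, and only the third reports one more, because the space is the first character of a new line.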

Related

Special characters other than delimiter in DelimitedLineTokenizer of FlatFileItemReader

I am trying to read unl file in Spring batch.
Use FlatFileItemReader and delimiter is "|".
001-A472468827" |N|100| The delimiter does not work when encountering this data.
Data cannot be divided by the delimiter if it contains " and spaces or if it contains the # character.
quoteCharacter doesn't seem to work.
In this situation, is there a way to import special characters such as " and # as they are?
@Bean
@StepScope
public FlatFileItemReader unlFileReader() throws MalformedURLException {
return new FlatFileItemReaderBuilder<ExampleDTO>()
.name("unlFileReader")
/*.encoding(StandardCharsets.UTF_8.name())*/
.resource(fileService.inputFileResource(UNZIP_PATH + "example.unl"))
.fieldSetMapper(new BeanWrapperFieldSetMapper<>())
.targetType(ExampleDTO.class)
.delimited().delimiter("|")
.quoteCharacter('#')
.quoteCharacter('"')
.quoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER)
.includedFields(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141
)
.names(ExampleDTO.getFieldNameArrays())
.build();
}
In this situation, is there a way to import special characters such as " and # as they are?
You are calling quoteCharacter() several times. Note that each call overrides the previous value; it does not add the character to a list of quote characters. Only one quote character will be used (the last one set if you chain such calls).
Data cannot be divided by the delimiter if it contains " and spaces or if it contains the # character
This is because " is the default quote character. If the input contains a single ", you need to specify another quote character (otherwise Spring Batch treats it as a "bug" in your data, which is fair, as the field is not correctly quoted). Here is a quick test that passes:
@Test
void testPipeDelimiter() {
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setDelimiter("|");
tokenizer.setQuoteCharacter(' ');
String s = "001-A472468827\"|N|100|";
FieldSet fieldSet = tokenizer.tokenize(s);
Assertions.assertEquals("001-A472468827\"", fieldSet.readString(0));
Assertions.assertEquals("N", fieldSet.readString(1));
Assertions.assertEquals("100", fieldSet.readString(2));
}
This test shows that the " is part of the first field. The same test passes with a # in the input:
@Test
void testPipeDelimiter() {
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setDelimiter("|");
tokenizer.setQuoteCharacter(' ');
String s = "001-A472468827#|N|100|";
FieldSet fieldSet = tokenizer.tokenize(s);
Assertions.assertEquals("001-A472468827#", fieldSet.readString(0));
Assertions.assertEquals("N", fieldSet.readString(1));
Assertions.assertEquals("100", fieldSet.readString(2));
}

CsvReader not loading lines starting with #

I'm trying to load a text file (.csv) into a SQL Server database table. Each line in the file is supposed to be loaded into a single column in the table. I find that lines starting with "#" are skipped, with no error. For example, the first two of the following four lines are loaded fine, but the last two are not. Does anybody know why?
ThisLineShouldBeLoaded
This one as well
#ThisIsATestLine
#This is another test line
Here's the segment of my code:
var sqlConn = connection.StoreConnection as SqlConnection;
sqlConn.Open();
CsvReader reader = new CsvReader(new StreamReader(f), false);
using (var bulkCopy = new SqlBulkCopy(sqlConn))
{
bulkCopy.DestinationTableName = "dbo.TestTable";
try
{
reader.SkipEmptyLines = true;
bulkCopy.BulkCopyTimeout = 300;
bulkCopy.WriteToServer(reader);
reader.Dispose();
reader = null;
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
System.Diagnostics.Debug.WriteLine(ex.Message);
throw;
}
}
# is the default comment character for CsvReader. You can change the comment character by changing the Comment property of the Configuration object. You can disable comment processing altogether by setting the AllowComments property to false, e.g.:
reader.Configuration.AllowComments=false;
SqlBulkCopy doesn't deal with CSV files at all; it sends whatever data is passed to WriteToServer to the database. It doesn't care where the data came from or what it contains, as long as the column mappings match.
Update
Assuming LumenWorks.Framework.IO.Csv refers to this project, the comment character can be specified in the constructor. One could set it to something that wouldn't appear in a normal file, perhaps even the NUL character, the default char value:
CsvReader reader = new CsvReader(new StreamReader(f), false, comment: default);
or
CsvReader reader = new CsvReader(new StreamReader(f), false, comment: '\0');
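To make the effect of the comment character concrete, here is a small Java sketch of the general idea. It is an illustration only and not LumenWorks' implementation; the class and method names are hypothetical. It keeps every line whose first character differs from the configured comment character, using the four sample lines from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not how any real CSV library is written.
class CommentSkipDemo {

    // Returns the lines that survive when lines whose first character
    // equals commentChar are treated as comments and dropped.
    static List<String> keptLines(List<String> lines, char commentChar) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            boolean isComment = !line.isEmpty() && line.charAt(0) == commentChar;
            if (!isComment) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "ThisLineShouldBeLoaded",
                "This one as well",
                "#ThisIsATestLine",
                "#This is another test line");
        // With '#' as the comment character the last two lines are dropped silently.
        System.out.println(keptLines(input, '#').size());
        // With NUL as the comment character no ordinary text line is dropped.
        System.out.println(keptLines(input, '\0').size());
    }
}
```

With '#' only two of the four lines remain, which matches the symptom in the question; with the NUL character all four are kept.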

Good practice for error messages when checking multiple parameters

My problem:
I have a function with two file paths that are both required to start with '/'.
I struggled with finding a simple solution to tell the caller of the function precisely what went wrong if the requirement failed.
For example, if only one of the paths is wrong, should I say the first file parameter was wrong or the parameter with name aFile? It was more complicated than I thought.
My solution:
After failing to report only the incorrect paths in a simple and clear manner, I settled on the following solution, where I report all paths, because it is simple, gives clear information and is easy to adapt to more paths. (It is also shorter than using an if-else statement.)
/** ...
* @param i Some number...
* @param aFile Path of AData file to use (e.g. "/dir/a.csv")
* @param anyPath Random text...
* @param bFile Path of BData file to use (e.g. "/bData.csv")
*/
def f(i: Int, aFile: String, anyPath: String, bFile: String): Unit = {
val s = List(aFile, bFile)
require(!s.exists(!_.startsWith("/")),
"Paths aFile and bFile must start with '/', but List(aFile, bFile) was: " + s)
}
val a = "/adf"
val b = "asdf"
f(1, a, "eee", b)
// IllegalArgumentException: requirement failed:
// Paths aFile and bFile must start with '/', but List(aFile, bFile) was: List(/adf, asdf)
Are there suggestions or a good practice to handle such a case better?
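One possible variation is to pair each value with its parameter name and report only the pairs that fail the check. The sketch below shows this idea in Java; the same approach carries over to Scala's require. The helper requireAbsolute and the message wording are assumptions made for illustration, not an established convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: names only the parameters that violate the rule.
class PathCheckDemo {

    // Throws IllegalArgumentException listing each offending parameter
    // name together with the value that was passed.
    static void requireAbsolute(Map<String, String> namedPaths) {
        StringJoiner offenders = new StringJoiner(", ");
        for (Map.Entry<String, String> entry : namedPaths.entrySet()) {
            if (!entry.getValue().startsWith("/")) {
                offenders.add(entry.getKey() + " = \"" + entry.getValue() + "\"");
            }
        }
        if (offenders.length() > 0) {
            throw new IllegalArgumentException(
                    "Paths must start with '/', but got: " + offenders);
        }
    }

    static void f(int i, String aFile, String anyPath, String bFile) {
        // LinkedHashMap keeps the parameters in declaration order in the message.
        Map<String, String> paths = new LinkedHashMap<>();
        paths.put("aFile", aFile);
        paths.put("bFile", bFile);
        requireAbsolute(paths);
    }

    public static void main(String[] args) {
        try {
            f(1, "/adf", "eee", "asdf");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

The trade-off is a little more code in exchange for a message that mentions only bFile in the example call, so the caller does not have to work out which of the listed values is wrong.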

Signature defined. Must be closed in PdfSignatureAppearance

I am getting the below error when signing a pdf. The error is
“Signature defined. Must be closed in PdfSignatureAppearance.”
I am able to sign the pdf for the first time. It creates a pdf file in output folder with the signature in the first page. So far the code works fine.
Now when I give the recently generated file as input to sign in a second page I get the error “Signature defined. Must be closed in PdfSignatureAppearance.”
I am getting the error in the below line
appearance.SetVisibleSignature(new iTextSharp.text.Rectangle(300, 40, 530, 120), pageNo, "Icsi-Vendor");
Please find the code below
if (File.Exists(fName))
{
PdfReader.unethicalreading = true;
using (PdfReader pdfReader = new PdfReader(fName))
{
//file name
fName = fName.Substring(fName.LastIndexOf("\\") + 1);
outputFile = outputFolder + fName + ".pdf";
if (!File.Exists(outputFile))
{
using (FileStream fout = new FileStream(outputFile, FileMode.Create, FileAccess.ReadWrite))
{
using (PdfStamper stamper = PdfStamper.CreateSignature(pdfReader, fout, '\0'))
{
PdfSignatureAppearance appearance = stamper.SignatureAppearance;
string imagePath = txtImage.Text;
iTextSharp.text.Image signatureFieldImage = iTextSharp.text.Image.GetInstance(imagePath);
appearance.SignatureGraphic = signatureFieldImage;
signatureFieldImage.SetAbsolutePosition(250, 50);
stamper.GetOverContent(pageNo).AddImage(signatureFieldImage);
appearance.SetVisibleSignature(new iTextSharp.text.Rectangle(300, 40, 530, 120), pageNo, "Icsi-Vendor");
appearance.Reason = txtReason.Text;
IExternalSignature es = new PrivateKeySignature(pk, "SHA-256");
MakeSignature.SignDetached(appearance, es, new X509Certificate[] { pk12.GetCertificate(alias).Certificate }, null, null, null, 0, CryptoStandard.CMS);
stamper.Close();
}
}
}
}
this.Invoke(new BarDelegate(UpdateBar), fName);
}
Can someone please help me? Let me know if more details are required.
There are multiple issues in the OP's code:
The correct Close call
When applying signatures, one must not close the stamper object itself but instead the signature appearance object. And if one uses helper methods like MakeSignature.SignDetached, one does not even have to code that closing because SignDetached implicitly already closes the appearance in its last line.
Thus, please
remove stamper.Close() and
don't put PdfStamper stamper = PdfStamper.CreateSignature(pdfReader, fout, '\0') into a using statement, as this causes a call of the stamper's Dispose method, which in turn calls Close.
Usually you are not hurt by those lines because after the implicit appearance close in MakeSignature.SignDetached, further close calls are ignored.
If you don't get that far, though, e.g. because an earlier error occurs, such close calls cause the error you observe; in your case it is the close call triggered by the using statement.
The issue in SetVisibleSignature
You are getting the error in
appearance.SetVisibleSignature(new iTextSharp.text.Rectangle(300, 40, 530, 120), pageNo, "Icsi-Vendor");
Unfortunately the actual error occurring in this line is masked by the error caused by the Close call made during Dispose at the end of the using statement.
Considering the method code:
/**
* Sets the signature to be visible. It creates a new visible signature field.
* @param pageRect the position and dimension of the field in the page
* @param page the page to place the field. The fist page is 1
* @param fieldName the field name or <CODE>null</CODE> to generate automatically a new field name
*/
virtual public void SetVisibleSignature(Rectangle pageRect, int page, String fieldName) {
if (fieldName != null) {
if (fieldName.IndexOf('.') >= 0)
throw new ArgumentException(MessageLocalization.GetComposedMessage("field.names.cannot.contain.a.dot"));
AcroFields af = writer.GetAcroFields();
AcroFields.Item item = af.GetFieldItem(fieldName);
if (item != null)
throw new ArgumentException(MessageLocalization.GetComposedMessage("the.field.1.already.exists", fieldName));
this.fieldName = fieldName;
}
if (page < 1 || page > writer.reader.NumberOfPages)
throw new ArgumentException(MessageLocalization.GetComposedMessage("invalid.page.number.1", page));
this.pageRect = new Rectangle(pageRect);
this.pageRect.Normalize();
rect = new Rectangle(this.pageRect.Width, this.pageRect.Height);
this.page = page;
}
the obvious causes would be
the field name containing a dot,
the named field already existing in the document, or
an invalid page number.
As you describe your situation as
I am able to sign the pdf for the first time. It creates a pdf file in output folder with the signature in the first page. So far the code works fine. Now when I give the recently generated file as input to sign in a second page I get the error
I assume the second item to be the most probable cause: If you want to add multiple signatures to the same document, their field names must differ.
Append mode
As you indicate that you apply multiple signatures to the same file, you must use the append mode. If you don't, you'll invalidate the earlier signatures:
PdfStamper stamper = PdfStamper.CreateSignature(pdfReader, fout, '\0', null, true);
Cf. the comment on that CreateSignature method overload:
/**
* Applies a digital signature to a document, possibly as a new revision, making
* possible multiple signatures. The returned PdfStamper
* can be used normally as the signature is only applied when closing.
* <p>
... (outdated Java example code) ...
* @param reader the original document
* @param os the output stream or <CODE>null</CODE> to keep the document in the temporary file
* @param pdfVersion the new pdf version or '\0' to keep the same version as the original
* document
* @param tempFile location of the temporary file. If it's a directory a temporary file will be created there.
* If it's a file it will be used directly. The file will be deleted on exit unless <CODE>os</CODE> is null.
* In that case the document can be retrieved directly from the temporary file. If it's <CODE>null</CODE>
* no temporary file will be created and memory will be used
* @param append if <CODE>true</CODE> the signature and all the other content will be added as a
* new revision thus not invalidating existing signatures
* @return a <CODE>PdfStamper</CODE>
* @throws DocumentException on error
* @throws IOException on error
*/
public static PdfStamper CreateSignature(PdfReader reader, Stream os, char pdfVersion, string tempFile, bool append)

Doxygen @code line numbers

Is there a way to display code line numbers inside a @code ... @endcode block? From the screenshots in the doxygen manual it would seem that there is, but I was unable to find an option for doxygen itself, or a tag syntax to accomplish this.
I need this to be able to write something like "In the above code, line 3" after a code block.
Tested also for fenced code blocks, still getting no numbers.
Short Answer
It seems that at least in the current version (1.8.9) line numbers are added:
to C code only when using \includelineno tag
to any Python code
Details
Python code formatter
Python code formatter includes line numbers if g_sourceFileDef evaluates as TRUE:
/*! start a new line of code, inserting a line number if g_sourceFileDef
* is TRUE. If a definition starts at the current line, then the line
* number is linked to the documentation of that definition.
*/
static void startCodeLine()
{
//if (g_currentFontClass) { g_code->endFontClass(); }
if (g_sourceFileDef)
( https://github.com/doxygen/doxygen/blob/Release_1_8_9/src/pycode.l#L356 )
It's initialized from FileDef *fd passed into parseCode/parsePythonCode if it was provided (non-zero) or from new FileDef(<...>) otherwise:
g_sourceFileDef = fd;
<...>
if (fd==0)
{
// create a dummy filedef for the example
g_sourceFileDef = new FileDef("",(exName?exName:"generated"));
cleanupSourceDef = TRUE;
}
( https://github.com/doxygen/doxygen/blob/Release_1_8_9/src/pycode.l#L1458 )
so it seems that all Python code gets line numbers.
C code formatter
C code formatter has an additional variable g_lineNumbers and includes line numbers if both g_sourceFileDef and g_lineNumbers evaluate as TRUE:
/*! start a new line of code, inserting a line number if g_sourceFileDef
* is TRUE. If a definition starts at the current line, then the line
* number is linked to the documentation of that definition.
*/
static void startCodeLine()
{
//if (g_currentFontClass) { g_code->endFontClass(); }
if (g_sourceFileDef && g_lineNumbers)
( https://github.com/doxygen/doxygen/blob/Release_1_8_9/src/code.l#L486 )
They are initialized in the following way:
g_sourceFileDef = fd;
g_lineNumbers = fd!=0 && showLineNumbers;
<...>
if (fd==0)
{
// create a dummy filedef for the example
g_sourceFileDef = new FileDef("",(exName?exName:"generated"));
cleanupSourceDef = TRUE;
}
( https://github.com/doxygen/doxygen/blob/Release_1_8_9/src/code.l#L3623 )
Note that g_lineNumbers remains FALSE if provided fd value was 0
HtmlDocVisitor
Among the parseCode calls in HtmlDocVisitor::visit there is only one (for DocInclude::IncWithLines, which corresponds to \includelineno) that passes a non-zero fd:
https://github.com/doxygen/doxygen/blob/Release_1_8_9/src/htmldocvisitor.cpp#L540
so this seems to be the only command which will result in line numbers included into C code listing