Update specific line in file - Flutter

Hi, is there a method with which I can update a specific line in a file?
My file has data separated by line breaks.
Below is a sample that deletes a line, but I have to write everything back into the file again. Can I perform CRUD operations directly on file lines?
I want to update a specific line in a file without reading the entire file => updating the string => writing all lines back to the file.
I may switch to any kind of file type that can offer me this functionality.
Is there a way to store data in a row/column architecture like SQL?
import 'dart:io';
Future<void> myAsyncFunction() async {
  const index = 5;
  final File f = File('test.txt');
  final List<String> lines = await f.readAsLines();
  lines.removeAt(index);
  await f.writeAsString(lines.join('\n'));
}

This should be possible using the String Scanner library; it provides the LineScanner and LineScannerState classes, through which you can set the position.
I have not tried this for the exact use case you mention above, so please do evaluate it for your use case.

Files are stored as a contiguous array of bytes on disk; there is no way to remove a specific line without scanning for newlines and shifting the trailing data to fill the gap.
For a more sophisticated way of storing data there are many popular database packages, including sqflite, hive, drift, sembast, and objectbox.

Related

How to pass a variable value from one Flutter page to another

Suppose I have a file body1.dart where I created a variable "hello", which is initially empty:
String hello = "";
Now, in the same file, I have created a loop which sets and updates the value of hello after every iteration.
The thing is, whenever the value of "hello" changes, I want to display it inside a Text field that lives in a different file, body2.dart. I want to retrieve the real-time value at that exact moment (not the final outcome at the end of the loop).
Can you provide a code snippet for this, so that we can best assist you?
Otherwise, the best option would be to use a stream. You can have multiple Dart classes subscribed to the same stream.
Streams provide an asynchronous sequence of data. Data sequences include user-generated events and data read from files. You can process a stream using either await for or listen() from the Stream API.
Check out the official documentation on dart.dev.

Scala Spark ignoring first-line header and loading all data from the 2nd line onwards

I have a Scala Spark notebook on an AWS EMR cluster that loads data from an AWS S3 bucket. Previously, I had standard code like the following:
var stack = spark.read.option("header", "true").csv("""s3://someDirHere/*""")
This loaded multiple directories of files (.txt.gz) into a Spark DataFrame object called stack.
Recently, new files were added to this directory. The content of the new files looks the same (I downloaded a couple of them and opened them using both Sublime Text and Notepad++; I tried two different text editors to see if there were perhaps some invisible, non-Unicode characters disrupting the interpretation of the first line as a header). The new data files cause my code above to ignore the first header line and instead interpret the second line as the header. I have tried a few variations without luck; here are a few examples of things I tried:
var stack = spark.read.option("quote", "\"").option("header", "true").csv("""s3://someDirHere/*""") // header not detected
var stack = spark.read.option("escape", "\"").option("header", "true").csv("""s3://someDirHere/*""") // header not detected
var stack = spark.read.option("escape", "\"").option("quote", "\"").option("header", "true").csv("""s3://someDirHere/*""") // header not detected
I wish I could share the files but it contains confidential information. Just wondering if there are some ideas as to what I can try.
How many files are there? If there are too many to check manually, you could try to read them without the header option. Your expectation is that the header matches everywhere, right?
If that's truly the case, this should have a count of 1:
spark.read.csv("path").limit(1).dropDuplicates().count()
If not, you could see what different headers there are like this:
spark.read.csv("path").limit(1).dropDuplicates().show()
Remember, it's important not to use the header option, so that you can operate on the header rows as data.

Searching all file names recursively in hdfs using Spark

I’ve been looking for a while now for a way to get all filenames in a directory and its sub-directories in Hadoop file system (hdfs).
I found out I can use these commands to get it :
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
sc.wholeTextFiles(path).map(_._1)
Here is "wholeTextFiles" documentation:
Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file.
Parameters:
path - Directory to the input data files; the path can be comma-separated paths as the list of inputs.
minPartitions - A suggestion value of the minimal splitting number for input data.
Returns:
RDD representing tuples of file path and the corresponding file content.
Note: Small files are preferred; a large file is also allowable, but may cause bad performance. On some filesystems, .../path/* can be a more efficient way to read all files in a directory rather than .../path/ or .../path. Partitioning is determined by data locality. This may result in too few partitions by default.
As you can see "wholeTextFiles" returns a pair RDD with both the filenames and their content. So I tried mapping it and taking only the file names, but I suspect it still reads the files.
The reason I suspect so: if I try to count (for example), I get the Spark equivalent of "out of memory" (losing executors and not being able to complete the tasks).
I would rather use Spark to achieve this goal the fastest way possible, however, if there are other ways with a reasonable performance I would be happy to give them a try.
EDIT:
To be clear: I want to do it using Spark. I know I can do it using HDFS commands and the like; I would like to know how to do it with the existing tools provided with Spark, and maybe get an explanation of how I can make "wholeTextFiles" not read the text itself (similar to how transformations only happen after an action, and some of the "commands" never really happen).
Thank you very much!
This is a way to list out all the files down to the deepest subdirectory, without using wholeTextFiles; it recurses through the subdirectories.
import org.apache.hadoop.fs.{FileSystem, LocatedFileStatus, Path, RemoteIterator}
import org.apache.spark.SparkContext
import scala.collection.mutable.ListBuffer

val lb = new ListBuffer[String] // variable to hold the final list of file paths

def getAllFiles(path: String, sc: SparkContext): ListBuffer[String] = {
  val conf = sc.hadoopConfiguration
  val fs = FileSystem.get(conf)
  val files: RemoteIterator[LocatedFileStatus] = fs.listLocatedStatus(new Path(path))
  while (files.hasNext) {
    val status = files.next
    val filepath = status.getPath.toString
    //println(filepath)
    lb += filepath
    if (status.isDirectory) {
      getAllFiles(filepath, sc) // recursive call into subdirectories only
    }
  }
  lb
}
That's it. It was tested successfully; you can use it as is.

Forcing Tesseract to match pattern (four digits in a row)

I'm trying to get Tesseract (using the Tess4J wrapper) to match only a specific pattern. The pattern is four digits in a row, which I think would be \d\d\d\d. Here is a VERY small subset of the image I'm feeding tesseract (the floorplans are restricted, so I'm cautious to post much more of it): http://mike724.com/view/a06771
I'm using the following Java code:
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

File imageFile = new File("/<redacted>/file.pdf");
Tesseract instance = Tesseract.getInstance();
instance.setTessVariable("load_system_dawg", "F");
instance.setTessVariable("load_freq_dawg", "F");
instance.setTessVariable("user_words_suffix", "");
instance.setTessVariable("user_patterns_suffix", "\\d\\d\\d\\d");
try {
    String result = instance.doOCR(imageFile);
    System.out.println(result);
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}
The problem I'm running into is that Tesseract seems not to be honoring these configuration options; I still get all the text/words in the results. I expect to get only the room numbers (e.g. 2950).
You have not configured this correctly.
user_patterns_suffix is meant to indicate the file extension of a text file that contains your patterns, e.g.
user_patterns_suffix pats
would mean you need to put a file in the tesseract tessdata folder
tessdata/eng.pats
... assuming eng was the language you were using.
See more here:
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
I do recall that user patterns may not be any shorter than 6 fixed chars before a pattern, so you may not be able to accomplish this in any case - but try the correct configuration first.
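For example, with eng as the language and user_patterns_suffix set to pats, a hedged sketch of the setup (the file name and location are illustrative, and Tesseract's user-pattern syntax uses \d to match a single digit) would be a tessdata/eng.pats file containing only the line:
\d\d\d\d
The Java code would then pass just the suffix rather than the pattern itself:
instance.setTessVariable("user_patterns_suffix", "pats");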
They look like init-only parameters; as such, they need to be in a config file, for instance one named bazaar placed under the configs folder, to be passed into the setConfigs method:
instance.setConfigs(Arrays.asList("bazaar"));
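As a hedged sketch (the file name bazaar is just a placeholder, as is the pats suffix), such a config file would typically live at tessdata/configs/bazaar and hold one "variable value" pair per line:
load_system_dawg F
load_freq_dawg F
user_patterns_suffix pats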
References:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
https://github.com/tesseract-ocr/tesseract/wiki/ControlParams
http://tess4j.sourceforge.net/docs/docs-1.4/

generic text reading

I am working on a project where I need to read some generic text... I am looking for any API with which I can read generic text and also convert it to a .csv file...
Can anyone please help?
I am using Java on Windows OS.
-------------------------- MORE DETAIL --------------------------
Let me clarify:
Assume I have a PDF document, or for that matter any type of document. I intend to use the "Print to Generic text printer" option and get the file in that format. Finally, I intend to use some API which should enable me to programmatically read this Generic Text Format file. I intend to extract text from this generic text file.
So, be it any file (.doc/.pdf/.xls etc., whatever), I intend to create a Generic Text Format file using the print option, then run my code to read those files and extract some information.
PS: Assume that I have a status report form with standard fields. But some people might submit it in .pdf, some in .doc, some in text format. Every document contains the same fields, but probably with different layouts.
Now, I am looking for a generic solution by which I should be able to convert every file type into the generic text file format and then apply some logic to extract my status report fields.
In Java this is more or less what you need to read a text file, assuming it's comma separated (just change the string in the "line.split" method if you need something else). It also skips the header.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public void parse(String filename) throws IOException {
    File file = new File(filename);
    FileInputStream fis = new FileInputStream(file);
    InputStreamReader isr = new InputStreamReader(fis);
    BufferedReader br = new BufferedReader(isr);
    String line;
    boolean header = true;
    while ((line = br.readLine()) != null) {
        if (header) {
            header = false;
            continue; // skip the header line
        }
        String[] splitter = line.split(",");
        // do whatever with the fields
        System.out.println(splitter[0]);
    }
    br.close();
}
CSV is a format for data in columns. It's not very useful for, say, a Wikipedia article.
The Apache Tika library will take all kinds of data and turn it into bland XML, from which you can make CSV as you like.
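A minimal sketch of that route, assuming the Tika jars are on the classpath and report.pdf stands in for whichever input document you have (this uses Tika's simple facade, which returns plain text; the XML output mentioned above comes from Tika's content handlers):
import java.io.File;
import org.apache.tika.Tika;

public class ExtractText {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // Auto-detects the file type (.pdf, .doc, .xls, ...) and returns its text content
        String text = tika.parseToString(new File("report.pdf"));
        System.out.println(text);
    }
}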
It would help if you would edit your question to clarify 'generic' versus 'generated', and tell us more about the data.
As for Windows printer drivers, are you looking to do something like 'print to pdf' as 'print to csv'? If so, I suspect that you need to start from MSDN samples of printer drivers and code this the hard way.
The so-called 'generic text file format' is not a structured format. It's completely unpredictable what you will find in there for any given input to the printer system.
A generic free book: Text Processing in Python
Just use the standard Java classes for I/O:
BufferedWriter, File, FileWriter, IOException, PrintWriter
.csv is simply a comma-separated values file. So just name your output file with a .csv extension.
You'll also need to figure out how you'd like to split your content.
Here are Java examples to get you going:
writing to a text file
how to read lines from a file
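Putting those classes together, a minimal sketch of writing such a file (the hypothetical writeCsv method, the output file name, and the field values are all placeholders):
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public void writeCsv(String filename) throws IOException {
    // PrintWriter over a FileWriter; the .csv extension is only a naming convention
    PrintWriter out = new PrintWriter(new FileWriter(new File(filename)));
    out.println("field1,field2,field3"); // header row
    out.println("value1,value2,value3"); // one record per line, values separated by commas
    out.close();
}
Calling writeCsv("report.csv") would produce a two-line file that spreadsheet tools read as three columns.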