Only one streamlit column is updating on CSV - ag-grid

I'm new to streamlit, and I wanted to build an interactive app that updates a CSV file when pressing an "update" button.
I made 3 columns ("A", "B", "C") from my CSV file editable (the file has two other columns). For some reason, when I make my edits and press the button, only one of them ("A") is actually updated, while the other two are not.
Here is my code:
import argparse
import os

import pandas as pd
import streamlit as st
from st_aggrid import AgGrid, GridOptionsBuilder


def main():
    parser = argparse.ArgumentParser(description='')
    parser.add_argument("-d", "--data", type=str, required=False, default="data.csv")
    args = None
    try:
        args = parser.parse_args()
    except SystemExit as e:
        os._exit(e.code)

    st.set_page_config(layout="wide")
    grid_table = show_grid(args.data)
    st.sidebar.header("Options:")
    st.sidebar.button("Update CSV file", on_click=update, args=[grid_table, args.data])


def data_upload(data_path):
    df = pd.read_csv(data_path)
    return df


def show_grid(data_path):
    df = data_upload(data_path)
    gb = GridOptionsBuilder.from_dataframe(df)
    gb.configure_column("A", editable=True)
    gb.configure_column("B", editable=True)
    gb.configure_column("C", editable=True)
    grid_table = AgGrid(
        df,
        height=400,
        gridOptions=gb.build(),
        fit_columns_on_grid_load=True,
        allow_unsafe_jscode=True,
    )
    return grid_table


def update(grid_table, data_path):
    grid_table_df = pd.DataFrame(grid_table['data'])
    grid_table_df.to_csv(data_path, index=False)


if __name__ == '__main__':
    main()
I tried updating columns "B" and "C" and expected them to change, but only the edits to column "A" seem to be written to the CSV when I press the update button.

Related

Remove header from CSV while reading from a txt or CSV file in Spark Scala

I am trying to remove the header from the given input file, but I couldn't make it work.
This is what I have written. Can someone help me remove the header from the txt or CSV file?
import org.apache.spark.{SparkConf, SparkContext}

object SalesAmount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(getClass.getName).setMaster("local")
    val sc = new SparkContext(conf)

    val salesRDD = sc.textFile(args(0), 2)
    val salesPairRDD = salesRDD.map(rec => {
      val fieldArr = rec.split(",")
      (fieldArr(1), fieldArr(3).toDouble)
    })
    val totalAmountRDD = salesPairRDD.reduceByKey(_ + _).sortBy(_._2, false)
    val discountAmountRDD = totalAmountRDD.map(t => {
      if (t._2 > 1000) (t._1, t._2 * 0.9)
      else t
    })
    discountAmountRDD.foreach(println)
  }
}
Skipping the first row when manually parsing text files using the RDD API is a bit tricky:
val salesPairRDD =
salesRDD
.mapPartitionsWithIndex((i, it) => if (i == 0) it.drop(1) else it)
.map(rec => {
val fieldArr = rec.split(",")
(fieldArr(1), fieldArr(3).toDouble)
})
The header line will be the first item in the first partition, so mapPartitionsWithIndex is used to iterate over the partitions and to skip the first item if the partition index is 0.
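If the header text is known or can be read up front, another common pattern is to grab it with first() and filter it out. A minimal sketch along those lines (not part of the original answer, and assuming the header line never also appears as a data row):
// Alternative sketch: fetch the header with first(), then drop every line equal to it.
// Assumes the header text is unique, i.e. it never occurs as a data row.
val header = salesRDD.first()
val salesPairRDD = salesRDD
  .filter(_ != header)
  .map(rec => {
    val fieldArr = rec.split(",")
    (fieldArr(1), fieldArr(3).toDouble)
  })
Note that first() triggers a small job of its own, whereas the mapPartitionsWithIndex version stays within a single pass.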

Understanding the operation of the map function

I came across the following example from the book "Fast Data Processing with Spark" by Holden Karau. I did not understand what the following lines of code do in the program:
val splitLines = inFile.map(line => {
  val reader = new CSVReader(new StringReader(line))
  reader.readNext()
})
val numericData = splitLines.map(line => line.map(_.toDouble))
val summedData = numericData.map(row => row.sum)
The program is:
package pandaspark.examples

import spark.SparkContext
import spark.SparkContext._
import spark.SparkFiles
import au.com.bytecode.opencsv.CSVReader
import java.io.StringReader

object LoadCsvExample {
  def main(args: Array[String]) {
    if (args.length != 2) {
      System.err.println("Usage: LoadCsvExample <master> <inputfile>")
      System.exit(1)
    }
    val master = args(0)
    val inputFile = args(1)

    val sc = new SparkContext(master, "Load CSV Example",
      System.getenv("SPARK_HOME"),
      Seq(System.getenv("JARS")))
    sc.addFile(inputFile)

    val inFile = sc.textFile(inputFile)
    val splitLines = inFile.map(line => {
      val reader = new CSVReader(new StringReader(line))
      reader.readNext()
    })
    val numericData = splitLines.map(line => line.map(_.toDouble))
    val summedData = numericData.map(row => row.sum)
    println(summedData.collect().mkString(","))
  }
}
I broadly understand the functionality of the program: it parses the input CSV and sums every row. But I am unable to understand how exactly those 3 lines of code achieve that.
Also, could anyone explain how the output would change if those lines were replaced with flatMap? Like:
val splitLines = inFile.flatMap(line => {
  val reader = new CSVReader(new StringReader(line))
  reader.readNext()
})
val numericData = splitLines.flatMap(line => line.map(_.toDouble))
val summedData = numericData.map(row => row.sum)
val splitLines = inFile.map(line => {
  val reader = new CSVReader(new StringReader(line))
  reader.readNext()
})
val numericData = splitLines.map(line => line.map(_.toDouble))
val summedData = numericData.map(row => row.sum)
So this code is basically reading CSV data and summing its values.
Suppose your CSV file is something like this:
10,12,13
1,2,3,4
1,2
Here inFile is where the data is fetched from the CSV file:
val inFile = sc.textFile("your CSV file path")
So inFile is an RDD holding the raw text data.
If you apply collect on it, it looks like this:
Array[String] = Array(10,12,13 , 1,2,3,4 , 1,2)
When you apply map over it, the function sees one line at a time:
line = 10,12,13
line = 1,2,3,4
line = 1,2
To parse each line as CSV, the code uses:
val reader = new CSVReader(new StringReader(line))
reader.readNext()
So after reading the data as CSV, splitLines looks like:
Array(
  Array(10,12,13),
  Array(1,2,3,4),
  Array(1,2)
)
On splitLines, the code applies
splitLines.map(line => line.map(_.toDouble))
Here line will be, for example, Array(10,12,13), and on it the code uses
line.map(_.toDouble)
which changes the type of every element from String to Double.
So in numericData you get the same structure,
Array(Array(10.0, 12.0, 13.0), Array(1.0, 2.0, 3.0, 4.0), Array(1.0, 2.0))
but with all elements now Doubles.
Finally, it sums each individual row (array), so the answer looks like
Array(35.0, 10.0, 3.0)
which is what you get when you apply summedData.collect().
First of all, there is no flatMap operation in your code sample, so the title is misleading. But in general, map called on a collection returns a new collection with the function applied to each element of the collection.
Going line by line through your code snippet:
val splitLines = inFile.map(line => {
  val reader = new CSVReader(new StringReader(line))
  reader.readNext()
})
The type of inFile is RDD[String]. You take every such string, create a CSV reader out of it, and call readNext (which returns an Array[String]). So at the end you get RDD[Array[String]].
val numericData = splitLines.map(line => line.map(_.toDouble))
This line is a bit trickier, with two map operations nested. Again, you take each element of the RDD (which is now an Array[String]) and apply the _.toDouble function to every element of that array. At the end you get RDD[Array[Double]].
val summedData = numericData.map(row => row.sum)
You take the elements of the RDD and apply the sum function to them. Since every element is an Array[Double], sum produces a single Double value. At the end you get RDD[Double].
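To address the flatMap part of the question: map keeps each line's fields grouped in one array, while flatMap flattens those arrays into a single stream of fields, so the per-row grouping is lost. A minimal sketch of the difference (not the book's code; it uses split instead of CSVReader to stay self-contained, and assumes an existing SparkContext sc):
// map keeps one Array[String] per input line; flatMap flattens the arrays into one RDD of fields.
val lines = sc.parallelize(Seq("10,12,13", "1,2,3,4", "1,2"))

val mapped = lines.map(_.split(","))     // RDD[Array[String]] with 3 elements, one per line
val flat   = lines.flatMap(_.split(",")) // RDD[String] with 9 elements, one per field

println(mapped.count()) // 3
println(flat.count())   // 9
With the flattened version there are no rows left to sum, only individual values, so a per-row sum like summedData no longer makes sense.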

Union all files from directory and sort based on first column

After implementing the code below:
def ngram(s: String, inSep: String, outSep: String, n: Int): Set[String] = {
  s.toLowerCase.split(inSep).sliding(n).map(_.sorted.mkString(outSep)).toSet
}

val fPath = "/user/root/data/data220K.txt"
val resultPath = "data/result220K"

val lines = sc.textFile(fPath) // lines: RDD[String]
val ngramNo = 2
val result = lines.flatMap(line => ngram(line, " ", "+", ngramNo)).map(word => (word, 1)).reduceByKey((a, b) => a + b)
val sortedResult = result.map(pair => pair.swap).sortByKey(true)
println(sortedResult.count + "============================")
sortedResult.take(10)
sortedResult.saveAsTextFile(resultPath)
I'm getting a large number of files in HDFS with this schema:
(Freq_Occurrencies, FieldA, FieldB)
Is it possible to join all the files from that directory? Every row is different, but I want to end up with only one file, sorted by Freq_Occurrencies. Is that possible?
Many thanks!
sortedResult
  .coalesce(1, shuffle = true)
  .saveAsTextFile(resultPath)
coalesce makes Spark use a single task for saving, thus creating only one part. The downside is, of course, performance - all data will have to be shuffled to a single executor and saved using a single thread.
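As a side note, repartition(1) is defined as coalesce(1, shuffle = true) in the RDD API, so an equivalent sketch is:
sortedResult
  .repartition(1)
  .saveAsTextFile(resultPath)
The same single-task performance caveat applies.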

How to delete the last line of the file in scala?

I am trying to append to a file, but I first want to delete the last line and then start appending. However, I can't figure out how to delete the last line of the file.
I am appending the file as follows:
val fw = new FileWriter("src/file.txt", true) ;
fw.write("new item");
Can anybody please help me?
EDIT:
val lines_list = Source.fromFile("src/file.txt").getLines().toList
val new_lines = lines_list.dropRight(1)
val pw = new PrintWriter(new File("src/file.txt" ))
new_lines.foreach(pw.write)
pw.write("\n")
pw.close()
After following your method, I am trying to write back to the file, but when I do this, all the contents (with the last line deleted) end up on a single line; however, I want them on separate lines.
For very large files, a simple solution relies on OS-level tools, for instance sed (the stream editor), so consider a call like this:
import sys.process._
Seq("sed","-i","$ d","src/file1.txt")!
which will remove the last line of the text file in place. This approach is not very Scala-ish, yet it solves the problem without leaving Scala.
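A slightly more defensive variant of the same call checks the exit code instead of discarding it (this assumes GNU sed, whose -i takes no argument; BSD/macOS sed needs -i ''):
import sys.process._

// Runs sed and keeps its exit code rather than ignoring it.
val exitCode = Seq("sed", "-i", "$ d", "src/file.txt").!
if (exitCode != 0) Console.err.println(s"sed failed with exit code $exitCode")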
This returns a random-access file positioned at the start of the last line, so subsequent writes overwrite it.
import java.io.{RandomAccessFile, File}

// Open the file for read/write and seek to the byte offset where the last line starts,
// so anything written afterwards overwrites that last line.
def randomAccess(file: File) = {
  val random = new RandomAccessFile(file, "rw")
  val result = findLastLine(random, 0, 0)
  random.seek(result)
  random
}

// Walk the file line by line, remembering the offset at which the previous line started;
// when readLine returns null (end of file), that offset is the start of the last line.
def findLastLine(random: RandomAccessFile, position: Long, previous: Long): Long = {
  val pointer = random.getFilePointer
  if (random.readLine == null) {
    previous
  } else {
    findLastLine(random, previous, pointer)
  }
}

val file = new File("build.sbt")
val random = randomAccess(file)
And test:
val line = random.readLine()
logger.debug(s"$line")
My scala is way off, so people can probably give you a nicer solution:
import scala.io.Source
import java.io._

object Test00 {
  def main(args: Array[String]) = {
    val lines = Source.fromFile("src/file.txt").getLines().toList.dropRight(1)
    val pw = new PrintWriter(new File("src/out.txt"))
    (lines :+ "another line").foreach(pw.println)
    pw.close()
  }
}
Sorry for the hardcoded appending; I used it just to test that everything worked fine.

Searching for Terms and Printing lines in a Text File

The file name is searcher.scala and I want to be able to type:
scala searcher.scala "term I want to find" "file I want to search through" "new file with new lines"
I tried this code, but it keeps saying I have an empty iterator:
import java.io.PrintWriter

val searchTerm = args(0)
val input = args(1)
val output = args(2)

val out = new PrintWriter(output);
val listLines = scala.io.Source.fromFile(input).getLines

for (line <- listLines) {
  { out.println("Line: " + line) }
  def term(x: String): Boolean = { x == searchTerm }
  val newList = listLines.filter(term)
  println(listLines.filter(term))
}
out.close;
You have the iterator listLines and you read it several times, but an iterator is a one-time object:
for (line <- listLines)
val newList = listLines.filter(term)
println(listLines.filter(term))
You need to revise your code to avoid reusing the iterator.
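A minimal sketch of one way to do that: materialize the lines into a List once, then reuse that List freely (this keeps the question's exact-match test; line.contains(searchTerm) may be what was actually intended):
import java.io.PrintWriter
import scala.io.Source

val searchTerm = args(0)
val input = args(1)
val output = args(2)

// Read the file once into memory; a List can be traversed as many times as needed.
val lines = Source.fromFile(input).getLines().toList

// Keep the original exact-match test; switch to line.contains(searchTerm) for substring matches.
val matches = lines.filter(_ == searchTerm)

val out = new PrintWriter(output)
matches.foreach(line => out.println("Line: " + line))
out.close()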