Here's a sample DataFrame:
Date        Party name             Symbol  Buy/Sell indicator  # of shares  trade price
2011-01-03  American Funds EuPc;A  AAPL    BUY                 2400         332.87
2011-02-14  American Funds CWGI;A  SLB     BUY                 6700         94.08
2011-01-06  Tudor Investment Corp  ALL     BUY                 11800        31.92
2011-01-20  American Funds Inc;A   AMZN    SELL                3600         180.14
And here is what I wish to achieve:
Date        Party name                 Symbol  Buy/Sell  # of shares  trade price  trading volume
2011-04-21  Federated Prime Obl;Inst   MMM     BUY       2600         96.17        250042
2011-01-05  Fortress Investment Group  CMCSA   SELL      29700        21.96        644193
2011-02-28  Dodge & Cox Intl Stock     DELL    SELL      57400        15.67        899458
2011-05-02  American Funds Inc;A       S       BUY       137300       5.19         712587
The new trading volume column is the # of shares column multiplied by the trade price column. Does anyone know how to compute this automatically, since there are many more rows? Afterwards I would like to take the trading volume values and show them in descending order. The exact instruction is:
The biggest dollar trading volume counter parties, top twenty list.
I have this so far:
// header=true so that the columns can be selected by name
val dataframe = spark.read.option("header", "true").csv("c:/data")
val newdf = dataframe.select("# of shares", "trade price")
Any help would be much appreciated. Thank you.
Here you go:
import org.apache.spark.sql.functions._
val newdf = dataframe.withColumn("trading volume",col("# of shares")*col("trade price"))
.select("# of shares","trade price","trading volume")
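The question also asks for the top twenty counterparties by dollar trading volume, which the snippet above does not cover. Here is a minimal sketch, assuming the volume should be summed per "Party name" (the question does not say whether to rank single trades or per-party totals). It uses a small inline DataFrame in place of the CSV; the object name and the topParties helper are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc, sum}

object TopTradingVolume {
  // Adds "trading volume" = "# of shares" * "trade price", totals it per party,
  // and keeps the n parties with the largest totals, largest first.
  def topParties(trades: DataFrame, n: Int): DataFrame =
    trades
      .withColumn("trading volume", col("# of shares") * col("trade price"))
      .groupBy("Party name")
      .agg(sum("trading volume").as("total trading volume"))
      .orderBy(desc("total trading volume"))
      .limit(n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("TopTradingVolume")
      .getOrCreate()
    import spark.implicits._

    // Three rows copied from the sample data in the question.
    val trades = Seq(
      ("American Funds EuPc;A", 2400L, 332.87),
      ("American Funds CWGI;A", 6700L, 94.08),
      ("Tudor Investment Corp", 11800L, 31.92)
    ).toDF("Party name", "# of shares", "trade price")

    topParties(trades, 20).show(false)
    spark.stop()
  }
}
```

If individual trades should be ranked instead, dropping the groupBy/agg step and ordering directly by "trading volume" would do it.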
I have a dataset about the donors of an organization. Each row starts with the donor's ID, name, and home address, followed by donate_date_1, donate_amount_1, donate_date_2, donate_amount_2, and so on up to 98. It looks similar to this:
Donor_ID  NAME  donate_date_1   donate_amount_1  donate_date_2   donate_amount_2  Total_Amount
0         A     Month/Day/Year  100              Month/Day/Year  200              300
1         B     Month/Day/Year  200              Month/Day/Year  50               250
2         C     Month/Day/Year  1000             Month/Day/Year  500              1500
The dataset holds the donors' records going back to 1982. Each donor started donating on a different date; for example, a donation made in 2007 might be donor 1's first donation but donor 2's 10th.
So here is my question: how can I combine all the dates to see in which year the donors gave the most money, and create a filter that shows how much money was donated in total in a specific period?
I have a spreadsheet with clients' purchases, and I want to see how many new clients we are getting per month. However, some of the clients we sell to could have the same name in a different zip code; for example, Miami Clinic could be in both Florida and Ohio, so I want them counted individually. I also want to see the total number of new clinics per month, but if a clinic purchases in January and again in March, I only want the January purchase counted.
Let us assume your sample data looks like this:
Now create a calculated field, first purchase date, as
{Fixed [Clinic Name], [Zipcode]: min([Sale Date])}
This field gives each clinic only its first purchase date.
Now, to filter on the first purchase date, create another calculated field, say first purchase filter, as
[first purchase date] = [Sale Date]
Filtering this field on TRUE (and don't forget to add the filter to context by right-clicking it) gives the sale amount for first purchases only.
Alternatively, use this field to distinguish first purchases from repeat purchases.
For any further queries, please share your actual data structure with a few rows and the desired output for those sample rows.
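The first-purchase logic above can also be sketched outside Tableau. The following is only an illustration in plain Scala, assuming each sale row has a clinic name, a zip code, and a sale date; the Sale type, its field names, and the sample rows are made up and not taken from the asker's spreadsheet.

```scala
import java.time.{LocalDate, YearMonth}

object NewClinicsPerMonth {
  // One sale row; the field names are illustrative.
  final case class Sale(clinic: String, zip: String, date: LocalDate)

  // Explicit ordering so that .min works on LocalDate values.
  private val byDate: Ordering[LocalDate] = Ordering.by[LocalDate, Long](_.toEpochDay)

  // Mirrors {Fixed [Clinic Name], [Zipcode]: min([Sale Date])}:
  // the earliest sale date for each (clinic, zip) pair.
  def firstPurchases(sales: Seq[Sale]): Map[(String, String), LocalDate] =
    sales.groupBy(s => (s.clinic, s.zip)).map { case (key, rows) =>
      key -> rows.map(_.date).min(byDate)
    }

  // Counts how many clinics made their first purchase in each month.
  def newClinicsByMonth(sales: Seq[Sale]): Map[YearMonth, Int] =
    firstPurchases(sales).values.toSeq
      .groupBy(d => YearMonth.from(d))
      .map { case (month, dates) => month -> dates.size }

  def main(args: Array[String]): Unit = {
    val sales = Seq(
      Sale("Miami Clinic", "33101", LocalDate.of(2021, 1, 5)),
      Sale("Miami Clinic", "33101", LocalDate.of(2021, 3, 9)), // repeat purchase, not counted
      Sale("Miami Clinic", "45201", LocalDate.of(2021, 3, 2)), // same name, different zip
      Sale("Lake Clinic", "60601", LocalDate.of(2021, 1, 20))
    )
    newClinicsByMonth(sales).toSeq
      .sortBy(_._1.toString)
      .foreach { case (month, n) => println(s"$month: $n") }
  }
}
```

With these made-up rows, January has two new clinics and March has one, because the second Miami Clinic purchase in March belongs to a clinic already counted in January.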
I want to play around with the 1987 Reuters dataset using Scala and possibly Spark. I can see that the files I've downloaded are in the .sgm format. I've never seen this format before, but running more on one of them:
$ more reut2-003.sgm
<!DOCTYPE lewis SYSTEM "lewis.dtd">
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="19419" NEWID="3001">
<DATE> 9-MAR-1987 04:58:41.12</DATE>
<TOPICS><D>money-fx</D></TOPICS>
<PLACES><D>uk</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
RM
f0416reute
b f BC-U.K.-MONEY-MARKET-SHO 03-09 0095</UNKNOWN>
<TEXT>
<TITLE>U.K. MONEY MARKET SHORTAGE FORECAST AT 250 MLN STG</TITLE>
<DATELINE> LONDON, March 9 - </DATELINE><BODY>The Bank of England said it forecast a
shortage of around 250 mln stg in the money market today.
Among the factors affecting liquidity, it said bills
maturing in official hands and the treasury bill take-up would
drain around 1.02 billion stg while below target bankers'
balances would take out a further 140 mln.
Against this, a fall in the note circulation would add 345
mln stg and the net effect of exchequer transactions would be
an inflow of some 545 mln stg, the Bank added.
REUTER
</BODY></TEXT>
</REUTERS>
we can see that it looks like pretty simple markup.
Since I don't want to write my own parser, my question is, is there some simple way of parsing this in Scala/Spark using some library?
Q: Since I don't want to write my own parser, my question is, is there
some simple way of parsing this in Scala/Spark using some library?
AFAIK there is no such API. You have to map over the records and parse them yourself (cleaning up any special characters), then transform the result into multiple columns.
I tried it the way shown below, but your XML shows up as a corrupt record in the DataFrame.
Further pointer: https://github.com/databricks/spark-xml
import java.io.File
import org.apache.commons.io.FileUtils
import org.apache.spark.sql.SparkSession
/**
  * Created by Ram Ghadiyaram
  */
object SparkXmlWithDtd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName(this.getClass.getName)
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
val str =
"""
|<!DOCTYPE lewis SYSTEM "lewis.dtd">
|
|<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="19419" NEWID="3001">
|<DATE> 9-MAR-1987 04:58:41.12</DATE>
|<TOPICS><D>money-fx</D></TOPICS>
|<PLACES><D>uk</D></PLACES>
|<PEOPLE></PEOPLE>
|<ORGS></ORGS>
|<EXCHANGES></EXCHANGES>
|<COMPANIES></COMPANIES>
|<UNKNOWN>
|RM
|f0416reute
|b f BC-U.K.-MONEY-MARKET-SHO 03-09 0095</UNKNOWN>
|<TEXT>
|<TITLE>U.K. MONEY MARKET SHORTAGE FORECAST AT 250 MLN STG</TITLE>
|<DATELINE> LONDON, March 9 - </DATELINE><BODY>The Bank of England said it forecast a
|shortage of around 250 mln stg in the money market today.
| Among the factors affecting liquidity, it said bills
|maturing in official hands and the treasury bill take-up would
|drain around 1.02 billion stg while below target bankers'
|balances would take out a further 140 mln.
| Against this, a fall in the note circulation would add 345
|mln stg and the net effect of exchequer transactions would be
|an inflow of some 545 mln stg, the Bank added.
| REUTER
|</BODY></TEXT>
|</REUTERS>
""".stripMargin
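    // Write the sample to a local file so that spark-xml can load it by path.
    val f = new File("sgmtest.sgm")
    FileUtils.writeStringToFile(f, str, java.nio.charset.StandardCharsets.UTF_8)
    // Requires the spark-xml package (com.databricks:spark-xml) on the classpath.
    val xml_df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "REUTERS")
      .load(f.getAbsolutePath)
    xml_df.printSchema()
    xml_df.createOrReplaceTempView("XML_DATA")
    // Both calls print the same rows: once through SQL, once directly.
    spark.sql("SELECT * FROM XML_DATA").show(false)
    xml_df.show(false)
  }
}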
Result :
root
|-- _corrupt_record: string (nullable = true)
+------------------------------------------------------------------+
|_corrupt_record                                                   |
+------------------------------------------------------------------+
|
9-MAR-1987 04:58:41.12
money-fx
uk
RM
f0416reute
b f BC-U.K.-MONEY-MARKET-SHO 03-09 0095
U.K. MONEY MARKET SHORTAGE FORECAST AT 250 MLN STG
LONDON, March 9 - The Bank of England said it forecast a
shortage of around 250 mln stg in the money market today.
Among the factors affecting liquidity, it said bills
maturing in official hands and the treasury bill take-up would
drain around 1.02 billion stg while below target bankers'
balances would take out a further 140 mln.
Against this, a fall in the note circulation would add 345
mln stg and the net effect of exchequer transactions would be
an inflow of some 545 mln stg, the Bank added.
REUTER
|
+------------------------------------------------------------------+
(the second show() call prints the same table again)
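The map-and-parse route mentioned above can be sketched without any XML library. This is only a rough illustration using regular expressions, assuming every article sits between <REUTERS ...> and </REUTERS> as in the sample; the Article type, the chosen fields, and the helper names are made up, and character entities that real files may contain are left untouched.

```scala
import scala.util.matching.Regex

object ReutersSgmSketch {
  // Fields pulled out of one <REUTERS> element; the selection is illustrative.
  final case class Article(newId: String, date: String, topics: List[String],
                           title: String, body: String)

  // (?s) lets '.' match newlines, since the elements span several lines.
  private val ArticleRe: Regex = """(?s)<REUTERS\b([^>]*)>(.*?)</REUTERS>""".r
  private val NewIdRe: Regex = "NEWID=\"(\\d+)\"".r
  private val TopicRe: Regex = """<D>(.*?)</D>""".r

  // Returns the trimmed text between <tag> and </tag>, or "" if the tag is absent.
  private def tagText(tag: String, xml: String): String =
    s"(?s)<$tag>(.*?)</$tag>".r.findFirstMatchIn(xml).map(_.group(1).trim).getOrElse("")

  // Splits the file content into articles and extracts a few fields from each.
  def parse(sgm: String): List[Article] =
    ArticleRe.findAllMatchIn(sgm).map { m =>
      val attrs = m.group(1)
      val inner = m.group(2)
      Article(
        newId = NewIdRe.findFirstMatchIn(attrs).map(_.group(1)).getOrElse(""),
        date = tagText("DATE", inner),
        topics = TopicRe.findAllMatchIn(tagText("TOPICS", inner)).map(_.group(1)).toList,
        title = tagText("TITLE", inner),
        body = tagText("BODY", inner)
      )
    }.toList

  def main(args: Array[String]): Unit = {
    // Shortened copy of the record shown in the question.
    val sample =
      """<!DOCTYPE lewis SYSTEM "lewis.dtd">
        |<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="19419" NEWID="3001">
        |<DATE> 9-MAR-1987 04:58:41.12</DATE>
        |<TOPICS><D>money-fx</D></TOPICS>
        |<TEXT>
        |<TITLE>U.K. MONEY MARKET SHORTAGE FORECAST AT 250 MLN STG</TITLE>
        |<BODY>The Bank of England said it forecast a
        |shortage of around 250 mln stg in the money market today.
        |</BODY></TEXT>
        |</REUTERS>""".stripMargin
    parse(sample).foreach(println)
  }
}
```

In Spark, something along the lines of spark.sparkContext.wholeTextFiles(dir).flatMap { case (_, text) => ReutersSgmSketch.parse(text) } should turn a directory of .sgm files into an RDD of articles, which could then be converted to a DataFrame with multiple columns.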
Is there a rule for StockTwits symbols for stocks that are traded on non-US stock exchanges? For example, British Telecom is traded on the London Stock Exchange and its symbol is LON: BT.A.
How do I format the stocktwits symbol for this stock? Is it $LON:BT.A or $LONBTA?
StockTwits does not currently support the London Stock Exchange, but might in the future. Only forex, US exchanges, and the Toronto Stock Exchange are currently supported.
I want to implement recurring payments in PayPal with a variable amount. I have successfully implemented recurring payments with a constant amount, but I don't know how to implement them with a variable amount.
A very typical scenario would be a service provider deducting a telephone bill.
If my September bill contains rental: 20 euros and usage: 15 euros, then the deduction would be 35 euros.
Next, if my October bill contains rental: 20 euros and usage: 25 euros, then the deduction would be 45 euros.
Next, if my November bill contains rental: 20 euros and usage: 50 euros, then the deduction would be 70 euros.
Considering the above scenarios, please advise how to handle this on both sides.
Thanks in advance.
Riyaz
You might have to simply automate the PayPal payment from your end,
not automatically from PayPal's end. You can't have a subscription
that varies in price, so you'll have to do a single charge every
month, with the amount you specify. (As far as I know)
That also means that you'll have to manage the subscriptions on your
end (pretty easily doable), and there will be no way for the user to
un-subscribe from the PayPal side.
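A rough sketch of what managing the subscriptions on your end could look like, assuming one charge per customer per month. Everything here is hypothetical: the PaymentGateway trait stands in for whatever single-charge call your payment integration provides and is not a PayPal API, and the Bill type and runBillingCycle helper are made up for illustration.

```scala
object VariableMonthlyBilling {
  // Hypothetical stand-in for a single-charge call; this is NOT a PayPal API.
  trait PaymentGateway {
    def charge(customerId: String, amount: BigDecimal, currency: String): Boolean
  }

  // One month's bill: a fixed rental plus a usage part that varies.
  final case class Bill(customerId: String, rental: BigDecimal, usage: BigDecimal) {
    def total: BigDecimal = rental + usage
  }

  // Subscriptions are tracked on the merchant side, so unsubscribing is just
  // removing the customer from the active set before the next cycle.
  def runBillingCycle(bills: Seq[Bill], active: Set[String],
                      gateway: PaymentGateway): Map[String, Boolean] =
    bills
      .filter(b => active(b.customerId))
      .map(b => b.customerId -> gateway.charge(b.customerId, b.total, "EUR"))
      .toMap

  def main(args: Array[String]): Unit = {
    // Fake gateway that only prints, so the sketch runs without network access.
    val printing = new PaymentGateway {
      def charge(customerId: String, amount: BigDecimal, currency: String): Boolean = {
        println(s"charging $customerId $amount $currency")
        true
      }
    }
    // September from the question: rental 20 + usage 15 = 35 euros.
    val september = Seq(Bill("customer-1", BigDecimal(20), BigDecimal(15)))
    runBillingCycle(september, Set("customer-1"), printing)
  }
}
```

A scheduled job would build the bills each month and call runBillingCycle; a real integration would also need to handle failed charges and retries, which this sketch leaves out.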