Using Laravel 5.5 and Eloquent.
I have a large list of numeric data. Example (the real table has more than 100 rows):
user_wallets
--------------------------
|id | points | use_point |
--------------------------
|1 | 10.2 | 502.22 |
|2 | 32.6 | 23.3 |
|3 | 33 | 1020.32 |
--------------------------
Any idea how to update the points in a better way?
Let's say I want to multiply every points value by 0.55.
Below is my current approach, but it is slow.
This calculation is applied on certain offer dates.
$rate = $request->rate; // 0.55
$wallets = UserWallet::all();
foreach ($wallets as $wallet) {
    UserWallet::find($wallet->id)->update(['points' => $wallet->points * $rate]);
}
Any idea how to make it faster, or a better way to do this update?
Thanks
You could try using the each() method, since $wallets is a collection.
So:
$rate = $request->rate; // 0.55
$wallets = UserWallet::all();
$wallets->each(function ($item, $key) use ($rate) {
    $item->update(['points' => $item->points * $rate]);
});
It's not a whole lot different from a foreach, but I think it's the "Eloquent" way.
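If the bottleneck is the one-query-per-row pattern, a single bulk UPDATE with a raw expression is usually much faster. A minimal sketch, assuming the DB facade and a raw expression are acceptable here (note that mass updates like this do not fire model events):
use Illuminate\Support\Facades\DB;

$rate = (float) $request->rate; // 0.55, cast to keep the raw expression safe

// One UPDATE ... SET points = points * 0.55 statement for the whole table.
UserWallet::query()->update(['points' => DB::raw('points * ' . $rate)]);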
I have distorted data, and I am using the function below:
to_timestamp("col", "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'")
Data:
time | OUTPUT | IDEAL
2022-06-16T07:01:25.346Z | 2022-06-16T07:01:25.346+0000 | 2022-06-16T07:01:25.346+0000
2022-06-16T06:54:21.51Z | 2022-06-16T06:54:21.051+0000 | 2022-06-16T06:54:21.510+0000
2022-06-16T06:54:21.5Z | 2022-06-16T06:54:21.005+0000 | 2022-06-16T06:54:21.500+0000
So the fractional seconds in my data come in S, SS, or SSS format. How can I normalise them to SSS correctly? Here, .51 means 510 milliseconds, not 051.
Spark version: 3.2.1
Code:
import pyspark.sql.functions as F
test = spark.createDataFrame([(1,'2022-06-16T07:01:25.346Z'),(2,'2022-06-16T06:54:21.51Z'),(3,'2022-06-16T06:54:21.5Z')],['no','timing1'])
timeFmt = "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'"
test = test.withColumn("timing2", (F.to_timestamp(F.col('timing1'),format=timeFmt)))
test.select("timing1","timing2").show(truncate=False)
Output: as shown in the OUTPUT column of the table above.
I also use v3.2.1, and it works for me if you just don't pass a timestamp format at all; the string is already in a shape that to_timestamp parses by default:
from pyspark.sql import functions as F
test = spark.createDataFrame([(1,'2022-06-16T07:01:25.346Z'),(2,'2022-06-16T06:54:21.51Z'),(3,'2022-06-16T06:54:21.5Z')],['no','timing1'])
new_df = test.withColumn('timing1_ts', F.to_timestamp('timing1'))
new_df.show(truncate=False)
new_df.dtypes
+---+------------------------+-----------------------+
|no |timing1 |timing1_ts |
+---+------------------------+-----------------------+
|1 |2022-06-16T07:01:25.346Z|2022-06-16 07:01:25.346|
|2 |2022-06-16T06:54:21.51Z |2022-06-16 06:54:21.51 |
|3 |2022-06-16T06:54:21.5Z |2022-06-16 06:54:21.5 |
+---+------------------------+-----------------------+
Out[9]: [('no', 'bigint'), ('timing1', 'string'), ('timing1_ts', 'timestamp')]
I was using this setting:
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
I had to reset it, and now it works as expected.
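For reference, a minimal sketch of switching back to the non-legacy parser (CORRECTED and EXCEPTION are the other accepted values besides LEGACY; this assumes an active SparkSession named spark):
# Switch back to the new (non-legacy) datetime parser.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")

# Equivalent SQL form.
spark.sql("SET spark.sql.legacy.timeParserPolicy=CORRECTED")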
I have a dataset like the example below, and I am trying to group all rows for a given symbol and perform standard scaling within each group, so that in the end all of my data is scaled per group. How can I do that with MLlib and PySpark? I could not find a single solution for this online. Can anyone help?
+------+------------------+------------------+------------------+------------------+
|symbol| open| high| low| close|
+------+------------------+------------------+------------------+------------------+
| AVT| 4.115| 4.115| 4.0736| 4.0736|
| ZEC| 365.6924715181936| 371.9164684545918| 364.8854025324053| 369.5950712239761|
| ETH| 647.220769018717| 654.6370842160561| 644.8942258095359| 652.1231757197687|
| XRP|0.3856343600456335|0.4042970302356221|0.3662228285447956|0.4016658006619401|
| XMR|304.97650674864144|304.98649644294267|299.96970554155274| 303.8663243145598|
| LTC|321.32437862304715| 335.1872636382617| 320.9704201234651| 334.5057757774086|
| EOS| 5.1171| 5.1548| 5.1075| 5.116|
| BCH| 1526.839255299505| 1588.106037653013|1526.8392543926366|1554.8447136830328|
| DASH| 878.00000003| 884.03769206| 869.22000004| 869.22000004|
| BTC|17042.224796462127| 17278.87984139109|16898.509289685637|17134.611038665582|
| REP| 32.50162799| 32.501628| 32.41062673| 32.50162799|
| DASH| 858.98413357| 863.01413927| 851.07145059| 851.17051529|
| ETH| 633.1390884474979| 650.546984589714| 631.2674221381849| 641.4566047907362|
| XRP|0.3912300406160967|0.3915937383961073|0.3480682353334925|0.3488616679337076|
| EOS| 5.11| 5.1675| 5.0995| 5.1674|
| BCH|1574.9602789966184|1588.6004569127992| 1515.3| 1521.0|
| BTC| 17238.0199449088| 17324.83886467445|16968.291408828714| 16971.12960974206|
| LTC| 303.3999614441217| 317.6966006615225|302.40702519057584| 310.971265429805|
| REP| 32.50162798| 32.50162798| 32.345677| 32.345677|
| XMR| 304.1618444641083| 306.2720324372592|295.38042671416935| 295.520097663825|
+------+------------------+------------------+------------------+------------------+
I suggest you import the following:
import pyspark.sql.functions as f
then you can do it like this (not fully tested code):
stats_df = df.groupBy('symbol').agg(
    f.mean('open').alias('open_mean'),
    f.stddev('open').alias('open_stddev'))
This is the principle of how you would do it (you could instead use the min and max functions for min-max scaling); then you just have to apply the standard scaling formula using the values in stats_df:
x' = (x - μ) / σ
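A minimal sketch of that last step, assuming stats_df is built for all four price columns from the sample data (the *_mean / *_stddev column names and the join-based approach are my own choices, not fully tested against your data):
import pyspark.sql.functions as f

price_cols = ['open', 'high', 'low', 'close']

# Per-symbol statistics for every price column.
stats_df = df.groupBy('symbol').agg(
    *[f.mean(c).alias(c + '_mean') for c in price_cols],
    *[f.stddev(c).alias(c + '_stddev') for c in price_cols])

# Join the stats back and apply x' = (x - mean) / stddev within each group.
# Groups with a single row come out null, since the sample stddev is undefined there.
scaled_df = df.join(stats_df, on='symbol')
for c in price_cols:
    scaled_df = scaled_df.withColumn(
        c, (f.col(c) - f.col(c + '_mean')) / f.col(c + '_stddev'))

scaled_df = scaled_df.select('symbol', *price_cols)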
I have a spark code to read some data from a database.
One of the columns (of type string) named "title" contains the following data.
+----------------------------------+
|title                             |
+----------------------------------+
|Example sentence                  |
|Read the &lsquo;Book&rsquo;       |
|&lsquo;LOTR&rsquo; Is A Great Book|
+----------------------------------+
I'd like to remove the HTML entities and decode it to look as given below.
+-------------------------------------------+
|title |
+-------------------------------------------+
|Example sentence |
|Read the ‘Book’ |
|‘LOTR’ Is A Great Book |
+-------------------------------------------+
There is a library "html-entities" for Node.js that does exactly what I am looking for,
but I am unable to find something similar for Spark/Scala.
What would be a good approach to do this?
You can use org.apache.commons.lang.StringEscapeUtils with the help of a UDF to achieve this.
import org.apache.commons.lang.StringEscapeUtils
import org.apache.spark.sql.functions.udf
import spark.implicits._
val decodeHtml = (html: String) => {
  StringEscapeUtils.unescapeHtml(html)
}
val decodeHtmlUDF = udf(decodeHtml)
df.withColumn("title", decodeHtmlUDF($"title")).show()
/*
+--------------------+
| title|
+--------------------+
| Example sentence |
| Read the ‘Book’ |
|‘LOTR’ Is A Great...|
+--------------------+
*/
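As a side note, if only commons-lang3 is on the classpath (an assumption; check your dependencies), the equivalent call is unescapeHtml4. A sketch under that assumption:
import org.apache.commons.lang3.StringEscapeUtils
import org.apache.spark.sql.functions.{col, udf}

// Same idea as above, using the commons-lang3 method name.
val decodeHtml4UDF = udf((html: String) => StringEscapeUtils.unescapeHtml4(html))
df.withColumn("title", decodeHtml4UDF(col("title"))).show()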
This question already has an answer here:
Difference between explode and explode_outer
(1 answer)
Closed 2 years ago.
I have the following data frame:
|tokenCnt|filtered |
|5 |[java,scala, list, java, linkedlist]|
|3 |[also, genseq, parseq] |
I want to take the arrays in the 'filtered' column, element by element, and turn them into a single-column data frame:
|filtered  |
|java      |
|scala     |
|list      |
|java      |
|linkedlist|
|also      |
|genseq    |
|parseq    |
like this.
Could someone help me?
You can use explode:
import org.apache.spark.sql.functions.{col, explode}
val result = df.select(explode(col("filtered")).alias("filtered"))
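For context, a minimal sketch reproducing the sample data and the call above (assuming a SparkSession named spark is in scope; the literal rows mirror the question):
import spark.implicits._
import org.apache.spark.sql.functions.{col, explode}

// Rebuild the sample data frame from the question.
val df = Seq(
  (5, Seq("java", "scala", "list", "java", "linkedlist")),
  (3, Seq("also", "genseq", "parseq"))
).toDF("tokenCnt", "filtered")

// One output row per array element.
val result = df.select(explode(col("filtered")).alias("filtered"))
result.show()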
I've been using the Zend Framework database relationships for a couple of weeks now. My first impression is pretty good, but I do have a question about inserting related data into multiple tables. For a little test application I've related two tables to each other through a fuse (link) table.
+---------------+ +---------------+ +---------------+
| Pages | | Fuse | | Prints |
+---------------+ +---------------+ +---------------+
| pageid | | fuseid | | printid |
| page_active | | fuse_page | | print_title |
| page_author | | fuse_print | | print_content |
| page_created | | fuse_locale | | ... |
| ... | | ... | +---------------+
+---------------+ +---------------+
Above is an example of my DB architecture.
Now, my problem is how to insert related data into the two separate tables and insert the two newly created IDs into the fuse table at the same time. If someone could explain this or point me to a related tutorial, I would appreciate it!
I assume you have separate models for each table. Simply insert a row into the Prints table and store the returned ID in a variable, then insert a row into the Pages table and store its returned ID in another variable, and finally insert the data into your Fuse table. You do not need any "at the same time" (atomic) operation here. The ID of a newly inserted row is returned by insert()/save() (I assume you use auto-increment fields for the primary keys).
$printsModel = new Application_Model_Prints();
$pagesModel = new Application_Model_Pages();
$fuseModel = new Application_Model_Fuse();
$printData = array('print_title'=>'foo',
...);
$printId = $printsModel->insert( $printData );
$pagesData = array('page_author'=>'bar',
...);
$pageId = $pagesModel->insert($pagesData);
$fuseData = array('fuse_page' => $pageId,
'fuse_print' => $printId,
...);
$fuseId = $fuseModel->insert($fuseData);
This is pseudo-code, so you may want to move the inserts into your models and do some normalisation, etc.
I also suggest paying more attention to the field naming convention. It usually helps; right now you have fuseid but also fuse_page, so it should either be fuse_id or fusepage (not to mention I suspect this field stores an ID, so it would be fuse_page_id or fusepageid).
Prints and Pages are two entities. Create a row class for each:
class Model_Page extends Zend_Db_Table_Row_Abstract
{
    public function addPrint($print)
    {
        $fuseTb = new Table_Fuse();
        $fuse = $fuseTb->createRow();
        $fuse->fuse_page = $this->pageid;
        $fuse->fuse_print = $print->printid;
        $fuse->save();
        return $fuse;
    }
}
Now, when you create a page:
$page = $pageTb->createRow(); // instance of Model_Page is returned
$page->save(); // save first so pageid is populated before addPrint() uses it
$page->addPrint($printTb->find(1)->current());