Linear Regression in mlr3 with Interactions / Quadratic Terms

I am trying to fit a linear model with interaction and/or quadratic terms in an mlr3 benchmark. Unfortunately, I couldn't find a way to do this on GitHub or Stack Exchange. Here is an example:
library(mlr3verse)
tskScen <- tsk("mtcars")
msrMSE <- msr("regr.rmse")
rsgScen = rsmp("cv", folds = 4)
learners = lrns(c("regr.lm", "regr.ranger"))
benchdesign = benchmark_grid(tskScen, learners, rsgScen)
bmr = benchmark(benchdesign, store_models = TRUE)
bmr$aggregate(msrMSE)
And here is the version info:
> mlr3verse_info()
package version
1: bbotk 0.5.1
2: mlr3cluster 0.1.2
3: mlr3data 0.6.0
4: mlr3filters 0.5.0
5: mlr3fselect 0.6.1
6: mlr3hyperband 0.4.0
7: mlr3learners 0.5.1
8: mlr3misc 0.10.0
9: mlr3pipelines 0.4.0
10: mlr3proba 0.4.4
11: mlr3tuning 0.12.1
12: mlr3tuningspaces 0.1.1
13: mlr3viz 0.5.7
14: paradox 0.8.0
Thanks!

You can use the regr.rsm Learner that is available through mlr3extralearners.
See here: https://mlr3extralearners.mlr-org.com/reference/mlr_learners_regr.rsm.html
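A minimal sketch of how that could look (I'm assuming the learner's modelfun parameter takes the values "FO", "TWI", and "SO" from the underlying rsm package, where "TWI" adds two-way interactions and "SO" fits the full second-order model — treat the parameter name and values as an assumption to check against the docs):
library(mlr3verse)
library(mlr3extralearners)  # provides regr.rsm

# "TWI": first-order model plus two-way interaction terms (assumed parameter)
lrnRSM <- lrn("regr.rsm", modelfun = "TWI")
lrnRSM$train(tsk("mtcars"))
summary(lrnRSM$model)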

So far, the only way I've found to do this is with the modelmatrix PipeOp: https://mlr3pipelines.mlr-org.com/reference/mlr_pipeops_modelmatrix.html
This allows you to define the formula of the task, like so:
library(mlr3verse)
library(mlr3pipelines)
tskScen <- tsk("mtcars")
pop <- po("modelmatrix", formula = ~ hp * disp * drat)
tskScen.new <- pop$train(list(tskScen))$output
lm.lrnr <- lrn("regr.lm")
lm.lrnr$train(tskScen.new)
lm.lrnr$predict(tskScen.new)
This creates new interaction features in the task which you can then use to train your linear model.
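To use this inside benchmark() as in your original example (so the feature expansion is applied within each CV fold), you can chain the PipeOp and a learner into a single GraphLearner. A minimal sketch, assuming a reasonably recent mlr3pipelines; the particular interaction and quadratic terms in the formula are only illustrative:
library(mlr3verse)
library(mlr3pipelines)

tskScen <- tsk("mtcars")

# chain: expand features via a formula, then fit a linear model
glrn <- as_learner(
  po("modelmatrix", formula = ~ . + hp:disp + I(hp^2)) %>>%
  lrn("regr.lm")
)
glrn$id <- "regr.lm.interactions"

benchdesign <- benchmark_grid(tskScen, list(glrn, lrn("regr.ranger")), rsmp("cv", folds = 4))
bmr <- benchmark(benchdesign, store_models = TRUE)
bmr$aggregate(msr("regr.rmse"))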


How to convert geom_point(aes()) + geom_vline(aes()) to Plotly?

I found this tutorial online that helps convert ggplot2's geom_abline() to a Plotly graph: https://plotly.com/ggplot2/geom_abline/
It looks like we can simply make the conversion using ggplotly():
library(ggplot2)
library(plotly)
p <- ggplot(data, aes(x=x_val, y=y_val, colour=color_val)) +
  geom_point() +
  geom_vline(aes(xintercept=xintercept_val), colour=color_val)
ggplotly(p)
However, I cannot convert my ggplot2 graph into a Plotly graph with the following code:
# notice that both my x_val and xintercept_val are dates.
# here's my ggplot2 code:
gg <- ggplot(data) +
  geom_point(aes(
    x_val,
    y_val,
    color=color_val,
    shape=shape_val
  )) +
  geom_vline(aes(
    xintercept=xintercept_val,
    color=color_val
  ))
ggplotly(gg)
Here's a screenshot of my ggplot2 graph (I cropped out the legends):
Here's a screenshot of my Plotly graph using ggplotly(gg):
Not sure why the vertical lines aren't showing up in Plotly.
Looks like you stumbled on a bug in ggplotly (perhaps you should raise an issue on GitHub). Internally, ggplotly converts dates to numerics (the same happens with categorical variables). However, inspecting the JSON representation via plotly_json() shows that the xintercept values from geom_vline() are not converted, which is why they don't show up. As a workaround, you can do the conversion manually with as.numeric().
Since you provided no data, I use a simple example dataset from the plotly website, to which I added some dates. Try this:
dat <- read.table(header=TRUE, text='
cond xval yval
control 11.5 10.8
control 9.3 12.9
control 8.0 9.9
control 11.5 10.1
control 8.6 8.3
control 9.9 9.5
control 8.8 8.7
control 11.7 10.1
control 9.7 9.3
control 9.8 12.0
treatment 10.4 10.6
treatment 12.1 8.6
treatment 11.2 11.0
treatment 10.0 8.8
treatment 12.9 9.5
treatment 9.1 10.0
treatment 13.4 9.6
treatment 11.6 9.8
treatment 11.5 9.8
treatment 12.0 10.6
')
dat$xval <- rep(as.Date(paste0("2020-", 1:10, "-01")), 2)
max_date1 <- dat[dat$cond == "control", "xval"][which.max(dat[dat$cond == "control", "yval"])]
max_date2 <- dat[dat$cond == "treatment", "xval"][which.max(dat[dat$cond == "treatment", "yval"])]
# The basic scatterplot
p <- ggplot(dat, aes(x=xval, y=yval, colour=cond)) +
  geom_point()
# Add colored lines for the date of the max yval of each group
p <- p +
  geom_vline(aes(xintercept=as.numeric(max_date1)), colour="green") +
  geom_vline(aes(xintercept=as.numeric(max_date2)), colour="lightblue")
p
fig <- ggplotly(p)
fig
This gives me a plot in which the vertical lines show up in Plotly as well.

Mapbox GL fill-extrusion-height restricted to whole-meter values

I was trying to model a detailed building with the fill-extrusion feature of Mapbox GL, but I found that it renders heights only in multiples of 1 meter.
Please see this JSFiddle:
`http://jsfiddle.net/parveenkaloi/p5w1je7s/20/`
I've set 9 boxes to the following heights, but the rendered heights are rounded down to whole meters:
Box  Set height  Rendered height
1    0.25 m      0.0 m
2    0.50 m      0.0 m
3    0.75 m      0.0 m
4    1.00 m      1.0 m
5    1.25 m      1.0 m
6    1.50 m      1.0 m
7    1.75 m      1.0 m
8    2.00 m      2.0 m
9    2.25 m      2.0 m
Please help me if there is any solution for this. Thanks!
You are using an old version of the mapbox-gl-js library, v0.38.0. Try the latest, v0.47.0.
There is also a shorter way to read the property values, via expressions:
"paint": {
'fill-extrusion-color': ["get", "clr"],
'fill-extrusion-height': ["get", "ht" ],
'fill-extrusion-base': ["get", "pz" ]
}
Updated JSFiddle: http://jsfiddle.net/n3zvs9jm/1/
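For context, a minimal sketch of how that paint block sits in a full layer definition (the layer and source ids are made up here; the clr, ht, and pz properties are assumed to be on the GeoJSON features, as in the fiddle):
map.addLayer({
    id: 'boxes',
    type: 'fill-extrusion',
    source: 'boxes',  // a GeoJSON source whose features carry clr/ht/pz
    paint: {
        'fill-extrusion-color': ['get', 'clr'],
        'fill-extrusion-height': ['get', 'ht'],
        'fill-extrusion-base': ['get', 'pz']
    }
});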

Huge amount of memory used by Flink

For the last couple of weeks I have been building a DataStream program in Flink, in Scala.
But I see strange behavior: Flink uses a lot more memory than I expected.
I have four ListStates of tuple (Int, Long) in my process function, keyed by Int. I use them to compute distinct counters over different time frames, and I expected most of the memory to be used by these lists.
But that's not the case.
So I printed a live class histogram of the JVM, and I was surprised by how much memory is used:
num #instances #bytes class name
----------------------------------------------
1: 138920685 6668192880 java.util.HashMap$Node
2: 138893041 5555721640 org.apache.flink.streaming.api.operators.InternalTimer
3: 149680624 3592334976 java.lang.Integer
4: 48313229 3092046656 org.apache.flink.runtime.state.heap.CopyOnWriteStateTable$StateTableEntry
5: 14042723 2579684280 [Ljava.lang.Object;
6: 4492 2047983264 [Ljava.util.HashMap$Node;
7: 41686732 1333975424 com.myJob.flink.tupleState
8: 201 784339688 [Lorg.apache.flink.runtime.state.heap.CopyOnWriteStateTable$StateTableEntry;
9: 17230300 689212000 com.myJob.flink.uniqStruct
10: 14025040 561001600 java.util.ArrayList
11: 8615581 413547888 com.myJob.flink.Data$FingerprintCnt
12: 6142006 393088384 com.myJob.flink.ProcessCountStruct
13: 4307549 172301960 com.myJob.flink.uniqresult
14: 4307841 137850912 com.myJob.flink.Data$FingerprintUniq
15: 2153904 137849856 com.myJob.flink.Data$StreamData
16: 1984742 79389680 scala.collection.mutable.ListBuffer
17: 1909472 61103104 scala.collection.immutable.$colon$colon
18: 22200 21844392 [B
19: 282624 9043968 org.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
20: 59045 6552856 [C
21: 33194 2655520 java.nio.DirectByteBuffer
22: 32804 2361888 sun.misc.Cleaner
23: 35 2294600 [Lscala.concurrent.forkjoin.ForkJoinTask;
24: 640 2276352 [Lorg.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
25: 32768 2097152 org.apache.flink.core.memory.HybridMemorySegment
26: 12291 2082448 java.lang.Class
27: 58591 1874912 java.lang.String
28: 8581 1372960 java.lang.reflect.Method
29: 32790 1311600 java.nio.DirectByteBuffer$Deallocator
30: 18537 889776 java.util.concurrent.ConcurrentHashMap$Node
31: 4239 508680 java.lang.reflect.Field
32: 8810 493360 java.nio.HeapByteBuffer
33: 7389 472896 java.util.HashMap
34: 5208 400336 [I
The tuple (Int, Long) is com.myJob.flink.tupleState, in 7th position.
And I see that the tuples use less than 2 GB of memory.
I don't understand why Flink uses this amount of memory for these classes.
Can anyone shed some light on this behavior? Thanks in advance.
Update:
I run my job on a standalone cluster (1 JobManager, 3 TaskManagers).
The Flink version is 1.5-SNAPSHOT, commit e4486ae.
I got the live histogram on one TaskManager node.
Update 2:
In my process function I use:
ctx.timerService.registerProcessingTimeTimer(ctx.timestamp + 100)
and afterwards, in the onTimer function, I process my ListState to check all the old data.
So a timer is created for each call to the process function.
But why is the timer still in memory after onTimer has been triggered?
How many windows do you end up with? Based on the top two entries, what you are seeing are the "timers" that Flink uses to track when to clean up windows. For every key in a window you effectively end up with a (key, endTimestamp) entry in the timer state. If you have a very large number of windows (perhaps due to out-of-order time or delayed watermarking), or a very large number of keys in each window, each of those will take up memory.
Note that even if you are using RocksDB state, the TimerService uses heap memory, so you have to watch out for that.
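If the per-record timers from Update 2 are the culprit, one common mitigation is timer coalescing: round the target timestamp to a coarser granularity so that each key registers at most one timer per interval (Flink deduplicates timers per key and timestamp, so re-registering the same value is a no-op). A minimal sketch of the idea inside processElement, assuming ctx is the usual keyed-function context:
// Instead of one timer per element:
// ctx.timerService.registerProcessingTimeTimer(ctx.timestamp + 100)

// Coalesce to 1-second granularity; duplicate registrations for the
// same key and timestamp are deduplicated, so far fewer
// InternalTimer objects stay on the heap.
val target = ctx.timestamp + 100
val coalesced = (target / 1000 + 1) * 1000
ctx.timerService.registerProcessingTimeTimer(coalesced)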

Tensorflow doesn't want to use GPU

I want to train the "Stanford chatbot" from here https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot on a GPU, but it doesn't use my GPU, even though all the needed libraries (cuDNN, CUDA, tensorflow-gpu, etc.) are installed.
I tried:
def train():
    """ Train the bot """
    test_buckets, data_buckets, train_buckets_scale = _get_buckets()
    # in train mode, we need to create the backward path, so forward_only is False
    model = ChatBotModel(False, config.BATCH_SIZE)
    model.build_graph()
    saver = tf.train.Saver(var_list=tf.trainable_variables())
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
        print('Start training')
        sess.run(tf.global_variables_initializer())
        _check_restore_parameters(sess, saver)
        iteration = model.global_step.eval()
        total_loss = 0
        while True:
            skip_step = _get_skip_step(iteration)
            bucket_id = _get_random_bucket(train_buckets_scale)
            encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(data_buckets[bucket_id],
                                                                           bucket_id,
                                                                           batch_size=config.BATCH_SIZE)
            start = time.time()
            _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, False)
            total_loss += step_loss
            iteration += 1
            if iteration % skip_step == 0:
                print('Iteration {}: loss {}, time {}'.format(iteration, total_loss/skip_step, time.time() - start))
                start = time.time()
                total_loss = 0
                saver.save(sess, os.path.join(config.CPT_PATH, 'chatbot'), global_step=model.global_step)
                if iteration % (10 * skip_step) == 0:
                    # Run evals on development set and print their loss
                    _eval_test_set(sess, model, test_buckets)
                    start = time.time()
                sys.stdout.flush()
But it always shows:
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'save/Const': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Const: CPU
Identity: CPU
[[Node: save/Const = Constdtype=DT_STRING, value=Tensor, _device="/device:GPU:0"]]
Is there some configuration file for TensorFlow where I can specify to use only the GPU, or some other way? (I tried with tf.device("/gpu:0"): and device_count={'GPU': 1}.)
From your error:
Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
That means that the 'save/Const' operation cannot be forcibly assigned to a GPU via with tf.device(): because there is no GPU kernel for it. Remove the with tf.device(): part (or create that operation outside of it) and let TF decide where to put operations (it will prefer the GPU over the CPU anyhow).
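A minimal TF1-style sketch of that advice (the variable names are illustrative, not from the chatbot code): pin only GPU-capable ops explicitly, create the Saver outside the device scope, and keep allow_soft_placement=True so any remaining CPU-only ops fall back to the CPU instead of raising InvalidArgumentError:
import tensorflow as tf

# Ops with GPU kernels can be pinned explicitly
with tf.device('/gpu:0'):
    w = tf.Variable(tf.random_normal([1024, 1024]))
    y = tf.matmul(w, w)

# The Saver (and its string-typed 'save/Const' op) is created outside
# the device scope, since string constants have no GPU kernel
saver = tf.train.Saver()

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y)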

Tensorflow: Cannot interpret feed_dict key as Tensor

I am trying to build a neural-network model with one hidden layer (1024 nodes). The hidden layer is nothing but a ReLU unit. I am also processing the input data in batches of 128.
The inputs are images of size 28 * 28. In the following code I get the error at the line
_, c = sess.run([optimizer, loss], feed_dict={x: batch_x, y: batch_y})
Error: TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder_64:0", shape=(128, 784), dtype=float32) is not an element of this graph.
Here is the code I have written:
#Initialize
batch_size = 128
layer1_input = 28 * 28
hidden_layer1 = 1024
num_labels = 10
num_steps = 3001

#Create neural network model
def create_model(inp, w, b):
    layer1 = tf.add(tf.matmul(inp, w['w1']), b['b1'])
    layer1 = tf.nn.relu(layer1)
    layer2 = tf.matmul(layer1, w['w2']) + b['b2']
    return layer2

#Initialize variables
x = tf.placeholder(tf.float32, shape=(batch_size, layer1_input))
y = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
w = {
    'w1': tf.Variable(tf.random_normal([layer1_input, hidden_layer1])),
    'w2': tf.Variable(tf.random_normal([hidden_layer1, num_labels]))
}
b = {
    'b1': tf.Variable(tf.zeros([hidden_layer1])),
    'b2': tf.Variable(tf.zeros([num_labels]))
}
init = tf.initialize_all_variables()
train_prediction = tf.nn.softmax(model)
tf_valid_dataset = tf.constant(valid_dataset)
tf_test_dataset = tf.constant(test_dataset)
model = create_model(x, w, b)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(model, y))
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

#Process
with tf.Session(graph=graph1) as sess:
    tf.initialize_all_variables().run()
    total_batch = int(train_dataset.shape[0] / batch_size)
    for epoch in range(num_steps):
        loss = 0
        for i in range(total_batch):
            batch_x, batch_y = train_dataset[epoch * batch_size:(epoch+1) * batch_size, :], train_labels[epoch * batch_size:(epoch+1) * batch_size,:]
            _, c = sess.run([optimizer, loss], feed_dict={x: batch_x, y: batch_y})
            loss = loss + c
        loss = loss / total_batch
        if epoch % 500 == 0:
            print("Epoch :", epoch, ". cost = {:.9f}".format(avg_cost))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            valid_prediction = tf.run(tf_valid_dataset, {x: tf_valid_dataset})
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
    test_prediction = tf.run(tf_test_dataset, {x: tf_test_dataset})
    print("TEST accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
This worked for me. I faced this problem on a production server, while on my PC it ran fine. After predicting my data I inserted this bit of code and then loaded the model again:
from keras import backend as K

# Before prediction
K.clear_session()
# After prediction
K.clear_session()
Variable x is not in the same graph as model; try to define all of these in the same graph scope. For example:
# define a graph
graph1 = tf.Graph()
with graph1.as_default():
    # placeholder
    x = tf.placeholder(...)
    y = tf.placeholder(...)
    # create model
    model = create(x, w, b)

with tf.Session(graph=graph1) as sess:
    # initialize all the variables
    sess.run(init)
    # then feed_dict
    # ......
If you use the Django development server, just run it with --nothreading, for example:
python manage.py runserver --nothreading
I had the same issue with Flask. Adding the --without-threads flag to flask run, or threaded=False to app.run(), fixed it.
In my case, I was calling the CNN multiple times in a loop; I fixed my problem by doing the following:
# Declare this as global:
global graph
graph = tf.get_default_graph()

# Then, just before you call your model, use this:
with graph.as_default():
    # call your model here
Note: in my case too, the app ran fine the first time and then gave the error above. Using the above fix solved the problem.
Hope that helps.
The error message TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("...", dtype=dtype) is not an element of this graph can also arise if you run a session outside the scope of its with statement. Consider:
with tf.Session() as sess:
    sess.run(logits, feed_dict=feed_dict)

sess.run(logits, feed_dict=feed_dict)
If logits and feed_dict are defined properly, the first sess.run call will execute normally, but the second will raise the mentioned error.
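A minimal fix is to keep every run call inside the with block (or hold on to an explicit session object):
with tf.Session() as sess:
    sess.run(logits, feed_dict=feed_dict)
    sess.run(logits, feed_dict=feed_dict)  # both calls inside the session scope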
You can also experience this while working on notebooks hosted on online learning platforms like Coursera. Adding the following code at the topmost block of the notebook file can help get past the issue:
from keras import backend as K
K.clear_session()
Similar to @javan-peymanfard and @hmadali-shafiee, I ran into this issue when loading the model in an API. I was using FastAPI with uvicorn. To fix the issue I just set the API function definitions to async, similar to this:
@app.post('/endpoint_name')
async def endpoint_function():
    # Do stuff here, including possibly (re)loading the model