Issue with DatePicker - RSelenium - datepicker

I'm scraping publicly available data for academic research. The website I'm pulling the information from has a really annoying datepicker though. I'm not sure if they implement this to deter private companies from scraping criminal data but it seems pretty dumb.
Here's the url.
I can bypass the Captcha with my institutional credentials, FYI.
You can see the code, minus the login information, below:
#Miami Scraper
rm(list=ls())
remDr$close()
rm(rD)
gc()
rm(list=ls())
setwd("~/Desktop/Miami Scrape")
library(httr)
library(rvest)
library(zoo)
library(anytime)
library(lubridate)
library(dplyr)
library(RSelenium)
browser <- remoteDriver(port = 5556, browserName = "firefox")
remDr<-browser[["client"]]
url <- "https://www2.miami-dadeclerk.com/PremierServices/login.aspx"
rD <- rsDriver(verbose=FALSE,port=4444L,browser="firefox")
remDr <- rD$client
remDr$navigate(url)
#Click the Logging In Option
#Log-in stuff happens here
url2 <- "https://www2.miami-dadeclerk.com/cjis/casesearch.aspx"
remDr <- rD$client
remDr$navigate(url2)
#Here, you will read in the sheets. Let's start with a handful
date <- c("02", "01", "01")
sequence <- c("030686","027910","014707")
seqbar <- remDr$findElement("id","txtCaseNo3")
seqbar$sendKeysToElement(list(sequence[1]))
type <- remDr$findElement("id","ddCaseType")
type$clickElement()
type$sendKeysToElement(list("F","\n"))
yearbar <- remDr$findElement("id","txtCaseNo2")
yearbar$clearElement()
prev <- remDr$setTimeout("2000")
yearbar$sendKeysToElement(list(date[1]))
The datepicker frequently resets the year to its default, 19, but not systematically. I'm only beginning to develop the code, but I notice that if I use the same case information for two searches in a row, the year regularly switches from "02" to "19". If I switch to another case, it may not work either. I'm not sure how to deal with this datepicker. Any help would be greatly appreciated.
I've tried a couple of things. As you can see, I've tried clearing out the default and slowing my code down, too. That doesn't seem to work.
One last note: if I run the code line by line it works, but running it all at once does not.

I can't test this in R because I can't seem to get RSelenium set up, but changing the value attribute of the year input box seems to work. In R, it looks like there are two ways to do that.
Untested, but something like:
year <- '02'
#method 1 using inbuilt method which executes js under hood
remDr$findElement('id','txtCaseNo2')$setElementAttribute('value',year)
#method 2 js direct
js <- paste0("document.querySelector('#txtCaseNo2').value='", year,"';")
remDr$executeScript(js)
Anyway, this might be enough to get you on track for a solution.
I tested similar versions successfully with Python:
from selenium import webdriver
from selenium.webdriver.common.by import By

d = webdriver.Chrome()
d.get('https://www2.miami-dadeclerk.com/cjis/casesearch.aspx?AspxAutoDetectCookieSupport=1')
case_nums = ["030686"]
year = '02'
# Set the year directly through JavaScript instead of typing into the field
d.execute_script("document.querySelector('#txtCaseNo2').value='" + year + "';")
# Alternative: d.execute_script("arguments[0].value = '02';", d.find_element(By.ID, 'txtCaseNo2'))
d.find_element(By.ID, 'txtCaseNo3').send_keys(case_nums[0])
d.find_element(By.CSS_SELECTOR, '[value=F]').click()
captcha = input()  # type the CAPTCHA text shown in the browser
d.find_element(By.ID, 'CaptchaCodeTextBox').send_keys(captcha)
d.find_element(By.ID, 'btnCaseSearch').click()
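Since the year field sometimes snaps back to its default, one option is to set the value, read it back after a short pause, and retry until it sticks. The sketch below is untested against the actual site. `build_set_value_js`, `build_get_value_js`, and `set_until_stable` are hypothetical helper names, and the `run_js` callable stands in for something like a Selenium driver's `execute_script`. `json.dumps` does the quoting so that a selector or value containing quotes does not break the script.

```python
import json
import time


def build_set_value_js(selector, value):
    """Return JavaScript that sets the value of the element matching selector."""
    # json.dumps produces correctly quoted JavaScript string literals.
    return "document.querySelector({}).value = {};".format(
        json.dumps(selector), json.dumps(value)
    )


def build_get_value_js(selector):
    """Return JavaScript that reads the element's current value."""
    return "return document.querySelector({}).value;".format(json.dumps(selector))


def set_until_stable(run_js, selector, value, attempts=5, delay=0.5):
    """Set the value and re-check it after a pause; retry if the page reset it.

    run_js is any callable that executes a JavaScript string in the browser
    and returns its result (for example, a Selenium driver's execute_script).
    Returns True once the value survives the pause, False if it never does.
    """
    for _ in range(attempts):
        run_js(build_set_value_js(selector, value))
        time.sleep(delay)  # give the page's own scripts a chance to run
        if run_js(build_get_value_js(selector)) == value:
            return True
    return False
```

With a driver like `d` from the Python code above, the call would be `set_until_stable(d.execute_script, '#txtCaseNo2', '02')`, followed by the search click only if it returns True.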


When using OPT-2.7B or any other natural language model, is there a way to trick it into having a conversation/ give it a pre prompt in the code

Using this code, or a variant of it, is there anything that can be added to "trick" OPT into conversing as another user, in a style closer to a chatbot? As of now, it either starts something that reads like an article or has a conversation with itself for a few lines, as seen below.
# tokenizer and model are assumed to be loaded already (e.g. OPT-2.7B via transformers)
val = input("Enter your value: ")
input_ids = tokenizer(val, return_tensors='pt').input_ids
# penalty_alpha together with top_k enables contrastive search
output = model.generate(input_ids, min_length=1, max_length=1024, penalty_alpha=0.6, top_k=6)
text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Output:\n" + 100 * '-')
print(text)
print("" + 100 * '-')
with open("OutputText.txt", "w") as f:
    f.write(text)
Here's an example of the current output:
User Input:
Hello There.
Model Output:
Hello there. I have an egg that matches your TSV. Would you mind hatching it for me?
Sure, I'll add you now. Let me know when you're online.
Sorry for the late reply. I'm online now for the next few hours. Just send me a trade request whenever you're ready.
No probs, I'm in the middle of a battle at the moment, but I'll get you as soon as I'm done.
Thank you very much for the hatch. Have a nice day :D
I've attempted to add a prompt to the start and it hasn't made a difference.
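A base model such as OPT only continues text, so a common workaround is to frame the input as a transcript and cut the output at the next speaker label. Below is a minimal sketch of that idea, with no guarantee about reply quality. `build_prompt`, `extract_reply`, the persona line, and the `User:`/`Bot:` labels are all illustrative assumptions, not part of the code in the question.

```python
PERSONA = "The following is a conversation between a user and a helpful chatbot."


def build_prompt(history, user_message, persona=PERSONA):
    """Format earlier (user, bot) turns plus the new message as a transcript.

    The prompt ends with "Bot:" so the model's continuation is the bot's turn.
    """
    lines = [persona, ""]
    for user_turn, bot_turn in history:
        lines.append("User: " + user_turn)
        lines.append("Bot: " + bot_turn)
    lines.append("User: " + user_message)
    lines.append("Bot:")
    return "\n".join(lines)


def extract_reply(generated_text, prompt):
    """Keep only the bot's first turn from the decoded model output."""
    # The decoded output normally starts with the prompt itself; drop it.
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    # Stop where the model starts writing the user's next line.
    reply = continuation.split("\nUser:")[0]
    return reply.strip()
```

In the snippet from the question, `build_prompt(history, val)` would be tokenized instead of `val`, and the decoded string would go through `extract_reply` before printing. Lowering `max_length` (or setting `max_new_tokens`) may also help keep the model from carrying on the conversation with itself.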

R Web Scraping rvest forms submit_form

I am new to R and not very knowledgeable about HTML, XML, etc. I am trying to scrape a site that requires input from a drop-down. It's for an academic paper using text and sentiment analysis on the press releases of members of Congress. NOT A PROGRAMMER LOL, so be gentle!
memberUrl = 'https://grijalva.house.gov/press-releases/'
session <- html_session(memberUrl)
forms <- html_form(session)
yearForm <- forms[[4]]
#--- so far so good (I think) -- and i have successfully scraped sites that don't have drop downs
#--- but here is where I get confused and can't find a good tutorial on forms and submit_form
set_values(yearForm, ??? ) #----- get stuck on how to use set_values
submit_form( session, yearForm, ???) #--- and here
Thanks! Jim
submit_form didn't work, maybe because that form uses JS to submit. Here is the solution:
library(rvest)
memberUrl = 'https://grijalva.house.gov/press-releases/'
session <- html_session(memberUrl)
session <- rvest:::request_POST(session,
  memberUrl,
  body = list(
    getNewsByyear = "2018" # change the value here; 'getNewsByyear' is the name of the dropdown list
  ))
titles <- read_html(session) %>%
  html_nodes("ul > li > h3") %>%
  html_text()
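For comparison, the same idea (send the dropdown's field as an ordinary POST body instead of driving the form) can be sketched in Python with only the standard library. `build_year_request` is a hypothetical helper, and the field name `getNewsByyear` is taken from the R code above. The request is only constructed here, not sent, and the sketch assumes the site still accepts this field.

```python
from urllib.parse import urlencode
from urllib.request import Request

MEMBER_URL = "https://grijalva.house.gov/press-releases/"


def build_year_request(year, url=MEMBER_URL):
    """Build a POST request that selects a year in the dropdown."""
    # The body is form-encoded, just like a browser form submission.
    body = urlencode({"getNewsByyear": str(year)}).encode("ascii")
    return Request(url, data=body, method="POST")


req = build_year_request(2018)
print(req.get_method(), req.data)
```

Passing the request to `urllib.request.urlopen` would send it; looping over several years gives one page of press releases per request.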

Using date input via Dateinput to be filename

I have a Shiny UI that lets the user select a date via a dateInput box. The output will be backed up daily, so I would like to use that date, e.g. 20181224, as part of the filename.
library(shiny)
library(shinyFiles)
ui <- fluidPage(
sidebarPanel(
dateInput("COBInput", "Select a Date", value=Sys.Date())
))
server <- function(input,output,session){
COB <- reactive(as.Date(input$COBInput,format="%Y-%m-%d"))
COB2 <- paste(
"Test",as.character(
format(input$COBInput,format="%Y-%m-%d",'%Y')
)
)}
shinyApp(ui,server)
The error I got:
Listening on http://127.0.0.1:4973
Warning: Error in .getReactiveEnvironment()$currentContext: Operation not allowed
without an active reactive context. (You tried to do something that can only be
done from inside a reactive expression or observer.)
54: stop
53: .getReactiveEnvironment()$currentContext
52: .subset2(x, "impl")$get
51: $.reactivevalues
47: server [N:/AdHoc Query/R/FFVA/DateInputTest/ShinyApp.R#42]
Error in .getReactiveEnvironment()$currentContext() :
Operation not allowed without an active reactive context. (You tried to do something that can only be done from inside a reactive expression or observer.)
I would expect to be able to save a file each day with a name like "Daily20181224", "Daily20181221", etc.
I wasn't entirely clear about the requirements, but I tried using textOutput, which should give you an idea of how to generate the filename.
library(shiny)
library(shinyFiles)
ui <- fluidPage(
sidebarPanel(
dateInput("COBInput", "Select a Date", value=Sys.Date()),
textOutput("filename")
))
server <- function(input,output,session){
output$filename <- renderText({
  # input$COBInput is a Date; "%Y%m%d" keeps leading zeros (e.g. Daily20180105)
  paste0("Daily", format(input$COBInput, "%Y%m%d"))
})
}
shinyApp(ui,server)
I think you can generate the filenames now.
One thing I'd like to point out about shinyFiles: as you are probably aware, it can only be used for server-side file browsing after deployment.
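When building a date-stamped filename, formatting the whole date in one call keeps the zero padding, whereas converting the month and day to numbers first would turn 5 January 2018 into Daily201815. The small standalone sketch below shows the difference. It is in Python rather than R, purely as an illustration, and both function names are hypothetical.

```python
from datetime import date


def daily_filename(day, prefix="Daily"):
    """Return a name such as Daily20181224 for the given date."""
    # %Y%m%d always yields eight digits, so names sort chronologically.
    return prefix + day.strftime("%Y%m%d")


def unpadded_filename(day, prefix="Daily"):
    """Counter-example: numeric parts lose their leading zeros."""
    return "{}{}{}{}".format(prefix, day.year, day.month, day.day)


print(daily_filename(date(2018, 1, 5)))     # Daily20180105
print(unpadded_filename(date(2018, 1, 5)))  # Daily201815
```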

How to suppress printing of variable values in zeppelin

Given the following snippet:
val data = sc.parallelize(0 until 10000)
val local = data.collect
println(local.size)
Zeppelin prints out the entire value of local to the notebook screen. How may that behavior be changed?
You can also try adding curly brackets around your code.
{
  val data = sc.parallelize(0 until 10000)
  val local = data.collect
  println(local.size)
}
Since 0.6.0, Zeppelin provides a boolean flag zeppelin.spark.printREPLOutput in spark's interpreter configuration (accessible via the GUI), which is set to true by default.
If you set its value to false then you get the desired behaviour that only explicit print statements are output.
See also: https://issues.apache.org/jira/browse/ZEPPELIN-688
What I do to avoid this is define a top-level function, and then call it:
def run(): Unit = {
  val data = sc.parallelize(0 until 10000)
  val local = data.collect
  println(local.size)
}
run()
FWIW, this appears to be new behaviour.
Until recently we were using Livy 0.4, which only output the content of the final statement (rather than echoing the output of the whole script).
When we upgraded to Livy 0.5, the behaviour changed to output the entire script.
While splitting the paragraph and hiding the output does work, it adds unnecessary overhead to using Zeppelin.
For example, if you need to refresh your output, you have to remember to run two paragraphs (i.e. the one that sets up your output and the one containing the actual println).
There are, IMHO, other usability issues with this approach that make Zeppelin less intuitive to use.
Someone has logged this JIRA ticket to address "the problem"; please vote for it:
LIVY-507
Zeppelin, like the spark-shell REPL, always prints the whole interpreter output.
If you really want only the local.size value printed, the best way is to put the println(local.size) statement in a separate paragraph.
Then you can hide all output of the previous paragraph using the small "book" icon at the top right.
A simple trick I use is to define
def !() ="_ __ ___ ___________________________________________________"
and use it as
$bang
above or close to the code I want to check, and it works:
res544: String = _ __ ___ ___________________________________________________
Then I just leave it there, commented out ;)
// hope it helps

How do I Benchmark RESTful Service with Variable Parameters?

I'm currently benchmarking a RESTful service I've made, and part of that is making sure it runs in a reasonable amount of time for a wide range of parameters. For example, let's say I have a RESTful API of the form some_site.com/item?item_id=y. To be sure my service is as fast as I'd like, I want to try many values for y one by one, preferably read from a text file. I can't figure out any way of doing this in ab or httperf. I'm open to using a different benchmarking program if I have to, but would prefer something simple and light. What I want to do seems pretty standard, so I'm guessing there must already be a program that lets me do it, but an hour or so of googling hasn't gotten me an answer. Ideas?
Answer: JMeter (which is apparently awesome). This FAQ explains how to do it. Hopefully this helps someone else, as it took me about a day of searching to figure this out.
I have just had a good experience using JavaScript (via BSF/Rhino) in JMeter.
I put one thread group in my test plan and added a 'Simple Controller' with two elements under it: an 'HTTP Request' sampler and a 'BSF PreProcessor'.
Set the BSF language to 'javascript' and either type the code into the text box or point it to a file (use a full path or one relative to the CWD of the JMeter process).
/* Math.random() returns a float, so use java.util.Random for integers.
 * see: http://docs.oracle.com/javase/7/docs/api/java/util/Random.html */
var rng = new Packages.java.util.Random();
var min = 2;
var max = 10;
// nextInt(n) requires n > 0 and returns 0..n-1, so maxLines is in [min, max - 1]
var maxLines = min + rng.nextInt(max - min);
var s = '';
for (var d = 0; d <= maxLines; d++) {
  s += d.toString() + ',' + rng.nextInt(1000).toString() + '\n';
}
// e.g. s => '0,312\n1,104\n2,608\n'
vars.put('PAYLOAD', s);
Now I can refer to ${PAYLOAD} in the HTTP request!
You can generate JSON, but you will need to upgrade jakarta-jmeter-2.5.1/lib/js-1.6R5.jar with the newest version of Rhino to get JSON.stringify and JSON.parse. That worked perfectly for me also, though I thought I'd put a simple example here.
You can use BSF pre-processor for URL params as well, just set another variable with vars.put('X', 'some value') and pass it as ${X} in the request parameter.
This blog post helped quite a bit, by the way.
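For a quick check without a full JMeter test plan, the approach described in the question (one request per parameter value read from a text file) can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for a real load tester: it is single-threaded, the helper names are hypothetical, and the URL pattern is the `some_site.com/item?item_id=y` placeholder from the question.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://some_site.com/item"  # placeholder host from the question


def read_ids(path):
    """Read one parameter value per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_url(item_id, base_url=BASE_URL):
    """Return the request URL for a single item_id value."""
    return base_url + "?" + urlencode({"item_id": item_id})


def fetch(url):
    """Download the response body; swap in any HTTP client here."""
    with urlopen(url) as response:
        return response.read()


def benchmark(item_ids, fetch=fetch):
    """Time one request per id and return a list of (item_id, seconds)."""
    timings = []
    for item_id in item_ids:
        start = time.perf_counter()
        fetch(build_url(item_id))
        timings.append((item_id, time.perf_counter() - start))
    return timings
```

Calling `benchmark(read_ids("item_ids.txt"))`, where `item_ids.txt` is a hypothetical file with one id per line, returns the per-request timings, from which the slowest ids can be picked out with `max(timings, key=lambda pair: pair[1])`.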