Convert cursor into pyspark dataframe - pyspark

I have been using the code below to connect to our organisation's cluster and query the data. Is it possible to convert the cursor output directly into a PySpark DataFrame?
import jaydebeapi
# as_pandas is assumed to come from the impyla package (impala.util)
from impala.util import as_pandas

jarFile = "/dataflame/nas_nfs/tmp/lib/olympus-jdbc-driver.jar"
url = "olympus:jdbcdriver"
env = 'uat'
print("Using environment", env.upper())
className = "net.vk.olympus.jdbc.driver.OlympusDriver"
# userid and pwd are assumed to be defined elsewhere
conn = jaydebeapi.connect(className, url,
                          {'username': userid, 'password': pwd, 'ENV': env,
                           'datasource': 'HIVE', 'EnableLog': 'false'},
                          jarFile)
cursor = conn.cursor()
query = "select * from abc.defcs123 limit 5"
cursor.execute(query)
pandas_df = as_pandas(cursor)
print(pandas_df)
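One way to get a Spark DataFrame out of this, sketched below on the assumption that an active SparkSession named spark is available, is to fetch the rows through the cursor and pass them to spark.createDataFrame together with the column names taken from cursor.description:
# Minimal sketch: build a PySpark DataFrame from the JDBC cursor output.
# Assumes `spark` is an existing SparkSession and `cursor` is the cursor above.
cursor.execute(query)  # re-run the query so the cursor has fresh results
columns = [desc[0] for desc in cursor.description]
rows = [tuple(r) for r in cursor.fetchall()]
spark_df = spark.createDataFrame(rows, schema=columns)
spark_df.show()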


Add quotation to each element of a list

I need to use a variable that I've created earlier in Spark to select data from a Teradata table:
%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
val df = sqlContext.sql(query)
val dfv = df.select("cod_contrato")
The variable is a string.
So I would like to query the database using that vector of strings.
If I use:
%spark
val sql = s"(SELECT * FROM xx2.CONTRATOS where cod_contrato in '$dfv') as query"
I get:
(SELECT * FROM xx2.CONTRATOS where cod_contrato in '[cod_contrato: string]') as query
The desired result would be:
SELECT * FROM xx2.CONTRATOS where cod_contrato in ('11111', '11112' )
How can I transform the vector into a list enclosed by parentheses, with quotation marks around each element?
Thanks.
Here is my attempt. Starting from some dataframe,
val test = df.select("id").as[String].collect
> test: Array[String] = Array(6597, 8011, 2597, 5022, 5022, 6852, 6852, 5611, 14838, 14838, 2588, 2588)
so test is now an array. Thus, by using mkString,
val sql = "SELECT * FROM xx2.CONTRATOS where cod_contrato in " + test.mkString("('", "','", "')") + " as query"
> sql: String = SELECT * FROM xx2.CONTRATOS where cod_contrato in ('6597','8011','2597','5022','5022','6852','6852','5611','14838','14838','2588','2588') as query
where the final result is now a string.
Make a temp view of the values you want to filter on and then reference it in the query
%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
sqlContext.sql(query).selectExpr("cast(cod_contrato as string)").createOrReplaceTempView("dfv_table")
val sql = "(SELECT * FROM xx2.CONTRATOS where cod_contrato in (select * from dfv_table)) as query"
This will work when the query is run in Spark SQL, but it will not return a query string. Lamanus's answer should be sufficient if all you want is the query as a string.

pymongo to RMongo, mongodb sort query

In pymongo I'm doing a sort query like this:
import time
from pymongo import MongoClient

client = MongoClient()
dbase = client[dbname]
collection = dbase[symbol]
start = time.time()
# sort expects a list of (key, direction) pairs
cursor = collection.find().sort([('_id', -1)]).limit(6000)
data = list(cursor)
Trying to do the same thing in R now...
library("RMongo")
mongo <- mongoDbConnect("dbname", "localhost", 27017)
query = '{sort({_id: -1})}'
output <- dbGetQuery(mongo, "symbol", query, skip=0, limit=6000)
> output
data frame with 0 columns and 0 rows
what is the proper JSON query string format here?
I figured it out with mongolite...
library('mongolite')
con <- mongo("collection_name", url = "mongodb://localhost:27017/dbname")
output <- con$find('{}', sort='{"_id":-1}', limit=6000)

passing numeric where parameter condition in postgres using python

I am trying to use PostgreSQL from Python. The query filters on a numeric field in the where condition. The result set is not being fetched and I get the error "psycopg2.ProgrammingError: no results to fetch". There are records in the database with agent_id (an integer field) > 1.
import psycopg2

# Try to connect
try:
    conn = psycopg2.connect("dbname='postgres' host='localhost'")
except:
    print "Error connecting to the database."
cur = conn.cursor()
agentid = 10000
try:
    sql = 'SELECT * from agent where agent_id > %s::integer;'
    data = agentid
    cur.execute(sql, data)
except:
    print "Select error"
rows = cur.fetchall()
print "\nRows: \n"
for row in rows:
    print " ", row[9]
Perhaps try these things in your code:
conn=psycopg2.connect("dbname=postgres host=localhost user=user_here password=password_here port=port_num_here")
sql = 'SELECT * from agent where agent_id > %s;'
data = (agentid,) # A single element tuple.
then use
cur.execute(sql,data)
Also, I am confused about what you want to do with this code:
for row in rows:
    print " ", row[9]
Do you want to print each row in rows, or just the element at index 9 of rows, from
rows = cur.fetchall()
If you wanted that index, you could
print rows[9]
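Putting these suggestions together, a minimal sketch (reusing the agent table and the placeholder credentials from the answer above) might look like this:
import psycopg2

# Placeholder connection details from the answer above
conn = psycopg2.connect("dbname=postgres host=localhost user=user_here password=password_here")
cur = conn.cursor()

agentid = 10000
sql = 'SELECT * from agent where agent_id > %s;'
cur.execute(sql, (agentid,))  # parameters passed as a single-element tuple

rows = cur.fetchall()
for row in rows:
    print " ", row[9]  # print the value in the tenth column of each row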

How to execute multi line sql in spark sql

How can I execute lengthy, multi-line Hive queries in Spark SQL? For example, a query like the one below:
val sqlContext = new HiveContext (sc)
val result = sqlContext.sql ("
select ...
from ...
");
Use """ instead, so for example
val results = sqlContext.sql ("""
select ....
from ....
""");
or, if you want to format code, use:
val results = sqlContext.sql ("""
|select ....
|from ....
""".stripMargin);
You can use triple-quotes at the start/end of the SQL code or a backslash at the end of each line.
val results = sqlContext.sql ("""
create table enta.scd_fullfilled_entitlement as
select *
from my_table
""");
results = sqlContext.sql (" \
create table enta.scd_fullfilled_entitlement as \
select * \
from my_table \
")
val query = """(SELECT
a.AcctBranchName,
c.CustomerNum,
c.SourceCustomerId,
a.SourceAccountId,
a.AccountNum,
c.FullName,
c.LastName,
c.BirthDate,
a.Balance,
case when [RollOverStatus] = 'Y' then 'Yes' Else 'No' end as RollOverStatus
FROM
v_Account AS a left join v_Customer AS c
ON c.CustomerID = a.CustomerID AND c.Businessdate = a.Businessdate
WHERE
a.Category = 'Deposit' AND
c.Businessdate= '2018-11-28' AND
isnull(a.Classification,'N/A') IN ('Contractual Account','Non-Term Deposit','Term Deposit')
AND IsActive = 'Yes' ) tmp """
It is worth noting that the length is not the issue, only how the string is written. For this you can use """ as Gaweda suggested, or simply use a string variable, e.g. by building it with a StringBuilder. For example:
val selectElements = Seq("a", "b", "c")
val builder = StringBuilder.newBuilder
builder.append("select ")
builder.append(selectElements.mkString(","))
builder.append(" from my_table")
builder.append(" where d<10")
val results = sqlContext.sql(builder.toString())
In addition to the above ways, you can use the below-mentioned way as well:
val results = sqlContext.sql("select .... " +
" from .... " +
" where .... " +
" group by ....
");
Write your SQL inside triple quotes, like """ sql code """.
df = spark.sql(f""" select * from table1 """)
The triple-quote approach is the same for Scala Spark and PySpark (the f prefix above is Python's f-string syntax, used only on the PySpark side).
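For instance, a multi-line PySpark version (a sketch reusing the my_table name from the earlier examples; the column names are hypothetical) could be:
results = spark.sql("""
    select col_a, col_b
    from my_table
    where col_a > 10
""")
results.show()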

passing python variable to sql statement psycopg2 pandas

I am trying to replace a piece of SQL code with a Python variable that I will ask a user to supply via raw_input.
Below is the code I'm using, which works great if I set mypythonvariable manually, i.e. putting 344 directly into the SQL, but if I leave the SQL as is and pass mypythonvariable it doesn't work.
The whole SQL query is then converted into a pandas dataframe for further messing about with.
Any help would be appreciated.
UPDATE: I just added the %s placeholder into the statement and I'm now getting the error message: not all arguments converted during string formatting
import pandas as pd
import psycopg2 as pg  # pg is assumed to be psycopg2

conn = pg.connect(host="localhost",
                  port=1234,
                  dbname="somename",
                  user="user",
                  password="pswd")
mypythonvariable = raw_input("What is your variable number? ")
sql = """
    SELECT
        somestuff
    FROM
        sometable
    WHERE
        something = %s
"""
df = pd.read_sql_query(sql, con=conn, params=mypythonvariable)
Thanks to all who looked.
I found the solution. It looks like the params need to be passed as a list.
conn = pg.connect(host="localhost",
                  port=1234,
                  dbname="somename",
                  user="user",
                  password="pswd")
mypythonvariable = raw_input("What is your variable number? ")
sql = """
    SELECT
        somestuff
    FROM
        sometable
    WHERE
        something = %s
"""
df = pd.read_sql_query(sql, con=conn, params=[mypythonvariable])
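For what it's worth, pandas also accepts a tuple here, and with more than one %s placeholder the values are simply passed in order. A small sketch with a hypothetical second condition and variable (other_value):
sql = """
    SELECT somestuff
    FROM sometable
    WHERE something = %s AND otherthing = %s
"""
df = pd.read_sql_query(sql, con=conn, params=[mypythonvariable, other_value])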