Using Spark 2.0, I want to process the Full MovieLens Dataset.
My DataFrame contains information about movies:
val moviesDF = spark.read.format("csv").option("delimiter",",")
.option("header","true").option("inferSchema", "true")
.load("/path/to/movies/")
How do I select the movies where the value of the "tagline" column contains the substring "comedy"?
You can filter it:
val comediesDF = moviesDF.filter("tagline like '%comedy%'")
Then you can show the content:
comediesDF.show(false)
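Note that like is case-sensitive, so taglines containing "Comedy" would be missed. As a sketch (assuming the column name from the question), a case-insensitive variant using the Column API could look like this:

import org.apache.spark.sql.functions.{col, lower}

// Lower-case the tagline before matching so "Comedy" and "comedy" both hit
val comediesDF = moviesDF.filter(lower(col("tagline")).contains("comedy"))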
I'm able to write pytest functions by manually supplying column names and values to create a DataFrame, then passing it to the production code in a Palantir Foundry code repository to check all the transformed field values.
Instead of manually passing column names and their respective values, I want to store all the required data in a dataset, import that dataset into the pytest function, fetch all the required values, and pass them to the production code to check the transformed field values.
Is there any way to accept a dataset as an input to the test function in a Palantir code repository?
You can probably do something like this:
Let's say you have your CSV inside a fixtures/ folder next to your test:
test_yourtest.py
fixtures/yourfilename.csv
You can just read it directly and pass it along to create a new DataFrame. I didn't test this code, but it should be something similar to this:
import os
from pathlib import Path

def load_file():
    # Read the raw contents of the CSV fixture that sits next to this test file
    filename = "yourfilename.csv"
    file_path = os.path.join(Path(__file__).parent, "fixtures", filename)
    return open(file_path).read()
Now that you can load your CSV, it's just a matter of reading it into a DataFrame and passing it into the pyspark logic you want to test. See: Get CSV to Spark dataframe
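A minimal sketch of how this could be wired into a test, assuming the test environment provides a Spark session as a pytest fixture (the fixture name spark_session and the function my_transform are placeholders, not confirmed Foundry APIs):

import os
from pathlib import Path

def test_transform(spark_session):
    # spark_session is assumed to be injected as a pytest fixture
    file_path = os.path.join(Path(__file__).parent, "fixtures", "yourfilename.csv")
    # Read the CSV fixture straight into a DataFrame
    input_df = spark_session.read.csv(file_path, header=True, inferSchema=True)
    # my_transform stands in for the production logic under test
    result_df = my_transform(input_df)
    assert result_df.count() > 0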
select col1, col2 from table1 where col1 > 10
What would the equivalent query be in MongoDB?
The query would look like this:
db.table1.find({col1:{$gt:10}},{col1:1,col2:1})
For more information, read the documentation for db.collection.find()
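Note that MongoDB includes the _id field in results by default, even when you project specific fields; to suppress it, add _id: 0 to the projection:

db.table1.find({col1: {$gt: 10}}, {col1: 1, col2: 1, _id: 0})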
As the title clearly indicates: what actually is a Schema in PostgreSQL, which I see at the top level of the hierarchy in pgAdmin (III)?
OK, I'm answering my own question just to help other people (who don't have time to read the docs or want a more simplified version):
You can think of a Schema as a namespace/package (just like in Java or C++). For example, let's assume mydb is the name of our database, and A and B are the names of two different schemas present in that same database (mydb).
Now, we can use the same table name in two different schemas in the same single database:
mydb -> A -> myTable
mydb -> B -> myTable
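In SQL this would look something like the following sketch (schema and table names are illustrative):

CREATE SCHEMA a;
CREATE SCHEMA b;

CREATE TABLE a.mytable (id integer);  -- mydb -> A -> myTable
CREATE TABLE b.mytable (id integer);  -- mydb -> B -> myTable

-- Qualify the table with its schema name to pick one:
SELECT * FROM a.mytable;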
Hope that clarifies things. For more detail: PostgreSQL 9.3.1 Documentation - 5.7. Schemas
"SQL0204N "FUSIONDBUNIT.ACQUIREDRUN" is an undefined name. SQLSTATE=42704
The table is actually AcquireRun and not ACQUIREDRUN
Following line throws the exception
pRecordSet->Open(CRecordset::dynaset, NULL,CRecordset::readOnly | CRecordset::skipDeletedRecords)
DB2 table names are not case-sensitive unless you define them with double-quotes around the name. For example, CREATE TABLE "MySchema"."MyTable" (...) will only work if you do:
SELECT *
FROM "MySchema"."MyTable"
It won't work even if you do SELECT * FROM MySchema.MyTable, because DB2 automatically folds identifiers to upper-case unless you quote them.
However, as noted by @sl0ppy, it looks like you might have a typo: AcquireRun vs. ACQUIREDRUN (no D).
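Assuming the table was created without quotes (so its name was folded to ACQUIRERUN in the catalog), either of these unquoted forms should resolve to the same table:

SELECT * FROM FUSIONDBUNIT.ACQUIRERUN
-- equivalent, since unquoted identifiers are folded to upper-case:
SELECT * FROM FusionDbUnit.AcquireRun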
What RDBMS command is used to tell what user has what permissions on a particular object?
That depends on the database system you use. In Oracle, you can find out a lot with:
select * from all_tab_privs;
Here's how to do it in SQL Server 2005:
select dp.NAME AS principal_name,
    dp.type_desc AS principal_type_desc,
    o.NAME AS object_name,
    p.permission_name,
    p.state_desc AS permission_state_desc
from sys.database_permissions p
left outer join sys.all_objects o
    on p.major_id = o.OBJECT_ID
inner join sys.database_principals dp
    on p.grantee_principal_id = dp.principal_id
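To narrow the output to a single object, you could append a filter such as the following (the table name is a placeholder):

where o.NAME = 'YourTableName'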