I am using the JSON data file "order_items", and the data looks like this:
{"order_item_id":1,"order_item_order_id":1,"order_item_product_id":957,"order_item_quantity":1,"order_item_subtotal":299.98,"order_item_product_price":299.98}
{"order_item_id":2,"order_item_order_id":2,"order_item_product_id":1073,"order_item_quantity":1,"order_item_subtotal":199.99,"order_item_product_price":199.99}
{"order_item_id":3,"order_item_order_id":2,"order_item_product_id":502,"order_item_quantity":5,"order_item_subtotal":250.0,"order_item_product_price":50.0}
{"order_item_id":4,"order_item_order_id":2,"order_item_product_id":403,"order_item_quantity":1,"order_item_subtotal":129.99,"order_item_product_price":129.99}
orders = spark.read.json("/user/data/retail_db_json/order_items")
I am getting an error when I run the following command:
orders.where("order_item_order_id in (2,4,5,6,7,8,9,10)").groupby("order_item_order_id").agg(sum("order_item_subtotal"),count()).orderBy("order_item_order_id").show()
TypeError: unsupported operand type(s) for +: 'int' and 'str'
I am not sure why I am getting this error. All column values are strings. Any suggestions?
Cast the column to a numeric type (for example double, since the subtotals have decimal places; int would truncate them). Aggregation methods can't be applied to string types.
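Another likely cause worth checking: this TypeError is a plain Python error, not a Spark one. If `sum` was never imported from `pyspark.sql.functions`, then `sum("order_item_subtotal")` resolves to Python's built-in `sum`, which iterates over the characters of the column name and fails on `0 + 'o'`. The sketch below reproduces the message with plain Python only (no Spark). If that is the cause, using the Spark functions explicitly, e.g. `from pyspark.sql import functions as F` and `.agg(F.sum("order_item_subtotal"), F.count("*"))`, should avoid it; note that Spark's `count` also needs a column argument, unlike the bare `count()` in the question.

```python
# Reproduces the question's TypeError with plain Python; no Spark is involved.
# Assumption: `sum` was never imported from pyspark.sql.functions,
# so the name still refers to Python's built-in sum().

def builtin_sum_on_column_name():
    try:
        # Built-in sum() starts at 0 and iterates the string's characters,
        # so the first step is 0 + 'o', which raises TypeError.
        sum("order_item_subtotal")
    except TypeError as exc:
        return str(exc)
    return None

message = builtin_sum_on_column_name()
print(message)
```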
I'm using Python 3.8 and psycopg2.
I'm trying to insert a record into the database.
I have a function that builds the query and returns a list with two values: the query string and the values.
I ran a test with a hard-coded value identical to query[1], the second element of the result list, and it worked without error. But when I pass query[1] as the values instead of the literal value itself, I get this error:
TypeError: not all arguments converted during string formatting
In my log I have these values for the query list returned by my query-building function:
['INSERT INTO country (code, name, flag, update_time) VALUES(%s,%s,%s,%s)', "('US', 'USA', 'https://example.com/flags/us.svg', 1596551810)"]
query[0]
INSERT INTO country (code, name, flag, update_time) VALUES(%s,%s,%s,%s)
query[1]
('US', 'USA', 'https://example.com/flags/us.svg', 1596551810)
This is the code snippet:
cursor = connection.cursor()
query_insert = query[0]
query_values = tuple(query[1])
cursor.execute(query_insert,(query_values))
I tried converting it to a tuple and wrapping it in parentheses, but the error persists.
If I put the value of query[1] directly in my code as the values, it works, so I suppose the error is in the values argument of cursor.execute.
Any help is welcome!
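A likely explanation, judging from the log: query[1] is printed inside double quotes, so it is a string that only looks like a tuple. Calling tuple() on a string splits it into single characters, so psycopg2 receives dozens of arguments for only four %s placeholders, which matches "not all arguments converted during string formatting". The sketch below shows this with the logged value. The cleaner fix is probably to make the query-building function return the values as a real tuple; if the string form cannot be changed, ast.literal_eval from the standard library can parse it back safely. The cursor call at the end is hypothetical usage and is commented out.

```python
import ast

# query[1] exactly as it appears in the log: a *string*, not a tuple.
values_text = "('US', 'USA', 'https://example.com/flags/us.svg', 1596551810)"

# tuple() on a string yields one element per character, far more than
# the four %s placeholders in the INSERT statement.
char_tuple = tuple(values_text)

# ast.literal_eval parses the Python literal back into a real 4-item tuple.
values = ast.literal_eval(values_text)

print(len(char_tuple), len(values))
# cursor.execute(query_insert, values)  # hypothetical usage with the asker's cursor
```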
I would like to get the result of a query using rowMode="array" (as this is a potentially very large table and I don't want it formatted into objects), but I couldn't figure out how to pass in an array/list parameter for use with an IN operator.
const events = await t.manyOrNone({text: `select * from svc.events where user_id in ($1:list);`, rowMode: "array"}, [[1,2]]);
However, the above gives an error: syntax error at or near ":"
Removing the :list did not work either:
const events = await t.manyOrNone({text: `select * from svc.events where user_id in ($1);`, rowMode: "array"}, [[1,2]]);
Error: invalid input syntax for integer: "{"1","2"}"
I understand that this might be because I'm forced to use the ParameterizedQuery format for rowMode="array", which does not allow those snazzy modifiers like :list. But this leads to the question: if I use the ParameterizedQuery format, how do I natively pass in a JavaScript array so that it is acceptable to the driver?
I guess an alternative formulation of this question is: how do I use arrays as parameters for ParameterizedQuery or PreparedStatements?
Answering my own question, since I eventually found a solution to this issue: how to pass arrays as parameters for use with the IN operator when using rowMode="array" | ParameterizedQuery | PreparedStatements.
Because this query is parameterized on the server, we cannot use the IN operator: IN needs one parameter per item, as in IN ($1, $2, $3...). Instead we need to use the ANY operator, as in = ANY($1), where $1 is expected to be an array.
So the query that will work is:
const events = await t.manyOrNone({text: `select * from svc.events where user_id=ANY($1);`, rowMode: "array"}, [[1,2]]);
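To make the reasoning concrete, here is a small Python sketch of the two query shapes (Python only to keep one language for the examples here; `in_clause` and `any_clause` are hypothetical helpers, not part of pg-promise). With IN, the number of server-side placeholders depends on the list length, so a single array parameter cannot fill them; with = ANY($1), one array parameter covers a list of any length, which matches the `[[1,2]]` value array in the query above.

```python
def in_clause(column, values):
    # IN needs one server-side placeholder per element: $1, $2, ...
    placeholders = ", ".join(f"${i}" for i in range(1, len(values) + 1))
    return f"{column} IN ({placeholders})", list(values)

def any_clause(column, values):
    # ANY takes a single array parameter, whatever the list length.
    return f"{column} = ANY($1)", [list(values)]

print(in_clause("user_id", [1, 2]))
print(any_clause("user_id", [1, 2]))
```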
I'm trying to apply POS tagging to one of my tokenized columns, called "removed", in a PySpark DataFrame.
I'm trying:
nltk.pos_tag(df_removed.select("removed"))
But all I get is this error: ValueError: Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column.
How can I make it?
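It seems the answer is in the error message: pos_tag expects a list of string tokens, but you are passing it a column. You need to apply pos_tag to each row of the column; to use it with withColumn, wrap it in a UDF first.
For example (assuming nltk is already imported and "removed" holds arrays of strings):
import pyspark.sql.functions as F, pyspark.sql.types as T
my_new_df = df_removed.withColumn("removed_pos", F.udf(nltk.pos_tag, T.ArrayType(T.ArrayType(T.StringType())))("removed"))
You can also do:
my_new_df = df_removed.select("removed").rdd.map(lambda row: (nltk.pos_tag(row.removed),)).toDF(["removed_pos"])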
I get this error when querying a json column:
(psycopg2.ProgrammingError) operator does not exist: json = text
The column is defined as JSON with SQLAlchemy:
json_data = db.Column(db.JSON, nullable=False)
How do I compare json values in Postgres?
There is no equality (or inequality) operator for the data type json. If you need to test the value as a whole, you might cast to jsonb:
... WHERE json_data::jsonb = jsonb '{}';
Or cast to text for simple cases:
... WHERE json_data::text = '{}';
But there are many valid text representations of the same json value, which is the reason Postgres does not implement equality / inequality operators for the type.
See:
How to query a json column for empty objects?
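As a rough analogy in Python (an illustration, not Postgres code): comparing the raw texts behaves like the `::text` cast, while comparing parsed values behaves more like jsonb equality, which ignores whitespace and key order. The two strings below encode the same JSON value but are different texts.

```python
import json

# Two different texts that encode the same JSON value.
a_text = '{"a": 1, "b": [1, 2]}'
b_text = '{ "b":[1,2],"a":1 }'

same_text = a_text == b_text                          # like json_data::text = '...'
same_value = json.loads(a_text) == json.loads(b_text)  # closer to jsonb equality

print(same_text, same_value)
```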
I'm using .Q.f to format integer column fields as floats with 4 digits of precision:
fmt_price:{[val] .Q.f[4;](val*0.0001)}
select fmt_price[price] from mytable
The fmt_price works well at the q prompt, but if I embed the function in a query I get this error:
An error occurred during execution of the query. The server sent the response: `type
The fmt_price call works if I return a float or integer variable, rather than the result of Q.f.
You need to use each over the list. Currently you are passing a list of values to .Q.f, which expects an atom. Something like the following is what you need:
fmt_price:{[val] .Q.f[4;] each (val*0.0001)}
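The atom-versus-list distinction can be illustrated with a Python analogue (hypothetical helper names; this is not q). One function formats a single number, the way .Q.f[4;] formats an atom, and a second applies it to every element of a column, the way each does. Inside a select, price is the whole column (a list), which is why the per-element step is needed.

```python
def fmt_price_atom(val):
    # Analogue of .Q.f[4;]: formats ONE number as a string with 4 decimals.
    return f"{val * 0.0001:.4f}"

def fmt_price(vals):
    # Analogue of `.Q.f[4;] each`: apply the atom formatter to every element.
    return [fmt_price_atom(v) for v in vals]

print(fmt_price([12345, 20000, 7]))
```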