How can I read consecutive structures from a file when they have different fields, and create the appropriate fields (title: value) for each of them? I am a beginner. I think it involves dynamically adding new fields while reading the i-th structure and dynamically removing the fields of structure i-1 that remain empty after reading structure i. But how can I do this without knowing the names of all the fields in advance? I couldn't find an example of this in the documentation or on the forum.
Thanks!
If some fields appear in every object, put them in a common structure that your array holds instances of. For the variable fields, add a field called "variable" (or similar) to the main structure, and then dynamically assign field names and values within that sub-structure. So for example, your structure might be:
a.name = 'Name1';
a.value = 'Value1';
a.variable.price = 50;
b.name = 'Name2';
b.value = 'Value2';
b.variable.year = 1996;
data(1) = a; data(2) = b;
where every object has the fields "name" and "value", object a has a price field but no year field, and object b has a year field but no price field.
This will work for the kind of data you want to read in.
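The same layout can be sketched outside MATLAB as well. Below is a minimal Python illustration, assuming each record has already been parsed into a dict and that "name" and "value" are the fields common to every record; `to_record` and `COMMON_FIELDS` are names made up for this sketch.

```python
# Assumption: these two fields are present in every record.
COMMON_FIELDS = ("name", "value")


def to_record(raw):
    """Split one parsed record (a dict) into common and variable fields."""
    record = {field: raw.get(field) for field in COMMON_FIELDS}
    # The variable field names are not known in advance:
    # they are taken from whatever keys the record itself contains.
    record["variable"] = {
        key: val for key, val in raw.items() if key not in COMMON_FIELDS
    }
    return record


raw_records = [
    {"name": "Name1", "value": "Value1", "price": 50},
    {"name": "Name2", "value": "Value2", "year": 1996},
]
data = [to_record(raw) for raw in raw_records]

print(data[0]["variable"])  # {'price': 50}
print(data[1]["variable"])  # {'year': 1996}
```

Because every element has the same top-level fields, the collection stays uniform, and nothing has to be removed from the previous record when the next one is read.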
I have top-level data, 'whatever', comprising many (thousands of) records. Each record has data of the form
structA
structB
...
structN,
varA
varB
...
varN
Each structure may also contain other structures and variables, e.g. structAA, structAB... varAA, varAB... etc.
I would like to query these data and return the records meeting given criteria, for example
SELECT structA.structBC.structNN FROM whatever WHERE structB.structBN.varNN = "value"
Which would return structNN from each record where varNN equals "value"
Databricks will allow me to do something like
SELECT structA FROM whatever as a WHERE a.varN = "value"
but I can't seem to find out if or how one uses more levels between SELECT and FROM or after WHERE. I don't particularly want the variable I select on to be in the structure I return either, so that further complicates things. I suspect the data is accessed as a table, which may be part of the problem, i.e. I imagine the records as objects, but they are columns and rows in some sense. If I can't do this with an SQL query, what can I use in Databricks to achieve the desired result?
It's pretty simple.
First, let's create some data with a nested structure, for example:
data = [{
    'level1a': {
        'level2a': {
            'a': 1,
            'b': 2
        }
    },
    'level1b': {
        'level2b': 'some text'
    }
}]
df = spark.createDataFrame(data)
df.createOrReplaceTempView("data_view")
Now, we can select any element of the structure (or the whole or part of the structure):
%sql
select level1a, level1a.level2a, level1a.level2a.a, level1a.level2a.b, level1b, level1b.level2b from data_view;
Here is the result:
EDIT:
The query with condition:
%sql
select level1a, level1a.level2a, level1a.level2a.a, level1a.level2a.b, level1b, level1b.level2b
from data_view
where level1a.level2a.a = 1;
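To make it explicit that the field used in the WHERE clause does not have to be part of what is returned, here is a small plain-Python model of what such a query does. It is only an illustration of the semantics, not Spark code; `get_path` and `select_where` are hypothetical helpers, and the second record is made-up sample data.

```python
def get_path(record, path):
    """Follow a dotted path such as 'level1a.level2a.a' through nested dicts.

    Returns None if any level along the path is missing.
    """
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select_where(records, select_path, where_path, expected):
    """Model of: SELECT <select_path> FROM records WHERE <where_path> = expected."""
    return [
        get_path(rec, select_path)
        for rec in records
        if get_path(rec, where_path) == expected
    ]


records = [
    {"level1a": {"level2a": {"a": 1, "b": 2}}, "level1b": {"level2b": "some text"}},
    {"level1a": {"level2a": {"a": 7, "b": 8}}, "level1b": {"level2b": "other text"}},
]

# Filter on level1a.level2a.a, but return a value from a different branch.
result = select_where(records, "level1b.level2b", "level1a.level2a.a", 1)
print(result)  # ['some text']
```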
This seems like it would be straightforward to do, but I just cannot figure it out. I have a query that returns an ARRAY of strings in one of the columns. I want that array to contain only unique strings. Here is my query:
SELECT
f."_id",
ARRAY[public.getdomain(f."linkUrl"), public.getdomain(f."sourceUrl")] AS file_domains,
public.getuniqdomains(s."originUrls", s."testUrls") AS source_domains
FROM
files f
LEFT JOIN
sources s
ON
s."_id" = f."sourceId"
Here's an example of a row from my result table:
_id      | file_domains                              | source_domains
2574873  | {cityofmontclair.org,cityofmontclair.org} | {cityofmontclair.org}
I need file_domains to contain only unique values, i.e. a 'set' instead of a 'list'. Like this:
_id      | file_domains          | source_domains
2574873  | {cityofmontclair.org} | {cityofmontclair.org}
Use a CASE expression:
CASE WHEN public.getdomain(f."linkUrl") = public.getdomain(f."sourceUrl")
THEN ARRAY[public.getdomain(f."linkUrl")]
ELSE ARRAY[public.getdomain(f."linkUrl"), public.getdomain(f."sourceUrl")]
END
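The CASE expression covers the two-element case directly. As an illustration of the logic only (Python, not a Postgres function), the same rule can be written as an order-preserving de-duplication, which also extends to arrays with more than two elements; `unique_in_order` is a name made up for this sketch.

```python
def unique_in_order(values):
    """Drop repeated values while keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


file_domains = ["cityofmontclair.org", "cityofmontclair.org"]
print(unique_in_order(file_domains))  # ['cityofmontclair.org']
```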
I am currently getting a list of values from a related field like so:
List ( join_table::id_b )
What I would like to do is filter that list by a second field in the same related table. Pseudocode:
List ( join_table::id_b ; join_table::other = "foo" )
I'm not really sure how to filter it down.
The List() function will return a list of (non-empty) values from all related records.
To get a list filtered by a second field, you could do any one of the following:
Define a calculation field in the join table = If ( other = "foo" ; id_b ) and use this field in your List() function call instead of the id_b field;
Construct a relationship filtered by the other field;
Use the ExecuteSQL() function instead of List();
Write your own recursive custom function (requires the Advanced version to install).
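The first option can be pictured as follows. This is a Python model of the idea rather than FileMaker code: each related record gets a calculated value that is empty unless other = "foo", and the list then keeps only the non-empty values. The function name and the sample records are assumptions made for the sketch.

```python
def filtered_list(related_records, value_field, filter_field, wanted):
    """Model of List() over a calculation field If ( other = wanted ; id_b )."""
    values = []
    for rec in related_records:
        # The calculation field: the value when the filter matches, else empty.
        calc = rec[value_field] if rec.get(filter_field) == wanted else ""
        # List() only returns non-empty values, so empty results are skipped.
        if calc != "":
            values.append(calc)
    return values


join_table = [
    {"id_b": 11, "other": "foo"},
    {"id_b": 12, "other": "bar"},
    {"id_b": 13, "other": "foo"},
]
print(filtered_list(join_table, "id_b", "other", "foo"))  # [11, 13]
```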
I tried looking for an example of using Arel::UpdateManager to form an update statement with a FROM clause (as in UPDATE t SET t.itty = "b" FROM .... WHERE ...), but couldn't find any. As far as I can tell, Arel::UpdateManager sets the main engine on initialization and lets you set the various fields and values to update. Is there actually a way to do this?
As an aside, I would also like to find out how to express Postgres POSIX regex matching in Arel, but this might be impossible for now.
As far as I can see, the current version of the arel gem does not support the FROM keyword in an UPDATE query. You can generate a query using the SET and WHERE keywords only, like:
UPDATE t SET t.itty = "b" WHERE ...
The code that copies the value of field2 into field1 for the units table looks like this:
relation = Unit.all
um = Arel::UpdateManager.new(relation.engine)
um.table(relation.table)
um.ast.wheres = relation.wheres.to_a
um.set(Arel::Nodes::SqlLiteral.new('field1 = "field2"'))
ActiveRecord::Base.connection.execute(um.to_sql)
More precisely, you can use a helper method to update a relation. It creates an Arel UpdateManager and assigns it the table, the where clauses, and the values to set; the values are passed to the method as an argument. The FROM keyword then has to be added to the generated SQL, but only if the relation references tables other than the one named in the UPDATE clause itself. Finally, the query is executed. So we get:
def update_relation!(relation, values)
  um = Arel::UpdateManager.new(relation.engine)
  um.table(relation.table)
  um.ast.wheres = relation.wheres.to_a
  um.set(values)
  sql = um.to_sql
  # append a FROM clause to the query if needed
  m = sql.match(/WHERE/)
  tables = relation.arel.source.to_a.select {|v| v.class == Arel::Table }.map(&:name).uniq
  # drop the first table: it is the one already named in the UPDATE clause
  tables.shift
  sql.insert(m.begin(0), "FROM #{tables.join(",")} ") if m && !tables.empty?
  # execute the query
  ActiveRecord::Base.connection.execute(sql)
end
Then you can issue the relation update as:
values = Arel::Nodes::SqlLiteral.new('field1 = "field2", field2 = NULL')
relation = Unit.not_rejected.where(Unit.arel_table[:field2].not_eq(nil))
update_relation!(relation, values)
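The string manipulation at the heart of that helper is easy to check in isolation. Here is a minimal Python sketch of the same step, assuming the SQL is available as a plain string; `add_from_clause`, the "owners" table and the "state" column are hypothetical names used only for illustration.

```python
import re


def add_from_clause(sql, tables):
    """Insert 'FROM t1,t2 ' in front of the first WHERE, if there is one."""
    match = re.search(r"WHERE", sql)
    if match is None or not tables:
        # No WHERE keyword or no extra tables: leave the statement untouched.
        return sql
    start = match.start()
    return sql[:start] + "FROM " + ",".join(tables) + " " + sql[start:]


sql = 'UPDATE "units" SET field1 = "field2" WHERE "units"."state" = 1'
print(add_from_clause(sql, ["owners"]))
# UPDATE "units" SET field1 = "field2" FROM owners WHERE "units"."state" = 1
```

Note that this matches the first occurrence of WHERE anywhere in the text, so a statement containing WHERE inside a subquery or a string literal before the main clause would need more careful handling.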
I am using the mongo shell and want to do what is basically equivalent to SQL's "SELECT col INTO var", and then use the value of var to look up other rows in the same table or in others (joins). For example, in PL/SQL I would declare a variable called V_Dno. I also have a table called Emp(EID, Name, Sal, Dno). I can get the value of Dno for employee 100 with "Select Dno into V_Dno from Emp where EID = 100". In MongoDB, when I find the needed employee (using its _id), I end up with a document and not a value (a field). In a sense, I get the equivalent of the entire row in SQL and not just a column. I am doing the following to find the given emp:
var V_Dno = db.emp.find({Eid : 100}, {Dno : 1});
The reason I want to do this is to traverse from one document to another using the value of a field. I know I can do it using DBRef, but I wanted to see if I could tie documents together using this method.
Can someone please shed some light on this?
Thanks.
find returns a cursor that lets you iterate over the matching documents. In this case you'd want to use findOne instead as it directly returns the first matching doc, and then use dot notation to access the single field.
var V_Dno = db.emp.findOne({Eid : 100}, {Dno : 1}).Dno;
Using your query as a starting point:
var vdno = db.emp.findOne({Eid: 100, Dno :1})
This returns a document from the emp collection where Eid = 100 and Dno = 1. Now that I have this document in the vdno variable, I can "join" it to another collection. Let's say you have a department collection, and a document in the department collection has a manual reference to the _id field in the emp collection. You can use the following to filter results from the department collection based on the value in your variable.
db.department.find({"employee._id":vdno._id})
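The overall pattern (fetch one document, pull out a single field, use it in a second lookup) can be sketched without a database. The Python below uses in-memory lists as stand-ins for collections; `find_one` is a hypothetical helper that only mimics exact-match queries, and the documents are made-up sample data.

```python
def find_one(collection, query):
    """Return the first document whose fields equal every key/value in query."""
    for doc in collection:
        if all(doc.get(key) == val for key, val in query.items()):
            return doc
    return None


# Made-up sample data standing in for the emp and department collections.
emp = [
    {"_id": 1, "Eid": 100, "Name": "Ann", "Dno": 10},
    {"_id": 2, "Eid": 101, "Name": "Bob", "Dno": 20},
]
department = [
    {"Dno": 10, "Dname": "Sales"},
    {"Dno": 20, "Dname": "Research"},
]

# Equivalent of "SELECT Dno INTO V_Dno": take one field from the found document.
v_dno = find_one(emp, {"Eid": 100})["Dno"]

# Use the extracted value to look up a document in the other collection.
print(find_one(department, {"Dno": v_dno}))  # {'Dno': 10, 'Dname': 'Sales'}
```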