I want to get grouped, aggregated data, but I'm running into a problem when aggregating the same column with multiple functions.
Basically, I want to know if there is a way to rename an aggregated column so that it doesn't get overwritten.
Here is my code:
df = Daru::DataFrame.from_activerecord(active_record,
*%i[jobs.id jobs.demand_created_at jobs.quality_rating jobs.service_rating jobs.value_rating SC.name D.czso_region_id])
df.vectors = Daru::Index.new(%i[job_id demand_created_at quality_rating service_rating value_rating specific_category_name region_id])
# computed columns
df[:avg_rating] = ((df[:quality_rating] + df[:service_rating] + df[:value_rating]) / 3.0)
df[:broad_region_id] = df[:region_id].recode { |i| i[0...-1]}
df_grouped = df.group_by([:specific_category_name, :broad_region_id, :job_id])
df_grouped.aggregate(avg_rating: :mean, job_id: :count).aggregate(avg_rating: :mean, job_id: :count)
I'm having a problem here:
df_grouped.aggregate(avg_rating: :mean, job_id: :count).aggregate(avg_rating: :mean, job_id: :count)
Basically, I want to write something like this (for example):
df_grouped.aggregate(avg_rating: :mean, avg_rating: :std)
However, this only generates one column named avg_rating, along with this warning:
(irb):124: warning: key :avg_rating is duplicated and overwritten on line 124
Is there a way to rename aggregated column?
The only idea I have is to duplicate columns, but that seems like a very hacky solution.
Well, I finally found the answer here.
Aggregation of grouped data, with each output column given its own name and computed by a lambda or proc, can be done like this:
df.group_by(:a).aggregate(
avg_d: ->(df) { df[:d].mean },
sum_c: ->(df) { df[:c].sum },
avg_of_c: ->(df) { df[:c].mean },
size_b_with_lambda: ->(grouped){ grouped[:b].size},
uniq_b_with_proc: proc {|grouped| grouped[:b].uniq.size }
)
This solves all my issues.
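The lambda-based form works because every output column gets its own key, so the same input column (here `avg_rating`) can feed several outputs; a Ruby hash literal with a repeated key keeps only the last value, which is what the warning in the question reports. Below is a minimal, library-free sketch of the same idea in Python. It is not Daru's API: the function `group_aggregate`, the rows and the column names are hypothetical, and it uses the population standard deviation (`statistics.pstdev`).

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_aggregate(rows, key, aggregations):
    """Group dict rows by `key` and apply named aggregations.

    `aggregations` maps an output column name to a function that receives
    the list of rows in one group, so one input column can feed several
    differently named outputs.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group_key: {name: fn(group_rows) for name, fn in aggregations.items()}
        for group_key, group_rows in groups.items()
    }


rows = [
    {"category": "plumbing", "avg_rating": 4.0},
    {"category": "plumbing", "avg_rating": 2.0},
    {"category": "painting", "avg_rating": 5.0},
]

summary = group_aggregate(rows, "category", {
    "avg_rating_mean": lambda g: mean(r["avg_rating"] for r in g),
    "avg_rating_std": lambda g: pstdev([r["avg_rating"] for r in g]),
    "job_count": len,  # number of rows in the group
})
print(summary)
```

Each name in the `aggregations` dict becomes one output column, so there is no key collision even though two entries read `avg_rating`.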
OrientDB Version: 3.2.6
I wish to create an OrientDB server-side function which zips two arrays into an object and also formats the values (which are always record ids) slightly.
Function name:
idToRidMap
Takes args:
keys - An array of strings representing the keys of the resulting object, Example: ["id1", "id2"]
values - An array of record ids, Example: [#15:1, #15:2]
Returns an object with the id array elements as keys and the record ids as string values without leading "#", i.e. from examples above it would be:
{
"id1": "15:1",
"id2": "15:2"
}
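For reference, the intended transformation can be sketched outside OrientDB in Python. This is not OrientDB or GraalVM code: record ids are assumed to arrive as strings such as "#15:1", and the name `id_to_rid_map` is hypothetical.

```python
def id_to_rid_map(keys, values):
    """Zip ids with record ids, dropping a leading '#' from each record id."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    result = {}
    for key, rid in zip(keys, values):
        text = str(rid)
        # Strip only a single leading '#', e.g. '#15:1' -> '15:1'
        result[key] = text[1:] if text.startswith("#") else text
    return result


print(id_to_rid_map(["id1", "id2"], ["#15:1", "#15:2"]))
```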
I wish to use this function in queries like this one:
SELECT idToRidMap(configIds, configRids) as configs, * FROM (
SELECT outE("HasConfig").id as configIds, out("HasConfig").#rid as configRids, * FROM 12:0
)
So, depending on the id of the linking Edges and the record-id of the linked Vertices I wish to build one property showing all those relations in the returned record:
{
"#rid": "#12:0",
"someNativeProp": "Hello",
"configs": {
"id1": "15:1",
"id2": "15:2"
},
...
}
Note though, that this would also require me to drop the projections for the intermediate array results as well, extending the query to be something like this:
SELECT idToRidMap(configIds, configRids) as configs, !configIds, !configRids, * FROM (
SELECT outE("HasConfig").id as configIds, out("HasConfig").#rid as configRids, * FROM 12:0
)
The OrientDB JS function definition I've tried (among many others) is:
var result = {};
for (var i = 0; i < keys.length; i++) {
result[keys[i]] = String(values[i]).replace('#', '');
}
return result;
But I realized that length is not available on the keys argument (it is undefined). When I tried keys.size() instead (guessing it was a java.util.ArrayList), I got this error:
com.orientechnologies.orient.core.command.script.OCommandScriptException: Error on parsing script at position #0: Error on execution of the script Script: idToRidMap ------^ DB name="test" --> javax.script.ScriptException: org.graalvm.polyglot.PolyglotException: TypeError: invokeMember (size) on java.util.ArrayList#1b395482 failed due to: Unknown identifier: size --> org.graalvm.polyglot.PolyglotException: TypeError: invokeMember (size) on java.util.ArrayList#1b395482 failed due to: Unknown identifier: size
This seems to indicate that it has something to do with GraalVM polyglot and that keys is indeed a java.util.ArrayList. I did check https://www.graalvm.org/22.1/reference-manual/js/JavaInteroperability/#access-java-from-javascript, but I'm not sure how relevant it is and I didn't find anything there that helped.
So, to sum up, there are basically two parts to this question:
Is there any documentation of how the JavaScript server-side functions work, type-wise, syntax-wise, etc.? It seems really picky about what kind of JavaScript I write. How can I do the desired iteration to implement my function?
Is there a better way of achieving my end goal?
Thankful for any insight, I've always had a hard time with OrientDB custom functions.
I am trying to make a complex query in Swift to get data from DynamoDB.
I am able to get all information by using the userID. However, there are times when I may not know the entire userID and need to make a more complex query.
For instance, if I know the first name and the last name, and the userID format is "firstname:lastname:email", I need to be able to query all userIDs that include the first and last name, and then add a condition on another column.
I am very new to DynamoDB and want to accomplish something like the SQL query below.
SQL example:
SELECT * FROM mytable
WHERE column2 LIKE '%OtherInformation%'
AND (column1 LIKE '%lastname%' OR column1 LIKE '%firstname%')
Here is the code I have in Swift 4 for getting the user if I know the userID exactly; I'm not entirely sure how to modify it for complex queries.
func queryDBForUser(Fname: String, Lname: String) {
let userId = Fname + "." + Lname + ":" + (UIDevice.current.identifierForVendor?.uuidString)!
self.UserId = userId
let objectMapper = AWSDynamoDBObjectMapper.default()
let queryExpression = AWSDynamoDBQueryExpression()
queryExpression.keyConditionExpression = "#userId = :userId"
queryExpression.expressionAttributeNames = ["#userId": "userId",]
queryExpression.expressionAttributeValues = [":userId": userId,]
objectMapper.query(CheckaraUsers.self, expression: queryExpression, completionHandler: {(response: AWSDynamoDBPaginatedOutput?, error: Error?) -> Void in
if let error = error {
print("Amazon DynamoDB Error: \(error)")
return
}
// Matching items (if any) are in response?.items
})
}
I have also tried many variations along the lines of the following code, with no luck:
queryExpression.keyConditionExpression = "#FirstName = :firstName and #LastName = :lastName,"
queryExpression.expressionAttributeNames = ["#FirstName": "FirstName" , "#LastName": "LastName"]
queryExpression.expressionAttributeValues = [":FirstName": Fname,":LastName": Lname]
Any help would be greatly appreciated, thanks in advance!
You won't be able to do this with a DynamoDB query. When you query a table (or index) in DynamoDB, you must always specify the exact value of the partition key; only the sort key, if there is one, accepts range-style conditions. In your case that would mean the full value of "firstname:lastname:email".
You could sort of do this with a DynamoDB scan and a filter expression, but that will look at every item in your table, so it could be slow and expensive. Amazon will charge you for the read capacity necessary to look at every item in the table.
So if you really wanted to, the filter expression for the scan operation would be something like:
"contains(#FirstName, :firstName) and contains(#LastName, :lastName)"
Note that contains looks for an exact substring match, so if you want case insensitive matches (like ILIKE in SQL) it won't work.
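To make the query/scan difference concrete, here is a toy in-memory model in Python, not the DynamoDB SDK: the table is a dict keyed by the full userId, a "query" is an exact key lookup, and a "scan" reads every item before applying a case-sensitive substring test similar to contains. The table contents and function names are hypothetical.

```python
# Toy model: the table is a dict keyed by the full partition-key value (userId).
TABLE = {
    "Ada:Lovelace:ada@example.com": {"FirstName": "Ada", "LastName": "Lovelace"},
    "Alan:Turing:alan@example.com": {"FirstName": "Alan", "LastName": "Turing"},
    "Grace:Hopper:grace@example.com": {"FirstName": "Grace", "LastName": "Hopper"},
}


def query(table, user_id):
    """Query-like lookup: only an exact partition-key value matches."""
    item = table.get(user_id)
    return [item] if item is not None else []


def scan_with_contains(table, first_name, last_name):
    """Scan-like pass: read every item, then keep the ones that match."""
    items_read = 0
    matches = []
    for item in table.values():
        items_read += 1  # every item is read, whether or not it matches
        # Case-sensitive substring test, similar in spirit to contains()
        if first_name in item["FirstName"] and last_name in item["LastName"]:
            matches.append(item)
    return matches, items_read


print(query(TABLE, "Ada:Lovelace"))          # a partial key finds nothing
print(scan_with_contains(TABLE, "Ada", "Love"))  # one match, three items read
```

The scan finds the partial match, but the amount of work grows with the whole table rather than with the number of results.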
If you need to do these types of queries then you need to evaluate whether or not DynamoDB is the right choice for you. DynamoDB is a NoSQL key/value store basically. It trades limited querying functionality for scalability and performance. If you are coming at DynamoDB from a SQL background and are expecting to be able to do freeform queries of anything in your table, you will be disappointed.
I got the query working by adding a secondary index to my DynamoDB table. Although this is not what I initially wanted, it works: I can now query on both of the columns I needed without doing a table scan and filtering afterwards.
query code:
queryExpression.indexName = "Index-Name"
queryExpression.keyConditionExpression = "#LastName = :LastName and #otherValue = :otherValue"
queryExpression.expressionAttributeNames = ["#LastName": "LastName" , "#otherValue": "otherValue"]
queryExpression.expressionAttributeValues = [":LastName": Lname,":otherValue": self.otherValue!]
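A secondary index is, conceptually, a second copy of the data keyed by other attributes. The toy Python model below is not how DynamoDB stores indexes; it assumes `LastName` is the index's partition key and `otherValue` its sort key, and the items are hypothetical. It shows why a lookup on both values no longer needs a pass over the whole table, and that items missing a key attribute do not appear in the index.

```python
from collections import defaultdict


def build_index(items, partition_attr, sort_attr):
    """Group items by (partition key, sort key) values, like a toy secondary index."""
    index = defaultdict(lambda: defaultdict(list))
    for item in items:
        # Items missing either key attribute are left out of the index.
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]][item[sort_attr]].append(item)
    return index


def query_index(index, partition_value, sort_value):
    """Exact-match lookup on both index keys; no pass over all items."""
    return list(index.get(partition_value, {}).get(sort_value, []))


items = [
    {"userId": "u1", "LastName": "Lovelace", "otherValue": "x"},
    {"userId": "u2", "LastName": "Lovelace", "otherValue": "y"},
    {"userId": "u3", "LastName": "Turing"},  # no otherValue, so not indexed
]
index = build_index(items, "LastName", "otherValue")
print(query_index(index, "Lovelace", "x"))
```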
UPDATE:
I understood that the solution to my problem is a series of subqueries, each applying a different filter to a progressively smaller result set. But I can't find a way to do that with MyBatis. Here is my query code:
List<IstanzaMetadato> res = null;
SqlSession sqlSession = ConnectionFactory.getSqlSessionFactory().openSession(true);
try {
IstanzaMetadatoMapper mapper = sqlSession.getMapper(IstanzaMetadatoMapper.class);
IstanzaMetadatoExample example = new IstanzaMetadatoExample();
Iterator<Map.Entry<Integer, String>> it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<Integer, String> entry = it.next();
example.createCriteria().andIdMetadatoEqualTo(entry.getKey()).andValoreEqualTo(entry.getValue());
}
example.setDistinct(true);
res = mapper.selectByExample(example);
} finally {
sqlSession.close();
}
I need to execute a new selectByExample inside the while loop, and it has to query the previously selected results.
Is there a solution?
ORIGINAL QUESTION:
I have this table structure
I have to select rows from the table with different filters, specified by the final user.
Those filters are specified by a pair (id_metadato, valore); for example, you can have id_metadato = 3 and valore = "pippo".
The user can specify 0-n filters from the web page by typing 0-n values into the search boxes, which are based on id_metadato.
Obviously, the more filters the user specifies, the more restrictive the final query becomes.
For example, if the user fills in only the first search box, the query will have only one filter and will return all the rows that have the pair (id_metadato, valore) specified by the user.
If they use two search boxes, then the query will have two filters, and it will return all the rows that satisfy the first condition AND the second one, after the "first subquery" is done.
I need to do this dynamically and as efficiently as possible. I can't simply add AND clauses to my query; each filter has to reduce the result set of the previous one.
I can't run 0-n subqueries (SELECT * FROM ... IN (SELECT * FROM ...)) efficiently.
Is there a more elegant way to do that? I'm reading tutorials on dynamic SQL queries with MyBatis, but I'm not sure that is the correct approach. I'm still trying to figure out the logic of the solution; then I will try to implement it with MyBatis.
Thanks for the answers
MyBatis simplified this process of nesting subqueries a lot: it was sufficient to chain the filter criteria and to add an IN condition on the IDs selected by the previous step (andIdUdIn(listaIdUd)).
The excerpt of the code is the following:
try {
IstanzaMetadatoMapper mapper = sqlSession.getMapper(IstanzaMetadatoMapper.class);
IstanzaMetadatoExample example = new IstanzaMetadatoExample();
Iterator<Map.Entry<Integer, String>> it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<Integer, String> entry = it.next();
if (listaIdUd.isEmpty()) {
example.createCriteria().andIdMetadatoEqualTo(entry.getKey()).andValoreEqualTo(entry.getValue());
example.setDistinct(true);
listaIdUd = mapper.selectDynamicNested(example);
continue;
}
example.clear();
example.createCriteria().andIdMetadatoEqualTo(entry.getKey()).andValoreEqualTo(entry.getValue()).andIdUdIn(listaIdUd);
example.setDistinct(true);
listaIdUd = mapper.selectDynamicNested(example);
}
} finally {
sqlSession.close(); // sqlSession is opened as in the question's code
}
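The loop above narrows the candidate id_ud values one filter at a time, which amounts to intersecting the matches of each filter. Here is a library-free Python sketch of that logic, with hypothetical in-memory rows of `(id_ud, id_metadato, valore)` standing in for the real table; `filter_ids` is not a MyBatis API.

```python
def filter_ids(rows, filters):
    """Return the id_ud values whose rows satisfy every (id_metadato, valore) filter.

    rows: list of (id_ud, id_metadato, valore) tuples.
    filters: list of (id_metadato, valore) pairs, applied in order.
    """
    surviving = None  # None means "no filter applied yet"
    for id_metadato, valore in filters:
        # Keep only ids matching this filter that also survived the previous
        # ones, like the IN (listaIdUd) condition in the Java loop.
        surviving = {
            id_ud
            for (id_ud, m, v) in rows
            if m == id_metadato and v == valore
            and (surviving is None or id_ud in surviving)
        }
        if not surviving:
            break  # later filters can only narrow an empty set further
    if surviving is None:
        return {id_ud for (id_ud, _, _) in rows}  # zero filters: everything matches
    return surviving


rows = [(1, 3, "pippo"), (1, 4, "pluto"), (2, 3, "pippo"), (3, 4, "pluto")]
print(filter_ids(rows, [(3, "pippo"), (4, "pluto")]))
```

Unlike the Java excerpt, which uses an empty `listaIdUd` to detect the first pass, this sketch uses `None` for "no filter applied yet" and stops as soon as a filter matches nothing. With the Java version, it may be worth checking what happens when an intermediate filter returns no rows.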
Kind of a specific question but I wasn't sure how to approach it. I've got a list of rooms, that I am trying to group first by type, then by owner. I am doing this to check if there are duplicate rooms for a given owner and type (which shouldn't be possible so I need to prune them out). Right now my code looks like this:
IQueryable<IGrouping<Guid, Room>> allRoomsByOwner = _dbContext.Rooms.GroupBy(x => x.OwnerId);
List<Room> duplicates = new List<Room>();
foreach (IGrouping<Guid, Room> roomsByOwner in allRoomsByOwner)
{
IEnumerable<IGrouping<Guid, Room>> roomsOfOwnerByType = roomsByOwner.ToList().GroupBy(x => x.TypeId);
foreach (IGrouping<Guid, Room> grouping in roomsOfOwnerByType)
{
if (grouping.Count() > 1)
{
duplicates.AddRange(grouping.ToList());
}
}
}
I'm just wondering if it's possible to put this all into one LINQ statement. I've done similar things before, but not quite this complex and not using two GroupBy calls. Thanks.
You can group by multiple columns (OwnerId and TypeId), keep the groups with more than one element, and flatten them (using the SelectMany method) to get the duplicates:
var duplicates = _dbContext.Rooms.GroupBy(x => new{x.OwnerId,x.TypeId})
.Where(g=>g.Count()>1)
.SelectMany(g=>g.Skip(1)) // Optionally skip the first element as the group's representative and treat the rest as duplicates; use g itself to return every room in the group, as the original loop did.
.ToList();
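The same composite-key grouping, sketched in Python rather than C# (the `Room` type and sample data are hypothetical): group by the `(owner_id, type_id)` pair, keep groups with more than one room, and flatten them, optionally skipping the first room of each group as `Skip(1)` does.

```python
from collections import defaultdict
from typing import NamedTuple


class Room(NamedTuple):
    room_id: int
    owner_id: str
    type_id: str


def find_duplicates(rooms, keep_first=True):
    """Return rooms that share an (owner_id, type_id) pair with another room."""
    groups = defaultdict(list)
    for room in rooms:
        groups[(room.owner_id, room.type_id)].append(room)
    duplicates = []
    for group in groups.values():
        if len(group) > 1:
            # keep_first=True mirrors Skip(1): the first room stays as the representative
            duplicates.extend(group[1:] if keep_first else group)
    return duplicates


rooms = [Room(1, "o1", "t1"), Room(2, "o1", "t1"), Room(3, "o1", "t2"), Room(4, "o2", "t1")]
print(find_duplicates(rooms))
```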
I'm new to Couchbase and am struggling to get a composite index to do what I want it to. The use-case is this:
I have a set of "Enumerations" being stored as documents
Each has a "last_updated" field which, as you may have guessed, stores the last time that the enumeration was updated
I want to be able to show only those enumerations which have been updated since some given date but still sort the list by the name of the enumeration
I've created a Couchbase View like this:
function (doc, meta) {
var time_array;
if (doc.doc_type === "enum") {
if (doc.last_updated) {
time_array = doc.last_updated.split(/[- :]/);
} else {
time_array = [0,0,0,0,0,0];
}
for(var i=0; i<time_array.length; i++) { time_array[i] = parseInt(time_array[i], 10); }
time_array.unshift(meta.id);
emit(time_array, null);
}
}
I have one record that doesn't have the last_updated field set, so its time fields are all set to zero. As a first test, I thought I could filter out that result, so I used the following:
startkey = ["a",2012,0,0,0,0,0]
endkey = ["Z",2014,0,0,0,0,0]
While the list is sorted by id, it isn't filtering anything! Can anyone tell me what I'm doing wrong? Is there a better composite view to achieve these results?
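In Couchbase, when you query a view by startkey/endkey, you can't filter the results by two or more properties independently. A view's index is a single list sorted by the whole composite key, so a startkey/endkey range selects one contiguous slice of it, and here that slice is decided by the first element (the id). So your query is effectively the same as a query with: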
startkey = ["a"]
endkey = ["Z"]
Here is a link to a complete answer by Filipe Manana on why it can't be filtered by those dates.
Here is a quote from it:
For composite keys (arrays), elements are compared from left to right and comparison finishes as soon as a element is different from the corresponding element in the other key (same as what happens when comparing strings à la memcmp() or strcmp()).
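So if you want a view that filters by date, the date components should come first in the composite key (for example, emit the time array followed by meta.id). The rows will then come back in date order, so sorting by name would have to be done after retrieval.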
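Python compares lists element by element from left to right, stopping at the first difference, which is the rule the quote describes, so it can illustrate why the range in the question does not filter by date. This is only an analogy: Python orders strings by code point (so "Z" sorts before "a"), unlike Couchbase's view collation, so the sketch uses lowercase ids. The keys and the helper `in_range` are hypothetical.

```python
def in_range(key, startkey, endkey):
    """Inclusive range check using left-to-right list comparison."""
    return startkey <= key <= endkey


# Id-first keys: the first element settles every comparison.
start = ["a", 2012, 0, 0, 0, 0, 0]
end = ["z", 2014, 0, 0, 0, 0, 0]
never_updated = ["m", 0, 0, 0, 0, 0, 0]
print(in_range(never_updated, start, end))  # "a" < "m" < "z", so the zero date is ignored

# Date-first keys: the range now selects by date.
start_d = [2012, 0, 0, 0, 0, 0]
end_d = [2014, 0, 0, 0, 0, 0]
print(in_range([0, 0, 0, 0, 0, 0, "m"], start_d, end_d))
print(in_range([2013, 5, 1, 0, 0, 0, "m"], start_d, end_d))
```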