Is there a way to dynamically expand tables found in multiple columns using Power Query?

I have used List.Accumulate() to merge multiple tables. This is the output I got in this simple example:
Now, I need a solution to expand all of these with a formula, because in the real world I need to merge multiple tables that keep growing in number (think Eurostat tables, for instance), and modifying the code manually wastes a lot of time in those situations.
I have been trying to solve it, but the complexity of the syntax quickly becomes the main limitation here. For instance, if I add a new step that nests Table.ExpandTableColumns() inside another List.Accumulate(), I need to pass the column name of an inner table as text. Fine, but to actually drill down I first need to refer to the current column name in [] on each iteration (for instance, Column1), and that triggers an error if I store the column names in a list, because list items are kept as text in quotes. I also experimented with Table.TransformColumns(), but that didn't work either.
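To illustrate, this is roughly the shape of the step I am trying to build, with placeholder names (Merged for the output of the List.Accumulate step above; the column names come from the nested tables themselves):
// feed each current column name into Table.ExpandTableColumn as text,
// taking the list of inner column names from the nested tables
#"Expand All Columns" =
    List.Accumulate(
        Table.ColumnNames(Merged),
        Merged,
        (state, colName) =>
            let
                InnerNames = List.Distinct(
                    List.Combine(
                        List.Transform(
                            Table.Column(state, colName),
                            each if _ is table then Table.ColumnNames(_) else {})))
            in
                if List.Count(InnerNames) > 0
                then Table.ExpandTableColumn(state, colName, InnerNames,
                         List.Transform(InnerNames, each colName & "." & _))
                else state)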
Does anyone know how to solve this problem whatever the approach?

See https://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/, which boils down to this function:
let Source = (TableToExpand as table, optional ColumnNumber as number) =>
    //https://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/
    let
        // start at the first column unless a column number was passed in
        ActualColumnNumber = if (ColumnNumber = null) then 0 else ColumnNumber,
        ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},
        ColumnContents = Table.Column(TableToExpand, ColumnName),
        // collect the column names of every nested table found in the current column
        ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents, each if _ is table then Table.ColumnNames(_) else {}))),
        NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),
        CanExpandCurrentColumn = List.Count(ColumnsToExpand) > 0,
        // expand the current column if it contains tables, otherwise leave the table untouched
        ExpandedTable = if CanExpandCurrentColumn then Table.ExpandTableColumn(TableToExpand, ColumnName, ColumnsToExpand, NewColumnNames) else TableToExpand,
        // stay at the same position after an expansion (the new columns may themselves contain tables),
        // otherwise move on to the next column
        NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber + 1,
        // recurse until every column has been visited; the query must be named ExpandAll for this call to resolve
        OutputTable = if NextColumnNumber > (Table.ColumnCount(ExpandedTable) - 1) then ExpandedTable else ExpandAll(ExpandedTable, NextColumnNumber)
    in
        OutputTable
in
    Source
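A usage sketch, assuming the function above is saved as a query named ExpandAll (the name its own recursive call on the OutputTable line expects) and that the List.Accumulate output sits in a step named Merged (a placeholder name):
// expand every table-typed column of the merged table in one call
#"Expanded All" = ExpandAll(Merged)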
Alternatively, unpivot all the table columns to get a single value column, then expand that value column:
ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Table.Column(#"PriorStepNameHere", "ValueColumnNameHere"), each if _ is table then Table.ColumnNames(_) else {}))),
#"Expanded ColumnNameHere" = Table.ExpandTableColumn(#"PriorStepNameHere", "ValueColumnNameHere", ColumnsToExpand, ColumnsToExpand),

Related

Creating a for loop that iterates through all the numbers in a column of a table in MATLAB

I am a new user of MATLAB R2021b and I have a table whose last column (named loadings) spans multiple sub-columns (all sub-columns were added under the same variable/column and are treated as one column). I want to create a for loop that goes through each separate loadings column and iterates through them, prior to creating a tbl that I will input into a model. The sub-columns contain numbers, with rows corresponding to the number of participants.
Previously, I had a similar loop that iterated through the names of different regions of interest, whereas now the loop has to iterate through columns that contain numbers: first the numbers in the first sub-column, then the second, and so on.
I am not sure whether I should first split the last column with T1 = splitvars(T1, 'loadings'), or whether I am not indexing into the table correctly or performing the right transformations. I would appreciate any help.
roi.ic = T.loadings;
roinames = roi.ic(:,1);
roinames = [num2str(roinames)];
for iroi = 1:numel(roinames)
f_roiname = roinames{iroi};
tbl = T1;
tbl.(roinames) = T1.loadings(:,roiname);
tbl.(roinames) = T1.loadings_rsfa(:,roiname)
which throws:
Unable to use a value of type cell as an index.
Error in tabular/dotParenReference (line 120)
b = b(rowIndices,colIndices)

How do I get unique values of one column based on another column using the insert database query in Anylogic?

How do I get unique values of one column based on another column using the query?
I tried using
(double)selectFrom(tasks).where(tasks.tasks_type.eq()).uniqueResult(tasks.task_cycle_time_hr);
I want to automate this and make sure that all the values of tasks_type are being read and a unique value is returned for each tasks_type!
For every value in the tasks_type column, I need a unique value from the task_cycle_time_hr column.
I don't really understand why you're trying to do this in one query.
If you want to get the cycle time (task_cycle_time_hr column) for each task type (tasks_type column), just do queries in a loop for each possible tasks_type value. If you don't know those a priori, do queries for each value returned by a query of the task type values, which would look something like
for (String taskType : selectFrom(tasks).list(tasks.tasks_type)) {
    double cycleTime = (double) selectFrom(tasks)
        .where(tasks.tasks_type.eq(taskType))
        .firstResult(tasks.task_cycle_time_hr);
    traceln("Task type " + taskType + ", cycle time " + cycleTime);
}
But this just amounts to querying all rows and reading the task type and cycle time values from each, so you wouldn't normally do it like this: you'd just have a single query looping through all the full rows instead...
List<Tuple> rows = selectFrom(tasks).list();
for (Tuple row : rows) {
    traceln("Task type " +
        row.get(tasks.tasks_type) + ", cycle time " +
        row.get(tasks.task_cycle_time_hr));
}
NB: I assume you don't have any rows with duplicate task types, because then the whole exercise doesn't make sense unless you want only the first row for each task type value, or some kind of aggregate (e.g., sum) of the cycle time values for each given task type. You were trying to use uniqueResult, which may mean you want a value when there is exactly one row for a given task type and no result otherwise; however, uniqueResult throws an exception (errors) if there isn't exactly one row, so you can't use it directly like that. In that case one way (there are others, some probably slightly better) would be to do a count first to check; e.g., something like
for (String taskType : selectFrom(tasks).list(tasks.tasks_type)) {
    int rowCount = (int) selectFrom(tasks)
        .where(tasks.tasks_type.eq(taskType))
        .count();
    if (rowCount == 1) {
        double cycleTime = (double) selectFrom(tasks)
            .where(tasks.tasks_type.eq(taskType))
            .firstResult(tasks.task_cycle_time_hr);
        traceln("Task type " + taskType + ", unique cycle time " + cycleTime);
    }
}
Import your Excel sheet into the AnyLogic internal DB and then make use of the DB wizard, which will take you step by step through writing the code to retrieve the data you want:
(double) selectFrom(data)
    .where(data.tasks.eq("T1"))
    .firstResult(data.task_cycle_time_hr)

Why am I getting an error that I cannot concat two different datatypes even after casting the field's datatype?

I have a query in PostgreSQL where I want to prepend a minus sign to the transactions.amount field when transactions.type = 2 (which refers to withdrawals). I am trying to concat a minus sign and the transactions.amount field, which is an int. I cast the transactions.amount field to text/varchar, but no matter what I still get the error "PostgreSql Error: case types numeric and text cannot be matched".
Here is the query I am running,
SELECT CAST(CASE WHEN "IsVoided" IS TRUE THEN 0
WHEN "Transactions"."TransactionType" = 2
THEN CONCAT('-', CAST("Transactions"."Amount" AS TEXT))
ELSE "Transactions"."Amount" END AS Text) AS "TransAmount"
FROM "Transactions"
LEFT JOIN "DepositSources"
ON "Transactions"."DepositSourceId" =
"DepositSources"."DepositSourceId"
LEFT JOIN "WithdrawalSources"
ON "Transactions"."WithdrawalSourceId" =
"DepositSources"."DepositSourceId"
WHERE "Transactions"."FundId" = 4
AND "Transactions"."ReconciliationId" = 24
What's very perplexing is that when I run the below query it works as expected:
SELECT CONCAT('-', CAST("Transactions"."Amount" AS TEXT)) FROM
"Transactions"
All branches of a CASE expression need to have the same type. In this case you're stuck with making all branches text, because the CONCAT branch can only produce text. Try this version:
CASE WHEN "IsVoided" IS TRUE
     THEN '0'
     WHEN "Transactions"."TransactionType" = 2
     THEN CONCAT('-', "Transactions"."Amount"::text)
     ELSE "Transactions"."Amount"::text END AS "TransAmount"
Note that it is unusual to be using the logic you have in a CASE expression. Typically, you would just be checking the values of a single column, not multiple different columns.
Edit:
It appears that your call to CONCAT mainly serves to negate a value. Here is another, simpler way to do this:
CASE WHEN "IsVoided" IS TRUE
     THEN 0
     WHEN "Transactions"."TransactionType" = 2
     THEN -1.0 * "Transactions"."Amount"
     ELSE "Transactions"."Amount" END AS "TransAmount"
In this case, we can make the CASE expression generate just numeric output, which might really be what you are after.

Replace rows based on a modified timestamp

I am looking for an efficient method (which I can reuse for similar situations) to drop rows which have been updated.
My table has many columns, but the important ones are:
creation_timestamp, id, last_modified_timestamp
My primary key is the creation_timestamp and the id. However, after an id has been created, it can be modified by other users, which is indicated by the last_modified_timestamp. Each day I need to:
1) Read a daily file and add any new rows (based on creation_timestamp and id).
2) Remove old rows which have a different last_modified_timestamp and replace them with the latest versions.
I typically do most of my operations with Pandas (a Python library) and psycopg2, so I am not extremely familiar with PostgreSQL 9.6, which is the database I am using. My initial approach is to just add the last_modified_timestamp to the primary key and then use a view to SELECT DISTINCT based on the latest changes. However, it seems like that is 'cheating', and I will be wasting space since I do not need to retain previous versions.
EDIT:
import pandas as pd

# FACT_TABLE, DATABASE_COLUMNS, PRIMARY_KEY and DATABASE are module-level configuration constants

def create_update_query(df, table=FACT_TABLE):
    """Build an upsert: INSERT the row, or UPDATE it when the primary key already exists."""
    columns = ', '.join([f'{col}' for col in DATABASE_COLUMNS])
    constraint = ', '.join([f'{col}' for col in PRIMARY_KEY])
    placeholder = ', '.join([f'%({col})s' for col in DATABASE_COLUMNS])
    updates = ', '.join([f'{col} = EXCLUDED.{col}' for col in DATABASE_COLUMNS])
    query = f"""
        INSERT INTO {table} ({columns})
        VALUES ({placeholder})
        ON CONFLICT ({constraint})
        DO UPDATE SET {updates};"""
    return ' '.join(query.split())

def load_updates(df, connection=DATABASE):
    conn = connection.get_conn()
    cursor = conn.cursor()
    # psycopg2 cannot handle NaN/NaT, so replace them with None (NULL)
    df1 = df.where((pd.notnull(df)), None)
    insert_values = df1.to_dict(orient='records')
    # build the query once instead of once per row
    query = create_update_query(df1)
    for row in insert_values:
        cursor.execute(query, row)
    conn.commit()
    cursor.close()
    del cursor
    conn.close()
This appears to work. I was running into some issues, so right now I am looping through each row of the DataFrame as a dictionary and inserting one row at a time. Also, I had to fill the NaN columns with None, because I was getting errors with Timestamp dtypes and blank values, etc.

AREL: writing complex update statements with a FROM clause

I tried looking for an example of using Arel::UpdateManager to form an update statement with a FROM clause (as in UPDATE t SET t.itty = "b" FROM .... WHERE ...), but couldn't find any. The way I've seen it, Arel::UpdateManager sets the main engine on initialization and lets you set the various fields and values to update. Is there actually a way to do this?
As an aside, I would also like to find out how to express Postgres POSIX regex matching in Arel, but that might be impossible for now.
As far as I can see, the current version of the arel gem does not support the FROM keyword in an UPDATE query. You can generate a query using only the SET and WHERE keywords, like:
UPDATE t SET t.itty = "b" WHERE ...
and the code that copies a value from field2 to field1 for the units table would look like:
relation = Unit.all
um = Arel::UpdateManager.new(relation.engine)
um.table(relation.table)
um.ast.wheres = relation.wheres.to_a
um.set(Arel::Nodes::SqlLiteral.new('field1 = "field2"'))
ActiveRecord::Base.connection.execute(um.to_sql)
You can, however, use an additional method to update a relation. We create Arel's UpdateManager, assigning to it the table, the where clause, and the values to set; the values are passed to the method as an argument. Then we add the FROM keyword to the generated SQL, but only if the query references tables other than the one specified in the UPDATE clause itself. Finally, we execute the query:
def update_relation!(relation, values)
  um = Arel::UpdateManager.new(relation.engine)
  um.table(relation.table)
  um.ast.wheres = relation.wheres.to_a
  um.set(values)
  sql = um.to_sql
  # append a FROM clause to the query if needed
  m = sql.match(/WHERE/)
  tables = relation.arel.source.to_a.select { |v| v.class == Arel::Table }.map(&:name).uniq
  tables.shift
  sql.insert(m.begin(0), "FROM #{tables.join(",")} ") if m && !tables.empty?
  # execute the query
  ActiveRecord::Base.connection.execute(sql)
end
Then you can issue the relation update as:
values = Arel::Nodes::SqlLiteral.new('field1 = "field2", field2 = NULL')
relation = Unit.not_rejected.where(Unit.arel_table[:field2].not_eq(nil))
update_relation!(relation, values)