Hey,
I am calculating a moving average of [value] (n = 10) across a table. To reduce computation, I am using an app formula on a physical column, so [value_moving_avg] is calculated once during record entry and then saved.
I have a workflow which triggers on any update or delete which changes the [value]. The workflow identifies the following 9 records, and updates updates their [value_moving_avg] so the dataset remains accurate.
My moving average formula is shown below:
AVERAGE(
SELECT(
table[value],
IN(
[id],
TOP(ORDERBY(FILTER(table,[created_datetime]<[_THISROW].[created_datetime]),[created_datetime], TRUE),9)
)
)+LIST([value])
)
The identification of the rows to include in the moving average calculates rather quickly (the portion of the formula nested inside the IN()).
The challenge is, taking the referenced rows and turning them into [value] to be averaged. The IN() formula is very slow in this implementation. Now that my dataset has grown, the workflow rule hangs or fails entirely.
Any thoughts on how to optimize something like this?
@Aleksi please donโt say its not possible at this time
Solved! Go to Solution.
By the nature of the calculation, it will require multiple rows. One row for every row in the table, sort of as described here: https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/
Maybe it will provide more clarity; here is what I am producing for my users:
The bigger problem is AppSheets operations for filtering, ordering, and selecting data from a list only returns a list of primary keys. There isnt a simple way to convert this list of primary keys back to a piece of data stored along with them. Its unfortunate that we canโt specifically return a column when using operators like ORDERBY(). Rarely am I wanting a list of primary keys, i usually want some associated value.
Iโve found a solution to streamline the calculation, although its clunky in implementation it works.
The solution requires two physical columns: one column is EnumList base type REF, and holds a list of the 10 related records I would like to average. Essentially, this part of the bigger formula:
TOP(ORDERBY(FILTER(table,[created_datetime]<[_THISROW].[created_datetime]),[created_datetime], TRUE),9)
Then, I have a second formula which looks up each associated value directly, rather than using IN(). Essentially,
AVERAGE(LIST(
[value],
LOOKUP(INDEX([_THISROW].[EnumListREF],1),โtableโ,โidโ,โvalueโ),
...,
LOOKUP(INDEX([_THISROW].[EnumListREF],9),โtableโ,โidโ,โvalueโ)
))
Doing the calculation this way has it performing an order of magnitude better. I still have a workflow rule to update associated records on updates or deletes.
I should also note, in case anyone uses this post in the future, that there are other options. Each has some benefits and downsidesโฆ to name two:
Cheers @Aleksi ! Would love to know if you have any other methods of doing this that may be better implented.
In generally speakingโฆ you want to calculate an average using top 10 values from the same table, correct?
That is the goal although old rows must be deletable or editable, and the dataset must adjust all related records dynamically.
A virtual column would do this of course but I have found it to be too slow.
How about if you make that virtual calculation in another table (only one row)?
By the nature of the calculation, it will require multiple rows. One row for every row in the table, sort of as described here: https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/
Maybe it will provide more clarity; here is what I am producing for my users:
The bigger problem is AppSheets operations for filtering, ordering, and selecting data from a list only returns a list of primary keys. There isnt a simple way to convert this list of primary keys back to a piece of data stored along with them. Its unfortunate that we canโt specifically return a column when using operators like ORDERBY(). Rarely am I wanting a list of primary keys, i usually want some associated value.
Iโve found a solution to streamline the calculation, although its clunky in implementation it works.
The solution requires two physical columns: one column is EnumList base type REF, and holds a list of the 10 related records I would like to average. Essentially, this part of the bigger formula:
TOP(ORDERBY(FILTER(table,[created_datetime]<[_THISROW].[created_datetime]),[created_datetime], TRUE),9)
Then, I have a second formula which looks up each associated value directly, rather than using IN(). Essentially,
AVERAGE(LIST(
[value],
LOOKUP(INDEX([_THISROW].[EnumListREF],1),โtableโ,โidโ,โvalueโ),
...,
LOOKUP(INDEX([_THISROW].[EnumListREF],9),โtableโ,โidโ,โvalueโ)
))
Doing the calculation this way has it performing an order of magnitude better. I still have a workflow rule to update associated records on updates or deletes.
I should also note, in case anyone uses this post in the future, that there are other options. Each has some benefits and downsidesโฆ to name two:
Cheers @Aleksi ! Would love to know if you have any other methods of doing this that may be better implented.
How aboutโฆ
AVERAGE(SELECT(Table[Value],
AND(
[_ROWNUMBER]<=[_THISROW].[_ROWNUMBER],
[_ROWNUMBER]>=[_THISROW].[_ROWNUMBER]-10
)))
That would be sufficient in the simplest case, however:
User | Count |
---|---|
18 | |
14 | |
11 | |
7 | |
4 |