Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Getting BQ error: Out of stack space due to deeply nested query expression during query resolution i

I am working on a project where I need to compute scores for each record in a table based on the weights of different features. I have generated the weights using logistic regression. I have also generated a query that does the score weighted computation for all features using weights generated by LR. There are around 500 features in total.

However, when I run the query, I get an error saying Out of stack space due to deeply nested query expression. I understand that this is because the query is complex as it compute weighted score for lot of feats in select statement. But I don't know how to simplify the query or avoid this error.

Is there any way to rewrite the query or use a different approach to achieve the same result without getting this error? Any help or suggestions would be appreciated.

Example query: CREATE TEMP FUNCTION LR(x float64) RETURNS FLOAT64 AS ( 1/(1+ EXP(case when x > 100 then 100 else x end)) ); SELECT Id, LR( -( 0.01 * feat_1 + 0.04 * feat_2 ......{for all features}) )

0 1 671
1 REPLY 1

You are hitting an error due to the complexity of the query. Your current approach is to compute the score in a single, large expression which BigQuery is unable to handle due to its depth and complexity.

One way to work around this is to break down the calculations into smaller parts using intermediate tables or views. Here's a rough outline of how you can do this:

  1. Compute the weighted score for a subset of features: Create multiple temporary tables or views where each one computes the weighted score for a subset of features.

 

CREATE OR REPLACE TEMPORARY TABLE temp_table_1 AS SELECT id, 0.01 * feat_1 + 0.04 * feat_2 + .... as partial_score FROM your_table;

You would repeat this for multiple subsets of features. For example, if you have 500 features, you might create 50 temporary tables each computing the weighted score for a different set of 10 features.

  1. Combine the partial scores: Once you have the temporary tables, you can then combine the partial scores to get the total score.

 

SELECT id, LR( - (SELECT SUM(partial_score) FROM ( SELECT partial_score FROM temp_table_1 UNION ALL SELECT partial_score FROM temp_table_2 UNION ALL ... SELECT partial_score FROM temp_table_50 ) )) FROM your_table;

This approach breaks down the computation into smaller parts, so it should avoid hitting the complexity limit.

Remember, this is a workaround that splits the computation across multiple queries. It might increase the overall running time, as it adds the overhead of creating and querying multiple temporary tables. However, given your current situation, this could be a viable solution.