
WARN DAGScheduler: Broadcasting large task binary with size 2.2 MiB

Hi there.
I got quite a number of these warnings while the model was training.

22/08/04 07:08:08 WARN DAGScheduler: Broadcasting large task binary with size 1139.5 KiB
22/08/04 07:08:09 WARN DAGScheduler: Broadcasting large task binary with size 2.2 MiB

Is it safe to ignore them?
What do they actually mean?
Thanks in advance.


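Spark logs this warning when the serialized task binary it broadcasts for a stage exceeds about 1,000 KiB, which is why it shows up at 1139.5 KiB in your log. It is only a warning: if your VM has enough resources, the job will generally run fine, but on a VM with limited resources, broadcasting large task binaries could lead to timeouts.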
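To make the size comparison concrete, here is a small standalone Python sketch that extracts the sizes from warning lines like the ones in the question and converts them to KiB. It assumes the warning threshold is 1000 KiB, which as far as I know is the value of TaskSetManager.TASK_SIZE_TO_WARN_KIB in the Spark source (check it against the version you run). The helper name task_binary_sizes_kib is made up for illustration and is not part of Spark.

```python
import re

# Assumed warning threshold, mirroring TaskSetManager.TASK_SIZE_TO_WARN_KIB
# in the Spark source; verify against your Spark version.
TASK_SIZE_TO_WARN_KIB = 1000

# Multipliers that convert Spark's binary size units into KiB.
_UNITS_KIB = {"B": 1 / 1024, "KiB": 1.0, "MiB": 1024.0, "GiB": 1024.0 ** 2}

# Matches the DAGScheduler warning text and captures the number and its unit.
_PATTERN = re.compile(
    r"Broadcasting large task binary with size ([\d.]+) (B|KiB|MiB|GiB)"
)


def task_binary_sizes_kib(log_text):
    """Hypothetical helper: return every warned task-binary size in KiB."""
    return [
        float(value) * _UNITS_KIB[unit]
        for value, unit in _PATTERN.findall(log_text)
    ]


if __name__ == "__main__":
    log = (
        "22/08/04 07:08:08 WARN DAGScheduler: Broadcasting large task binary "
        "with size 1139.5 KiB\n"
        "22/08/04 07:08:09 WARN DAGScheduler: Broadcasting large task binary "
        "with size 2.2 MiB\n"
    )
    for size in task_binary_sizes_kib(log):
        # Compare each size against the assumed warning threshold.
        print(f"{size:.1f} KiB (threshold {TASK_SIZE_TO_WARN_KIB} KiB)")
```

Both sizes from the question (1139.5 KiB and roughly 2252.8 KiB) are above 1000 KiB, which is consistent with each of those stages logging the warning.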

You can mitigate it by reducing the task size, that is, by reducing the amount of data each task handles.

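First, check the number of partitions in the DataFrame with df.rdd.getNumPartitions(). Then increase the number of partitions with df = df.repartition(100). Note that repartition returns a new DataFrame rather than changing df in place, so you need to assign the result.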
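The two steps above can be wrapped in a small helper. This is a sketch, assuming df is a pyspark.sql.DataFrame; the function name repartition_if_needed is hypothetical, and 100 is only the example value from this answer, not a tuned recommendation. Keep in mind that repartition() always performs a full shuffle, and whether it makes the warning go away depends on what is making the task binary large in your job.

```python
def repartition_if_needed(df, target_partitions):
    """Hypothetical helper: repartition df only if it has fewer partitions.

    df is assumed to be a pyspark.sql.DataFrame. DataFrame.repartition
    returns a new DataFrame, so callers must use the returned value.
    """
    # Step 1: check how many partitions the DataFrame currently has.
    current = df.rdd.getNumPartitions()
    print(f"Current number of partitions: {current}")

    # Step 2: increase the partition count (this triggers a full shuffle).
    if current < target_partitions:
        return df.repartition(target_partitions)

    # Already at or above the target; leave the DataFrame unchanged.
    return df
```

In a PySpark session you would call it as df = repartition_if_needed(df, 100) before fitting the model.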