Hi there.
I saw quite a number of warnings like these while the model was training:
22/08/04 07:08:08 WARN DAGScheduler: Broadcasting large task binary with size 1139.5 KiB
22/08/04 07:08:09 WARN DAGScheduler: Broadcasting large task binary with size 2.2 MiB
Is it safe to ignore them?
What do they actually mean?
Thanks in advance.
There is a size limit when broadcasting a task (10 MB). If your VM has enough resources and you don't pass this limit, it's going to be OK, but if your VM is low on resources, this could cause a timeout.
You can mitigate it by reducing the task size, that is, by reducing the amount of data each task is handling.
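First, check the number of partitions in the DataFrame via df.rdd.getNumPartitions().
Then increase the number of partitions: df = df.repartition(100). Note that repartition() returns a new DataFrame rather than changing df in place, so the result has to be reassigned.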
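The two steps above can be put together roughly like this. This is only a sketch: the helper name ensure_min_partitions and the demo() function are made up for illustration, spark.range(...) is a stand-in for your real training data, and 100 is just the value suggested above, not a tuned number.

```python
def ensure_min_partitions(df, min_partitions):
    """Return df split into at least min_partitions partitions.

    Hypothetical helper (not part of PySpark). It only relies on
    df.rdd.getNumPartitions() and df.repartition(), both of which
    exist on a PySpark DataFrame.
    """
    current = df.rdd.getNumPartitions()
    if current >= min_partitions:
        # Already split finely enough; skip the shuffle that repartition() triggers.
        return df
    # repartition() returns a NEW DataFrame, so the caller must use the return value.
    return df.repartition(min_partitions)


def demo():
    """Small local demo; needs PySpark installed. The data is a stand-in."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("repartition-demo")
        .getOrCreate()
    )
    df = spark.range(1_000_000)  # placeholder for the real training DataFrame
    print("partitions before:", df.rdd.getNumPartitions())
    df = ensure_min_partitions(df, 100)
    print("partitions after:", df.rdd.getNumPartitions())
    spark.stop()
```

Call demo() in an environment where PySpark is installed to try it out. The helper leaves the DataFrame alone when it already has enough partitions, since repartition() causes a full shuffle and is not free.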