
DATAFLOW: Could not change log.info level

Hello again,

I've been trying all day to customize the level of the Python logging library in my Beam data pipeline. I've been passing a DEBUG parameter through the UI and, depending on its value, setting something like:

 
import logging

log = logging.getLogger(__name__)  # or whichever logger the pipeline uses

if DEBUG:
    log.setLevel(logging.DEBUG)
else:
    log.setLevel(logging.INFO)
 
At the end of the day, I realized Dataflow has total control over logging, so I'm using log.info for everything and controlling verbosity that way.
 
Has this happened to you?
 
I'm using the Apache Beam Python 3.8 SDK, version 2.37.0
--
Best regards
David Regalado
Web | Linkedin | Twitter

The Apache Beam SDK for Python uses the standard Python logging module, and Google Cloud Dataflow does indeed exercise some control over the logging level.

However, your approach of dynamically setting the log level based on a parameter should generally work. Here are a few things you might want to check or consider:

  1. Configuration in the Dataflow UI: In the Google Cloud Dataflow UI, you can specify default worker log levels. Ensure that these are not set to a level that would override your settings (for example, that they are not set to WARNING or ERROR, which would suppress INFO and DEBUG logs).

  2. Logger Initialization: Make sure you are setting the log level on the correct logger instance. The logging module in Python organizes loggers in a hierarchy, so if you set the level on a child logger while a parent logger (or its handlers) is configured with a higher level, the child's messages may still not be displayed. For testing, try setting the level on the root logger with logging.getLogger().setLevel(...) and see whether that changes the behavior (see the sketch after this list).

  3. DEBUG Parameter: Ensure that the DEBUG parameter is actually being passed to, and read by, your pipeline code; the sketch after this list shows one way to do this with a custom pipeline option.

  4. Logging Behavior: Note that setting the logging level to DEBUG will include all logs at the DEBUG level and above (i.e., INFO, WARNING, ERROR, CRITICAL). Setting it to INFO will include INFO level and above. Make sure that your log messages are being logged at the appropriate level.

  5. Dataflow Runner: When running pipelines using the DataflowRunner, the worker logs are sent to Cloud Logging. You can view these logs in the Google Cloud Console.
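
To illustrate points 2 and 3, here is a minimal sketch of reading a custom option and setting the level on the root logger. The --debug option name and the sample transforms are assumptions for illustration only; adapt them to whatever parameter your job actually receives from the UI.

import argparse
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def log_element(element):
    # DEBUG messages only appear when the effective level allows them.
    logging.getLogger(__name__).debug('Processing element: %s', element)
    return element


def run(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag; replace with the parameter you pass from the UI.
    parser.add_argument('--debug', default='false')
    known_args, pipeline_args = parser.parse_known_args(argv)

    # Set the level on the root logger so that child loggers
    # (e.g. logging.getLogger(__name__)) are not filtered by a stricter ancestor.
    level = logging.DEBUG if known_args.debug.lower() == 'true' else logging.INFO
    logging.getLogger().setLevel(level)

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as pipeline:
        (pipeline
         | 'Create' >> beam.Create([1, 2, 3])
         | 'Log' >> beam.Map(log_element))


if __name__ == '__main__':
    run()

Keep in mind that on the DataflowRunner this setLevel call runs in the launcher process; the workers are separate processes, which is where the harness log level option below comes in.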

You can also set the `default_sdk_harness_log_level` pipeline option to DEBUG when launching Dataflow jobs.
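
For example, the option can be passed like any other pipeline option in Python (the project, region, and bucket below are placeholders, and this assumes your SDK version supports the option):

from apache_beam.options.pipeline_options import PipelineOptions

# Equivalent to passing --default_sdk_harness_log_level=DEBUG on the command line.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',                # placeholder
    region='us-central1',                # placeholder
    temp_location='gs://my-bucket/tmp',  # placeholder
    default_sdk_harness_log_level='DEBUG',
)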

I also suggest you take a deep dive into how logging works in Python. The official logging HOWTO is really useful: https://docs.python.org/3/howto/logging.html