
log4j vulnerability (CVE-2021-44228) impact on DataFlow

Hello,

I've read this bulletin about the vulnerability:

https://cloud.google.com/log4j2-security-advisory?hl=en

But no info about DataFlow is available there.

I would like to know whether the DataFlow SDK (namely Apache Beam and the related stack) uses a fixed log4j version (>= 2.15) or whether there are plans to do so. Also, whether there are any action items on the DataFlow users' side.

We're currently using DataFlow SDK 2.34.0, and its pom.xml file shows a dependency on slf4j 1.7.25. I can't find any explicit log4j dependencies, but that doesn't mean the runner doesn't load these classes dynamically through slf4j via some configuration setting inside the runner.

Thanks
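
One way to probe the "loaded dynamically" concern is to ask the class loader directly whether Log4j 2 core classes are visible at runtime, independent of what pom.xml declares. The following is a minimal sketch, not an official Dataflow or Beam diagnostic: the class name `Log4jPresenceCheck` and its helper are hypothetical, and the result only describes the JVM and class loader the code runs in, so to say anything about a worker it would have to run on the worker (for example from pipeline code).

```java
// Hypothetical helper: checks whether a class is visible to this class loader.
class Log4jPresenceCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, Log4jPresenceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Real Log4j 2 class names; JndiLookup is the class involved in CVE-2021-44228.
        String core = "org.apache.logging.log4j.core.LoggerContext";
        String jndiLookup = "org.apache.logging.log4j.core.lookup.JndiLookup";

        System.out.println("log4j-core visible: " + isOnClasspath(core));
        System.out.println("JndiLookup visible: " + isOnClasspath(jndiLookup));
    }
}
```

If both lines print `false`, Log4j 2 core is not visible to that class loader. A `true` result says nothing about the version, so it would still need to be followed by a version check of the jar that provides the class.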


I hope I understand your question correctly. As I understand it, you are asking whether the DataFlow SDK uses a fixed log4j version in relation to the reported issue[0]. Please let me know if my understanding is wrong.

As can be seen in your reference link[0], a security vulnerability, CVE-2021-44228[1], was recently disclosed in Apache Log4j versions 2.0 to 2.14.1[1]. The impact on Dataflow is unknown at this time, but we have reasons to think there is no impact on Dataflow with Apache Beam.

Nonetheless, the recommendation at this time is to always use the latest version[2].

[0]https://cloud.google.com/log4j2-security-advisory

[1]https://nvd.nist.gov/vuln/detail/CVE-2021-44228

[2]https://logging.apache.org/log4j/2.x/download.html

 

Thank you, let me clarify: when I say "fixed", I mean "doesn't have the vulnerability", i.e. the problem is "fixed". I do not mean "fixed version" as in "constant version". And yes, I am specifically referring to the CVE you mentioned.

Therefore, if I understand correctly, the status of DataFlow regarding said CVE is still unknown? Note that the DataFlow logging docs mention log4j:

https://cloud.google.com/dataflow/docs/guides/logging


Thank you for clarifying. 

As indicated in the reference links, the vulnerability affects Log4j versions 2.14.1 and below. So, as you are using a newer version, you should not be affected by the vulnerability.

For Apache Beam and Dataflow, based on our investigation so far, they are not impacted. There are 3 reasons for that:

- For Beam versions 2.32.0 and newer, Beam has no public facing dependencies on log4j. The remaining test dependencies were removed today.
- Dataflow workers do not carry this dependency by default.
- Dataflow VMs have a JRE version that is not impacted by this vulnerability.

FYI, I got this email from Google a few minutes ago. I could not find an online page to reference, so I'm posting it here:

 

Dear Google Cloud customer

Google Cloud is actively following the security vulnerability in the open-source Apache "Log4j 2" utility (CVE-2021-44228). We are currently assessing the potential impact of the vulnerability for Google Cloud products and services. This is an ongoing event and we will continue to provide updates through our customer communications channels.

A security vulnerability, CVE-2021-44228, has been disclosed in the Apache Log4j versions 2.0 to 2.14.1 and Dataflow users may be vulnerable to Log4j 2 under certain circumstances. Specifically, users that meet the following criteria should take immediate action:

- Use Apache Beam version 2.31.0 or older.
- Include a vulnerable version of Log4j either directly or indirectly in the Apache Beam pipeline. Users can identify if they are using an impacted Log4j 2 version in their Dataflow pipeline by inspecting the classpath or by inspecting the filesToStage pipeline option if they are not using an uber jar.
- Log input. (Apache Beam, by default, does not log user provided input but users can change this behavior.)

Immediate Action

We strongly recommend the following actions.

- Users using Apache Beam version 2.31.0 or older should update all Dataflow pipelines to Apache Beam version 2.32.0 or newer. These Apache Beam versions do not have any direct dependencies on Log4j 2.
- All users should update direct and indirect dependencies (if any) on Log4j 2 to version 2.15.0 or later by updating your build configuration.

Additional Notes

- Cloud Dataflow workers do not carry the Log4j 2 dependency.
- Apache Beam versions 2.32.0 or later do not have public facing dependencies on Log4j 2.
- Cloud Dataflow Templates base image does not have a Log4j 2 dependency. Google provided templates do not have a dependency on the impacted Log4j 2 versions.
- Apache Beam, by default, does not log user provided input but users can change this behavior. Note that users might still be impacted if user code, a dependency, or a transitive dependency is using an impacted Log4j 2 dependency AND user code logs user provided and/or untrusted input.
- Apache Beam test environment (not available to Apache Beam users) has been updated to the latest version of Log4j on Dec 10, 2021.

Background

The Apache Log4j utility is a commonly used component for logging requests. On December 9, 2021, a vulnerability was reported that could allow a system running Apache Log4j version 2.14.1 or below to be compromised and allow an attacker to execute arbitrary code.

On December 10, 2021, NIST published a critical Common Vulnerabilities and Exposures (CVE) alert, CVE-2021-44228. More specifically, Java Naming and Directory Interface (JNDI) features used in configuration, log messages, and parameters do not protect against attacker controlled LDAP and other JNDI related endpoints. An attacker who can control log messages or log message parameters can execute arbitrary code loaded from remote servers when message lookup substitution is enabled.
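
The classpath inspection mentioned in the email can be sketched roughly as follows. This is an illustrative sketch rather than a Google or Apache Beam tool: the class `Log4jClasspathScan` and its methods are hypothetical, and it assumes the jar keeps the conventional `log4j-core-<version>.jar` file name. It will not see Log4j classes that were repackaged inside an uber jar, so an empty result is not proof of safety.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags log4j-core jars older than 2.15.0 by file name.
class Log4jClasspathScan {

    // Matches names such as log4j-core-2.14.1.jar or log4j-core-2.0-beta9.jar.
    private static final Pattern LOG4J_CORE =
        Pattern.compile("log4j-core-(\\d+)\\.(\\d+)(?:\\.(\\d+))?.*\\.jar");

    /** True if the jar name is log4j-core 2.x with a minor version below 15. */
    static boolean isAffectedVersion(String jarName) {
        Matcher m = LOG4J_CORE.matcher(jarName);
        if (!m.matches()) {
            return false;
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major == 2 && minor < 15;
    }

    /** Returns the classpath entries whose file name looks like an affected jar. */
    static List<String> findAffectedJars(String classpath) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (isAffectedVersion(new File(entry).getName())) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Scans the classpath of the JVM this program runs in.
        List<String> hits = findAffectedJars(System.getProperty("java.class.path", ""));
        if (hits.isEmpty()) {
            System.out.println("No log4j-core jar older than 2.15.0 found by file name.");
        } else {
            hits.forEach(h -> System.out.println("Possibly affected: " + h));
        }
    }
}
```

The same `findAffectedJars` call could be pointed at the entries listed in the filesToStage pipeline option, joined with the platform path separator, instead of `java.class.path`.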

Hi folks, 

We are using Dataproc. What's the recommended fix?
Also, we use PySpark. Do we need to upgrade it as well?

From https://cloud.google.com/log4j2-security-advisory:

Dataproc: Dataproc released new images on December 12, 2021 to address the vulnerability in CVE-2021-44228. Customers must follow Dataproc documentation to take advantage of the mitigation.