Dataflow - No such file or directory: 'ffprobe': 'ffprobe'

Hello!

I am using the Python library pydub to work with audio. It works fine in Colab Enterprise, but when I run it in a Dataflow job, I get the following error:

No such file or directory: 'ffprobe': 'ffprobe'

After searching online and through the issues on the library's official repository (https://github.com/jiaaro/pydub/issues?page=3&q=not+found), I saw that the recommended fix is to add /usr/bin/ffprobe to the PATH variable.

Since a Dataflow flex template is built from a Dockerfile, I am adding the ffprobe path to the PATH environment variable in the Dockerfile, at build time. However, I still get the same error message.

What else can I do to fix this error?

This is my Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

ARG PYTHON_PY_FILE=insights_interpreter.py

COPY . .

ENV PYTHONPATH ${WORKDIR}

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/${PYTHON_PY_FILE}"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

RUN apt-get update \
# libav-tools was removed from recent Debian releases; ffmpeg itself ships ffprobe
&& apt-get install -y ffmpeg libavcodec-extra \
&& pip install --upgrade pip \
&& pip install google-cloud-texttospeech pydub \
# Download the requirements to speed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE

# PATH entries must be directories; ffprobe lives in /usr/bin
ENV PATH="/usr/bin:$PATH"

RUN echo $PATH # Verification

# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
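In case it helps others reproduce what I am trying: instead of relying on PATH, pydub also lets you point it at the binaries explicitly via the `AudioSegment.converter` / `AudioSegment.ffmpeg` / `AudioSegment.ffprobe` attributes documented in its README. A sketch of what the pipeline code could do (the /usr/bin fallback paths assume ffmpeg was installed with apt, as in the Dockerfile above):

```python
import shutil

def resolve_binary(name, fallback):
    """Return the absolute path of `name` found on PATH, else `fallback`."""
    return shutil.which(name) or fallback

ffmpeg_path = resolve_binary("ffmpeg", "/usr/bin/ffmpeg")
ffprobe_path = resolve_binary("ffprobe", "/usr/bin/ffprobe")

# pydub's documented hook for explicit binary locations
# (uncomment where pydub is available):
# from pydub import AudioSegment
# AudioSegment.converter = ffmpeg_path
# AudioSegment.ffmpeg = ffmpeg_path
# AudioSegment.ffprobe = ffprobe_path
```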

Looking forward to your comments.

--
Best regards
David Regalado
Web | Linkedin | Cloudskillsboost
