
Dataflow: ModuleNotFoundError: No module named 'src'

Hi

 

I've been trying to launch a Dataflow job using Flex Templates with the Python SDK. The job starts and then fails with the error ModuleNotFoundError: No module named 'src'.

I'll provide some context:

(Attached screenshots, not preserved: file tree, Dockerfile, setup.py, requirements.txt, metadata.json, e_commerce_batch.py)

Then, in Cloud Shell, I run the following:

 

gcloud dataflow flex-template build gs://${bucket}/e_commerce_batch.json \
--image-gcr-path "${region}-docker.pkg.dev/${proyecto}/${artifact_registry_name}/dataflow/e_commerce_batch:latest" \
--sdk-language "PYTHON" \
--flex-template-base-image "PYTHON3" \
--metadata-file "metadata.json" \
--py-path "." \
--py-path "src/" \
--py-path "src/processors/" \
--env "FLEX_TEMPLATE_PYTHON_PY_FILE=e_commerce_batch.py" \
--env "FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=requirements.txt"
 

What am I missing? I don't want to move the src.processors code into the main Python file (e_commerce_batch.py), because that would make it less readable.
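For context on the error itself: ModuleNotFoundError: No module named 'src' means the interpreter doing the import has no directory containing the src package on its sys.path. The stdlib-only sketch below illustrates that general Python behavior locally; the src/processors/cleaning.py module is a hypothetical stand-in for the real code, and the sketch does not reproduce what a Dataflow worker does. It imports the module in a fresh interpreter three ways: started from an unrelated directory (fails with that message), started from the project root, and started elsewhere with the root on PYTHONPATH (both succeed).

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def make_project(root: Path) -> None:
    """Create a hypothetical layout: <root>/src/processors/cleaning.py."""
    processors = root / "src" / "processors"
    processors.mkdir(parents=True)
    (root / "src" / "__init__.py").write_text("")
    (processors / "__init__.py").write_text("")
    (processors / "cleaning.py").write_text(
        "def normalize(s):\n    return s.strip().lower()\n"
    )


def try_import(module: str, cwd: Path, extra_path: Optional[Path] = None) -> str:
    """Import `module` in a fresh interpreter started in `cwd`.

    Returns "" on success, otherwise the last line of stderr
    (the exception message).
    """
    env = dict(os.environ)
    # Start from a clean search path so only cwd / extra_path matter.
    env.pop("PYTHONPATH", None)
    env.pop("PYTHONSAFEPATH", None)
    if extra_path is not None:
        env["PYTHONPATH"] = str(extra_path)
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return ""
    lines = result.stderr.strip().splitlines()
    return lines[-1] if lines else f"exit code {result.returncode}"


with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as other:
    make_project(Path(project))
    target = "src.processors.cleaning"
    # Started from an unrelated directory: no `src` package on sys.path.
    missing = try_import(target, cwd=Path(other))
    # Started from the project root: for `python -c`, sys.path[0] is the
    # current directory, so `src` is found.
    from_root = try_import(target, cwd=Path(project))
    # Started elsewhere, but with the project root on PYTHONPATH.
    on_path = try_import(target, cwd=Path(other), extra_path=Path(project))

print("unrelated dir:", missing or "ok")
print("project root:", from_root or "ok")
print("PYTHONPATH:", on_path or "ok")
```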

--
Best regards
David Regalado
Web | Linkedin | Cloudskillsboost



 

Accepted solution:

EUREKA!

I solved the issue by setting the pipeline option save_main_session=True in my Python code!

See "Pickling and Managing the Main Session" in the Apache Beam documentation for more info.


