Your guide to preparing for the Google Cloud Professional Data Engineer Certification

Lauren_vdv
Community Manager

Companies are ramping up the search for cloud experts who can take vast amounts of data and put it to work solving top business challenges, including customer satisfaction, production quality, and operational efficiency.

By earning the Google Cloud Professional Data Engineer Certification, you can demonstrate your skills and distinguish yourself among your peers with your ability to design data solutions that ensure maximum flexibility and scalability, meet security and compliance requirements, and enable data-driven decision making.

To help you prepare for the exam, Google Cloud recently held a live session with Cloud Data Warehouse Engineer Andrew Fleischer (@afleisc) and Learning Services Community & Outreach Manager Carrie Browde (@carrie), covering:

  • How the Professional Data Engineer Certification helps validate your expertise and elevate your career
  • Expert recommendations and resources to best prepare for the exam, including the topics covered on the exam
  • Answers to your Data Engineer Certification questions

In this post, we provide an overview of the session, including key takeaways, links to supporting resources, and written answers to your questions so you can go into the Professional Data Engineer Certification exam feeling prepared and ready to pass! 

For any additional questions, please feel free to leave a comment below and someone from the Community or Google Cloud team will be happy to help.

Event recording

Tip👉 Use the timestamp links in the YouTube description to jump to the topics you’re most interested in.

Why become a certified Professional Data Engineer?

For the fourth year in a row, Google Cloud certifications rank among the highest paid IT certifications in the industry. So if you need one reason, it’s that you’ll get paid.

Certifications are also helpful when it comes to promotions, career changes, and overall confidence in cloud skills and your professional future. According to the Google Cloud Certification impact report, 85% of Google Cloud certified individuals reported feeling more confident in their cloud skills, while 78% reported feeling more confident in their professional future. 

Furthermore, more than one in four Google Cloud certified individuals took on more responsibilities or leadership roles, while almost one in five received a raise. 

Additional benefits to earning your certification include:

Check this out: Why IT leaders choose Google Cloud certification for their teams

Who should take the Professional Data Engineer Certification exam?

Google Cloud offers a range of certifications, with each one suited for different job functions, roles, and requirements. So who is a good candidate for the Professional Data Engineer Certification?

The recommended candidate includes experienced technical professionals who make data-driven decisions by collecting, transforming, and publishing data. These individuals have hands-on experience designing, operationalizing, and monitoring data processing systems and leveraging pre-existing machine learning models.

It’s recommended you have 3+ years of industry experience, including 1+ years designing and managing solutions using Google Cloud, but there are no official prerequisites to taking (and passing!) the exam.

With the Google Cloud Credential Holder Directory, you can see thousands of Professional Data Engineer certification holders and which job roles they have today.

Information about the exam

Here are details about the Professional Data Engineer Certification exam format, delivery options, renewal requirements, and more:

  • Length: 2 hours
  • Registration fee: $200 (plus tax where applicable)
  • Languages: English, Japanese
  • Format: 50-60 multiple choice and multiple select questions
  • Exam delivery methods: a) Take the online-proctored exam from a remote location or b) Take the onsite-proctored exam at a testing center
  • Certification Renewal / Recertification: Candidates must recertify in order to maintain their certification status. Your Professional Google Cloud certification is valid for two years from the date of certification (Foundational and Associate certifications are valid for three years after you pass). Recertification is accomplished by retaking the exam during the recertification eligibility time period and achieving a passing score. You may attempt recertification starting 60 days prior to your certification expiration date.

Recommended steps and resources to prepare for the Professional Data Engineer Certification exam

  1. Set your exam date well in advance and plan your study time backward from that date. Most people study incrementally over a few months, but you may need more time if you’re new to Google Cloud, or less if you’re already experienced with it.

  2. Register and select the option to take the exam remotely or at a nearby testing center. Review testing requirements and FAQs here to make sure you have everything you need to take the exam.

  3. Review the exam guide to determine if/how your current skills align with the topics on the exam. This will help you understand how much time you need to spend studying and in which areas.

  4. Complete the Professional Data Engineer Learning Path (at no cost for 30 days for new users), including the hands-on labs, quests, and courses. It’s not a requirement to go through each step in the Learning Path before you take the exam, but the more you know, the better prepared you’ll be!

    Supplement your studying with Google Cloud documentation, on-demand videos, instructor-led sessions, and the O’Reilly Professional Data Engineer Study Guide.

  5. Review the sample questions to familiarize yourself with the format of the exam questions and example content that may be covered on the exam. Test your knowledge with the sample questions, identify those that you got wrong, and review those topics again.

  6. Take and pass the exam! Share your new certification with the Google Cloud Community in the Learning & Certification Hub, on social media, and even on your fridge 🙂 

    Shoutout to @parash31 @toraaglobal @woojak-0141 @dromo4 @YeoNadege @jessicajohn @tramontini @KishoreDurgam to name just a few recent Certification holders in the Cloud Certified Community! 👏🎉 

    Who did we miss? Let us know in the comments! 

Professional Data Engineer Certification exam tips

Test taking tips

  • Try to anticipate the correct answer before looking at the options
  • All options are plausible, but only one will meet all the specifications called out in the question
  • There’s no penalty for guessing, so try not to leave questions unanswered
  • Mark questions for review if you don’t know after five minutes, and return to them later

What’s on the exam

The following provides a few key tips and areas that you should study before taking the Professional Data Engineer certification exam. With that said, make sure you review the exam guide to understand all the topics you should expect to know.

Understand open source versus Google Cloud managed services 

The data engineering open source ecosystem includes many interesting and useful tools. In some cases, Google Cloud offers a corresponding managed service, which makes it easier to operate at scale with less operational overhead. The following table lists the open source tools and the corresponding Google Cloud managed services that you should be familiar with.

| Open source | Google Cloud managed service |
| --- | --- |
| Hadoop, Spark, Hive | Dataproc |
| Beam | Dataflow |
| Airflow | Cloud Composer |
| Kafka, RabbitMQ | Pub/Sub |
| Cassandra | Cloud Bigtable |

See additional context in the event recording at 25:38

Understand common data engineering patterns

Within the data engineering ecosystem, there are common patterns that organizations choose to adopt, such as ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform), that you should be familiar with.
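To make the difference between the two patterns concrete, here is a minimal local sketch in Python. It is not tied to any Google Cloud product: an in-memory SQLite database stands in for the warehouse, and the sample rows, table names, and cleanup rule are all hypothetical. In ETL the cleanup happens in Python before loading; in ELT the raw rows are loaded first and the cleanup runs as SQL inside the database.

```python
import sqlite3

# Hypothetical raw records; amounts arrive as strings with stray whitespace.
RAW_ROWS = [("a1", " 12.50 "), ("a2", "7.25"), ("a3", " 3.00")]


def etl(conn):
    """ETL: clean the data in Python first, then load the finished rows."""
    cleaned = [(order_id, float(amount.strip())) for order_id, amount in RAW_ROWS]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)
    return conn.execute("SELECT SUM(amount) FROM orders_etl").fetchone()[0]


def elt(conn):
    """ELT: load the raw rows as-is, then transform them with SQL in the database."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount FROM orders_raw"
    )
    return conn.execute("SELECT SUM(amount) FROM orders_elt").fetchone()[0]


conn = sqlite3.connect(":memory:")  # local stand-in for a data warehouse
etl_total = etl(conn)
elt_total = elt(conn)
print(etl_total, elt_total)
```

Both paths end with the same cleaned data; what differs is where the transformation runs, which is the trade-off exam questions on these patterns tend to probe.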

One such pattern to be familiar with is real-time streaming analytics. In situations where organizations need immediate insights for their streaming data, Google Cloud provides a three-product pairing that works well in this particular use case: Pub/Sub for ingesting the data into a messaging queue, Dataflow to transform that data, and then BigQuery to analyze and extract insights from the data. Check out this video, which explains the real-time streaming data engineering pattern in more detail. 
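The three stages of this pattern can be sketched locally in plain Python. The sketch below does not call Pub/Sub, Dataflow, or BigQuery: an in-memory queue stands in for the messaging layer, a function for the transform step, and a counter of per-window totals for the analytics table. The event fields (`ride_status`, `timestamp`) and the 60-second window size are assumptions chosen for illustration.

```python
from collections import Counter, deque

WINDOW_SECONDS = 60  # assumed fixed-window size, for illustration only


def transform(event):
    """Transform step: keep the needed field and assign the event to a window."""
    window_start = event["timestamp"] - (event["timestamp"] % WINDOW_SECONDS)
    return {"window_start": window_start, "ride_status": event["ride_status"]}


def run_pipeline(queue):
    """Drain the queue, transform each event, and count events per window and status."""
    counts = Counter()
    while queue:
        event = queue.popleft()  # ingest: read the next message
        row = transform(event)   # transform: reshape the message
        counts[(row["window_start"], row["ride_status"])] += 1  # analyze: aggregate
    return counts


# Hypothetical events; timestamps are seconds since an arbitrary start.
events = deque([
    {"ride_status": "pickup", "timestamp": 5},
    {"ride_status": "dropoff", "timestamp": 42},
    {"ride_status": "pickup", "timestamp": 61},
    {"ride_status": "pickup", "timestamp": 118},
])

window_counts = run_pipeline(events)
for (window_start, status), n in sorted(window_counts.items()):
    print(window_start, status, n)
```

In a real deployment each stage would be handled by the corresponding managed service, so the pipeline would scale without you operating the queue, workers, or storage yourself.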

Review data flow charts

Make sure you understand the logic behind the following charts for determining the best data engineering solution to use for which use case. 

[Figure: Google Cloud data engineering services and solution mapping]

[Figure: Google Cloud database/storage decision tree]

[Figure: Google Cloud data transformation tools decision tree (Dataproc or Dataflow)]

Opt for answers that reduce operational overhead

During the exam, you may come across a question that has more than one plausible answer, where you need to select the best option among them. In these cases, choose a correct answer that requires the least amount of operational overhead or manual work - which often means it’s using a Google Cloud managed solution. 

Study the Google Cloud Resource Hierarchy 

Make sure you’re familiar with the Google Cloud Resource hierarchy, including how and when permissions are assigned. Google Cloud offers IAM, which lets you assign granular access to specific Google Cloud resources and prevents unwanted access to other resources. IAM lets you control who (users) has what access (roles) to which resources by setting IAM policies on the resources.

You can set an IAM policy at the organization level, the folder level, the project level, or (in some cases) the resource level. Resources inherit the policies of the parent resource. If you set a policy at the organization level, it is inherited by all its child folder and project resources, and if you set a policy at the project level, it is inherited by all its child resources.

The effective policy for a resource is the union of the policy set on the resource and the policies inherited from its ancestors. This inheritance is transitive. In other words, resources inherit policies from the project, which in turn inherits policies from the organization resource. Therefore, the organization-level policies also apply at the resource level. Learn more here
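The union rule can be illustrated with a small Python model. This is a simplified sketch, not the IAM API: the hierarchy, resource names, and members are hypothetical, and a policy is reduced to a mapping from role to a set of members.

```python
# Hypothetical hierarchy: each resource maps to its parent (None for the root).
PARENTS = {
    "organizations/example-org": None,
    "folders/analytics": "organizations/example-org",
    "projects/data-prod": "folders/analytics",
    "datasets/sales": "projects/data-prod",
}

# Policies set directly on each resource: role -> set of members.
POLICIES = {
    "organizations/example-org": {"roles/viewer": {"group:auditors@example.com"}},
    "projects/data-prod": {"roles/bigquery.dataEditor": {"user:ana@example.com"}},
    "datasets/sales": {"roles/bigquery.dataViewer": {"user:raj@example.com"}},
}


def effective_policy(resource):
    """Union the policy set on the resource with those of all its ancestors."""
    merged = {}
    current = resource
    while current is not None:
        for role, members in POLICIES.get(current, {}).items():
            merged.setdefault(role, set()).update(members)
        current = PARENTS[current]  # walk up one level in the hierarchy
    return merged


sales_policy = effective_policy("datasets/sales")
for role in sorted(sales_policy):
    print(role, sorted(sales_policy[role]))
```

In this model the dataset ends up with all three bindings, even though only one was set on it directly. That is why a broad role granted at the organization level reaches every resource beneath it.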


Apply the principle of least privilege 

In addition to understanding how permissions are assigned, you should always make sure you’re following the principle of least privilege, where you’re not giving a user, resource, or service account any more permissions than is required. 
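One way to reason about least privilege is to compare what an identity has been granted with what its workload actually needs. The sketch below does this with plain Python sets; the permission lists are illustrative for a service account that only runs read queries, and nothing here calls a Google Cloud API.

```python
def excess_permissions(granted, required):
    """Return permissions that are granted but not required, sorted for stable output."""
    return sorted(set(granted) - set(required))


# Illustrative permissions for a service account that only runs read queries.
required = {"bigquery.jobs.create", "bigquery.tables.getData"}
granted = {
    "bigquery.jobs.create",
    "bigquery.tables.getData",
    "bigquery.tables.delete",
    "bigquery.datasets.delete",
}

to_revoke = excess_permissions(granted, required)
print(to_revoke)
```

Anything the function returns is a candidate for removal, typically by switching to a narrower predefined role or a custom role.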

Professional Data Engineer Certification resource roundup

Pro tip 👉 Bookmark this page so you can refer back to these resources at your convenience.

Professional Data Engineer Certification FAQs

  1. What are some practical projects or use cases that I can start working on to gain hands-on experience with Google Cloud data engineering tools and technologies, and how can I effectively leverage Python and SQL skills in these projects?
    There are many practical projects or use cases that you can start working on to gain hands-on experience with Google Cloud data engineering tools and technologies. Some examples include:
    • ETL (Extract, Transform, Load): This is the process of extracting data from one source, transforming it into a format that can be used by another system, and loading it into that system. You can use Dataflow/Dataproc to create ETL pipelines that can extract data from a variety of sources, transform it, and load it into a variety of destinations.
    • Data warehousing: This is the process of storing and managing data in a way that makes it easy to access and analyze. You can use BigQuery to create data warehouses that can store vast amounts of data and provide high-performance access to that data.
    • Data visualization: This is the process of creating graphical representations of data. You can use Looker to create dashboards and reports that visualize data from a variety of sources.
    • Practice data: Google Cloud maintains a public Pub/Sub topic, projects/pubsub-public-data/topics/taxirides-realtime, as well as public BigQuery datasets, which you can use to try these tools out. Use Python for Dataflow and SQL within BigQuery.

  2. Is there any Google Cloud solution that may help with cleaning and fixing raw data? For example, autopopulate missing elements, unify date formats, or convert digital fields and strings to string format, etc.
    Dataprep is a self-service data preparation tool that can help you clean and fix raw data. It can automatically populate missing elements, unify formats, and convert digital fields and strings to string format.

  3. Should we know data science before applying for this certification?
    It’s not required to have a data science background before applying for the Google Cloud Professional Data Engineer certification. However, it will be beneficial to have some familiarity with data science concepts and techniques. This will help you to understand the data processing and analysis tasks that you will be required to perform as a data engineer.

  4. What is the difference between data analytics and data science?
    Data analytics and data science are both fields that deal with data. However, they have different focuses and goals.

    Data analytics is the process of extracting meaningful insights from data. It involves cleaning, organizing, and analyzing data to identify patterns and trends. Data analysts use a variety of tools and techniques to perform data analytics, including statistical analysis, machine learning, and data visualization.

    Data science is a more general field that encompasses data analytics. Data scientists use data analytics to extract insights from data, but they also use other tools and techniques, such as machine learning and natural language processing, to develop new products and services. Data scientists also work on developing new methods for collecting and storing data.

    The main difference between data analytics and data science is that data analysts focus on extracting insights from data, while data scientists focus on developing new products and services using data. Data analysts use a variety of tools and techniques to perform data analytics, while data scientists use a wider range of tools and techniques, including machine learning and natural language processing.

    Data analytics is a more established field, while data science is a newer field. Data analysts are in high demand, and there are many job opportunities available. Data scientists are also in high demand, but there are fewer job opportunities available.

  5. As a Cloud Architect, what level of data engineering knowledge should you have to provide the right estimates or choose the right tools?
    You should be familiar with the data product flowcharts shown above. See the Professional Cloud Architect exam guide to understand what level of knowledge and experience is expected for a Professional Cloud Architect. 

  6. Can I get credits to prepare for the preparation course in Google Cloud Skills Boost? A voucher for the exam?
    When you enroll in most courses, you can consume course materials like videos and documents for free. If a course includes labs, you will need to purchase an individual subscription or credits to be able to take them. Labs can also be unlocked through campaigns you participate in. All required activities in a course must be completed to earn the completion badge.

    See your subscription options for Google Cloud Skills Boost here. New users are eligible for a 30-day free trial, or you can opt for a monthly subscription at $29/month. Additionally, you can take advantage of the newly released Innovators Plus subscription, available for $299 USD/year, which includes $500 in Google Cloud credits, a Google Cloud Certification voucher, a bonus $500 in credits after the first certification you earn each year, and more. Learn more here.

    Partners can take advantage of the Partner Kickstart Certification program

    Learn how to request and share credits with Google Cloud Community members here.

    Learn all about vouchers and discounts here

  7. Can a novice with no coding experience take this course?
    Yes. Having no coding experience should not stop you from taking the course. You won’t be asked to code on the exam; it's more about choosing the right approach, product, or solution to fit the use case. 

  8. Is the new wave in data cloud & AI, namely Foundation models and Generative AI, tackled in the exam material and Cloud Skills Boost content? If not, is it going to be available in future training? Specifically, I am referring to the data engineering view - the one that employs Vertex AI with Generative AI, the PaLM API, the Gen App Builder, the MakerSuite, Workspace with AI, in the entire data lifecycle.
    In terms of the exam, see “Section 3: Operationalizing machine learning models” for what is covered on the topic of machine learning. The exam guide will always reflect what you should expect to know and what you will see on the exam. New products like Generative AI have not yet been integrated into the Professional Data Engineer certification.

  9. I’ve taken the training courses in Coursera. Do I need to retake it in Google Cloud Skills Boost? Why doesn’t Coursera offer skill badges?
    The course content and individual lab content is the same on Coursera, but skill badges are only available on Google Cloud Skills Boost. A skill badge is earned by completing a series of hands-on labs and taking a final assessment challenge lab to test a learner’s skills, through the Google Cloud Skills Boost learning platform.

  10. What are the learning options for Google Cloud partners? How do they differ from Skills Boost?
    The courses and hands-on labs available through Google Cloud Skills Boost are not different for our partners, but as a Google Cloud Partner, you have unlimited access to all our training at no cost, as well as some additional training/on-demand options exclusive to Partners. Partners can access the course content here: Google Cloud Skills Boost for Partners. If you're a Google Cloud Partner and don't have an account, you can request one here

    Learn more about Partner resources here

  11. How do Google Cloud learning partners’ offerings (e.g. Coursera, Pluralsight, LinkedIn) differ from Google Cloud Skills Boost?
    Google Cloud Skills Boost is our first party platform destination for online learning, skills development, and certifications, managed and delivered directly by Google Cloud. We also offer the same content on partner platforms, like Coursera, LinkedIn, and PluralSight.

  12. Will signing up for the Cloud Free Trial allow us to get used to the products such as Dataflow, Dataproc, Pub/Sub, and BigQuery?
    Yes; you can learn all about what’s included in the Free Tier and the 90-day, $300 Free Trial offer here.

  13. What other technologies (e.g., SQL, Python, Java) outside of Google Cloud would help with this certification or be helpful as a GCP Data Engineer?
    It really depends on what you’re looking to accomplish, but SQL, Python, and Java are great to know, as you mentioned. For example, if you’re using BigQuery, SQL is used for ETL/ELT. Additionally, you can run Dataflow or Apache Beam using Python or Java, so those are both good to be familiar with in that case.

    If you’re using something like Hadoop or Dataproc, you might also want to be familiar with Apache Spark. And if you know Apache Airflow, you’ll already be familiar with much of the workflow orchestration service Cloud Composer, which is built on it.

    It might also be useful to know Terraform, which isn’t directly related to the Professional Data Engineer certification, but if you want to spin up a database or a different product in an infrastructure as code fashion, Terraform will be useful. 

See additional FAQs and answers in the Help Center here.