JSON Schema Validation in Google Pub/Sub Without Additional Services?

Hello!

We are using Google Pub/Sub for our messaging needs and need to validate the JSON schema of messages before they are published. Our goals and restrictions are:

  • Goals:

    • Validate JSON payloads against a pre-configured schema.
    • Automatically reject (not acknowledge) messages that do not conform to the schema.
  • Restrictions:

    • No additional services or tools should be used (e.g., Cloud Functions, Cloud Run).
    • Messages must remain in JSON format (no transformation to Protocol Buffers).

We are considering enforcing client-side validation by requiring all callers to use a custom library for validation before publishing to Pub/Sub. However, we foresee several challenges:

  1. Development Overhead: The library would need to be maintained and developed in multiple languages (e.g., Node.js, .NET).
  2. Developer Experience: Developers would not be able to use the standard Pub/Sub communication methods, as they would need to go through our library.
  3. Bypass Risk: There is no guarantee that developers will use the library, and they might bypass it and call Pub/Sub directly, avoiding the validation.

Is there a way to achieve JSON schema validation natively within Google Pub/Sub, similar to how it handles Protocol Buffers? If not, what are some best practices to ensure schema validation is enforced under these constraints?

Thank you!


Currently, Pub/Sub doesn't offer native JSON Schema validation the way it validates Avro and Protocol Buffers schemas. Given your constraints, here are some strategies you can adopt:

Enhanced Client-Side Validation:

  • Invest in a well-maintained, open-source client library that supports multiple languages (Node.js, .NET, etc.) and simplifies schema validation for your developers.

  • Incorporate a mechanism in the library to automatically fetch the latest JSON schema from a centralized location like Google Cloud Storage. This ensures consistent validation across all clients.
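As a rough illustration of the client-side approach above, here is a minimal Python sketch. Everything named here (validate, ValidatingPublisher, load_schema, publish) is hypothetical and not part of any Pub/Sub library. The validator only checks `required` and top-level `type` keywords, which is a small subset of JSON Schema; a real library would more likely delegate to a full JSON Schema validator. In practice load_schema might read the schema from Cloud Storage, and publish might wrap the Pub/Sub client's publish call.

```python
import json
from typing import Any, Callable

# Maps JSON Schema type names to Python types (simplified subset).
_TYPES = {"string": str, "number": (int, float), "integer": int,
          "boolean": bool, "object": dict, "array": list}


def _matches(value: Any, type_name: Any) -> bool:
    expected = _TYPES.get(type_name)
    if expected is None:
        return True  # unknown or absent type: not checked in this sketch
    if isinstance(value, bool) and type_name != "boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, expected)


def validate(payload: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    errors = []
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field in payload and not _matches(payload[field], rules.get("type")):
            errors.append(f"field {field} is not of type {rules['type']}")
    return errors


class ValidatingPublisher:
    """Hypothetical wrapper: validates a payload, then hands it to `publish`."""

    def __init__(self, load_schema: Callable[[], dict],
                 publish: Callable[[bytes], Any]):
        self._load_schema = load_schema  # assumption: returns the current schema
        self._publish = publish          # assumption: sends bytes to the topic

    def publish(self, payload: dict) -> Any:
        errors = validate(payload, self._load_schema())
        if errors:
            raise ValueError("; ".join(errors))
        return self._publish(json.dumps(payload).encode("utf-8"))
```

Injecting load_schema and publish as callables keeps the wrapper thin, which may ease the multi-language maintenance concern, but it does nothing about the bypass risk raised in the question.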

Subscriber-Side Filtering:

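  • Have subscribers validate each message and nack the ones that fail. With a dead-letter topic configured on the subscription, Pub/Sub forwards a message there once it exceeds the maximum number of delivery attempts. You can then analyze these messages to identify validation issues and improve your schemas.

  • Create alerts based on the volume of messages in the dead-letter topic to proactively detect validation failures.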
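The subscriber-side idea above could look roughly like the following Python sketch. It assumes a message object with `data`, `ack()` and `nack()`, which is the shape the Python Pub/Sub client passes to a streaming-pull callback; make_callback, is_valid and handle are hypothetical names introduced for illustration.

```python
import json


def make_callback(is_valid, handle):
    """Build a subscriber callback that nacks anything failing validation.

    is_valid: hypothetical function taking the decoded payload, returning bool.
    handle:   hypothetical function that processes a valid payload.
    """
    def callback(message):
        try:
            payload = json.loads(message.data)
        except ValueError:
            # Not JSON at all (covers decode errors too): leave it unacknowledged.
            message.nack()
            return
        if not isinstance(payload, dict) or not is_valid(payload):
            message.nack()
            return
        handle(payload)
        message.ack()
    return callback
```

Note that a nacked message is redelivered until the subscription's maximum delivery attempts are used up, so invalid messages are retried several times before they land in the dead-letter topic.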

Limited Schema Enforcement:

  • Use message attributes to store schema version or hash information, allowing subscribers to filter or validate messages based on schema compatibility.

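  • Subscription filters match message attributes only, not the JSON body. If publishers also set your schema's mandatory fields as attributes, you can use a subscription filter to keep messages missing those attributes from being delivered.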
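To make the schema version/hash attribute idea concrete, here is a small Python sketch. The attribute names schemaVersion and schemaHash and both function names are assumptions chosen for illustration, not Pub/Sub conventions. The hash is taken over a canonical serialization so that key order in the schema does not change it.

```python
import hashlib
import json


def schema_attributes(schema: dict, version: str) -> dict[str, str]:
    """Build message attributes describing the schema a publisher validated against."""
    # Sorted keys and fixed separators give the same bytes for equal schemas.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"schemaVersion": version, "schemaHash": digest}


def is_compatible(attributes, accepted_hashes) -> bool:
    """Subscriber-side check: was the message published against a known schema?"""
    return attributes.get("schemaHash") in accepted_hashes
```

A publisher would attach the returned dictionary as message attributes, and a subscriber would pass the received attributes to is_compatible before processing. This only shows which schema the publisher claims to have used; it does not prove the body conforms.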

Prioritize Client-Side Validation:

  • Validating before publishing ensures messages adhere to the correct format, preventing errors and keeping downstream processes working smoothly.

  • Invalid messages can disrupt your applications, impacting reliability.

  • Consistent message formats simplify future updates and maintenance.

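Example: Subscription Filtering

If your publishers set an attribute called eventType on every message, create a subscription filter like this:

attributes:eventType

This delivers only messages that carry the eventType attribute. Pub/Sub automatically acknowledges messages that don't match, so they never reach your subscribers. Keep in mind that filters check attributes only, not fields inside the JSON body.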
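If several attributes are mandatory, the filter can be assembled from a list of keys. The Python sketch below assumes the `attributes:KEY` existence check and the `AND` operator from Pub/Sub's filter syntax; required_attributes_filter is a hypothetical helper, and it deliberately accepts only simple identifier-like keys to avoid guessing at quoting rules.

```python
import re

# Only plain identifier-style keys are accepted in this sketch.
_SIMPLE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def required_attributes_filter(keys):
    """Return a filter string that matches messages carrying every given attribute."""
    if not keys:
        raise ValueError("at least one attribute key is required")
    for key in keys:
        if not _SIMPLE_KEY.fullmatch(key):
            raise ValueError(f"unsupported attribute key: {key!r}")
    return " AND ".join(f"attributes:{key}" for key in keys)
```

For example, required_attributes_filter(["eventType", "schemaVersion"]) produces the string attributes:eventType AND attributes:schemaVersion, which you would supply as the filter when creating the subscription.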

Hi jfbaro,

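Google Pub/Sub supports only Avro and Protocol Buffers for schema validation.
If you are comfortable defining your schema in Avro or Protocol Buffers format, you can create a Pub/Sub schema resource and associate it with your topic. When you associate the schema, you can choose JSON as the topic's message encoding, so published messages stay in JSON.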