
Pubsub Schema creation from .avsc file

  1. Can we maintain a schema registry for Pub/Sub in GCS? If so, please share an example or documentation link for Google Config Connector and Terraform.

    Can we put the .avsc files that describe our Pub/Sub schemas in Cloud Storage buckets and reference them when creating a Pub/Sub schema?

    I went through the documentation, and it looks like there is a "definition-file" attribute when doing this with the gcloud command.

    1) Gcloud Command Link : https://cloud.google.com/pubsub/docs/schemas#gcloud_1
    2) Config Connector Link : https://github.com/GoogleCloudPlatform/k8s-config-connector/blob/master/crds/pubsub_v1beta1_pubsubsc...
    https://cloud.google.com/config-connector/docs/reference/resource-docs/pubsub/pubsubschema
    3) Terraform Link : https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/pubsub_topic.html#nes...
     

Yes, you can maintain a schema registry for Pub/Sub in GCS. You can put your .avsc files which describe Pub/Sub schemas in Cloud storage buckets. However, the following are the limitations of each method:

  • gcloud command: The gcloud command does not support referencing a GCS file for the definition-file attribute. You need to download the .avsc file from GCS to your local machine before using this command.
  • Config Connector: The Config Connector documentation does not explicitly mention the ability to point the definitionFile field to a GCS bucket. However, it has been tested and confirmed to work.
  • Terraform: The Terraform documentation for Pub/Sub schemas does not mention the definition_file attribute. It has also been tested and confirmed to not work.
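The gcloud route described in the first bullet (download the file, then pass a local path) can be sketched as a small helper that builds the two commands. This is only an illustration: the function name, the bucket path, and the schema ID are hypothetical, and it assumes gsutil and gcloud are installed and authenticated.

```python
import shlex


def schema_create_commands(gcs_uri, schema_id, local_path=None):
    """Build the two commands: copy the .avsc from GCS, then create the schema.

    gcs_uri and schema_id are placeholders supplied by the caller.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI, got: " + gcs_uri)
    # Default to the object's base name in the current directory.
    local_path = local_path or gcs_uri.rsplit("/", 1)[-1]
    download = ["gsutil", "cp", gcs_uri, local_path]
    create = [
        "gcloud", "pubsub", "schemas", "create", schema_id,
        "--type=avro",
        f"--definition-file={local_path}",
    ]
    return [download, create]


if __name__ == "__main__":
    # Print the commands for review; to execute them, pass each list to
    # subprocess.run(cmd, check=True) instead.
    for cmd in schema_create_commands("gs://my-bucket/my-schema.avsc", "my-schema"):
        print(shlex.join(cmd))
```

Keeping the commands as lists avoids shell quoting problems if they are later run with subprocess.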

Therefore, the only method that can be used to maintain a schema registry for Pub/Sub in GCS is Config Connector. Here is an example of how to do it:

 

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSchema
metadata:
  name: my-schema
spec:
  definitionFile: gs://my-bucket/my-schema.avsc

Can we use .avsc files from the classpath or local file system in the definitionFile attribute above in Config Connector?
What is the recommended way to refer to a .avsc file for a PubSubSchema?

The recommended way to refer to a .avsc file for Pub/Sub schema in Config Connector is to use the definitionFile attribute. This attribute takes the path to the .avsc file in a Google Cloud Storage (GCS) bucket as its value.

For example, if the .avsc file is located in the bucket my-bucket, you would specify the path to the file as follows:

definitionFile: "gs://my-bucket/my-schema.avsc"

You can also use the schema attribute to refer to a .avsc file. The schema attribute takes the contents of the .avsc file as its value.

However, this method can become unwieldy for large schemas and might not be ideal for production use.

The following are some additional points to consider:

  • The definitionFile attribute is the preferred way to refer to a .avsc file for Pub/Sub schema in Config Connector.
  • The schema attribute can be used, but it is not recommended for production use.
  • The Config Connector's definitionFile attribute expects the file to be in a GCS bucket. Directly referencing .avsc files on the classpath or local file system might not be supported.

 

Thanks for sharing the example for definitionFile attribute with Config Connector.

Can you also update the documentation at the link below with examples showing how to use the schema and definitionFile attributes?

https://cloud.google.com/config-connector/docs/reference/resource-docs/pubsub/pubsubschema

If the documentation update takes time, you could also give an example of the schema attribute for Config Connector here.


I'm unable to update the documentation myself, but I will share your feedback with the product team. I can however share the example here:

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSchema
metadata:
  name: my-schema
spec:
  definitionFile: "gs://my-bucket/my-schema.avsc"

This example uses the definitionFile attribute to specify the path to the .avsc file in the bucket my-bucket.

You can also use the schema attribute to refer to the .avsc file. The following is an example:

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSchema
metadata:
  name: my-schema
spec:
  schema: |
    {
      "type": "record",
      "name": "MySchema",
      "fields": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "age",
          "type": "int"
        }
      ]
    }

This example embeds the contents of the .avsc file in the spec.

Hi ms4446

I tried the configuration below as suggested, but we got an error.

 

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSchema
metadata:
  name: pubsubschema-sample-one
  labels:
    app.kubernetes.io/name: app-name
    clover-service-name: app-name
  annotations:
    cnrm.cloud.google.com/project-id: google-project-name
    cnrm.cloud.google.com/deletion-policy: abandon
  namespace: app-name-dev
spec:
  type: AVRO
  definitionFile: "gs://clover-pubsub-schema-dev/test_user.avsc"

 

But we get the error below.

 

PubSubSchema in version "v1beta1" cannot be handled as a PubSubSchema: strict decoding error: unknown field "spec.definitionFile"

When managing Pub/Sub schemas with Config Connector, the definitionFile attribute, intended to specify the path to an .avsc file stored in a Google Cloud Storage (GCS) bucket, is rejected. The error message "strict decoding error: unknown field 'spec.definitionFile'" means that the PubSubSchema resource does not define a definitionFile field, so the manifest fails validation before any schema is created. Consequently, users must rely on an alternative method to define Pub/Sub schemas: embedding the schema content directly using the schema attribute.

Embedding the schema content directly in the YAML configuration is a practical solution when definitionFile is not recognized. This method involves placing the schema’s JSON content directly into the spec section of the YAML file. Here’s an example:

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSchema
metadata:
  name: pubsubschema-sample-one
  labels:
    app.kubernetes.io/name: app-name
    clover-service-name: app-name
  annotations:
    cnrm.cloud.google.com/project-id: google-project-name
    cnrm.cloud.google.com/deletion-policy: abandon
  namespace: app-name-dev
spec:
  type: AVRO
  schema: |
    {
      "type": "record",
      "name": "test_user",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
      ]
    }

To apply this configuration, save the YAML content to a file (e.g., pubsubschema.yaml) and use the kubectl apply -f pubsubschema.yaml command. This approach ensures that the schema is correctly embedded and avoids issues with unrecognized attributes.

Embedding the schema content directly is better suited to smaller schemas or non-production environments. Large schemas make the YAML configuration cumbersome and may exceed size limits for Kubernetes objects. For larger or production-grade schemas, it might be necessary to manage them externally in GCS and reference them programmatically outside Config Connector.
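One way to keep the .avsc file as the single source of truth while still embedding its content is to generate the manifest from the file. The following is a minimal sketch, not an official tool: the function name and its parameters are hypothetical. The inline field name is a parameter that defaults to the `schema` key used in this thread. That default is an assumption to verify, because strict decoding rejects any key the installed CRD does not define, and the reference page linked earlier may name the inline field differently (for example `definition`). Running `kubectl explain pubsubschema.spec` should list the fields your cluster accepts.

```python
import json


def render_pubsub_schema_manifest(name, avsc_text, namespace=None,
                                  schema_field="schema"):
    """Render a PubSubSchema manifest with the Avro schema embedded inline.

    schema_field is an assumption taken from this thread; confirm the
    real field name against the CRD reference before applying.
    """
    # json.loads raises ValueError on malformed JSON, so a broken .avsc
    # file is caught here instead of at apply time.
    pretty = json.dumps(json.loads(avsc_text), indent=2)
    lines = [
        "apiVersion: pubsub.cnrm.cloud.google.com/v1beta1",
        "kind: PubSubSchema",
        "metadata:",
        f"  name: {name}",
    ]
    if namespace:
        lines.append(f"  namespace: {namespace}")
    lines += ["spec:", "  type: AVRO", f"  {schema_field}: |"]
    # Indent every JSON line so it sits inside the YAML block scalar.
    lines += ["    " + line for line in pretty.splitlines()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    avsc = ('{"type": "record", "name": "test_user", "fields": '
            '[{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}')
    print(render_pubsub_schema_manifest("pubsubschema-sample-one", avsc,
                                        namespace="app-name-dev"))
```

In practice avsc_text would come from reading the downloaded file, and the output could be written to pubsubschema.yaml and applied with kubectl as described above.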

While the definitionFile attribute offers a conceptually clean way to manage Pub/Sub schemas in GCS, its current lack of support in Config Connector necessitates using the schema attribute to embed schema content directly. This method ensures compatibility and straightforward deployment but should be carefully managed to avoid complications with larger schemas.

Hi @ms4446 ,

So, to be on the same page, is the statement below incorrect?


@ms4446 wrote:

Config Connector: The Config Connector documentation does not explicitly mention the ability to point the definitionFile field to a GCS bucket. However, it has been tested and confirmed to work.


The attribute is actually not available via Google Config Connector, right?


@ms4446 wrote:

The recommended way to refer to a .avsc file for Pub/Sub schema in Config Connector is to use the definitionFile attribute. This attribute takes the path to the .avsc file in a Google Cloud Storage (GCS) bucket as its value.

You are correct, and I apologize for any confusion. The definitionFile attribute is indeed not available or supported in Google Config Connector for Pub/Sub schemas. The correct and recommended approach for defining a Pub/Sub schema with Config Connector is to use the schema attribute. This attribute allows you to embed the schema content directly within the YAML configuration file.
 
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSchema
metadata:
  name: pubsubschema-sample-one
  labels:
    app.kubernetes.io/name: app-name
    clover-service-name: app-name
  annotations:
    cnrm.cloud.google.com/project-id: google-project-name
    cnrm.cloud.google.com/deletion-policy: abandon
  namespace: app-name-dev
spec:
  type: AVRO
  schema: |
    {
      "type": "record",
      "name": "test_user",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
      ]
    }

To apply this configuration, save the content in a file named pubsubschema.yaml and use the kubectl apply -f pubsubschema.yaml command. This method embeds the AVRO schema content directly in the configuration, ensuring compatibility and straightforward deployment.

Considerations

  • Schema Content: Ensure that the JSON content within the schema attribute is correctly formatted.
  • Resource Limits: While embedding schema content directly in the YAML file works well for smaller schemas, it can become cumbersome and potentially exceed resource limits for larger schemas.
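Both considerations can be checked before the schema is embedded. The sketch below is a hypothetical pre-flight helper, not part of any Google tooling. It assumes a record at the top level, as in the examples in this thread, and the default size budget is an illustrative placeholder rather than a documented limit, so set it to whatever limit applies in your environment.

```python
import json


def check_avro_schema(avsc_text, max_bytes=256 * 1024):
    """Return a list of problems found in an Avro schema string.

    max_bytes is an illustrative budget (an assumption), not an
    official Pub/Sub or Kubernetes limit.
    """
    problems = []
    size = len(avsc_text.encode("utf-8"))
    if size > max_bytes:
        problems.append(f"schema is {size} bytes, over the {max_bytes}-byte budget")
    try:
        schema = json.loads(avsc_text)
    except json.JSONDecodeError as exc:
        return problems + [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    # Structural checks only; this is not a full Avro validator.
    if not isinstance(schema, dict) or schema.get("type") != "record":
        return problems + ["top-level schema is not a record"]
    if not schema.get("name"):
        problems.append('record has no "name"')
    fields = schema.get("fields")
    if not isinstance(fields, list):
        return problems + ['"fields" is missing or not a list']
    for i, field in enumerate(fields):
        if not isinstance(field, dict) or "name" not in field or "type" not in field:
            problems.append(f'field {i} needs both "name" and "type"')
    return problems


if __name__ == "__main__":
    sample = ('{"type": "record", "name": "test_user", "fields": '
              '[{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}')
    print(check_avro_schema(sample) or "no problems found")
```

An empty list means the basic checks passed. A full Avro library would still be needed to validate types, defaults, and nested records.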