While reviewing the Google Cloud Pub/Sub schema registry docs: https://cloud.google.com/pubsub/docs/schemas#go
It isn't clear how Google manages schema versioning or schema evolution. For example, it doesn't seem possible to edit a schema. Does that mean that any additive change always requires a new version? If so, how would the team recommend that we tag the version of the schema?
After a schema is associated with a topic, you cannot update the schema or remove its association with that topic. If a schema is deleted, publishing to associated topics will fail. Schemas in Pub/Sub aren't designed to be updated or modified; rather, they are used to enforce a particular structure on all messages published to a topic. If your schema were to change, you would need to create a new topic and attach the updated schema to it.
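Until schemas can be updated in place, one workaround implied by the answer above is to treat every schema change as a new major version with its own schema and topic. Below is a minimal Go sketch of one possible way to tag those versions. The `-v<N>` suffix and the helpers `versionedID`, `parseVersionedID`, `topicName` and `schemaName` are hypothetical conventions, not part of the Pub/Sub API; only the `projects/{project}/topics/{topic}` and `projects/{project}/schemas/{schema}` resource-name formats come from Pub/Sub itself.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionedID appends a major-version suffix to a base resource ID,
// e.g. ("orders", 2) -> "orders-v2". Hypothetical convention only;
// Pub/Sub does not enforce or interpret it.
func versionedID(base string, major int) string {
	return fmt.Sprintf("%s-v%d", base, major)
}

// parseVersionedID splits "orders-v2" back into ("orders", 2).
func parseVersionedID(id string) (string, int, error) {
	i := strings.LastIndex(id, "-v")
	if i <= 0 {
		return "", 0, fmt.Errorf("%q has no -v<N> suffix", id)
	}
	major, err := strconv.Atoi(id[i+2:])
	if err != nil || major < 1 {
		return "", 0, fmt.Errorf("%q has an invalid version suffix", id)
	}
	return id[:i], major, nil
}

// topicName and schemaName build full Pub/Sub resource names.
func topicName(project, id string) string  { return "projects/" + project + "/topics/" + id }
func schemaName(project, id string) string { return "projects/" + project + "/schemas/" + id }

func main() {
	// Each breaking change gets a new schema and a new topic with the same suffix.
	id := versionedID("orders", 2)
	fmt.Println(topicName("my-project", id))  // projects/my-project/topics/orders-v2
	fmt.Println(schemaName("my-project", id)) // projects/my-project/schemas/orders-v2
}
```

Using the same suffix on the schema and the topic keeps the pairing obvious when consumers have to migrate subscriptions to the next version.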
Pub/Sub product manager here. @jredl we are trying to figure out how evolution should work. Any feedback about your hopes and requirements on this would be appreciated. An easy way to submit this privately is through the feedback link in the Console (the question-mark icon in the upper-right corner).
Hi @KirTitievsky,
We'd like to see it work similarly to what Avro schema registries (e.g. Confluent's) call full transitive compatibility. So a change can be made to a schema, but only if that change is fully compatible with all data that has been sent before with previous schemas.
This would allow publishers to evolve their schema to a degree, whilst also allowing consumers to be confident building on the topic knowing the schema won't introduce breaking changes.
It will (presumably) also be easier for you to implement, knowing that features like replaying data will continue to work, because no matter what schema is being used the data will be compliant.
For big changes to the schemas, I think it's OK for us to have some migration path to a new topic, with new subscriptions, etc., that the consumers can then be migrated to, similar to a big change in an API and a new version. But having to go through that effort simply to add a new field is too prohibitive and is probably going to be a blocker for us making use of PubSub schemas, unless we can find some workaround (though we can't think of any yet).
That would be a shame, as we want to start treating our data like our APIs, with schemas, versions, etc. We do want our PubSub feeds to be associated with a schema, but right now PubSub schemas are just too prohibitive, so we're having to look at alternatives/workarounds. We're just not going to come up with a perfect schema on day 1 🙂
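To make the requested behaviour concrete, here is a deliberately simplified Go model of full transitive compatibility: a candidate schema is accepted only if it can read data written with every earlier version (backward) and every earlier version can read data written with it (forward). The `Field` and `Schema` types and the function names are hypothetical, and the rules ignore parts of real Avro schema resolution such as type promotion, unions, aliases and nested records.

```go
package main

import "fmt"

// Field is a simplified Avro-style record field: a primitive type name and
// whether the field declares a default value.
type Field struct {
	Type       string
	HasDefault bool
}

// Schema maps field names to fields.
type Schema map[string]Field

// canRead reports whether data written with writer can be decoded with reader:
// every reader field must exist in the writer with the same type, or have a
// default to fall back on. Writer fields unknown to the reader are skipped.
func canRead(reader, writer Schema) bool {
	for name, rf := range reader {
		wf, ok := writer[name]
		switch {
		case ok && wf.Type != rf.Type:
			return false // type changed
		case !ok && !rf.HasDefault:
			return false // reader needs a value the writer never sent
		}
	}
	return true
}

// fullTransitive reports whether candidate is backward and forward compatible
// with every schema in history, not just the most recent one.
func fullTransitive(candidate Schema, history []Schema) bool {
	for _, old := range history {
		if !canRead(candidate, old) || !canRead(old, candidate) {
			return false
		}
	}
	return true
}

func main() {
	v1 := Schema{"id": {Type: "string"}}
	// Adding a field with a default is fully compatible.
	v2 := Schema{"id": {Type: "string"}, "note": {Type: "string", HasDefault: true}}
	// Adding a field without a default breaks backward compatibility.
	v3 := Schema{"id": {Type: "string"}, "qty": {Type: "int"}}

	fmt.Println(fullTransitive(v2, []Schema{v1}))     // true
	fmt.Println(fullTransitive(v3, []Schema{v1, v2})) // false
}
```

Under these rules, adding or removing fields that carry defaults is allowed, while adding a field without a default or changing a field's type is rejected, which matches the "add a new field without a new topic" case described above.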
I was thinking along the same lines. Changes that are forward compatible according to the schema framework (currently Protobuf or Avro) should not necessitate new topics, but breaking changes absolutely should.
Today my worry is that if I were to use Pub/Sub schemas, I'd end up having to build a system around them that creates a new topic whenever a schema changes in a forward-compatible way and republishes the messages into a stable subscription for subscribers, because the schemas simply evolve too rapidly as developers iterate on business/customer problems.
As it stands, I would only consider using the schema feature for very well-known schemas that are probably orthogonal to business logic and unlikely to evolve. There are definitely a few of these, but I don't believe they represent the majority case in our usage of Pub/Sub (3500+ topics).
@jredl That makes sense. Very helpful definition from Confluent there. I think full transitive is achievable. Any thoughts on the value of:
- Schema validation server side
- Does evolution mean that a new version is added to the schema attached to the topic, or that a new schema resource, guaranteed to be compatible with the previous one, is attached to the topic?
OK, that would be great, thanks!
> Schema validation server side
Sorry, by server-side validation, do you mean validation of the schemas by the applications, i.e. the publishers to the topic and the consumers of the subscription? That's one of the workarounds we are looking at, but it's tricky because they could be written in any language, and we would introduce a single point of failure on a registry (we currently use an Avro registry, although we may be able to use GCP Data Catalog, which would help alleviate that concern). It would also be much less strict than having validation done by PubSub, and it wouldn't work with things like Dataflow SQL.
> Does evolution mean that a new version is added to the schema attached to the topic, or that a new schema resource, guaranteed to be compatible with the previous one, is attached to the topic?
Yes. I think ideally there would be a version associated with a schema. Again, similar to how an Avro Registry might do it.
So maybe if I run `gcloud pubsub schemas create my-schema` and a schema called `my-schema` already exists in that project, a new version of that schema is saved, and any PubSub topics associated with that schema start using the new version.
Running `gcloud pubsub schemas describe my-schema` could then list the version history and useful metadata, such as when each version was created and by whom.
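The proposed `create`/`describe` behaviour amounts to an append-only version history per schema name. The following Go sketch is purely illustrative: `Registry`, `Create`, `Latest` and `Describe` are hypothetical names, not the Pub/Sub client library or `gcloud`, and a real implementation would also run a compatibility check before accepting a new version.

```go
package main

import (
	"fmt"
	"time"
)

// Version is one immutable revision of a named schema.
type Version struct {
	Number     int
	Definition string
	CreatedBy  string
	CreatedAt  time.Time
}

// Registry models the proposal: "creating" an existing schema name appends
// a new version instead of failing. Hypothetical, in-memory only.
type Registry struct {
	schemas map[string][]Version
	now     func() time.Time
}

func NewRegistry() *Registry {
	return &Registry{schemas: map[string][]Version{}, now: time.Now}
}

// Create stores def as the next version of name and returns its number.
// A compatibility check against earlier versions would go here.
func (r *Registry) Create(name, def, user string) int {
	n := len(r.schemas[name]) + 1
	r.schemas[name] = append(r.schemas[name], Version{
		Number: n, Definition: def, CreatedBy: user, CreatedAt: r.now(),
	})
	return n
}

// Latest returns the version that topics bound to name would validate against.
func (r *Registry) Latest(name string) (Version, bool) {
	vs := r.schemas[name]
	if len(vs) == 0 {
		return Version{}, false
	}
	return vs[len(vs)-1], true
}

// Describe returns a copy of the version history, oldest first.
func (r *Registry) Describe(name string) []Version {
	return append([]Version(nil), r.schemas[name]...)
}

func main() {
	r := NewRegistry()
	r.Create("my-schema", `{"type":"record","name":"R","fields":[]}`, "alice")
	r.Create("my-schema", `{"type":"record","name":"R","fields":[{"name":"id","type":"string"}]}`, "bob")
	for _, v := range r.Describe("my-schema") {
		fmt.Printf("v%d by %s at %s\n", v.Number, v.CreatedBy, v.CreatedAt.Format(time.RFC3339))
	}
}
```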
Hi @KirTitievsky,
Did my replies make sense?
Do you feel this is something you are likely to work on, and if so do you have an idea when you might look at this (i.e. is it weeks, months, quarters away)?
We're actively looking at applying schemas to our PubSub data and would love to get an idea if this is something we can expect to have in the future, or should we invest in alternatives (i.e. maybe not using PubSub schemas, maybe using a registry with PubSub, maybe looking more seriously at Kafka and its ecosystem, etc).
Thanks.
@andrewjones Apologies for the silence. Yes, your answer makes sense. We are actively working on the design for this. I hope to have this out in GA within 3-6 months, but at the moment I cannot make a firm commitment.
A major usability question we are working through now is whether it is better to have schemas be immutable and have topics manage versioning, as an alternative to what you proposed. Meaning, if you wanted to evolve mySchema_v0.1 associated with myTopic, you would create mySchema_v0.2 as a distinct object and associate it with the same topic. The topic would then have a history of the schemas it has supported and would enforce backward compatibility between mySchema_v0.2 and mySchema_v0.1. What would be the implications of this version of evolution for your designs?
If you'd be open to discussing this live, please send me a note at kir@google.com
That's great, so excited that this feature is coming!
Yeah that sounds great, and I think that works just as well for our use case.
@jredl @dwalker-va Might you be open to discussing your views on this live? If so, send me a note at kir@google.com. I'd be grateful for a deeper discussion.
@KirTitievsky Looking forward to seeing this feature as well. Do you mind keeping this thread up to date as you get a better feel for the timing?
@KirTitievsky are there any updates on this topic?
We want to use Google PubSub and Avro schemas for our future architecture. Currently we have the same questions as @andrewjones and @jredl.
@KirTitievsky Hi! Just wanted to bump @Brotzka 's message up to see if there were any updates on this?
Hey, folks. The original timeline did not pan out. Tentative ETA is mid-2022.
Thanks for the response @KirTitievsky. That's disappointing. I realise you can't give a firm date and don't know what other priorities pushed this back, but will this be a priority for you and your team next year? If not, and you think this will be pushed back again, we will need to discuss moving to another solution, as this is a pretty large inconvenience for our teams.
@mkmcdo would you be so kind as to reach out to cloud-pubsub@google.com directly so we can discuss this in detail?
Hi @KirTitievsky - I sent a message to the email you linked to last month but haven't received a response yet. Happy to still chat about this as it's still an issue for our customers and could be a blocker for pubsub to be a long term solution for us.
Thanks,
Megan
Are Pub/Sub Schemas useful for website development?
Hey @KirTitievsky , hope you're well.
Just wondering if there has been any updates on this feature, and if there is a rough ETA you can share?
We've continued to build on Pub/Sub Schemas and it's going well, but the lack of compatible updates to schemas does mean that every schema change requires quite a big migration, which is quite a lot of effort if (say) you're just adding a new field. It has the potential to slow our teams down and/or make them reluctant to structure their data in the first place.
If it helps to have more context on what we're doing with Pub/Sub Schemas, I talk a bit about our usage of them on an episode of the Data Mesh Radio podcast here: https://daappod.com/data-mesh-radio/early-learnings-from-data-contracts-andrew-jones/.
Thanks,
Andrew
Hi Everyone,
David here, Cloud Pub/Sub Technical Lead Manager. We are actively working on the implementation of schema updates (a.k.a. Schema Evolution) and are hoping to have a preview version for you in Q2.
Thanks,
That's great to hear, thanks for update!
Do you have any hints about whether there will be any differences between Protobuf and Avro?
Hi @davidtorres ! I hope you had a good weekend 🙂
Are there any updates on timing or specifics on when in Q2 the preview will be ready? Really looking forward to seeing what we have. Thanks!
Hi,
No specific date; at this point, nothing earlier than the end of the quarter.
Best,
- David
Hi @davidtorres - Hope your week is going well! Do you have a firmer date yet on when you can share schema evolution updates?
Hey
Are there any updates on the schema update?
Really looking forward to it
@davidtorres any update on the preview?
Hello @fortoajit ,
This is Prateek, Product Manager on Cloud Pub/Sub. Schema Evolution is actively being worked on, and will be available in preview in Q4, with GA planned for H1 2023. Thanks.
@prateekduble Is it possible for customers to get insight into the roadmap? We are really looking forward to having this feature.
Hi @fortoajit ,
Can you please elaborate on what you mean by getting insights of roadmap for customers? Are you looking for H2 roadmap for Cloud Pub/Sub, or more details on how this feature would work? Thanks.
Hi @prateekduble, we are looking for the H2 roadmap for Cloud Pub/Sub to follow upcoming features. May I ask if there is a Google Group where we can follow the issues and features currently being tested?
Hi @fortoajit , We are planning to launch BigQuery subscriptions (GA most likely by end of July 2022) and Exactly Once Delivery GA (Q4 2022). We are also planning to launch Schema Evolution in preview in Q4, with GA planned for H1 2023. Hope that helps.
Please let me know if you have any questions. Thanks.
Hi @prateekduble ,
I'm very excited that schema evolution in Pub/Sub has been planned. Any update on when we can expect a preview of this feature?
Hi @spatel13 ,
We are actively working on it and can share the user guide if you reach out to us through your Google Cloud representative. We are planning to have the feature GA in early H1 2023.
Hi @prateekduble ,
Has the schema evolution feature in Pub/Sub already rolled out?
Hi @PraveenRam ,
We are targeting a GA launch by the end of March. I will update here once we have launched it. Thanks for reaching back out.
Saw the changes in the client library; very excited.
Thanks for all the updates, @prateekduble!