Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

GCP PubSub asynch Publish, long delays in running callback

We are running Microservices on GKE and we subscribe and publish messages to PubSub asynchronously. We are seeing huge delays executing the callbacks where the P99 is in the order of 6-10s irrespective of the load.

Each of our GKE nodes are provisioned with 2 CPUs and 2GB memory. The CPU and memory usage is < 25% even under high loads.

My publish code

 

  private void publish(
      final ProcessingContext context,
      final PubsubMessage message,
      final AckReplyConsumer consumer,
      final TopicName targetTopic,
      ExecutorService callbackExecutor,
      PublisherFactory publisherFactory) {

    // Get Publisher from Cache
    // This will be a persistent publisher
    Supplier<ApiFuture<String>> futureSupplier =
        () -> publisherFactory.createPublisher(String.valueOf(targetTopic)).publish(message);

    long ts = System.nanoTime();
    ApiFuture<String> future = context.execPublishCallForMessage(futureSupplier);
    long publishCallLatency = System.nanoTime() - ts;
    log.info("Success publishCallLatency: {}", publishCallLatency);

    final long publishBeginTs = System.nanoTime();
    ApiFutures.addCallback(
        future,
        new ApiFutureCallback<>() {

          @Override
          public void onSuccess(String result) {
            long publishAndCallbackLatency = System.nanoTime() - publishBeginTs;
            log.info("Success publishAndCallbackLatency: {}", publishAndCallbackLatency);
            callbackExecutor.execute(consumer::ack);
          }

          @Override
          public void onFailure(Throwable t) {
            long publishAndCallbackLatency = System.nanoTime() - publishBeginTs;
            log.info("Failed publishAndCallbackLatency: {}", publishAndCallbackLatency);
            callbackExecutor.execute(consumer::nack);
          }
        },
        MoreExecutors.directExecutor());
  }

 

 

The publishCallLatency is low in the order of 5-10 ms but the publishAndCallbackLatency is very high in the order of 6-10s.

I have flow control and batching enabled. But I'm unable to figure why it takes such a long time for the callback to execute. I tried increasing the publisher executor threads and ran profiler but most of the executor threads are idle. I'm trying to reduce this latency and can't explain this latency.

Any ideas?

 

0 6 1,171
6 REPLIES 6