Redis Memorystore sorted set data loss

I am testing various features of GCP while my company ramps our product up to production. This evening I found that a sorted set in Redis seems to be losing data.

I have a sorted set being updated with additional scores twice an hour (adding two values and scores each hour). Today I was in the CLI, ran ZRANGEBYSCORE from -inf to +inf, and got back 146 members. An hour later I ran the same command and again got back 146 members, which made me think the set was not being updated, but on closer inspection it was. The new members were there, and two members from the middle of the set had simply disappeared.

My code has no deletes or removals from the set, so I am at a loss as to what is going on. I can only guess it has something to do with the snapshotting (I am persisting once an hour), but this could be completely unrelated.

I have used Redis Enterprise in other projects without any such issues. 

Does anyone know of any bugs / limitations / gotchas which could cause something like this to happen?

Thanks in advance,

Tom



In Google Cloud Memorystore, there are a few things that could be causing data loss in your Redis sorted set:

  • Improper persistence configuration: If you are using snapshots, make sure to take them frequently enough to avoid data loss. If you are using AOF, make sure to flush the append-only file regularly.
  • Exceeding memory limits: If Redis's memory usage reaches its max limit, it could trigger evictions based on the set policy.

Here are some additional things you can check:

  • Make sure that your code is not accidentally deleting or removing elements from the sorted set.
  • Check the Redis logs to see if there are any errors or warnings.
  • Try reproducing the issue in a test environment.

Here are some additional tips for using Redis sorted sets in Google Cloud Memorystore:

  • Use a consistent persistence strategy. Both RDB snapshots and AOF have their strengths and weaknesses, but neither should result in data loss if used correctly. It is important to understand the nuances of each and choose a persistence strategy based on your needs.
  • Monitor your Redis instance. Use the Redis INFO command to check the memory usage, eviction policy, and other important metrics.
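
As a rough illustration of the monitoring tip above, the sketch below parses the plain-text output of the INFO command and pulls out the memory and eviction fields. The sample text is made up for this example (only the errorstat line mirrors a value quoted later in the thread), and `parse_info` and `memory_usage_ratio` are hypothetical helper names, not part of any Redis client.

```python
def parse_info(text):
    """Parse INFO-style 'key:value' lines into a dict of strings.

    Section headers (lines starting with '#') and blank lines are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_usage_ratio(info):
    """Return used_memory / maxmemory, or None when no limit is configured."""
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:
        return None
    return int(info["used_memory"]) / limit


# Hypothetical sample: a 2 GiB limit with about 4 MiB in use.
SAMPLE_INFO = """\
# Memory
used_memory:4194304
maxmemory:2147483648
maxmemory_policy:volatile-lru

# Stats
evicted_keys:0
expired_keys:0

# Errorstats
errorstat_ERR:count=3670
"""

info = parse_info(SAMPLE_INFO)
print("policy:", info["maxmemory_policy"])
print("evicted keys:", info["evicted_keys"])
print("memory used: {:.2%}".format(memory_usage_ratio(info)))
```

If `evicted_keys` and `expired_keys` both stay at zero while members go missing, eviction and expiry are unlikely to be the cause, and the application's write path becomes the more likely suspect.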

Thank you for your reply

  • Improper persistence configuration: If you are using snapshots, make sure to take them frequently enough to avoid data loss. If you are using AOF, make sure to flush the append-only file regularly.

    It was my understanding that Memorystore for Redis only supports snapshots. This is currently set to 1 hour, which is the minimum configurable frequency.

  • Exceeding memory limits: If Redis's memory usage reaches its max limit, it could trigger evictions based on the set policy.

    The DB capacity is 2 GB; however, we are using only a few MB of that.

  • Make sure that your code is not accidentally deleting or removing elements from the sorted set.

    We have no delete calls at all. The code only appends to this key, so there isn't much to it.

  • Check the Redis logs to see if there are any errors or warnings.

    This is interesting: the INFO output shows the following, but gives no context as to what is causing the errors, and of course because it's a managed instance we have no access to the logs.

    # Errorstats
    errorstat_ERR:count=3670

  • Try reproducing the issue in a test environment.

    We can, but it's such a fundamental and basic issue that it seems it wouldn't tell us much.

  • Other

    We have been using the monitoring and nothing looks odd there. There are no evictions, and as this is a set I don't think eviction would remove individual members anyway.

    We also don't have a support package to ask Google.



If you don't have a support package with Google Cloud, it can be a bit challenging to get direct help, but there are still a few steps you can take:

  1. Redis GitHub Repository: If you believe it's a bug in Redis itself, you can create an issue on the Redis GitHub repository: https://github.com/redis/redis. While Memorystore is a managed service from Google Cloud, the underlying software is Redis. Remember to include as much detail as possible.
  2. Document & Monitor: Keep a log of every instance of data discrepancy. Note down times, operations preceding the event, and any other potentially relevant information. This record will be helpful for both your internal team and any external help you might seek. You could also consider using a time series database to store these logs, which would make it easier to track and analyze trends.
  3. Alternative Redis Deployment: As a temporary measure, you can consider deploying a Redis instance yourself on a Compute Engine VM on Google Cloud or using another cloud provider. This way, you'll have more control over configurations and logs, which could help in diagnosing the issue.
  4. Backup Strategy: Ensure you have a robust backup strategy in place. If you're adding data every hour, consider taking backups just as frequently, if possible. This might not prevent data loss but can minimize its impact.
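
One way to approach the "Document & Monitor" step is to capture the full contents of the sorted set at intervals and compare consecutive captures. The sketch below is only an illustration: `diff_snapshots` is a hypothetical helper, the sample data is invented, and it assumes each snapshot is a list of (member, score) pairs, such as a client might return for a full-range query with scores included.

```python
def diff_snapshots(before, after):
    """Compare two (member, score) snapshots of the same sorted set.

    Returns the members that vanished, the members that appeared, and the
    members present in both snapshots whose score changed.
    """
    before_map = dict(before)
    after_map = dict(after)
    removed = sorted(m for m in before_map if m not in after_map)
    added = sorted(m for m in after_map if m not in before_map)
    rescored = sorted(
        m for m in before_map
        if m in after_map and before_map[m] != after_map[m]
    )
    return {"removed": removed, "added": added, "rescored": rescored}


# Invented data: "b" keeps its name but its score changes, and "d" is new.
earlier = [("a", 1.0), ("b", 2.0), ("c", 3.0)]
later = [("a", 1.0), ("c", 3.0), ("b", 9.0), ("d", 10.0)]

report = diff_snapshots(earlier, later)
print(report)
```

The "rescored" bucket is worth watching: a member whose score changed has moved position in the set, which can look like a deletion from the middle when only the ordered output is inspected.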

I understand this is a challenging situation. Navigating issues without direct support can be tough.

Thanks @ms4446, I have found the issue.

For posterity, and for anyone seeing the same issue: check the ordering of your score and value. You may have them the wrong way round, as I did 🤔
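
To show why swapped arguments can look like data loss, here is a minimal in-memory sketch of sorted set semantics. Members are unique, and adding an existing member only updates its score. `SortedSetModel` is a toy stand-in written for this example, not a Redis client, and the sample data is hypothetical. It assumes the intended member is a unique sample id and the intended score is a measurement that can repeat, which is one plausible reading of the problem above.

```python
class SortedSetModel:
    """Toy model of a single Redis sorted set (illustration only)."""

    def __init__(self):
        self._scores = {}  # member -> score; each member appears once

    def zadd(self, score, member):
        """Mimic ZADD's argument order (score, then member).

        Returns 1 if the member is new, 0 if an existing member was rescored.
        """
        is_new = member not in self._scores
        self._scores[member] = float(score)
        return 1 if is_new else 0

    def zrange_withscores(self):
        """Return (member, score) pairs ordered by score, then member."""
        return sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical samples: (unique sample id, measurement). 12.5 occurs twice.
samples = [("1700000000", 12.5), ("1700001800", 13.0), ("1700003600", 12.5)]

correct = SortedSetModel()
for sample_id, measurement in samples:
    correct.zadd(score=measurement, member=sample_id)

swapped = SortedSetModel()
for sample_id, measurement in samples:
    # Arguments the wrong way round: the measurement becomes the member.
    swapped.zadd(score=sample_id, member=str(measurement))

print(len(correct.zrange_withscores()))  # 3: every sample id is unique
print(swapped.zrange_withscores())
# [('13.0', 1700001800.0), ('12.5', 1700003600.0)]
```

In the swapped case, the third write does not add anything. It rescored the existing member "12.5", which then moved from the front of the ordering to the end. The member count stays flat even though new writes arrive, and earlier entries appear to vanish from the middle of the set.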