Hi all,
I am facing some degradation using the KeyValueMapOperations policy while putting and retrieving values from the KVM (and its cache). I am using Apigee Edge hybrid.
I use the KVM to save a sessionId when a user performs a successful log-in (a JWT containing the sessionId is generated), and I retrieve it when the user invokes a backend service in order to match its value against the one contained in the JWT. This is a simple way to allow a user only one live session: a new session "steals" the old one.
Basically I have this KVM operation on the response PostFlow of the API proxy, after the user is successfully logged in, to save the sessionId into the KVM:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<KeyValueMapOperations enabled="true" name="KVMop-postSessionID" mapIdentifier="sessionCache" async="false" continueOnError="false">
  <DisplayName>KVMop-postSessionID</DisplayName>
  <Properties/>
  <ExclusiveCache>false</ExclusiveCache>
  <ExpiryTimeInSecs>21600</ExpiryTimeInSecs>
  <Put override="true">
    <Key>
      <Parameter ref="username"/>
    </Key>
    <Value ref="code"/>
  </Put>
  <Scope>environment</Scope>
</KeyValueMapOperations>
And when the user tries to call a service, on the request PreFlow I retrieve the sessionId from the KVM (using this policy) and then perform the check I mentioned before:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<KeyValueMapOperations continueOnError="false" enabled="true" name="KVMop-getSessionID" mapIdentifier="sessionCache">
  <DisplayName>KVMop-getSessionID</DisplayName>
  <Properties/>
  <Get assignTo="sessionID">
    <Key>
      <Parameter ref="jwt.username"/>
    </Key>
  </Get>
</KeyValueMapOperations>
The problem: sometimes users fail the "JwtSessionId = KvmSessionId" check when they legitimately call a service right after a login. What I see in a debug session is that the Put KVM operation and the JWT write both succeed, but when the Get KVM operation retrieves the sessionId for that user, it gets an old value instead of the fresh one. It is as if there were some latency in writing the sessionId, or as if the Put operation returned a false positive (the policy reports success, but the value is never written to the KVM).
The KVM storing the sessionIds is not small: it has around 2k entries. But even purging it didn't solve the sporadic errors.
Another strange thing: after the login, some services are called in parallel, and some of them fail the check while others don't, as if there were an underlying KVM or KVM-cache layer with some structures working and others not. That's just an assumption, though; I don't know the Apigee architecture very well.
I have also found this on https://cloud.google.com/apigee/docs/release/known-issues :
@dchiesa1 @anilsagar maybe do you have any clues or hints?
Many thanks if you can help me or analyze my problem 🙂
Hi Perry,
Thanks for reaching out about this. Session-ID management can have some complexities.
First, I recommend looking at the Cache policies in place of the KVM policies. Whether you choose to change will depend on your requirements. The reason I mention this is that cached items (with expiration times) are automatically cleaned up from storage, while a KVM entry exists until it is explicitly deleted. I don't know from the description whether the usernames change or grow in number frequently; I see the reference to 2k entries, so this may not yet be a concern. If they do, it may be worth changing to help reduce storage bloat.
Additionally, changing to a cache-based model allows the session to time out in the database as well as in the JWT, adding a bit more security to the system. The cache's expiration can be set to the time the JWT expires.
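As a sketch, the cache-based replacement could look something like the following, using the PopulateCache and LookupCache policies. The policy names, the Exclusive scope, and the 21600-second TTL are illustrative assumptions carried over from your KVM config, not tested configuration:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PopulateCache enabled="true" name="PC-postSessionID" continueOnError="false">
  <DisplayName>PC-postSessionID</DisplayName>
  <!-- Build the cache key from the username, as the KVM Put did -->
  <CacheKey>
    <KeyFragment ref="username"/>
  </CacheKey>
  <Scope>Exclusive</Scope>
  <!-- Align with the JWT lifetime so the stored session expires with the token -->
  <ExpirySettings>
    <TimeoutInSec>21600</TimeoutInSec>
  </ExpirySettings>
  <!-- Variable holding the sessionId, mirroring the KVM <Value ref="code"/> -->
  <Source>code</Source>
</PopulateCache>

And on the request PreFlow, the lookup that replaces the KVM Get:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<LookupCache enabled="true" name="LC-getSessionID" continueOnError="false">
  <DisplayName>LC-getSessionID</DisplayName>
  <CacheKey>
    <KeyFragment ref="jwt.username"/>
  </CacheKey>
  <Scope>Exclusive</Scope>
  <!-- Assign the cached sessionId to the same variable your check already uses -->
  <AssignTo>sessionID</AssignTo>
</LookupCache>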
Specifically to your questions, however, I have a few thoughts that may apply to the described situation.
From what has been described, I think you are running into #3. The user may be re-logging in, or something is causing a new JWT to be created. When the new JWT is created, the pod that processes it will know about it. However, any other pods that still have the old value in their local cache will not know there is a new one until that cache entry expires.
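If you stay on the KVM policies, one mitigation for that per-pod staleness is to set an explicit <ExpiryTimeInSecs> on the Get policy, so each message processor refreshes its cached copy of the entry more often. The 30-second value below is an illustrative assumption, trading more KVM reads for a shorter staleness window:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<KeyValueMapOperations continueOnError="false" enabled="true" name="KVMop-getSessionID" mapIdentifier="sessionCache">
  <DisplayName>KVMop-getSessionID</DisplayName>
  <Properties/>
  <!-- Refresh this message processor's cached copy every 30 seconds
       (illustrative value) instead of relying on the default interval -->
  <ExpiryTimeInSecs>30</ExpiryTimeInSecs>
  <Get assignTo="sessionID">
    <Key>
      <Parameter ref="jwt.username"/>
    </Key>
  </Get>
  <Scope>environment</Scope>
</KeyValueMapOperations>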
So there are a couple of approaches that could help.
Cheers,