Yes.
The components of BaaS (Usergrid) 2.1 are as follows:
Tomcat – Application Logic
Cassandra – Persistent Data Storage and Graph data
ElasticSearch – Indexed entities in flattened JSON
When entities (documents) are written into BaaS with PUT/POST operations, they are persisted immediately and synchronously to Cassandra. A representation of the entity for indexing purposes is also written to Cassandra, and a reference to the document to be indexed is placed in a queue. The entity data itself does not go through this queue, only a UUID reference. We expect indexing to complete in <=50ms. While the document is not yet indexed, it can still be retrieved with GET /{org}/{app}/{collection}/{uuid | name}. The UUID of the entity is returned in the PUT/POST API response.
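As a rough illustration of the write-then-read-by-UUID flow described above, here is a minimal Python sketch. The base URL, org, app, and collection names are placeholders, authentication is omitted, and the response shape (an "entities" array whose first item carries "uuid") is an assumption to check against your installation.

```python
import json
import urllib.request
from urllib.parse import quote


def _join(base_url, *parts):
    # Percent-encode each path segment and join it onto the base URL.
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def collection_url(base_url, org, app, collection):
    # Target for POST: /{org}/{app}/{collection}
    return _join(base_url, org, app, collection)


def entity_url(base_url, org, app, collection, uuid_or_name):
    # Direct lookup: /{org}/{app}/{collection}/{uuid | name}
    return _join(base_url, org, app, collection, uuid_or_name)


def create_entity(base_url, org, app, collection, body):
    # POST the entity and return its UUID.
    # Assumption: the response JSON looks like {"entities": [{"uuid": "..."}]}.
    req = urllib.request.Request(
        collection_url(base_url, org, app, collection),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["entities"][0]["uuid"]


def get_entity(base_url, org, app, collection, uuid_or_name):
    # GET by UUID or name; this lookup does not depend on the search index.
    with urllib.request.urlopen(
        entity_url(base_url, org, app, collection, uuid_or_name)
    ) as resp:
        return json.load(resp)
```

Calling create_entity(...) and then get_entity(...) with the returned UUID should work even before the entity has been indexed, whereas a QL query may not return the entity until indexing has caught up.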
There are two options for this internal indexing queue:
In Memory – Messages produced by a given Tomcat are visible to, and therefore processed by, only that Tomcat
Amazon SQS and SNS – All Tomcats are eligible to receive a message
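To make the difference in message visibility concrete, here is a toy Python model of the two queue options. It is only an illustration: the class and method names are made up, and it is neither the BaaS implementation nor the AWS API.

```python
from collections import deque


class InMemoryQueues:
    """One private queue per Tomcat: only the producer sees its own messages."""

    def __init__(self, tomcats):
        self.queues = {t: deque() for t in tomcats}

    def publish(self, producer, entity_uuid):
        # Only the UUID reference is queued, never the entity data.
        self.queues[producer].append(entity_uuid)

    def receive(self, consumer):
        q = self.queues[consumer]
        return q.popleft() if q else None


class SharedQueue:
    """One queue shared by every Tomcat (standing in for SNS+SQS)."""

    def __init__(self):
        self.queue = deque()

    def publish(self, producer, entity_uuid):
        self.queue.append(entity_uuid)

    def receive(self, consumer):
        # Any Tomcat may pick up the next index request.
        return self.queue.popleft() if self.queue else None


local = InMemoryQueues(["tomcat-a", "tomcat-b"])
local.publish("tomcat-a", "uuid-1")
seen_by_b = local.receive("tomcat-b")   # tomcat-b cannot see tomcat-a's message
seen_by_a = local.receive("tomcat-a")   # the producer processes it itself

shared = SharedQueue()
shared.publish("tomcat-a", "uuid-2")
shared_seen_by_b = shared.receive("tomcat-b")  # any Tomcat can process it
```

With the in-memory option, indexing work stays on the Tomcat that accepted the write; with the shared queue, the work can be picked up by any Tomcat.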
For a multi-datacenter deployment there are two options. With both options, the Tomcat servers in every datacenter can serve traffic, and the Cassandra nodes in each datacenter should maintain a replicated dataset.
Deployment Option 1:
In this option, the components would be deployed in the following manner:
Tomcat: Active/Active
Cassandra: Active/Active
ElasticSearch: Active/Active
Queue (Distributed): Amazon SNS+SQS
This is how Apigee runs BaaS in the cloud. For an on-premises installation, the customer would be responsible for maintaining the Amazon account and credentials required by BaaS.
Deployment Option 2:
In this option, the components would be deployed in the following manner:
Tomcat: Active/Active
Cassandra: Active/Active
ElasticSearch: Active/Passive
Queue (Local): In-Memory
In this case a ‘primary’ datacenter is designated for ElasticSearch. This would involve pointing all Tomcat instances in all datacenters to this instance of ElasticSearch. Even though the Tomcats are all pointed at a single ElasticSearch, they can still serve API traffic. Additional latency would only be incurred when doing queries using QL. From West <-> East the latencies are in the ballpark of 40ms on average.
In the case of a loss of connectivity to this datacenter, another ElasticSearch cluster in a different datacenter would need to be promoted to be ‘primary’. All Tomcat instances would need to be updated to point to this new primary, and a reindex of the data would need to be performed from a Tomcat within the same datacenter. The duration of the reindex would depend on the network latency and the amount of data.
All data is permanently persisted in Cassandra so the reindex of the data is benign.