Setting replicas to three for Cassandra

We have built our Apigee planet in Amazon, and our instances are spread across two AZs in a single region. We have Cassandra members 1 & 3 in AZ1 and Cassandra member 2 in AZ2. Today our Cassandra instances in AZ1 were unavailable for about 6 minutes, but Cassandra member 2 was up the entire time. During that window we found errors from the Message Processors saying they couldn't find data for our API keys. Looking further at the Cassandra ring, the replication factor seems to be set to 2 instead of 3.

Can someone from Apigee confirm whether the replication factor is set to 2 by default? We also want to make sure each Cassandra member has a copy of our data. What is the process for changing it to 3?
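For reference, a quick way to check the current replication factor is from cqlsh on any node. A minimal sketch (the keyspace name "kms" is only an example, and the host/port depend on your install):

# List the keyspaces, then show the replication settings of one of them
cqlsh <node_ip> -e "DESCRIBE KEYSPACES;"
cqlsh <node_ip> -e "DESCRIBE KEYSPACE kms;"
# The CREATE KEYSPACE line in the output shows the strategy and the per-DC replication factor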

I presume you are trying to achieve High Availability (HA) for your setup on AWS.

We run our public cloud on AWS, and we make use of 3 AZs in an AWS region as our HA strategy.

The recommended/default replication factor for all Apigee datastores is 3. By utilizing 3 AZs in AWS and using EC2Snitch, the replicas are placed in 3 distinct AZs, one node in each AZ.

By utilizing 3 AZs in AWS, we can achieve the recommended stronger consistency (LOCAL_QUORUM) and High Availability (HA) in the event of a node failure, or an entire AZ failure or disconnect.
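For example, raising the replication factor comes down to an ALTER KEYSPACE followed by a repair. A rough sketch (the keyspace name "kms" and datacenter name "dc-1" are only placeholders; repeat for each keyspace you want at RF 3):

# Raise the replication factor for one keyspace to 3
cqlsh <node_ip> -e "ALTER KEYSPACE kms WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc-1': 3};"
# Then run a repair on every node, one at a time, so the new replicas receive the data
nodetool repair -pr kms

Note that until the repairs complete, the new replicas will not yet hold the historical data.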

Let me know if you have further questions.

Thanks for the reply. I had one question regarding the usage of EC2Snitch. Can you confirm you're configuring your racks (e.g. ra-1, ra-2, ra-3) to align with availability zones within the region? I'm assuming you're doing this to ensure a replica is placed in each AZ.

Hmm - this doesn't seem right.

If we are using EC2Snitch, C* nodes will get the datacenter and rack information automatically from the EC2 metadata.

cassandra-topology.properties is used only when using PropertyFileSnitch (the default).

In AWS, an Availability Zone is the equivalent of a rack.

When using the property file in AWS, we need to list 1b, 1c, 1d as the racks in region/datacenter us-east, and 2a, 2b, 2c in region/datacenter us-west-2.
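For illustration, the cassandra-topology.properties entries for that layout would look roughly like this (the IPs are hypothetical, and this only applies if you stay on PropertyFileSnitch):

# cassandra-topology.properties format: <node_ip>=<datacenter>:<rack>
192.168.93.10=us-east:1b
192.168.94.10=us-east:1c
192.168.95.10=us-east:1d
192.168.84.10=us-west-2:2a
192.168.86.10=us-west-2:2b
192.168.85.10=us-west-2:2c
# Fallback for nodes not listed above
default=us-east:1b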

However, I would recommend using EC2Snitch instead of maintaining property files.

For example, here is nodetool ring output from a cluster spanning two regions/datacenters:

Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: us-east
==========
Address         Rack        Status State   Load            Owns                Token
                                                                               141784319550391026443072753096570088105
192.168.95.25   1d          Up     Normal  80.6 GB         16.67%              0
192.168.94.104  1c          Up     Normal  80.27 GB        16.67%              28356863910078205288614550619314017621
192.168.93.30   1b          Up     Normal  80.63 GB        16.67%              56713727820156410577229101238628035242
192.168.95.96   1d          Up     Normal  80.22 GB        16.67%              85070591730234615865843651857942052863
192.168.94.91   1c          Up     Normal  80.08 GB        16.67%              113427455640312821154458202477256070484
192.168.93.98   1b          Up     Normal  80.92 GB        16.67%              141784319550391026443072753096570088105
Datacenter: us-west
==========
Address         Rack        Status State   Load            Owns                Token
                                                                               141784319550391026443072753096570088205
192.168.84.119  2a          Up     Normal  80.96 GB        0.00%               100
192.168.86.21   2b          Up     Normal  80.34 GB        0.00%               28356863910078205288614550619314017721
192.168.85.106  2c          Up     Normal  80.95 GB        0.00%               56713727820156410577229101238628035342
192.168.84.25   2a          Up     Normal  80.82 GB        0.00%               85070591730234615865843651857942052963
192.168.86.134  2b          Up     Normal  80.32 GB        0.00%               113427455640312821154458202477256070584
192.168.85.31   2c          Up     Normal  80.37 GB        0.00%               141784319550391026443072753096570088205
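If you do move to Ec2Snitch, the change itself is a single line in cassandra.yaml on each node, followed by a rolling restart. A sketch (file location and restart command depend on your install):

# cassandra.yaml, on every node
endpoint_snitch: Ec2Snitch
# Restart the Cassandra process one node at a time after the change

Be careful switching snitches on a cluster that already holds data: if the change alters the rack or datacenter a node reports, a full repair is needed afterwards.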

Appreciate your responses. Thanks!

If using Ec2Snitch, racks are automatically discovered from the EC2 metadata. You don't need to specify racks anywhere. You just need to make sure you always have an equal number of nodes in each of the 3 availability zones.

Yes, this will ensure the replicas are placed in distinct availability zones.
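For example, you can cross-check what each node will report (the metadata endpoint is the standard EC2 one; this is only a sketch):

# Which AZ this instance is in, per EC2 metadata (Ec2Snitch derives DC/rack from this)
curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone
# e.g. "us-east-1b" -> datacenter "us-east", rack "1b"

# What the cluster actually sees: the Rack column should show all 3 AZs evenly
nodetool status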

@bkrishnankutty In the example above, the racks are 1b, 1c, 1d for us-east and 2a, 2b, 2c for us-west-2. Does the letter sequence (b, c, d vs. a, b, c) have a meaning, or can we use a, b, c, ... z for the racks in each region?

If Ec2MultiRegionSnitch is used, this will be a post-install configuration step. Beyond executing the steps in the link below and restarting the C* nodes, do we need to do anything else to enable Ec2MultiRegionSnitch?

https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchEC2MultiRegion_c...

Are there any additional considerations when using Ec2MultiRegionSnitch?

@Maudrit When using EC2Snitch, we don't use or need to update the cassandra-topology.properties file. Whatever we would mention in the property file is inferred automatically from the EC2 metadata.

So the rack names 1b, 1c, 1d and 2a, 2b, 2c are automatically inferred from the EC2 metadata.

For a multi-region / multi-DC configuration, we recommend having a VPC in each region and establishing a VPN tunnel between the regions. With VPC & VPN we can use EC2Snitch. Without the VPC/VPN setup we have to rely on EC2MultiRegionSnitch, which needs public IPs and would invite more security considerations.
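For completeness, if Ec2MultiRegionSnitch is the only option, the relevant cassandra.yaml settings on each node are roughly the following (a sketch; the IPs are placeholders):

# cassandra.yaml, per node, when using Ec2MultiRegionSnitch
endpoint_snitch: Ec2MultiRegionSnitch
listen_address: <private_ip_of_this_node>
broadcast_address: <public_ip_of_this_node>
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<public_ip_of_seed_1>,<public_ip_of_seed_2>"

The storage port (7000, or 7001 for SSL) also has to be open between those public IPs, which is part of the extra security consideration mentioned above.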

@Baba Krishnankutty @Maudrit Could you please suggest a similar configuration reference for non-AWS instances (OPDK)?