Solved Solved

Reason why autovacuum disabled by default in postgres?

Not applicable

I am woking out some issues w/ one of our private cloud installations and noticed that autovacuum was not enabled on install.

Is there a reason for this setting or is it okay for us to enable it?

Solved Solved
1 5 1,073
1 ACCEPTED SOLUTION
Solved Solved

Not applicable

Vacuum helps in a number of cases out of which 2 important are:

  • recover or reuse disk space occupied by deleted or updated rows
  • update statistics for query engine

Some other use cases are:

  • update visibility map for index-only scans
  • prevent transaction-id wrap around failures

Details are present: http://www.postgresql.org/docs/9.3/static/routine-vacuuming.html

Apigee Scenario:

  • No deletion to fact-data tables
  • Purge happens through drop tables [parent-child]
  • Only update happens in aggregation driver table - which is not beyond 20-25 records
  • Index is only timestamp column - and all queries are driven by timestamp predicates

So in our case:

  • Reuse of disk space occupied by deleted or updated rows are minimal - that they can be ignored.
  • When it comes to query planner, the only driving index is timestamp [client_received_start_timestamp / timestamp], so the query path hardly changes even if the stats change.

Traditionally the TPS or traffic rate for private customers had not been very high. So the issue of "transaction-id wrap-around" was hardly there, rather we drop the child-fact tables, before even we hit the problem. For high TPS customers, this can be a problem, and auto-vacuum must be enabled. Because the percentage of such customers are way-less, and the cost incurred at times for running auto-vacuum is sometimes high, we had traditionally switched it off by default.

So depending on the data retention period and traffic rate, it should be prudent to switch it on - if we are in doubt.

View solution in original post

Solved Solved
2 0 0
5 REPLIES 5

anilsr
Staff

Dear @Benjamin Goldman ,

Yes, You can enable it and use it. Apigee Edge Private Cloud Operations Guide mentions same. Please refer to Recurring Analytics Services Maintenance Tasks topic in document.

Many Apigee Analytics Services tasks can be performed using standard Postgres utilities. The routine maintenance tasks you would perform on the Analytics database – such as database reorganization using VACUUM, reindexing and log file maintenance - are the same as those you would perform on any Postgres database.

Cheers,

Anil Sagar

0 0 0

Not applicable

As Anil says it is a good practice to enable VACUUM - we do not presume in the installation process which option is best for your private cloud installation scenario and leave this as a choice to be made by the installation team after due analysis.

0 2 0

Not applicable

im posting this here because its at the bottom. I appreciate your candor on this - the reason im asking this question - though - is because according to pg documentation auto vacuum on is the default setting for PG (maybe since 8.1?) This might mean that Apigee disabled it on purpose - and we are wondering what that purpose is.

0 1 0

birute

@Sanjoy Bose I think you will be able to provide us in depth answer. Thanks much!

0 0 0
Solved Solved

Not applicable

Vacuum helps in a number of cases out of which 2 important are:

  • recover or reuse disk space occupied by deleted or updated rows
  • update statistics for query engine

Some other use cases are:

  • update visibility map for index-only scans
  • prevent transaction-id wrap around failures

Details are present: http://www.postgresql.org/docs/9.3/static/routine-vacuuming.html

Apigee Scenario:

  • No deletion to fact-data tables
  • Purge happens through drop tables [parent-child]
  • Only update happens in aggregation driver table - which is not beyond 20-25 records
  • Index is only timestamp column - and all queries are driven by timestamp predicates

So in our case:

  • Reuse of disk space occupied by deleted or updated rows are minimal - that they can be ignored.
  • When it comes to query planner, the only driving index is timestamp [client_received_start_timestamp / timestamp], so the query path hardly changes even if the stats change.

Traditionally the TPS or traffic rate for private customers had not been very high. So the issue of "transaction-id wrap-around" was hardly there, rather we drop the child-fact tables, before even we hit the problem. For high TPS customers, this can be a problem, and auto-vacuum must be enabled. Because the percentage of such customers are way-less, and the cost incurred at times for running auto-vacuum is sometimes high, we had traditionally switched it off by default.

So depending on the data retention period and traffic rate, it should be prudent to switch it on - if we are in doubt.

View solution in original post

Solved Solved
2 0 1
Top Labels in this Space
Top Solution Authors