What is the AltaVault eviction process?
Applies to
- AltaVault
- AVA400
- AVA800
- AVA-v
Answer
- AltaVault has a local disk cache that receives backup data as it is encoded and subsequently replicated to the cloud
- The eviction process is responsible for maintaining free space on the disk cache (datastore) and keeping it from filling up entirely
- Deployments are typically sized such that the new data ingest rate will rotate the disk cache in about 30 days (default), which allows fast decoding (reading) of recently encoded (written) backup jobs
- This local cache feature is a key differentiator for AltaVault and ensures a low recovery time objective (RTO) for the most recent backups
- The evicter process uses three watermarks (summarized in a sketch at the end of this section):
- Eviction threshold (evicter.maxpctused):
- This is the utilization percentage that the evicter works to maintain
- If the utilization exceeds this value, the evicter will start deleting slab files until utilization drops back below it
- Eviction alarm (evicter.maxpctused + evicter.alarmwindow):
- This is the utilization percentage where an alarm will be raised, indicating that the utilization of the local disk cache is too high
- Eviction upperbound (evicter.upperbound):
- This is the utilization percentage where AltaVault will stop accepting write requests to prevent the physical disk from filling up entirely
- At this level, the filesystem will return a 'no space on filesystem' error to the front-end protocols (OST/SMB/NFS)
- When this occurs, log messages similar to the following will be seen:
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) encode failed: no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (31432) encode failed: no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (30318) flush failed: no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) failed to commit write: no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (30318) destructor flush failed :no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [encoder.ERR] (30318) Encoder 0x299e71018 destructor, aborting
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (31432) flush failed: no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (31432) failed to commit write: no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (31432) destructor flush failed :no space on filesystem
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [encoder.ERR] (31432) Encoder 0x28f726798 destructor, aborting
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megastore.NOTICE] (30318) aborting transaction 149/14340392 (29 resources)
Apr 2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megastore.NOTICE] (31432) aborting transaction 149/14349236 (1668 resources)
Apr 2 02:34:02 ss3030-dmz-nk2 rfsd[8609]: [megastore.NOTICE] (30318) transaction 149/14340392 aborted
Apr 2 02:34:02 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) error flushing write cache no space on filesystem
- The "
low space
" occurs when the /data partition usage climbs above 93% ( default settings) ie.evicter.maxpctused + evicter.alarmwindow
- Note that Percent used (evicter.pctused) is the current disk cache utilization; it is a measurement rather than a configurable parameter
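- The interaction of the three watermarks can be summarized with a minimal Python sketch; the constants below match the default values shown later in this article, but the function itself is illustrative, not AltaVault source code
# Minimal sketch of the three-watermark logic (illustrative only)
MAXPCTUSED = 90   # evicter.maxpctused (eviction threshold)
ALARMWINDOW = 3   # evicter.alarmwindow
UPPERBOUND = 95   # evicter.upperbound

def check_watermarks(pctused):
    """Return the actions taken at a given disk cache utilization (evicter.pctused)."""
    actions = []
    if pctused >= UPPERBOUND:
        actions.append("stop accepting writes: 'no space on filesystem'")
    if pctused >= MAXPCTUSED + ALARMWINDOW:
        actions.append("raise the 'low space' alarm")
    if pctused > MAXPCTUSED:
        actions.append("delete slab files until utilization drops below the threshold")
    return actions or ["no action"]

for pct in (88, 91, 94, 96):
    print(pct, check_watermarks(pct))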
Additional Information
There are two scenarios known to result in the system going read-only and printing the error messages above:
- The rate of new data being written to the AltaVault exceeds the rate at which the evicter can free up space
- Dropping the evicter threshold to a lower value can help by providing more GB of free space for the backup, but only if there are periods of low activity in which the slower-moving evicter can catch up to the data ingested by the encoder (see the arithmetic sketch below)
- The value assigned to evicter.maxpctused is greater than evicter.upperbound, which causes the device to enter read-only mode before utilization ever reaches the threshold at which the evicter would cut in and start freeing up space
- This effectively deadlocks the evicter
- The appropriate values will depend upon the size of the local disk cache, and smaller appliances will have more conservative values
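- As a rough illustration of the headroom gained by lowering the threshold (the 32 TB cache size below is an assumed example, not a sizing recommendation):
# Illustrative arithmetic only: extra free space maintained by a lower threshold
cache_tb = 32
old_threshold = 0.90   # evicter.maxpctused = 90
new_threshold = 0.85   # lowered to 85

extra_headroom_tb = cache_tb * (old_threshold - new_threshold)
print(f"Extra free space the evicter maintains: {extra_headroom_tb:.1f} TB")  # -> 1.6 TB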
- Run the following command to see the evicter parameters in a system dump:
$ grep evicter stats_megastore
evicter.alarmwindow: 3
evicter.maxpctused: 90
evicter.pctused: 88
evicter.upperbound: 95
- The actual output is longer than what is shown above; these are the important parameters
- They can also be found in output/rfsd.xml, which is the 'running config' of the rfsd service:
<evicter
max_local_capacity="0"
max_used_pct="90"
upper_bound_pct="95"
evicter_num_threads="512"
/>
- These values are also recorded every 60 seconds in the collect_stats/rfsctl-a.log file and its archives:
$ egrep "TIMESTAMP|evicter.pctused" collect_stats/rfsctl-a.log | less
TIMESTAMP: 2015-04-14 12:21:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:22:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:23:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:24:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:25:00
evicter.pctused: 88
- Run the following command to display the evicter parameters on a live AltaVault (the service has to be running):
av730-rtp # rfsctl exec evicter
evicter.alarmwindow: 3
evicter.maxpctused: 90
evicter.pctused: 88
evicter.upperbound: 95
- Again, some parameters have been suppressed for the sake of brevity
Evicter verification of cloud data and "Inconsistent Cloud Data alarm"
- Before deleting a slab, the evicter compares the slab's md5sum against the value stored in the cloud object's metadata (see the sketch at the end of this section)
- The cloud does not recalculate the md5sum; it just returns the stored metadata value, so this does not confirm the integrity of the cloud slab, only its revision
- Slabs can be changed and enqueued to be re-uploaded to the cloud and this check ensures the copy in the cloud matches the local slab
- We don't want to delete a slab that hasn't yet been replicated to the cloud
- If the md5sum check fails, the "Inconsistent Cloud Data" alarm is raised
- We have some experience with cloud vendors that send empty (null) responses or old data
- Evicter can pause replication
- If the evicter is not getting prompt responses to the HEAD requests against the cloud slab objects, it will temporarily pause replication
Example:
Sep 28 03:01:20 altavault01 rfsd[6599]: [evicter.INFO] (8921) Evicter will pause replication
Sep 28 03:01:41 altavault01 rfsd[6599]: [evicter.INFO] (8921) Evicter will resume replication
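- The pre-eviction check can be sketched as follows; the cloud client and its head_object() call are hypothetical stand-ins for the vendor API, not the AltaVault implementation
import hashlib

def safe_to_evict(slab_path, cloud, object_key):
    """Only evict a local slab once the cloud copy matches it."""
    with open(slab_path, "rb") as f:
        local_md5 = hashlib.md5(f.read()).hexdigest()
    meta = cloud.head_object(object_key)           # HEAD request: metadata only
    cloud_md5 = meta.get("md5") if meta else None  # vendors may return null or stale data
    if cloud_md5 != local_md5:
        # Raising here corresponds to the "Inconsistent Cloud Data" alarm
        raise RuntimeError("Inconsistent Cloud Data: cloud slab does not match local slab")
    return True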
Average Evicted Age
- The evicter keeps track of the age of the slabs that are deleted; this statistic is a good way to determine whether the local disk cache is too small and rotating too quickly
- This statistic also affects the RTO because restoring files older than the average evicted age will likely be slower, as data is downloaded from the cloud
- The average evicted age can be viewed in the WebUI under Reports, or with the stats_evicted.py support script:
$ /support/bin/stats_evicted.py
timestamp evicted_bytes evicted_age
Jul 01 04:38 3.1 GB 48 day
Jul 01 04:39 3.8 GB 48 day
Jul 01 04:40 3.6 GB 48 day
Jul 01 04:41 3.3 GB 48 day
<snip>
Jul 01 07:18 4.3 GB 47 day
Jul 01 07:19 2.6 GB 47 day
Jul 01 07:20 2.5 GB 47 day
Jul 01 07:21 2.3 GB 47 day
Min: 47 day, Max: 48 day, Avg: 48 day, StdDev: 4 hour
- If the average evicted age is less than 30 days (adjustable), the average evicted age alarm will trigger
- This value is shown in the WebUI under AltaVault: Reports > Eviction, but the graph may not be accurate outside of the 5-minute view because the values can be down-averaged through aggregation
- For example, AltaVault calculates the average_eviction_age every five seconds; periods where the evicter is not running are recorded as 0
avg_evicted_age | 0 | 0 | 30 days | 30 days | 30 days | 30 days | 0 | 0 | 0 | 0 | 0 | 0 |
sample interval | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec | 5 sec |
- As the datapoints age, they are 'rolled up' into larger periods and after five minutes the values are averaged into 60-second intervals
- For this example, assume that the evicter runs only for 20 seconds of the 60 seconds
- The average value for the aggregated 60-second period would be ((30 days * 20 sec) + (0 * 40 sec)) / 60 sec = 10 days
- This is an undesirable side effect of this aggregation method; the sketch below reproduces the arithmetic
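- The rollup can be reproduced with a few lines of Python (twelve 5-second samples aggregated into one 60-second datapoint):
DAY = 86400
# Evicter ran for 20 of the 60 seconds (four 5-second samples at 30 days)
samples = [0, 0, 30 * DAY, 30 * DAY, 30 * DAY, 30 * DAY, 0, 0, 0, 0, 0, 0]
rolled_up = sum(samples) / len(samples)
print(f"aggregated avg_evicted_age: {rolled_up / DAY:.0f} days")  # -> 10 days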
- The avg_evicted_age alarm will trigger by default if the value drops below 30 days (2592000 seconds), which can be adjusted to suit the customer's needs
# show alarm avg_evicted_age
Alarm Id: avg_evicted_age
Alarm Description: Datastore Eviction
Enabled: no
Alarm State: (enabled)
Error threshold: 2592000
Clear threshold: 3024000
Rate limit bucket counts: (email) { 5, 20, 50 }
Rate limit bucket windows: (email) { 3600, 86400, 604800 }
Rate limit bucket counts: (snmp) { 5, 20, 50 }
Rate limit bucket windows: (snmp) { 3600, 86400, 604800 }
Last checked at: 2015/06/16 11:41:39
Last checked value: 4294967295
Last error at:
Last clear at:
- The alarm can also be enabled or disabled entirely:
(config) # alarm avg_evicted_age enable
(config) # no alarm avg_evicted_age enable
- For example, to adjust the threshold to rise at 14 days and clear at 15 days
(config) # alarm avg_evicted_age error-threshold 1209600
(config) # alarm avg_evicted_age clear-threshold 1296000
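- The thresholds are expressed in seconds, so the values above are simply day counts multiplied out:
DAY = 86400
print(14 * DAY)  # 1209600 -> error-threshold (rise at 14 days)
print(15 * DAY)  # 1296000 -> clear-threshold (clear at 15 days)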
Effect of average eviction age on cloud sparsity
- After a file is deleted the slabs that it references are inspected to see if they are referenced by any other files stored on the appliance
- If any part of the slab is still referenced, then it cannot be deleted; however, if it is in the local cache and more than 50% of it is unused, the slab can be compacted and re-enqueued to be replicated to the cloud, where it will overwrite the older, larger version (see the sketch at the end of this article)
- The important part to remember is that slab compaction can only occur when the slab is in the local cache
- Cloud-only slabs can only be deleted once they are 100% unused
- Amazon Glacier uses a larger object called a package, which is a bundle of 64 slabs
- Most backup strategies maintain a set of rotating schedules where, for example, daily backups are kept for a month, weekly backups for a year, monthly backups for 5 years, etc.
- If the age at which backups expire and are deleted is less than the average eviction age, the slabs they reference are likely to be local, and therefore available to be compacted
- If the average eviction age is less than the age of the typical backup at the time of deletion, there will be some slabs that could have been compacted but were not, because they are cloud-only
- The condition where cloud objects hold a lot of unused data is called cloud sparsity; environments that rotate their local cache quickly, with a short average eviction age, are more likely to develop it
- Users of Amazon Glacier are more susceptible to cloud sparsity because of the larger size of the data objects stored in the cloud (packages = 64 slabs)
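- The slab-disposition rules described in this section can be condensed into a minimal sketch; the function and thresholds mirror the text above but are illustrative, not AltaVault source code
def slab_disposition(pct_unused, in_local_cache):
    """Decide what happens to a slab after a file referencing it is deleted."""
    if pct_unused == 100:
        return "delete"                    # fully unreferenced: removed locally and in the cloud
    if in_local_cache and pct_unused > 50:
        return "compact and re-replicate"  # rewrite smaller, overwrite the cloud copy
    return "keep"                          # cloud-only slabs stay until 100% unused

print(slab_disposition(60, in_local_cache=True))    # compact and re-replicate
print(slab_disposition(60, in_local_cache=False))   # keep -> contributes to cloud sparsity
print(slab_disposition(100, in_local_cache=False))  # delete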