
Improve handling of certain cases when gathering info on the system's disk capacity. (#7902)

* Remove trailing whitespace in

* Fix handling of APFS on macOS.

APFS can have multiple volumes in a single partition, which means that
the same functional 'volume' can appear multiple times in the output of
`df`. Duplicate lines for such volumes will show the same total size and
available space along with a common prefix for the device name.

This updates the parsing logic for `df` on macOS to account for this by
deduplicating lines in the `df` output that have the same total size,
available space, and same normalized device name.

This has the potential to incorrectly under-account space in some cases,
but the likelihood of that happening is much less than the certainty of
over-accounting space on standard APFS configurations.
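
The deduplication described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the change itself: it swaps in an `awk` associative array for the `sort`/`uniq` approach, `sample_df` is a hypothetical and simplified set of `df -k` data lines (real macOS output has more columns), and `dedup_total_kb` is a made-up helper name.

```sh
# Hypothetical, simplified `df -k` data lines (header already stripped).
# Columns: device, total KiB, used KiB, available KiB, capacity, mount point.
sample_df='/dev/disk1s1 488245288 200000000 250000000 45% /
/dev/disk1s4 488245288 3000000 250000000 2% /private/var/vm
/dev/disk2s1 976490576 100000000 800000000 12% /Volumes/Data'

# Normalize /dev/diskNsM to /dev/diskN, keep one line per
# (device, total size, available space) triple, then sum the total sizes.
dedup_total_kb() {
    sed -E 's#^/dev/disk([0-9]+)s[0-9]+#/dev/disk\1#' |
        awk '!seen[$1 " " $2 " " $4]++ { print $2 }' |
        {
            total=0
            while read -r kb; do
                total=$((total + kb))
            done
            echo "${total}"
        }
}

total_kb="$(printf '%s\n' "${sample_df}" | dedup_total_kb)"
echo "$((total_kb * 1024))"
```

The two `disk1` volumes share a container, so they collapse into one entry and only `disk1` and `disk2` contribute to the sum.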

* Properly handle VirtIO block devices when using /sys.

The VirtIO Block device driver uses a dynamically allocated device major
number, meaning that we can't trivially match on it.

This updates the handling to properly look it up in `/proc/devices`
instead of just using the whole dynamic device number range.
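
As a rough sketch of this kind of lookup, the snippet below resolves driver names to major numbers from text in the format of `/proc/devices`. It is a variation rather than the code in this change: it uses `awk` and only reads the "Block devices:" section, `sample_devices` is a hypothetical excerpt, and `block_majors` is a made-up helper name.

```sh
# Hypothetical excerpt in the format of /proc/devices.
sample_devices='Character devices:
  1 mem
  4 tty

Block devices:
  8 sd
 65 sd
179 mmcblk
259 blkext'

# Print every major number registered under the given block driver name,
# considering only lines inside the "Block devices:" section.
block_majors() {
    awk -v name="$1" '
        /^Block devices:/ { in_block = 1; next }
        /^[^ 0-9]/        { in_block = 0 }
        in_block && $2 == name { print $1 }
    '
}

# Build a ':'-delimited whitelist; names with no entry add nothing.
whitelist=':'
for name in sd mmcblk nvme; do
    for major in $(printf '%s\n' "${sample_devices}" | block_majors "${name}"); do
        whitelist="${whitelist}${major}:"
    done
done
echo "${whitelist}"
```

On a real system the function would read `/proc/devices` directly, for example `block_majors sd < /proc/devices`.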

* Add handling for NVMe block devices in sysfs code.

They use dynamic major numbers just like VirtIO Block devices do.

* Switch to device major discovery in /proc/devices for all device types.

This converts the code to use `/proc/devices` to look up correct device
major numbers for block devices that we treat as disks, just like we are
already doing for those that have dynamically assigned numbers. This
makes the code both more robust and easier to understand and modify.

This also excludes some particularly old hardware that we were
originally looking for. If needed, we can add in the required device
names, but for now it's better to keep the list concise.

* Correct handling of device major discovery.

We need to strip leading whitespace before calling cut, not after.
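
A small illustration of why the order matters, using a hypothetical line in the right-aligned format that `/proc/devices` uses. With a space as the delimiter, `cut` treats the text before the first leading space as field 1, which is empty.

```sh
# Hypothetical /proc/devices line with leading padding.
line='  8 sd'

# cut first: field 1 is the empty string before the first space,
# so there is nothing left for sed to clean up.
wrong="$(printf '%s\n' "${line}" | cut -f 1 -d ' ' | sed -e 's/^[[:space:]]*//')"

# Strip first: field 1 is now the major number.
right="$(printf '%s\n' "${line}" | sed -e 's/^[[:space:]]*//' | cut -f 1 -d ' ')"

echo "wrong='${wrong}' right='${right}'"
```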

* Only use /sys/block if we can read /proc/devices.

We use `/proc/devices` to do device number lookups that we then use to
filter devices under `/sys/block`. As a result, if we can't read
`/proc/devices`, then we won't actually parse anything out of
`/sys/block` either, so we need to just fall back to parsing `df`.

* Deduplicate `df` output by device name on Linux.

This ensures that we properly handle BTRFS subvolumes, counting each
actual volume only once.
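
A minimal sketch of deduplicating by device name, assuming hypothetical sample data (`sample_df`) and a made-up helper name (`unique_sizes`). It restricts the sort key to the first field with `-k 1,1`; a key written as `-k 1` with no end field extends to the end of the line, so it only collapses lines that are identical throughout.

```sh
# Hypothetical data lines: device, size in bytes, mount point.
# The first two lines stand in for two subvolumes of one BTRFS volume.
sample_df='/dev/sda2 500000000000 /
/dev/sda2 500000000000 /home
/dev/sdb1 1000000000000 /data'

# Keep one line per device name, then print only the size column.
unique_sizes() {
    sort -u -k 1,1 | awk '{ print $2 }'
}

printf '%s\n' "${sample_df}" | unique_sizes
```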

* Use POSIX math expansion instead of awk to sum disk sizes.

This avoids the rather annoying habit of AWK of printing integers in
scientific notation instead of as exact values.
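
The idea can be sketched like this, with hypothetical sizes and `paste` used to build the expression (the change itself uses a different pipeline to join the values). It assumes the shell's arithmetic is at least 64 bits wide, which is typical but not guaranteed.

```sh
# Hypothetical per-disk sizes in bytes, one per line.
sizes='4000787030016
4000787030016
2000398934016'

# Join the values with '+' to form an arithmetic expression...
expr_sum="$(printf '%s\n' "${sizes}" | paste -s -d '+' -)"

# ...and let the shell evaluate it, which prints an exact integer.
total="$((${expr_sum}))"
echo "${total}"
```

The `${expr_sum}` form is deliberate: it expands to the text of the expression before evaluation, which works in shells that do not evaluate variable contents recursively.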

* Correct `sed` options for POSIX compliance.

* Fix disk info fetching for macOS.

POSIX tools, as found on macOS, lack a number of rather useful filtering
and sorting features, so we need to get rather creative with the
handling on macOS to make the disk space computation work correctly.

This unfortunately makes the calculation a bit less reliable than it
would have been had the existing calculations worked correctly, but it's
the best I can come up with without making things exponentially more

* Properly handle sector size when using sysfs.
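
For context, the Linux kernel reports `/sys/block/<dev>/size` in units of 512 bytes regardless of the device's actual sector size, so the value has to be multiplied by 512 to get bytes. A minimal sketch, with a made-up helper name and a hypothetical sector count:

```sh
# Convert a count of 512-byte units, as read from /sys/block/<dev>/size,
# into bytes.
sectors_to_bytes() {
    echo "$(($1 * 512))"
}

# Hypothetical value for a nominal 1 TB disk.
sectors_to_bytes 1953525168
```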
Austin S. Hemmelgarn 4 years ago
1 changed files with 21 additions and 18 deletions

@@ -105,6 +105,7 @@ else
+        # shellcheck disable=SC2153
         if [ "${NAME}" = "unknown" ] || [ "${VERSION}" = "unknown" ] || [ "${ID}" = "unknown" ]; then
                 if [ -f "/etc/lsb-release" ]; then
                         if [ "${OS_DETECTION}" = "unknown" ]; then
@@ -121,9 +122,9 @@ else
                         if [ "${ID}" = "unknown" ]; then CONTAINER_ID="${DISTRIB_CODENAME}"; fi
                 if [ -n "$(command -v lsb_release 2>/dev/null)" ]; then
-                        if [ "${OS_DETECTION}" = "unknown" ]; then 
+                        if [ "${OS_DETECTION}" = "unknown" ]; then
-                        else 
+                        else
                         if [ "${NAME}" = "unknown" ]; then CONTAINER_NAME="$(lsb_release -is 2>/dev/null)"; fi
@@ -307,8 +308,7 @@ if [ "${KERNEL_NAME}" = "Darwin" ]; then
-        total="$(/bin/df -k -t ${types} | tail -n +1 | awk '{s+=$2} END {print s}')"
-        DISK_SIZE="$((total * 1024))"
+        DISK_SIZE=$(($(/bin/df -k -t ${types} | tail -n +2 | sed -E 's/\/dev\/disk([[:digit:]]*)s[[:digit:]]*/\/dev\/disk\1/g' | sort -k 1 | awk -F ' ' '{s=$NF;for(i=NF-1;i>=1;i--)s=s FS $i;print s}' | uniq -f 9 | awk '{print $8}' | tr '\n' '+' | rev | cut -f 2- -d '+' | rev) * 1024))
 elif [ "${KERNEL_NAME}" = FreeBSD ] ; then
@@ -320,16 +320,20 @@ elif [ "${KERNEL_NAME}" = FreeBSD ] ; then
         total="$(df -t ${types} -c -k | tail -n 1 | awk '{print $2}')"
         DISK_SIZE="$((total * 1024))"
-        if [ -d /sys/block ] ; then
-                # List of device majors we actually count towards total disk space.
-                # The meanings of these can be found in `Documentation/admin-guide/devices.txt` in the Linux sources.
-                # The ':' surrounding each number are important for matching.
-                dev_major_whitelist=':3:8:9:21:22:28:31:33:34:44:45:47:48:49:50:51:52:53:54:55:56:57:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:88:89:90:91:93:94:96:98:101:104:105:106:107:108:109:110:111:112:114:116:128:129:130:131:132:134:135:136:137:138:139:140:141:142:143:153:160:161:179:180:202:256:257:'
-                if [ "${VIRTUALIZATION}" != "unknown" ] ; then
-                    # We're running virtualized, add the local range of device major numbers so that we catch paravirtualized block devices.
-                    dev_major_whitelist="${dev_major_whitelist}240:241:242:243:244:245:246:247:248:249:250:251:252:253:254:"
-                fi
+        if [ -d /sys/block ] && [ -r /proc/devices ] ; then
+                dev_major_whitelist=''
+                # This is a list of device names used for block storage devices.
+                # These translate to the prefixs of files in `/dev` indicating the device type.
+                # They are sorted by lowest used device major number, with dynamically assigned ones at the end.
+                # We use this to look up device major numbers in `/proc/devices`
+                device_names='hd sd mfm ad ftl pd nftl dasd intfl mmcblk ub xvd rfd vbd nvme'
+                for name in ${device_names} ; do
+                        if grep -qE " ${name}\$" /proc/devices ; then
+                                dev_major_whitelist="${dev_major_whitelist}:$(grep -E "${name}\$" /proc/devices | sed -e 's/^[[:space:]]*//' | cut -f 1 -d ' ' | tr '\n' ':'):"
+                        fi
+                done
@@ -338,18 +342,17 @@ else
                            (echo "${dev_major_whitelist}" | grep -q ":$(cut -f 1 -d ':' "${disk}/dev"):") && \
                            grep -qv 1 "${disk}/removable"
-                            size="$(cat "${disk}/size")"
+                            size="$(($(cat "${disk}/size") * 512))"
                             DISK_SIZE="$((DISK_SIZE + size))"
         elif df --version 2>/dev/null | grep -qF "GNU coreutils" ; then
-                DISK_SIZE="$(df -x tmpfs -x devtmpfs -x squashfs -l --total -B1 --output=size | tail -n 1 | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
+                DISK_SIZE=$(($(df -x tmpfs -x devtmpfs -x squashfs -l -B1 --output=source,size | tail -n +2 | sort -u -k 1 | awk '{print $2}' | tr '\n' '+' | head -c -1)))
-                total="$(df -T -P | grep "${include_fs_types}" | awk '{s+=$3} END {print s}')"
-                DISK_SIZE="$((total * 1024))"
+                DISK_SIZE=$(($(df -T -P | tail -n +2 | sort -u -k 1 | grep "${include_fs_types}" | awk '{print $3}' | tr '\n' '+' | head -c -1) * 1024))