mirror of
https://github.com/markusressel/zfs-inplace-rebalancing
synced 2026-02-04 21:14:09 +00:00
Support Hardlinks (Any number) - Update (#70)
* Support hardlink groups, add debug
  - Adds debug functionality with extended details
  - Supports detecting inode groups for hardlink processing.
  - Pulls files and sorts by, then groups by inode group with awk
  - Checks all files in an inode group's counts when calculating skipping counts
  - Removes existing skip hardlink flag
  - Removes hardlinks and recreates them directly after the balance copy/delete/move operation per inode group to minimize 'downtime'
* Update README.md to denote hardlink support
  Adds details around the debug flag, hardlink support, removed --skip-hardlinks functionality, and temporary files used during the script processing.
* Add additional hardlink group database notes
* typo
* Flip default debug
* Fix space handling in paths
* Fix echo bug
  Removed the 'recreating hardlinks' echo for inode groups of 1 file.
* Introduce echo_debug
* Reintroduce cp documentation
* Remove unnecessary -e flag
* Remove unused flag
* Fix broken test due to | in the filename
* Add ignore on temporary inode files
* Update tests to assert the hardlinks are still working
* Fix ShellCheck issues
* Fix pasting issue
* Make grep compatible with BSD/MacOS
* Print whole line in debug
* Fix stat for BSD
* Fix stat for BSD part 2
* Fix shellcheck entry
* Fix stats working with FreeBSD
* Use bash's double brackets for consistency sake
* Expand to a string
---------
Co-authored-by: undaunt <31376520+undaunt@users.noreply.github.com>
This commit is contained in:
parent
97d11fdb0b
commit
b788358600
3
.gitignore
vendored
@ -1,5 +1,8 @@
|
||||
test.log
|
||||
error.log
|
||||
rebalance_db.txt
|
||||
files_list.txt
|
||||
sorted_files_list.txt
|
||||
grouped_inodes.txt
|
||||
testing_data
|
||||
.vscode
|
||||
@ -7,6 +7,8 @@ Simple bash script to rebalance pool data between all mirrors when adding vdevs
|
||||
|
||||
This script recursively traverses all the files in a given directory. Each file is copied with a `.balance` suffix, retaining all file attributes. The original is then deleted and the *copy* is renamed back to the name of the original file. When copying a file ZFS will spread the data blocks across all vdevs, effectively distributing/rebalancing the data of the original file (more or less) evenly. This allows the pool data to be rebalanced without the need for a separate backup pool/drive.
|
||||
|
||||
When the script detects an inode group of hardlinked files, it will proceed to copy one file in the inode group. The original file and all hardlinks are then deleted, the *copy* is renamed back to the name of the original file, and new hardlinks are generated from that copy to replace all other linked files that were removed.
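The hardlink procedure described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the function name `rebalance_group` and the demo file names are hypothetical, and plain `cp -p` stands in for the platform-specific `cp` invocations the script uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script): rebalance one hardlink group.
# The first argument is treated as the "main" file, the rest are its hardlinks.
rebalance_group() {
    main_file="$1"
    tmp_file="${main_file}.balance"

    # 1. Copy one member of the group, preserving basic attributes.
    cp -p "$main_file" "$tmp_file" || return 1

    # 2. Remove the original and every hardlink to it.
    for path in "$@"; do
        rm "$path" || return 1
    done

    # 3. Rename the copy back to the original name.
    mv "$tmp_file" "$main_file" || return 1

    # 4. Recreate the remaining links from the new copy.
    shift
    for path in "$@"; do
        ln "$main_file" "$path" || return 1
    done
}

# Demo in a throwaway directory (assumed layout, for illustration only).
demo_dir=$(mktemp -d)
echo "payload" > "$demo_dir/mp4.txt"
ln "$demo_dir/mp4.txt" "$demo_dir/mp4.txt.link"
rebalance_group "$demo_dir/mp4.txt" "$demo_dir/mp4.txt.link"
```

After the call, both names should again refer to a single inode, which can be checked with `ls -i` on the two paths.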
|
||||
|
||||
The way ZFS distributes writes is not trivial, which makes it hard to predict how effective the redistribution will be. See:
|
||||
- https://jrs-s.net/2018/04/11/zfs-allocates-writes-according-to-free-space-per-vdev-not-latency-per-vdev/
|
||||
- https://jrs-s.net/2018/08/24/zfs-write-allocation-in-0-7-x/
|
||||
@ -28,6 +30,10 @@ Since file attributes are fully retained, it is not possible to verify if an ind
|
||||
1
|
||||
```
|
||||
|
||||
All files in a given inode group will be added to the database when processed. A group is skipped when the highest rebalance count among its files has reached the number of passes set for the current script execution.
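As a rough sketch of that skip decision, assuming the database layout used by the script (a path line followed by its count line): the helper names `get_count` and `max_group_count`, the sample paths, and the sample counts below are all hypothetical.

```shell
#!/bin/sh
# Hypothetical database: each path line is followed by its count line.
db_file=$(mktemp)
printf '%s\n' "/pool/a.txt" "2" "/pool/a.link" "1" > "$db_file"

# Print the stored count for a path, or 0 if the path is not in the database.
get_count() {
    line_nr=$(grep -xF -n -- "$1" "$db_file" | head -n 1 | cut -d: -f1)
    if [ -z "$line_nr" ]; then
        echo 0
    else
        awk -v n="$((line_nr + 1))" 'NR == n' "$db_file"
    fi
}

# Print the highest count found among all paths of one inode group.
max_group_count() {
    max=0
    for path in "$@"; do
        count=$(get_count "$path")
        if [ "$count" -gt "$max" ]; then
            max=$count
        fi
    done
    echo "$max"
}

passes=2
group_max=$(max_group_count "/pool/a.txt" "/pool/a.link" "/pool/new.txt")
if [ "$group_max" -ge "$passes" ]; then
    decision="skip"
else
    decision="process"
fi
echo "$decision"
```

With these sample values the group is skipped, because one of its members has already reached the configured number of passes.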
|
||||
|
||||
The hardlink support process creates temporary files in the script location alongside `rebalance_db.txt`, which are removed at the end of each run. `files_list.txt` lists all files found in the given target location. `sorted_files_list.txt` lists all files sorted by device and inode number. `grouped_inodes.txt` lists each inode key on its own line, followed by one tab-indented line for every file that shares that inode.
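The relationship between the three temporary files can be illustrated with a small sketch. It follows the `sort` and `awk` steps in the script changed by this commit, where each key is written on its own line followed by tab-indented paths. The sample `device:inode|path` entries and the `work_dir` location are made up for the example; the script itself generates the list with `find` and `stat`.

```shell
#!/bin/sh
work_dir=$(mktemp -d)

# Hypothetical files_list.txt content in "device:inode|path" form.
printf '%s\n' \
    '50:12|/pool/b file.txt' \
    '50:7|/pool/a.txt' \
    '50:12|/pool/b link.txt' > "$work_dir/files_list.txt"

# Sort by the key before the first '|' so hardlinked paths become adjacent.
sort -t '|' -k1,1 "$work_dir/files_list.txt" > "$work_dir/sorted_files_list.txt"

# Emit each key once, followed by one tab-indented line per path.
# Taking the path via substr keeps spaces and '|' characters in file names intact.
awk -F'|' '{
    key = $1
    path = substr($0, length(key) + 2)
    if (key != prev_key) {
        print key
        prev_key = key
    }
    print "\t" path
}' "$work_dir/sorted_files_list.txt" > "$work_dir/grouped_inodes.txt"

# Lines that do not start with whitespace are keys, so this counts inode groups.
group_count=$(grep -c '^[^[:space:]]' "$work_dir/grouped_inodes.txt")
cat "$work_dir/grouped_inodes.txt"
```

Here three paths collapse into two inode groups, one of which contains two hardlinked names.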
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Balance Status
|
||||
@ -83,6 +89,7 @@ chmod +x ./zfs-inplace-rebalancing.sh
|
||||
|
||||
Dependencies:
|
||||
* `perl` - it should be available on most systems by default
|
||||
* `awk` - it should be available on most systems by default
|
||||
|
||||
## Usage
|
||||
|
||||
@ -100,7 +107,7 @@ You can print a help message by running the script without any parameters:
|
||||
|-----------|-------------|---------|
|
||||
| `-c`<br>`--checksum` | Whether to compare attributes and content of the copied file using an **MD5** checksum. Technically this is a redundant check and consumes a lot of resources, so think twice. | `true` |
|
||||
| `-p`<br>`--passes` | The maximum number of rebalance passes per file. Setting this to infinity by using a value `<= 0` might improve performance when rebalancing a lot of small files. | `1` |
|
||||
| `--skip-hardlinks` | Skip rebalancing hardlinked files, since it will only create duplicate data. | `false` |
|
||||
| `--debug` | Shows additional output, including listing all files in the target location 3 times (list, inode sorted list, inode groupings) and more granular move/copy/link/count transaction information. | `false` |
|
||||
|
||||
### Example
|
||||
|
||||
|
||||
59
testing.sh
@ -22,6 +22,9 @@ Green='\033[0;32m' # Green
|
||||
Yellow='\033[0;33m' # Yellow
|
||||
Cyan='\033[0;36m' # Cyan
|
||||
|
||||
|
||||
OSName=$(echo "$OSTYPE" | tr '[:upper:]' '[:lower:]')
|
||||
|
||||
## Functions
|
||||
|
||||
# print a given text entirely in a given color
|
||||
@ -44,19 +47,24 @@ function prepare() {
|
||||
|
||||
# return time to the millisecond
|
||||
function get_time() {
|
||||
|
||||
case "$OSTYPE" in
|
||||
darwin*)
|
||||
date=$(gdate +%s%N)
|
||||
;;
|
||||
*)
|
||||
date=$(date +%s%N)
|
||||
;;
|
||||
esac
|
||||
|
||||
if [[ "${OSName}" == "darwin"* ]]; then
|
||||
date=$(gdate +%s%N)
|
||||
else
|
||||
date=$(date +%s%N)
|
||||
fi
|
||||
echo "$date"
|
||||
}
|
||||
|
||||
function get_inode() {
|
||||
if [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
|
||||
inode=$(stat -f "%i" "$1")
|
||||
else
|
||||
inode=$(stat -c "%i" "$1")
|
||||
fi
|
||||
|
||||
echo "$inode"
|
||||
}
|
||||
|
||||
function assertions() {
|
||||
# check error log is empty
|
||||
if grep -q '[^[:space:]]' $log_error_file; then
|
||||
@ -66,16 +74,9 @@ function assertions() {
|
||||
fi
|
||||
}
|
||||
|
||||
function assert_matching_file_copied() {
|
||||
if ! grep "Copying" $log_std_file | grep -q "$1"; then
|
||||
echo "File matching '$1' was not copied when it should have been!"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
function assert_matching_file_not_copied() {
|
||||
if grep "Copying" $log_std_file | grep -q "$1"; then
|
||||
echo "File matching '$1' was copied when it should have been skipped!"
|
||||
function assert_matching_file_hardlinked() {
|
||||
if [[ "$(get_inode "$1")" != "$(get_inode "$2")" ]]; then
|
||||
echo "File '$1' was not hardlinked to '$2' when it should have been!"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
@ -111,25 +112,13 @@ cat $log_std_file
|
||||
assertions
|
||||
color_echo "$Green" "Tests passed!"
|
||||
|
||||
color_echo "$Cyan" "Running tests with skip-hardlinks false..."
|
||||
color_echo "$Cyan" "Running tests with hardlinks..."
|
||||
prepare
|
||||
ln "$test_pool_data_path/projects/[2020] some project/mp4.txt" "$test_pool_data_path/projects/[2020] some project/mp4.txt.link"
|
||||
./zfs-inplace-rebalancing.sh --skip-hardlinks false $test_pool_data_path >> $log_std_file 2>> $log_error_file
|
||||
./zfs-inplace-rebalancing.sh $test_pool_data_path >> $log_std_file 2>> $log_error_file
|
||||
cat $log_std_file
|
||||
# Both link files should be copied
|
||||
assert_matching_file_copied "mp4.txt"
|
||||
assert_matching_file_copied "mp4.txt.link"
|
||||
assertions
|
||||
color_echo "$Green" "Tests passed!"
|
||||
|
||||
color_echo "$Cyan" "Running tests with skip-hardlinks true..."
|
||||
prepare
|
||||
ln "$test_pool_data_path/projects/[2020] some project/mp4.txt" "$test_pool_data_path/projects/[2020] some project/mp4.txt.link"
|
||||
./zfs-inplace-rebalancing.sh --skip-hardlinks true $test_pool_data_path >> $log_std_file 2>> $log_error_file
|
||||
cat $log_std_file
|
||||
# Neither file should be copied now, since they are each a hardlink
|
||||
assert_matching_file_not_copied "mp4.txt.link"
|
||||
assert_matching_file_not_copied "mp4.txt"
|
||||
assert_matching_file_hardlinked "$test_pool_data_path/projects/[2020] some project/mp4.txt" "$test_pool_data_path/projects/[2020] some project/mp4.txt.link"
|
||||
assertions
|
||||
color_echo "$Green" "Tests passed!"
|
||||
|
||||
|
||||
@ -1,14 +1,14 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
# exit script on error
|
||||
# Exit script on error
|
||||
set -e
|
||||
# exit on undeclared variable
|
||||
# Exit on undeclared variable
|
||||
set -u
|
||||
|
||||
# file used to track processed files
|
||||
# File used to track processed files
|
||||
rebalance_db_file_name="rebalance_db.txt"
|
||||
|
||||
# index used for progress
|
||||
# Index used for progress
|
||||
current_index=0
|
||||
|
||||
## Color Constants
|
||||
@ -24,20 +24,28 @@ Cyan='\033[0;36m' # Cyan
|
||||
|
||||
## Functions
|
||||
|
||||
# print a help message
|
||||
# Print a help message
|
||||
function print_usage() {
|
||||
echo "Usage: zfs-inplace-rebalancing --checksum true --skip-hardlinks false --passes 1 /my/pool"
|
||||
echo "Usage: zfs-inplace-rebalancing.sh --checksum true --passes 1 --debug false /my/pool"
|
||||
}
|
||||
|
||||
# print a given text entirely in a given color
|
||||
# Print a given text entirely in a given color
|
||||
function color_echo() {
|
||||
color=$1
|
||||
text=$2
|
||||
echo -e "${color}${text}${Color_Off}"
|
||||
}
|
||||
|
||||
# Print a given text only when debug mode is enabled
|
||||
function echo_debug() {
|
||||
if [ "$debug_flag" = true ]; then
|
||||
text=$*
|
||||
echo "${text}"
|
||||
fi
|
||||
}
|
||||
|
||||
function get_rebalance_count() {
|
||||
file_path=$1
|
||||
file_path="$1"
|
||||
|
||||
line_nr=$(grep -xF -n "${file_path}" "./${rebalance_db_file_name}" | head -n 1 | cut -d: -f1)
|
||||
if [ -z "${line_nr}" ]; then
|
||||
@ -51,63 +59,55 @@ function get_rebalance_count() {
|
||||
fi
|
||||
}
|
||||
|
||||
# rebalance a specific file
|
||||
function rebalance() {
|
||||
file_path=$1
|
||||
|
||||
# check if file has >=2 links in the case of --skip-hardlinks
|
||||
# this shouldn't be needed in the typical case of `find` only finding files with links == 1
|
||||
# but this can run for a long time, so it's good to double check if something changed
|
||||
if [[ "${skip_hardlinks_flag}" == "true"* ]]; then
|
||||
if [[ "${OSName}" == "linux-gnu"* ]]; then
|
||||
# Linux
|
||||
#
|
||||
# -c --format=FORMAT
|
||||
# use the specified FORMAT instead of the default; output a
|
||||
# newline after each use of FORMAT
|
||||
# %h number of hard links
|
||||
|
||||
hardlink_count=$(stat -c "%h" "${file_path}")
|
||||
elif [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
|
||||
# Mac OS
|
||||
# FreeBSD
|
||||
# -f format
|
||||
# Display information using the specified format
|
||||
# l Number of hard links to file (st_nlink)
|
||||
|
||||
hardlink_count=$(stat -f %l "${file_path}")
|
||||
else
|
||||
echo "Unsupported OS type: $OSTYPE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ "${hardlink_count}" -ge 2 ]; then
|
||||
echo "Skipping hard-linked file: ${file_path}"
|
||||
return
|
||||
fi
|
||||
fi
|
||||
# Rebalance a group of files that are hardlinked together
|
||||
function process_inode_group() {
|
||||
paths=("$@")
|
||||
num_paths="${#paths[@]}"
|
||||
|
||||
# Progress tracking
|
||||
current_index="$((current_index + 1))"
|
||||
progress_percent=$(printf '%0.2f' "$((current_index * 10000 / file_count))e-2")
|
||||
progress_raw=$((current_index * 10000 / file_count))
|
||||
progress_percent=$(printf '%0.2f' "${progress_raw}e-2")
|
||||
color_echo "${Cyan}" "Progress -- Files: ${current_index}/${file_count} (${progress_percent}%)"
|
||||
|
||||
if [[ ! -f "${file_path}" ]]; then
|
||||
color_echo "${Yellow}" "File is missing, skipping: ${file_path}"
|
||||
echo_debug "Processing inode group with ${num_paths} paths:"
|
||||
for path in "${paths[@]}"; do
|
||||
echo_debug " - $path"
|
||||
done
|
||||
|
||||
# Check rebalance counts for all files
|
||||
should_skip=false
|
||||
for path in "${paths[@]}"; do
|
||||
rebalance_count=$(get_rebalance_count "${path}")
|
||||
if [ "${rebalance_count}" -ge "${passes_flag}" ]; then
|
||||
should_skip=true
|
||||
break
|
||||
fi
|
||||
done
|
||||
|
||||
if [ "${should_skip}" = true ]; then
|
||||
if [ "${num_paths}" -gt 1 ]; then
|
||||
color_echo "${Yellow}" "Rebalance count (${passes_flag}) reached, skipping group: ${paths[*]}"
|
||||
else
|
||||
color_echo "${Yellow}" "Rebalance count (${passes_flag}) reached, skipping: ${paths[0]}"
|
||||
fi
|
||||
return
|
||||
fi
|
||||
|
||||
if [ "${passes_flag}" -ge 1 ]; then
|
||||
# check if target rebalance count is reached
|
||||
rebalance_count=$(get_rebalance_count "${file_path}")
|
||||
if [ "${rebalance_count}" -ge "${passes_flag}" ]; then
|
||||
color_echo "${Yellow}" "Rebalance count (${passes_flag}) reached, skipping: ${file_path}"
|
||||
return
|
||||
fi
|
||||
main_file="${paths[0]}"
|
||||
|
||||
# Check if main_file exists
|
||||
if [[ ! -f "${main_file}" ]]; then
|
||||
color_echo "${Yellow}" "File is missing, skipping: ${main_file}"
|
||||
return
|
||||
fi
|
||||
|
||||
tmp_extension=".balance"
|
||||
tmp_file_path="${file_path}${tmp_extension}"
|
||||
tmp_file_path="${main_file}${tmp_extension}"
|
||||
|
||||
echo "Copying '${main_file}' to '${tmp_file_path}'..."
|
||||
echo_debug "Executing copy command:"
|
||||
|
||||
echo "Copying '${file_path}' to '${tmp_file_path}'..."
|
||||
if [[ "${OSName}" == "linux-gnu"* ]]; then
|
||||
# Linux
|
||||
|
||||
@ -115,33 +115,36 @@ function rebalance() {
|
||||
# -a -- keep attributes, includes -d -- keep symlinks (dont copy target) and
|
||||
# -p -- preserve ACLs to
|
||||
# -x -- stay on one system
|
||||
cp --reflink=never -ax "${file_path}" "${tmp_file_path}"
|
||||
cmd=(cp --reflink=never -ax "${main_file}" "${tmp_file_path}")
|
||||
echo_debug "${cmd[@]}"
|
||||
"${cmd[@]}"
|
||||
elif [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
|
||||
# Mac OS
|
||||
# FreeBSD
|
||||
# Mac OS and FreeBSD
|
||||
|
||||
# -a -- Archive mode. Same as -RpP. Includes preservation of modification
|
||||
# time, access time, file flags, file mode, ACL, user ID, and group
|
||||
# ID, as allowed by permissions.
|
||||
# -x -- File system mount points are not traversed.
|
||||
cp -ax "${file_path}" "${tmp_file_path}"
|
||||
cmd=(cp -ax "${main_file}" "${tmp_file_path}")
|
||||
echo_debug "${cmd[@]}"
|
||||
"${cmd[@]}"
|
||||
else
|
||||
echo "Unsupported OS type: $OSTYPE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# compare copy against original to make sure nothing went wrong
|
||||
# Compare copy against original to make sure nothing went wrong
|
||||
if [[ "${checksum_flag}" == "true"* ]]; then
|
||||
echo "Comparing copy against original..."
|
||||
if [[ "${OSName}" == "linux-gnu"* ]]; then
|
||||
# Linux
|
||||
|
||||
# file attributes
|
||||
original_perms=$(lsattr "${file_path}")
|
||||
original_perms=$(lsattr "${main_file}")
|
||||
# remove anything after the last space
|
||||
original_perms=${original_perms% *}
|
||||
# file permissions, owner, group, size, modification time
|
||||
original_perms="${original_perms} $(stat -c "%A %U %G %s %Y" "${file_path}")"
|
||||
original_perms="${original_perms} $(stat -c "%A %U %G %s %Y" "${main_file}")"
|
||||
|
||||
|
||||
# file attributes
|
||||
@ -157,7 +160,7 @@ function rebalance() {
|
||||
# note: no lsattr on Mac OS or FreeBSD
|
||||
|
||||
# file permissions, owner, group size, modification time
|
||||
original_perms="$(stat -f "%Sp %Su %Sg %z %m" "${file_path}")"
|
||||
original_perms="$(stat -f "%Sp %Su %Sg %z %m" "${main_file}")"
|
||||
|
||||
# file permissions, owner, group size, modification time
|
||||
copy_perms="$(stat -f "%Sp %Su %Sg %z %m" "${tmp_file_path}")"
|
||||
@ -166,6 +169,9 @@ function rebalance() {
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo_debug "Original perms: $original_perms"
|
||||
echo_debug "Copy perms: $copy_perms"
|
||||
|
||||
if [[ "${original_perms}" == "${copy_perms}"* ]]; then
|
||||
color_echo "${Green}" "Attribute and permission check OK"
|
||||
else
|
||||
@ -173,7 +179,7 @@ function rebalance() {
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if cmp -s "${file_path}" "${tmp_file_path}"; then
|
||||
if cmp -s "${main_file}" "${tmp_file_path}"; then
|
||||
color_echo "${Green}" "File content check OK"
|
||||
else
|
||||
color_echo "${Red}" "File content check FAILED"
|
||||
@ -181,30 +187,47 @@ function rebalance() {
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "Removing original '${file_path}'..."
|
||||
rm "${file_path}"
|
||||
echo "Removing original files..."
|
||||
for path in "${paths[@]}"; do
|
||||
echo_debug "Removing $path"
|
||||
rm "${path}"
|
||||
done
|
||||
|
||||
echo "Renaming temporary copy to original '${file_path}'..."
|
||||
mv "${tmp_file_path}" "${file_path}"
|
||||
echo "Renaming temporary copy to original '${main_file}'..."
|
||||
echo_debug "Moving ${tmp_file_path} to ${main_file}"
|
||||
mv "${tmp_file_path}" "${main_file}"
|
||||
|
||||
# Only recreate hardlinks if there are multiple paths
|
||||
if [ "${num_paths}" -gt 1 ]; then
|
||||
echo "Recreating hardlinks..."
|
||||
for (( i=1; i<${#paths[@]}; i++ )); do
|
||||
echo_debug "Linking ${main_file} to ${paths[$i]}"
|
||||
ln "${main_file}" "${paths[$i]}"
|
||||
done
|
||||
fi
|
||||
|
||||
if [ "${passes_flag}" -ge 1 ]; then
|
||||
# update rebalance "database"
|
||||
line_nr=$(grep -xF -n "${file_path}" "./${rebalance_db_file_name}" | head -n 1 | cut -d: -f1)
|
||||
if [ -z "${line_nr}" ]; then
|
||||
rebalance_count=1
|
||||
echo "${file_path}" >>"./${rebalance_db_file_name}"
|
||||
echo "${rebalance_count}" >>"./${rebalance_db_file_name}"
|
||||
else
|
||||
rebalance_count_line_nr="$((line_nr + 1))"
|
||||
rebalance_count="$((rebalance_count + 1))"
|
||||
sed -i '' "${rebalance_count_line_nr}s/.*/${rebalance_count}/" "./${rebalance_db_file_name}"
|
||||
fi
|
||||
# Update rebalance "database" for all files
|
||||
for path in "${paths[@]}"; do
|
||||
line_nr=$(grep -xF -n "${path}" "./${rebalance_db_file_name}" | head -n 1 | cut -d: -f1)
|
||||
if [ -z "${line_nr}" ]; then
|
||||
rebalance_count=1
|
||||
echo "${path}" >> "./${rebalance_db_file_name}"
|
||||
echo "${rebalance_count}" >> "./${rebalance_db_file_name}"
|
||||
else
|
||||
rebalance_count_line_nr="$((line_nr + 1))"
|
||||
rebalance_count=$(awk "NR == ${rebalance_count_line_nr}" "./${rebalance_db_file_name}")
|
||||
rebalance_count="$((rebalance_count + 1))"
|
||||
echo_debug "Updating rebalance count for ${path} to ${rebalance_count}"
|
||||
sed -i "${rebalance_count_line_nr}s/.*/${rebalance_count}/" "./${rebalance_db_file_name}"
|
||||
fi
|
||||
done
|
||||
fi
|
||||
}
|
||||
|
||||
checksum_flag='true'
|
||||
skip_hardlinks_flag='false'
|
||||
passes_flag='1'
|
||||
debug_flag='false'
|
||||
|
||||
if [[ "$#" -eq 0 ]]; then
|
||||
print_usage
|
||||
@ -225,18 +248,18 @@ while true; do
|
||||
fi
|
||||
shift 2
|
||||
;;
|
||||
--skip-hardlinks)
|
||||
if [[ "$2" == 1 || "$2" =~ (on|true|yes) ]]; then
|
||||
skip_hardlinks_flag="true"
|
||||
else
|
||||
skip_hardlinks_flag="false"
|
||||
fi
|
||||
shift 2
|
||||
;;
|
||||
-p | --passes)
|
||||
passes_flag=$2
|
||||
shift 2
|
||||
;;
|
||||
--debug)
|
||||
if [[ "$2" == 1 || "$2" =~ (on|true|yes) ]]; then
|
||||
debug_flag="true"
|
||||
else
|
||||
debug_flag="false"
|
||||
fi
|
||||
shift 2
|
||||
;;
|
||||
*)
|
||||
break
|
||||
;;
|
||||
@ -251,30 +274,93 @@ color_echo "$Cyan" "Start rebalancing $(date):"
|
||||
color_echo "$Cyan" " Path: ${root_path}"
|
||||
color_echo "$Cyan" " Rebalancing Passes: ${passes_flag}"
|
||||
color_echo "$Cyan" " Use Checksum: ${checksum_flag}"
|
||||
color_echo "$Cyan" " Skip Hardlinks: ${skip_hardlinks_flag}"
|
||||
color_echo "$Cyan" " Debug Mode: ${debug_flag}"
|
||||
|
||||
# count files
|
||||
if [[ "${skip_hardlinks_flag}" == "true"* ]]; then
|
||||
file_count=$(find "${root_path}" -type f -links 1 | wc -l)
|
||||
# Generate files_list.txt with device and inode numbers using stat, separated by a pipe '|'
|
||||
if [[ "${OSName}" == "linux-gnu"* ]]; then
|
||||
# Linux
|
||||
find "$root_path" -type f -not -path '*/.zfs/*' -exec stat --printf '%d:%i|%n\n' {} \; > files_list.txt
|
||||
elif [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
|
||||
# Mac OS and FreeBSD
|
||||
find "$root_path" -type f -not -path '*/.zfs/*' -exec stat -f "%d:%i|%N" {} \; > files_list.txt
|
||||
else
|
||||
file_count=$(find "${root_path}" -type f | wc -l)
|
||||
echo "Unsupported OS type: $OSTYPE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
color_echo "$Cyan" " File count: ${file_count}"
|
||||
echo_debug "Contents of files_list.txt:"
|
||||
if [ "$debug_flag" = true ]; then
|
||||
cat files_list.txt
|
||||
fi
|
||||
|
||||
# create db file
|
||||
# Sort files_list.txt by device and inode number
|
||||
sort -t '|' -k1,1 files_list.txt > sorted_files_list.txt
|
||||
|
||||
echo_debug "Contents of sorted_files_list.txt:"
|
||||
if [ "$debug_flag" = true ]; then
|
||||
cat sorted_files_list.txt
|
||||
fi
|
||||
|
||||
# Use awk to group paths by inode key and handle spaces in paths
|
||||
awk -F'|' '{
|
||||
key = $1
|
||||
path = substr($0, length(key)+2)
|
||||
if (key == prev_key) {
|
||||
print "\t" path
|
||||
} else {
|
||||
if (NR > 1) {
|
||||
# Do nothing
|
||||
}
|
||||
print key
|
||||
print "\t" path
|
||||
prev_key = key
|
||||
}
|
||||
}' sorted_files_list.txt > grouped_inodes.txt
|
||||
|
||||
echo_debug "Contents of grouped_inodes.txt:"
|
||||
if [ "$debug_flag" = true ]; then
|
||||
cat grouped_inodes.txt
|
||||
fi
|
||||
|
||||
# Count number of inode groups
|
||||
file_count=$(grep -c '^\w' grouped_inodes.txt)
|
||||
|
||||
color_echo "$Cyan" " Number of files to process: ${file_count}"
|
||||
|
||||
# Initialize current_index
|
||||
current_index=0
|
||||
|
||||
# Create db file
|
||||
if [ "${passes_flag}" -ge 1 ]; then
|
||||
touch "./${rebalance_db_file_name}"
|
||||
fi
|
||||
|
||||
# recursively scan through files and execute "rebalance" procedure
|
||||
# in the case of --skip-hardlinks, only find files with links == 1
|
||||
if [[ "${skip_hardlinks_flag}" == "true"* ]]; then
|
||||
find "$root_path" -type f -links 1 -print0 | while IFS= read -r -d '' file; do rebalance "$file"; done
|
||||
else
|
||||
find "$root_path" -type f -print0 | while IFS= read -r -d '' file; do rebalance "$file"; done
|
||||
paths=()
|
||||
|
||||
# Read grouped_inodes.txt line by line
|
||||
while IFS= read -r line; do
|
||||
if [[ "$line" == $'\t'* ]]; then
|
||||
# This is a path line
|
||||
path="${line#$'\t'}"
|
||||
paths+=("$path")
|
||||
else
|
||||
# This is a new inode key
|
||||
if [[ "${#paths[@]}" -gt 0 ]]; then
|
||||
# Process the previous group
|
||||
process_inode_group "${paths[@]}"
|
||||
fi
|
||||
paths=()
|
||||
fi
|
||||
done < grouped_inodes.txt
|
||||
|
||||
# Process the last group after the loop ends
|
||||
if [[ "${#paths[@]}" -gt 0 ]]; then
|
||||
process_inode_group "${paths[@]}"
|
||||
fi
|
||||
|
||||
# Clean up temporary files
|
||||
rm files_list.txt sorted_files_list.txt grouped_inodes.txt
|
||||
|
||||
echo ""
|
||||
echo ""
|
||||
color_echo "$Green" "Done!"
|
||||
|
||||