Mirror of https://github.com/markusressel/zfs-inplace-rebalancing (synced 2026-02-04 21:14:09 +00:00)
Support Hardlinks (Any number) - Update (#70)
* Support hardlink groups, add debug
  - Adds debug functionality with extended details
  - Supports detecting inode groups for hardlink processing.
  - Pulls files and sorts by, then groups by inode group with awk
  - Checks all files in an inode group's counts when calculating skipping counts
  - Removes existing skip hardlink flag
  - Removes hardlinks and recreates them directly after the balance copy/delete/move operation per inode group to minimize 'downtime'
* Update README.md to denote hardlink support
  Adds details around the debug flag, hardlink support, removed --skip-hardlinks functionality, and temporary files used during the script processing.
* Add additional hardlink group database notes
* typo
* Flip default debug
* Fix space handling in paths
* Fix echo bug
  Removed the 'recreating hardlinks' echo for inode groups of 1 file.
* Introduce echo_debug
* Reintroduce cp documentation
* Remove unnecessary -e flag
* Remove unused flag
* Fix broken test due to | in the filename
* Add ignore on temporary inode files
* Update tests to assert the hardlinks are still working
* Fix ShellCheck issues
* Fix pasting issue
* Make grep compatible with BSD/MacOS
* Print whole line in debug
* Fix stat for BSD
* Fix stat for BSD part 2
* Fix shellcheck entry
* Fix stats working with FreeBSD
* Use bash's double brackets for consistency sake
* Expand to a string

---------

Co-authored-by: undaunt <31376520+undaunt@users.noreply.github.com>
Commit b788358600 (parent 97d11fdb0b)
.gitignore (3 changed lines)
```diff
@@ -1,5 +1,8 @@
 test.log
 error.log
 rebalance_db.txt
+files_list.txt
+sorted_files_list.txt
+grouped_inodes.txt
 testing_data
 .vscode
```
README.md

```diff
@@ -7,6 +7,8 @@ Simple bash script to rebalance pool data between all mirrors when adding vdevs
 
 This script recursively traverses all the files in a given directory. Each file is copied with a `.balance` suffix, retaining all file attributes. The original is then deleted and the *copy* is renamed back to the name of the original file. When copying a file ZFS will spread the data blocks across all vdevs, effectively distributing/rebalancing the data of the original file (more or less) evenly. This allows the pool data to be rebalanced without the need for a separate backup pool/drive.
 
+When the script detects an inode group of hardlinked files, it will proceed to copy one file in the inode group. The original file and all hardlinks are then deleted, the *copy* is renamed back to the name of the original file, and new hardlinks are generated from that copy to replace all other linked files that were removed.
+
 The way ZFS distributes writes is not trivial, which makes it hard to predict how effective the redistribution will be. See:
 - https://jrs-s.net/2018/04/11/zfs-allocates-writes-according-to-free-space-per-vdev-not-latency-per-vdev/
 - https://jrs-s.net/2018/08/24/zfs-write-allocation-in-0-7-x/
```
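The hardlink procedure described above can be illustrated outside the script. The sketch below is not the script's code. `rebalance_group` is a hypothetical helper. It uses `cp -p` instead of the script's OS-specific `cp` flags and skips the checksum comparison. It also assumes every path in the group lives on the same filesystem, which `ln` requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite one inode group as the README describes.
# Copy one member, remove every member, rename the copy back,
# then recreate the remaining names as hardlinks to the new copy.
set -euo pipefail

rebalance_group() {
    local main_file="$1"
    local tmp_file="${main_file}.balance"
    local path

    # Copy the data once; the copy gets a new inode and newly written blocks.
    cp -p "$main_file" "$tmp_file"

    # Remove every name in the group, including the first one.
    for path in "$@"; do
        rm -- "$path"
    done

    # The copy takes over the first name...
    mv -- "$tmp_file" "$main_file"

    # ...and every other name becomes a hardlink to it again.
    shift
    for path in "$@"; do
        ln -- "$main_file" "$path"
    done
}
```

Between the removal loop and the final `ln` loop, the extra names briefly do not exist. This window is the 'downtime' the commit message mentions minimizing.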
````diff
@@ -28,6 +30,10 @@ Since file attributes are fully retained, it is not possible to verify if an ind
 1
 ```
 
+All files in a given inode group will be added to the database when processed. The highest count in a given inode group of files will be used to determine if the group should be skipped when processing against the number of passes in a given script execution.
+
+The hardlink support process creates temporary files in the script location alongside `rebalance_db.txt` which are removed upon the end of each run. `files_list.txt` lists all files found in the given target location. `sorted_files_list.txt` lists all files sorted by inode number. `grouped_inodes.txt` lists all files by inode, but with all files from a given inode space separated on one line.
+
 ## Prerequisites
 
 ### Balance Status
````
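The database rule above ("the highest count in a given inode group") can be sketched against the file layout that `get_rebalance_count` reads: a path on one line and its count on the next. Both helpers below, `get_count` and `group_max_count`, are hypothetical illustrations, not the script's functions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the rebalance "database" layout:
# each path is stored on one line and its rebalance count on the line after it.
set -euo pipefail

# Print the count recorded for a path, or 0 if the path is not listed.
get_count() {
    local db="$1" path="$2" line_nr
    # grep exits non-zero when nothing matches; "|| true" keeps set -e quiet.
    line_nr=$(grep -xF -n -- "$path" "$db" | head -n 1 | cut -d: -f1 || true)
    if [[ -z "$line_nr" ]]; then
        echo 0
    else
        sed -n "$((line_nr + 1))p" "$db"
    fi
}

# Print the highest count among all paths of one inode group.
group_max_count() {
    local db="$1" path count max=0
    shift
    for path in "$@"; do
        count=$(get_count "$db" "$path")
        if (( count > max )); then
            max=$count
        fi
    done
    echo "$max"
}
```

The script itself stops at the first path whose count reaches the pass limit. That yields the same skip decision as comparing the maximum against the limit.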
```diff
@@ -83,6 +89,7 @@ chmod +x ./zfs-inplace-rebalancing.sh
 
 Dependencies:
 * `perl` - it should be available on most systems by default
+* `awk` - it should be available on most systems by default
 
 ## Usage
 
```
```diff
@@ -100,7 +107,7 @@ You can print a help message by running the script without any parameters:
 |-----------|-------------|---------|
 | `-c`<br>`--checksum` | Whether to compare attributes and content of the copied file using an **MD5** checksum. Technically this is a redundent check and consumes a lot of resources, so think twice. | `true` |
 | `-p`<br>`--passes` | The maximum number of rebalance passes per file. Setting this to infinity by using a value `<= 0` might improve performance when rebalancing a lot of small files. | `1` |
-| `--skip-hardlinks` | Skip rebalancing hardlinked files, since it will only create duplicate data. | `false` |
+| `--debug` | Shows additional output, including listing all files in the target location 3 times (list, inode sorted list, inode groupings) and more granular move/copy/link/count transaction information. | `false` |
 
 ### Example
 
```
testing.sh (59 changed lines)
```diff
@@ -22,6 +22,9 @@ Green='\033[0;32m' # Green
 Yellow='\033[0;33m' # Yellow
 Cyan='\033[0;36m' # Cyan
 
+
+OSName=$(echo "$OSTYPE" | tr '[:upper:]' '[:lower:]')
+
 ## Functions
 
 # print a given text entirely in a given color
```
```diff
@@ -44,19 +47,24 @@ function prepare() {
 
 # return time to the milisecond
 function get_time() {
-
-case "$OSTYPE" in
-darwin*)
-date=$(gdate +%s%N)
-;;
-*)
-date=$(date +%s%N)
-;;
-esac
-
+if [[ "${OSName}" == "darwin"* ]]; then
+date=$(gdate +%s%N)
+else
+date=$(date +%s%N)
+fi
 echo "$date"
 }
 
+function get_inode() {
+if [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
+inode=$(stat -f "%i" "$1")
+else
+inode=$(stat -c "%i" "$1")
+fi
+
+echo "$inode"
+}
+
 function assertions() {
 # check error log is empty
 if grep -q '[^[:space:]]' $log_error_file; then
```
```diff
@@ -66,16 +74,9 @@ function assertions() {
 fi
 }
 
-function assert_matching_file_copied() {
-if ! grep "Copying" $log_std_file | grep -q "$1"; then
-echo "File matching '$1' was not copied when it should have been!"
-exit 1
-fi
-}
-
-function assert_matching_file_not_copied() {
-if grep "Copying" $log_std_file | grep -q "$1"; then
-echo "File matching '$1' was copied when it should have been skipped!"
+function assert_matching_file_hardlinked() {
+if [[ "$(get_inode "$1")" != "$(get_inode "$2")" ]]; then
+echo "File '$1' was not hardlinked to '$2' when it should have been!"
 exit 1
 fi
 }
```
```diff
@@ -111,25 +112,13 @@ cat $log_std_file
 assertions
 color_echo "$Green" "Tests passed!"
 
-color_echo "$Cyan" "Running tests with skip-hardlinks false..."
+color_echo "$Cyan" "Running tests with hardlinks..."
 prepare
 ln "$test_pool_data_path/projects/[2020] some project/mp4.txt" "$test_pool_data_path/projects/[2020] some project/mp4.txt.link"
-./zfs-inplace-rebalancing.sh --skip-hardlinks false $test_pool_data_path >> $log_std_file 2>> $log_error_file
+./zfs-inplace-rebalancing.sh $test_pool_data_path >> $log_std_file 2>> $log_error_file
 cat $log_std_file
 # Both link files should be copied
-assert_matching_file_copied "mp4.txt"
-assert_matching_file_copied "mp4.txt.link"
-assertions
-color_echo "$Green" "Tests passed!"
-
-color_echo "$Cyan" "Running tests with skip-hardlinks true..."
-prepare
-ln "$test_pool_data_path/projects/[2020] some project/mp4.txt" "$test_pool_data_path/projects/[2020] some project/mp4.txt.link"
-./zfs-inplace-rebalancing.sh --skip-hardlinks true $test_pool_data_path >> $log_std_file 2>> $log_error_file
-cat $log_std_file
-# Neither file should be copied now, since they are each a hardlink
-assert_matching_file_not_copied "mp4.txt.link"
-assert_matching_file_not_copied "mp4.txt"
+assert_matching_file_hardlinked "$test_pool_data_path/projects/[2020] some project/mp4.txt" "$test_pool_data_path/projects/[2020] some project/mp4.txt.link"
 assertions
 color_echo "$Green" "Tests passed!"
 
```
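`assert_matching_file_hardlinked` compares inode numbers obtained through `get_inode`. That helper needs separate `stat` syntax for GNU and BSD systems. An alternative, which is not what the test suite does, is bash's `[[ file1 -ef file2 ]]`. It is true when both names refer to the same device and inode number. It therefore needs no OS detection and also rules out a coincidental inode match across two filesystems.

A sketch with a hypothetical `assert_same_inode` helper follows. It returns 1 instead of calling `exit` so it can be exercised on its own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: portable hardlink assertion using bash's -ef operator,
# which is true when both paths refer to the same device and inode number.
assert_same_inode() {
    if [[ ! "$1" -ef "$2" ]]; then
        echo "File '$1' was not hardlinked to '$2' when it should have been!"
        return 1
    fi
}
```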
zfs-inplace-rebalancing.sh

```diff
@@ -1,14 +1,14 @@
 #!/usr/bin/env bash
 
-# exit script on error
+# Exit script on error
 set -e
-# exit on undeclared variable
+# Exit on undeclared variable
 set -u
 
-# file used to track processed files
+# File used to track processed files
 rebalance_db_file_name="rebalance_db.txt"
 
-# index used for progress
+# Index used for progress
 current_index=0
 
 ## Color Constants
```
```diff
@@ -24,20 +24,28 @@ Cyan='\033[0;36m' # Cyan
 
 ## Functions
 
-# print a help message
+# Print a help message
 function print_usage() {
-echo "Usage: zfs-inplace-rebalancing --checksum true --skip-hardlinks false --passes 1 /my/pool"
+echo "Usage: zfs-inplace-rebalancing.sh --checksum true --passes 1 --debug false /my/pool"
 }
 
-# print a given text entirely in a given color
+# Print a given text entirely in a given color
 function color_echo() {
 color=$1
 text=$2
 echo -e "${color}${text}${Color_Off}"
 }
 
+# Print a given text entirely in a given color
+function echo_debug() {
+if [ "$debug_flag" = true ]; then
+text=$*
+echo "${text}"
+fi
+}
+
 function get_rebalance_count() {
-file_path=$1
+file_path="$1"
 
 line_nr=$(grep -xF -n "${file_path}" "./${rebalance_db_file_name}" | head -n 1 | cut -d: -f1)
 if [ -z "${line_nr}" ]; then
```
```diff
@@ -51,63 +59,55 @@ function get_rebalance_count() {
 fi
 }
 
-# rebalance a specific file
-function rebalance() {
-file_path=$1
-
-# check if file has >=2 links in the case of --skip-hardlinks
-# this shouldn't be needed in the typical case of `find` only finding files with links == 1
-# but this can run for a long time, so it's good to double check if something changed
-if [[ "${skip_hardlinks_flag}" == "true"* ]]; then
-if [[ "${OSName}" == "linux-gnu"* ]]; then
-# Linux
-#
-# -c --format=FORMAT
-# use the specified FORMAT instead of the default; output a
-# newline after each use of FORMAT
-# %h number of hard links
-
-hardlink_count=$(stat -c "%h" "${file_path}")
-elif [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
-# Mac OS
-# FreeBSD
-# -f format
-# Display information using the specified format
-# l Number of hard links to file (st_nlink)
-
-hardlink_count=$(stat -f %l "${file_path}")
-else
-echo "Unsupported OS type: $OSTYPE"
-exit 1
-fi
-
-if [ "${hardlink_count}" -ge 2 ]; then
-echo "Skipping hard-linked file: ${file_path}"
-return
-fi
-fi
+# Rebalance a group of files that are hardlinked together
+function process_inode_group() {
+paths=("$@")
+num_paths="${#paths[@]}"
 
+# Progress tracking
 current_index="$((current_index + 1))"
-progress_percent=$(printf '%0.2f' "$((current_index * 10000 / file_count))e-2")
+progress_raw=$((current_index * 10000 / file_count))
+progress_percent=$(printf '%0.2f' "${progress_raw}e-2")
 color_echo "${Cyan}" "Progress -- Files: ${current_index}/${file_count} (${progress_percent}%)"
 
-if [[ ! -f "${file_path}" ]]; then
-color_echo "${Yellow}" "File is missing, skipping: ${file_path}"
+echo_debug "Processing inode group with ${num_paths} paths:"
+for path in "${paths[@]}"; do
+echo_debug " - $path"
+done
+
+# Check rebalance counts for all files
+should_skip=false
+for path in "${paths[@]}"; do
+rebalance_count=$(get_rebalance_count "${path}")
+if [ "${rebalance_count}" -ge "${passes_flag}" ]; then
+should_skip=true
+break
+fi
+done
+
+if [ "${should_skip}" = true ]; then
+if [ "${num_paths}" -gt 1 ]; then
+color_echo "${Yellow}" "Rebalance count (${passes_flag}) reached, skipping group: ${paths[*]}"
+else
+color_echo "${Yellow}" "Rebalance count (${passes_flag}) reached, skipping: ${paths[0]}"
+fi
+return
 fi
 
-if [ "${passes_flag}" -ge 1 ]; then
-# check if target rebalance count is reached
-rebalance_count=$(get_rebalance_count "${file_path}")
-if [ "${rebalance_count}" -ge "${passes_flag}" ]; then
-color_echo "${Yellow}" "Rebalance count (${passes_flag}) reached, skipping: ${file_path}"
+main_file="${paths[0]}"
+
+# Check if main_file exists
+if [[ ! -f "${main_file}" ]]; then
+color_echo "${Yellow}" "File is missing, skipping: ${main_file}"
 return
-fi
 fi
 
 tmp_extension=".balance"
-tmp_file_path="${file_path}${tmp_extension}"
+tmp_file_path="${main_file}${tmp_extension}"
+
+echo "Copying '${main_file}' to '${tmp_file_path}'..."
+echo_debug "Executing copy command:"
 
-echo "Copying '${file_path}' to '${tmp_file_path}'..."
 if [[ "${OSName}" == "linux-gnu"* ]]; then
 # Linux
 
```
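In the new `process_inode_group`, the per-group skip check runs unconditionally. The removed code only consulted the database when `passes_flag` was `>= 1`, and the README describes a value `<= 0` as "infinity". Suppose `get_rebalance_count` prints 0 for paths it has not recorded. Its fallback branch lies outside this diff, so that is an assumption. In that case `0 -ge 0` holds and every group would be skipped in that mode.

A hedged sketch of a guarded check follows. It uses a hypothetical helper, `group_reached_limit`, which takes the counts directly instead of reading the database.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if some count in the group has
# reached the pass limit. A limit <= 0 means "no limit", mirroring the
# `if [ "${passes_flag}" -ge 1 ]` guard of the removed code.
group_reached_limit() {
    local passes="$1" count
    shift
    if (( passes < 1 )); then
        return 1    # unlimited passes: never skip
    fi
    for count in "$@"; do
        if (( count >= passes )); then
            return 0
        fi
    done
    return 1
}
```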
```diff
@@ -115,33 +115,36 @@ function rebalance() {
 # -a -- keep attributes, includes -d -- keep symlinks (dont copy target) and
 # -p -- preserve ACLs to
 # -x -- stay on one system
-cp --reflink=never -ax "${file_path}" "${tmp_file_path}"
+cmd=(cp --reflink=never -ax "${main_file}" "${tmp_file_path}")
+echo_debug "${cmd[@]}"
+"${cmd[@]}"
 elif [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
-# Mac OS
-# FreeBSD
+# Mac OS and FreeBSD
 
 # -a -- Archive mode. Same as -RpP. Includes preservation of modification
 # time, access time, file flags, file mode, ACL, user ID, and group
 # ID, as allowed by permissions.
 # -x -- File system mount points are not traversed.
-cp -ax "${file_path}" "${tmp_file_path}"
+cmd=(cp -ax "${main_file}" "${tmp_file_path}")
+echo_debug "${cmd[@]}"
+"${cmd[@]}"
 else
 echo "Unsupported OS type: $OSTYPE"
 exit 1
 fi
 
-# compare copy against original to make sure nothing went wrong
+# Compare copy against original to make sure nothing went wrong
 if [[ "${checksum_flag}" == "true"* ]]; then
 echo "Comparing copy against original..."
 if [[ "${OSName}" == "linux-gnu"* ]]; then
 # Linux
 
 # file attributes
-original_perms=$(lsattr "${file_path}")
+original_perms=$(lsattr "${main_file}")
 # remove anything after the last space
 original_perms=${original_perms% *}
 # file permissions, owner, group, size, modification time
-original_perms="${original_perms} $(stat -c "%A %U %G %s %Y" "${file_path}")"
+original_perms="${original_perms} $(stat -c "%A %U %G %s %Y" "${main_file}")"
 
 
 # file attributes
```
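Building the `cp` invocation as an array (`cmd=(...)`) lets the script log, and then run, exactly the same words. However, `echo_debug "${cmd[@]}"` joins those words with plain spaces. In the debug output, a path containing spaces therefore cannot be told apart from two separate arguments.

One possible refinement uses bash's `printf '%q'`, which prints each word in shell-quoted form. It is shown here with a hypothetical helper, `quote_cmd`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render a command array for debug output, quoting each
# word with printf %q so embedded spaces and special characters stay visible.
quote_cmd() {
    local out
    out=$(printf '%q ' "$@")
    # Drop the single trailing space left by the format string.
    printf '%s\n' "${out% }"
}

cmd=(cp -ax "my file.txt" "my file.txt.balance")
quote_cmd "${cmd[@]}"   # prints: cp -ax my\ file.txt my\ file.txt.balance
```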
```diff
@@ -157,7 +160,7 @@ function rebalance() {
 # note: no lsattr on Mac OS or FreeBSD
 
 # file permissions, owner, group size, modification time
-original_perms="$(stat -f "%Sp %Su %Sg %z %m" "${file_path}")"
+original_perms="$(stat -f "%Sp %Su %Sg %z %m" "${main_file}")"
 
 # file permissions, owner, group size, modification time
 copy_perms="$(stat -f "%Sp %Su %Sg %z %m" "${tmp_file_path}")"
```
```diff
@@ -166,6 +169,9 @@ function rebalance() {
 exit 1
 fi
 
+echo_debug "Original perms: $original_perms"
+echo_debug "Copy perms: $copy_perms"
+
 if [[ "${original_perms}" == "${copy_perms}"* ]]; then
 color_echo "${Green}" "Attribute and permission check OK"
 else
```
```diff
@@ -173,7 +179,7 @@ function rebalance() {
 exit 1
 fi
 
-if cmp -s "${file_path}" "${tmp_file_path}"; then
+if cmp -s "${main_file}" "${tmp_file_path}"; then
 color_echo "${Green}" "File content check OK"
 else
 color_echo "${Red}" "File content check FAILED"
```
```diff
@@ -181,30 +187,47 @@ function rebalance() {
 fi
 fi
 
-echo "Removing original '${file_path}'..."
-rm "${file_path}"
+echo "Removing original files..."
+for path in "${paths[@]}"; do
+echo_debug "Removing $path"
+rm "${path}"
+done
 
-echo "Renaming temporary copy to original '${file_path}'..."
-mv "${tmp_file_path}" "${file_path}"
+echo "Renaming temporary copy to original '${main_file}'..."
+echo_debug "Moving ${tmp_file_path} to ${main_file}"
+mv "${tmp_file_path}" "${main_file}"
 
+# Only recreate hardlinks if there are multiple paths
+if [ "${num_paths}" -gt 1 ]; then
+echo "Recreating hardlinks..."
+for (( i=1; i<${#paths[@]}; i++ )); do
+echo_debug "Linking ${main_file} to ${paths[$i]}"
+ln "${main_file}" "${paths[$i]}"
+done
+fi
+
 if [ "${passes_flag}" -ge 1 ]; then
-# update rebalance "database"
-line_nr=$(grep -xF -n "${file_path}" "./${rebalance_db_file_name}" | head -n 1 | cut -d: -f1)
-if [ -z "${line_nr}" ]; then
-rebalance_count=1
-echo "${file_path}" >>"./${rebalance_db_file_name}"
-echo "${rebalance_count}" >>"./${rebalance_db_file_name}"
-else
-rebalance_count_line_nr="$((line_nr + 1))"
-rebalance_count="$((rebalance_count + 1))"
-sed -i '' "${rebalance_count_line_nr}s/.*/${rebalance_count}/" "./${rebalance_db_file_name}"
-fi
+# Update rebalance "database" for all files
+for path in "${paths[@]}"; do
+line_nr=$(grep -xF -n "${path}" "./${rebalance_db_file_name}" | head -n 1 | cut -d: -f1)
+if [ -z "${line_nr}" ]; then
+rebalance_count=1
+echo "${path}" >> "./${rebalance_db_file_name}"
+echo "${rebalance_count}" >> "./${rebalance_db_file_name}"
+else
+rebalance_count_line_nr="$((line_nr + 1))"
+rebalance_count=$(awk "NR == ${rebalance_count_line_nr}" "./${rebalance_db_file_name}")
+rebalance_count="$((rebalance_count + 1))"
+echo_debug "Updating rebalance count for ${path} to ${rebalance_count}"
+sed -i "${rebalance_count_line_nr}s/.*/${rebalance_count}/" "./${rebalance_db_file_name}"
+fi
+done
 fi
 }
 
 checksum_flag='true'
-skip_hardlinks_flag='false'
 passes_flag='1'
+debug_flag='false'
 
 if [[ "$#" -eq 0 ]]; then
 print_usage
```
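The database update switches from `sed -i ''` to `sed -i` with no suffix argument. The first is the BSD/macOS form, where the backup suffix is a separate and possibly empty argument. The second is the GNU form. Each form fails under the other implementation, so the in-place edit is portable to only one family of systems at a time.

One hedged workaround avoids `-i` altogether: rewrite the file with `awk` into a temporary file, then move it into place. It is sketched below with a hypothetical helper, `set_db_line`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace line N of a file without relying on sed -i,
# whose syntax differs between GNU sed and BSD/macOS sed.
set -euo pipefail

set_db_line() {
    local file="$1" line_nr="$2" value="$3" tmp
    tmp=$(mktemp "${file}.XXXXXX")
    # Copy every line unchanged except line_nr, which is replaced by value.
    awk -v n="$line_nr" -v v="$value" 'NR == n { print v; next } { print }' \
        "$file" > "$tmp"
    mv -- "$tmp" "$file"
}
```

`mktemp` creates the temporary file with mode 0600, so the rewritten database ends up with those permissions. That seems harmless for a scratch file but is worth knowing.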
```diff
@@ -225,18 +248,18 @@ while true; do
 fi
 shift 2
 ;;
---skip-hardlinks)
-if [[ "$2" == 1 || "$2" =~ (on|true|yes) ]]; then
-skip_hardlinks_flag="true"
-else
-skip_hardlinks_flag="false"
-fi
-shift 2
-;;
 -p | --passes)
 passes_flag=$2
 shift 2
 ;;
+--debug)
+if [[ "$2" == 1 || "$2" =~ (on|true|yes) ]]; then
+debug_flag="true"
+else
+debug_flag="false"
+fi
+shift 2
+;;
 *)
 break
 ;;
```
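The new `--debug` case reuses the `[[ "$2" =~ (on|true|yes) ]]` test from the removed `--skip-hardlinks` case. Because the regular expression is not anchored, it matches anywhere in the value. As a result, `--debug none` (which contains `on`) or `--debug untrue` would also enable debug output.

Below is a sketch of a stricter check, using a hypothetical `parse_bool` helper with an anchored pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strict boolean parsing for flag values. The anchors
# (^ and $) make the whole value match, so "none" or "untrue" yield false.
parse_bool() {
    if [[ "$1" =~ ^(1|on|true|yes)$ ]]; then
        echo "true"
    else
        echo "false"
    fi
}
```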
```diff
@@ -251,30 +274,93 @@ color_echo "$Cyan" "Start rebalancing $(date):"
 color_echo "$Cyan" " Path: ${root_path}"
 color_echo "$Cyan" " Rebalancing Passes: ${passes_flag}"
 color_echo "$Cyan" " Use Checksum: ${checksum_flag}"
-color_echo "$Cyan" " Skip Hardlinks: ${skip_hardlinks_flag}"
+color_echo "$Cyan" " Debug Mode: ${debug_flag}"
 
-# count files
-if [[ "${skip_hardlinks_flag}" == "true"* ]]; then
-file_count=$(find "${root_path}" -type f -links 1 | wc -l)
+# Generate files_list.txt with device and inode numbers using stat, separated by a pipe '|'
+if [[ "${OSName}" == "linux-gnu"* ]]; then
+# Linux
+find "$root_path" -type f -not -path '*/.zfs/*' -exec stat --printf '%d:%i|%n\n' {} \; > files_list.txt
+elif [[ "${OSName}" == "darwin"* ]] || [[ "${OSName}" == "freebsd"* ]]; then
+# Mac OS and FreeBSD
+find "$root_path" -type f -not -path '*/.zfs/*' -exec stat -f "%d:%i|%N" {} \; > files_list.txt
 else
-file_count=$(find "${root_path}" -type f | wc -l)
+echo "Unsupported OS type: $OSTYPE"
+exit 1
 fi
 
-color_echo "$Cyan" " File count: ${file_count}"
+echo_debug "Contents of files_list.txt:"
+if [ "$debug_flag" = true ]; then
+cat files_list.txt
+fi
 
-# create db file
+# Sort files_list.txt by device and inode number
+sort -t '|' -k1,1 files_list.txt > sorted_files_list.txt
+
+echo_debug "Contents of sorted_files_list.txt:"
+if [ "$debug_flag" = true ]; then
+cat sorted_files_list.txt
+fi
+
+# Use awk to group paths by inode key and handle spaces in paths
+awk -F'|' '{
+key = $1
+path = substr($0, length(key)+2)
+if (key == prev_key) {
+print "\t" path
+} else {
+if (NR > 1) {
+# Do nothing
+}
+print key
+print "\t" path
+prev_key = key
+}
+}' sorted_files_list.txt > grouped_inodes.txt
+
+echo_debug "Contents of grouped_inodes.txt:"
+if [ "$debug_flag" = true ]; then
+cat grouped_inodes.txt
+fi
+
+# Count number of inode groups
+file_count=$(grep -c '^\w' grouped_inodes.txt)
+
+color_echo "$Cyan" " Number of files to process: ${file_count}"
+
+# Initialize current_index
+current_index=0
+
+# Create db file
 if [ "${passes_flag}" -ge 1 ]; then
 touch "./${rebalance_db_file_name}"
 fi
 
-# recursively scan through files and execute "rebalance" procedure
-# in the case of --skip-hardlinks, only find files with links == 1
-if [[ "${skip_hardlinks_flag}" == "true"* ]]; then
-find "$root_path" -type f -links 1 -print0 | while IFS= read -r -d '' file; do rebalance "$file"; done
-else
-find "$root_path" -type f -print0 | while IFS= read -r -d '' file; do rebalance "$file"; done
+paths=()
+
+# Read grouped_inodes.txt line by line
+while IFS= read -r line; do
+if [[ "$line" == $'\t'* ]]; then
+# This is a path line
+path="${line#$'\t'}"
+paths+=("$path")
+else
+# This is a new inode key
+if [[ "${#paths[@]}" -gt 0 ]]; then
+# Process the previous group
+process_inode_group "${paths[@]}"
+fi
+paths=()
+fi
+done < grouped_inodes.txt
+
+# Process the last group after the loop ends
+if [[ "${#paths[@]}" -gt 0 ]]; then
+process_inode_group "${paths[@]}"
 fi
 
+# Clean up temporary files
+rm files_list.txt sorted_files_list.txt grouped_inodes.txt
+
 echo ""
 echo ""
 color_echo "$Green" "Done!"
```
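The listing step above spawns one `stat` process per file (`-exec ... {} \;`). It then groups the sorted `device:inode|path` lines with `awk`.

Below is a hedged sketch of the same pipeline, written as a hypothetical function `list_inode_groups`. It differs from the script in two ways:
- It uses `-exec ... {} +`, which passes many paths to each `stat` call and typically cuts process-spawn overhead on large trees.
- Its `awk` step prints one tab-separated line per inode group, instead of the key-plus-indented-lines layout of `grouped_inodes.txt`.

Like the script, it assumes file names contain no newlines.

```bash
#!/usr/bin/env bash
# Hypothetical function: list regular files under a directory grouped by
# device:inode, so hardlinked names share one output line:
#   <device>:<inode><TAB><path1><TAB><path2>...
# Uses GNU stat on Linux and BSD stat elsewhere; "{} +" batches the paths.
set -euo pipefail

list_inode_groups() {
    local root="$1"
    local -a stat_cmd
    if [[ "$(uname -s)" == Linux ]]; then
        stat_cmd=(stat --printf '%d:%i|%n\n')
    else
        stat_cmd=(stat -f '%d:%i|%N')
    fi

    find "$root" -type f -not -path '*/.zfs/*' -exec "${stat_cmd[@]}" {} + |
        sort -t '|' -k1,1 |
        awk -F'|' '{
            key = $1
            path = substr($0, length(key) + 2)   # keeps any "|" inside the path
            if (key == prev) {
                line = line "\t" path
            } else {
                if (NR > 1) print line
                line = key "\t" path
                prev = key
            }
        }
        END { if (NR > 0) print line }'
}
```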