-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Improve sort buffer sizing heuristics and honor explicit --buffer-size #8833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
you have to refresh the fuzz/Cargo.lock file |
GNU testsuite comparison:
|
please follow this documentation for the performance work: We would like to see hyperfine results and codspeed benchmark :) |
src/uu/sort/src/sort.rs
Outdated
quarter.max(max) | ||
} | ||
|
||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was already not convinced in #8802
but
there is probably a better way than parsing /proc/meminfo
esp in the sort.rs code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented it using a different method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I said it in the other pr but it does not belong to sort but uucore
And maybe we already have such functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?
CodSpeed Performance ReportMerging #8833 will degrade performances by 3.04%Comparing Summary
Benchmarks breakdown
Footnotes
|
impressive wins :) |
587a73c
to
287d6c4
Compare
GNU testsuite comparison:
|
GNU testsuite comparison:
|
i am planning to merge #8746
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
please try to avoid commit messages like "fix" |
src/uu/sort/src/sort.rs
Outdated
use fnv::FnvHasher; | ||
#[cfg(target_os = "linux")] | ||
use nix::libc::{RLIMIT_NOFILE, getrlimit, rlimit}; | ||
use libc::{RLIMIT_NOFILE, rlimit}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was more here. why do you move to libc ?
what was wrong with nix ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I was using libc elsewhere, I thought it would be better to match that, but the original code is better.
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
・Add comment
・Change to Nix
src/uu/sort/src/sort.rs
Outdated
quarter.max(max) | ||
} | ||
|
||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented it using a different method
src/uu/sort/src/sort.rs
Outdated
use fnv::FnvHasher; | ||
#[cfg(target_os = "linux")] | ||
use nix::libc::{RLIMIT_NOFILE, getrlimit, rlimit}; | ||
use libc::{RLIMIT_NOFILE, rlimit}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I was using libc elsewhere, I thought it would be better to match that, but the original code is better.
I have doubt about the memory functions being in sort itself. Should be in uucore if we already don't have any. Also please keep in mind that you are writing to a human. If I want an Ai explaining me things, I can do it myself :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments to the source code.
src/uu/sort/src/sort.rs
Outdated
quarter.max(max) | ||
} | ||
|
||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?
GNU testsuite comparison:
|
move memory functions to uucore |
GNU testsuite comparison:
|
GNU testsuite comparison:
|
src/uu/sort/src/ext_sort.rs
Outdated
// Heuristically chosen: Dividing by 10 seems to keep our memory usage roughly | ||
// around settings.buffer_size as a whole. | ||
let buffer_size = settings.buffer_size / 10; | ||
// Cap oversized buffer requests at 512MiB to avoid unnecessary allocations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explain why 512 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cap automatic_buffer_size at 1 GiB so the reader and sorter never hold more than about 1 GiB of data at once; without that cap they could grab several gigabytes and waste memory.
Since it becomes 1GB when allocated simultaneously, I set it to 512MB.
GNU testsuite comparison:
|
GNU testsuite comparison:
|
src/uucore/Cargo.toml
Outdated
tempfile = { workspace = true } | ||
|
||
[target.'cfg(target_os = "linux")'.dependencies] | ||
procfs = "0.18" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the one from workspace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#8968
fix PR
Looks great. |
Fitst uucore change |
GNU testsuite comparison:
|
src/uu/sort/src/sort.rs
Outdated
} | ||
} | ||
|
||
fn file_size_hint(files: &[OsString]) -> Option<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document this
and maybe it would make sense to move these memory functions into a specific file
src/uu/sort/src/sort.rs
Outdated
fn physical_memory_bytes() -> Option<u128> { | ||
#[cfg(all(target_family = "unix", not(target_os = "redox")))] | ||
{ | ||
let pages = unsafe { libc::sysconf(libc::_SC_PHYS_PAGES) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use nix instead of libc here?
it will probably remove unsafe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coreutils/src/uu/sort/Cargo.toml
Line 40 in 8e7c6b0
[target.'cfg(target_os = "linux")'.dependencies] |
Is it okay to modify this part too?
Since it depends on Linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe ? please give it a try :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#8959 (comment)
In some environments, it seems the environment variables are missing.
GNU testsuite comparison:
|
1 similar comment
GNU testsuite comparison:
|
GNU testsuite comparison:
|
- move heuristics into a new buffer_hint module and default to automatic sizing when the buffer flag is absent - tune chunk and external sort buffers to avoid runaway allocations - add fast lexicographic and ASCII case-insensitive comparisons for the default mode - refresh spell-check and dependency metadata for the new code
- keep the latest path/lock pair in a shared registry so SIGINT always cleans the active directory - guard handler installation with an atomic flag and reset state when the wrapper is dropped
a0a8c75
to
9914a22
Compare
GNU testsuite comparison:
|
Add automatic buffer-size heuristics (ported from commit a0e77d9). We now size external-sort chunks based on input file sizes and available memory, clamping to 512 KiB–128 MiB so we avoid both tiny buffers and risky overcommit on constrained systems.
Respect user-provided --buffer-size. Only automatically computed sizes are raised to the safety minimum; explicit values are left untouched, which keeps external sorting and --compress-program working even when users choose small buffers.
Performance Comparison (baseline vs. current)
Measurements come from hyperfine --warmup 3 --runs 10; values are means in milliseconds (lower is better).