Precursor Post
Ever found yourself trying to sort a list of strings in Rust, only to realize that no single standard sorting method does exactly what you need? Maybe you have a mix of fixed-length identifiers like ULIDs alongside human-readable filenames with numbers?
// Desired Sort Order?
[
"id-00000000000000000000000001", // Fixed length, zero-padded
"id-00000000000000000000000002",
"report_2.csv", // Variable length, numbers
"report_10.csv"
]
Standard lexicographical sort (slice.sort()
) gets the IDs right but messes up the filenames: report_10.csv
would come before report_2.csv
.
Natural sorting libraries (like the excellent natord
crate) handle the filenames beautifully (report_2.csv
before report_10.csv
) but might not apply the strict lexicographical ordering you want for those fixed-length, zero-padded IDs.
This specific scenario – needing lexicographical sorting for same-length strings and natural sorting for different-length strings within the same list – is exactly why I created natlex_sort
!
Introducing natlex_sort
natlex_sort
is a small Rust library built to solve this hybrid sorting problem. It provides comparison functions and sorting utilities based on one simple rule:
- Strings of the same length are compared lexicographically (byte-wise).
- Strings of different lengths are compared using natural ordering.
This approach allows your fixed-length identifiers to sort precisely as stored, while your variable-length names get the intuitive numerical sorting you expect.
How It Works
Under the hood, natlex_sort
uses these comparison strategies:
- For
&str
:- Checks lengths.
- If equal, uses standard Rust
a.cmp(b)
. - If different, uses
natord::compare
(ornatord::compare_ignore_case
).
- For
&[u8]
:- Checks lengths.
- If equal, uses standard byte slice
a.cmp(b)
. - If different, uses custom logic optimized for ASCII text with embedded numbers, comparing numeric parts intelligently.
- Case-Insensitive Variants: Both string and byte slice comparisons have
_ignore_case
versions. For same-length items, these perform a case-insensitive comparison first, falling back to a case-sensitive comparison (a.cmp(b)
) only as a tie-breaker to ensure a stable, total order (required by Rust's sorting traits).
Getting Started
Add natlex_sort
to your project's Cargo.toml
:
[dependencies]
natlex_sort = "0.1.0" # Check crates.io for the latest version
Usage Example
Let's sort a mixed list:
use natlex_sort::nat_lex_sort;
let mut items = vec![
"item10",
"item2",
"id_002", // Same length as id_001
"id_001",
"Item1", // Different length from item2/item10, different case
];
println!("Before sort: {:?}", items);
// Use the case-sensitive hybrid sort
nat_lex_sort(&mut items);
println!("After sort: {:?}", items);
// Expected output:
// Before sort: ["item10", "item2", "id_002", "id_001", "Item1"]
// After sort: ["Item1", "id_001", "id_002", "item2", "item10"]
// Why this order?
// - "id_001" vs "id_002": Same length (6) -> lexicographical -> "id_001" < "id_002"
// - "item2" vs "item10": Different length -> natural -> 2 < 10 -> "item2" < "item10"
// - "Item1" vs "item2"/"item10": Different length -> natural -> 1 < 2/10 -> "Item1" < others
// - "Item1" vs "id_001": Different length -> natural -> 'I' (73) < 'i' (105) -> "Item1" < "id_001"
The library also provides:
nat_lex_sort_ignore_case()
for case-insensitive sorting.nat_lex_sort_bytes()
andnat_lex_sort_bytes_ignore_case()
for&[&[u8]]
.- Direct comparison functions (
nat_lex_cmp
,nat_lex_byte_cmp
, etc.) for use withsort_by
or other algorithms. - An owning
NatLexOrderedString
wrapper convenient for ordered collections likeBTreeMap
.
API Highlights
- Comparators:
nat_lex_cmp
,nat_lex_cmp_ignore
,nat_lex_byte_cmp
,nat_lex_byte_cmp_ignore
- Sorters:
nat_lex_sort
,nat_lex_sort_ignore_case
,nat_lex_sort_bytes
,nat_lex_sort_bytes_ignore_case
- Wrapper:
NatLexOrderedString
(uses case-sensitivenat_lex_cmp
)
Why Use natlex_sort
?
If you need to sort lists containing both fixed-length identifiers (where lexicographical order is key) and variable-length names with numbers (where natural order makes sense), natlex_sort
provides a simple, targeted solution based on string length differences.
Check it out!
- GitHub Repository: github.com/normano/natlex_sort
- Crates.io: crates.io/crates/natlex_sort
- API Documentation: docs.rs/natlex_sort
Give it a try if you encounter this specific sorting challenge! Contributions, feedback, and bug reports via GitHub issues are welcome. Happy sorting!