# Rust for System Tools: Building Cleanser
When I decided to build a macOS storage cleanup tool, I knew I needed something fast, safe, and reliable. Rust was the obvious choice, and the journey taught me a lot about systems programming and performance optimization.
## Why Rust for CLI Tools?
### Memory Safety Without Garbage Collection
System tools often need to process large amounts of data efficiently. Rust gives you:

- Zero-cost abstractions
- Memory safety without runtime overhead
- Predictable performance characteristics

### Excellent Concurrency Primitives
For a tool that scans filesystems, parallel processing is essential:

```rust
use rayon::prelude::*;
use std::path::PathBuf;

fn scan_directories(paths: Vec<PathBuf>) -> Vec<FileInfo> {
    paths
        .par_iter()
        .flat_map(|path| scan_directory(path))
        .collect()
}
```
### Cross-Platform by Default
Rust's standard library abstracts platform differences well, making cross-platform development straightforward.

## Architecture Decisions
### Parallel File Scanning
The biggest performance win came from parallelizing filesystem operations:

```rust
use rayon::{ThreadPool, ThreadPoolBuilder};
use std::path::Path;

pub struct Scanner {
    thread_pool: ThreadPool,
    max_depth: usize,
}

impl Scanner {
    pub fn new(threads: usize, max_depth: usize) -> Self {
        let thread_pool = ThreadPoolBuilder::new()
            .num_threads(threads)
            .build()
            .expect("Failed to create thread pool");
        Self { thread_pool, max_depth }
    }

    pub fn scan(&self, root: &Path) -> ScanResult {
        // Run the recursive scan inside the dedicated pool, so parallelism
        // is bounded by the configured thread count.
        self.thread_pool.install(|| self.scan_recursive(root, 0))
    }
}
```
### Smart Caching System
To avoid rescanning unchanged directories:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

#[derive(Debug, Clone)]
struct CacheEntry {
    last_modified: SystemTime,
    file_count: usize,
    total_size: u64,
}

pub struct ScanCache {
    entries: HashMap<PathBuf, CacheEntry>,
}

impl ScanCache {
    /// A cached entry is still valid if the directory has not been
    /// modified since the entry was recorded.
    pub fn is_valid(&self, path: &Path) -> bool {
        if let Some(entry) = self.entries.get(path) {
            if let Ok(metadata) = path.metadata() {
                if let Ok(modified) = metadata.modified() {
                    return modified <= entry.last_modified;
                }
            }
        }
        false
    }
}
```
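The cache also has to be refreshed after a successful scan. Here is a minimal, std-only sketch of what that might look like; the `update` method and the demo in `main` are my own illustration, not Cleanser's actual API:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

#[derive(Debug, Clone)]
struct CacheEntry {
    last_modified: SystemTime,
    file_count: usize,
    total_size: u64,
}

#[derive(Default)]
pub struct ScanCache {
    entries: HashMap<PathBuf, CacheEntry>,
}

impl ScanCache {
    /// Record fresh scan results for a directory, replacing any stale entry.
    /// (Hypothetical helper; name and signature are assumptions.)
    pub fn update(&mut self, path: &Path, file_count: usize, total_size: u64) {
        let last_modified = path
            .metadata()
            .and_then(|m| m.modified())
            .unwrap_or(SystemTime::UNIX_EPOCH);
        self.entries.insert(
            path.to_owned(),
            CacheEntry { last_modified, file_count, total_size },
        );
    }

    pub fn is_valid(&self, path: &Path) -> bool {
        match (self.entries.get(path), path.metadata().and_then(|m| m.modified())) {
            (Some(entry), Ok(modified)) => modified <= entry.last_modified,
            _ => false,
        }
    }
}

fn main() {
    let mut cache = ScanCache::default();
    let tmp = std::env::temp_dir();
    assert!(!cache.is_valid(&tmp)); // nothing cached yet
    cache.update(&tmp, 42, 1024);
    assert!(cache.is_valid(&tmp)); // entry is now fresh
    println!("cache ok");
}
```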
### Risk-Based Cleanup Levels
Different users have different risk tolerances:

```rust
#[derive(Debug, Clone, Copy)]
pub enum CleanupLevel {
    Safe,     // Only obvious temp files
    Moderate, // Include build artifacts
    Risky,    // Include caches that might slow things down
}

impl CleanupLevel {
    pub fn should_clean(&self, file_type: &FileType) -> bool {
        match (self, file_type) {
            (_, FileType::TempFile) => true,
            (CleanupLevel::Moderate | CleanupLevel::Risky, FileType::BuildArtifact) => true,
            (CleanupLevel::Risky, FileType::Cache) => true,
            _ => false,
        }
    }
}
```
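To make the decision matrix concrete, here is a self-contained version with a stand-in `FileType`. The variant set is my guess from the match arms above (plus a `UserDocument` variant to show the fall-through case); Cleanser's real enum likely differs:

```rust
/// Hypothetical file classification, inferred from the match arms above.
#[derive(Debug, Clone, Copy)]
pub enum FileType {
    TempFile,
    BuildArtifact,
    Cache,
    UserDocument,
}

#[derive(Debug, Clone, Copy)]
pub enum CleanupLevel {
    Safe,
    Moderate,
    Risky,
}

impl CleanupLevel {
    pub fn should_clean(&self, file_type: &FileType) -> bool {
        match (self, file_type) {
            (_, FileType::TempFile) => true,
            (CleanupLevel::Moderate | CleanupLevel::Risky, FileType::BuildArtifact) => true,
            (CleanupLevel::Risky, FileType::Cache) => true,
            _ => false, // anything unrecognized is never cleaned
        }
    }
}

fn main() {
    // Every level cleans temp files; only escalating levels touch more.
    assert!(CleanupLevel::Safe.should_clean(&FileType::TempFile));
    assert!(!CleanupLevel::Safe.should_clean(&FileType::BuildArtifact));
    assert!(CleanupLevel::Moderate.should_clean(&FileType::BuildArtifact));
    assert!(!CleanupLevel::Moderate.should_clean(&FileType::Cache));
    assert!(CleanupLevel::Risky.should_clean(&FileType::Cache));
    assert!(!CleanupLevel::Risky.should_clean(&FileType::UserDocument));
    println!("all level checks pass");
}
```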
## Performance Optimizations
### SHA-256 Hashing for Duplicates
Finding duplicate files efficiently:

```rust
use rayon::prelude::*;
use sha2::{Digest, Sha256};
use std::collections::HashMap;
use std::fs::File;
use std::path::{Path, PathBuf};

pub fn find_duplicates(files: &[PathBuf]) -> HashMap<String, Vec<PathBuf>> {
    files
        .par_iter()
        // Hash files in parallel, silently skipping any that fail to open.
        .filter_map(|path| {
            let hash = compute_file_hash(path).ok()?;
            Some((hash, path.clone()))
        })
        .collect::<Vec<_>>()
        .into_iter()
        // Group paths by hash...
        .fold(HashMap::new(), |mut acc, (hash, path)| {
            acc.entry(hash).or_insert_with(Vec::new).push(path);
            acc
        })
        .into_iter()
        // ...and keep only hashes shared by more than one file.
        .filter(|(_, paths)| paths.len() > 1)
        .collect()
}

fn compute_file_hash(path: &Path) -> Result<String, Box<dyn std::error::Error>> {
    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
    // Stream the file through the hasher instead of reading it into memory.
    std::io::copy(&mut file, &mut hasher)?;
    Ok(format!("{:x}", hasher.finalize()))
}
```
### Memory-Efficient File Processing
For large files, stream processing avoids loading everything into memory:

```rust
use std::fs::File;
use std::io::{BufReader, Read};
use std::path::Path;

fn process_large_file(path: &Path) -> Result<ProcessResult, Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);
    let mut buffer = [0u8; 8192]; // 8 KB buffer
    let mut total_size = 0;

    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break; // EOF
        }
        total_size += bytes_read;
        // Process the chunk here without storing the entire file
    }

    Ok(ProcessResult { size: total_size })
}
```
## Safety Features
### Validation Before Deletion
Never delete without confirmation:

```rust
pub struct SafeDeleter {
    dry_run: bool,
    require_confirmation: bool,
}

impl SafeDeleter {
    pub fn delete_files(&self, files: &[PathBuf]) -> Result<DeletionResult, DeletionError> {
        // Validate that all files exist and are safe to delete
        self.validate_files(files)?;

        if self.require_confirmation {
            self.prompt_for_confirmation(files)?;
        }

        if self.dry_run {
            return Ok(DeletionResult::dry_run(files));
        }

        // Actually delete the files
        self.perform_deletion(files)
    }

    fn validate_files(&self, files: &[PathBuf]) -> Result<(), DeletionError> {
        for file in files {
            if self.is_system_critical(file) {
                return Err(DeletionError::SystemCritical(file.clone()));
            }
        }
        Ok(())
    }
}
```
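The `is_system_critical` check is the last line of defense. Cleanser's real check is surely more thorough, but prefix matching against a list of protected roots captures the basic idea; this sketch (protected paths and all) is my own illustration:

```rust
use std::path::Path;

/// Hypothetical guard: refuse to touch anything under a protected root.
/// `Path::starts_with` matches whole path components, so "/Library" does
/// not accidentally match "/Users/me/Library".
fn is_system_critical(path: &Path) -> bool {
    const PROTECTED: &[&str] = &["/System", "/usr", "/bin", "/sbin", "/Library"];
    PROTECTED.iter().any(|root| path.starts_with(root))
}

fn main() {
    assert!(is_system_critical(Path::new("/System/Library/CoreServices")));
    assert!(is_system_critical(Path::new("/usr/bin/env")));
    // A per-user cache directory is fair game.
    assert!(!is_system_critical(Path::new("/Users/me/Library/Caches/foo")));
    println!("guard ok");
}
```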
## Distribution Strategy
### Homebrew Integration
Making installation easy for macOS users:

```ruby
# Formula/cleanser.rb
class Cleanser < Formula
  desc "High-performance macOS storage cleanup tool"
  homepage "https://github.com/phpfc/cleanser"
  url "https://github.com/phpfc/cleanser/archive/v1.0.0.tar.gz"
  sha256 "..."

  depends_on "rust" => :build

  def install
    system "cargo", "install", *std_cargo_args
  end

  test do
    system "#{bin}/cleanser", "--version"
  end
end
```
### CI/CD Pipeline
Automated testing and releases:

```yaml
name: Release

on:
  push:
    tags: ['v*']

jobs:
  build:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - name: Build release
        run: cargo build --release
      - name: Run tests
        run: cargo test
      - name: Create release
        uses: actions/create-release@v1
```
## Lessons Learned
### 1. Start Simple, Optimize Later
My first version was single-threaded and slow. Profiling showed where the bottlenecks were, and I then optimized those specific areas.

### 2. Error Handling Is Critical

System tools need robust error handling. Rust's `Result` type makes this natural:
```rust
fn scan_directory(path: &Path) -> Result<Vec<FileInfo>, ScanError> {
    let entries = fs::read_dir(path)
        .map_err(|e| ScanError::ReadDir { path: path.to_owned(), source: e })?;
    // Process entries...
}
```
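For completeness, here is one plausible shape for that `ScanError` type. This is a sketch, not Cleanser's actual definition; in practice the `thiserror` crate would remove most of this boilerplate:

```rust
use std::fmt;
use std::io;
use std::path::PathBuf;

/// Hypothetical error type matching the `ScanError::ReadDir` constructor
/// used in the snippet above.
#[derive(Debug)]
pub enum ScanError {
    ReadDir { path: PathBuf, source: io::Error },
}

impl fmt::Display for ScanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ScanError::ReadDir { path, source } => {
                write!(f, "failed to read directory {}: {}", path.display(), source)
            }
        }
    }
}

impl std::error::Error for ScanError {
    // Expose the underlying io::Error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ScanError::ReadDir { source, .. } => Some(source),
        }
    }
}

fn main() {
    let err = ScanError::ReadDir {
        path: PathBuf::from("/no/such/dir"),
        source: io::Error::new(io::ErrorKind::NotFound, "not found"),
    };
    assert!(err.to_string().contains("/no/such/dir"));
    println!("{err}");
}
```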
### 3. User Experience Matters

Even CLI tools need good UX. Clear progress indicators, helpful error messages, and sensible defaults make all the difference.

### 4. Performance Testing Is Essential

I used `criterion` for benchmarking and `flamegraph` for profiling. Measuring performance objectively was crucial for optimization decisions.
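`criterion` is the right tool for statistically stable numbers, but even a std-only timing harness catches order-of-magnitude regressions during development. A sketch of my own (the `count_entries` function is a stand-in for the real scan being timed):

```rust
use std::fs;
use std::path::Path;
use std::time::Instant;

/// Count directory entries one level deep -- a stand-in for the real scan.
fn count_entries(dir: &Path) -> usize {
    fs::read_dir(dir).map(|it| it.count()).unwrap_or(0)
}

fn main() {
    let dir = std::env::temp_dir();
    let start = Instant::now();
    let mut total = 0;
    // Repeat the operation so the measurement isn't dominated by noise.
    for _ in 0..100 {
        total += count_entries(&dir);
    }
    let elapsed = start.elapsed();
    println!("100 scans, {} entries total, in {:?}", total, elapsed);
}
```

This is only a sanity check; for real decisions, `criterion`'s warm-up and outlier handling matter.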
## Results
The final tool achieved:

- 10x faster than equivalent Python scripts
- Memory usage under 50 MB, even for large scans
- Zero crashes in production use
- 1000+ downloads via Homebrew
## Conclusion
Rust proved to be an excellent choice for building system tools. The combination of performance, safety, and excellent tooling made development productive and the result reliable.
The key takeaways:

1. Leverage Rust's concurrency primitives early
2. Design for safety from the beginning
3. Profile and optimize based on real usage
4. Invest in good distribution and CI/CD
Building Cleanser taught me that Rust isn't just for systems programming; it's a great fit for any tool where performance and reliability matter.