cstartup: background, design, results

After init(8) or a similar process starts on Linux and other UNIX flavors, older-style system startup runs several shell processes to interpret either two to three dozen rc scripts (/etc/rc.d/*, /etc/rc*) or a few large ones. On many systems, each shell also interprets secondary scripts that supply shell functions used as shared or common services. Shell scripts are simple enough to create and maintain, but their ease and convenience come at a cost. The overhead of interpreting scripts at runtime is undesirable, and always running tasks serially is about the slowest way to execute most workloads, since most other CPU resources are left idle in the meantime. These issues, and the motivation for faster boot times, led to methods of executing startup workloads in parallel, especially on modern hardware with many CPU cores and threads. While some methods modestly reduce system startup time, most include one or more forms of runtime interpreting and a mix of elements like these:

minimally reducing script-related conveniences
dependency comments added to scripts and more interpreting
a few occurrences of running 2 or 3 scripts in parallel
scanning many subdirs and interpreting symlinks as dependencies
ease-of-use sacrificed for countless features and heavy integration
excess complexity, numerous tools, long/awkward command-lines
inefficiencies – varied in number and scale

cstartup was named with a dual meaning: from the phrase "concurrent startup" and from being written in C. The overall design follows a UNIX philosophy – minimal, fast, and efficient. The primary focus is minimizing runtime for the startup workload, and so achieving faster boot times. The major elements are a parallel execution engine, a translator, and a display/disassembler. Each element works with both system startup and shutdown workloads (see operating modes). To support most systems and for versatility, workloads are listed in ASCII config files, and these are translated to binary for runtime performance. cstartup avoids the complexity and heavy excess found in other startup services, and likewise has no high-overhead elements such as attempting to auto-convert rc scripts (/etc/rc.d/*, /etc/rc*) or to divine prerequisites between startup tasks. Custom extensions and new config-driven internal functions are sometimes written to suit unique embedded systems. Design summary:

eliminate use of shells and scripts
eliminate runtime interpreting and related I/O
list system startup and shutdown tasks with ASCII config files
support groups of tasks similar to separate scripts
before booting: translate config files to binary
during boot: read a binary file and use the content directly
create several threads and leverage CPU resources
support runtime prerequisites
run tasks in parallel as directed
run custom internal functions replacing various admin processes
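
The threading and prerequisite items in the summary above can be illustrated with a minimal sketch. This is not cstartup's actual engine or data layout, which are not shown here. The names task_t, engine_t, worker, and run_all are hypothetical, and so are the 32-task limit and the bitmask encoding of prerequisites. A plain function pointer stands in for whatever a real task does (exec of a program, or an internal function).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TASKS   32   /* one bit per task in a 32-bit prerequisite mask */
#define MAX_THREADS 16

/* Hypothetical task record; not cstartup's real binary format. */
typedef struct {
    const char *name;
    uint32_t    prereqs;       /* bit i set: task i must finish first */
    void      (*run)(void);    /* stand-in for exec or an internal function */
} task_t;

typedef struct {
    const task_t   *tasks;
    size_t          ntasks, nstarted, nfinished, running;
    uint32_t        started, done;      /* per-task state bits */
    size_t          order[MAX_TASKS];   /* completion order, for inspection */
    pthread_mutex_t lock;
    pthread_cond_t  changed;
} engine_t;

/* Each worker claims any unstarted task whose prerequisites are all done. */
static void *worker(void *arg)
{
    engine_t *e = arg;
    pthread_mutex_lock(&e->lock);
    for (;;) {
        size_t pick = e->ntasks;
        for (size_t i = 0; i < e->ntasks; i++) {
            uint32_t bit = (uint32_t)1 << i;
            if (!(e->started & bit) && (e->tasks[i].prereqs & ~e->done) == 0) {
                pick = i;
                break;
            }
        }
        if (pick == e->ntasks) {
            /* Nothing ready: stop if all tasks are claimed, or if nothing is
             * running (remaining prerequisites can never be met); else wait. */
            if (e->nstarted == e->ntasks || e->running == 0)
                break;
            pthread_cond_wait(&e->changed, &e->lock);
            continue;
        }
        e->started |= (uint32_t)1 << pick;
        e->nstarted++;
        e->running++;
        pthread_mutex_unlock(&e->lock);
        if (e->tasks[pick].run)
            e->tasks[pick].run();           /* runs outside the lock */
        pthread_mutex_lock(&e->lock);
        assert(e->nfinished < MAX_TASKS);
        e->done |= (uint32_t)1 << pick;
        e->order[e->nfinished++] = pick;
        e->running--;
        pthread_cond_broadcast(&e->changed); /* wake workers waiting on prereqs */
    }
    pthread_mutex_unlock(&e->lock);
    return NULL;
}

/* Runs the task list on up to nthreads workers; returns tasks completed. */
size_t run_all(engine_t *e, const task_t *tasks, size_t ntasks, size_t nthreads)
{
    pthread_t tid[MAX_THREADS];
    size_t created = 0;

    memset(e, 0, sizeof *e);
    if (ntasks > MAX_TASKS)
        return 0;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    e->tasks = tasks;
    e->ntasks = ntasks;
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->changed, NULL);
    for (size_t i = 0; i < nthreads; i++)
        if (pthread_create(&tid[created], NULL, worker, e) == 0)
            created++;
    for (size_t i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&e->changed);
    pthread_mutex_destroy(&e->lock);
    return e->nfinished;
}
```

In this sketch, tasks with no unmet prerequisites run concurrently, and a task that depends on others is claimed only after they finish. A caller would fill an array of task_t records, call run_all with a thread count, and compare the return value against the number of tasks to detect prerequisites that could never be satisfied.
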

The target hardware is a CPU with at least 4 cores/8 threads.
Example platforms include embedded systems, servers, and HEDTs.

After init(8) or similar on most Linux and other UNIX flavors, cstartup can often reduce startup runtime by about 33%. Because hardware configurations and startup workloads vary widely, the reductions vary widely too, ranging from roughly 19% to 77%. A mix of several factors contributes to runtime reductions with cstartup, e.g.:

size and nature of the startup workload
abundance or scarcity of prerequisites ⇆ degree of concurrency
CPU and memory resources
tuning N threads
custom internal functions
static vs. loadable kernel modules and their timing
booting from a solid-state drive (SSD or NVMe) vs. spinning media

For overall workload optimization, a tuned thread count can saturate hardware resources while avoiding overload and excess context switching. Some startup processes can be replaced with custom internal functions to further reduce startup runtime, e.g. high resource consumers and simpler processes that run frequently.
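
As a rough illustration of choosing a starting point for the thread count, the sketch below derives a default from the number of online CPUs and clamps it to the task count and to an optional cap. The helper names pick_thread_count and default_thread_count are hypothetical and are not part of cstartup. The sketch also assumes a system where sysconf(_SC_NPROCESSORS_ONLN) is available, as it is on Linux and the BSDs, although it is not required by POSIX. Real tuning would start from a value like this and adjust it by measurement.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: choose a worker count from the online CPU count,
 * the number of tasks, and an optional administrator cap (0 = no cap). */
size_t pick_thread_count(long online_cpus, size_t ntasks, size_t cap)
{
    /* sysconf() may return -1 on error; fall back to a single worker. */
    size_t n = online_cpus > 0 ? (size_t)online_cpus : 1;

    if (cap > 0 && n > cap)
        n = cap;        /* leave headroom and limit context switching */
    if (ntasks > 0 && n > ntasks)
        n = ntasks;     /* more threads than tasks gains nothing */
    return n;
}

/* Default for the running machine; assumes _SC_NPROCESSORS_ONLN exists. */
size_t default_thread_count(size_t ntasks, size_t cap)
{
    return pick_thread_count(sysconf(_SC_NPROCESSORS_ONLN), ntasks, cap);
}
```

For example, with 8 hardware threads online and 30 startup tasks, the sketch suggests 8 workers. With only 3 tasks it suggests 3, and a cap of 12 on a 16-thread machine yields 12.
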