After starting init(8) or similar on Linux and other UNIX flavors,
older-style system startup activity includes running several
shell processes to interpret about 2-3 dozen scripts
or a few large ones.
On many systems, each shell also interprets secondary scripts loaded
with shell functions as shared or common services.
Although shell scripts are simple enough to create and maintain,
their ease and convenience come at a cost.
While the overhead of interpreting scripts at runtime is undesirable,
always running tasks serially is about the slowest way to execute most
workloads, and other CPU resources are often left idle in the meantime.
These issues and a motivation for faster boot times led to methods of
executing startup workloads in parallel, especially on modern hardware
with many CPU cores/threads.
While some methods modestly reduce system startup time,
most include one or more forms of runtime interpreting
and a mix of elements like these:
- minimally reducing script-related conveniences
- dependency comments added to scripts and more interpreting
- a few occurrences of running 2 or 3 scripts in parallel
- scanning many subdirs and interpreting symlinks as dependencies
- ease-of-use sacrificed for countless features and heavy integration
- excess complexity, numerous tools, long/awkward command-lines
- inefficiencies – varied in number and scale
was named with a dual-meaning from the phrase "concurrent startup"
and from being written in C.
The overall design follows a UNIX philosophy
– minimal, fast, and efficient.
The primary focus is minimizing the runtime of the
startup workload to achieve faster boot times.
Major elements are a parallel execution engine,
translator, and display/disassembler.
Each element works with both system startup and shutdown workloads.
To support most systems and for versatility,
workloads are listed in ASCII config files.
Config files are translated to binary for runtime performance.
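
The translation step described above can be sketched roughly as follows.
This is only an illustration: the line format "<id> <prereq> <command...>",
the struct task_rec layout, and the function names parse_task_line() and
translate_config() are all assumptions made for the example, not the actual
config syntax or binary format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size binary record; not the tool's real format. */
struct task_rec {
    uint16_t id;        /* task number */
    uint16_t prereq;    /* id of a task that must finish first, 0 = none */
    char     cmd[60];   /* NUL-terminated command line */
};

/* Parse one ASCII line of the assumed form "<id> <prereq> <command...>".
 * Returns 0 on success, -1 on a malformed or oversized line. */
int parse_task_line(const char *line, struct task_rec *out)
{
    unsigned id, prereq;
    int used = 0;
    size_t len;

    /* %n stores how many characters were consumed before the command. */
    if (sscanf(line, "%u %u %n", &id, &prereq, &used) != 2 || used == 0)
        return -1;
    if (id > UINT16_MAX || prereq > UINT16_MAX)
        return -1;

    len = strcspn(line + used, "\n");       /* command without the newline */
    if (len == 0 || len >= sizeof out->cmd)
        return -1;

    memset(out, 0, sizeof *out);            /* zero padding for stable output */
    out->id = (uint16_t)id;
    out->prereq = (uint16_t)prereq;
    memcpy(out->cmd, line + used, len);
    return 0;
}

/* Translate every task line from `in` into binary records on `out`.
 * Blank lines and lines starting with '#' are skipped.
 * Returns the number of records written, or -1 on error. */
long translate_config(FILE *in, FILE *out)
{
    char line[256];
    struct task_rec rec;
    long count = 0;

    while (fgets(line, sizeof line, in)) {
        if (line[0] == '#' || line[0] == '\n')
            continue;
        if (parse_task_line(line, &rec) != 0)
            return -1;
        if (fwrite(&rec, sizeof rec, 1, out) != 1)
            return -1;
        count++;
    }
    return count;
}
```

With fixed-size records like these, a boot-time reader could load the whole
file with a single fread() and index it directly, with no parsing at runtime.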
avoids the complexity and heavy excess found in other startup services.
Similarly, there are no high-overhead elements such as attempting to
auto-convert or divine prerequisites between startup tasks.
Custom extensions and new config-driven internal functions are
sometimes written to suit unique embedded systems.
- eliminate use of shells and scripts
- eliminate runtime interpreting and related I/O
- list system startup and shutdown tasks with ASCII config files
- support groups of tasks similar to separate scripts
- before booting: translate config files to binary
- during boot: read a binary file and use the content directly
- create several threads and leverage CPU resources
- support runtime prerequisites
- run tasks in parallel as directed
- run custom internal functions replacing various admin processes
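
The items on threads, runtime prerequisites, and parallel execution can be
illustrated with a minimal POSIX threads sketch. It is not the actual engine:
struct task, struct engine, run_tasks(), and record_order() are hypothetical
names, each task is a plain function call standing in for a startup process,
and the sketch assumes every prerequisite is listed before the task that
depends on it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task entry; a real engine would launch a process here. */
struct task {
    int   prereq;           /* index of the task that must finish first, -1 = none */
    void (*run)(int idx);   /* stand-in for starting a startup process */
    int   done;             /* set once run() has returned */
};

struct engine {
    struct task    *tasks;
    int             ntasks;
    int             next;      /* next unclaimed task index */
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* broadcast whenever a task completes */
};

/* Each worker claims tasks in list order, waits for the prerequisite, runs. */
static void *worker(void *arg)
{
    struct engine *e = arg;

    for (;;) {
        pthread_mutex_lock(&e->lock);
        if (e->next >= e->ntasks) {
            pthread_mutex_unlock(&e->lock);
            return NULL;
        }
        int i = e->next++;
        struct task *t = &e->tasks[i];
        while (t->prereq >= 0 && !e->tasks[t->prereq].done)
            pthread_cond_wait(&e->changed, &e->lock);
        pthread_mutex_unlock(&e->lock);

        t->run(i);                          /* runs outside the lock */

        pthread_mutex_lock(&e->lock);
        t->done = 1;
        pthread_cond_broadcast(&e->changed);
        pthread_mutex_unlock(&e->lock);
    }
}

/* Run all tasks on up to nthreads workers. Requires each prerequisite to
 * appear earlier in the list than its dependent, which rules out deadlock.
 * Returns 0 on success, -1 on bad arguments or if no thread could start. */
int run_tasks(struct task *tasks, int ntasks, int nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct engine e = { .tasks = tasks, .ntasks = ntasks, .next = 0 };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (int i = 0; i < ntasks; i++) {
        if (tasks[i].prereq >= i)
            return -1;
        tasks[i].done = 0;
    }
    pthread_mutex_init(&e.lock, NULL);
    pthread_cond_init(&e.changed, NULL);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, &e) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&e.changed);
    pthread_mutex_destroy(&e.lock);
    return (started > 0 || ntasks == 0) ? 0 : -1;
}

/* Demo task body: appends its index to a shared log of execution order. */
int order_log[16];
int order_len;
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

void record_order(int idx)
{
    pthread_mutex_lock(&log_lock);
    if (order_len < 16)
        order_log[order_len++] = idx;
    pthread_mutex_unlock(&log_lock);
}
```

Tasks without prerequisites start as soon as a worker is free, while a task
with a prerequisite blocks only its own worker until that prerequisite is
marked done.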
Target hardware requirements are CPUs with at least 4 cores/8 threads.
Example platforms include embedded systems, servers, and HEDTs.
RESULTS & TUNING
After init(8) or similar on most Linux and other UNIX flavors,
can often reduce startup runtime by about 33%.
A wide range of hardware configurations and startup workloads can lead to
reductions that also vary widely, ranging roughly 19% to 77%.
A mix of several factors contributes to runtime reductions:
- size and nature of the startup workload
- abundance or scarcity of prerequisites ⟷ degree of concurrency
- CPU and memory resources
- tuning N threads
- custom internal functions
- static vs. loadable kernel modules and their timing
- booting from solid-state storage (SSD or NVMe) vs. spinning media
For overall workload optimization, a tuned thread count can
lead to saturating hardware resources while avoiding overloading
and excess context switching.
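
As a starting point for tuning, a thread count could be derived from the
number of online CPUs and then adjusted by measurement. The sketch below is
an assumption for illustration only: choose_thread_count() and
default_thread_count() are hypothetical helpers, and the clamping policy is
one plausible default rather than a documented rule.

```c
#include <unistd.h>

/* Clamp a CPU count to a usable thread count: at least 1, and no more
 * than the number of tasks or a caller-supplied ceiling. */
int choose_thread_count(long online_cpus, int ntasks, int max_threads)
{
    long n = online_cpus < 1 ? 1 : online_cpus;  /* sysconf may return -1 */

    if (n > ntasks)
        n = ntasks;          /* more threads than tasks only adds overhead */
    if (n > max_threads)
        n = max_threads;
    if (n < 1)
        n = 1;
    return (int)n;
}

/* Query the online CPU count. _SC_NPROCESSORS_ONLN is a common extension
 * (Linux, BSDs, macOS) rather than a strict POSIX requirement. */
int default_thread_count(int ntasks, int max_threads)
{
    return choose_thread_count(sysconf(_SC_NPROCESSORS_ONLN),
                               ntasks, max_threads);
}
```

For example, with 8 online CPUs, 30 tasks, and a ceiling of 64, the helper
returns 8. A workload dominated by I/O waits may benefit from a higher
value, which is where measurement comes in.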
Replacing some startup processes with custom internal functions can
further reduce startup runtime; good candidates are high resource
consumers and simple processes that are run frequently.