The moment a computer is powered on, the CPU (Central Processing Unit) receives power but does not yet know what it is supposed to do. Meanwhile, a special hardwired circuit raises the logical value of the CPU's RESET pin. Once RESET is asserted, the CPU sets some of its special registers to their default values and starts executing the code located at address 0xFFFFFFF0, known as the system BIOS (Basic Input Output System). The BIOS code resides in the flash memory (ROM) on the motherboard. The BIOS then performs the POST (Power On Self Test), which checks the hardware needed for booting, and looks for a device to boot from. When a boot device is found, the BIOS loads its boot sector, which holds the 1st stage boot loader, into RAM (Random Access Memory) and executes it; on a hard disc this is the code in the MBR (Master Boot Record). The 1st stage boot loader is less than 512 bytes in size (a single sector), and its main job is to load the 2nd stage boot loader into RAM.
While the 2nd stage boot loader is running, a splash screen is commonly displayed, and the Linux kernel and an optional initial RAM disk (a temporary root file system) are loaded into memory. If more than one OS is installed, this is also the stage at which the boot loader lets the user choose which one to start. Once the kernel image is in RAM, the 2nd stage boot loader passes control to it, and the kernel is decompressed in RAM and initialized. During this phase the kernel checks the system hardware, enumerates the attached hardware devices, mounts the root device and then loads the necessary kernel modules. When all of these tasks are complete, the first user-space program (init) starts.
Brief description and Functionality
Basic Input Output System [BIOS] :
BIOS is a program whose entry point is at memory location 0xFFFFFFF0, which is mapped to the ROM/flash memory. It consists of interrupt-driven low-level procedures that operating systems can use to handle hardware devices; Linux, however, stops using the BIOS once its initialization is complete. The BIOS has two parts: the POST code and the runtime services. The POST code is flushed from memory once the POST has completed, but the runtime services remain in memory and available to the target OS.
· BIOS executes a series of tests on the computer hardware to establish which devices are present and whether they are working properly. This series of tests is called the POST (Power On Self Test). During this phase the user may see the BIOS version banner on the screen.
· It initializes the hardware devices and ensures that they all operate without conflicts on the IRQ (interrupt) lines and I/O ports. On PCI-based architectures, it displays a table of all installed PCI devices.
· The BIOS runtime services search for devices that are both active and bootable, in the order of preference defined by the CMOS (Complementary Metal Oxide Semiconductor) settings. The boot device can be a floppy disc, a CD-ROM, a partition on a hard disc, a network device, or a USB flash memory stick.
· Linux is commonly booted from a hard disc, where the MBR (Master Boot Record) contains the primary boot loader. The MBR is a 512-byte sector located in the first sector of the disc (cylinder = 0, head = 0, sector = 1).
· As soon as it finds a valid boot device, the BIOS copies the first sector of that device into RAM, starting at physical address 0x7C00, then jumps to that address and executes the loaded instructions.
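The last step can be modelled in a few lines of C. This is a minimal sketch, not firmware code: it assumes the 512-byte sector has already been read from the device into a buffer, uses a byte array to stand in for RAM, and the helper names are hypothetical. Like most BIOSes, it only treats the sector as bootable if byte 510 is 0x55 and byte 511 is 0xAA (the value 0xAA55 read as a little-endian 16-bit word).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE    512u
#define BOOT_LOAD_ADDR 0x7C00u   /* physical address the sector is copied to */

/* Hypothetical helper: does the sector end with the boot signature? */
static bool is_bootable_sector(const uint8_t sector[SECTOR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Hypothetical model of the BIOS step: copy the sector into simulated RAM
 * at 0x7C00 and return the address execution would jump to, or 0 if the
 * device is not bootable (a real BIOS would try the next device). */
static uint32_t bios_load_boot_sector(uint8_t *ram, size_t ram_size,
                                      const uint8_t sector[SECTOR_SIZE])
{
    assert(ram != NULL && sector != NULL);
    if (!is_bootable_sector(sector) || ram_size < BOOT_LOAD_ADDR + SECTOR_SIZE)
        return 0;
    memcpy(ram + BOOT_LOAD_ADDR, sector, SECTOR_SIZE);
    return BOOT_LOAD_ADDR;
}
```

A sector full of zeros is rejected; once bytes 510 and 511 are set to 0x55 and 0xAA, the sketch copies it and returns 0x7C00.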
Master Boot Record [MBR] :
Every hard disc must have a consistent starting point where the key information about the disc is stored (how many partitions the disc has, what kind of partitions they are, etc.). The place where this information is stored is known as the MBR (Master Boot Record), master boot sector, or boot sector. The MBR is always located at cylinder 0, head 0, sector 1 (the first sector of the disc), and the BIOS always looks in this sector to boot the OS. The MBR contains the following structures:
· Partition Table: this table describes the partitions on the hard disc. It has room for 4 entries, so the disc can have at most 4 true partitions, known as primary partitions. Any further partitions are logical partitions, which live inside one of the primary partitions (the extended partition).
· Master Boot Code: the small initial boot program that the BIOS loads and executes to start the boot process.
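The MBR layout can be written down as a C structure. This is a minimal sketch: the field names are hypothetical (not taken from any particular header), every field is a byte or byte array so the compiler inserts no padding, and multi-byte values are decoded explicitly as little-endian, which is how they are stored on disc. The sizes follow the standard layout: 446 bytes of boot code, four 16-byte partition entries, and a 2-byte signature.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte partition table entry (field names are illustrative). */
struct mbr_partition {
    uint8_t status;          /* 0x80 = active (bootable), 0x00 = inactive */
    uint8_t chs_first[3];    /* CHS address of the first sector */
    uint8_t type;            /* partition type, e.g. 0x83 = Linux, 0x05 = extended */
    uint8_t chs_last[3];     /* CHS address of the last sector */
    uint8_t lba_first[4];    /* little-endian LBA of the first sector */
    uint8_t num_sectors[4];  /* little-endian sector count */
};

/* The whole 512-byte MBR. */
struct mbr {
    uint8_t boot_code[446];          /* master boot code */
    struct mbr_partition part[4];    /* partition table, 4 x 16 bytes */
    uint8_t signature[2];            /* 0x55, 0xAA */
};

static_assert(sizeof(struct mbr_partition) == 16, "partition entry must be 16 bytes");
static_assert(sizeof(struct mbr) == 512, "MBR must be exactly one sector");

/* Decode a 32-bit little-endian field such as lba_first. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For example, a partition whose lba_first bytes are 00 08 00 00 starts at sector 2048.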
Stage-1 Boot Loader :
The primary boot loader that resides in the MBR is a 512-byte image containing both program code and a small partition table. The first 446 bytes hold the primary boot loader code; the next 64 bytes hold the partition table, with a 16-byte record for each of the four partitions. The last 2 bytes of the MBR contain the magic number 0xAA55, which serves as a validation check of the MBR. The job of the primary boot loader is to find and load the secondary boot loader. It first searches the partition table for an active partition. When it finds one, it scans all the other partitions to make sure that they are inactive. Once this is verified (that is, exactly one active partition exists), the active partition's boot record is read from the device into RAM and executed.
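The active-partition rule can be sketched in C. This is a simplified model under stated assumptions: the whole 512-byte MBR is in a buffer, the four 16-byte entries start at offset 446, an entry is active when its status byte is 0x80, and the (hypothetical) function returns the index of the single active partition, or -1 when the signature is wrong or there is no active entry or more than one. A real MBR program would print an error and halt instead of returning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PART_TABLE_OFFSET 446   /* partition table follows the boot code */
#define PART_ENTRY_SIZE   16
#define PART_COUNT        4
#define STATUS_ACTIVE     0x80

/* Return the index (0-3) of the only active partition, or -1. */
static int find_single_active_partition(const uint8_t mbr[512])
{
    assert(mbr != NULL);
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;                        /* not a valid MBR */

    int active = -1;
    for (int i = 0; i < PART_COUNT; i++) {
        uint8_t status = mbr[PART_TABLE_OFFSET + i * PART_ENTRY_SIZE];
        if (status == STATUS_ACTIVE) {
            if (active != -1)
                return -1;                /* a second active entry: invalid */
            active = i;
        }
    }
    return active;                        /* still -1 if nothing was active */
}
```

Scanning the whole table, rather than stopping at the first active entry, is what enforces the "all others must be inactive" check described above.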
Stage-2 Boot Loader :
The secondary boot loader is also known as the kernel loader. The task of this stage is to load the Linux kernel into RAM along with an optional RAM disk. The first and second stage boot loaders together are called LILO (LInux LOader) or GRUB (GRand Unified Bootloader), depending on which one is installed. The advantage of GRUB is that it understands Linux file systems: instead of loading raw sectors from the disk, as LILO does, GRUB can load the Linux kernel from an ext2 or ext3 file system. It does this by turning the two-stage boot loader into a three-stage one: after the MBR (stage 1), it loads a stage 1.5 boot loader that understands the particular file system containing the Linux kernel image.
Kernel Boot Procedures:
With the kernel image in memory and control handed over by the stage 2 boot loader, the kernel stage begins. The kernel image is a compressed image, typically a zImage (compressed image smaller than 512 KB) or a bzImage (big compressed image larger than 512 KB), that has been previously compressed with zlib. At the head of this image is a routine that does a minimal amount of hardware setup, then decompresses the kernel contained in the kernel image and places it into high memory. If an initial RAM disk image is present, this routine moves it into memory and notes it for later use. The routine then calls the kernel, and the kernel boot process begins.
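How can a zImage be told apart from a bzImage? Kernels that follow the x86 Linux boot protocol carry a setup header near the start of the image. The C sketch below assumes the first part of the image is already in a buffer and uses offsets from the kernel's x86 boot protocol documentation (boot.txt): the "HdrS" magic at 0x202 and loadflags at 0x211, whose bit 0 (LOADED_HIGH) is set for a kernel loaded at 1 MB. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_MAGIC_OFF  0x202   /* "HdrS" signature (boot protocol 2.00+) */
#define HDR_LOADFLAGS  0x211   /* loadflags byte */
#define LOADED_HIGH    0x01    /* bit 0: protected-mode code loaded at 0x100000 */

enum kernel_image_kind { IMAGE_UNKNOWN, IMAGE_ZIMAGE, IMAGE_BZIMAGE };

/* Classify a kernel image from its first len bytes. */
static enum kernel_image_kind classify_image(const uint8_t *img, size_t len)
{
    assert(img != NULL);
    if (len <= HDR_LOADFLAGS)
        return IMAGE_UNKNOWN;            /* too short to contain the header */
    if (memcmp(img + HDR_MAGIC_OFF, "HdrS", 4) != 0)
        return IMAGE_ZIMAGE;             /* pre-2.00 protocol: bzImage did not exist yet */
    return (img[HDR_LOADFLAGS] & LOADED_HIGH) ? IMAGE_BZIMAGE : IMAGE_ZIMAGE;
}
```

The same LOADED_HIGH bit is what setup.S tests (testb $LOADED_HIGH, %cs:loadflags) in the code flow that follows.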
Kernel Start-up Procedures
Setup.S [/arch/i386/boot/setup.S]
Setup.S is assembly code responsible for getting system data from the BIOS and putting it into the appropriate location in system memory. It asks the BIOS for memory, disc and other parameters and stores them in a protected memory region (0x90000-0x901FF). It also re-initializes the hardware and switches from real-mode to protected-mode memory addressing. It then sets up a provisional GDT and IDT, and reprograms the Programmable Interrupt Controller (PIC), which handles the 16 hardware IRQ lines (IRQ 0 to IRQ 15).
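The addresses used by setup.S are real-mode segment:offset pairs. In real mode the CPU forms a physical address as segment × 16 + offset, so INITSEG (0x9000) starts at 0x90000 and SETUPSEG (0x9020) at 0x90200, immediately after the 512-byte region 0x90000-0x901FF. A minimal C sketch of that arithmetic (the function name is ours, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

#define INITSEG       0x9000u   /* boot sector copy and BIOS parameters */
#define SETUPSEG      0x9020u   /* setup.S code itself */
#define DELTA_INITSEG (SETUPSEG - INITSEG)

static_assert(DELTA_INITSEG == 0x20u, "SETUPSEG is 0x20 paragraphs above INITSEG");

/* Real-mode address translation: physical = segment * 16 + offset. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Step 6 below (subw $DELTA_INITSEG, %ax) is this arithmetic in reverse: subtracting 0x20 from SETUPSEG yields INITSEG. The same formula also explains why 0x07C0:0x0000 and 0x0000:0x7C00 both name the boot sector's load address.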
Code Flow:
In the setup.S file, the first instruction is a jump.
1. “start: jmp trampoline” is followed by the setup header fields. At trampoline, the procedure start_of_setup is called (trampoline: call start_of_setup), which starts the actual work.
start_of_setup:
# Bootlin depends on this being done early
movw $0x01500, %ax
movb $0x81, %dl
int $0x13
2. Resets the disk controller (only when SAFE_RESET_DISK_CONTROLLER is defined).
#ifdef SAFE_RESET_DISK_CONTROLLER
# Reset the disk controller.
movw $0x0000, %ax
movb $0x80, %dl
int $0x13
#endif
3. Sets the data segment register to the value of the code segment register, SETUPSEG (0x9020).
# Set %ds = %cs, we know that SETUPSEG = %cs at this point
movw %cs, %ax # aka SETUPSEG
movw %ax, %ds
4. Then it looks for the signature at the end of the setup block (SIG1 0xAA55, SIG2 0x5A5A) to ensure that the loader (LILO) loaded the setup code correctly.
# Check signature at end of setup
cmpw $SIG1, setup_sig1
jne bad_sig
cmpw $SIG2, setup_sig2
jne bad_sig
jmp good_sig1
5. If the signature is missing, the rest of the setup code has to be found. If that fails, setup gives up, prints the message “No setup signature found …” and puts the processor in the halt state.
6. Changes the data segment register to INITSEG (0x9000).
good_sig:
movw %cs, %ax # aka SETUPSEG
subw $DELTA_INITSEG, %ax # aka INITSEG
movw %ax, %ds
7. Checks whether the loader version is suitable, to make sure the loader can deal with a high-loaded kernel. If it is, execution jumps to ‘loader_ok’; otherwise the message “Wrong loader, giving up…” is printed.
# Check if an old loader tries to load a big-kernel
testb $LOADED_HIGH, %cs:loadflags # Do we have a big kernel?
jz loader_ok # No, no danger for old loaders.
cmpb $0, %cs:type_of_loader # Do we have a loader that
# can deal with us?
jnz loader_ok # Yes, continue.
pushw %cs # No, we have an old loader,
popw %ds # die.
lea loader_panic_mess, %si
call prtstr
jmp no_sig_loop
8. Gets the extended memory size in KB, which is stored at offset 0x1E0. Several memory detection schemes are tried in turn: first e820h, which lets us assemble a memory map, then e801h, which returns a 32-bit memory size, and finally 88h, which returns 0-64 MB.
loader_ok:
# Get memory size (extended mem, kB)
................................................................
................................................................
mem88:
#endif
movb $0x88, %ah
int $0x15
movw %ax, (2)
9. Sets the keyboard repeat rate to the maximum.
# Set the keyboard repeat rate to the max
movw $0x0305, %ax
xorw %bx, %bx
int $0x16
10. Checks for the video adapter and its parameters and lets the user browse video modes. This is done by calling video, which is defined in video.S.
# Check for video adapter and its parameters and allow the
# user to browse video modes.
call video # NOTE: we need %ds pointing
# to bootsector
11. Gets hd0 data, checks whether hd1 is present, and scans for an MCA bus.
# Get hd0 data...
..........................
..........................
# Get hd1 data...
............................
............................
12. After some more checks, we finally move to protected mode. If there is a valid pointer to a real-mode switch routine at offset “realmode_switch”, that routine is called; otherwise “default_switch” is used. The default_switch routine disables interrupts (cli) and NMIs.
13. Now the system is moved to its rightful place, but first we check whether we have a big kernel, in which case it must not be moved. We get the “code32_start” address and modify “code32” accordingly; the default is 0x1000 (4 KB) for a zImage or 0x100000 (1 MB) for a big kernel, and it can be changed by the loader.
14. Now we will set up
the GDT and IDT.
15. Make sure any
possible coprocessor is properly reset.
16. Now all interrupts are masked except IRQ2, which is cascaded; the rest of the interrupt setup is done in init_IRQ(), called from start_kernel().
17. This is when we actually jump into protected mode, by setting the PE bit (movw $1, %ax and lmsw %ax).
18. The last instruction executed in this file is a jump to an assembly function called “startup_32”, which performs additional initialization [/arch/i386/boot/compressed/head.S].
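Steps 7 and 13 both hinge on the LOADED_HIGH bit of loadflags. The C sketch below models only that decision logic; it is not kernel code and the function names are hypothetical. An old boot loader leaves type_of_loader at 0, which is fatal for a big kernel, and the default 32-bit entry point is 0x1000 for a zImage and 0x100000 for a big kernel, before any change the loader may make to code32_start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOADED_HIGH    0x01u      /* loadflags bit 0: kernel loaded at 1 MB */
#define ZIMAGE_CODE32  0x1000u    /* setup.S default code32_start, zImage */
#define BZIMAGE_CODE32 0x100000u  /* setup.S default code32_start, big kernel */

static_assert(ZIMAGE_CODE32 == 4u * 1024u, "zImage default is 4 KB");
static_assert(BZIMAGE_CODE32 == 1024u * 1024u, "big kernel default is 1 MB");

/* Step 7: a big kernel is only safe with a loader that filled in
 * type_of_loader; an old loader leaves it at 0. */
static bool loader_ok(uint8_t loadflags, uint8_t type_of_loader)
{
    if ((loadflags & LOADED_HIGH) == 0)
        return true;                 /* no danger for old loaders */
    return type_of_loader != 0;      /* otherwise "Wrong loader, giving up..." */
}

/* Step 13: default 32-bit entry point, ignoring any loader override. */
static uint32_t default_code32_start(uint8_t loadflags)
{
    return (loadflags & LOADED_HIGH) ? BZIMAGE_CODE32 : ZIMAGE_CODE32;
}
```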
Head.S [ /arch/i386/boot/compressed/head.S ]
It performs the following operations:
1. Initializes the segmentation registers.
startup_32:
cld
cli
movl $(__BOOT_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
lss stack_start,%esp
xorl %eax,%eax
1: incl %eax # check that A20 really IS enabled
movl %eax,0x000000 # loop forever if it isn't
cmpl %eax,0x100000
je 1b
...............................
................................
2. Sets up a provisional stack (the lss stack_start,%esp instruction above) and clears EFLAGS.
/*
* Initialize eflags. Some BIOS's leave bits like NT set. This would
* confuse the debugger if this code is traced.
* XXX - best to initialize before switching to protected mode.
*/
pushl $0
popfl
3. Decompresses the kernel image [decompress_kernel(), misc.c]
/*
* Do the decompression, and jump to the new kernel..
*/
subl $16,%esp # place for structure on the stack
movl %esp,%eax
pushl %esi # real mode pointer as second arg
pushl %eax # address of structure as first arg
call decompress_kernel
orl %eax,%eax
jnz 3f
popl %esi # discard address
popl %esi # real mode pointer
xorl %ebx,%ebx
ljmp $(__BOOT_CS), $0x100000
4. “decompress_kernel” returns a value telling whether the kernel was loaded high or not. If not, we jump straight to the startup_32 function of the decompressed kernel (/arch/i386/kernel/head.S) at 0x100000; otherwise we copy the “move_in_place” routine to address 0x1000 (4 KB), and that routine moves the kernel to its final destination (0x100000).
................
..................
3:
movl $move_routine_start,%esi
movl $0x1000,%edi
movl $move_routine_end,%ecx
subl %esi,%ecx
addl $3,%ecx
shrl $2,%ecx
cld
...................
...................
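The A20 test at the top of startup_32 works because, with the A20 address line disabled, bit 20 of every physical address is forced to 0, so 0x100000 aliases 0x000000: a value written at one address shows up at the other. The C sketch below simulates that on a byte array. The memory model and all names are assumptions made for illustration; neither the hardware nor the kernel is written this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_MEM_SIZE (2u << 20)   /* 2 MB of simulated physical memory */
#define A20_BIT      (1u << 20)   /* address line 20 */

static uint8_t sim_mem[SIM_MEM_SIZE];

/* With the gate closed, bit 20 is masked off: the 1 MB wrap-around. */
static uint32_t a20_translate(uint32_t addr, bool gate_open)
{
    return gate_open ? addr : (addr & ~A20_BIT);
}

static uint32_t read32(uint32_t addr, bool gate_open)
{
    uint32_t phys = a20_translate(addr, gate_open);
    uint32_t value;
    assert(phys <= SIM_MEM_SIZE - sizeof value);
    memcpy(&value, &sim_mem[phys], sizeof value);
    return value;
}

static void write32(uint32_t addr, uint32_t value, bool gate_open)
{
    uint32_t phys = a20_translate(addr, gate_open);
    assert(phys <= SIM_MEM_SIZE - sizeof value);
    memcpy(&sim_mem[phys], &value, sizeof value);
}

/* Write to 0x000000 and read 0x100000, as head.S does. Writing "current
 * value + 1" guarantees a mismatch unless the addresses alias, so one
 * probe suffices; head.S instead increments %eax and retries while the
 * values match, which loops forever when A20 is really disabled. */
static bool probe_a20(bool gate_open)
{
    uint32_t old = read32(0x100000, gate_open);
    write32(0x000000, old + 1, gate_open);
    return read32(0x100000, gate_open) != old + 1;
}
```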
Head.S [ /arch/i386/kernel/head.S ]
The second startup_32() continues the initialization sequence. Its main job is to set up an environment in which the first process can execute. This includes:
1. Initializes the segmentation registers with their final values.
/*
* Set segments to known values.
*/
cld
lgdt boot_gdt_descr - __PAGE_OFFSET
movl $(__BOOT_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
.................
.................
2. Fills the bss segment of the kernel with zeros.
xorl %eax,%eax
movl $__bss_start - __PAGE_OFFSET,%edi
movl $__bss_stop - __PAGE_OFFSET,%ecx
subl %edi,%ecx
shrl $2,%ecx
rep ; stosl
3. Initializes the provisional kernel Page Tables and creates the PDEs (Page Directory Entries).
page_pde_offset = (__PAGE_OFFSET >> 20);
movl $(pg0 - __PAGE_OFFSET), %edi
movl $(swapper_pg_dir - __PAGE_OFFSET), %edx
movl $0x007, %eax /* 0x007 = PRESENT+RW+USER */
10:
leal 0x007(%edi),%ecx /* Create PDE entry */
movl %ecx,(%edx) /* Store identity PDE entry */
movl %ecx,page_pde_offset(%edx) /* Store kernel PDE entry */
addl $4,%edx
movl $1024, %ecx
.............
..............
..............
4. Stores the address of the Page Global Directory in the cr3 register, and enables paging by setting the PG bit in the cr0 register.
5. Sets up the Kernel Mode stack for Process 0.
6. Invokes setup_idt() to fill the IDT with null interrupt handlers.
7. The first page frame is loaded with the system parameters obtained from the BIOS and the parameters passed to the operating system by the boot loader.
8. Loads the gdtr and idtr registers with the addresses of the GDT and IDT.
9. The first CPU calls “start_kernel”, which does the rest of the initialization; all other CPUs call “initialize_secondary”.
.....................
......................
#ifdef CONFIG_SMP
movb ready, %cl
cmpb $1,%cl
je 1f # the first CPU calls start_kernel
# all other CPUs call initialize_secondary
call initialize_secondary
jmp L6
1:
#endif /* CONFIG_SMP */
call start_kernel
L6:
jmp L6 # main should never return here, but
# just in case, we know what happens.
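The page_pde_offset arithmetic in step 3 can be checked with a short C sketch. It assumes the usual i386 configuration: __PAGE_OFFSET = 0xC0000000, 4 KB pages and a 1024-entry page directory in which each PDE covers 4 MB. Then page_pde_offset = 0xC0000000 >> 20 = 0xC00 is a byte offset into the directory, i.e. entry 0xC00 / 4 = 768. Storing the same PDE at entry 0 and entry 768 maps the first physical 4 MB both at virtual address 0 (the identity mapping used while execution continues at low addresses right after paging is enabled) and at 0xC0000000, where the kernel runs. The names are hypothetical stand-ins for swapper_pg_dir and pg0.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET      0xC0000000u            /* assumed 3 GB / 1 GB split */
#define PDE_FLAGS        0x007u                 /* PRESENT + RW + USER, as in head.S */
#define PTRS_PER_PGD     1024u                  /* 4-byte entries in a 4 KB page */
#define PAGE_PDE_OFFSET  (PAGE_OFFSET >> 20)    /* byte offset into the directory */
#define KERNEL_PDE_INDEX (PAGE_PDE_OFFSET / 4u) /* entry index */

static_assert(PAGE_PDE_OFFSET == 0xC00u, "byte offset of the kernel PDEs");
static_assert(KERNEL_PDE_INDEX == 768u, "0xC0000000 >> 22");

/* Which directory entry covers a virtual address (4 MB per entry). */
static uint32_t pde_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Install the n-th boot page table (at physical pt_phys) in both the
 * identity half and the kernel half, like the loop at label 10 in head.S. */
static void install_boot_pde(uint32_t pgdir[PTRS_PER_PGD], uint32_t n, uint32_t pt_phys)
{
    assert((pt_phys & 0xFFFu) == 0);            /* page tables are 4 KB aligned */
    assert(KERNEL_PDE_INDEX + n < PTRS_PER_PGD);
    uint32_t pde = pt_phys + PDE_FLAGS;         /* like leal 0x007(%edi),%ecx */
    pgdir[n] = pde;                             /* identity PDE entry */
    pgdir[KERNEL_PDE_INDEX + n] = pde;          /* kernel PDE entry */
}
```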
start_kernel() [ /init/main.c ]
start_kernel() is the first C function of the kernel proper (after decompression). It performs the following tasks.
1. Take the global kernel lock (it is needed so that only one CPU goes through initialisation).
/* __init module */
asmlinkage void __init start_kernel(void)
{
...................
...................
lock_kernel();
...................
}
2. Perform arch-specific setup (memory layout analysis, copying the boot command line again, etc.).
page_address_init();
..................
setup_arch(&command_line);
setup_per_cpu_areas();
3. Print the Linux kernel “banner”, containing the version, the compiler used to build it, etc., to the kernel message ring buffer. The banner is taken from the variable linux_banner defined in init/version.c and is the same string displayed by cat /proc/version.
printk(linux_banner);
4. Initialise traps.
..............
................
trap_init();
................
5. Initialise IRQs.
.................
.................
init_IRQ();
................
6. Initialise the data required by the scheduler.
.................
.................
sched_init();
................
7. Initialise timekeeping data.
..................
time_init();
.................
.................
8. Initialise the softirq subsystem.
...............
softirq_init();
..................
9. Parse boot command-line options.
..................
...................
parse_args("Booting kernel", command_line, __start___param,
__stop___param - __start___param,
&unknown_bootoption);
..................
10. Initialise the console.
............
console_init();
..............
11. If module support was compiled into the kernel, initialise the dynamic module loading facility.
12. If the “profile=” command-line option was supplied, initialise the profiling buffers.
............
............
profile_init();
............
13. kmem_cache_init(): initialise most of the slab allocator.
............
..............
kmem_cache_init();
................
14. Enable interrupts.
.........
............
if (panic_later)
panic(panic_later, panic_param);
profile_init();
local_irq_enable();
..................
15. Calculate the BogoMips value for this CPU.
............
..............
calibrate_delay();
................
16. Call mem_init(), which calculates max_mapnr, totalram_pages and high_memory and prints the “Memory: …” line.
...................
.....................
mem_init();
.......................
17. kmem_cache_sizes_init(): finish slab allocator initialisation.
..............
..............
kmem_cache_init();
.................
18. Initialise the data structures used by procfs.
.................
..................
proc_root_init();
....................
19. fork_init(): create uid_cache, initialise max_threads based on the amount of available memory, and configure RLIMIT_NPROC for init_task to be max_threads/2.
.................
..................
fork_init(num_physpages);
...................
20. Create the various slab caches needed by the VFS, VM, buffer cache, etc.
............
anon_vma_init();
...............
vfs_caches_init_early();
..............
vfs_caches_init(num_physpages);
.............
buffer_init();
.............
21. If System V IPC support is compiled in, initialise the IPC subsystem. Note that for System V shm, this includes mounting an internal (in-kernel) instance of the shmfs filesystem.
22. If quota support is compiled into the kernel, create and initialise a special slab cache for it.
23. Perform the arch-specific “check for bugs” and, whenever possible, activate workarounds for processor, bus and other bugs. Comparing architectures reveals that “ia64 has no bugs” and “ia32 has quite a few bugs”; a good example is the “f00f bug”, which is only checked if the kernel is compiled for CPUs older than 686, and is worked around accordingly.
24. Set a flag to indicate that the scheduler should be invoked at the “next opportunity”, and create a kernel thread init() which execs execute_command if it was supplied via the “init=” boot parameter, or otherwise tries to exec /sbin/init, /etc/init, /bin/init and /bin/sh in this order; if all of these fail, panic with a “suggestion” to use the “init=” parameter.
25. Go into the idle loop; this is the idle thread, with pid=0.
static void noinline rest_init(void)
{
kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
numa_default_policy();
unlock_kernel();
cpu_idle();
}
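The fallback order in step 24 can be sketched in C. In the kernel this logic runs inside the init thread and uses the kernel's own exec path; the version below is a user-space model under a stated assumption: "exec succeeded" is replaced by a lookup in a list of executables present on the root file system, so the order can be tested without a real root file system. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "exec succeeded": is path among the
 * executables available on the root file system? */
static bool is_available(const char *path, const char *const available[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(path, available[i]) == 0)
            return true;
    return false;
}

/* Return the program the init thread would end up running, or NULL
 * where the kernel would panic and suggest "init=". */
static const char *pick_init_program(const char *execute_command,
                                     const char *const available[], size_t n)
{
    static const char *const fallbacks[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };

    assert(available != NULL || n == 0);
    /* "init=" wins if it was given and can be executed ... */
    if (execute_command != NULL && is_available(execute_command, available, n))
        return execute_command;
    /* ... otherwise the defaults are tried in order. */
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++)
        if (is_available(fallbacks[i], available, n))
            return fallbacks[i];
    return NULL;
}
```

For instance, with only /bin/sh present and init=/custom/init on the command line, the sketch falls back to /bin/sh, mirroring how the kernel continues with the defaults when execute_command cannot be run.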