Thursday, March 4, 2010

READING THE LINUX KERNEL SOURCE CODE

The Linux kernel is one of the most stable and widely used kernels available, and also the most rapidly changing one (measured by the rate of change in lines of code). With over 1.7 million lines of code, some in assembly but mostly in C, the obvious question is where to start reading it. That is what we are going to talk about here.


GETTING THE KERNEL SOURCES:
The Linux kernel is distributed in versions, the latest being version 2.6, and each version has many releases, such as 2.6.23.1-42. You can get the source code of all the releases at www.kernel.org as a compressed .tar.bz2 archive (other formats are also available). To decompress an XXX.tar.bz2 file into an existing directory, use something like:

# tar -xjf XXX.tar.bz2 -C DIR_TO_STORE/
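The extraction step can be wrapped in a small helper. This is only a sketch: the function name extract_sources and its two arguments are made up for illustration, and it assumes a tar that auto-detects compression when extracting (GNU tar does).

```shell
# extract_sources ARCHIVE DEST: unpack a kernel tarball into DEST.
# The function name and argument order are illustrative, not from any tool.
extract_sources() {
    # Create the destination first; tar -C fails if it does not exist.
    mkdir -p "$2" || return 1
    # x = extract, f = read from this archive file, -C = unpack inside DEST.
    # GNU tar detects bzip2 on its own when extracting; add j to force it.
    tar -xf "$1" -C "$2"
}
```

A call would look like `extract_sources XXX.tar.bz2 DIR_TO_STORE`. To see what an archive contains before unpacking it, `tar -tjf XXX.tar.bz2` lists the entries without extracting anything.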

Decompressing the file creates a directory structure with a number of sub-directories.
Here we used an older version of Linux (other versions have an almost identical structure) and downloaded the file Linux-2.2.0.tar.bz2, which is 10.1MB in size. Decompressing it created a directory Linux-2.2 with the following important sub-directories:

arch- It has a directory for each supported architecture (alpha, i386 etc.) and contains all the architecture-specific code, that is, the low-level code that depends on the underlying hardware.

boot- It contains two very important assembly language files, boot.S and head.S (in newer versions these have been moved inside the arch/XXX/kernel/ sub-directory, where XXX is the architecture name). These files contain the code for the initial booting of the kernel.

Documentation- The Linux kernel documentation.

fs- code for the file systems.

drivers- code for the drivers of the various supported devices, such as PCI, USB, IDE, ACPI, CD-ROM etc.

init- this is where the kernel starts the first process.

kernel- files relating to system calls, kernel synchronization, timing etc.

mm- memory management code.

ipc- inter-process communication code.

include- all the .h header files used by the various .c files in the kernel.
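To get a feel for how the code is spread over these sub-directories, a rough count of source files per directory helps. The sketch below is an illustration, not part of the kernel tree: the function name count_sources is hypothetical, and it simply counts .c, .h and .S files under each top-level directory.

```shell
# count_sources DIR: print "name count" for every top-level sub-directory
# of DIR, where count is the number of .c, .h and .S files below it.
# (count_sources is a made-up helper name.)
count_sources() {
    for dir in "$1"/*/; do
        # Skip the unexpanded pattern when DIR has no sub-directories.
        [ -d "$dir" ] || continue
        n=$(find "$dir" -type f \( -name '*.c' -o -name '*.h' -o -name '*.S' \) | wc -l)
        # $((n)) strips the padding some wc implementations add.
        echo "$(basename "$dir") $((n))"
    done
}
```

Running `count_sources Linux-2.2` from the directory where the archive was unpacked prints one line per sub-directory, and the largest counts point to where most of the code lives.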


WHERE TO START:
The best place to start is where the kernel creates the init process, in the file SOURCE_CODE_DIR/init/main.c. This is where user-mode operation of the system starts after a successful boot. You may also want to look at the head.S and boot.S startup files to see what actually happens at startup. While the kernel boots, every message it prints to the display goes through the kernel-space function printk(). The kernel cannot use the traditional C function printf(), because printf() itself ultimately depends on kernel system calls to do its work. All the messages produced during startup are saved in a kernel buffer and can be viewed with the dmesg command even after bootup is complete.

# dmesg

Now you can search the kernel sources for these strings together with printk() to find where each event actually happens. For example, searching for printk("Calibrating delay loop... ") shows that this message is printed in the file init/main.c.
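The search itself can be done with find and grep. The following is a minimal sketch; the helper name find_message is hypothetical, and it only catches messages whose text appears on a single source line.

```shell
# find_message SRC_DIR TEXT: list every .c file under SRC_DIR that contains
# TEXT, with line numbers. (find_message is a made-up helper name.)
find_message() {
    # -F treats TEXT as a fixed string, so quotes, dots and parentheses in
    # the message need no escaping; -n prints line numbers.
    # The extra /dev/null argument makes grep print the file name even
    # when find passes it only one file.
    find "$1" -type f -name '*.c' -exec grep -nF -- "$2" /dev/null {} +
}
```

For example, `find_message Linux-2.2 'printk("Calibrating delay loop'` prints matches in the form file:line:text. To pick a message to search for, `dmesg | grep -i calibrating` filters the boot log first.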

You can also visit http://lxr.linux.no, which hosts the code of all the Linux versions and lets you cross-reference it. It provides hyperlinks between the identifiers in the code: if a file contains a call to a function xyz(), clicking on xyz() takes you to the place where xyz() is defined.

There is also a nice article on where to start reading the Linux kernel source code at http://en.wikiversity.org/wiki/Reading_the_Linux_Kernel_Sources.

Happy kernel reading!!

Mohan Gupta
CSE Final Year
NIT Jalandhar
