
C

From InstallGentoo Wiki
Revision as of 23:38, 12 February 2014 by Eva-02 (talk | contribs) (Control structures)

C is a programming language designed for application and systems programming. It was created by the late master himself, Dennis Ritchie, so that Unix could be written in it. C is standardized by ISO; as of this revision, the current standard is C11.


Qualities

C is a very simple language, whose parts can combine to form very complex code structures.

Language

C is an imperative, procedural, structured language. This means:

  • Code is a sequence of statements that change program state
  • Statements may be organized in control structures like if, while, for
  • Code may be organized as functions, which may be called from elsewhere

Type system

C is statically but weakly typed. This means:

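  • Every variable and function has a type that is declared in the source and checked at compile time
  • Values can be loosely converted between types, implicitly or through casts, even when the conversion loses information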
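
To make the weak typing concrete, here is a minimal sketch. The function names to_int and letter_code are made up for illustration. Neither function uses a cast, yet the compiler accepts both conversions.

```c
/* Hypothetical helper: a double is implicitly converted to an int */
int to_int(double value) {
    int whole = value;   /* no cast needed; the fractional part is discarded */
    return whole;
}

/* Hypothetical helper: a char is really just a small integer */
int letter_code(char letter) {
    return letter;       /* implicitly converted from char to int */
}
```

Calling to_int(3.99) yields 3, not 4: the conversion truncates toward zero and the compiler does not complain by default.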

Examples & Discussion

In this section you can find usage examples for the C programming language, which will be dissected and discussed at great length. It is not designed to be an easy-to-follow walkthrough or tutorial. It is designed to explain the features of the language in a complete manner as soon as they are introduced. When reading, feel free to skip to the next example and come back later if necessary.

The examples provided here can be saved to a file, then compiled and run with a C compiler such as gcc or clang. For example, with the code saved as example.c:

$ gcc -o example example.c && ./example

Hello World!

Let's start with a simple, traditional "Hello, world!" program.

#include <stdio.h>
 
int main(void) {
    printf("Hello, world!\n");
    return 0;
}

The C preprocessor

The first line in this code is an #include directive.

The include directive is a C preprocessor directive which tells the preprocessor to look for the given header, in our case stdio.h. The header is searched for in pre-configured system directories, such as /usr/include if you use a Unix-like operating system. If it is found, the include directive is replaced with the contents of the header. Basically, the contents of the header are copy-pasted into the place where the include directive was in your source file.

Preprocessing happens before compilation proper and is essentially text manipulation. There are many preprocessor commands and they have a multitude of uses. The result of preprocessing a C source file is called a translation unit, which is the program that ultimately gets compiled, and will most likely contain a lot more text than the source file itself. Translation units are compiled into object files, which are the actual binary output of the compiler.

Note that, due to the include directive, a translation unit may actually be composed of multiple source files. The include directive can find any kind of text file, not just headers. For example, one source file might include other source files selectively depending on the supported features of the target platform. Thus, the source files that will be included need not be compiled directly; when the original source file is preprocessed, the appropriate implementation will be inserted into it and then compiled along with it.
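
Besides #include, two other preprocessor features come up constantly: macros and conditional compilation. The sketch below is illustrative only. The macro and function names are made up, and it relies on the fact that compilers targeting Windows predefine _WIN32.

```c
/* Object-like macro: every GREETING in the source is replaced by the text */
#define GREETING "Hello, world!"

/* Function-like macro: plain text substitution, hence all the parentheses */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of these definitions survives preprocessing */
#ifdef _WIN32
#define PATH_SEPARATOR '\\'
#else
#define PATH_SEPARATOR '/'
#endif

const char *greeting(void) { return GREETING; }

int square(int n) { return SQUARE(n); }

char path_separator(void) { return PATH_SEPARATOR; }
```

By the time the compiler proper sees this code, the macro names are gone; it only sees the substituted text. With gcc or clang, the -E option prints the preprocessed translation unit so you can check for yourself.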

Header files

A header file contains declarations of structures and functions. Including a header file is how you introduce a library to your source code. If the header file is not included, the compiler will not recognize the library's functions when it comes across them. Even the standard library is subject to this process: we must #include <stdio.h> (standard input/output) if we wish to use the standard I/O functions.
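
The difference between declaring a function and defining it can be shown in a single file. This is only a sketch with made-up names (add, add_three); in a real project the first line would sit in a header and the last function in a separate source file.

```c
/* Declaration (prototype): the kind of line a header file contains */
int add(int a, int b);

/* The compiler accepts these calls because add has been declared above,
 * even though it has not seen the body of add yet */
int add_three(int a, int b, int c) {
    return add(add(a, b), c);
}

/* Definition: the actual code, normally found in another source file or library */
int add(int a, int b) {
    return a + b;
}
```

If you remove the first line, the compiler reaches the calls in add_three without knowing what add is, which is exactly the situation you are in when you forget to include a header.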

Linking

In most cases, only the function declarations are supplied in header files; exactly how to access the actual code implementing those functions declared in the header is determined during a process called linking.

When you compile a translation unit, you obtain an object file. Since most projects involve multiple source files, we end up with several object files. Linking is the process that unifies them all into your actual program.

When it is compiling a translation unit, the compiler only knows about the existence of functions. It was explicitly told about their existence by the header files you included. The code it generates is incomplete. The linker is there to resolve all the references to those external functions and organize things so that they actually get executed.

There are three ways to link a program:

  • Static linking
  • Dynamic linking
  • Runtime linking

Static linking

In this mode, the linker copies all the code of every function directly into the binary, thus creating a self-contained program.

The advantage is your program can be run directly without needing to install the supporting libraries; their code is bundled right into the binary. This is great for operating systems like Windows, where dynamic linking is painful.

The disadvantage is the entire program itself needs to be updated in order to update the version of the library it uses, a problem that is solved by dynamic linking. It can also waste disk space by duplicating information.

Dynamic linking

In this mode, the library code is not copied into the binary. When the program is run, the operating system invokes the dynamic linker in order to resolve the dependencies.

The advantage is that the library stays completely separate from your program, saving memory and hard disk space as well as allowing you to update it separately from your program.

The disadvantage is you now need to ensure the library is installed before attempting to run the program, for it will fail to start if it isn't. The dynamic libraries a program depends on are known as dependencies. When programs break because their dependencies are missing or installed in conflicting versions, it is known as dependency hell, or DLL hell in Windows circles.

Runtime linking

Operating systems often provide a way for you to load a shared library from your code while the program is running, such as dlopen on POSIX systems and LoadLibrary on Windows.

The advantage is you can select the exact library you want to load via any kind of algorithm or even user input. This is frequently used to implement plugin systems.
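
On POSIX systems, runtime linking is done with dlopen, dlsym and dlclose from dlfcn.h. The following is a rough sketch, not a complete plugin system. The helper call_from_library is made up, and it assumes the symbol you ask for is a function that takes a double and returns a double.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: loads a shared library by name, looks up a function
 * of type double(double) in it, and calls that function.
 * Returns 0 and stores the answer in *result on success, -1 on failure. */
int call_from_library(const char *library, const char *symbol,
                      double argument, double *result) {
    /* Ask the dynamic linker to load the library right now */
    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL) {
        return -1;  /* library not found or not loadable */
    }

    /* dlsym returns a void *; POSIX allows converting it to a function pointer */
    double (*function)(double) = (double (*)(double)) dlsym(handle, symbol);
    if (function == NULL) {
        dlclose(handle);
        return -1;  /* the library has no such symbol */
    }

    *result = function(argument);
    dlclose(handle);
    return 0;
}
```

On a typical Linux system, call_from_library("libm.so.6", "cos", 0.0, &r) would load the math library and call cos, but that file name is an assumption about the system and differs elsewhere. Since both strings are ordinary runtime values, they could just as well come from a configuration file or user input. Older versions of glibc require linking with -ldl.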

The main function

After the include directive, the main function is defined.

The main function is the program's entry point. After the operating system is done loading your program into memory, it calls the main function in order to actually start the execution of your program.

The printf function

The printf function is declared in stdio.h and stands for print formatted. It writes the format string given to it to the standard output stream, replacing any placeholders in the string with the remaining arguments.

In this case, it writes Hello, world! to the terminal, followed by a line break.
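
The "formatted" part becomes visible once placeholders are involved. A small sketch, with a made-up helper name print_reading:

```c
#include <stdio.h>

/* Hypothetical helper: prints a labelled reading.
 * %s is replaced by a string, %d by a signed integer and
 * %.2f by a double rounded to two decimal places.
 * printf returns the number of characters it wrote, which we pass along. */
int print_reading(const char *name, int level, double ratio) {
    return printf("%s: %d (%.2f)\n", name, level, ratio);
}
```

Calling print_reading("power", 9001, 0.5) writes the line power: 9001 (0.50) to the terminal. The arguments are matched to the placeholders in order, from left to right.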

Standard streams

The standard I/O streams are a program's main means of communicating with the outside world.

Every program starts with three standard I/O streams connected to it:

  • Standard input stream
  • Standard output stream
  • Standard error stream

Standard input stream

This I/O stream is dedicated to any kind of input data that is being fed to the program. It can be directly accessed via the stdin global variable. The terminal's shell usually takes ownership of this stream, using it to feed keyboard input into your program. It could also be redirected from a file.
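
Reading from a stream can be sketched with fscanf, the input counterpart of fprintf. The helper name sum_integers is made up; it takes the stream as a parameter so that the same code works for stdin or for any other open stream.

```c
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated integers from the given
 * stream until the input ends or something that is not a number shows up,
 * and returns their sum */
long sum_integers(FILE *input) {
    long sum = 0;
    int value;

    /* fscanf returns the number of items it managed to read */
    while (fscanf(input, "%d", &value) == 1) {
        sum += value;
    }

    return sum;
}
```

Calling sum_integers(stdin) from main adds up whatever numbers are typed at the keyboard, or the contents of numbers.txt if the program is started as ./example < numbers.txt.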

Standard output stream

This I/O stream is dedicated to the program's output. If your program generates any kind of data, this is where you should send that data to. It can be directly accessed via the stdout global variable. It is usually bound to the terminal's shell but is often redirected to a file.

By default, the printf function writes the data passed to it to this stream. The fprintf function lets you specify which stream you want to write to. The following lines are equivalent:

printf("Hello, world!\n");
fprintf(stdout, "Hello, world!\n");

Standard error stream

This I/O stream is dedicated to everything that isn't part of the program's output. It can be directly accessed via the stderr global variable. Like the standard output stream, it is usually bound to the terminal's shell.

Contrary to what the name implies, this stream isn't only for error messages; any and all communication with the user should be written there. The distinction between program output and other kinds of data is important; on Unix-like systems, it is common to redirect the output of a program to a file. In that case, error messages will still appear on the user's terminal, where they will be spotted immediately.
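
Here is a short sketch of that separation, using a made-up helper print_quotient: the result goes to standard output, the complaint goes to standard error.

```c
#include <stdio.h>

/* Hypothetical helper: prints a / b to standard output.
 * If b is zero, it reports the problem on standard error instead.
 * Returns 0 on success and 1 on failure. */
int print_quotient(int a, int b) {
    if (b == 0) {
        fprintf(stderr, "error: cannot divide %d by zero\n", a);
        return 1;
    }

    printf("%d\n", a / b);
    return 0;
}
```

If a program using this function is run as ./example > results.txt, the quotients end up in the file while the error message still shows up on the terminal.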

Variables

Variables represent the value stored in a memory location. They hold the data you store in them, and from then on you can refer to that data by name.

Variables always have a type that determines how the data is encoded in terms of bits. You can convert a value from one type to another in a process known as type casting. The converted value may be encoded differently at the bit level, and the conversion can produce unexpected or undesirable results when the target type cannot represent the original value.

C has several built-in types. One of them is int, one of several integer types. These integers are encoded as a sequence of bits. The order in which the bytes of that sequence are stored in memory depends on the endianness of the platform. Integers can be signed or unsigned: this affects how the most significant bit (MSB) is interpreted. If the integer is signed, the MSB takes part in encoding the sign; if it is unsigned, the MSB is just another digit.

#include <stdio.h>

int main(void) {

    // Allocate automatic storage for an integer
    // Assign it the value of 9001
    // Bind it to the name power_level
    int power_level = 9001;

    // %d serves as a placeholder
    // It means "put the next argument here"
    // It also tells printf that the argument is a signed integer
    printf("%d\n", power_level);

    return 0;
}

Control structures

C is a structured programming language, which means it has formal, high-level syntactic constructs for controlling program flow. This is a step up from the likes of assembly and BASIC, where all you have is goto. Speaking of the devil: while generally discouraged, that keyword has its uses. However, that is for another time.

An algorithm defining a program consists of a sequence of operations to execute, but this sequence does not need to be linear. Sometimes, we need to use available information in order to make decisions about what to do. The most primitive way of doing this is a conditional jump: an operation that performs a test, usually a comparison, and either continues on as if nothing happened or jumps to the specified location in the code. In machine code, that is how it works.

Control statements are a step above that. They allow us to reason about our program like we do in real life:

#include <stdio.h>

int main(void) {
    printf("Calculating powerlevel...\n");

    int power_level = 1000;

    printf("Hmm... %d. Not impressive.\n", power_level);

    // Executes the following block while the condition is true
    // 1 is always true, so it is an infinite loop
    while (1) {

        // Increments power_level by one
        ++power_level;

        // Is power_level divisible by 2000?
        if (power_level % 2000 == 0) {
            // It is, so print it
            printf("%d ...\n", power_level);
        }

        // Is power_level higher than 9000?
        if (power_level > 9000) {
            // It is, so get surprised!
            printf("It's over nine thousand!!\n");

            // Breaks the while loop, letting the program finish
            break;
        }
    }

    return 0;
}
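
To see what control structures buy us, compare a loop built from nothing but conditional jumps with the same loop written using for. This is only an illustration with made-up function names; both functions add up the numbers from 1 to n.

```c
/* Only labels and jumps, roughly the way the loop looks at the machine level */
int sum_with_goto(int n) {
    int sum = 0;
    int i = 1;

loop:
    // Conditional jump: leave the loop once i has passed n
    if (i > n) goto done;

    sum += i;
    ++i;

    // Unconditional jump back to the top
    goto loop;

done:
    return sum;
}

/* The same computation with a structured for loop */
int sum_with_for(int n) {
    int sum = 0;

    // Initialization; condition; increment
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }

    return sum;
}
```

Both return 55 for n = 10. The for version states the start, the end condition and the step on a single line, while the goto version makes the reader reconstruct them from the jumps.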

Functions

Functions are a way to group a block of code that provides some functionality under a name. This way, it can be called whenever we need it. In the examples, a function has been present all this time: main is a function, and our example code is the block that is executed when it is called. The main function is called by the operating system when it spawns your program's process.

Functions can return values and take arguments. The main function takes no arguments, as evidenced by the void between the parentheses, and it returns an integer value back to the operating system; it is the program's exit status. How that value is interpreted depends on the operating system, but by convention 0 means success and any other value means failure.

#include <stdbool.h>
#include <stdio.h>

/**
 * Define a function that returns whether the
 * given power level is over 9000
 */
bool is_over_nine_thousand(int power_level) {
    return power_level > 9000;
}

int main(void) {
    printf("Calculating powerlevel...\n");

    int power_level = 1000;

    printf("Hmm... %d. Not impressive.\n", power_level);

    // true is actually 1
    while (true) {
        ++power_level;

        if (power_level % 2000 == 0) {
            printf("%d ...\n", power_level);
        }

        // Call the function passing power_level as argument
        // Use its result as argument to the if control structure
        if (is_over_nine_thousand(power_level)) {
            printf("It's over nine thousand!!\n");
            break;
        }
    }

    return 0;
}

Pointers

Pointers are a type of variable whose value represents the location of another variable in memory. They provide an indirect way to refer to variables; with pointers, you can refer to the variable itself rather than to a copy of its value. Through the pointer, you can read from and write to the variable's contents even if the variable's name isn't visible in the current scope.

Pointer types are denoted by appending an asterisk to the type being pointed to. So, the type of a pointer to an int would be noted as int *.

The pointer itself is a variable, and you can create another variable to point to the pointer if you wish. If the type of a pointer to an int is int *, the type of a pointer to an int * is int **.

#include <stdbool.h>
#include <stdio.h>

bool is_over_nine_thousand(int power_level) {
    return power_level > 9000;
}

// Define a function that increments the power level
void increment_power_level(int * power_level) {

    // "Follows" the pointer, getting to the variable it points to
    ++(*power_level);
}

int main(void) {
    printf("Calculating powerlevel...\n");

    int power_level = 1000;

    printf("Hmm... %d. Not impressive.\n", power_level);

    while (true) {

        // Unary & is the "address-of" operator
        // Returns a pointer to the variable, which is passed to the function
        // The function is able to operate on our variable directly through the pointer
        increment_power_level(&power_level);

        if (power_level % 2000 == 0) {
            printf("%d ...\n", power_level);
        }

        if (is_over_nine_thousand(power_level)) {
            printf("It's over nine thousand!!\n");
            break;
        }
    }

    return 0;
}

Remember, variables hold a value. If you pass variables around, you are copying that value into another variable: the argument of a function. If you pass a pointer to a variable, it's as if you were passing the variable itself to the function. It allows you to read and modify it remotely. This is useful if you have a variable that holds a ton of data, like the pixels of an image. Making copies of that simply won't do. The solution is to pass a pointer to that memory.
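
The image case can be sketched as follows. The struct image type, its size and the invert function are all made up for illustration; the point is that the function receives a pointer, so the pixels are never copied.

```c
#define PIXEL_COUNT (640 * 480)

/* Hypothetical image type: big enough that copying it around is wasteful */
struct image {
    unsigned char pixels[PIXEL_COUNT];
};

/* Receives only a pointer to the image and modifies the caller's pixels in place.
 * img->pixels is shorthand for (*img).pixels */
void invert(struct image *img) {
    for (int i = 0; i < PIXEL_COUNT; ++i) {
        img->pixels[i] = 255 - img->pixels[i];
    }
}
```

Given a variable picture of type struct image, the call invert(&picture) passes a single address to the function. Had the parameter been declared as struct image img, every call would copy all 307200 bytes, and the function would only invert its private copy.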

Finally, pointers themselves are variables. You can read and modify pointers remotely as well. They, too, are copied when they are passed to functions or assigned to other pointers. However, on a given platform the size of a pointer is fixed, and it is small enough for the copy to not matter.
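
Modifying a pointer remotely requires a pointer to that pointer, the int ** type mentioned earlier. A small sketch with a made-up function name:

```c
/* Hypothetical helper: makes the caller's pointer point at whichever of
 * the two integers holds the larger value.
 * target is a pointer to a pointer, so *target is the caller's pointer. */
void point_to_larger(int **target, int *a, int *b) {
    *target = (*a > *b) ? a : b;
}
```

With int x = 1, y = 2; and int *p = &x;, the call point_to_larger(&p, &x, &y) leaves p pointing at y. Neither x nor y was changed; only the pointer was.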

Many other languages pretend to hide pointers from you. However, variable names in those languages work very much like pointers do in C. What they really do is hide variables from you. You never actually get at the memory itself; that memory stays hidden away, protected in the bowels of the virtual machine that implements the language.

Even Java has a NullPointerException, and it manifests itself in other languages in various forms. Nobody will ever, ever be free of that error, because it's just how abstraction works: through levels of indirection.
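
In C, the equivalent of that error has to be guarded against by hand. Dereferencing a null pointer is undefined behavior, and there is no exception to catch; on most systems the program simply crashes. A minimal sketch, with a made-up helper name:

```c
#include <stddef.h>

/* Hypothetical helper: returns the value the pointer refers to,
 * or the fallback value if the pointer is null */
int read_or_default(const int *pointer, int fallback) {
    // NULL is the pointer value that points at nothing
    if (pointer == NULL) {
        return fallback;
    }

    return *pointer;
}
```

The check is explicit and easy to forget, which is the price of getting at the memory directly.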