Programming concepts

From InstallGentoo Wiki
Jump to: navigation, search
Cleanup.png
Cleanup.png
CLEANUP CANDIDATE
Relevant discussion may be found on the talk page. Reason: Incomplete and long. Consider splitting into multiple smaller articles.


This article will contain a lot of the important concepts involved in programming, including common methods, pitfalls, techniques, and even some conventional wisdom.

This article is still a work in progress. Please take a look and contribute what you know, if you know something.

Crucial programmer skills

There are some skills that are considered vital to being a valuable programmer, and that if you wish to be one, you should definitely get to grips with.

  • Version control systems:
    • Git (recommended, especially for open source stuff) and/or Mercurial
  • Repositories
    • Gitorious
    • GitHub
    • BitBucket
  • Document typesetting/preparation
    • LaTeX typesetting system
    • Markdown
  • Console text editors
  • Command line skill
  • Compiler use
    • GNU Compiler Collection (gcc)
  • Debuggers
    • gdb, valgrind

It's also highly recommended that you subscribe to and check out Computerphile on youtube, as their videos are excellent (and interesting as fuck) for explaining concepts. Here's a list.

Guides to get you on your way

Best practices

On seeking help

Google, Stack Overflow and the documentation of whatever you are working on are your best friends!. Remember to consult them first, always! Usually, you will find a solution to your problems here, and even if you don't, you will have a better understanding of it, and people will appreciate that you've done some of your own research first (it saves time in explanations). ESR covers this in more depth in his article on How to ask smart questions

On programming the smart way

You can learn to program, but without a good foundation of theory behind you to help you make smart decisions with your code, you're going to be a shitty programmer. Go and watch some Computerphile videos, as mentioned above, and check out the Computer Science page. There's something to be said about optimising your algorithm before optimising your code.

Another way of making your time programming easier is using abstraction to simplify your understanding of how your program does things, by "hiding" away a lot of the technical details and dealing with them at a simpler, "higher" level. Check out this video. Abstraction is used all the time in programming; otherwise we'd get overwhelmed by the sheer size and complexity of our programs and programming languages

On knowing when to take a break

One thing that is important is knowing when to take a break. Sometimes, if you're stuck you'll sit there for hours trying to figure something out, and you'll replay the same thing over and over again in your mind. If you find yourself in this situation, instead, go and take a couple of hours' break. You could even take a shower, get some food, or sleep; just take a break away from your place and computer of programming. You'll find that once you come back to it, refreshed and ready, that you'll be able to spot the problem immediately, or at least come up with a solution more easily.

Terminology

There is a hell of a lot of terminology in programming. Here we'll try to clear it up.

Numeric data

Note: For a far more comprehensive listing of the numeric data sizes, see Wikipedia:Orders_of_magnitude_(data). Also, sometimes KiB is used instead of KB.

  • bit (b) - The smallest computer unit.
  • nybble - 4 bits.
  • octet - 8 bits.
  • byte (B) - Usually 8 bits, however, a byte isn't always 8 bits. It depends on the architecture of the CPU. Some bytes can be 16 bits, for example. Don't confuse a byte and an octet.
  • word - 4 bytes.
  • kilobyte (KB) - 1000 bytes.
  • kibibyte (KiB) - 1024 bytes
  • megabyte (MB) - 1000 KB
  • mebibyte (MiB) - 1024 KiB
  • gigabyte (GB) - 1000 MB
  • gibibyte (GiB) - 1024 MiB
  • terabyte (TB) - 1000 GB
  • tibibyte (TiB) - 1024 GiB
Endianness

Endianness refers to the order in which bytes are stored. On big endian architectures, the most significant byte is stored in the smallest address. In other words, it works in the same way as decimal is written by humans. For example, two thousand and fourteen is written as 2014, with the 2 on the far left representing the biggest, most significant value in the number, and with 4 on the far right representing the smallest, least significant value. Big endian systems store the bigger digits in the smaller addresses, and as the significance decreases, the address increases. Note that it's on a byte-by-byte basis, not bit-by-bit. Small endian processors, such as Intel x86, store the most significant byte in the biggest address. So, for example, 8 (1000 in binary) stored in a 4-byte integer would be stored as 00001000 00000000 00000000 00000000.

Signed int vs Unsigned int

TL;DR: This is really simple, I'm just providing details for those interested. Just give consideration to overflow.

A signed integer is able to store negative as well as non-negative numbers. Two's complement is the most common underlying representation of signed integers in almost any programming language (and CPU), and is also used to perform addition and subtraction on them. Operations described here such as addition and subtraction are done by your CPU, but you still have to watch out for overflows.

A signed integer in two's complement is negative if its most significant bit (MSB) is 1, and non-negative (either positive or zero) if the MSB is 0. If the number is non-negative, its representation is the number itself in binary; if it is negative, its magnitude may be obtained by flipping all its bits (including the most significant) and adding 1 to the result. The same operation is done to convert a non-negative number to a negative one. Note that if you were to perform this operation on the representation of zero (e.g. the octet 0000 0000) you would still obtain zero (with a carry). Also note that since zero is in the "non-negative" half of the range of representable numbers, you can always represent one more negative than you can represent actual positives. For instance, with 8 bits, the range of representable numbers goes from -128 to +127.

(TODO: Add carry considerations to the next section)

Adding two signed numbers in two's complement is as easy as adding them bit by bit (or just add them in your programming language of choice in decimal notation), but you still have to watch for overflow. Overflow happens when you get a result that goes beyond the largest or smallest possible number that is representable with the bit-length you're using. For instance, with an octet, for which the range in decimal goes from -128 to +127, adding +42 (two's complement: 0010 1010) to +113 (2C: 0111 0001) will yield -101 (2C: 1001 1011), which is clearly wrong. Similarly, adding -123 (2C: 1000 0101) to -15 (2C: 1111 0001) will yield +118 (2C: 0111 0110), which is also wrong. The basic rule to detect overflow is as follows: if the two operands have different signs, overflow never happens. If they have the same sign and the result has a different sign than the operands, then you have overflow. The fact that the range of representable numbers is always finite is a basic limitation of computers, though with a 32-bit signed integer (the usual size as you declare an "int" variable) your range goes from -2,147,483,648 all the way to 2,147,483,647, which is good enough for a lot of applications.

Subtraction of signed integers A and B is as easy as adding A to -B. Be aware that if A and B have different signs, using -B means we're doing a same-sign addition, so overflow may still occur.

If you don't need to represent negative numbers, you can use an unsigned integer (declare it "unsigned int" in C). An unsigned integer isn't interpreted as being in two's complement, and, as such, its most significant bit is also used in representing the number's magnitude (instead of just being a sign flag). Having one more bit to represent magnitude means the range of representable non-negative numbers doubles, but this also means you can't represent negative numbers. For instance, with 8 bits, the representable range of unsigned (non-negative) numbers goes from 0 to 255. With a common 32-bit unsigned int, the range goes from 0 to 4,294,967,296.

Two's complement representation is not to be confused with sign-magnitude representation. Sign-magnitude was an old way of representing signed integers, which also reserved the most significant bit as a sign flag, but considered it a separate "entity", making addition and subtraction, along with detecting overflow, a complicated ordeal, not to mention the useless complexities of considering positive zero (0000 0000 with 8 bits) and negative zero (1000 0000 with 8 bits). Thankfully sign-magnitude representation isn't used anymore.

ASCII vs Unicode

See this video.

Primitive data types

Note: The size of most of these data types are implementation specific. For example, in C, on a 32 bit machine, ints are 4 bytes. They're 8 bytes on a 64 bit linux machine. See Wikipedia:Primitive_data_type for more info.

  • short/short int - Half the size of an int.
  • int - A signed (negative or non-negative) integer.
  • long/long int - Twice the size of an int.
  • unsigned/unsigned int - An unsigned (non-negative) integer, with double the positive range of int. May be combined with short or long.
  • float - Floating point number. This is what computers use to represent decimal numbers. Check out this video on the floating point problem.
  • double - Double-precision floating point. This is a decimal number that is usually twice as big as a float.
  • boolean - True or False
  • char - A single byte, most used to represent ASCII character encoding. On essentially all devices, these are one byte (but one byte is not necessarily one octet).
  • unsigned char - A single byte with only non-negative values. May be used to represent 1-byte unsigned integers.
  • string - A sequence of characters. In C, these are not a primitive data type: they are represented as a char array which must be ended with a null byte, '\0'.

Abstract data types

This should be moved to a new page and extended.

  • Tree - An ordered structure that starts with a root node, and has child nodes, which have their own child nodes, etc. May be balanced for better performance.
  • Array - A sequence of data types.
  • Set - A collection in which no value occurs twice. Usually implemented as a balanced tree or hashtable.
  • Map / dictionary - A mapping of keys to values, in which the key is unique. These are usually represented by a hash table or a balanced tree.
  • Linked List - Starts with a head node, which contains data and has a pointer to the next node which is similar and so on.

Keywords

  • import/include
    • These types of keywords load libraries or other source files into a source file so that you can use functions, constants, classes, or other kinds of definitions from them. For example, to use strlen in C, you would put #include <string.h> at the top of your file. Which keyword used depends on the language. There are numerous keywords across all programming languages with similar usage (although the manner in which they accomplish importing/including may be very different from one another).
  • void
    • In certain statically typed languages, a void function means that the function doesn't return anything. In C, you can also cast pointers to void type, which means they are typeless and can be cast to other types later.
  • const vs final
    • In C/C++ making a variable or pointer const means that you promise not to write to it after initialization. If you attempt to do so anyway, the compiler will complain by showing an error.
  • abstract
    • An abstract class is one that you cannot create objects from (i.e. it cannot be instantiated). An abstract method is one which you declare, but don't write the code for (these are called prototypes in C). An class with abstract methods must be declared abstract. See the javadoc on the matter.
  • OOP static vs C static
  • class
    • A class is not a container. It's actually more like a blueprint. You specify how an object will work with methods in the class, and you specify attributes of the object in the class. You can then create new objects from that class (called instantiation)
  • interface
  • register
  • inline

Programming Paradigms

Check out this video.

Different languages are sometimes written in completely different ways. By becoming familiar with multiple languages that use different paradigms, you become a more effective programmer as the mindset used behind one paradigm can usually improve your way of thinking in another. Note that most languages can be used to program in more than one paradigms. Python, for example, has built-in support for imperative, OO, and even functional programming.

Imperative

  • C
  • Python

Goto

Goto lets the programmer skip to another point in the code, specified by a label.

int digits(int n)
{
	int numberOfDigits = 0;
start:
	n = n / 10;
	numberOfDigits++;
	if (n == 0)
		return numberOfDigits;
	goto start;
}

You can create some serious clusterfucks of impossible-to-debug code by using goto carelessly. This is why most modern languages forbid or discourage it, but actually things like exceptions and breaks are all kinds of goto, just limited to narrow, "safe" contexts.

Cleanup.png
Cleanup.png
CLEANUP CANDIDATE
Relevant discussion may be found on the talk page. Reason: Too technical for beginner programmers and laymen


For vs foreach vs while

Conceptual

Loops are a way of doing repeated iterations. The simplest one is while. For this loop, you define:

  1. A true/false condition
  2. Some logic that you want to be run

The computer will run the logic once, check the condition, and, if true, repeat. The code will thus run over and over until the condition is met (usually the computer can't instantly know as soon as the condition has been made true, it only checks it after that iteration is done running).

By giving a condition that is always true (like 1=1) you can make an infinite loop - the program will keep running whatever is inside that loop over and over and the only way to make it stop is forcibly terminating.

If we want the while loop to run a certain number of times, we can add a counter that we increment at each iteration. The condition will be whether it reached a certain value. Many languages have something called a for loop which does this. The typical for loop has a variable that starts at 0, gets incremented by 1 at each iteration, and the loop ends when it reaches some pre-defined value.

The for loop essentially runs one iteration for every number from x to y (hence the name). You can generalize it by instead making it run for every member of a given array. This is usually called a foreach loop. Keep in mind that foreach technically guarantees that there will be one iteration per member, the order in which those iterations can change. Sometimes running the same foreach twice can generate different sequences of events.

Practical

In practice, most languages have their own syntax for these types of loops, and often mix and/or expand their functionality:

  • Python's for loop is exclusively a foreach loop.
  • In C, the for loop uses more complex conditions and can be used similarly to a while loop.
  • The C-like do...while and the Pascal-like repeat...until are the same as a while loop, except that their block will be executed at least once, regardless of whether their condition is true or false.

A large collection of different loop syntax is available on Wikipedia:

Ternary operators

Most common programming language operators are binary, meaning they operate on two values (e.g., (3 + 2) * 4, + operates on 3 and 2, * operates on (3 + 2) and 4).

Some programming languages include ternary operators, which operate on three values. The most commonly seen example of a ternary operator, particularly in languages that borrow syntax from C, is the ternary conditional, or ?:.

Its usage can usually be described as

(condition) ? (if_true) : (if_false)

If (condition) evaluates as true, then the (if_true) statement is evaluated and its value is returned. If (condition) evaluates as false, then the (if_false) statement is evaluated and its value is returned instead.

Since the statement returns a value, it can be used to conditionally assign or substitute:

printf("There %s %d %s in the garden.\n",
    fox_count == 1 ? "is" : "are",
    fox_count,
    fox_count == 1 ? "fox" : "foxes"
);

If fox_count is 1, it will print "There is 1 fox in the garden". If fox_count were 3, it would instead print "There are 3 foxes in the garden.".

Functional

Functional programming languages are so named because everything is defined in terms of functions, as opposed to a sequence of actions like in imperative programming, or objects, like in OOP.

  • Lisp dialects (Common Lisp, Scheme, Clojure)
  • Haskell

First class and higher order functions

A function which takes simple data (i.e. a string, or an int) and returns simple data is called a first class function. A function which takes a first class function as an argument is called a second class function, and a function which takes a second class function is called a third class function, and so on. A second class function or higher is called a higher order function.

Anonymous functions / closures

An anonymous function is just a function without a name. Functional languages make it convenient to create a function and pass it around like any other piece of data, so the ability to create an anonymous function can make code clearer. The alternative would be to make the programmer declare and define all his functions before using them, but this would be distracting and annoying in languages where functions play so crucial a role. It should be noted that even though an anonymous function is anonymous to begin with, one way or another it is bound to an identifier so that it can be used. For example, many Lisp programmer define their functions like so:

(define square
  (lambda (x) (* x x)))

Since square has the value of an anonymous function which squares its input, the programmer can call square to access this function. An anonymous function is often returned by or passed to another "higher-order" function.

The run-time system is tasked with remembering all the information which a function needs in order to operate. For example, if an anonymous function is defined within another function, and the anonymous function refers to a variable from the outer function, the run-time system must keep that variable alive for the inner function, even after the outer function is done. This is called a closure.

(define (make-serial-number-generator)
  (let ((current-serial-number 0))
    (lambda ()
      (set! current-serial-number (+ current-serial-number 1))
      current-serial-number)))

In this example, make-serial-number-generator returns an anonymous function (a serial number generator) which operates on a variable from the outer function. Each time the serial number generator is called, it updates the value of current-serial-number, which it then returns. The effect is that each call to the serial number generator produces the next serial number. This is similar to the concept of static variables in C.

Currying

Map

To map a function onto a list just means applying the function to each member of the list. For example, to compute the first 100 terms of y = 3x + 1, one might run the following code:

(map (lambda (n) (+ 1 (* 3 n)))
     (iota 100))

Reduce

Reducing is a generalization of the common pattern in functional programs of applying a certain procedure to a base and somehow combining it with a list of data. For example, the factorial function is usually defined by recursively multiplying every number between n and 1 by 1. Here, the procedure is multiplication, the base is 1, and the list of data is every number between n and 1 (inclusive). In practice:

(define (factorial n)
  (reduce * 1 (iota n 1)))

Filter

Filtering is a simple concept with powerful applications. Filtering a list of data is just running the data through a test and removing anything that doesn't pass. For example, to choose all the numbers that are divisible between 3 or 5 below 100, one might run the following:

(filter (lambda (n)
           (or (= (modulo n 3) 0)
               (= (modulo n 5) 0)))
        (iota 100))

Filter takes in a function and a list. The function is used to test each value of the list. A list of all the members which qualify (meaning that the function returns true when passed the member) is returned.

Tail recursion optimisation

A tail call is when a function's last action is to call itself. Whenever this is the case, the run-time environment doesn't need to remember any of the information from the previous call. The advantage to this is that a function can call itself as many times as it needs to without getting a stack overflow.

A classic example comes from SICP. The following code does not use tail recursion.

(define (factorial n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))

Because each successive call to factorial also includes a multiplication by n, the run-time environment must store each n in memory. This can lead to stack overflows or sluggish code. In contrast, consider the following tail-recursive code.

(define (factorial n)
  (fact-iter 1 1 n))

(define (fact-iter product counter max-count)
  (if (> counter max-count)
      product
      (fact-iter (* counter product)
                 (+ counter 1)
                 max-count)))

Because the last action of fact-iter is either to return product or to call itself, the implementation can optimize this so that it can't produce a stack overflow and it's faster.

Functional reactive

Functional reactive programming (FRP) is a subset of reactive programming built on the foundation of functional programming.

  • Elm

Object Oriented

Defining Object Oriented Programming (OOP) appears to be difficult for people to ellaborate on and even more difficult for beginners to grasp. The best way I was explained was to envision real life objects such as a fork, a knife, a plate and a piece of chicken. Each object has methods and attributes. You would stab with a fork, cut with a knife, and these are it's basic methods. The chicken can be raw, or cooked, which would be its attributes. You can use the knife and fork together to stabalize and cut the chicken, but only if its cooked.

Most OOP languages have incredible documentation that is recommended to have handy for references while developing software with these languages.

OOP is not necessarily GUI programming.

  • Java
  • C#
  • C++
  • D
  • Vala

Inheritance

Inheritance is when a class inherits all methods and attributes/properties from its previous class, and can override them.

Polymorphism

Polymorphism is when two or more objects have methods which have the same name. Generally, these methods do similar things, but they must accomplish these tasks differently because of the differences in the objects. For example, Cats and Dogs could both have a method, useBathroom, but the Cat method would involve the litter box, and the Dog's method would involve going outside. The programmer wouldn't even have to know which one he was dealing with. If he uses generic types, he could message an Animal to useBathroom, and it would, no questions asked.

Encapsulation

Abstraction

Function / Method / Operator overloading

Delegates

Interfaces, and Abstract and Virtual Classes

Constructors and destructors

A constructor is a function that is run when an object is initialized. A destructor is run when an object is destroyed.

An example in C++:

#include <iostream>
using namespace std;

class Example
{
    public:
        Example(int);  // Constructor
        ~Example(); // Destructor
        int value;
};

/* Constructor definition */
Example::Example(int arg)
{
    cout << "Example object created!" << endl;
    value = arg;
}

/* Destructor definition */
Example::~Example()
{
    cout << "Example object destroyed!" << endl;
}

int main(int argc, char *argv[])
{
    Example e(5);
    cout << e.value << endl;
    return 0;
}

Output:

~ » g++ example.cpp -o example && ./example
Example object created!
5
Example object destroyed!

If you would like to learn more about constructors or destructors, you should Google the syntax specific to the language you are trying to learn.

Declarative

Declarative languages differ from the others in that you specify what you want to get, not how you're going to get it like you would in other languages. SQL is used for querying databases, while Prolog is a logic language; it allows you to set up rules into a Knowledge Base, then query the knowledge base about things.

  • SQL
  • Prolog

Aspect Oriented

  •  ?

Concepts

For a lot of core programming concepts, it is highly recommended you check out the page on The C Programming Language, as C is the most used programming language, and because of how small the language is, it gives you valuable experience of stuff like memory management and good programming practices.

Magic numbers and meaningful constants

A magic number is any kind of literal in a program's source code whose meaning is not necessarily clear. For example, in the following code:

if Height > 1.98 then
  AllowedInPark := False
else
  Manlet := True

The meaning of the literal number 1.98, while possible to figure out from context, is not entirely clear. Magic numbers may be even more cryptic, such as in this example:

repeat
  Read(ACharacter)
until ACharacter = #13#10

Unless you have memorised ASCII control character literals, the meaning of #13#10 is difficult to grasp.

Source code is designed for humans to read and understand, so to avoid magic numbers, meaningful constants come in handy. Constants tie values to identifiers, allowing the programmer to give meaningful names to these values:

const
  MaxVehicleHeight = 1.98; // metres
...
if Height > MaxVehicleHeight then
    AllowedInPark := False
const
  EndOfLine = #13#10;
...
repeat
  Read(ACharacter)
until ACharacter = EndOfLine

There are other benefits to using constants instead of magic numbers: for example, when you use the same value multiple times in a program. If you later need to change that value, you can simply change the definition of the constant, instead of having to change every occurrence of the magic number individually in your source code.

Marshalling and Serialisation

Design patterns

Software testing

Firstly, you'll need to know the difference between black-box testing, and white-box testing. Black-box testing is testing a piece of software without knowing what is inside that software, i.e. you only test its functionality based on its input and output. White-box testing is the opposite; you test based on the software's methods and internal workings.

Unit testing

Computer graphics

Competitive Programming

You are in the jungle. You have a pocket-knife. Someone asks you to kill a mountain lion.

Anyone but a programmer would be asking "WTF is a MOUNTAIN lion doing in a JUNGLE?!", but that's not what you have been trained to do as a programmer. You are here to solve problems, not to question them.

Years of training has taught you well. You use your knife to sharpen a stick. You cut vines to lash sharp stones on one end. Maybe you're from a top university, and you've learned to extract essential ingredients from plant and insect life around you to fashion a poison to tip your weapon with.

Convinced that you have an effective and efficient way to kill the lion, you set forth to accomplish your task. Maybe your stick is too short, or your poisons don't work. It's okay - you live to refine your method and try again another day.

Then someone figures out a way to fashion a low-grade explosive from harvesting chemicals in the jungle. Your method of fashioning a spear to kill the lion is now far from the best way to accomplish your task. Nevertheless, it's still a simple way, and will continue to be taught in schools. Every lion-killer will be taught how to build his tools from scratch.

That's "real-life" programming.

In competitive programming, you start out with the same resources (a pocket-knife), except you have 2 minutes to kill the lion.

As a beginner, you will stare at the lion and do nothing.

Soon, you learn that if you kill a squirrel, sometimes the judge thinks it's a lion and you're good to go.

A more experienced programmer just keeps stabbing the lion and hopes that the lion dies in time. Soon, you learn that there are certain spots on a lion that are damage immune. You learn to not even bother stabbing those spots. Sometimes, the lion doesn't expose those spots, so you get really good at killing squirrels.

And then, to be a great competitive programmer, you need to be able to do two things.

First, you must learn how to find the lion's critical point and kill it in one swift stroke.

Second, you must learn how to be so handy with your knife that you can fashion a sharp stick in 1 minute, and spend the next minute stabbing the lion to death.

But never ever will you be able to have enough time to fashion an explosive to take the lion out.

Material

Books

There are books that teach programming competition. They are a very good resource to learn Computer Science


Compete

There are many pages to compete in online programming competitions. They often offer tutorials and many other more:

Training sites

If you want to train:

See Also