Difference between revisions of "Programming concepts"

Revision as of 18:31, 7 March 2014

CLEANUP CANDIDATE

Relevant discussion may be found on the talk page. Reason: Incomplete and long. Consider splitting into multiple smaller articles.

This article will cover a lot of the important concepts involved in programming, including common methods, pitfalls, techniques and even some conventional wisdom.

This article is still a work in progress. Please take a look and contribute what you know, if you know something.

Crucial programmer skills

There are some skills that are considered vital to being a valuable programmer, and that if you wish to be one, you should definitely get to grips with.

Version control systems:
- Git (recommended, especially for open source stuff) and/or Mercurial
Repositories
- Gitorious
- GitHub
- BitBucket
Document typesetting/preparation
- LaTeX typesetting system
- Markdown
Console text editors
- Vim and/or Emacs
Command line skill
- GNU/Linux/Unix shell (read the [man pages])
- Windows cmd
- regular expressions
Compiler use
- GNU Compiler Collection (gcc)
Debuggers
- gdb, valgrind

It's also highly recommended that you subscribe to and check out Computerphile on YouTube, as their videos are excellent (and interesting as fuck) for explaining concepts. Here's a list.

Guides to get you on your way

Best practices

Google, Stack Overflow and the documentation of whatever you are working on are your best friends! Remember to consult them first, always! Usually, you will find a solution to your problem there, and even if you don't, you will have a better understanding of it, and people will appreciate that you've done some of your own research first (it saves time in explanations).

You can learn to program, but without a good foundation of theory behind you to help you make smart decisions with your code, you're going to be a shitty programmer. Go and watch some Computerphile videos, as mentioned above, and check out the [Computer Science] page. There's something to be said about optimising your algorithm before optimising your code.

One thing that is important is knowing when to take a break. Sometimes, if you're stuck, you'll sit there for hours trying to figure something out, and you'll replay the same thing over and over again in your mind. If you find yourself in this situation, go and take a couple of hours' break instead. You could take a shower, get some food, or sleep; just get away from your computer and the place where you program. You'll find that once you come back to it, refreshed and ready, you'll be able to spot the problem immediately, or at least come up with a solution more easily.

The art of abstraction is important.

Terminology

There is a hell of a lot of terminology in programming. Here we'll try to clear it up.

Numeric data

Note: For a far more comprehensive listing of the numeric data sizes, see here. Also, sometimes KiB is used instead of KB. Consult this page for more info.

bit (b) - The smallest computer unit.
nybble - 4 bits.
octet - 8 bits.
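byte (B) - Usually 8 bits. However, a byte isn't always 8 bits; it depends on the architecture of the CPU. Some architectures have 16-bit bytes, for example.
word - The natural unit of data a CPU works with. Its size depends on the architecture; 2, 4 and 8 bytes are all common.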
kilobyte (KB) - 1024 bytes.
megabyte (MB) - 1024 KB
gigabyte (GB) - 1024 MB
terabyte (TB) - 1024 GB

Endianness

Endianness refers to the order in which the bytes of a multi-byte value are stored. On big endian architectures, the most significant byte is stored at the smallest address. In other words, it works in the same way as decimal is written by humans. For example, two thousand and fourteen is written as 2014, with the 2 on the far left representing the biggest, most significant value in the number, and with 4 on the far right representing the smallest, least significant value. Big endian systems store the bigger digits in the smaller addresses, and as the significance decreases, the address increases. Note that it's on a byte-by-byte basis, not bit-by-bit. Little endian processors, such as Intel x86, store the most significant byte at the biggest address. So, for example, on a little endian machine 8 (1000 in binary) stored in a 4-byte integer would be stored as 00001000 00000000 00000000 00000000.
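As a rough illustration of the two byte orders, the following Python sketch lays out the value 8 as a 4-byte integer using the built-in int.to_bytes. The helper name byte_layout is made up for this example.

```python
import sys


def byte_layout(value, byteorder):
    """Return the 4-byte layout of value as space-separated binary octets."""
    raw = value.to_bytes(4, byteorder=byteorder)
    # The first byte in `raw` is the one stored at the smallest address.
    return " ".join(f"{b:08b}" for b in raw)


print("big:   ", byte_layout(8, "big"))
print("little:", byte_layout(8, "little"))
print("this machine is", sys.byteorder, "endian")
```

The little endian line matches the layout given in the paragraph above; the big endian line is the same four bytes in reverse order.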

Signed int vs Unsigned int

A signed integer is able to store negative numbers in addition to positive numbers. An integer is negative if the left-most bit is set to 1. Nearly all current hardware uses "two's complement" to represent negative numbers: to negate a number, invert all of its bits and add 1. (One's complement, which only inverts the bits, was used on some older machines.) With two's complement, ordinary binary addition works for negative numbers too: when a big enough number is added to a negative one, the result wraps around to a positive value while the leading 1 carries past the edge of the integer and is discarded.

By using an unsigned integer, one can use the leading bit for positive numbers, doubling the amount of positive integers that can be held, but also making the integer unable to hold negative numbers.
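To make the signed/unsigned distinction concrete, here is a small Python sketch that reads the same 8 bits both ways. It assumes two's complement, which is what virtually all current hardware uses; the 8-bit width and the helper names are choices made for this example only.

```python
BITS = 8  # assumed width for the example


def as_unsigned(value, bits=BITS):
    """Bit pattern (as a non-negative number) that stores `value` in `bits` bits."""
    return value & ((1 << bits) - 1)


def as_signed(raw, bits=BITS):
    """Read the pattern `raw` as a signed two's complement integer."""
    sign_bit = 1 << (bits - 1)
    # If the left-most bit is set, the pattern stands for raw - 2**bits.
    return raw - (1 << bits) if raw & sign_bit else raw


pattern = as_unsigned(-5)
print(format(pattern, "08b"), "read as unsigned:", pattern)
print(format(pattern, "08b"), "read as signed:  ", as_signed(pattern))
```

With 8 bits, the unsigned reading covers 0 to 255 and the signed reading covers -128 to 127: the same number of values, shifted.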

ASCII vs Unicode

See this video.

Primitive data types

Note: The size of most of these data types is implementation specific. For example, in C on a typical 32-bit machine both int and long are 4 bytes, while on 64-bit Linux int stays at 4 bytes and long grows to 8. See here for more info.

short/short int - An integer at least 16 bits wide and no larger than an int. Often half the size of an int.
int - A positive or negative integer.
long/long int - An integer at least 32 bits wide and no smaller than an int. Often, but not always, twice the size of an int.
float - Floating point number. This is what computers use to approximate real numbers with a fractional part. Check out this video on the floating point problem.
double - Double-precision floating point. Usually takes twice the storage of a float (64 bits rather than 32) and offers more precision and range.
boolean - True or False
char - A single character. On essentially all devices, these are one byte.
string - A sequence of characters. In C, these are represented as a char array and must be ended with a null byte, '\0'.
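The floating point problem mentioned in the float entry can be seen in a couple of lines. This is a minimal Python sketch; Python's float is a double-precision value on common platforms, and math.isclose is the standard library's tolerance-based comparison.

```python
import math

total = 0.1 + 0.2

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
print(total)
print(total == 0.3)              # exact comparison fails
print(math.isclose(total, 0.3))  # comparison with a tolerance succeeds
```

The practical lesson is to compare floating point values with a tolerance rather than with ==.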

Abstract data types

This should be moved to a new page and extended.

Tree - A hierarchical structure that starts with a root node, and has child nodes, which have their own child nodes, etc. May be balanced for better performance.
Array - A sequence of elements, usually all of the same type, accessed by index.
Set - A collection in which no value occurs twice. Usually implemented as a balanced tree or hashtable.
Map / dictionary - A mapping of keys to values, in which the key is unique. These are usually represented by a hash table or a balanced tree.
Linked List - Starts with a head node, which contains data and has a pointer to the next node which is similar and so on.
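As a rough sketch of the linked list entry above, the following Python code builds a three-node list by hand and walks it from the head. The names Node and to_list are made up for this example; real programs would normally use the language's built-in list type.

```python
class Node:
    """One link in the chain: some data plus a reference to the next node."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


def to_list(head):
    """Walk the chain from the head node and collect the data in order."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.next
    return items


head = Node(1, Node(2, Node(3)))
print(to_list(head))
```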

Keywords

import/include
- These types of keywords load libraries or other source files into a source file so that you can use functions, constants, classes, or other kinds of definitions from them. For example, to use strlen in C, you would put #include <string.h> at the top of your file. Which keyword used depends on the language. There are numerous keywords across all programming languages with similar usage (although the manner in which they accomplish importing/including may be very different from one another).
void
- In certain statically typed languages, a void function is one that doesn't return anything. In C, you can also convert a pointer to a pointer to void (void *), which means it carries no type information and must be cast back to a concrete type before it is dereferenced.
const vs final
- In C/C++, making a variable const means that you promise not to write to it after it has been initialised. If you attempt to do so anyway, the compiler will complain by showing an error. Java's final plays a similar role for variables.
abstract
- An abstract class is one that you cannot create objects from (i.e. it cannot be instantiated). An abstract method is one which you declare, but don't write the code for (loosely comparable to a function prototype in C). A class with abstract methods must be declared abstract. See the javadoc on the matter.
OOP static vs C static
class
- A class is not a container. It's actually more like a blueprint. You specify how an object will work with methods in the class, and you specify attributes of the object in the class. You can then create new objects from that class (called instantiation)
interface
register
inline
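The abstract and class entries above can be sketched in Python, which has no abstract keyword but offers the abc module in its standard library for the same purpose. Shape and Square are hypothetical names chosen for this example.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract class: a blueprint that cannot be instantiated itself."""

    @abstractmethod
    def area(self):
        """Declared here, implemented by subclasses."""


class Square(Shape):
    """Concrete class: supplies the code for every abstract method."""

    def __init__(self, side):
        self.side = side  # an attribute of each Square object

    def area(self):
        return self.side ** 2


square = Square(3)  # instantiation: creating an object from the class
print(square.area())
```

Calling Shape() directly raises a TypeError, because the class still has an abstract method.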

Programming Paradigms

Check out this video.

Different languages are sometimes written in completely different ways. By becoming familiar with multiple languages that use different paradigms, you become a more effective programmer, as the mindset behind one paradigm can usually improve your way of thinking in another. Note that most languages can be used to program in more than one paradigm. Python, for example, has built-in support for imperative, OO, and even functional programming.

Imperative

C
Python

Goto

Goto lets the programmer skip to another point in the code, specified by a label.

int digits(int n)
{
	int numberOfDigits = 0;
	start:
	n = n / 10;
	numberOfDigits++;
	if (n == 0)
		return numberOfDigits;
	goto start;
}

Some programmers discourage using goto for various reasons, which include a perceived tendency to produce unreadable 'spaghetti code' and the inability of some compilers to optimise effectively. This is a classic example of misdirected hatred. Goto is a tool which can be used and misused as any other.
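For comparison, here is the same digit count written without goto. This is a Python sketch (Python has no goto statement), so the jump back to the label becomes a loop. Taking the absolute value first is an addition for this example: Python's // rounds down rather than towards zero, so a negative n would otherwise never reach 0.

```python
def digits(n):
    """Count the decimal digits of an integer."""
    n = abs(n)  # added for this sketch, see the note above
    number_of_digits = 0
    while True:          # plays the role of the `start:` label
        n //= 10
        number_of_digits += 1
        if n == 0:
            return number_of_digits


print(digits(2014))
```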

For vs foreach vs while

Conceptual

The idea of a for loop is to iterate over some known range of ordinals, repeating a block of statements for each ordinal, with a counter variable that represents the current ordinal in the range at each iteration. The range may simply represent an actual range of ordinals, or, as is often the case, it may represent the indexes of some kind of list.

A foreach is similar to a for loop, except that instead of iterating over a separate range of ordinals, it is used to enumerate over a list, and the counter variable will instead contain the current element of the list at each iteration.

while loops on the other hand are loops which repeat some block of statements until a certain condition is met, and don't have as clear a beginning or ending as for or foreach loops do, since the condition may be much more complex.
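The three kinds of loop can be sketched side by side in Python. The function names and the sample data are made up for this example; note that Python spells both for and foreach as for, and the counter-style loop is obtained by iterating over a range of indexes.

```python
def index_pairs(items):
    """for: iterate over a known range of ordinals (here, list indexes)."""
    pairs = []
    for i in range(len(items)):
        pairs.append((i, items[i]))
    return pairs


def shout(items):
    """foreach: the loop variable holds the current element itself."""
    result = []
    for item in items:
        result.append(item.upper())
    return result


def halvings(n):
    """while: repeat until a condition is met, with no range known up front."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


names = ["ada", "grace", "linus"]
print(index_pairs(names))
print(shout(names))
print(halvings(100))
```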

Practical

In practice, most languages have their own syntax for these types of loops, and often mix and/or expand their functionality:

Python's for loop is exclusively a foreach loop.
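In C, the for loop takes an initialiser, a condition and a step expression, all of which can be arbitrary expressions, so it can be used similarly to a while loop.
The C-like do...while and the Pascal-like repeat...until are similar to a while loop, except that their condition is checked after the block, so the block is executed at least once regardless of whether the condition is true or false. Note that repeat...until stops when its condition becomes true, whereas do...while stops when its condition becomes false.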

A large collection of different loop syntax is available on Wikipedia:

Ternary operators

Most common programming language operators are binary, meaning they operate on two values (e.g., (3 + 2) * 4, + operates on 3 and 2, * operates on (3 + 2) and 4).

Some programming languages include ternary operators, which operate on three values. The most commonly seen example of a ternary operator, particularly in languages that borrow syntax from C, is the ternary conditional, or ?:.

Its usage can usually be described as

(condition) ? (if_true) : (if_false)

If (condition) evaluates as true, then the (if_true) expression is evaluated and its value is returned. If (condition) evaluates as false, then the (if_false) expression is evaluated and its value is returned instead.

Since the whole expression returns a value, it can be used to conditionally assign or substitute:

printf("There %s %d %s in the garden.\n",
    fox_count == 1 ? "is" : "are",
    fox_count,
    fox_count == 1 ? "fox" : "foxes"
);

If fox_count is 1, it will print "There is 1 fox in the garden.". If fox_count were 3, it would instead print "There are 3 foxes in the garden.".
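Not every language spells the ternary conditional as ?:. As a sketch, here is the fox example in Python, whose conditional expression reads (if_true) if (condition) else (if_false). The function name garden_report is made up for this example.

```python
def garden_report(fox_count):
    """Build the sentence using two conditional expressions."""
    verb = "is" if fox_count == 1 else "are"
    noun = "fox" if fox_count == 1 else "foxes"
    return f"There {verb} {fox_count} {noun} in the garden."


print(garden_report(1))
print(garden_report(3))
```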

Functional

Functional programming languages are so named because everything is defined in terms of functions, as opposed to a sequence of actions like in imperative programming, or objects, like in OOP.

Lisp dialects (Common Lisp, Scheme, Clojure)
Haskell

First class and higher order functions

A language has first class functions when functions are ordinary values: they can be stored in variables, passed as arguments, and returned from other functions, just like a string or an int. A higher order function is a function which takes another function as an argument, returns a function as its result, or both. Map, reduce and filter, described below, are all higher order functions.
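A short Python sketch may help here. In it, functions are treated as ordinary values (stored in a dictionary, passed as arguments, returned as results), and apply_twice and make_adder are higher order functions because they take or return a function. All names are made up for this example.

```python
def square(x):
    return x * x


def apply_twice(func, value):
    """Higher order: takes a function as an argument."""
    return func(func(value))


def make_adder(amount):
    """Higher order: returns a new function as its result."""
    def add(x):
        return x + amount
    return add


# Functions stored in a data structure like any other value.
operations = {"square": square, "add_ten": make_adder(10)}

print(apply_twice(square, 3))
print(operations["add_ten"](5))
```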

Anonymous functions / closures

An anonymous function is just a function without a name. Functional languages make it convenient to create a function and pass it around like any other piece of data, so the ability to create an anonymous function can make code clearer. The alternative would be to make the programmer declare and define all his functions before using them, but this would be distracting and annoying in languages where functions play so crucial a role. It should be noted that even though an anonymous function is anonymous to begin with, one way or another it is bound to an identifier so that it can be used. For example, many Lisp programmers define their functions like so:

(define square
  (lambda (x) (* x x)))

Since square has the value of an anonymous function which squares its input, the programmer can call square to access this function. An anonymous function is often returned by or passed to another "higher-order" function.

The run-time system is tasked with remembering all the information which a function needs in order to operate. For example, if an anonymous function is defined within another function, and the anonymous function refers to a variable from the outer function, the run-time system must keep that variable alive for the inner function, even after the outer function is done. The inner function together with the variables it has captured is called a closure.

(define (make-serial-number-generator)
  (let ((current-serial-number 0))
    (lambda ()
      (set! current-serial-number (+ current-serial-number 1))
      current-serial-number)))

In this example, make-serial-number-generator returns an anonymous function (a serial number generator) which operates on a variable from the outer function. Each time the serial number generator is called, it updates the value of current-serial-number, which it then returns. The effect is that each call to the serial number generator produces the next serial number. This is similar to the concept of static variables in C.
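For readers who don't know Scheme, here is a rough Python port of the serial number generator. The inner function's name, next_serial_number, is an addition for this sketch (the Scheme version uses an anonymous lambda); nonlocal is the Python keyword that lets the inner function rebind the captured variable.

```python
def make_serial_number_generator():
    current_serial_number = 0  # captured by the inner function

    def next_serial_number():
        nonlocal current_serial_number
        current_serial_number += 1
        return current_serial_number

    return next_serial_number


generator = make_serial_number_generator()
print(generator(), generator(), generator())
```

Each call to make_serial_number_generator creates a fresh captured variable, so two generators count independently of each other.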

Currying

Map

To map a function onto a list just means applying the function to each member of the list. For example, to compute the first 100 terms of y = 3x + 1, one might run the following code:

(map (lambda (n) (+ 1 (* 3 n)))
     (iota 100))
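The same computation can be sketched in Python, where range(100) plays the role of (iota 100) and produces the numbers 0 to 99. map returns a lazy iterator in Python, so it is wrapped in list to get the values.

```python
# Apply n -> 3n + 1 to each of the numbers 0..99.
terms = list(map(lambda n: 3 * n + 1, range(100)))

print(terms[:5])
```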

Reduce

Reducing is a generalization of a common pattern in functional programs: start from a base value and combine it, one element at a time, with a list of data using some procedure. For example, the factorial of n is usually defined by multiplying together every number between n and 1, starting from 1. Here, the procedure is multiplication, the base is 1, and the list of data is every number between n and 1 (inclusive). In practice:

(define (factorial n)
  (reduce * 1 (iota n 1)))
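A Python sketch of the same idea uses functools.reduce from the standard library, with operator.mul as the procedure, 1 as the base and range(1, n + 1) as the list of data. Note that the argument order differs from the Scheme version: Python takes the base value last.

```python
from functools import reduce
from operator import mul


def factorial(n):
    """Multiply together every number from 1 to n, starting from 1."""
    return reduce(mul, range(1, n + 1), 1)


print(factorial(5))
```

Because the base value is supplied, factorial(0) returns 1 rather than failing on the empty range.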

Filter

Filtering is a simple concept with powerful applications. Filtering a list of data is just running the data through a test and removing anything that doesn't pass. For example, to choose all the numbers below 100 that are divisible by 3 or 5, one might run the following:

(filter (lambda (n)
           (or (= (modulo n 3) 0)
               (= (modulo n 5) 0)))
        (iota 100))

Filter takes in a function and a list. The function is used to test each value of the list. A list of all the members which qualify (meaning that the function returns true when passed the member) is returned.
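Here is a rough Python equivalent using the built-in filter. The test function is given a name, divisible_by_3_or_5, purely for readability in this sketch; as in the Scheme version, the range starts at 0, which passes the test.

```python
def divisible_by_3_or_5(n):
    """The test that each member of the list must pass."""
    return n % 3 == 0 or n % 5 == 0


# filter is lazy in Python, so wrap it in list to get the members.
multiples = list(filter(divisible_by_3_or_5, range(100)))

print(multiples[:8])
```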

Tail recursion optimisation

A tail call is a function call that is the last action of the calling function. When a function's last action is to call itself, it is tail recursive. In that case, the run-time environment doesn't need to remember any of the information from the previous call, so an implementation that optimises tail calls lets a function call itself as many times as it needs to without getting a stack overflow.

A classic example comes from SICP. The following code does not use tail recursion.

(define (factorial n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))

Because each call to factorial still has to multiply the result of the recursive call by n after it returns, the run-time environment must keep every n in memory until the innermost call finishes. This can lead to stack overflows or sluggish code. In contrast, consider the following tail-recursive code.

(define (factorial n)
  (fact-iter 1 1 n))

(define (fact-iter product counter max-count)
  (if (> counter max-count)
      product
      (fact-iter (* counter product)
                 (+ counter 1)
                 max-count)))

Because the last action of fact-iter is either to return product or to call itself, the implementation can reuse the same stack frame for every call. The recursion then runs in constant stack space, so it can't overflow the stack, and it is usually faster as well.
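What the optimisation amounts to can be sketched in Python. CPython does not eliminate tail calls, so the direct port fact_iter below still uses one stack frame per call; factorial_loop is the hand-written loop that a tail-call-optimising implementation effectively turns it into. The name factorial_loop is made up for this example.

```python
def fact_iter(product, counter, max_count):
    """Direct port of the tail-recursive Scheme procedure."""
    if counter > max_count:
        return product
    # Tail call: nothing is left to do after it returns.
    return fact_iter(counter * product, counter + 1, max_count)


def factorial_loop(n):
    """The same computation with the tail call replaced by a loop."""
    product, counter = 1, 1
    while counter <= n:
        # Update the "arguments" in place instead of making a new call.
        product, counter = counter * product, counter + 1
    return product


print(fact_iter(1, 1, 5))
print(factorial_loop(5))
```

The loop version handles inputs such as 5000 without trouble, whereas the recursive port would exceed CPython's default recursion limit.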

Functional reactive

Functional reactive programming (FRP) is a subset of reactive programming built on the foundation of functional programming.

Elm

Object Oriented

Object Oriented Programming (OOP) appears to be difficult for people to define and even more difficult for beginners to grasp. The best explanation I was given was to envision real life objects such as a fork, a knife, a plate and a piece of chicken. Each object has methods and attributes. You would stab with a fork and cut with a knife; these are their basic methods. The chicken can be raw or cooked, which is one of its attributes. You can use the knife and fork together to stabilise and cut the chicken, but only if it's cooked.

Most OOP languages have incredible documentation that is recommended to have handy for references while developing software with these languages.

OOP is not necessarily GUI programming.

Java
C#
C++
D

Inheritance

Polymorphism

Polymorphism is when two or more objects have methods which have the same name. Generally, these methods do similar things, but they must accomplish these tasks differently because of the differences in the objects. For example, Cats and Dogs could both have a method, useBathroom, but the Cat method would involve the litter box, and the Dog's method would involve going outside. The programmer wouldn't even have to know which one he was dealing with. If he uses generic types, he could message an Animal to useBathroom, and it would, no questions asked.
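The Cat and Dog example might look like this in Python. It is only a sketch: the method is spelled use_bathroom to follow Python naming conventions, and morning_routine is a made-up caller that deals with animals without knowing which kind it has.

```python
class Animal:
    def use_bathroom(self):
        raise NotImplementedError("each kind of animal does this differently")


class Cat(Animal):
    def use_bathroom(self):
        return "uses the litter box"


class Dog(Animal):
    def use_bathroom(self):
        return "goes outside"


def morning_routine(animals):
    """Send the same message to every animal, whatever its class."""
    return [animal.use_bathroom() for animal in animals]


print(morning_routine([Cat(), Dog()]))
```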

Encapsulation

Abstraction

Function / Method / Operator overloading

Declarative

Declarative languages differ from the others in that you specify what you want to get, not how you're going to get it like you would in other languages. SQL is used for querying databases, while Prolog is a logic language; it allows you to set up rules into a Knowledge Base, then query the knowledge base about things.

SQL
Prolog

Aspect Oriented

?

Concepts

For a lot of core programming concepts, it is highly recommended you check out the page on The C Programming Language, as C is one of the most widely used programming languages, and because the language is so small, it gives you valuable experience of things like memory management and good programming practices.

Magic numbers and meaningful constants

A magic number is any kind of literal in a program's source code whose meaning is not necessarily clear. For example, in the following code:

if Height > 1.98 then
  AllowedInPark := False

The meaning of the literal number 1.98, while possible to figure out from context, is not entirely clear. Magic numbers may be even more cryptic, such as in this example:

repeat
  Read(ACharacter)
until ACharacter = #13#10

Unless you have memorised ASCII control character literals, the meaning of #13#10 is difficult to grasp.

Source code is designed for humans to read and understand, so to avoid magic numbers, meaningful constants come in handy. Constants tie values to identifiers, allowing the programmer to give meaningful names to these values:

const
  MaxVehicleHeight = 1.98; // metres
...
if Height > MaxVehicleHeight then
  AllowedInPark := False

const
  EndOfLine = #13#10;
...
repeat
  Read(ACharacter)
until ACharacter = EndOfLine

There are other benefits to using constants instead of magic numbers: for example, when you use the same value multiple times in a program. If you later need to change that value, you can simply change the definition of the constant, instead of having to change every occurrence of the magic number individually in your source code.
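The vehicle height example carries over to other languages directly. In this Python sketch the constant is written in capitals, which is only a naming convention (Python does not enforce constants), and the function name allowed_in_park is made up for this example.

```python
MAX_VEHICLE_HEIGHT_M = 1.98  # metres; the one place to change the limit


def allowed_in_park(height_m):
    """A vehicle is refused only if it is taller than the limit."""
    return height_m <= MAX_VEHICLE_HEIGHT_M


print(allowed_in_park(1.50))
print(allowed_in_park(2.10))
```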

Design patterns

Software testing

Firstly, you'll need to know the difference between black-box testing, and white-box testing. Black-box testing is testing a piece of software without knowing what is inside that software, i.e. you only test its functionality based on its input and output. White-box testing is the opposite; you test based on the software's methods and internal workings.
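The difference can be sketched with Python's standard unittest module. The function under test, clamp, is hypothetical and exists only for this example. The first test class treats it as a black box (known inputs, expected outputs); the second picks its cases by reading the code, one per branch.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: force value into [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value


class ClampBlackBoxTest(unittest.TestCase):
    """Written from the description alone, without looking inside."""

    def test_documented_behaviour(self):
        self.assertEqual(clamp(5, 0, 10), 5)
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(42, 0, 10), 10)


class ClampWhiteBoxTest(unittest.TestCase):
    """One case per branch, chosen by reading the implementation."""

    def test_below_branch(self):
        self.assertEqual(clamp(-1, 0, 10), 0)

    def test_above_branch(self):
        self.assertEqual(clamp(11, 0, 10), 10)

    def test_fall_through_on_boundaries(self):
        self.assertEqual(clamp(0, 0, 10), 0)
        self.assertEqual(clamp(10, 0, 10), 10)
```

Saved to a file, the tests can be run with python -m unittest followed by the file name.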

Unit testing

Computer graphics

Competitive Programming

You are in the jungle. You have a pocket-knife. Someone asks you to kill a mountain lion.

Anyone but a programmer would be asking "WTF is a MOUNTAIN lion doing in a JUNGLE?!", but that's not what you have been trained to do as a programmer. You are here to solve problems, not to question them.

Years of training have taught you well. You use your knife to sharpen a stick. You cut vines to lash sharp stones on one end. Maybe you're from a top university, and you've learned to extract essential ingredients from plant and insect life around you to fashion a poison to tip your weapon with.

Convinced that you have an effective and efficient way to kill the lion, you set forth to accomplish your task. Maybe your stick is too short, or your poisons don't work. It's okay - you live to refine your method and try again another day.

Then someone figures out a way to fashion a low-grade explosive from harvesting chemicals in the jungle. Your method of fashioning a spear to kill the lion is now far from the best way to accomplish your task. Nevertheless, it's still a simple way, and will continue to be taught in schools. Every lion-killer will be taught how to build his tools from scratch.

That's "real-life" programming.

In competitive programming, you start out with the same resources (a pocket-knife), except you have 2 minutes to kill the lion.

As a beginner, you will stare at the lion and do nothing.

Soon, you learn that if you kill a squirrel, sometimes the judge thinks it's a lion and you're good to go.

A more experienced programmer just keeps stabbing the lion and hopes that the lion dies in time. Soon, you learn that there are certain spots on a lion that are damage immune. You learn to not even bother stabbing those spots. Sometimes, the lion doesn't expose those spots, so you get really good at killing squirrels.

And then, to be a great competitive programmer, you need to be able to do two things.

First, you must learn how to find the lion's critical point and kill it in one swift stroke.

Second, you must learn how to be so handy with your knife that you can fashion a sharp stick in 1 minute, and spend the next minute stabbing the lion to death.

But never ever will you be able to have enough time to fashion an explosive to take the lion out.

Material

Books

There are books that teach competitive programming. They are a very good resource for learning computer science.

Compete

There are many sites that host online programming competitions. They often offer tutorials and much more:

TopCoder
Codeforces
Codechef
Google CodeJam
Facebook HackerCup (I do not recommend this one -- Reason?)
ICPC (Might be available in your college)
IOI (for secondary school students; look for the local version in your country)

Training sites

If you want to train:
