We are still actively working on the spam issue.

Computer science

From InstallGentoo Wiki
Jump to: navigation, search
Imbox move.png
Imbox move.png
MOVE CANDIDATE
This page is being proposed to be moved to somewhere else. Relevant discussion may be found on the talk page. Reason: Computer science is a broad area and the information on this page only relates to algorithms and data types.
Note: Check out this page on the /sci/ wiki. It does a much better job than this page (currently)

What is Computer Science? As Harold Abelson from the SICP lectures says, the name Computer Science is in fact misleading.

"Computer science is not about computers, in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes."

Thus, "computer" is not needed in the title. Secondly, a lot of Computer Science is mathematics and theory. There is very little actual scientific method like one would find in Physics, Chemistry, or Biology, so in this sense, it's not really a science as much as it is a form of applied mathematics or engineering. So Computer Science should be replaced with a much more informative name.

So what is Computer Science then? CS consists of algorithms for solving problems in the most mathematically convenient or efficient way. Take for example the problem of finding the fastest route between several locations in a city; this is solved using a graph searching algorithm. There are also data structures designed for storing and indexing huge amounts of data in a database, protocols for making sure that data transmission is reliable, statistical learning methods behind AI, and so on.

Getting started

The most recommended Computer Science books are Structure and Interpretations of Computer Programs (SICP), and what is considered the holy grail of CS literature: The Art of Computer Programming (TAOCP) by Donald Knuth.

You'll find some eBooks at http://books.gentoomen.org/ and you can buy physical ones for cheap from http://abebooks.com

A good place to start would be the SICP lectures on youtube, found here, and Computerphile's videos, found here.

Algorithms

An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values as output. It is the part that does the magic.

There are a few programming concepts discussed in the Programming concepts page.

For the rest, check Introduction to Algorithms (by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein), and if you have the balls and the time, check The Art of Computer Programming (by Donald D. Knuth).

There is also a good introduction to many concepts explained here.

Sorting Algorithms

A big topic in computer applications is the quick and efficient sorting of data. There are many different sorting techniques, as a programmer one has to balance the algorithm's ease of reading vs the efficiency of said code. Many standard libraries/APIs already have sorting functions already written, however it is still good practice to know some of the theory behind how the different algorithms work.

Check out this video

Bubble Sort

The bubble sort is a very simple to read algorithm that is the standard first sorting method that most programmers learn. The bubble sort gets its name because values are swapped among each other, gradually moving to the top of the list.

//bubble sort example
#include <stdio.h>
#define SIZE 10
int main( void ){
    int a[SIZE] = {2,6,9,1,4,15,7,2}; //array to be sorted
    int count; //to be used to count passes
    int i; //comparisons counter
    int hold; //temp variable to hold numbers

    //start of bubble sort
    for(count=0; count<SIZE-1; count++){//loop to control number of passes
    //size is decreased by 1 to prevent array out of bounds error

    for(i=0; i<SIZE-1; i++){//comparison loop
        if(a[i]>a[i+1]){//if a[i] bigger than a[i+1], swap values
        hold = a[i];
        a[i] = a[i+1];
        a[i+1] = hold;
        }//end if
    }//end inner for

    }//end outer for
    
    //print array
    for(i=0; i<SIZE; i++)
    printf("%4d", a[i]);
    return 0;
}

Selection Sort

A bit further up on the food chain is the selection sort. The selection sort goes through each element position and finds the value which should occupy the next position in the sorted array. When it finds the appropriate element, the algorithm exchanges it with the valuewhich previously occupied the desired position. On the second pass it looks for the second smallest element and swaps it with the second position and so on.

//Selection sort example
//finds smallest element between left hand and right hand
//moves element to correct position
#include <stdio.h>
void selectionSort(int array[], int n){
    //n is length
    int lh, rh, i, temp;
        
    for (lh=0; lh < n; lh++){//everything left of lh is considered sorted
        rh = lh;    //reset right hand
        //runs through lh-rh bounds of array        
        for (i = lh + 1; i<n; i++){
            //if lowest element found, copy index to right hand
            if (array[i] < array[rh]) rh = i;
        }
        //move lowest element to correct position
        temp = array[lh];
        array[lh] = array[rh];
        array[rh] = temp;
        }    
    }
int main( void ){
    int len = 8;    
    int a[8] = {2,6,9,1,4,15,7,2}; //array to be sorted
    int i;    

    selectionSort(a, len);
   
    //print array
    for(i=0; i<len; i++)
    printf("%4d", a[i]);
    return 0;
}

Merge sort

   // This is an iterative version of merge sort.
   // It merges pairs of two consecutive lists one after another.
   // After scanning the whole array to do the above job,
   // it goes to the next stage. Variable k controls the size
   // of lists to be merged. k doubles each time the main loop
   // is executed.
   #include <stdio.h>
   #include <math.h>
   int i,n,t,k;
   int a[100000],b[100000];
   int merge(l,r,u)
     int l,r,u;
   { int i,j,k;
     i=l; j=r; k=l;
     while (i<r && j<u) { 
       if (a[i]<=a[j]) {b[k]=a[i]; i++;} 
       else {b[k]=a[j]; j++;}
       k++;
     }
     while (i<r) { 
       b[k]=a[i]; i++; k++;
     }
     while (j<u) { 
       b[k]=a[j]; j++; k++;
     }
     for (k=l; k<u; k++) { 
       a[k]=b[k]; 
     }
   }
   sort()
   { int k,u;
     k=1;
     while (k<n) {
       i=1;
       while (i+k<=n) {
         u=i+k*2;
         if (u>n) u=n+1;
         merge(i,i+k,u);
         i=i+k*2;
       }
       k=k*2;
     }
   }
   main()
   { printf("input size \n");
     scanf("%d",&n); 
   /*  for (i=1;i<=n;i++) scanf("%d",&a[i]); */
     for (i=1;i<=n;i++) a[i]=random()%1000;
     t=clock();
     sort();
     for (i=1;i<=10;i++) printf("%d ",a[i]);
     printf("\n");
     printf("time= %d millisec\n",(clock()-t)/1000);
   }

Quick Sort

Watch this video

Performance

Search Algorithms

Think of an ordered phone book. Consider two ways to search it for a name: linear search and binary search.

Linear search consists of starting at page 1, at the first name, and checking every name, one at a time, in order, until you find the name. Considering a phone book of one million names, this is very useful if you're searching for the number of Aaron A. Aardvark, but could be tedious if looking for Matthew Maloney, or downright awful if looking for a person named Z. Zerthis.

Binary search consists of opening the book up to half way, and comparing the name you are searching for with the items on either side. If the name you are looking for is earlier on, then you'll want the left half. If it is later on, you'll want the right half. What you have done here, is effectively cut the amount of names to search in half. You then go to the middle of whichever half you chose, and repeat the process until you've found what you're looking for. This is what we humans naturally do in a general sense.

To really get a hold on how much better Binary search is compared to linear search, consider a name that is roughly 3 quarters to the end of the phone book. With 1 million names, a linear search would take you roughly 750,000 comparisons before you found that name. With binary search, it would take roughly 3 or 4 comparisons.

There are also two string searching/pattern matching algorithms known as the Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) algorithms.

Computers use graph searching algorithms called the Floyd-Warshall algorithm, and Dijkstra's (pronounced Daik-stra) algorithm, to find the optimal path between two or more nodes connected by pathes on a graph.

Recursion

Recursion is the concept of something entering inside itself, or rather a new instance of itself. Functional programming languages such as Lisp dialects are known to employ this often.

Recursion can be tough to get your head around at first, because for a lot of people it is a new concept. However, once understood, it's quite simple. The main things to remember are the base case, the terminating case, and the concept of an accumulator.

Base case: This is the starting step of recursion.

Termination case: This tells the recursion when to stop. Without this, it would simply recur forever, or at least until the machine runs out of memory.

Accumulator: This is how you can pass data through a function recursively. It's called an accumulator because it accumulates data, or collects it as it goes. Here's an example of a program in Python that recursively adds numbers from 0 to 10 into a list, and then returns it.

The base case will be starting at 0. The terminating case is at 10. We'll use a list as an accumulator.

Lets define our function:

   def one_to_ten(number, accumulator):
       if number > 10:
           return accumulator
       else:
           accumulator.append(number)
           return one_to_ten(number + 1, accumulator)
   
   empty_list = []
   base_case = 0
   
   result = one_to_ten(base_case, empty_list)
   
   print(result) # result is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Going through it step by step:

 1. Our function takes a number and the accumulator
 2. Check that the termination case hasn't been met
 3. If it has, don't call the function anymore. Just return the accumulator.
 4. Otherwise, put the number in the accumulator
 5. Call the function from within itself, but with number + 1, so the next recursion of it
    will have the next number. Note that we're also passing in the newly updated accumulator.

So, the function enters a second instance of itself, but with number plus one, and it does this 10 times, until number is 10 (which will be 10 levels deep). However, when number becomes 11, it will not recur anymore, and it will return the accumulator back to the 9th function, which will also finish and return to the 8th function, which returns back to the 7th function, and so on until the first call of the function has returned. You could say it naturally unrolls itself.

Linked Lists

Structure

A linked list consists of a series of nodes. Each node contains data, and a pointer to the next node. A visual representation of a linked list might look something like this:

  Node1          Node2          Node3
 /------\       /------\       /------\
 | Data |  ---->| Data |  ---->| Data |  ---->
 |------|  |    |------|  |    |------|  |
 | Next ----    | Next ----    | Next ----
 \------/       \------/       \------/

As the diagram shows, each node has a Data field, and a Next field. Each node's Next points to another node, except the last node, which points to nothing. If you consider these three nodes as a single, connected entity, then you have a linked list. One of the interesting things about a linked list is that, unlike an array, where all the elements are grouped next to each other in memory, the individual nodes can be anywhere in memory, with their Next field serving to point each to its successor.

There are different types of linked lists. The diagram depicts a singly linked list, meaning each node contains a single link, which points to the next node.

Linked lists can also be doubly or circularly linked. Both are similarly structured to a singly linked list, except that a doubly linked list also contains a Previous field which points to the node before it, and the last node in a circularly linked list points back to the first node in the list.

Implementation

To represent a node in Pascal, we will use a record.

First we must define a pointer to a node, which can be thought of as one of the arrows in the diagram.

 type
   PNode = ^TNode;

Now we can define the actual node type itself

   TNode = record
     Data: Integer;
     Next: PNode;
   end;

Like the diagram, our node has a Data field, and a Next field. The Data field will contain an Integer, and the Next field will point to another node.

With a node type defined, we can define our linked list type. The linked list itself only contains a pointer to the first node in the list, which we will call its Head. We will also define a pointer to a linked list, which will come in handy later when we want to pass our lists between different functions to modify it.

   PLinkedList = ^TLinkedList;
   TLinkedList = record
     Head: PNode;
   end;

Usage

We now have all the structure we need for a proper linked list. But, lists and nodes by themselves aren't very interesting. We should also create some functions that will let us create and delete lists, add and remove nodes, and visualise our lists by printing them to the terminal screen.

We'll start with creating a new list. For the most part, we'll be working with pointers to lists rather than lists themselves, so we will define a function that will create a new list and return a pointer to it that we can use

 function NewList: PLinkedList;
 begin
   New(NewList);
   NewList^.Head := Nil
 end;

We assign Nil to Head to indicate that it does not point to any node, like the arrow that points to nothing in the diagram. This is also useful for testing if any node has another node attached to it: if Next = Nil, then that node is not followed by any other node, and it is at the end of the list.

We also need a function to delete the lists we create when we are done with them, along with all of the nodes we've added to it. To do this, we must step over the list one node at a time, deleting each node until we reach the end.

 procedure DisposeList(List: PLinkedList);
 var
   RemovedNode, CurrentNode: PNode;
 begin
   CurrentNode := List^.Head;
   while CurrentNode <> Nil do
   begin
     RemovedNode := CurrentNode;
     CurrentNode := CurrentNode^.Next;
     Dispose(RemovedNode)
   end
 end;

Now we want to be able to add nodes, to the front or back of the list. Since we mainly care about the data each node contains, these functions will create the new nodes for us, and assign the data we want to add to their Data field.

Adding to the front of the list is easiest, since we already know exactly where the first node is: the list's Head field points to it. To place it in front, we first place the node Head points to after our new node (by pointing the new node's Next field to it), then we reassign the list's Head field to point to our new node:

 procedure AddToFront(List: PLinkedList; Data: Integer);
 var
   NewNode: PNode;
 begin
   New(NewNode);
   NewNode^.Data := Data;
   NewNode^.Next := List^.Head;
   List^.Head := NewNode
 end;

Adding to the back of the list is a little more complicated, because first we need to find the last node in the list. There are a couple of cases we must consider:

  1. The list is empty (Head = Nil).
  2. The list has one or more nodes.

In the first case, we can simply assign Head to point at our new node. In the second case, we must step over the list, one node at a time, until we reach a node that has no node following it (Next = Nil).

 procedure AddToBack(List: PLinkedList; Data: Integer);
 var
   NewNode, CurrentNode: PNode;
 begin
   New(NewNode);
   NewNode^.Data := Data;
   NewNode^.Next := Nil;
   if List^.Head = Nil then
   begin
     List^.Head := NewNode
   end else
   begin
     CurrentNode := List^.Head;
     while CurrentNode^.Next <> Nil do
     begin
       CurrentNode := CurrentNode^.Next
     end;
     CurrentNode^.Next := NewNode
   end
 end;

We also want a way to remove nodes from the front and back of the list.

Like adding, removing from the front is relatively simple, we just assign Head to the node pointed to by the Head node's Next field, and dispose of the old Head node:

 procedure RemoveFromFront(List: PLinkedList);
 var
   RemovedNode: PNode;
 begin
   RemovedNode := List^.Head;
   List^.Head := List^.Head^.Next;
   Dispose(RemovedNode)
 end;

And also like adding, removing from the back involves a number of cases:

  1. The list is empty.
  2. The list contains one node.
  3. The list contains two or more nodes.

In the first case, we don't have to do anything, since there are no nodes to remove. For the second case, we must remove the node that the list's Head field points to. For the third case, we must step over the list until we reach the second to last node, and then remove the node that node's Next field points to.

 procedure RemoveFromBack(List: PLinkedList);
 var
   CurrentNode: PNode;
 begin
   if List^.Head <> Nil then
   begin
     if List^.Head^.Next = Nil then
     begin
       Dispose(List^.Head);
       List^.Head := Nil
     end else
     begin
       CurrentNode := List^.Head;
       while CurrentNode^.Next^.Next <> Nil do
       begin
         CurrentNode := CurrentNode^.Next
       end;
       Dispose(CurrentNode^.Next);
       CurrentNode^.Next := Nil
     end
   end
 end;

The final function we will create is much simpler. This function will print out a visual representation of the data in a list to the terminal screen:

 procedure PrintList(List: PLinkedList);
 var
   CurrentNode: PNode;
 begin
   CurrentNode := List^.Head;
   while CurrentNode <> Nil do
   begin
     Write(CurrentNode^.Data);
     if CurrentNode^.Next <> Nil then
     begin
       Write(' => ')
     end;
     CurrentNode := CurrentNode^.Next
   end;
   Writeln
 end;

Testing

Now we can test our linked list structure and functions:

 var
   List: PLinkedList;
 begin
   List := NewList;
   AddToFront(List, 3);
   AddToFront(List, 2);
   AddToFront(List, 4);
   PrintList(List);
   AddToBack(List, 1);
   AddToBack(List, 0);
   PrintList(List);
   RemoveFromFront(List);
   RemoveFromBack(List);
   PrintList(List);
   DisposeList(List)
 end.

This will result in the following output:

 4 => 2 => 3
 4 => 2 => 3 => 1 => 0
 2 => 3 => 1

Full source: linkedlistexample.pas

Binary Search Trees

A binary search tree is a way to very quickly search sorted data. It is simply constructing a tree by following a binary search.

   /* This program repeatedly inserts n random numbers into
      a binary search tree. Then it traverse the tree
      in in-order to store the keys into array a.
      After the traversal, sorted numbers are in array a. */
   #include <stdio.h>
   #include <math.h>
   #include <stdlib.h>
   #define nil 0
   int k=1;
   struct item *proot;
   struct item{
     int key;
     struct item *adr;
     struct item *left;
     struct item *right;
   };
   struct item a[1000000];
   void insert(int x){
     struct item *p, *q;
     p=proot;
     do{
       q=p;
       if (x<=(*p).key)p=(*p).left; else p=(*p).right;
     }
     while (p!=nil);
     p=malloc(sizeof(struct item)); 
     (*p).key=x; (*p).left=nil; (*p).right=nil;
     (*p).adr=p;
     if (x<=(*q).key) (*q).left=p; else (*q).right=p;
   }
   traverse(struct item *proot)
   { struct item *p;
     p=proot;  
     if (p!=nil){
       printf("("); 
       traverse((*p).left);
       printf("%d", (*p).key);
       a[k]=(*p); k++;
       traverse((*p).right);
       printf(")"); 
     }
   }
   main()
   { 
     int i,n,t;
     scanf("%d",&n);
     proot=malloc(sizeof(struct item));
     (*proot).key=99999;
     (*proot).left=nil;
     (*proot).right=nil;
     for(i=1;i<=n;i++) a[i].key=random()%100000;
   //  for (i=1;i<=n;i++)printf("%d ",a[i].key);
     printf("\n");
     t=clock();
     for (i=1;i<=n;i++)insert(a[i].key);
     traverse(proot->left);
     printf("\n");
     printf("time= %d millisec\n\n", (clock()-t)/1000);
   /*
     for (i=1;i<=n;i++)
       printf("%3d %3d %3d %3d\n\n",(int)(a[i].adr) % 1000, a[i].key, 
                                (int)(a[i].left) % 1000, (int)(a[i].right) % 1000);
   */
   }

Hash Tables