We are still actively working on the spam issue.
Difference between revisions of "Computer science"
m |
(→Sorting Algorithms: Added some computerphile videos) |
||
Line 27: | Line 27: | ||
A big topic in computer applications is the quick and efficient sorting of data. There are many different sorting techniques, as a programmer one has to balance the algorithm's ease of reading vs the efficiency of said code. Many standard libraries/APIs already have sorting functions already written, however it is still good practice to know some of the theory behind how the different algorithms work. | A big topic in computer applications is the quick and efficient sorting of data. There are many different sorting techniques, as a programmer one has to balance the algorithm's ease of reading vs the efficiency of said code. Many standard libraries/APIs already have sorting functions already written, however it is still good practice to know some of the theory behind how the different algorithms work. | ||
+ | |||
+ | Check out [https://www.youtube.com/watch?v=kgBjXUE_Nwc this video] | ||
===Bubble Sort=== | ===Bubble Sort=== | ||
Line 154: | Line 156: | ||
} | } | ||
+ | ===Quick Sort=== | ||
+ | Watch [https://www.youtube.com/watch?v=XE4VP_8Y0BU this video] | ||
===Performance=== | ===Performance=== | ||
Revision as of 11:29, 13 February 2014
What is Computer Science? As Harold Abelson from the SICP lectures says, the name Computer Science is in fact misleading.
"Computer science is not about computers, in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes."
Thus, "computer" is not needed in the title. Secondly, a lot of Computer Science is mathematics and theory. There is very little actual scientific method like one would find in Physics, Chemistry, or Biology, so in this sense, it's not really a science as much as it is a form of applied mathematics or engineering. So Computer Science should be replaced with a much more informative name.
So what is Computer Science then? CS consists of algorithms for solving problems in the most mathematically convenient or efficient way. Take for example the problem of finding the fastest route between several locations in a city; this is solved using a graph searching algorithm. There are also data structures designed for storing and indexing huge amounts of data in a database, protocols for making sure that data transmission is reliable, statistical learning methods behind AI, and so on.
Contents
Getting started
The most recommended Computer Science books are Structure and Interpretations of Computer Programs (SICP), and what is considered the holy grail of CS literature: The Art of Computer Programming (TAOCP) by Donald Knuth.
You'll find some eBooks at http://books.gentoomen.org/ and you can buy physical ones for cheap from http://abebooks.com
A good place to start would be the SICP lectures on youtube, found here
Algorithms
An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values as output. It is the part that does the magic.
There are a few programming concepts discussed in the Programming concepts page.
For the rest, check Introduction to Algorithms (by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein), and if you have the balls and the time, check The Art of Computer Programming (by Donald D. Knuth).
There is also a good introduction to many concepts explained here.
Sorting Algorithms
A big topic in computer applications is the quick and efficient sorting of data. There are many different sorting techniques, as a programmer one has to balance the algorithm's ease of reading vs the efficiency of said code. Many standard libraries/APIs already have sorting functions already written, however it is still good practice to know some of the theory behind how the different algorithms work.
Check out this video
Bubble Sort
The bubble sort is a very simple to read algorithm that is the standard first sorting method that most programmers learn. The bubble sort gets its name because values are swapped among each other, gradually moving to the top of the list.
//bubble sort example
#include <stdio.h>
#define SIZE 10
- int main( void ){
- int a[SIZE] = {2,6,9,1,4,15,7,2}; //array to be sorted
- int count; //to be used to count passes
- int i; //comparisons counter
- int hold; //temp variable to hold numbers
-
- //start of bubble sort
- for(count=0; count<SIZE-1; count++){//loop to control number of passes
- //size is decreased by 1 to prevent array out of bounds error
-
- for(i=0; i<SIZE-1; i++){//comparison loop
- if(a[i]>a[i+1]){//if a[i] bigger than a[i+1], swap values
- hold = a[i];
- a[i] = a[i+1];
- a[i+1] = hold;
- }//end if
- if(a[i]>a[i+1]){//if a[i] bigger than a[i+1], swap values
- }//end inner for
- for(i=0; i<SIZE-1; i++){//comparison loop
-
- }//end outer for
-
- //print array
- for(i=0; i<SIZE; i++)
- printf("%4d", a[i]);
- return 0;
}
Selection Sort
A bit further up on the food chain is the selection sort. The selection sort goes through each element position and finds the value which should occupy the next position in the sorted array. When it finds the appropriate element, the algorithm exchanges it with the valuewhich previously occupied the desired position. On the second pass it looks for the second smallest element and swaps it with the second position and so on.
//Selection sort example
- // finds smallest element between left hand and right hand
- // moves element to correct position
- include <stdio.h>
- void selectionSort(int array[], int n){
- //n is length
- int lh, rh, i, temp;
- for (lh=0; lh < n; lh++){//everything left of lh is considered sorted
- rh = lh; //reset right hand
- //runs through lh-rh bounds of array
- for (i = lh + 1; i<n; i++){
- //if lowest element found, copy index to right hand
- if (array[i] < array[rh]) rh = i;
- }
- //move lowest element to correct position
- temp = array[lh];
- array[lh] = array[rh];
- array[rh] = temp;
- }
- }
- int main( void ){
- int len = 8;
- int a[8] = {2,6,9,1,4,15,7,2}; //array to be sorted
- int i;
-
- selectionSort(a, len);
-
- //print array
- for(i=0; i<len; i++)
- printf("%4d", a[i]);
- return 0;
}
Merge sort
// This is an iterative version of merge sort. // It merges pairs of two consecutive lists one after another. // After scanning the whole array to do the above job, // it goes to the next stage. Variable k controls the size // of lists to be merged. k doubles each time the main loop // is executed. #include <stdio.h> #include <math.h> int i,n,t,k; int a[100000],b[100000]; int merge(l,r,u) int l,r,u; { int i,j,k; i=l; j=r; k=l; while (i<r && j<u) { if (a[i]<=a[j]) {b[k]=a[i]; i++;} else {b[k]=a[j]; j++;} k++; } while (i<r) { b[k]=a[i]; i++; k++; } while (j<u) { b[k]=a[j]; j++; k++; } for (k=l; k<u; k++) { a[k]=b[k]; } } sort() { int k,u; k=1; while (k<n) { i=1; while (i+k<=n) { u=i+k*2; if (u>n) u=n+1; merge(i,i+k,u); i=i+k*2; } k=k*2; } } main() { printf("input size \n"); scanf("%d",&n); /* for (i=1;i<=n;i++) scanf("%d",&a[i]); */ for (i=1;i<=n;i++) a[i]=random()%1000; t=clock(); sort(); for (i=1;i<=10;i++) printf("%d ",a[i]); printf("\n"); printf("time= %d millisec\n",(clock()-t)/1000); }
Quick Sort
Watch this video
Performance
Search Algorithms
Think of an ordered phone book. Consider two ways to search it for a name: linear search and binary search.
Linear search consists of starting at page 1, at the first name, and checking every name, one at a time, in order, until you find the name. Considering a phone book of one million names, this is very useful if you're searching for the number of Aaron A. Aardvark, but could be tedious if looking for Matthew Maloney, or downright awful if looking for a person named Z. Zerthis.
Binary search consists of opening the book up to half way, and comparing the name you are searching for with the items on either side. If the name you are looking for is earlier on, then you'll want the left half. If it is later on, you'll want the right half. What you have done here, is effectively cut the amount of names to search in half. You then go to the middle of whichever half you chose, and repeat the process until you've found what you're looking for. This is what we humans naturally do in a general sense.
To really get a hold on how much better Binary search is compared to linear search, consider a name that is roughly 3 quarters to the end of the phone book. With 1 million names, a linear search would take you roughly 750,000 comparisons before you found that name. With binary search, it would take roughly 3 or 4 comparisons.
There are also two string searching/pattern matching algorithms known as the Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) algorithms.
Computers use graph searching algorithms called the Floyd-Warshall algorithm, and Dijkstra's (pronounced Daik-stra) algorithm, to find the optimal path between two or more nodes connected by pathes on a graph.
Recursion
Recursion is the concept of something entering inside itself, or rather a new instance of itself. Functional programming languages such as Lisp dialects are known to employ this often.
Recursion can be tough to get your head around at first, because for a lot of people it is a new concept. However, once understood, it's quite simple. The main things to remember are the base case, the terminating case, and the concept of an accumulator.
Base case: This is the starting step of recursion.
Termination case: This tells the recursion when to stop. Without this, it would simply recur forever, or at least until the machine runs out of memory.
Accumulator: This is how you can pass data through a function recursively. It's called an accumulator because it accumulates data, or collects it as it goes. Here's an example of a program in Python that recursively adds numbers from 0 to 10 into a list, and then returns it.
The base case will be starting at 0. The terminating case is at 10. We'll use a list as an accumulator.
Lets define our function:
def one_to_ten(number, accumulator): if number > 10: return accumulator else: accumulator.append(number) return one_to_ten(number + 1, accumulator) empty_list = [] base_case = 0 result = one_to_ten(base_case, empty_list) print(result) # result is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Going through it step by step:
1. Our function takes a number and the accumulator 2. Check that the termination case hasn't been met 3. If it has, don't call the function anymore. Just return the accumulator. 4. Otherwise, put the number in the accumulator 5. Call the function from within itself, but with number + 1, so the next recursion of it will have the next number. Note that we're also passing in the newly updated accumulator.
So, the function enters a second instance of itself, but with number plus one, and it does this 10 times, until number is 10 (which will be 10 levels deep). However, when number becomes 11, it will not recur anymore, and it will return the accumulator back to the 9th function, which will also finish and return to the 8th function, which returns back to the 7th function, and so on until the first call of the function has returned. You could say it naturally unrolls itself.
Linked Lists
Structure
A linked list consists of a series of nodes. Each node contains data, and a pointer to the next node. A visual representation of a linked list might look something like this:
Node1 Node2 Node3 /------\ /------\ /------\ | Data | ---->| Data | ---->| Data | ----> |------| | |------| | |------| | | Next ---- | Next ---- | Next ---- \------/ \------/ \------/
As the diagram shows, each node has a Data field, and a Next field. Each node's Next points to another node, except the last node, which points to nothing. If you consider these three nodes as a single, connected entity, then you have a linked list. One of the interesting things about a linked list is that, unlike an array, where all the elements are grouped next to each other in memory, the individual nodes can be anywhere in memory, with their Next field serving to point each to its successor.
There are different types of linked lists. The diagram depicts a singly linked list, meaning each node contains a single link, which points to the next node.
Linked lists can also be doubly or circularly linked. Both are similarly structured to a singly linked list, except that a doubly linked list also contains a Previous field which points to the node before it, and the last node in a circularly linked list points back to the first node in the list.
Implementation
To represent a node in Pascal, we will use a record.
First we must define a pointer to a node, which can be thought of as one of the arrows in the diagram.
type PNode = ^TNode;
Now we can define the actual node type itself
TNode = record Data: Integer; Next: PNode; end;
Like the diagram, our node has a Data field, and a Next field. The Data field will contain an Integer, and the Next field will point to another node.
With a node type defined, we can define our linked list type. The linked list itself only contains a pointer to the first node in the list, which we will call its Head. We will also define a pointer to a linked list, which will come in handy later when we want to pass our lists between different functions to modify it.
PLinkedList = ^TLinkedList; TLinkedList = record Head: PNode; end;
Usage
We now have all the structure we need for a proper linked list. But, lists and nodes by themselves aren't very interesting. We should also create some functions that will let us create and delete lists, add and remove nodes, and visualise our lists by printing them to the terminal screen.
We'll start with creating a new list. For the most part, we'll be working with pointers to lists rather than lists themselves, so we will define a function that will create a new list and return a pointer to it that we can use
function NewList: PLinkedList; begin New(NewList); NewList^.Head := Nil end;
We assign Nil to Head to indicate that it does not point to any node, like the arrow that points to nothing in the diagram. This is also useful for testing if any node has another node attached to it: if Next = Nil, then that node is not followed by any other node, and it is at the end of the list.
We also need a function to delete the lists we create when we are done with them, along with all of the nodes we've added to it. To do this, we must step over the list one node at a time, deleting each node until we reach the end.
procedure DisposeList(List: PLinkedList); var RemovedNode, CurrentNode: PNode; begin CurrentNode := List^.Head; while CurrentNode <> Nil do begin RemovedNode := CurrentNode; CurrentNode := CurrentNode^.Next; Dispose(RemovedNode) end end;
Now we want to be able to add nodes, to the front or back of the list. Since we mainly care about the data each node contains, these functions will create the new nodes for us, and assign the data we want to add to their Data field.
Adding to the front of the list is easiest, since we already know exactly where the first node is: the list's Head field points to it. To place it in front, we first place the node Head points to after our new node (by pointing the new node's Next field to it), then we reassign the list's Head field to point to our new node:
procedure AddToFront(List: PLinkedList; Data: Integer); var NewNode: PNode; begin New(NewNode); NewNode^.Data := Data; NewNode^.Next := List^.Head; List^.Head := NewNode end;
Adding to the back of the list is a little more complicated, because first we need to find the last node in the list. There are a couple of cases we must consider:
- The list is empty (Head = Nil).
- The list has one or more nodes.
In the first case, we can simply assign Head to point at our new node. In the second case, we must step over the list, one node at a time, until we reach a node that has no node following it (Next = Nil).
procedure AddToBack(List: PLinkedList; Data: Integer); var NewNode, CurrentNode: PNode; begin New(NewNode); NewNode^.Data := Data; NewNode^.Next := Nil; if List^.Head = Nil then begin List^.Head := NewNode end else begin CurrentNode := List^.Head; while CurrentNode^.Next <> Nil do begin CurrentNode := CurrentNode^.Next end; CurrentNode^.Next := NewNode end end;
We also want a way to remove nodes from the front and back of the list.
Like adding, removing from the front is relatively simple, we just assign Head to the node pointed to by the Head node's Next field, and dispose of the old Head node:
procedure RemoveFromFront(List: PLinkedList); var RemovedNode: PNode; begin RemovedNode := List^.Head; List^.Head := List^.Head^.Next; Dispose(RemovedNode) end;
And also like adding, removing from the back involves a number of cases:
- The list is empty.
- The list contains one node.
- The list contains two or more nodes.
In the first case, we don't have to do anything, since there are no nodes to remove. For the second case, we must remove the node that the list's Head field points to. For the third case, we must step over the list until we reach the second to last node, and then remove the node that node's Next field points to.
procedure RemoveFromBack(List: PLinkedList); var CurrentNode: PNode; begin if List^.Head <> Nil then begin if List^.Head^.Next = Nil then begin Dispose(List^.Head); List^.Head := Nil end else begin CurrentNode := List^.Head; while CurrentNode^.Next^.Next <> Nil do begin CurrentNode := CurrentNode^.Next end; Dispose(CurrentNode^.Next); CurrentNode^.Next := Nil end end end;
The final function we will create is much simpler. This function will print out a visual representation of the data in a list to the terminal screen:
procedure PrintList(List: PLinkedList); var CurrentNode: PNode; begin CurrentNode := List^.Head; while CurrentNode <> Nil do begin Write(CurrentNode^.Data); if CurrentNode^.Next <> Nil then begin Write(' => ') end; CurrentNode := CurrentNode^.Next end; Writeln end;
Testing
Now we can test our linked list structure and functions:
var List: PLinkedList; begin List := NewList; AddToFront(List, 3); AddToFront(List, 2); AddToFront(List, 4); PrintList(List); AddToBack(List, 1); AddToBack(List, 0); PrintList(List); RemoveFromFront(List); RemoveFromBack(List); PrintList(List); DisposeList(List) end.
This will result in the following output:
4 => 2 => 3 4 => 2 => 3 => 1 => 0 2 => 3 => 1
Full source: linkedlistexample.pas
Binary Search Trees
A binary search tree is a way to very quickly search sorted data. It is simply constructing a tree by following a binary search.
/* This program repeatedly inserts n random numbers into a binary search tree. Then it traverse the tree in in-order to store the keys into array a. After the traversal, sorted numbers are in array a. */ #include <stdio.h> #include <math.h> #include <stdlib.h> #define nil 0 int k=1; struct item *proot; struct item{ int key; struct item *adr; struct item *left; struct item *right; }; struct item a[1000000]; void insert(int x){ struct item *p, *q; p=proot; do{ q=p; if (x<=(*p).key)p=(*p).left; else p=(*p).right; } while (p!=nil); p=malloc(sizeof(struct item)); (*p).key=x; (*p).left=nil; (*p).right=nil; (*p).adr=p; if (x<=(*q).key) (*q).left=p; else (*q).right=p; } traverse(struct item *proot) { struct item *p; p=proot; if (p!=nil){ printf("("); traverse((*p).left); printf("%d", (*p).key); a[k]=(*p); k++; traverse((*p).right); printf(")"); } } main() { int i,n,t; scanf("%d",&n); proot=malloc(sizeof(struct item)); (*proot).key=99999; (*proot).left=nil; (*proot).right=nil; for(i=1;i<=n;i++) a[i].key=random()%100000; // for (i=1;i<=n;i++)printf("%d ",a[i].key); printf("\n"); t=clock(); for (i=1;i<=n;i++)insert(a[i].key); traverse(proot->left); printf("\n"); printf("time= %d millisec\n\n", (clock()-t)/1000); /* for (i=1;i<=n;i++) printf("%3d %3d %3d %3d\n\n",(int)(a[i].adr) % 1000, a[i].key, (int)(a[i].left) % 1000, (int)(a[i].right) % 1000); */ }