Programming languages

From InstallGentoo Wiki
Revision as of 04:47, 1 April 2015 by Thelismor (talk | contribs) (Inheritance)
Practical Language Comparison.jpeg

Quick summary of programming languages.

Should I learn...

There are a lot of languages out there, and most of them overlap heavily across common problem domains. In principle, since all of these languages are Turing complete, there's nothing stopping you from doing any task in any language. But in practice, there may be orders of magnitude difference (we're talking 100-1000x here) in how much time it takes to do a project in a suitable language vs. an unsuitable one. Among the joys you may experience by picking the wrong language are:

  • Fighting passive aggressive syntax that pesters you with irrelevant low-level details
    • You shouldn't underestimate the importance of simple syntactical decisions and syntactic sugar. Learning a language and developing software both take a lot of commitment and hard work. If you don't enjoy your language, it will destroy your motivation, and it will become very hard to get anywhere.
  • Having to reinvent the wheel because nobody bothered porting critical library X
  • Having to figure shit out on your own because all guides for topic X are written for other languages (translating example code you don't understand is so much fun!)
  • Committing hundreds or thousands of man-hours into a project, only to realize that you can no longer improve on performance or a feature set because of the design decisions your language made, and if you want to progress your only hope is to rewrite everything in another language first
  • Spending time and effort learning a language only to realize the job market for it is shit (if you have a career goal as opposed to being a hobbyist)

To a novice, a summary of language features will not be much help. Even something as basic as functional vs. procedural is irrelevant if you have never programmed and don't understand well what these things mean. If you are just starting out, you have to be careful about who you listen to. There are many languages out there, and a lot of them are pretty good for solving certain sets of problems. But few are suitable for a newbie (as a newbie, you will be learning not just the language, but also general computation). The last thing you want is to get trolled by someone engaging in language wars.

If you are experienced (rule of thumb: have you created useful software, and gotten paid for it, in more than one language?) then the choice becomes much easier. Based on the languages you do know, you will probably already have a taste for certain features. Consider the languages you're already comfortable with - what do they do well? What would you change? The language you are considering learning will invariably be better in some ways and worse in others than those you already know. The question is whether the pros give you enough to justify both the cons and the extra effort of learning a new language.

Ultimately, you should aspire to learn how to program, not to learn any given language. To wit, all programming revolves around making the computer do things: The computer knows a small set of atomic, fundamental "things" which it can already do. It is your task, as the programmer, to define how these fundamental tasks should be combined with each other to perform new tasks. Depending on how the fundamental tasks of a language are chosen, and what rules govern their combination, some tasks will be rendered easy or difficult, intuitive or unintuitive.

Intrinsic vs. Extrinsic

When comparing the merits of languages, we may divide their properties into two main groups: Extrinsic properties and intrinsic properties. Intrinsic properties are properties of the language itself, such as syntax. Extrinsic properties are everything that isn't part of the language itself: The community, the tutorials available, the libraries that have been written, how the language and its users are perceived, and so on. Basically, intrinsic properties are what you would get if you were using the language in a vacuum, with nothing but a description of the syntax and a basic compiler.

Languages do not exist in a vacuum, but considering them in a vacuum may inform the bigger decision of which language is worth your time. We therefore make the distinction between intrinsic and extrinsic aspects for the following reasons:

  • Intrinsic properties are constant in time (at least until a new version of the language is released). Extrinsic properties are very much inconstant (who knows what language will be popular in five years?).
  • Intrinsic properties influence extrinsic properties, but not vice versa (the exception is when language designers listen to what the community wants for the subsequent version of the language).

Generally, your decision should be rooted in the intrinsics, since as you can see, they are the key factor and they are the ones you will be stuck with so long as you use the language. But that's not to say extrinsic factors aren't important or should be ignored (ignoring them is in fact a very big mistake).

Intrinsic properties

For most languages that are designed without a specific purpose in mind, intrinsics will be fairly similar. There will not be glaring flaws or radical improvements. When languages are designed with a specific purpose in mind, they tend to have more drastic changes. Usually, if a specialized language is designed for a problem domain, that problem domain is very hard to deal with using more generalist languages.

Syntax

Syntax determines what the instructions you write in a language will look like. You should ask yourself:

  • How easy is it to remember commands in this language?
  • Is it easy to comprehend the syntax?
  • Can I code with just a basic text editor, or is it impractical without a sophisticated utility program helping me by highlighting code or generating repetitive code segments?
  • Are there too many keywords?
  • Is it legible?

Syntactic sugar

In discussions of syntax, you will come upon this term. Syntactic sugar is basically shorthand for certain kinds of commands. Depending on what you do, syntactic sugar can make your life a lot easier by making code much more legible.

For example, in many languages assignment to a variable is done as follows:

x = 1

This means, "let x be 1" (in many languages, = is not like the mathematical equals sign, but more like the := sign).

In order to increase x by one, you would do:

x = 1+x

This means "let x be 1 plus its current value". Incrementing a number is an extremely frequent task to do in programming, and so some languages have a shorthand:

x += 1

This means "add 1 to x" (although really the compiler just interprets x += y as x = x+y in most implementations). The += is syntax sugar: In most languages you can do it, but in Matlab there is no += and you are forced to type x = x+1.

Of all the numbers you can increment by, 1 is by far the most common in actual programming tasks. Therefore, some languages go a step further, and allow:

x++

This means "increment x by 1". C allows this, but many languages don't support the ++ operator. The difference may seem trivial, but it can have a large impact on legibility and clarity in a complex block of code.
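As a quick illustration, here is how the three forms fare in Python (picked only as an example language; the variable names are made up). The first two work, while x++ is rejected by the parser, which the sketch checks by compiling the snippet rather than running it:

```python
# Plain assignment followed by the long-hand increment.
x = 1
x = 1 + x      # x is now 2

# Augmented assignment: shorthand for the line above.
x += 1         # x is now 3

# Python has no ++ operator: "x++" is a syntax error,
# so we only try to compile it instead of executing it.
try:
    compile("x++", "<example>", "exec")
    has_plus_plus = True
except SyntaxError:
    has_plus_plus = False

print(x, has_plus_plus)   # 3 False
```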

Another example: In most languages, if you need a series of numbers such as 1, 2, 3, ... 15 or 12, 15, 18, ... 30 you must call a specialized function, such as:

series = range(12, 32, 3)

Matlab has syntactic sugar for this:

series = 12:3:32

This is very helpful when you work with arrays:

cols = MyArray(7:2:end)

will get you every second element of MyArray starting from the 7th (end is Matlab sugar for referring to the last element of an array). In other languages, not only must you call range inside the brackets (making code look busier) but you often can't even do this in one line at all - being able to select multiple elements of an array is another Matlab sugar. In languages like C, you would have to get each element separately and then combine them into a new array, which may take a little more work:

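/* n = number of elements (sizeof alone gives the size in bytes) */
n = sizeof(MyArray) / sizeof(MyArray[0]);
j = 0;
/* C counts from 0, so the 7th element is at index 6 */
for(i=6; i<n; i += 2)
{
   cols[j] = MyArray[i];
   j++;
}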
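For comparison, Python (used here purely as an illustration; my_array and its contents are invented) has similar sugar in the form of range and slice notation. A rough sketch:

```python
# range(start, stop, step) stops before `stop`, so 32 itself is never produced.
series = list(range(12, 32, 3))
print(series)        # [12, 15, 18, 21, 24, 27, 30]

# A hypothetical 15-element array holding the values 1..15.
my_array = list(range(1, 16))

# Slice sugar: every second element starting from the 7th.
# Python counts from 0, so the 7th element lives at index 6.
cols = my_array[6::2]
print(cols)          # [7, 9, 11, 13, 15]
```

Leaving out the middle part of the slice plays the same role as Matlab's end keyword.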
    

Syntactic sugar is convenient, but it is not necessarily good or bad. If you happen to often do the things your language's sugar tries to facilitate, and you like the sugar, it's good. Using the above two examples, you can probably see that Matlab has very nice sugar if you shuffle around arrays a lot, but C is a bit nicer if you work with algorithms that increment things a lot (as a novice, it is impossible to decide which one is relevant to you, but after you've programmed for a while you will quickly get a feel for it).

If the sugar does not help you, you can ignore it, but don't make the mistake of thinking that ignoring bad sugar will make it go away. If your language has some awkward sugar that everyone loves using for some reason, it can make reading other people's code hell. For instance, R has a feature where assignment can be done in two equivalent ways:

x = 5
y <- 7

Presumably the arrow is supposed to avoid the confusion with equality. Since I am used from other languages to associate = with assignment, <- is unusual for me and I never use it. However, every once in a while I need to look up how to do something, and seeing a page of code full of <-s is very jarring - I often have to first search/replace them before I can even read the code. That's not to say it's a bad feature, but clearly for me the <- notation hurts more than it helps (this is also an example of intrinsics influencing extrinsics).

Paradigm (imperative vs. declarative)

Humanity has thus far discovered two ways of specifying instructions for a computer.

  • The imperative approach gives the computer a list of instructions. The computer executes the commands in the order given (although some of the commands themselves may involve altering the initial order). For example, to compute the circumference of a circle, you would tell the computer to first take the radius, then multiply it by two to get the diameter, then multiply that by a pre-computed approximate value of pi, then print it to the screen. There are dozens if not hundreds of imperative languages like C, C++, Java and many others.
  • The declarative approach simply defines what each thing means, and lets the computer put the definitions together. It can be difficult to make the calculations happen in a certain order, but because no order is specified, parallelizing tasks to improve performance can be very easy. For the circumference, you would give the computer the formula for circumference, the formula for approximating pi (for example 22/7), and ask it for circumference given a radius. Note how I don't say "print to the screen" - communicating with the user can also be a problem with declarative languages. In practice, the main representatives of this class are functional languages like Haskell or Lisp.

In theory, these two styles are exactly equivalent in terms of what programs it is possible to write in them. But in practice, they are very different, and can have a huge impact on how easy a given task is.
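To make the contrast concrete, here is a small sketch of the circumference example in Python. Python is an imperative language, so the second half only imitates the declarative style; the function names are invented for illustration:

```python
import math

# Imperative style: a sequence of steps that update state in order.
def circumference_imperative(radius):
    result = radius              # first take the radius
    result = result * 2          # multiply by two to get the diameter
    result = result * math.pi    # multiply by a pre-computed value of pi
    return result

# Declarative-leaning style: say what each thing *is*
# and let the definitions compose.
def diameter(radius):
    return 2 * radius

def circumference(radius):
    return math.pi * diameter(radius)

print(circumference_imperative(1.5))
print(circumference(1.5))
```

Both versions compute the same value. The difference is whether you describe the steps or the relationships.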

Historically, imperative languages have dominated, and have been regarded as easier to understand. Perhaps the idea of giving someone explicit step-by-step instructions is more intuitive to the human mind than systems of functions, but regardless, most people find imperative languages easier, and interest in functional languages often correlates with mathematical aptitude. If you don't know what you're doing, you should probably start with an imperative language.

On a deeper level, there is an interesting note about having two paradigms together: There are ways of essentially calling modules written in a functional style from an imperative program, and vice versa. It is probably better to nest declarative logic inside imperative logic: The two advantages of imperative logic are being able to enforce a sequence and easy interaction with the user. Both of these are most powerful when done at the top level of the program. The advantages of functional logic involve flexibility in solving a well defined problem - so they are well suited to acting as a specialized module.

In many modern imperative languages, it is actually possible to write almost declarative, functional-style logic (in particular, look for "lambda expressions"). Because the language is not designed with purely functional programming in mind, this may be impractical (difficult to debug, clumsy to type out, slow and unoptimized) but in some small but important subset of problems you can actually mix the two paradigms together.
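As a rough illustration of mixing the two, the Python sketch below computes the same list twice: once with an explicit loop, and once with lambda expressions passed to map and filter. The data and names are hypothetical:

```python
# Hypothetical data: speeds (km/h) reported by a few vehicles.
speeds = [42, 0, 87, 0, 63]

# Imperative version: explicit loop and a mutable accumulator.
moving_imperative = []
for s in speeds:
    if s > 0:
        moving_imperative.append(s * 1000 / 3600)   # convert to m/s

# Functional-style version: describe the result with lambdas, no explicit loop.
moving_functional = list(map(lambda s: s * 1000 / 3600,
                             filter(lambda s: s > 0, speeds)))

print(moving_imperative == moving_functional)   # True
```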

Object oriented programming

Since the 90s, the concept of OOP has been very popular and you will certainly hear mention of it. OOP is basically a way of organizing code in types of variables. Dominant earlier languages like C often specified a small number of basic types, like integer, decimal, string, memory address and array (list of one of the previous types). Code was organized into routines, and each routine could call other routines (there is often a single top-level routine called main() which is what gets called when you run your program).

Since the basic types provided don't cover all that much, you represent complex data (for instance an email which has a sender, recipient, content, attachments, encoding, subject line, cc line, timestamp and so on) by putting each part in a separate variable. To help with this, there is something called a struct, which is basically a bag of variables. A struct can contain other structs (and arrays of structs), so by nesting them together you can create a lot of complicated, hierarchical variables.
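A minimal sketch of the nesting idea, using Python dataclasses as a stand-in for structs (Python has no struct keyword, and the field names here are assumptions chosen for the email example):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Attachment:
    filename: str
    size_bytes: int

@dataclass
class Email:
    sender: str
    recipient: str
    subject: str
    content: str
    # A "struct" holding a list of other "structs".
    attachments: List[Attachment] = field(default_factory=list)

mail = Email("alice@example.com", "bob@example.com", "Hello", "See attached.")
mail.attachments.append(Attachment("notes.txt", 2048))
print(mail.attachments[0].filename)   # notes.txt
```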

Object oriented programming takes this one step further. Let's use a classic example of a program that tracks a fleet of cars. Suppose we have decided that we must store the license no, color, speed and current destination of each car. We could have a construct like so:

struct Car {
	string license_no;
	int color;
	int speed;
	Location destination;
}

We could then perform operations on aspects of a given car:

taxi = new Car() 
taxi.license_no = "GAY4PAY"
taxi.color = hFFCC00
taxi.speed = get_speed_from_gps_tracker(taxi.license_no)
taxi.destination = "Mr. Smith's house"

truck = new Car() 
truck.license_no = "45A1CF3"
truck.color = h888888
truck.speed = get_speed_from_gps_tracker(truck.license_no)
truck.destination = taxi.destination

In this case we have defined a yellow taxi and a grey truck, both of them heading to the same place, and automatically obtained the speed from a function that talks to the GPS tracker on each car (the GPS identifies the cars by their plate numbers).

Now let's say one of the cars is coming back to the office, and we want to record this. We can just do truck.destination = "Office". But suppose we will type this line many times - what if somewhere we make a typo, or do something like truck.destination = "Main Office", and a different function that checks which car is at the office (by looking for the exact string "Office" and not "office" or "Main Office") gets broken? What if the name of the office changes? To avoid this, we decide to make a single function for recalling a car to the office (this is called the Don't Repeat Yourself principle). We put this at the bottom of our program, and make a few other improvements, so maybe the whole file looks like this:

import gps_library

main() {
	taxi = new Car() 
	taxi.license_no = "GAY4PAY"
	taxi.color = hFFCC00
	taxi.speed = get_speed_from_gps_tracker(taxi.license_no)
	set_location(taxi, "Mr. Smith's house")

	truck = new Car() 
	truck.license_no = "45A1CF3"
	truck.color = h888888
	truck.speed = get_speed_from_gps_tracker(truck.license_no)
	truck.destination = taxi.destination

	// Some code here

	// Now recall both cars to office
	recall_to_office(taxi)
	recall_to_office(truck)
}

struct Car {
	string license_no
	int color
	int speed
	Location destination
}

struct Location {
	string name
	float lat
	float lon
}

recall_to_office(Car c) {
	set_location(c, "Office")
}

set_location(Car c, string target) {
	loc = new Location()

	loc.name = target
	loc.lat = get_lat_from_gps(target)
	loc.lon = get_lon_from_gps(target)
	
	c.destination = loc
}

Code organization and reuse

You can see how the file is already getting cluttered. In a real project, especially if you have tools generating some code for you, you can easily end up with hundreds of such subroutines strewn across files. You might notice that in this case, set_location really only pertains to the behavior of Cars - so perhaps they should go together? This is what OOP does. Instead of a struct, you have something called a class, which defines a kind of object (object is an OOP term similar to variable, but usually only simple data types like numbers are called variables, although in most modern OOP languages everything is considered an object). If the struct is a bag of variables, then the class is a bag of variables and functions ("methods" in OOP jargon).

Now we can refactor the above code into:

import gps_library
 
main() {
	taxi = new Car() 
	taxi.license_no = "GAY4PAY"
	taxi.color = hFFCC00
	taxi.speed = get_speed_from_gps_tracker(taxi.license_no)
	taxi.set_destination ("Mr. Smith's house")

	truck = new Car() 
	truck.license_no = "45A1CF3"
	truck.color = h888888
	truck.speed = get_speed_from_gps_tracker(truck.license_no)
	truck.destination = taxi.destination

	// Some code here

	// Now recall both cars to office
	taxi.recall_to_office()
	truck.recall_to_office()
}

class Car {
	string license_no
	int color
	int speed
	Location destination
	 
	set_destination(string target) {
		loc = new Location()

		loc.name = target
		loc.get_coords_from_gps()

		destination = loc
	}

	recall_to_office() {
		set_destination("Office")
	}
}

class Location {
	string name
	float lat
	float lon

	get_coords_from_gps() {
		lat = get_lat_from_gps(name)
		lon = get_lon_from_gps(name)
	}
}
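The pseudocode above is not tied to any real language. For readers who want something they can run, here is one possible translation into Python. The FAKE_GPS table stands in for the gps_library calls, and the coordinates and the taxi's plate number are invented:

```python
# Stand-in for the gps_library from the pseudocode; the coordinates are invented.
FAKE_GPS = {
    "Office": (52.0, 4.0),
    "Mr. Smith's house": (52.1, 4.2),
}

class Location:
    def __init__(self, name):
        self.name = name
        self.lat, self.lon = FAKE_GPS[name]   # look up coordinates by name

class Car:
    def __init__(self, license_no, color):
        self.license_no = license_no
        self.color = color
        self.destination = None

    def set_destination(self, target):
        # The only place that turns a place name into a Location.
        self.destination = Location(target)

    def recall_to_office(self):
        # The string "Office" appears exactly once in the program.
        self.set_destination("Office")

taxi = Car("TAXI001", 0xFFCC00)
taxi.set_destination("Mr. Smith's house")
truck = Car("45A1CF3", 0x888888)
truck.destination = taxi.destination

taxi.recall_to_office()
print(taxi.destination.name)    # Office
print(truck.destination.name)   # Mr. Smith's house
```

Recalling the taxi does not move the truck: the truck keeps the Location object it was handed earlier.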

At this point, you either think that organizing methods into classes makes things better, or you think it's completely insane. Ultimately, OOP is a matter of personal preference - and if the latter, then maybe it isn't for you.

You might think that the slightly rearranged code is a bit tidier, but not that much better. In fact, organizing code is not the main benefit of OOP. The main benefit is code reuse. In the above example, you could put the Car and Location classes into a separate file, and distribute it to other people who want to write software that operates on cars. They won't need to modify anything (assuming the current functionality is sufficient for them) and can simply plug these classes into their own program, avoiding duplication of work you have already done.

On the other hand, if one day you find a better implementation of car logic, you can simply swap the two files for each other to try the new one. You won't need to rewrite your whole program to try the new car code you found. Of course, the implementation of the class you found will have to match very closely, or you will have to rewrite things anyway, but modern OOP languages have mechanisms to help people write compatible code, and there are very powerful IDEs like Eclipse, Netbeans and Visual Studio that do a lot of the class-stitching work for you.
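One such mechanism is the interface: a description of the methods a class must offer, so that any implementation matching it can be swapped in. The sketch below uses Python's typing.Protocol to play that role. SpeedSource and both implementations are hypothetical names, not part of the car example above:

```python
from typing import Protocol

class SpeedSource(Protocol):
    # The "shape" any compatible implementation has to match.
    def speed_of(self, license_no: str) -> int: ...

class FixedSpeedSource:
    """Original implementation: always reports the same speed."""
    def speed_of(self, license_no: str) -> int:
        return 50

class TableSpeedSource:
    """Replacement implementation: looks the speed up in a table."""
    def __init__(self, table):
        self.table = table

    def speed_of(self, license_no: str) -> int:
        return self.table.get(license_no, 0)

def report(source: SpeedSource, license_no: str) -> str:
    # The calling code depends only on the method, not on the concrete class.
    return f"{license_no}: {source.speed_of(license_no)} km/h"

print(report(FixedSpeedSource(), "45A1CF3"))                  # 45A1CF3: 50 km/h
print(report(TableSpeedSource({"45A1CF3": 72}), "45A1CF3"))   # 45A1CF3: 72 km/h
```

The report function never changes, no matter which implementation is passed in.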

Inheritance

Besides structs-with-methods, the other major feature of OOP is inheritance. It's a way of defining new classes by saying "this class is just like class X, but has this extra feature".

tour_bus = new Bus()
tour_bus.passengers = 20
tour_bus.set_destination("Beach")

// ...

class Bus : Car {
	int passengers
}

Conveniently, if you later change the logic of class X, the change will propagate to all classes inheriting from X. This helps avoid repetition, and allows you to easily extend classes created by other people or classes from your own past projects.
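Here is a small runnable sketch of both points in Python, with a stripped-down Car class assumed for the purpose. Adding a method to Car after the fact is done by assignment only to keep the example in one file; normally you would simply edit the class:

```python
class Car:
    def __init__(self):
        self.destination = None

    def set_destination(self, target):
        self.destination = target

class Bus(Car):
    # Just like Car, plus a passenger count.
    def __init__(self):
        super().__init__()
        self.passengers = 0

tour_bus = Bus()
tour_bus.passengers = 20
tour_bus.set_destination("Beach")   # inherited from Car

# Changing Car later changes Bus too: add a method to the parent class...
def describe(self):
    return f"heading to {self.destination}"

Car.describe = describe

# ...and the existing Bus object picks it up.
print(tour_bus.describe())   # heading to Beach
```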

Pedantry

Extrinsic properties

Summary

Individual languages

Assembly

  • Terse, but pedantic as fuck.
    • Small programs are simple to write, but larger ones become an unwieldy and complex mess in most cases.
  • Based Motorola ASM, so many variants, so much serial port downloading.
  • x86 ASM is pretty neat too, also known as PC ASM. Pain in the ass because AT&T ASM syntax is the UNIX standard, and Intel ASM syntax is the DOS standard, and they are so close but it's the little things that are different. Different enough to be a pain in the dick.
  • Not portable. This is as close to the metal as you get without writing actual machine op codes. Each instruction IS a 1:1 mapping to a machine opcode. Each CPU architecture has a different set of instructions.
  • Currently, Intel x86-64 has one of the largest instruction sets in common use.

BASIC

  • >It can do anything C can do, guise!!1
  • Lots of proprietary implementations, only a few decent FOSS ones.
  • Still slower than C
  • >muh goto

C

  • Designed to be a "portable assembly language"
  • Small, simple language whose parts compose well
  • Very small standard library
  • Lingua franca for big libraries and APIs
    • If it has no C API, it's probably shit
  • Very mature compilers that produce extremely fast, well-optimized code
  • Implementations for pretty much every architecture or platform imaginable
    • If some hardware doesn't have a C compiler for it, chances are it doesn't matter in the first place
  • Lots of things are left undefined so that implementations can make the best choice possible in terms of speed
  • Potentially dangerous to write for the uninitiated
    • You manage your own resources yourself
    • You perform your own safety checks if you want them
    • Absolutely no hand-holding
    • Undefined behavior where anything can happen
  • Will force you to learn a lot about memory and lower level issues
  • Pretty much the only sane choice for systems and embedded programming
  • Can also be used for applications programming

C++

  • Very, very large and feature-filled language
  • Considered verbose at times
  • C, but with OOP on top, and massive set of massive libraries
  • Considered dangerous to write in because, like C, there is no automatic memory management
  • Almost as fast as C
  • Not very orthogonal and features frequently clash with each other
  • Despite being called C/C++ frequently, good C++ is completely different from good C

C#

  • What Java should have been
  • Runs on .NET or the Mono framework (Mono framework allowing you to run it on GNU/Linux)
  • Is very similar to Java, with some extra stuff borrowed from C++ and Haskell
  • Dubious legal situation because parts of the language are encumbered with MS patents.

Erlang

  • Makes concurrency/multithread shit a breeze.
  • Uses a specialised VM that has a hard time crunching numbers, and an even harder time handling strings.

Golang

  • Also known as Go
  • Created by Rob Pike (one of the original UNIX guys) and some other engineers at Google
  • Mascot looks suspiciously similar to the Plan9 mascot
  • Is basically C with minimal extras, but with garbage collection, and some core language features to make it really good for concurrent programming (doing multiple things at once). Not really as fast as C, though.
  • Directory structure must be laid out in a certain way to build projects
  • Has an interactive tutorial on its website, and a go tool which lets you fetch packages from GitHub, build them, etc.
  • Uses goroutines for concurrency, which are like lightweight threads that get multiplexed onto a smaller number of OS threads for efficiency. The Go runtime handles the threads for you.

Haskell

  • Extremely expressive, offers abstraction capabilities near Lisp
  • Focuses on pure functional programming and type systems
  • Very rigid language, if you do something wrong, it probably won't even compile
  • Takes a lot of time to learn completely
  • Can be unwieldy for inherently stateful problems

Java

  • Very portable; compiles down to bytecode, which is then executed by the JVM
  • Object oriented language
  • Very large and enterprise
  • Huge libraries and a lot of software is written in it
  • Very verbose APIs
  • Receives a lot of undue criticism
  • Can be convoluted to write in sometimes
  • Is made fun of for the design patterns people use with it, and for the verbose naming schemes often used

Lisp

Main article: Lisp.

  • Family of programming languages which have the most notable features of using fully parenthesized prefix notation and being homoiconic.
  • Initially appeared in 1958.
  • Lisp is said to change the way one thinks about programming, because it exposes the abstract syntax tree to the programmer for modification and extension of the language itself to fit the domain-specific needs of a particular task.
  • No (visible) limit to abstraction

Pascal

  • Very strong, safe, and featured type system
  • Simple syntax that prefers words over symbols, and is highly-structured
  • Originally designed for teaching, and very easy to get started with
  • Fast, single-pass compilers
  • Covers both low-level and relatively high-level concepts well
  • Not very popular anymore, you won't find a job using it, and newer learning resources for it are lacking
  • The syntax is considered too verbose by some
  • Bias carried over from problems with earlier versions of the language
  • Large number of varied modern dialects and compilers may confuse newcomers

Perl

  • Very terse and unreadable syntax
  • Called a "swiss army chainsaw" for its versatility
  • Slow for non-procedural tasks
  • Dynamic grammar makes it fun to write impossible code
  • Hated by Python fanboys worldwide
  • Can be OO, imperative, and even has functional elements.
  • Avoids the use of reserved keywords, prefers keyboard squiggles (&, $, @, ->, etc.)

PHP

No.

Go is a sane alternative to PHP for webapps, and seems almost the only sane one. Heck, even Python is far more sane than PHP. Don't use Node.js, it's considered harmful.

Python

  • Very easy to read and simple (and fun) to write
  • Kinda slow
  • Uses whitespace indentation to separate logical blocks
  • Excellent for scripting
  • Considered the antithesis of Perl
  • Can be OO, imperative, and even has functional elements.
  • Is nice to start out with and program in, but after a few years of it, you'll start to want to play with some of the stuff Python sacrifices, like pointers, and speed.

Ruby

  • Focus on programmer happiness
  • Fully object-oriented
  • Elegant, readable code
  • As slow as any dynamic language will be
  • Excellent for general-purpose programming, scripting, text processing and web dev

Rust

  • Developed by Mozilla
  • Also known as rust-lang
  • Like Golang, is also designed for concurrent programming
  • First stable release (1.0) in May 2015

Scheme

  • Based on Lisp with a focus on minimalism and simplicity
  • Popular in many universities and featured in SICP
  • Great for programs with a basis in recursion and iteration
  • Lacks portability and has few implementations

Vala

  • The GNOME foundation's response to C++
  • Compiles to C code, which can then be compiled with a normal C compiler
  • Uses the GTK and GObject (GNOME) libraries
  • Has elements of C++ and C#, but is more sane