
What is a scalable computer language?
Simply put, a scalable computer language is a language that one can write very large programs in (and extend very large programs that have already been written) without feeling an undue amount of pain. In a scalable programming language, the difficulty of managing the complexity of the program goes up roughly linearly with the size of the program. Conversely, a nonscalable computer language is one in which increasing the size and scope of a problem tends to make the program much harder to manage (i.e. the complexity of the program goes up much more than linearly with its size).
Aspects of a programming language that affect scalability
Garbage collection: very, very good
Garbage collection (GC) means that the computer language (more precisely, the language's runtime system) automatically manages the reclamation ("freeing") of memory that is no longer being used. This is a huge win for the programmer, and can dramatically increase the scalability of the language. The reason is simple. It's usually obvious where one has to allocate memory. It can be very difficult to know exactly when it's safe to free it. This is especially true when references to the allocated memory are passed around, returned from functions, stored in multiple data structures that are later deleted (or not), and so on. The effects of this are very insidious. Languages without GC implicitly discourage the programmer from using all but the simplest data structures, because more often than not, the memory management problem quickly becomes intractable when using more complex data structures. What usually happens in practice is that the programmer rolls his/her own (bad) garbage collector (maybe a reference counter), with performance that is usually worse than what you would get from a language with GC built in.
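To make that last point concrete, here is a rough sketch of the kind of hand-rolled reference counting I have in mind (the names rc_alloc, rc_retain and rc_release are invented for this example; they're not any particular library's API):

    #include <stdlib.h>

    /* A hand-rolled reference counter: the ad hoc "garbage collector"
     * that programs without GC often end up reinventing.  Any counted
     * object embeds rc_header as its first field.  Every new reference
     * must be balanced by exactly one rc_release(); miss one and the
     * object leaks, add one too many and it is freed while still in use. */
    typedef struct {
        int refcount;
        void (*destroy)(void *self);   /* cleanup hook, run before free() */
    } rc_header;

    void *rc_alloc(size_t size, void (*destroy)(void *))
    {
        rc_header *obj = malloc(size);      /* size includes the header */
        if (obj == NULL)
            return NULL;
        obj->refcount = 1;
        obj->destroy = destroy;
        return obj;
    }

    void rc_retain(void *p)
    {
        ((rc_header *)p)->refcount++;
    }

    void rc_release(void *p)
    {
        rc_header *obj = p;
        if (--obj->refcount == 0) {
            if (obj->destroy != NULL)
                obj->destroy(obj);
            free(obj);
        }
    }

Every caller has to balance retains and releases by hand, and cyclic structures are never reclaimed at all; this is exactly the bookkeeping a real garbage collector does for you, usually faster.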
I saw a rather stark example of this recently. One of the things I do professionally is teach the C programming language to Caltech undergraduates. I emphasized how important it was to always free memory that had been allocated. However, many of my students simply ignored me, and their code was littered with memory leaks. I got so tired of writing "this code has a memory leak here" on their assignments that I wrote a very simple memory leak checker. They are now required to write code that runs through the memory leak checker with no reported leaks before they submit their assignments. However, I was somewhat dismayed to find that my own answers to the assignments had a couple of subtle memory leaks as well! Since I have more than ten years of C programming experience, and have worked on several very large projects, this suggests to me that manual memory management is much harder than I'd previously supposed.
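To give a flavor of how simple such a checker can be, here is a sketch along the same general lines (this is not the actual checker I use in the course; the names leak_malloc, leak_free and leak_report are made up for the example):

    #include <stdio.h>
    #include <stdlib.h>

    /* A (very) simple leak checker: record every allocation in a table,
     * cross it off when it is freed, and report whatever is left over
     * at the end of the run. */
    #define MAX_ALLOCS 10000

    static struct { void *ptr; size_t size; int line; } allocs[MAX_ALLOCS];
    static int nallocs = 0;

    void *leak_malloc(size_t size, int line)
    {
        void *p = malloc(size);
        if (p != NULL && nallocs < MAX_ALLOCS) {
            allocs[nallocs].ptr = p;
            allocs[nallocs].size = size;
            allocs[nallocs].line = line;
            nallocs++;
        }
        return p;
    }

    void leak_free(void *p)
    {
        int i;
        for (i = 0; i < nallocs; i++) {
            if (allocs[i].ptr == p) {
                allocs[i] = allocs[--nallocs];   /* drop the record */
                break;
            }
        }
        free(p);
    }

    void leak_report(void)                       /* call at program exit */
    {
        int i;
        for (i = 0; i < nallocs; i++)
            fprintf(stderr, "leak: %lu bytes allocated at line %d never freed\n",
                    (unsigned long)allocs[i].size, allocs[i].line);
    }

    /* In the code being checked (via a shared header), redirect the
     * standard calls to the wrappers: */
    #define malloc(size) leak_malloc((size), __LINE__)
    #define free(ptr)    leak_free(ptr)

Registering leak_report with atexit() makes the report automatic; tools such as valgrind do the same job far more thoroughly, but the principle is the same.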
There is a cost to GC, both in time and space efficiency. Well-designed garbage collectors (especially generational GC) can be extremely efficient (more efficient, for instance, than naive approaches such as reference counting). However, to achieve this, garbage-collected programs tend to use significantly more space than equivalent programs without GC (I've heard estimates on the order of 50% more total space). On the other hand, a program that leaks memory has the greatest space usage of all. I've wasted way too much of my life hunting down memory leaks in large C programs, and I have no interest in continuing to do so.
In conclusion, I would say that of all the items I'm discussing here, GC is the single most important one to ensure that a programming language is scalable. This is why programmers who move from a language without GC (say C++) to one of roughly equivalent abstractive power but with GC (say Java) invariably say how much happier they are now that they don't have to worry about memory management and can concentrate on the algorithms they're trying to write. Personally, I'd rather pull my own teeth out than write a large project in a language without GC.
Direct access to memory and pointer arithmetic: very bad
Some computer languages, notably C and C++, allow the programmer to directly interact with memory addresses through pointers, as well as to perform pointer arithmetic (incrementing and decrementing pointer variables). This kind of low-level programming is sometimes necessary (e.g. when writing device drivers) and sometimes simply useful (e.g. when micro-optimizing code that has to run as fast as possible). However, my (imperfect) understanding of the issue is that programming with pointers makes precise garbage collection impossible, or nearly so. There is a kind of GC called "conservative" garbage collection (the Boehm-Demers-Weiser collector is a freely available implementation) which can work with languages like C and C++. It's certainly better than nothing, but there are no guarantees that all memory will be managed correctly (i.e. memory leaks are unlikely, but possible). In practice, this supposedly isn't a problem, but I note with some interest that very little of the C/C++ code I've seen actually uses conservative GC, and the code that does is usually implementing a language that itself provides GC.
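For what it's worth, using the Boehm-Demers-Weiser collector from C looks something like this (a minimal sketch, assuming the library is installed and the program is linked with -lgc):

    #include <stdio.h>
    #include <gc.h>               /* Boehm-Demers-Weiser conservative GC */

    /* Build a series of linked lists with GC_MALLOC and never call
     * free(); each time 'head' is reset, the previous list becomes
     * unreachable and the collector reclaims it.
     * Compile with something like:  cc gcdemo.c -lgc */
    typedef struct node {
        int value;
        struct node *next;
    } node;

    int main(void)
    {
        node *head = NULL;
        int i, j;

        GC_INIT();                         /* start up the collector */

        for (i = 0; i < 100; i++) {
            head = NULL;                   /* old list (if any) is now garbage */
            for (j = 0; j < 100000; j++) {
                node *n = GC_MALLOC(sizeof(node));
                n->value = j;
                n->next = head;
                head = n;
            }
        }

        printf("last value: %d\n", head->value);
        return 0;
    }

"Conservative" here means that the collector treats anything on the stack, in registers or in the data segments that looks like a pointer as if it were one; an integer that happens to resemble a valid address can therefore keep dead memory alive, which is why leaks are unlikely but not impossible.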
The scalability cost of pointers goes far beyond just making GC harder. Pointers (and especially pointer arithmetic) tend to destroy any safety guarantees you might want to be able to make about a program. It's not hard to see why. When you have a pointer to (say) an integer, and you can add 1,000,000 to that pointer and dereference some random region of memory which may or may not be part of your program's run-time image, all hell can break loose. If you're lucky, you'll just get a core dump and your program will terminate. If you're not so lucky, some part of the program memory will be corrupted, leading to mysterious bugs that are extremely difficult to track down, because they manifest themselves far away from where the original problem was. This leads to a huge increase in debugging times, which dramatically hurts programmer productivity. As the program gets larger, the opportunities for this kind of problem increase, which represents a significant barrier to scalability.
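To see how little stands in your way, consider a fragment like this (deliberately broken, of course):

    #include <stdlib.h>

    /* This compiles without a murmur, but the store lands a million
     * ints past the end of a ten-int block.  The behavior is undefined:
     * maybe an immediate crash, maybe silent corruption of some
     * unrelated data structure that only shows up much later. */
    int main(void)
    {
        int *p = malloc(10 * sizeof(int));
        if (p == NULL)
            return 1;

        *(p + 1000000) = 42;    /* nothing checks this */

        free(p);
        return 0;
    }

Whether this crashes immediately, corrupts something quietly, or appears to work depends entirely on what happens to be at that address.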
The usual argument in favor of direct pointer manipulations is that they make it possible to write faster code. This is often true; I've seen cases where using pointer arithmetic judiciously increased the speed of a program by a factor of five. However, the reverse is also often true; many program optimizations (normally performed automatically by the compiler) are rendered much more difficult or impossible in code that uses pointers. In other words, languages that enable micro-optimizations often make macro-optimizations impossible.
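A typical example of the second effect is pointer aliasing: when a function receives two plain pointers, the compiler generally has to assume they might refer to overlapping memory, which makes transformations such as vectorizing the loop below much harder. Here is a small sketch (the second version shows the restrict qualifier that C99 added for exactly this problem):

    /* The compiler must assume dst and src might overlap, so it is
     * severely constrained in how it can reorder or vectorize the loop. */
    void scale(float *dst, const float *src, int n, float factor)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = src[i] * factor;
    }

    /* With 'restrict' the programmer promises that the pointers don't
     * alias, so the optimizer has a free hand -- but the promise is
     * completely unchecked; break it and the behavior is undefined. */
    void scale2(float *restrict dst, const float *restrict src,
                int n, float factor)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = src[i] * factor;
    }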
The author of the Eiffel language, Bertrand Meyer, has said (I'm paraphrasing) that you can have pointer arithmetic or you can have correct programs, but you can't have both. I agree with him. I think that direct memory access through pointers is the single biggest barrier to programming language scalability.
None of this is meant to imply that pointers and pointer arithmetic don't have their place; they are crucial for low-level, close-to-the-metal programming. However, large programs written at a low level are simply not scalable. The right way to use languages like C is to implement small, focused low-level components of applications written primarily in higher-level languages. In fact, this is also the right way to use C++; you write some low-level classes that use pointers in some of their methods, and then encapsulate those methods so that the pointer manipulations are never exposed to a user of the class. This would be nearly ideal, except that the compiler has no way to enforce that discipline; you can always use raw pointers if you want to.
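In C, the closest analogue of that C++ discipline is the opaque type: a header that exposes no raw pointers to the client, with all the pointer manipulation confined to a single implementation file. A sketch (the intvec type and its functions are invented for the example):

    /* intvec.h -- the client never sees a raw pointer to the elements. */
    #include <stddef.h>

    typedef struct intvec intvec;          /* opaque: the layout is hidden */

    intvec *intvec_create(size_t n);
    int     intvec_get(const intvec *v, size_t i);
    void    intvec_set(intvec *v, size_t i, int value);
    void    intvec_destroy(intvec *v);

    /* intvec.c -- all the pointer manipulation lives here and only here. */
    #include <stdlib.h>
    #include "intvec.h"

    struct intvec {
        int    *data;
        size_t  n;
    };

    intvec *intvec_create(size_t n)
    {
        intvec *v = malloc(sizeof(intvec));
        if (v == NULL)
            return NULL;
        v->data = calloc(n, sizeof(int));
        if (v->data == NULL) {
            free(v);
            return NULL;
        }
        v->n = n;
        return v;
    }

    int intvec_get(const intvec *v, size_t i)
    {
        return *(v->data + i);             /* pointer arithmetic, hidden here */
    }

    void intvec_set(intvec *v, size_t i, int value)
    {
        *(v->data + i) = value;
    }

    void intvec_destroy(intvec *v)
    {
        free(v->data);
        free(v);
    }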