Sorting: Divide-and-Conquer, and Other Things

Sorting is a very common task in everyday life: articles of clothing in one’s drawer; collections of magazines; food stuff in the fridge; and lots of other things. Depending on the number of items there are, and how we use them, we have devised all different ways to sort: according to color, volume, date of acquisition or purchase, the like.

Similar to how lists have been adopted into computing, sorting has come to great use in computing. Today, sorting has become on of the most common computing tasks. Records in databases are usually sorted for better search efficiency. Items in binary trees are sorted accordingly so that operations can take advantage of this property.

Consequently, there has been a substantial amount of study spent on this subject and various algorithms has been devised. Some of these are obvious adaptations of natural methods of sorting utilized by humans in everyday life. While some are totally different, created for the purpose of sorting items which number in thousands or even millions. It should be noted, though, that there are still problems related to sorting that remain unsolved. New algorithms are constantly being developed and old ones improved.

Many useful sorting algorithms devised thus far take advantage of the divide-and-conquer approach to solving problems. Depending on the algorithm, a different way of division is used. For example, mergesort divides a collection in half. Quicksort divides a collection into big and small values while radix sort divides the problem by working on one digit of the key at a time.

There are lots of other useful sorting algorithms that have been created. Some of them work even together for better efficiency and performance. For instance, shellsort and quicksort both take advantage of the best case behavior of insertion sort to gain processing speed.

The most direct approach to comparing two sorting algorithms would be to program both and measure their running time. However, this can be misleading and might put a bias towards one of two depending on the input values especially when using input values favorable to this particular algorithm. Each algorithms has its own strength and one works better than the other depending on the data that needs processing. One is more efficient than the other depending on the application.

Any considerable discussion of any one of these sorting algorithms will require an article of their own. Which is why, the next few articles will each be dedicated to some of these sorting algorithms. We start with insertion sort. This takes Θ(n2) running time but it is easy to understand and implement; bubble sort and selection sort also take such a long run time. Nevertheless, they are occasionally the best tools for use depending on the situation.


From Clifford A. Shaffer’s A Practical Introduction to Data Structures and Algorithm Analysis Edition 3.1 (Java Version)

Arrays vs Links: Which Makes for a Better List?

Lists are some of the most common programming problems a coder could encounter throughout an entire computing career. It’s so familiar and occurs naturally in the real world so often. Thus, its usefulness should never be understated.

People intuitively understand what a “list” actually means. For this to be implemented in a computing environment, it must be precisely defined. A list can be defined as a “finite, ordered sequence of data items known as elements”.1

There are only a few things that requires specification in lists. Some of them being: a list has elements, these elements have a certain position associated with them, a list has a notion of a length – which tells how many elements are currently present in the list, and the operations that can be done with the list – creating, inserting elements, removing elements, among others.

There are more ways to implementing lists than one. The most traditional being the array-based lists and the linked lists. The ease or complexity of implementing either of these two does not vary much. Complexity only increases depending on the requirements of the program.

However, like in every other thing where there is a choice, there are advantages and disadvantages to both of them. Here are some points which you might want to consider when choosing between arrays and links for implementation of a list:

One disadvantage of array-based lists is that their size must be predetermined before the array can be allocated.2 More to that, array-based lists cannot grow beyond this predetermined size. This predetermination of size can also be wasteful in resources especially when there are only a few slots in the array which gets filled. On the other hand, linked lists can grow and shrink depending on the data being put into the list. Hence, the list becomes more flexible and there will be less chance in resources being wasted.

However, a different space usage needs to be considered when analyzing the individual elements in the list. In this case, the converse happens compared to the case above since linked lists requires usage of extra space for pointers that give the notion of order to the elements in the list while data elements in an array have a notion of position in themselves. Thus, in cases where it is more likely that the list will be filled, array-based lists have an advantage since it doesn’t have the overhead required by the pointers in linked lists.

Generally, linked lists would be the better implementation for lists whose number of data elements varies widely or is unknown while array-based lists become more efficient when the user knows in advance about how large the list can grow.

Another important consideration for choosing how to implement a list structure is data access. In this case, array-based lists are faster for random access by position since each element has a notion of position in itself and traversal from a current element to a previous element can easily be done given the attached positions of the elements themselves. On the other hand, singly linked lists have a disadvantage because it will usually require marching down the entire list just to get from a current element to a previous element since there is no link from a current to a previous.3

Inserting and removing elements are vital operations to lists. These operations, if not handled properly, can threaten the order in a list no matter what kind of implementation is utilized.

For linked lists, given a pointer to a certain location, insertion and removing elements only takes little work. For insertion, the program need only create a new element, change the link of a previous element, then link it to the next element. Removal only takes breaking the link of a previous element from the element to be removed then relinking it to the element which the element that needs be removed links to. Garbage collection will attend to the removed data. In algorithm analysis, this is said to only take Θ(1) time.

For array-based lists, these operations take a bit much more since they will require the list to shift the remainder of the list up or down within the array in order to accommodate the new data or fill up the space vacated by the element which has been removed. This takes Θ(n) time.

A lot of applications make use of these operation a lot. More so than any other operations. Hence, the time they take to complete usually dominates the program. Which is why, in such cases, linked lists are often preferred to array-based lists.

Of course, it should be noted that there are workarounds to the aforementioned disadvantages. But even those workarounds have their own pros and cons. Hence, the decision as to which implementation to pick should still largely depend on the requirements of the overall program that is being developed.

“Ordered” in this definition means that each element has a position in the list. It does not necessarily mean that the list need be sorted.
2 A work-around to this problem is to make use of dynamic arrays but that requires, like Java’s Vector, but that’s an altogether different consideration.
3 This can, of course, be avoided by using doubly linked lists which has links to both a previous and a next element but that also comes with its own disadvantages. For example, a bigger space overhead.

From Clifford A. Shaffer’s A Practical Introduction to Data Structures and Algorithm Analysis Edition 3.1 (Java Version)

Three Steps: Beating the Problem

Writing computer programs can be as easy as 1-2-3 or as difficult as seeing the usefulness of understanding Einstein’s theory of relativity.

But any difficult problem always becomes easier when broken down into pieces. That way, you get to be able to see them as smaller problems and coding becomes easier from there.

General computer processes can usually be tackled with three steps: understand, specify and design. These three steps make higher programs become less.

Understand

First, it is important to understand the problem you are faced with. Your initial understanding need not be clear. However vague it may be doesn’t matter. The more important part is refining your understanding as you go along the entire coding process. Taking an inventory of the concepts in your problem helps a lot.

For example, if you are trying to build a chess program, you’ll want to begin with the notion of a “piece”, which could be the pawn, the rook, the knight, the bishop, the queen, or the king. Furthermore, note that each of these pieces have properties of “position” and “moves”.

Specify

Second, lay down how this problem can be converted to code.

In one part of the chess program, you can tackle player moves. This function can take a certain piece, its position on the board, and the new position to which it is being moved. The process of this program verifies whether the piece can be moved into the new position and outputs the result of the verification process – that is, moving the piece to the new position.

There are a few things to note when deciding a valid move:

  1. The type of the piece being moved. In chess, different pieces can only move in a certain manner. The rook can only move straight. The knight can only move in an L. The bishop can only move diagonally. There are limitations as to how a chess piece can move.
  2. The state of the new position. It can be so that the player is simply moving to a vacant square or the player can be trying to eliminate an opponent’s piece.
  3. The notion of promotion. Pawns that reach the other end can get promoted. Hence, it is important to note whether a certain piece have been moved to the opposite edge and whether promotion can happen.

These are some of the things you’ll want to consider when checking the validity of a move.

A similar specification process will work for other parts of the chess program.

You don’t have to think about it so hard in the initial phase. Things get improved as you go along your coding.

Design

Finally, you design working code.

Now that you know about the types of data you are dealing with and the functions associated with them, you can move on to writing your code.

Design of computer programs can take entire university level courses if the intention is to delve in.

Anyway, this step is where you initially write code. In the process, you get to be able to refine your understanding of the problem. Also, it might turn out that your specifications have been insufficient. So, you get back to that step and improve your specifications.

Basically, writing computer programs, and problem-solving in general, is a continuous process of refining and improving. Even Google’s search engine continues to be refined to this day despite the fact that, in all the sense of the word, it is already a complete program. Besides that, millions are already using it. Quite possibly even billions.

Therefore, this three-step process is more of an iterative process which you loop again and again while continuously improving your solution.

Code Tuning: Speeding Up Computer Programs

From some of my recent readings in the reading materials from the university, I have come across this rather curious term: code tuning. It should not be a new term so, what makes it curious is the fact that I haven’t heard from anyone or read about it anywhere else before.

Of course, it is undeniable that going for a better algorithm is highly important to reducing the demands of a program. But, code tuning can also help in improving the running time of a computer program. This is because code tuning is simply the manual optimization of a program to make it run faster and smoother, just like tuning a car engine.

Similar to when a person tunes a car engine by removing quirks, replacing old parts with new and better ones, and even just cleaning the engine, code tuning is done in the same vein.

More than just speeding up programs, code tuning also helps your code become more readable and understandable to not just the compiler or the interpreter but also to other coders who might need to be looking into your code or even just those who are simply interested in your code.

So here are some suggestions for ways to speed up programs with code tuning:

  • The most important thing to realize is that most statements in a program do not have much effect on the running time of that program. There are normally just a few key subroutines, possibly even key lines of code within the key subroutines, that account for most of the running time. There is little point to cutting in half the running time of a subroutine that accounts for only 1% of the total running time. Focus your attention on those parts of the program that have the most impact.
  • When tuning code, it is important to gather good timing statistics. Many compilers and operating systems include profilers and other special tools to help gather information on both time and space use. These are invaluable when trying to make a program more efficient, because they can tell you where to invest your effort.
  • A lot of code tuning is based on the principle of avoiding work rather than speeding up work. A common situation occurs when we can test for a condition that lets us skip some work. However, such a test is never completely free. Care must be taken that the cost of the test does not exceed the amount of work saved. While one test might be cheaper than the work potentially saved, the test must always be made and the work can be avoided only some fraction of the time.
  • Be careful not to use tricks that make the program unreadable. Most code tuning is simply cleaning up a carelessly written program, not taking a clear program and adding tricks. In particular, you should develop an appreciation for the capabilities of modern compilers to make extremely good optimizations of expressions. “Optimization of expressions” here means a rearrangement of arithmetic or logical expressions to run more efficiently. Be careful not to damage the compiler’s ability to do such optimizations for you in an effort to optimize the expression yourself. Always check that your “optimizations” really do improve the program by running the program before and after the change on a suitable benchmark set of input. Many times I have been wrong about the positive effects of code tuning in my own programs. Most often I am wrong when I try to optimize an expression. It is hard to do better than the compiler.

Most of the time and space improvement that can be gained from better code still springs from a better data structure or algorithm. Therefore, it is important to always note to:

First tune the algorithm, then tune the code.


From Clifford A. Shaffer’s A Practical Introduction to Data Structures and Algorithm Analysis Edition 3.1 (Java Version)

Unclosed: Ending the Hiatus

hiatus*
A hiatus is a pause in which nothing happens, or a gap where something is missing.

It’s been about a year since I posted that previous post. It is about that time when I didn’t know what to do with the blog anymore.

I started blogging at a different WordPress page, J’s log, at jauntyskipper.wordpress.com. That was even a few months before the hiatus. It’s a page full of tumblelogs. A page of posts that will be considered asides if put in this same page.

I like it in there, too. I put down thoughts in there. Words I deem worth quoting from any one of my readings. Things I notice in the world around me that I consider worth putting into words. Passing thoughts. Ideas. Pieces of code. Just whatever. It is a tumblelog of just about all things I come across that I had the time and chance to put down.

However, this page has been started. And it’s somehow remained alive despite the huge break in the posts. I realized it’ll be a total waste if I simply abandoned this page.

And somehow this time seems like a good time to start posting again. To begin with, I’m posting something new that can be read about in this page. Of course, I’ll still be writing about the old stuff. But it never hurts to expand your reach.

* Definition taken from “Collins Cobuild Advanced Dictionary of English” © HarperCollins Publishers 2009