Slides in PDF (232KB)
The previous lecture introduced you to graphs and a simple Breadth First Search (BFS) algorithm on graphs. It demonstrated to you how algorithms could be analyzed for their time complexity—the time it takes to run them on a given size of the input. That lecture also proved the correctness of the algorithm using the principle of mathematical induction.
As a quick reminder, the time complexity of an algorithm is an expression, or a formula, that determines the number of steps it takes for the algorithm to run to completion given the size of the input. For the BFS algorithm presented in the last lecture this time was represented in terms of the number of nodes in the graph, n. Noticed that we did not talk about the speed of a computer or the amount of memory it has or any hardware specific details. In fact, the time complexity of an algorithm always refers to the number of steps, never real time. And each of those steps is assumed to be executable on a computer in constant time. The notion of computer is also very specific—in particular, the Turing Machine. If this excites you, I would suggest taking COMP 481 and COMP 482.
The second thing to recall is the proof of correctness. Some people were not clear about the principle of induction. The Principle of Mathematical Induction can be stated as follows:
If we write P(x) to mean that a property P holds for the integer x, then, if
- P(c), for some constant c (base case), and
- P(n) ⇒ P(n+1) ∀ n ≥ c (induction step)
then ∀ x ≥ x, P(x).
There is also a principle of structural induction, but, we will not discuss that in this class. In order to prove an algorithm correct, we need to prove two properties about it:
The second part was proved in the last lecture using the principle of mathematical induction. The induction in that case was on the size of the queue that is constructed during the algorithm.
Here are some questions that you should think about as an exercise.
Now, let's come to the topic of today's lecture. This course is called Applied Algorithms and Data Structures. Algorithms and data structures have been studied deeply as theoretical concepts. As I mentioned before, COMP 481 delves deeper into the theory of computation and COMP 482 studies algorithms and their properties. In this course we are not just concerned about the mathematical properties of algorithms, but also their practical applications. In particular, we want to study the best way to implement the algorithms as real programs. We want to learn what it takes to encode complicated algorithms in a real programming language on a real computer. This course is just as much, if not more, about learning the management of large software projects, learning to work in teams, and learning to recognize how real software goes through its lifecycle. Today's lecture will introduce you to this aspect of the course. Its primary purpose is to give you a 10,000-foot view of what software development is about.
Let me start with a quote from an online article that I was reading recently.
If software were an office building, it would be built by a thousand carpenters, electricians and plumbers. Without architects. Or blueprints. It would look spectacular, but inside, the elevators would fail regularly. Thieves would have unfettered access through open vents at street level. Tenants would need consultants to move in. They would discover that the doors unlock whenever someone brews a pot of coffee. The builders would provide a repair kit and promise that such idiosyncrasies would not exist in the next skyscraper they build (which, by the way, tenants will be forced to move into).
Strangely, the tenants would be OK with all this. They'd tolerate the costs and the oddly comforting rhythm of failure and repair that came to dominate their lives. If someone asked, “Why do we put up with this building?” shoulders would be shrugged, hands tossed and sighs heaved. “That's just how it is. Basically, buildings suck.”
What this is saying is “software sucks!” This may be a bit of an exaggeration, but is not that far from the facts. The fact is that a vast majority of software written today is buggy, insecure, hard to use, and software writers expect to be paid for that and the companies selling that software expect to make money out of it! Here are some of the examples of software projects that have failed [Neumann, 1995].
What can we learn from these, and numerous other, examples of failures? What can we do to avoid future failures? Don't be surprised if the rest of this lecture sounds like a management course. Developing large software has a lot to do with managing a software project.
One thing that should be clear by now is that software plays an increasingly important role in our daily activities, as these examples clearly illustrate. This means that all of us have high stakes in making sure that the software that is written is error-free and reliable. Another important thing to notice is that writing bad software, or failing to write software in time, can have very high costs. The costs may be monetary, but the effects of bad software may also be serious enough to endanger lives!
People have been aware of these problems for a couple of decades now. The discipline that has emerged to study these problems, and find solutions to them, is called Software Engineering.
What is Software Engineering? Why does software need to be “engineered”?
The sciences today are broadly divided into two categories: natural sciences and social sciences. The first one includes physics, chemistry, biology, etc., that study natural phenomena. The second includes those that study human societies and their activities. The application of the first category of sciences has given rise to a slew of engineering disciplines, such as, mechanical engineering, civil engineering, chemical engineering, electrical engineering, and so on.
Computer science—the science that studies various aspects of the software, is relatively new and falls in none of these categories. It is the science of an artificial system, and is quite recent. It was only in the late 1960's and early 1970's that it was realized that software was not living up to its expectations.
Software developers were not able to set concrete objectives, predict the resources necessary to attain those objectives, and manage the customers' expectations. More often than not, the moon was promised, a lunar rover built, and a pair of square wheels delivered.
This is why software engineering was developed. This quote, from Bruegge and Dutoit, also illustrates exactly what software engineering is about. If you look at the well established engineering disciplines, say Civil Engineering, they are characterized by best-practices that clearly describe the desirable approaches that have been developed over years of experience. Civil engineers have a Civil Engineers' handbook that describes exactly the steps needed to go about building a structure. It would tell you how to test soil, when to build pillars—and when not to, what kind of beams to use for what kind of structures, and so on. Similar handbooks and best practices exist in all mature engineering disciplines.
Unfortunately, software engineering is not yet so mature. There is no handbook for software writers. But, we are slowly getting there. In COMP 212 you have learned about design patterns. That is the first step in the right direction. Moreover, developing software is actually a very complicated task. It has been said that large software products are the most complicated entities ever created by human beings!
A large software project involves four main activities:
Another very important aspect of developing large projects is Project Management. Project management is about setting deadlines, managing teams of people, and building a time line of the project. For the rest of the lecture we will discuss some specific aspects of program development.
I mentioned earlier that modeling is an important part of software development. In software engineering the modern way to build models of the problems is based on an object-oriented approach to solving problems. That's the reason we are using Java for this course! There are also some tools to help in the modeling process. UML, or Unified Modeling Language is perhaps the most important tool in modeling today. You have already seen bits and pieces of it in COMP 212. Notice that UML specification is very strict about the types of boxes to use for various components and the types of lines and arrows. Pay close attention to those whenever you use UML.
UML, or Unified Modeling Language, is used to describe three different types of system models:
Following is an example of UML use case.
Here, the system is bound within the box and its relationship to the external environment is shown. Use case diagrams are used to focus on the behavior of the system form an external point of view and are used in the requirements elicitation phase to represent the functionality of the system. This diagram shows that the external actor WatchUser may interact with the watch using the use case ReadTime or SetTime. However, the WatchRepairPerson may only change the battery using the ChangeBattery use case. Communication is depicted by solid lines.
The next example shows generalization in use case diagrams.
This says that Authenticate is a generalization of Authenticate WithPassowrd and Authenticate WithCard
Finally, the next examples show how actions can be included or extended.
These examples show a part of a model that describes car driving. The first one states that depressing the clutch is an action that is included in shifting gears. The second one states that the set of actions to be taken when the car is stalled is an extension of the actions taken in shifting gears. Notice that the “extend” relationship is usually used to take care of exceptional situations. It is akin to catching exceptions to perform recovery action.
Class diagrams are most useful in describing class hierarchies in an object-oriented system. This is the part that you are, perhaps, most familiar with. Here is an example of a simple object, called SimpleWatch.
The rectangles with sharp corner always depicts entities. The solid line establishes an association. The numbers next to the entities is the number of instances of that entity that are involved in the association. For example 2 Battery entities are associated with 1 SimpleWatch entity. If the number is replaced by a star, “*”, then it represents 0 or more instances.
Class diagrams can be used to establish aggregation relationship.
The diamond is attached to an entity that aggregates the other entity. For example the State contains County entities.
A hollow triangular arrow is used to show superclass relationship. The arrow points to the superclass. Also, a class entity can be divided into three sections, the name of the entity, its internal state, and the methods that be invoked in it by an actor. The following figure shows both these aspects of a class diagram.
Instances of an object can be represented by an entity with the name underlined. It may be associated with a class entity with a dashed line that can also be decorated by a label as the following example illustrates.
Sequence diagrams are similar to timing diagrams for digital circuits, except that the time is not to scale. Sequence diagrams are used to describe the chronology of internal processes inside a system. The following example is that of setting TimeZone in SimpleWatch.
Time flows from top to bottom for each entity involved in a sequence diagram. The horizontal arrows are called “messages”. An initiation message is shown by a solid arrow while a return message is shown by a dashed arrow. The arrows end in solid triangles.
As a final example in this whirlwind tour of UML, here is a statechart diagram for the SimpleWatch that captures the action of setting time on the watch. Notice that rectangles with rounded sides are used to denote states. A solid circle and an encircled solid circle are used to denote the start and the end state, respectively.
The states could be replaced by activities, in which case a similar diagram would be called the activity diagram.
Software development is a fairly complicated process. Could we build an abstraction to model the software development process itself? The answer is “yes” and, in fact, people have been trying to model software development for a long time. A model for software development is called the software life cycle. Armed with the working knowledge of UML, we can now describe software lifecycle also in UML.
Let's start with a simplistic view of software development.
This is clearly a very simple model. The model could be also be viewed in an activity centric way as follows:
Or, in an entity centric way, thus:
A bunch of standards for software lifecycle has evolved. The most recent being ISO 12207 which was introduced in 1995 and revised in 2002. This represents a maturing process of the software engineering discipline. We will briefly discuss a few of the more well known ones.
The following is an entity diagram of the IEEE 1074 model. In other words, it shows the entities involved in the model and relationships to each other. Notice, how the UML representation denotes the hierarchy of entities using “*” labels and diamond connectors.
The standard carefully describes all the entities and their relationships.
A classic software life-cycle model, the waterfall model was proposed in 1970, by Royce. It is essentially a sequential model that is drawn in a waterfall style—hence the name.
The V-Model flips the second half of the Waterfall and recognizes that certain activities are at the same level of abstraction and should be connected. This is reflected by the dashed horizontal lines.
Boehm's model is a spiral model that is drawn on polar coordinates. The spiral is divided into four quadrants that represent four different types of activities that may be involved in a project.
The models discussed so far suffer from being linear. Recall what I mentioned earlier about knowledge acquisition and rationale management being important parts of modern projects. None of these models captures that ongoing process of getting feedback from the customer, or accounting for changes.
The Sawtooth model tries to fix that by drawing a horizontal line and placing actions either above the line (in the customer domain) or below it (in the developer domain). The actions then go back and forth between the two domains. This makes the activity diagram look like a sawtooth (hence the name). A refinement of the Sawtooth model is the Shark Tooth model.
It should be emphasized that all these models must be taken with a grain of salt. These should be treated as guidelines only and often need to be adapted to specific projects. Nevertheless, they provide a very useful starting point.
For this class, we encourage you to study these models carefully and develop your own for your project. We also strongly encourage you to describe your system using UML diagrams.
To illustrate the power of UML, let's look at two design patterns. Design patterns are subsystems and, as such, can be described using UML diagrams.
This first design pattern is useful in encoding recursive data structures, say, trees. Another example that could be described by it is the recursive directory structure in which each directory could contain either files or other directories.
The second example describes a design pattern for a compiler.
Here the compiler is described a subsystem. The pattern says that the components of a compiler should be modular and separate from each other. Notice that the entire subsystem is also labeled by a tabbed rectangle. This is the UML way to abstract away a subsystem—if you want to use the compiler subsystem without worrying about the details inside it you can simply use a tabbed rectangle that has the tab labeled “compiler&rdquo".
(from XP web-site)
Extreme Programming is a software development paradigm that has evolved only in the past few years. It is targeted towards small teams, from 2 to 30 people. The project management machinery that we have described so far is fairly heavyweight, but, it works well for large project teams. For smaller teams, it may be desirable to give up a bit of the formality to reduce the project management overheads. Extreme Programming, also simply called “XP”, aims to do precisely that. For projects done in this course XP is perhaps the most suitable paradigm.
XP consists of four distinct types of activities, planning, designing, coding, and testing. Each of these activities has certain rules associated with them. We will only discuss some of the rules related to the coding aspect.
Next lecture will bring us back to graphs, when we will discuss another fundamental graph algorithm, called the Depth First Search.