Cannot see some symbols? Some text is gibberish? Check the compatibility note.

Prelude

The previous lecture introduced you to graphs and a simple Breadth First Search (BFS) algorithm on graphs. It demonstrated to you how algorithms could be analyzed for their time complexity—the time it takes to run them on a given size of the input. That lecture also proved the correctness of the algorithm using the principle of mathematical induction.

As a quick reminder, the time complexity of an algorithm is an expression, or a formula, that determines the number of steps it takes for the algorithm to run to completion given the size of the input. For the BFS algorithm presented in the last lecture this time was represented in terms of the number of nodes in the graph, n. Noticed that we did not talk about the speed of a computer or the amount of memory it has or any hardware specific details. In fact, the time complexity of an algorithm always refers to the number of steps, never real time. And each of those steps is assumed to be executable on a computer in constant time. The notion of computer is also very specific—in particular, the Turing Machine. If this excites you, I would suggest taking COMP 481 and COMP 482.

The second thing to recall is the proof of correctness. Some people were not clear about the principle of induction. The Principle of Mathematical Induction can be stated as follows:

If we write P(x) to mean that a property P holds for the integer x, then, if

P(c), for some constant c (base case), and

P(n) ⇒ P(n+1) ∀ n ≥ c (induction step)

then ∀ x ≥ x, P(x).

There is also a principle of structural induction, but, we will not discuss that in this class. In order to prove an algorithm correct, we need to prove two properties about it:

The program must terminate on all possible inputs; and
The program must produce correct results for all possible inputs.

The second part was proved in the last lecture using the principle of mathematical induction. The induction in that case was on the size of the queue that is constructed during the algorithm.

Here are some questions that you should think about as an exercise.

An important part of proving an algorithm correct is proving that it terminates. Why does BFS terminate?
How can the algorithm be modified to work if the graph is directed? Weighted?
Is it easier to write a recursive version of BFS?
How much space does the algorithm need, in addition to that needed to store the graph?

Lecture 2: Software Lifecycle and Team Programming

Now, let's come to the topic of today's lecture. This course is called Applied Algorithms and Data Structures. Algorithms and data structures have been studied deeply as theoretical concepts. As I mentioned before, COMP 481 delves deeper into the theory of computation and COMP 482 studies algorithms and their properties. In this course we are not just concerned about the mathematical properties of algorithms, but also their practical applications. In particular, we want to study the best way to implement the algorithms as real programs. We want to learn what it takes to encode complicated algorithms in a real programming language on a real computer. This course is just as much, if not more, about learning the management of large software projects, learning to work in teams, and learning to recognize how real software goes through its lifecycle. Today's lecture will introduce you to this aspect of the course. Its primary purpose is to give you a 10,000-foot view of what software development is about.

Ailing Software

Let me start with a quote from an online article that I was reading recently.

If software were an office building, it would be built by a thousand carpenters, electricians and plumbers. Without architects. Or blueprints. It would look spectacular, but inside, the elevators would fail regularly. Thieves would have unfettered access through open vents at street level. Tenants would need consultants to move in. They would discover that the doors unlock whenever someone brews a pot of coffee. The builders would provide a repair kit and promise that such idiosyncrasies would not exist in the next skyscraper they build (which, by the way, tenants will be forced to move into).

Strangely, the tenants would be OK with all this. They'd tolerate the costs and the oddly comforting rhythm of failure and repair that came to dominate their lives. If someone asked, “Why do we put up with this building?” shoulders would be shrugged, hands tossed and sighs heaved. “That's just how it is. Basically, buildings suck.”

What this is saying is “software sucks!” This may be a bit of an exaggeration, but is not that far from the facts. The fact is that a vast majority of software written today is buggy, insecure, hard to use, and software writers expect to be paid for that and the companies selling that software expect to make money out of it! Here are some of the examples of software projects that have failed [Neumann, 1995].

Year 1900 bug: In 1992, Mary from Winona, Minnesota, received an invitation to attend a kindergarten. Mary was 104 at that time.
Interface misuse: On April 10, 1990, in London, an underground train left the station without its driver. The driver had taped the button that started the train, relying on the system that prevented the train from moving when doors were open. The train operator had left his train to close a door which was stuck. When the door was finally shut, the train simply left.
Late and over budget: In 1995, bugs in the automated luggage system of the new Denver International Airport caused suitcases to be chewed up. The airport opened 16 months late, $3.2 billion over-budget, with mostly manual luggage system.
On-time delivery: After 18 months of development, a $200 million system was delivered to a health insurance company in Wisconsin in 1984. However, the system did not work correctly; $60 million in overpayments were issues. The system took 3 years to fix.
Unnecessary complexity: The C-17 cargo plane by McDonnel Douglas ran $500 million over budget because of problems with its avionics software. The C-17 included 19 onboard computers, 80 microprocessors, and 6 different programming languages.

What can we learn from these, and numerous other, examples of failures? What can we do to avoid future failures? Don't be surprised if the rest of this lecture sounds like a management course. Developing large software has a lot to do with managing a software project.

One thing that should be clear by now is that software plays an increasingly important role in our daily activities, as these examples clearly illustrate. This means that all of us have high stakes in making sure that the software that is written is error-free and reliable. Another important thing to notice is that writing bad software, or failing to write software in time, can have very high costs. The costs may be monetary, but the effects of bad software may also be serious enough to endanger lives!

People have been aware of these problems for a couple of decades now. The discipline that has emerged to study these problems, and find solutions to them, is called Software Engineering.

Software Engineering

What is Software Engineering? Why does software need to be “engineered”?

The sciences today are broadly divided into two categories: natural sciences and social sciences. The first one includes physics, chemistry, biology, etc., that study natural phenomena. The second includes those that study human societies and their activities. The application of the first category of sciences has given rise to a slew of engineering disciplines, such as, mechanical engineering, civil engineering, chemical engineering, electrical engineering, and so on.

Computer science—the science that studies various aspects of the software, is relatively new and falls in none of these categories. It is the science of an artificial system, and is quite recent. It was only in the late 1960's and early 1970's that it was realized that software was not living up to its expectations.

Software developers were not able to set concrete objectives, predict the resources necessary to attain those objectives, and manage the customers' expectations. More often than not, the moon was promised, a lunar rover built, and a pair of square wheels delivered.

This is why software engineering was developed. This quote, from Bruegge and Dutoit, also illustrates exactly what software engineering is about. If you look at the well established engineering disciplines, say Civil Engineering, they are characterized by best-practices that clearly describe the desirable approaches that have been developed over years of experience. Civil engineers have a Civil Engineers' handbook that describes exactly the steps needed to go about building a structure. It would tell you how to test soil, when to build pillars—and when not to, what kind of beams to use for what kind of structures, and so on. Similar handbooks and best practices exist in all mature engineering disciplines.

Unfortunately, software engineering is not yet so mature. There is no handbook for software writers. But, we are slowly getting there. In COMP 212 you have learned about design patterns. That is the first step in the right direction. Moreover, developing software is actually a very complicated task. It has been said that large software products are the most complicated entities ever created by human beings!

A large software project involves four main activities:

Modeling: A model of a system is an abstract representation of a system. The process of modeling, therefore, is the process of extracting the essential properties of the system. The model of a system allows us to understand and answer questions about a system that may be too complicated or big to comprehend first hand. For example, the periodic table in chemistry is a model of all the chemical elements. Similarly, the classification system in biology is a model of life-forms.
Problem-solving: Engineering, inherently, is a problem solving activity. Finding an engineering solution often involves trying to solve a problem that is not completely understood, or does not have perfect solutions. The engineer would typically use past experience and may have to rely on hit-and-trial. This is where design patterns come in handy because they encode the past experience of a large number of people.
Knowledge Acquisition: Acquiring knowledge about a system is a non-linear activity. Assuming that large projects can be executed linearly is a very common mistake. A piece of information acquired later in the project may invalidate an earlier piece of knowledge. So, acquiring knowledge is an ongoing activity in executing a project. For example, commercial software developers must continuously deal with changing technology, competition, and the business environment.
Rationale Management: Design decisions are usually taken within some context. The decision makers might use of their past experience, present alternatives, and their intuition. This additional knowledge that the decision makers make use of is called the rationale of the system. However, the rationale of a system must be captured and understood as best as possible. Changing scenarios can completely alter the direction of a project. What do you think could have prompted Apple to completely rewrite their operating system when they moved from MacOS 9 to MacOS X?

Another very important aspect of developing large projects is Project Management. Project management is about setting deadlines, managing teams of people, and building a time line of the project. For the rest of the lecture we will discuss some specific aspects of program development.

Modeling: An Introduction to UML

I mentioned earlier that modeling is an important part of software development. In software engineering the modern way to build models of the problems is based on an object-oriented approach to solving problems. That's the reason we are using Java for this course! There are also some tools to help in the modeling process. UML, or Unified Modeling Language is perhaps the most important tool in modeling today. You have already seen bits and pieces of it in COMP 212. Notice that UML specification is very strict about the types of boxes to use for various components and the types of lines and arrows. Pay close attention to those whenever you use UML.

UML, or Unified Modeling Language, is used to describe three different types of system models:

The functional model described the functionality of the system from the user's point of view. It is described in UML by use case diagrams.
The object model describes the structure of a system in terms of objects, attributes, associations, and operations. In UML this is described using the class diagrams.
The dynamic model describes the internal behavior of the program. UML sequence diagrams, statechart diagrams, and activity diagrams are used to represent the dynamic models.

Use Case Diagrams

Following is an example of UML use case.

use case diagram

Here, the system is bound within the box and its relationship to the external environment is shown. Use case diagrams are used to focus on the behavior of the system form an external point of view and are used in the requirements elicitation phase to represent the functionality of the system. This diagram shows that the external actor WatchUser may interact with the watch using the use case ReadTime or SetTime. However, the WatchRepairPerson may only change the battery using the ChangeBattery use case. Communication is depicted by solid lines.

The next example shows generalization in use case diagrams.

generalization in use case diagrams

This says that Authenticate is a generalization of Authenticate WithPassowrd and Authenticate WithCard

Finally, the next examples show how actions can be included or extended.

include in use case diagrams

extend in use case diagrams

These examples show a part of a model that describes car driving. The first one states that depressing the clutch is an action that is included in shifting gears. The second one states that the set of actions to be taken when the car is stalled is an extension of the actions taken in shifting gears. Notice that the “extend” relationship is usually used to take care of exceptional situations. It is akin to catching exceptions to perform recovery action.

Class Diagrams

Class diagrams are most useful in describing class hierarchies in an object-oriented system. This is the part that you are, perhaps, most familiar with. Here is an example of a simple object, called SimpleWatch.

class diagram of SimpleWatch

The rectangles with sharp corner always depicts entities. The solid line establishes an association. The numbers next to the entities is the number of instances of that entity that are involved in the association. For example 2 Battery entities are associated with 1 SimpleWatch entity. If the number is replaced by a star, “*”, then it represents 0 or more instances.

Class diagrams can be used to establish aggregation relationship.

class diagram using aggregation

The diamond is attached to an entity that aggregates the other entity. For example the State contains County entities.

A hollow triangular arrow is used to show superclass relationship. The arrow points to the superclass. Also, a class entity can be divided into three sections, the name of the entity, its internal state, and the methods that be invoked in it by an actor. The following figure shows both these aspects of a class diagram.

class diagram showing inheritance

Instances of an object can be represented by an entity with the name underlined. It may be associated with a class entity with a dashed line that can also be decorated by a label as the following example illustrates.

class diagram showing class instances

Sequence Diagrams

Sequence diagrams are similar to timing diagrams for digital circuits, except that the time is not to scale. Sequence diagrams are used to describe the chronology of internal processes inside a system. The following example is that of setting TimeZone in SimpleWatch.

example of a sequence diagram

Time flows from top to bottom for each entity involved in a sequence diagram. The horizontal arrows are called “messages”. An initiation message is shown by a solid arrow while a return message is shown by a dashed arrow. The arrows end in solid triangles.

Statechart Diagrams and Activity Diagrams

As a final example in this whirlwind tour of UML, here is a statechart diagram for the SimpleWatch that captures the action of setting time on the watch. Notice that rectangles with rounded sides are used to denote states. A solid circle and an encircled solid circle are used to denote the start and the end state, respectively.

example of a statechart diagram

The states could be replaced by activities, in which case a similar diagram would be called the activity diagram.

Software Life Cycle

Software development is a fairly complicated process. Could we build an abstraction to model the software development process itself? The answer is “yes” and, in fact, people have been trying to model software development for a long time. A model for software development is called the software life cycle. Armed with the working knowledge of UML, we can now describe software lifecycle also in UML.

Let's start with a simplistic view of software development.

a simple model of software life cycle

This is clearly a very simple model. The model could be also be viewed in an activity centric way as follows:

a simple activity centric view of software development

Or, in an entity centric way, thus:

a simple entity centric view of software development

A bunch of standards for software lifecycle has evolved. The most recent being ISO 12207 which was introduced in 1995 and revised in 2002. This represents a maturing process of the software engineering discipline. We will briefly discuss a few of the more well known ones.

The IEEE 1074 Model

The following is an entity diagram of the IEEE 1074 model. In other words, it shows the entities involved in the model and relationships to each other. Notice, how the UML representation denotes the hierarchy of entities using “*” labels and diamond connectors.

IEEE 1074 software lifecycle model

The standard carefully describes all the entities and their relationships.

The Waterfall Model

A classic software life-cycle model, the waterfall model was proposed in 1970, by Royce. It is essentially a sequential model that is drawn in a waterfall style—hence the name.

the waterfall model of software life cycle

The V-Model

The V-Model flips the second half of the Waterfall and recognizes that certain activities are at the same level of abstraction and should be connected. This is reflected by the dashed horizontal lines.

the V-Model of software life cycle

Other Models

Boehm's model is a spiral model that is drawn on polar coordinates. The spiral is divided into four quadrants that represent four different types of activities that may be involved in a project.

The models discussed so far suffer from being linear. Recall what I mentioned earlier about knowledge acquisition and rationale management being important parts of modern projects. None of these models captures that ongoing process of getting feedback from the customer, or accounting for changes.

The Sawtooth model tries to fix that by drawing a horizontal line and placing actions either above the line (in the customer domain) or below it (in the developer domain). The actions then go back and forth between the two domains. This makes the activity diagram look like a sawtooth (hence the name). A refinement of the Sawtooth model is the Shark Tooth model.

It should be emphasized that all these models must be taken with a grain of salt. These should be treated as guidelines only and often need to be adapted to specific projects. Nevertheless, they provide a very useful starting point.

For this class, we encourage you to study these models carefully and develop your own for your project. We also strongly encourage you to describe your system using UML diagrams.

Design Patterns and UML

To illustrate the power of UML, let's look at two design patterns. Design patterns are subsystems and, as such, can be described using UML diagrams.

UML model of a recursive design pattern

This first design pattern is useful in encoding recursive data structures, say, trees. Another example that could be described by it is the recursive directory structure in which each directory could contain either files or other directories.

The second example describes a design pattern for a compiler.

UML model of a compiler

Here the compiler is described a subsystem. The pattern says that the components of a compiler should be modular and separate from each other. Notice that the entire subsystem is also labeled by a tabbed rectangle. This is the UML way to abstract away a subsystem—if you want to use the compiler subsystem without worrying about the details inside it you can simply use a tabbed rectangle that has the tab labeled “compiler&rdquo".

Extreme Programming

(from XP web-site)

Extreme Programming is a software development paradigm that has evolved only in the past few years. It is targeted towards small teams, from 2 to 30 people. The project management machinery that we have described so far is fairly heavyweight, but, it works well for large project teams. For smaller teams, it may be desirable to give up a bit of the formality to reduce the project management overheads. Extreme Programming, also simply called “XP”, aims to do precisely that. For projects done in this course XP is perhaps the most suitable paradigm.

XP consists of four distinct types of activities, planning, designing, coding, and testing. Each of these activities has certain rules associated with them. We will only discuss some of the rules related to the coding aspect.

customer is always available: This harks back to my earlier comment about continuous evaluation of your project and continuous feedback from the customer.
code must be formatted to standard: Established best-practices and design patterns come in handy here. Formatting the code is also extremely important for later maintainability.
unit test must be coded first: It is often very tempting to add extra features to a class when you are writing it, believing that those extra features may be useful. Resist that temptation! 90% of the extra features never get used. Writing unit tests first and then developing code incrementally to pass those tests helps in resisting that temptation. Unit tests also serve as very important documentation.
program in pairs: This may sound counter-intuitive at first. One person types while the other sits and observes the code being written and charts the next coding steps. In reality, it works very well. Everybody who has used pair-programming has come back with a positive feedback! We require pair-programming in this course in the first project, and strongly encourage it for the rest.
code is owned collectively: Also called “ego-less” programming, the notion of owning code collectively may be hard to get used to initially. Programmers often feel very strongly about the codes they write and are known to get upset when others modify or improve their codes. However, pair programming, following best-practices for formatting, frequent integration, and unit-tests, all combine to make collective ownership easier. Often, when the modifications integrate very well, the original writer may continue using the modified code without even realizing it!
optimize last: It is tempting to make estimates about program run times (or other resource usages) and try to optimize at the early development stages. Optimization of code should always come last, based on real performance data. Correctness and maintainability are always the first goals.

Next Lecture

Next lecture will bring us back to graphs, when we will discuss another fundamental graph algorithm, called the Depth First Search.

References

[Bruegge, 2000] Bernd Bruegge and Allen H. Dutoit, Object-Oriented Software Engineering: Conquering Complex and Changing Systems
[Newmann, 1995] P. G. Newmann, Computer-Related Risks. Addison-Wesley, Reading, MA, 1995.
[Brooks, 1995] Frederick P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, Reading, MA, 1995.
[XP web-site] Extreme Programming, http://www.extremeprogramming.org/

Arun Chauhan

Last modified: Wed Jan 22 18:12:47 CST 2003