Thursday, March 26, 2009

Risky Words

Taking a break from the book review to post something small but still significant.

Clarity is a very useful tool for managing risk. If something isn’t clear, we’re at risk of getting something wrong. Many times your best risk-management tools are your own ears. Nuance in language can be fun when you're flirting with someone at a bar, tossing out double-entendres, but when precision or clarity is valued, as in requirements specifications, nuance can be a killer.

Friday, March 06, 2009

Chapter 3 - How Can We Study Programming?

This is a continuation of the review of Gerald M. Weinberg's book, The Psychology of Computer Programming. The review begins here for Chapter 1 - Reading Programs. The post on Chapter 2 can be found here.

What chance have we to understand the human factor? How do you study programming when the most important part of it takes place between one's ears?

Perhaps programming is too complex a behavior to be studied and must remain largely a mysterious process.


INTROSPECTION

Although some modern students of human behavior tend to discredit it as nonscientific, introspection has always been the first foundation stone of their science.


The author tells a story about a student having difficulty with a particular PL/I programming problem:

ANGLES(I)=2*ATAND(SQRT((S-A(IND(I,1)))*(S-A(IND(I,2))))/(S*(S-A(IND(I,3)))));


When it was finished, the author asked the student to recall the thought processes -- the difficulties -- he had experienced with this statement. Here is that list.

  1. He had trouble getting the parentheses matched, because there were so many of them.
  2. The original ANGLES(I) was a structure, not an array, so the compiler complained of an illegal subscript list, but he had not been able to find the problem because there were so many subscript lists.
  3. When the program finally got into execution, there were these further difficulties:
    1. IND was a matrix on another page of the listing, and he had trouble finding it.
    2. Even though the parentheses were now matched, one pair was in the wrong place and was most difficult to find.
    3. The last difficulty turned out to be a problem of precision caused by capricious declaration of different data types, by the large expression, and by the division--which would not have been necessary if the two-argument form of ATAND had been used.
The student provided a number of insights, from the proper size of statements and the choice of data structures to the design of compiler and execution-time diagnostics.

None of these insights could be obtained without introspection of this sort, yet thousands of programmers [millions nowadays - ed.] each day have these same problems and simply grumble that they are having "bugs." Without the introspection, they will simply continue to have "bugs" until, in the end, the bugs will have them.

From this single instance of introspection, we could come up with "Laws of Programming":
  1. The mind cannot cope with more than five levels of parentheses.
  2. Compiler diagnostics should be more explicit.
  3. PL/I precision rules are too complicated to use.
Although each of these statements might turn out to be a "law" in the psychology of programming, writing it on the basis of a single introspective example hardly qualifies it as a law. To be a "law," a principle must be explored so as to set limits to its application, because all laws have limits. Indeed, the limits are usually more important to know than the laws themselves, and it is only by investigating a multitude of cases that limits can be set. Thus, without introspections, investigation would be sterile, but without investigation, introspection will be of questionable value in application.
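
Just to make the introspection concrete, here is one way the statement might have been broken up, following the student's own hindsight: intermediate variables to cut down the nesting, and the two-argument form of ATAND (which the book itself mentions) to eliminate the explicit division that contributed to the precision trouble. This is only a sketch of the idea -- the declarations and the names SA1, SA2, SA3 are my own invention, not Weinberg's, and I haven't run it through a PL/I compiler:

DECLARE (SA1, SA2, SA3) FLOAT;  /* hypothetical temporaries, not from the book */
SA1 = S - A(IND(I,1));
SA2 = S - A(IND(I,2));
SA3 = S - A(IND(I,3));
/* same computation as the original statement, but the division is  */
/* folded into the two-argument ATAND and the nesting stays shallow */
ANGLES(I) = 2*ATAND(SQRT(SA1*SA2), S*SA3);

Nothing profound here; it just shows how quickly a little naming gets you back under the five-levels-of-parentheses limit the introspection suggested.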

OBSERVATION

One way to follow up introspection is by observation of what people actually do, as opposed to what they think they do.
For example, a census of how many levels of nested parentheses programmers actually use.

One of the problems with observation, however, is that it tells us what people do do, but not necessarily what they can do.

If your census showed that programmers never nested parentheses more than 5 levels deep, does that mean they are unable to?

A second problem with observation is deciding just what it is we are observing.
All you know about the programmers who limit themselves to 5 levels of parentheses is that simple fact; to know why they don't use 6 or more would, of course, require introspection on the programmers' part. If you were designing a language, would you limit the size of arrays to three dimensions simply because you've never seen anyone use more?

A third problem with observation is the problem of interference of the observer with the subject being observed.
This is called the "Hawthorne Effect". The act of observing workers actually affected their performance, a kind of uncertainty principle.

EXPERIMENT

Here is a very interesting section where the author describes the costs of conducting experiments on programmers using typical experimental psychology methods. How do you collect data without influencing the results? To a degree it is like the problem of measuring the performance of a program: the modifications necessary to observe the program's use of memory or CPU time themselves use memory and CPU time, so how can you know your results haven't been skewed?

You could do experiments using programming majors in school, but would the results apply to professionals? You could study professionals, but that could be prohibitively expensive. The author describes doing an experiment using one-quarter of the time of nine experienced programmers for three months. The average salary was $14,000 [I'm assuming annually - ed.]; adding in overhead (workspace, machine time, etc.) brought that to $20,000 apiece, or $1,250 apiece over the course of the study (one-quarter time for three months is one-sixteenth of a year). Add to that the experimenters' salaries and so forth and the bill comes to $30,000 for nine subjects. [In 1970 dollars! - ed.] Using the Inflation Calculator ($30,000 in 1970 compared to 2009), this comes out to a cost of $163,254.90. Serious money, just to study programming.

... we must deal with one other problem, which is important because of a seldom-questioned view of programming. That view is that programming is an individual activity, not a social activity.

Not that the individual level is unimportant, but we might start by asking why, if the average programmer spends two-thirds of his time working with other people rather than working alone (yes, it's true!), that 99 percent of the studies have been on individual programmers.

The answer is that it is expensive to study programmers in groups. Plus, you can't put a number of trainees together and call them a 'team'.

Putting a bunch of people to work on the same problem doesn't make them a team--as the sloppy performance in all-star games should teach us.

The neglect of social effects also casts doubt upon all individual studies, since these studies force the individual to separate himself from his normal working environment. In one of our studies, one of the programmers came to me when he had finished coding and asked who could check his work. In his home group, this was standard practice at this point in development, but it was not allowed during the study -- lest it should "invalidate" the results.

Looking back, it seems that forcing programmers to work in isolation is like trying to study the swimming behavior of frogs by cutting off their legs and putting those legs in the water to watch them swim. After all, everyone knows that the frog swims with its legs, so why complicate the experiments with all the rest of the frog?

PSYCHOLOGICAL MEASUREMENT

For the present, most of the work in the psychology of programming is going to have to be "looking for questions."
I conjecture that we will move beyond looking for questions only when we start believing and acting on the questions that we've found over the almost 40 years since this book was published. This is definitely not a programming problem. When software managers are able to dictate practices based on a whim or the local political winds ("We don't have time to do code reviews! Why can't you just make it work? I don't understand why it's constantly breaking!"), you might start looking for answers in the abnormal psychology section of the library.

USING BEHAVIORAL SCIENCE DATA

Because of the nature of programming, there is not much hope that actual results can be transferred directly from other fields to ours. Our use of results has to be for insights, not for answers.
So maybe all those comparisons with engineering and construction are invalid on the face of it. We might look at Paul Graham's book Hackers and Painters for insight.

Finally, as we pass successfully through all this, we may find ourselves in a sufficiently objective frame of mind to examine some of the myths of programming -- the articles of programming faith. Faith, as Bertrand Russell pointed out, is the belief in something for which there is no evidence; and myths, as Ambrose Bierce once defined them, are the sacred beliefs of other people. Perhaps we shall simply carry away our old beliefs unchanged, but if just one unfounded programming myth should die as a result, this book will have been worth the effort.

SUMMARY


Common pitfalls:
  1. Using introspection without validation.
  2. Use of observation on an overly narrow base.
  3. Observing the wrong variables, or failing to observe the right ones.
  4. Interfering with the observed phenomenon.
  5. Generating too much data and not enough information.
  6. Overly constraining experiments.
  7. Depending excessively on trainees as subjects.
  8. Failing to study group effects and group behaviour.
  9. Measuring what is simple to measure.
  10. Using unjustified precision.

Several of these items have had some attention paid to them over the last decade. Extreme Programming was one of the first movements focused on groups of developers to get enough traction to garner a following. There are several similar movements: Scrum, Agile, etc. Instead of assuming the problem was in the process that humans were following, they finally addressed the human factor as a first-order success factor (thanks to Alistair Cockburn).

The question is, when will we learn that we are the problem?


This ends Part 1 - Programming as Human Performance. Next up is Part 2, Chapter 4 - Programming as a Social Activity.

Thursday, March 05, 2009

What Makes A Good Program?

This is a continuation of the review of "The Psychology of Computer Programming". You can find the post on Chapter 1 here.

Chapter 2 - What Makes A Good Program?

If we plan to study programming as a human activity, we are going to have to develop some measures of programming performance. That is, we are going to have some idea of what we mean when we say that one programmer is better than another, or one program is better than another. Although we all have opinions on these questions, we shall find that the answers are not as simple as we might wish. For programming is not just human behavior; it is complex human behavior.


What do we mean by complex human behavior? This article at betterexplained.com does a good job highlighting the false dichotomy between simple and powerful. Another explanation my good friend Scott sent me, from Dave Nicolette, uses the terms simple, complicated, and complex. I'll let Dave Nicolette speak for himself.

The spectrum goes from easy to understand and predict, to extremely hard to understand and predict (but possible with enough effort), to impossible to understand and predict regardless of the level of effort.

In our studies of the psychology of programming, we shall be hampered by our inability to measure the goodness of programs on an absolute scale.
Plenty of effort has been wasted trying to discover that mythical absolute scale: lines of code, lines per day, runtime, memory usage; the list goes on and on. The only thing generally agreed upon is that there is no single metric which can be applied to all programs.

Looking honestly at the situation, we are never looking for the best program, seldom looking for a good one, but always looking for one that meets the requirements.
This might be the hidden reason why so much lip service is paid to software quality. Crappy code that's ready right now always beats higher-quality code, however defined, that is late. Unless there is a business reason which drives a requirement connected to quality (like uptime, throughput, etc.), where time-to-market is trumped by proper outcomes, managers will revert to turning a blind eye to quality. Risk deferred is risk ignored.


Specifications
Of all the requirements that we might place on a program, first and foremost is that it be correct. In other words, it should give the correct outputs for each possible input. This is what we mean when we say that a program "works," and it is often and truly said that "any program that works is better than any program that doesn't."
Here follows a good story about a system for a car maker to handle all the options which prospective car purchasers could order. When the programmer arrives on the scene of a disaster in the making, the long-settled approach is not only overly complicated but doesn't produce the correct results. When the programmer exclaims that the Emperor has no clothes, he is condemned for being uncooperative. On the way home, he can't stop thinking about the problem. In a fit of insight, he realizes that a workable approach is within reach. Upon returning to the customer site, he describes his solution. Naturally the audience gives him a cool reception, but they listen without questions - until the original creator of the old system raises his hand and asks, "And how long does _your_ program take?" to which the response is, "That varies with the input, but on average, about ten seconds per card" (an indication of how old this story is). "Aha," comes the triumphant reply. "But _my_ program takes only one second per card." This seems to settle the issue with the audience until our protagonist observes,

"But your program doesn't work. If the program doesn't have to work, I can write one that takes one millisecond per card - and that's faster than our card reader."
Here is where the sarcastic could say, "We just redefine what 'done' means," or "That isn't a bug, it's an undocumented feature."

In effect, then, there is a difference between a program written for one user and a piece of "software." When there are multiple users, there are multiple specifications. When there are multiple specifications, there are multiple definitions of when the program is working. In our discussions of programming practices, we are going to have to take into account the difference between programs developed for one user and programs developed for many. They will be evaluated differently, and they should be produced by different methods.
Imagine a scene from "I Love Lucy" where Ricky Ricardo and Fred Mertz are trying to rearrange furniture at the behest of Lucy and Ethel. "Move that painting to the other wall," Lucy commands. When she then turns her attention to having the men move the couch "a teensy bit" closer to the wall, Fred moves his side an inch while Ricky moves his side a foot; after all, how many "teensies" are in a foot? Ethel, meanwhile, sneaks over to the picture while everyone else is busy with the couch and moves the painting back to its original location. "It looked better there anyway," she thinks to herself.

However comical this might be, when this happens in software it can be many, many times worse. With multiple stakeholders, multiple upstream and downstream dependencies, and a large group of developers trying to write code, it should be easy to realize how requirements (and their management) can become a nightmare.

Schedule
One of the recurring problems in programming is meeting schedules, and a program that is late is often worthless.
Most people prefer to wait a fixed ten minutes for the bus each morning rather than to wait one minute on four days and twenty-six minutes once a week. Even though the average wait is six minutes in the second case, the derangement caused by one long and unexpected delay more than compensates for this. This is where a psychological study would be rewarding. Project management is not a science and is only partially controlled by mathematics. Knowing the psychological landscape in which the project functions can make a big difference in its success.

Adaptability


Few programmers of any experience would contradict the assertion that most programs are modified in their lifetime. Why, then, when we are forced to modify programs do we find it such a Herculean task that we often decide to throw them away and start over? Reading programs gives us some insight, for we rarely find a program that contains any evidence of having been written with an eye to subsequent modification. But this is only a symptom, not a disease. Why, we should ask, do programmers, who know full well that programs will inevitably be modified, write programs with no thought to such modifications? Why, indeed, do their programs sometimes look as if they had been devilishly contrived to resist modification - protected like the Pharaohs' tomb against all intruders?
This really goes to the heart of the quality software movement. If it's shown time and again that the majority of software cost is in maintenance, why do we fail so miserably to factor in maintainability while writing software?

Adaptability is not free. Sometimes, to be sure, we get a program that happens to be adaptable as well as satisfactory in all other ways, but we generally pay for what we get - and often fail to get what we pay for.
It's as if the manager figures that maintenance doesn't come out of her budget and that she's only being measured by whether the project comes in on time, even if the resulting code will cost twice as much to maintain.


Fisher's Fundamental Theorem states - in terms appropriate to the present context - that the better adapted a system is to a particular environment, the less adaptable it is to new environments.
This has large effects on requirements. If one requirement is to have a database that runs on a specific system, a hidden cost is now attached: if that vendor goes out of business, the code can be next to impossible to move to another platform. You satisfied the requirement but paid a huge price. "An ounce of prevention is worth a pound of cure."

However, the same managers who scream for efficiency are the ones who tear their hair when told the cost of modifications. Conversely, the managers who ask for generalized and easily modified programs are wont to complain when they find out how slow and spacious their programs turn out to be. We must be adult about such matters: neither psychology nor magic is going to help us to achieve contradictory goals simultaneously. Asking for efficiency and adaptability in the same program is like asking for a beautiful and modest wife. Although beauty and modesty have been known to occur in the same woman, we'll probably have to settle for one or the other. At least that's better than neither.
Be careful what you wish for; you just might get it...

Efficiency

Moreover, with the cost per unit of computation decreasing every year and cost per unit of programming increasing, we have long since passed the point where the typical installation spends more money on programming than it does on production work. [like the cost of machine time, ed.] This imbalance is even more striking when all the work improperly classified as "production" is put under the proper heading of "debugging."
Even when this was written, nearly 40 years ago, the cost of development was the largest cost in the field of programming. If you don't have a manager who understands the challenges and complexities of software development, you are facing a two-front war: 1) the challenge of writing software that works in the time required, using the resources allotted, and 2) a struggle to keep a clueless, however well-meaning, boss from making things worse through ignorance, if not incompetence.

Summary

Questions to ponder:
1. Does the program meet specifications? Or, rather, how well does it meet specifications?
2. Is it produced on schedule, and what is the variability in the schedule that we can expect from particular approaches?
3. Will it be possible to change the program when conditions change? How much will it cost to make the change?
4. How efficient is the program, and what do we mean by efficiency? Are we trading efficiency in one area for inefficiency in another?


I can hear the pointy-haired boss now: "Why are you asking so many (implied -stupid-) questions? I don't understand why you're making it so hard, just go write it already!"

This is a good place to point out that repeating the same behavior and expecting a different outcome is one of the definitions of insanity.

Next is Chapter 3 - How Can We Study Programming?