
.NET Framework

by Howard Gilbert

The .NET Framework is a new execution environment for Windows programs. It is available as an optional upgrade for any version of Windows after (but not including) Windows 95. Specifically, it can be added to Windows 98, NT 4.0, Millennium Edition, Windows 2000, and Windows XP.

Microsoft distributes the .NET Framework in three forms: the Redistributable package, the SDK, and Visual Studio.NET.

Check immediately on the Microsoft Web Site for updates and Service Packs. SP1 became available in record time and may have already been replaced.

A program compiled and linked to run in the .NET Framework environment has a new EXE or DLL format. It can only execute on a system that contains the Framework. This means that you will need to apply the Redistributable package to any machine on which you want to run programs you develop with the SDK or Visual Studio.NET.

The .NET Framework is a released and fully supported Microsoft system component. The Framework supports execution of any program compiled to use its services. Microsoft's Visual Studio.NET supports development of Framework programs in C++, Visual Basic, JavaScript, and C# (pronounced C-Sharp), but other companies sell compilers that create .NET programs in other languages. For example, support for Perl is available from www.activestate.com.

Microsoft intends to add support for the Java programming language, but currently this support is in Beta 2 test status. Three tools will be provided:

  1. A Java source file can be directly compiled to a .NET Framework application. Because Sun owns the trademark on the Java language, and this use does not conform to Sun's standard, this use of the language has been renamed J# (J-Sharp).
  2. A Java binary *.class file can be directly converted to a .NET Framework executable.
  3. A Java source file can be translated to C#.

The J# Beta 2 package can be downloaded from the Microsoft site.

The J# compiler takes Java 1.1.4 compliant source and produces the same output files as any other .NET Framework language. J# can use any .NET Framework system libraries. Since Java programs import references to the standard Java class libraries (java.lang, java.io, java.net, java.util, java.sql, etc.), Microsoft created additional .NET Framework class libraries to emulate them. These emulated Java class libraries are, however, really .NET Framework classes. This produces the unexpected result that a Visual Basic program could, at least in theory, create an object of type java.net.ServerSocket if it wanted to.

Microsoft does not provide emulation libraries for subsequently released Java components. An application that depends on Swing, JNDI, JavaMail, or vendor JDBC libraries may not convert cleanly to .NET. Microsoft would urge you to convert to its own technologies (replacing JNDI with ADSI, for example).

J# support comes in two pieces. The first piece corresponds to the Redistributable/SDK packages and contains the library modules that emulate the standard Java libraries. It also contains the command line J# compiler. This unit must be installed on top of the Redistributable package to provide the libraries needed for the execution of any program developed with J#.

The second package contains the Visual J# component that can be integrated into the Visual Studio.NET IDE. This second package will only be installed on development machines.

Although the C/C++ compiler in Visual Studio.NET can still compile traditional C and C++ programs and produce a normal Windows EXE or DLL file, the older Visual Studio 6.0 remains supported for creating Visual Basic programs that run as traditional Windows applications. Visual J++ will probably be supported for a while as well, although anyone serious about Java development will convert to another IDE that supports a more current version of Java.

A version of the .NET Framework for Windows CE handheld units is also available in Beta test.

A Shared Source distribution of a subset of the .NET Framework is available for non-commercial research purposes. Microsoft has ported this distribution to the FreeBSD version of Unix. It is certainly possible that Microsoft would use this code (known as "Rotor") to create versions of the Framework for the Mac or popular Unix systems. However, there is currently no commitment to make such code available.

Misconceptions

Microsoft has used ".NET" as a brand name or buzzword for a number of new server products. This has nothing to do with the Framework.

Microsoft says a lot about "software as a service". It is trivial to use Visual Studio.NET to turn code written in C#, Visual Basic, or Java into a Web Service that receives requests over the Web and generates XML replies, but this is only a tiny part of the Framework.

The .NET Framework is not the C# language. C# is a language derived from C++ with ideas from Java. The Framework supports over twenty different languages including such widely different languages as COBOL, Smalltalk, APL, and Pascal. No individual language includes all the runtime elements used by all languages. For example, languages like C# that are derived from C cannot define internal procedures supported by languages based on PL/I or Pascal. When an internal procedure runs, the CLR must provide it with access not only to the variables in its own area of the stack but also to variables in the part of the stack belonging to its parent procedure. C# doesn't need or use this runtime feature.

The .NET Framework may compete with Java 2 Enterprise Edition, but so does everything else. Any programmer with a C++ compiler can write Web Services on any version of Unix. The fact that two separate products happen to have staked out the same problem domain to emphasize in their advertising doesn't say anything about their structure or design.

The most meaningful predecessor to the .NET Framework is a joint development project funded by IBM and Apple from 1992 to 1996 called Taligent. As was noted in the article A Brief History of Taligent:

It was decided that people didn't really want a new operating system, but that rapid application development was still important. The Taligent OS became a layer that could sit on top of any modern operating system and provide numerous services to applications software, thereby shortening the development cycle. This layer consisted of more than a hundred object-oriented frameworks and well over a thousand classes. It ran on top of AIX, HP-UX, OS/2, Windows NT, and a new Apple OS kernel, and it was called CommonPoint. 

CommonPoint was most similar in scope and portability to Sun's subsequent Java environment, but based on C++ and without Java's virtual machine and new object programming language.

Today we can look back at CommonPoint and Taligent from a perspective that includes the Java language and its runtime environment. If we decided to solve the same set of problems using essentially the same technologies, and decided to target the Windows family of operating systems that run on 90% of the world's computers, the result would be the .NET Framework.

Old Problem

The earliest computers ran one program at a time. IBM invented the modern Mainframe Computer in 1965 with its "System 360" family. These computers ran several different programs for different users at the same time. To maintain integrity, each program ran in its own isolated region of memory. Over time, mainframe, Unix, Windows, and Mac OS systems all developed the ability to isolate each running program (a "process") in its own area of memory (an "address space").

As a program runs, it acquires system resources. In addition to memory, it may open files on disk, connect to network servers, start database transactions, start timers, and so on. Operating systems connect all of these resources to the "process" that requests them. When the program ends, either normally or because it had a fatal error, the system cleans up by closing the files, disconnecting network sessions, backing out the transactions, and so on.

The system must go to a lot of work to create a new process and to clean up after it when it ends. This can be a substantial burden if the program is going to do relatively little work. Yet most network requests are fairly trivial. Systems had to find a way to turn computers into network servers without the expense of the usual process management.

In 1969, IBM came up with one possible solution. It created an early runtime framework called "CICS". CICS was a little operating system inside the real OS. It duplicated the usual program interfaces to manage memory, open files, communicate with remote systems, etc. However, CICS ran lots of programs in the same process address space. There was no guarantee that two programs would not interfere with each other, but CICS only ran programs carefully written by professional IT staff. These programs were tested and retested before being allowed to run in the production system.

CICS was extremely efficient. It could handle thousands of network users on a machine that might have trouble running a dozen ordinary programs. However, this efficiency came at a price. If any program had an error, it could crash the entire CICS environment and leave the thousands of remote users unable to access the system. If any program forgot to release resources, then the footprint would get bigger and bigger until the system ran out of the resource and came to a halt. Since all the programs ran in a shared environment, the system allowed them all to use the same files and access the same data.

Now jump forward thirty years. The World Wide Web has become a driving force for computer systems. Applications can run on a Web server, and there is a new standard for "Web services" that allows even smaller components and subroutines to reside on the Web and be called by other programs. The Web server, however, confronts the same dilemma that IBM confronted earlier without completely solving it.

Using the same approach as CICS, a Web server can run all the application programs in its own process address space. This is very efficient and supports lots of remote users while minimizing use of storage and CPU. However, the first time any application program has an error, the entire server may fail. There is also a problem when all these different application programs written by different programmers share a common userid and, therefore, access to all the same data.

Of course, the Web server can choose a more traditional design to run each application in its own isolated process address space. Now a failure in one application doesn't affect the server or other applications. Each process can have its own userid and therefore its own file access privileges. However, every request received by the server has to be packaged up and transferred using operating system services over to the application's environment, and every response from the application has to be shipped back. This adds a lot of overhead to the system and reduces the number of remote users that can be supported.

New Answer

Since this problem has been around for decades, one might assume that there was no third alternative. Then Sun's Java programming language and its runtime environment (the Java Virtual Machine or JVM) demonstrated that there was another way to run programs. Instead of fully compiling programs to machine instructions, Java converts the source into an intermediate "byte code" that expresses the program but which cannot be directly executed. When the program is loaded, the runtime finishes the translation by converting the byte code into Intel CPU machine instructions. During this last translation, the byte code is verified, and only well behaved instructions are generated. The instructions generated to represent a Java routine cannot access data that is supposed to be private to another routine. It cannot index past the end of an array or buffer. It cannot call the operating system directly, but can only access system services indirectly by calling library routines provided by the Runtime.
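The "cannot index past the end of an array" rule can be observed from ordinary Java code. The following is a minimal sketch, not part of the original discussion; the class and method names (BoundsCheckDemo, readAt) are hypothetical and chosen only for illustration. It shows that an out-of-range index produces a catchable exception instead of a read from whatever memory follows the array.

```java
// Hypothetical demo class; the names are illustrative only.
final class BoundsCheckDemo {
    // Reads buffer[index]. The runtime checks the index on every access.
    static String readAt(int[] buffer, int index) {
        try {
            return "value " + buffer[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The runtime refused the access rather than reading
            // the memory that happens to follow the array.
            return "rejected";
        }
    }

    public static void main(String[] args) {
        int[] buffer = {10, 20, 30};
        System.out.println(readAt(buffer, 1)); // prints "value 20"
        System.out.println(readAt(buffer, 3)); // prints "rejected"
    }
}
```

In a traditionally compiled C program the second call would have no such check, which is the kind of behavior the verified code is designed to rule out.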

Java started as a language to program network attached devices. It became successful as a language for programs embedded in a Web page. Recently, Sun's Enterprise Edition extended it to Web servers and distributed components. It provides a solution that is almost as efficient as the CICS "everything in one shared address space" approach, while providing the integrity and security normally associated with "each program in its own isolated process".

Sun mostly sells hardware. It produces the Solaris version of the Unix operating system to run on that hardware, but remains committed to the open Unix standard. In Java, Sun sees a language for applications that can freely move between devices and operating systems using a lowest common denominator of system services.

Microsoft doesn't make computers. Windows isn't a version of Unix and isn't supposed to be like anything else. Microsoft aggressively pursues a competitive advantage by adding into Windows features that make it "better" than other systems. Microsoft needs to give application programmers access to these unique features. Microsoft licensed Java and was initially enthusiastic about it. However, Microsoft treated Java as "just another language" and incorporated extensions so its version of Java could do anything that Visual Basic could do. However, a program that used services unique to Windows would not run on other operating systems.

Sun went to Federal Court and made all sorts of claims. It claimed copyright protection. When that didn't work, it claimed breach of contract. After the contract ran out, it has more recently made antitrust claims. Sun's claims were not limited to the material that Sun had developed and licensed to Microsoft. It even claimed the right to limit any Java-related development Microsoft did independently. The judge issued preliminary injunctions that killed off Microsoft's further Java development, and then the case dragged on for three years of preliminary motions without coming to trial to actually decide any of the issues.

So Microsoft was left in a bind. The Java approach provided a new, elegant solution to the design and hosting of distributed applications. However, the injunctions that Sun got from the court prevented Microsoft from actually using Java to solve the problem using Windows technology. So Microsoft had to design some new environment that was aggressively "not-Java". The ideas that Microsoft wanted to borrow were not patented, so Microsoft had a legal right to use them. As long as it did not use the forbidden word "Java" in its development, Microsoft would remain free from Sun legal claims. The result was the .NET Framework.

Like Java, a .NET Framework program is compiled to an intermediate code called Microsoft Intermediate Language or "MSIL". The Java byte code contains only those features needed to support the Java language itself. The Java designers kept the language as simple as possible, and excluded some features like internal procedures (needed by Pascal) or unsigned integers (used in C++). Since Microsoft was prohibited from using Java itself, MSIL was expanded to a "Common Language Runtime" that could support more than 20 different programming languages.

Managed Code

To run in the .NET Framework, an EXE or DLL must contain code generated by a compiler in MSIL format, plus information called "Metadata" that describes all the types, fields, and functions defined by the module. There is a new form of EXE/DLL for the Framework. Each EXE or DLL can either contain Framework code or traditional Windows machine instructions. They cannot be mixed together in the same module.

Traditional Windows programs, including the command shell or the graphic desktop, ask the system to run an EXE module just as they always have. If the EXE happens to be a Framework program, then the Common Language Runtime (CLR) will be called to initialize the Framework execution environment.

Even if the EXE is not a Framework program, it can call a DLL linked in the Framework format. In this case, the CLR is called to initialize its environment when the first Framework DLL is loaded. There is only one CLR environment in a process. When subsequent Framework DLLs are loaded, they discover that the Framework is already active and join it.

The CLR has to create the data structures to manage memory, programs, objects, threads, errors, etc. When a routine is called for the first time the CLR invokes the Just In Time (JIT) compiler to convert the MSIL into ordinary machine instructions. This translation verifies that the MSIL program is safe and well behaved. This is not the place to describe all the things that the Framework checks for, but a few examples are in order:

The Framework EXE and DLL modules and the MSIL generated code that they contain are called "Managed Code". As the name suggests, this code is controlled by the CLR and has been generated so that it must remain well behaved. It cannot access memory improperly. It cannot directly call system services.

All the programs, libraries, and COM components that were written for Windows before the .NET Framework (or are written later using traditional tools) are called Unmanaged Code. "Unmanaged" means that these programs start out as Intel instructions, can reference any memory location in the process address space, and can call Windows system services directly. They are still controlled by the Windows operating system rules, but they cannot be constrained by the CLR.

Managed code can call Unmanaged library routines. The development tools will automatically generate interface routines that allow Managed code to call any COM service registered in the machine. The Managed part of the program runs under the CLR, while the Unmanaged code runs under Windows and the Win32 API. A COM component can even be written in Java and run under the Microsoft JVM.

Java code is packaged as individual binary *.class files. For easier distribution, these files are often zipped up in a JAR archive file, but they are still considered as separate files. Each named Class designates a specific type of object and provides the code for operations performed on the objects.

Managed code, however, must be linked into an EXE program or DLL library module. The Framework calls such modules an "Assembly". Even if every source file defines only one Class, after several are linked together the Assembly itself will define a collection of classes and data types. Its Metadata merges together the information about each type, data field, method function, argument, and so on. Assembly Metadata provides information to a compiler when building a program that creates objects of a type defined by the Assembly or calls functions associated with these objects. At runtime, the Metadata again provides dynamic information about the objects.
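Java class files carry a comparable description of each type, and reading it back at runtime gives a feel for what "dynamic information about the objects" means. The sketch below uses Java reflection as an analogy only; it is not the Framework's Metadata API, and the Account type and describe method are hypothetical names invented for the illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MetadataDemo {
    // Hypothetical type whose description is read back at runtime.
    static final class Account {
        public String owner;
        public long balance;

        public void deposit(long amount) {
            balance += amount;
        }
    }

    // Lists the fields and methods a type declares, sorted by text.
    static List<String> describe(Class<?> type) {
        List<String> entries = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic()) { // skip compiler or tool generated members
                entries.add("field " + f.getType().getSimpleName() + " " + f.getName());
            }
        }
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                entries.add("method " + m.getName() + "/" + m.getParameterCount());
            }
        }
        Collections.sort(entries);
        return entries;
    }

    public static void main(String[] args) {
        for (String entry : describe(Account.class)) {
            System.out.println(entry);
        }
    }
}
```

A compiler uses the same kind of information when it checks a call against a library it did not compile itself.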

Operating System within an Operating System

Most of the books and articles on .NET approach it from the application programmer's point of view. Microsoft is, after all, trying to attract a lot of current programmers to its Visual Studio.NET development tool and to .NET Framework programming. From this perspective there are many similarities to Java, particularly in the standard libraries and data types. However, while Java started with individual application programs and then evolved into Enterprise Edition and its containers for distributed applications, the .NET Framework starts with the multiuser "application server" environment and then supports ordinary applications as a simplified case.

If you want to use the Framework, look at it from the application programmer's point of view. If you want to understand the Framework, compare it to the operating system. An OS has to run multiple programs on behalf of many users. It must provide integrity, so the programs cannot interfere with each other. It must provide security, so each program can access only the data it is authorized to see. It must trap and recover from errors. When a program ends, it must release all the storage, files, and other resources that the program was using.

Windows does all these things using the traditional rules of processes, address spaces, and userids. The .NET Framework duplicates the same functions at a finer grained level of detail. Windows, Linux, and other PC operating systems separate the ordinary application program from the system code using program management features built into the Intel CPU chip. Application programs run in a hardware enforced state that prevents them from messing up the OS or other applications. The OS runs without this constraining flag and can do whatever it wants. In traditional computer architecture, hardware was the only way to achieve system integrity.

Then Java came along with its idea of "byte codes" converted only at the last minute to machine instructions. The translation mechanism ensures that the program cannot generate statements that do "bad" things. This idea is an even more powerful way to constrain the program and provide program integrity. The CPU only had a single flag that could separate benign from dangerous operations. The logic that generates the final program from the byte code can look at hundreds of different program characteristics and fashion a wide range of precise constraints.

Although the Framework is physically integrated into Windows, it operates within the application program environment. That is, the entire Framework runs with the CPU flag set so that it can only perform operations available to all other applications. Any integrity checks that the Framework creates are added on top of, and not instead of, ordinary system security. Every program running under the Framework must run in some process and runs under some userid. The system enforces the usual rules, so the Framework program can only access files that are permitted to the userid under which the program is running.

Assembly (like a DLL module or jar)

An Assembly is an EXE or DLL file linked in the new Framework format. In theory, an Assembly can consist of more than one file, but the Visual Studio tool cannot build Assemblies larger than a single EXE or DLL.

Once it is built and tested, an Assembly can be simply copied to other computers or it can be packaged up with a fancy Setup tool. EXE files and private DLLs are installed in ordinary file directories. DLLs that will be shared by lots of different programs may be installed with a Windows utility into a special database called the Global Assembly Cache. In theory, an Assembly can be loaded over the network at runtime from a designated URL.

There is a small ambiguity in the terminology. A programmer uses the name "Assembly" to identify the module or set of files that are built, distributed, and installed. The Framework, however, uses the term "Assembly" to represent a runtime object that manages a chunk of MSIL code and Metadata loaded into memory. This ambiguity has always existed with previous informal terms. The word "program" may represent an EXE file on disk, but it also is used to describe a process that is running on a machine. However, Assembly is a new term and someone reading the Framework documentation might be confused when it switches from one meaning to the other.

In Windows, a DLL just has a name. In the Framework, each assembly has more identifying information. If it came from disk, then the source directory is remembered. If it came from the network, then the URL and location of the server are remembered. More importantly, during development an assembly can be digitally signed by the program authors. This provides reliable information on the original source of the code (no matter where it is loaded from) and provides protection against tampering.

During the translation of MSIL to instructions, the Framework constrains each unit of code to be properly behaved. A program may call upon a library assembly to provide services, but it cannot access data the assembly regards as private. An assembly written by professional staff may be granted privileges by policies configured by the administrators that are not granted to ordinary application programs. This assembly can then be called by an ordinary program. When it runs under the application, the assembly has the extra privileges. However, the assembly decides whether and how to use them. The application program can request services of the assembly, but it cannot interfere with the internal behavior of the privileged assembly or subvert the identity checks.

Application Domain (like a process)

The operating system creates a process address space to run an EXE program. There is a Windows system service that allows one running program to create a new process to run a second program in a new address space. When a process ends, the system frees all its resources and, by deleting the address space, frees all the memory allocated by the process.

The .NET Framework and its Common Language Runtime have to operate under an operating system process in a standard address space. However, the Framework wants to run lots of smaller programs inside the same process. It creates a light-weight version of a process called an "Application Domain" (or more commonly, just "App Domain"). The first time the CLR is initialized in any process address space it creates the initial default Application Domain called "default". Then just as Windows has a service to create a new process, the Framework exposes a service that allows Managed code to create a secondary Application Domain.

Managed code creates another App Domain when:

The Framework loads Assemblies for requests associated with some App Domain. The Assembly, its classes and types, and all the objects created from types defined in the Assembly, all are queued by the Framework to the App Domain that loaded or created them. When the App Domain ends, the Framework can find all the objects created by it. The Framework then:

From the Windows operating system point of view, a Windows process loads a DLL module into the address space. From the Framework's point of view, an App Domain loads an Assembly. There is an important difference in the two views. Windows only loads one copy of a DLL into an address space. A second request for the same DLL resolves to the previously loaded module. The Framework also only loads one Assembly into an App Domain. However, when different App Domains running in the same process space each load the same Assembly DLL, the Framework creates more than one "Assembly" object even though there is, in fact, only one DLL module in memory that both are sharing.

This is important because the Framework Assembly object tied to the App Domain serves as an anchor for all the objects created by that App Domain from data types and classes defined in the Assembly. Each App Domain maintains its own pool of objects. When the App Domain terminates, either because the program is done or because an error occurred, these objects can be finalized and their storage can be released without affecting other App Domains running in the same process.

For this design to work, App Domains cannot share the same physical object. Even though they are in the same address space, and Unmanaged code under the same circumstances could easily share pointers to the same memory location, the Framework refuses to give a direct reference to an object created by one App Domain to any code running under another App Domain. The Framework behaves as if they really were running in different processes.

Now operating systems have been dealing with communication between processes for decades. If you want to pass an object as an argument to another process, there are two solutions. Code can make a copy of the object ("marshal" or "serialize" the object) and pass the copy. If it is necessary that there be only one copy of the object, then the system creates a surrogate "stub" object and passes it along as the argument. Whenever an operation is performed on the stub object, the request is packaged up and shipped back to the caller's environment for actual execution.
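The two options can be sketched in Java, which offers both mechanisms in its standard library. This is an illustration of the general idea under stated assumptions, not the Framework's own remoting code: the Counter interface, CounterImpl class, and the copyOf and stubFor helpers are hypothetical names, and both "sides" live in one program for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Proxy;

final class MarshalDemo {
    // Hypothetical service interface known to both sides.
    public interface Counter {
        int increment();
    }

    static final class CounterImpl implements Counter, Serializable {
        private static final long serialVersionUID = 1L;
        private int value;

        @Override
        public int increment() {
            return ++value;
        }
    }

    // Option 1: pass a copy. The object is serialized to bytes and an
    // independent object is rebuilt from them.
    static Counter copyOf(CounterImpl original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Counter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }

    // Option 2: pass a stub. Every call made on the stub is forwarded
    // to the single real object.
    static Counter stubFor(Counter real) {
        return (Counter) Proxy.newProxyInstance(
                Counter.class.getClassLoader(),
                new Class<?>[] {Counter.class},
                (proxy, method, args) -> method.invoke(real, args));
    }

    public static void main(String[] args) {
        CounterImpl real = new CounterImpl();
        real.increment();                                // real is now 1
        Counter copy = copyOf(real);
        System.out.println(copy.increment());            // 2, only the copy changes
        System.out.println(stubFor(real).increment());   // 2, the real object changes
    }
}
```

The copy is cheap to use afterwards but drifts away from the original; the stub always reflects the one real object at the cost of forwarding every call.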

The Framework provides both options for communication between App Domains. Since App Domains run in the same process, the Framework does provide one shortcut simplification. Although objects and their references are absolutely segregated, threads are not subject to this restriction. Therefore, when code in a thread running in one App Domain calls a function attached to a surrogate stub object, it really calls Framework logic that locates the real object from the stub, switches identity over to the other App Domain, and then directly calls the real routine using the real object. Looking down from the Windows system point of view, the thread calling stack has a history of routines called under the first App Domain, a transition routine belonging to the Framework itself, and then additional routines running in the second App Domain. As the routines return to their caller (or when an exception is thrown) the thread stack unravels back to the calling environment.

This careful isolation ensures that the Framework can, at any time, rip an App Domain out of the system, free all of its logical resources, release all of its storage, and simply toss it away without damaging any of the other code running under other App Domains in the same address space. If some of the other code was making one of these remote method calls to the terminating App Domain, it receives back an error exception. However, it can easily handle the error and continue processing.

Permissions

The Framework creates App Domains and then loads assemblies into them. As it does so, it examines the system policy of the machine on which it is running and assigns permissions based on the properties attached to the domains and assemblies. A particular application may be granted wide privileges when installed on the desktop computer belonging to an ordinary employee. Run the same program on a production database server and it may have relatively limited permissions.

The Framework has a sophisticated initial design that can be extended by local programming. For example, an Enterprise may have a requirement that program changes run through a formal test process before they are deployed on production servers. It could create its own "Integration Testing Approval" attribute that is assigned to assemblies after they have passed the test. Production servers could then be configured to refuse to load or use any assembly that lacked this attribute.
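The approval check can be pictured with a small Java analogy. In this sketch a Java annotation stands in for the attribute and a single method stands in for the server's policy; the real Framework mechanism attaches such information to assemblies and evaluates it through configured policy, which this code does not attempt to reproduce. All names here (IntegrationTested, BillingModule, ExperimentalModule, mayLoad, and the ticket value) are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class ApprovalDemo {
    // Hypothetical marker attached only after a module passes testing.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface IntegrationTested {
        String ticket();
    }

    @IntegrationTested(ticket = "QA-1") // made-up ticket identifier
    static final class BillingModule {
    }

    // Never went through the test process, so it carries no marker.
    static final class ExperimentalModule {
    }

    // A production host would run a check like this before using a module.
    static boolean mayLoad(Class<?> module) {
        return module.isAnnotationPresent(IntegrationTested.class);
    }

    public static void main(String[] args) {
        System.out.println(mayLoad(BillingModule.class));      // true
        System.out.println(mayLoad(ExperimentalModule.class)); // false
    }
}
```

A marker like this is only as trustworthy as the process that applies it, which is why the digital signature described earlier matters.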

Finalize

When a program ends, every resource that it was using must be freed. If files were opened, the I/O buffer space must be freed and the file must be closed on disk. If a database transaction is incomplete, the database must be told to do a Rollback operation. Depending on the system rules, a program may wait for other programs it started to complete.

Thirty years ago an operating system simply had a built-in list of resources. It would run through the list when a program ended freeing each type of resource in order. If a new type of resource was created, it had to be added to the operating system list.

Object Oriented programming added a new model. If every system resource is represented by an Object, then each Object can have associated with it a "Finalize" routine (also known as a "Destructor" in some languages). When the Object is no longer in use, the runtime calls its Finalize routine to clean up any associated resources.

The operating system can only clean up when the entire process ends, and then it cleans up everything. Using Finalize routines, the Framework can clean up objects after the subroutine that allocated them ends, provided that they have not been added to some queue, array, table, or other structure that is still in use by another part of the program.

Objects are freed and Finalize is called by a process called "Garbage Collection". The system performs Garbage Collection when it feels a need to do so. Programs, therefore, cannot assume or predict when a Finalize routine will run.

The exception is when an App Domain terminates. Then the Framework design forces all objects allocated by the App Domain to be freed and ensures that their Finalize routines have run.

All the built-in standard classes of the .NET Framework are carefully written to clean up after themselves. If your application code only uses resources allocated by Framework classes, or Managed library routines that properly clean up with their own Finalize routines, or COM components through the interface stubs generated by Visual Studio, then it probably isn't necessary for your code to have its own Finalize routine. Everything will get cleaned up for you. A Finalize routine is essential, however, if your class somehow allocated Unmanaged (operating system) resources.
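Because nobody can predict when a Finalize routine will run, a class that holds a scarce resource often offers an explicit cleanup call as well. The sketch below shows that pattern in Java with AutoCloseable and try-with-resources. It is a Java analogy rather than Framework code, and TrackedResource and CleanupDemo are hypothetical names; the counter merely stands in for a real operating system handle.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for an object that owns an unmanaged resource.
final class TrackedResource implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();  // resources currently held
    private boolean closed = false;

    TrackedResource() {
        OPEN.incrementAndGet();       // "acquire" the resource
    }

    boolean isOpen() {
        return !closed;
    }

    @Override
    public void close() {             // safe to call more than once
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();   // "release" the resource
        }
    }
}

final class CleanupDemo {
    // Reports how many resources are held while the object is in use.
    static int openCountWhileInUse() {
        try (TrackedResource r = new TrackedResource()) {
            return r.isOpen() ? TrackedResource.OPEN.get() : -1;
        }   // close() runs here, at a known point, without waiting for Garbage Collection
    }
}
```

The count is 1 inside the try block and back to 0 as soon as the block exits, whether or not the collector has run.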

Framework Objectives

The objective of the Framework should come as no surprise.

Bad Habits

Today, most programming is still done in computer languages whose limitations have been known for decades. Existing languages allow, and sometimes require, unchecked subscripts that can overrun the end of an array or a buffer. Pointer values can remain long after the thing they point to has been discarded. Rules for automatic conversion between unlike data types vary wildly from language to language and are impossible to remember. Dangerous language conventions make errors the default state unless they are explicitly avoided, like the rule that case blocks in C and Java fall through to the next case unless explicitly aborted with a "break" statement.
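The fall-through rule is easy to demonstrate. In this small Java sketch (FallThroughDemo is a made-up name) the two methods differ only in their "break" statements.

```java
final class FallThroughDemo {
    // No "break": control falls from the matching case into every case below it.
    static int withoutBreak(int n) {
        int hits = 0;
        switch (n) {
            case 1: hits++;
            case 2: hits++;
            case 3: hits++;
        }
        return hits;
    }

    // With "break": only the matching case runs.
    static int withBreak(int n) {
        int hits = 0;
        switch (n) {
            case 1: hits++; break;
            case 2: hits++; break;
            case 3: hits++; break;
        }
        return hits;
    }
}
```

Calling withoutBreak(1) returns 3 because all three cases execute, while withBreak(1) returns 1. The compiler accepts both versions without complaint unless optional warnings are turned on.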

C, COBOL, Basic, and FORTRAN should have been repaired or replaced long ago. Why are they still in use? For the same reason that we insist on crashing billion dollar rockets onto the surface of Mars because we retain the measurements of "feet" and "pounds" while the rest of the world has converted to metric. For the same reason that we speak in a language filled with irregular verbs and archaic spellings. For the same reason we continue to type on a keyboard layout originally designed to slow typing down to avoid jamming a mechanical mechanism. Conversion requires effort, organization, and a willingness to put up with short term inconvenience for a long term advantage.

The .NET Framework, when it is fully in control, prohibits the most dangerous of legacy programming structures. For example, the Framework services could not reproduce the behavior of the C language "malloc()" function. This function allocates a certain number of raw bytes of memory and returns a pointer to the storage. The C program then "casts" the storage into the form of an array, string, buffer, or data structure.

The Framework enforces "type safety" as represented by the "new" operation of C++ or Java. Instead of asking for raw storage, the program asks the system to allocate "an array of 100 integers" or "a buffer of 1024 characters". The Framework calculates the required number of bytes, allocates the storage, and then passes it back to a program that can only use the storage as the type of data structure that the program declared it to be when it was allocated. For example, an array of 100 four byte integers cannot be treated instead as an array of 200 two byte integers, even though the two arrays occupy the same amount of storage.
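Java's "new" shows the same rule at work. In the sketch below (TypeSafetyDemo is a hypothetical name) an array allocated as 100 integers cannot be viewed as an array of short values. A direct cast from int[] to short[] is rejected by the compiler, so the example routes the reference through Object to show that the runtime refuses as well.

```java
final class TypeSafetyDemo {
    // Returns true only if the storage really was allocated as an array of short values.
    static boolean usableAsShortArray(Object storage) {
        try {
            short[] view = (short[]) storage;   // the runtime checks the actual type here
            return view != null;
        } catch (ClassCastException e) {
            return false;                       // the storage keeps the type it was allocated with
        }
    }
}
```

usableAsShortArray(new int[100]) is false and usableAsShortArray(new short[200]) is true, even though the two arrays hold the same number of data bytes.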

The Framework supports existing languages, so C++ or Java still have all the problems in their original design. A programmer can still make errors by, for example, running a loop one more iteration beyond where it should have stopped. The Framework cannot guarantee that a program will produce the right answers. It can, however, guarantee that the program doesn't clobber storage or cause trouble for other programs.
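A Java sketch of the off-by-one case, on the assumption that a managed runtime behaves as the paragraph describes (BoundsDemo is a made-up name): the loop is wrong, but the damage is contained.

```java
final class BoundsDemo {
    // Runs a loop one iteration too far and reports the index the runtime refused.
    static int firstRejectedIndex() {
        int[] values = new int[100];
        int i = 0;
        try {
            for (i = 0; i <= values.length; i++) {   // bug: should be "<"
                values[i] = i;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            return i;                                // the store past the end never happened
        }
        return -1;                                   // not reached for this loop
    }
}
```

firstRejectedIndex() returns 100. The program still contains the error, but the bad store raises an exception instead of overwriting whatever happened to follow the array in memory.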

If you doubt the need for or resistance to change, listen to the howls of complaint from Visual Basic programmers who discover that they have to change the way they code ever so slightly to operate inside Visual Basic.NET. At least Visual Basic is a language developed entirely by Microsoft. Some of the other languages that have been adapted to the Framework are subject to ANSI and international standards. Adding even slightly non-standard features is not done lightly.

Libraries of Reusable Components

Too much time is wasted by programmers re-inventing, recoding, and re-debugging the wheel. Often an application programmer starts with essentially a blank slate. Need to perform a common technical or business function? Code it again from scratch. Any preexisting code that does the same thing is probably buried in some old monolithic program, written in some programming language you don't know, undocumented, and uncommented. Even if you extracted this old code, adapting it to the requirements of your new application will be difficult.

The solution to this problem has been understood since the '70s. We need to create small modules that each perform one specific function. Each such module should be well written, documented, and already debugged. The module can then be stored in a library where it can be subsequently used by anyone who needs the same service. Ideally the operating system vendor will provide a large library of software modules that perform general purpose services, and the customer will add libraries of routines that provide business logic specific to the enterprise.

The largest and most thorough attempt to reform software engineering was initiated by the Defense Department in the '80s. They examined all the problems in existing programming languages, emphasized the importance of reusable components, evaluated alternatives through a multi-year study with wide public comment, and proposed as a solution the Ada programming language. They supported this choice with a mandate requiring the use of Ada throughout the billions of dollars of DOD procurement. Even with this leverage, the initiative failed.

The failure of Ada suggests that any advance in Software Engineering will not come from the creation of the Great Wonderful New Programming Language. No matter how good any language may be, you will never get everyone to learn it and you will never convert the billions of lines of existing code to it. Esperanto has been around for a hundred years. It is clearly the easiest language to learn and use. Nobody speaks it.

So while Java is a really good programming language and a very useful tool, we have good reason to be skeptical that it will finally be the answer to the Software Engineering deadlock. The .NET Framework stands a better chance. Instead of attacking the old bad languages head-on, more progress may be made by working on the Runtime. If routines written in many different programming languages can be combined to form an application, then over time the old bad languages can be incrementally replaced by new good languages without the cost and disruption of some one-time massive conversion.

Different Skills

Every organization has lots of different people with different skills. Some people are very good at financial planning, but they are not programmers. The logic behind many complex financial calculations is expressed in Excel spreadsheets. Well, Excel is a kind of programming, and it may be the best "language" in which to express that particular type of logic. A language called APL, which dates from the 1960s, happens to be the most direct way of expressing formulas of Linear Algebra used in a number of Operations Research and statistical calculations. In a manner of speaking, the HTML language used by Web pages provides one method of programming a user interface.

A business needs an application that solicits data from users, performs a financial calculation, and displays the results. The true believer may argue that it should all be coded in Java or in one of the scripting environments (JSP, taglib) that derive from Java. An alternate approach farms the work out to different people. Give the human interface part to someone who cares about every little color and the alignment of columns, even if that person only knows how to design Web pages in HTML. Give the financial calculation part of the program over to an expert in financial calculation, even if he can only create spreadsheets. Test each element separately. Then combine them in a single application that hides Excel behind a Web interface.

Using conventional programming technology it was generally necessary to have separate libraries of reusable program logic for every programming language that an institution supported. Even if there was an effort to create reusable modules, there was a good chance that the module you needed was written in a language that was the wrong choice for you.

The .NET Framework solved this problem by creating a Common Language Runtime environment shared by all programs in over twenty programming languages. Library components developed in any language can, if they follow a few rules, be used by programs written in any other language. Should additional features be required, then existing library routines should be extended (through the object oriented technique of "inheritance") to create derived library routines with additional services. Should existing functions require new variations, then existing library routines should be modified (through the object oriented technique of "method override") to create alternate routines with different behavior. These extensions and modifications can be made by separate modules written in any programming language without regard for the language used to write the original library module.
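The two techniques look like this in a single language. The sketch is Java only, so it cannot show the cross-language part of the claim, and the class names (InterestCalculator, ExtendedCalculator, CompoundCalculator) are invented for the illustration.

```java
// The original "library routine".
class InterestCalculator {
    double interest(double principal, double rate, int years) {
        return principal * rate * years;             // simple interest
    }
}

// Inheritance: a derived routine that adds a new service.
class ExtendedCalculator extends InterestCalculator {
    double total(double principal, double rate, int years) {
        return principal + interest(principal, rate, years);
    }
}

// Method override: same operation name, different behavior.
class CompoundCalculator extends ExtendedCalculator {
    @Override
    double interest(double principal, double rate, int years) {
        return principal * Math.pow(1.0 + rate, years) - principal;
    }
}
```

With a principal of 1000 at a rate of 0.5 for 2 years, ExtendedCalculator gives a total of 2000 and CompoundCalculator gives about 2250. The total() routine was never rewritten; it simply picks up whichever interest() the object supplies.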

Perl has much better features for casually parsing data in text files than Java has. Java has native threading capability not found in other languages. Visual Basic has more programmers than any other language.

Common practice is for an IT manager to assign each application to a single programmer. That programmer then uses a single programming language to write all parts of the application. There will be some sections where the language is a good choice, and some where it is a disastrously poor choice. The manager only had to spend a few minutes thinking about the problem, and any failure or problem is usually blamed on the programmer.

However, in the .NET Framework environment with a Common Language Runtime, the application requirement can be broken down into separate functional components. The user interface, database access, business logic, security, and other elements can be individually specified and then assigned to people who specialize in that particular area of technology. Each person is then free to code the assigned component in whatever programming language he or she is most comfortable using, or in the language whose features make that type of coding easiest to complete. The Framework then allows all these disparate elements to be combined in a single application.

Business applications would then be developed by stringing together a sequence of calls to previously developed library modules rather than writing large amounts of new code.

Although the Framework may unite new code written in many programming languages, existing applications like Excel do not run under the control of its Common Language Runtime. So the Framework supports use of legacy applications and components through the COM interface. Any COM component can be treated as if it were part of the Framework library, and the necessary interface logic will be automatically generated.

Unlearn

To really understand a new programming model, each person must first unlearn some of his basic assumptions. The .NET Framework is surprising in some of its originality.

The Framework is not based on Visual Basic (or COM).

VB was the first big success story in Windows development. It may be the most widely used programming language in the world. VB defined an interface for extensions (the "OCX" modules), and this interface became the basis for OLE, which in turn became COM.

The recurring idea in VB/COM is something called a "Variant". A Variant is a variable that can hold any type of value. It can contain an integer, floating point number, string, array, or a pointer to a system Object. All variables in VB are Variants unless they are declared to have a more precise type. All the COM interfaces use Variant as the preferred argument or returned data type.

Variants are so widely used through previous versions of Windows interfaces that they seemed to be a Microsoft way of doing things. Since they embody a "weakly typed" style of programming, they were the opposite of every idea of good Software Engineering. You cannot get high quality code from poor programming technique. It seemed that Microsoft would always have bugs and the resulting security problems.

Then Microsoft released the .NET Framework which bans the Variant. Framework programming has to be strongly typed. This is a major change to Visual Basic programmers, and in the near term they will complain. However, it is an essential step that must be made sooner or later if the Windows platform is to advance to sound software development practices.

The Framework is neither Java nor anti-Java

The project that created Java was originally looking for a language to program function into small networked devices. The primary example was a cable TV set top box. They may have used the small memory size of such devices as an excuse, or they may have been influenced by the general Unix bias that "less is more." The Java language was designed to make do with as few features as possible, so the JVM could be as simple as possible. The Java byte code was designed to minimize file size to conserve network bandwidth.

Of course, no design survives as technology changes. Java today is much larger than the original design, and cable TV set top boxes are no longer as important a platform as large corporate application servers. However, Java remains strongly influenced by some of its original design decisions.

The .NET Framework starts as an extension of Windows. There is an intent to release compact versions of the Framework for Pocket PC handheld computers and embedded systems, but by the time that market develops the average toaster oven may have a few hundred megabytes of memory.

From Nov 1998 when Sun got the first set of injunctions from the Federal district court, until Feb 2001 when Microsoft and Sun settled the case out of court and established a new legal context, Microsoft was effectively blocked from any further program development that was associated with anything called "Java". Yet within this timeframe Microsoft worked out the design and did a lot of initial implementation on the .NET Framework. Any similarity between the Framework and corresponding Java classes is "purely coincidental."

After the settlement, Microsoft was given more freedom to work with the Java language as long as they used no Sun code and did not call anything they did by the "Java" name. At that point it may have been too late for Java to catch up to the ongoing .NET Framework development. Microsoft released an early beta of its J# development system that generates Framework programs using Java syntax.

"Types" not Just "Objects"

Hardware and Complex Types

Every CPU chip has a certain set of primitive data types. Usually they include integers of various sizes (1, 2, 4, and 8 bytes) and floating point numbers of various sizes (4, 8, 16 bytes). The operations on these basic types include arithmetic (add, subtract, multiply, divide) and logical (and, xor, shift, etc.). However, the modern Intel CPU chip includes a number of less well known data types and operations for multimedia processing. They operate on short vectors of numbers and speed up video and audio compression/decompression.

That is as far as the hardware will go. Software can construct new "complex data types" from a structure that contains two or more of the underlying primitive types. For example, most languages have some version of a "Date" data type that can be treated as having separate fields for year, month, day, and often hour, minute, second. Operations on these complex data types are then defined as program logic that performs primitive hardware operations on the component primitive fields in the complex object.

Consider any programming language that allows a variable A to be declared to be of complex type "Date" and to contain the value that we humans instinctively know as "September 28, 2002 at 10:30 AM EDT". Now suppose there is another variable named B that is of type "TimeInterval" and has the value that we understand as "6 days, 4 hours, and 15 minutes" (or 8895 minutes if you prefer). Now there is an intuitive meaning to the expression "A+B". It is the value "October 4, 2002 at 2:45 PM EDT", the date and time 6 days, 4 hours, and 15 minutes after the date stored in A.

Some languages allow programmers to define new meanings for the primitive operations like "+" and "-". When the language does not permit this, you can at least define a function with a name like "addTimeInterval()" that takes a TimeInterval argument (like B) and adds it to a Date object like A.
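Java is one of the languages that does not let the programmer redefine "+", so the named-function form applies. The sketch below uses the standard java.time classes LocalDateTime and Duration in the roles of "Date" and "TimeInterval"; DateArithmetic and its two methods are hypothetical names, and LocalDateTime deliberately ignores time zones.

```java
import java.time.Duration;
import java.time.LocalDateTime;

final class DateArithmetic {
    // Builds a "TimeInterval" from days, hours, and minutes.
    static Duration interval(int days, int hours, int minutes) {
        return Duration.ofDays(days).plusHours(hours).plusMinutes(minutes);
    }

    // The operation written as "A+B" in the text, spelled out as an ordinary function.
    static LocalDateTime addTimeInterval(LocalDateTime date, Duration interval) {
        return date.plus(interval);   // the library does the month and year rollover
    }
}
```

With a = LocalDateTime.of(2002, 9, 28, 10, 30) and b = DateArithmetic.interval(6, 4, 15), b.toMinutes() is 8895 and DateArithmetic.addTimeInterval(a, b) plays the part of A+B.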

Twenty years ago the Software Engineering community recognized this model. The Department of Defense adopted it as the basis for their new programming language named "Ada". The purpose wasn't just to get a new language. It wasn't just to get a library of reusable software components. The real breakthrough was to characterize this library of reusable components as a library of programmer defined complex data "types" and "operations" that could be performed on these types.

Over the subsequent two decades, we have tended to rename and play around with this idea without really changing it. Object Oriented languages renamed "type" to "class" and introduced a few new ideas like inheritance. However, without some really big force like the DOD pressing everyone to agree on a common programming model, programmers preferred to fight over the idiosyncratic features of their favorite programming language rather than emphasize their common ground.

E Pluribus Unum

The .NET Framework doesn't simply support the execution of programs written in your choice of 20+ different languages. It supports the development of a single program made up of 20 components each written in a different language. It must be possible for a program written in any Framework language to call routines written in any other language. More importantly, programs written in any language need to understand the types, fields, operations, and arguments in reusable library components written in any other language.

Microsoft is big enough in the software industry to command attention. The inherent nature of the problem the Framework set out to solve means that everyone's personal favorite language features are irrelevant. If you prefer to program your particular component in Perl that is up to you. Your decision should have no effect on what language I choose to write my program in, even if I have to use the services of your component.

That means, however, that the documentation of the library of reusable components has to describe its contents without regard to the language any component is written in and without regard to any language that might use it. Microsoft decided to update the original DOD-Ada model and describe code as a library of "data types" and "operations" on those types.

If someone has written a suitable version of the Date and TimeInterval code, including fine points (such as the observation that when a time interval pushes you past late October, EDT becomes EST and you gain an hour), then a Perl, JavaScript, Visual Basic, or J# program should be able to create objects of these types and perform operations on them. Now one language may call something a "class", another may call it a "type", another may call it a "record", and another may call it a "struct". In the library it is the same thing no matter what syntax the individual language may prefer to use.

Get the H Out

Traditionally, programming languages have defined complex data types and functions that operate on them in a special type of source file called a "header" or "include" file. In languages derived from C, they are referred to as *.h files based on the Unix naming convention.

To use these traditional programming techniques, any shared reusable routine stored in a common library would have to create source declarations for its types and operation functions in every programming language that any potential client program could use.

Java also provided a solution to this problem, but Ada was really the first language to come up with the idea. When the library routine is compiled, the system embeds a binary language-independent representation of its data types and operation functions in Metadata that is output by the compiler. Each .NET Framework compiler is then modified to read Metadata from library Assemblies to get the needed definitions rather than requiring language-specific source header versions of the same information.

Constraints

Consider a rather simple data structure that contains a date expressed as month, day, and year. It has to be expressed in some language, so choose Java:

class SimpleDate {
    int month;
    int day;
    int year;
}

If this is compiled and placed in a library, then it defines a data type named "SimpleDate" that has three fields and no operations.

The three fields are all declared to be integers, but the problem imposes constraints on the values that can meaningfully be stored in each field. The "month" field, for example, would have to have a value between 1 and 12. The "day" field would have to be between 1 and 31.

However, the combination of month=4 and day=31 doesn't work. April 31 breaks the "Thirty days has September, April, ..." rule. The rule for Feb. 29 is even more complicated.

So the underlying problem imposes constraints on the values that can be stored in individual fields, and on the combination of values that fields can have. These constraints can be enforced by defining functions that must be used to set a value in the object.

class SimpleDate {
    int month;
    int day;
    int year;
    void set(int m, int d, int y) {
        // Check range of each argument
        // Enforce the "Thirty days has September" rule
        // For February 29, check for leap year
        ...
        month=m; day=d; year=y;
    }

    void tomorrow() {
        // Increment date to next day
    }
}

When this code is compiled and stored in the library, its Metadata defines a type called "SimpleDate". When you create an object of this type, there are two operations defined on that object. The operation "set" takes three integer arguments. The operation "tomorrow" takes no arguments.
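Java's own reflection API gives a feel for what a compiler can read back out of a compiled type without any header file. The sketch below is an analogy, not a description of the Framework's Metadata format. It repeats the SimpleDate skeleton with placeholder bodies, and MetadataDemo and describeOperations are invented names.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

final class SimpleDate {
    int month;
    int day;
    int year;

    void set(int m, int d, int y) {   // validation omitted in this skeleton
        month = m; day = d; year = y;
    }

    void tomorrow() {                 // placeholder body, no month rollover
        day++;
    }
}

final class MetadataDemo {
    // Lists each operation of a compiled type as name(argument types).
    static Set<String> describeOperations(Class<?> type) {
        Set<String> signatures = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            StringBuilder sb = new StringBuilder(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getSimpleName());
            }
            signatures.add(sb.append(')').toString());
        }
        return signatures;
    }
}
```

For SimpleDate.class the result contains "set(int, int, int)" and "tomorrow()", recovered from the compiled class rather than from its source.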

Obviously a real library routine has to be filled in with code. The tomorrow operation adds one to the day, but wraps at the end of the month and wraps to the next year on Dec. 31.
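One way the bodies might be filled in is sketched below. It is not the author's code. It assumes the Gregorian leap year rule, starts every new object at a valid date, reports bad input with IllegalArgumentException, and uses the hypothetical name ValidatedDate to keep it distinct from the skeleton above.

```java
final class ValidatedDate {
    // Assumption: a new object starts at a valid date rather than 0/0/0.
    private int month = 1;
    private int day = 1;
    private int year = 1;

    // Gregorian rule: every 4th year, except centuries not divisible by 400.
    static boolean isLeapYear(int y) {
        return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    }

    // "Thirty days has September, April, June, and November."
    static int daysInMonth(int m, int y) {
        switch (m) {
            case 4: case 6: case 9: case 11:
                return 30;
            case 2:
                return isLeapYear(y) ? 29 : 28;
            default:
                return 31;
        }
    }

    void set(int m, int d, int y) {
        if (m < 1 || m > 12) {
            throw new IllegalArgumentException("month out of range: " + m);
        }
        if (d < 1 || d > daysInMonth(m, y)) {
            throw new IllegalArgumentException("day out of range: " + d);
        }
        month = m; day = d; year = y;
    }

    void tomorrow() {
        if (day < daysInMonth(month, year)) {
            day++;                                // ordinary day
        } else if (month < 12) {
            day = 1; month++;                     // wrap to the next month
        } else {
            day = 1; month = 1; year++;           // wrap to the next year
        }
    }

    @Override
    public String toString() {
        return month + "/" + day + "/" + year;
    }
}
```

Because the fields are private and every change goes through set() or tomorrow(), an object of this type can never hold April 31 or a February 29 in a year that is not a leap year.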

Java is an object oriented language whose syntax makes it clear that set() and tomorrow() are members of the SimpleDate class. Older languages, however, don't have a syntax to express the idea that functions are members of a data structure. So when this library code is used in an Object Oriented client program, the syntax will look something like:

    SimpleDate independence = new SimpleDate();
    independence.set( 7, 4, 1776);

This code creates an object named "independence" of the type SimpleDate and then calls the set method of the "independence" object to establish a date of July 4, 1776. Older programming languages have functions that appear to be independent and require that "independence" be an argument of the set function. The same statement then looks something like:

    SimpleDate_set( independence, 7, 4, 1776);

The library routine remains the same no matter how simple or difficult the client programming language makes the call.

Reference and Value Types

Java defines a small set of primitive "value" types: byte, short, char, int, long, float, double, and boolean. When you define a variable with one of these types, the system allocates just enough storage to hold the value itself. For example, if you create an array of 100 four byte integers, the space required to hold those 100 values occupies only 400 bytes. Individual value variables are allocated on the program call stack.

In Java, every other data type is a reference type. This means that the variable only contains a "reference" (a pointer) to the value of the variable. The value itself is an object that is dynamically allocated in the "heap" area of memory.

The difference between "value" and "reference" also shows up in the way that arguments are passed to a function. Different programming languages have different conventions. For the Framework to support all languages, it has to support complex value types.
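In Java the difference in argument passing can be seen in a few lines. The names Counter and PassingDemo are made up for this sketch. Strictly speaking Java copies every argument; for a reference type, what gets copied is the reference, so the caller and the function end up sharing one object.

```java
final class Counter {
    int count;                            // a reference type with one field
}

final class PassingDemo {
    // A value type argument: the function works on its own copy.
    static void incrementValue(int n) {
        n++;                              // the caller's variable is untouched
    }

    // A reference type argument: the function reaches the caller's object.
    static void incrementThroughReference(Counter c) {
        c.count++;                        // the change is visible to the caller
    }
}
```

After int x = 5 and incrementValue(x), x is still 5. After Counter c = new Counter() and incrementThroughReference(c), c.count is 1.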

For example, the SimpleDate type example given above could be coded in C# as either a value or a reference type. If it is defined as a value type, then the three integers (month, day, and year) would be allocated on the stack (for local variables and parameters of type SimpleDate defined inside a function) or embedded inside another type (if the variable is a field of some other class or structure).

Value types are simpler and more efficient. However, if a library service requires reference types, then any value type can easily be converted to a corresponding reference type. This is called "boxing". The Framework dynamically allocates memory from the heap that can hold the fields of the type. It then copies the value of the value object into the newly allocated space, and returns a reference to the newly allocated storage.
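Java has a limited form of the same thing, called autoboxing, which applies only to its built-in primitive types. The sketch below (BoxingDemo is a hypothetical name) shows the copy being made; it illustrates the idea and is not the Framework's implementation.

```java
final class BoxingDemo {
    // The int is copied into a heap-allocated Integer and a reference to it is returned.
    static Object box(int value) {
        return value;                     // autoboxing: int to Integer
    }

    // Copies the value back out of the box.
    static int unbox(Object boxed) {
        return (Integer) boxed;           // fails with ClassCastException for any other type
    }
}
```

If v is 7 when box(v) is called, later changes to v do not affect the box, and unbox() still returns 7. The box holds its own copy of the value.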

Interoperability

Although it is possible for any routine written in one .NET programming language to call a routine written in any other language, problems can arise when someone defines a routine that takes arguments of a data type that other languages do not support. For example, a routine that requires an unsigned four byte integer as an argument cannot be called from languages that only support signed integers. Languages that are case sensitive can create variables and functions that cannot be unambiguously referenced by other languages where variable names are case insensitive. The Framework defines a subset of types and a set of conventions that guarantee that your program can be called by any other Framework language. If you go outside that subset, your code may not be available to all programmers.
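Java is one of the languages with only signed integers, so it makes a convenient illustration of the problem. The sketch uses two standard library methods, Integer.parseUnsignedInt and Integer.toUnsignedLong; the class UnsignedInterop and its method names are invented for the example.

```java
final class UnsignedInterop {
    // Accepts text for an unsigned four byte value and stores the same 32 bits in a signed int.
    static int fromUnsignedText(String text) {
        return Integer.parseUnsignedInt(text);
    }

    // Recovers the intended magnitude by widening the bits to an eight byte long.
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }
}
```

The unsigned value 4000000000 does not fit in a signed four byte integer. fromUnsignedText("4000000000") returns -294967296, and only asUnsigned(-294967296) gets back to 4000000000. A caller that forgets the second step silently computes with a negative number, which is the kind of trap the common subset is meant to avoid.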

The Problems Weren't All Solved, They Were Renamed

Every Java programmer has been frustrated by the uncertainties caused by CLASSPATH and the rules for locating libraries of binary class files. The Framework solves none of these problems. By repackaging everything, it exposes the same set of problems with a new set of terms.

Output from the .NET compilers is linked to form an EXE or DLL. Either form of program module contains a bunch of MSIL units created by the compilers, and a composite metadata directory listing all the types, functions, and arguments that the EXE or DLL either defines or uses.

The .NET name for an EXE or DLL is an "Assembly". Now technically an Assembly can contain more than one file. It could contain an EXE and a DLL, or two DLLs, or a DLL and some associated resource data files. You can also go for a swim in Long Island Sound in January; it's just not particularly comfortable. Visual Studio.NET can only produce single file assemblies, and that is all you should expect to see under normal circumstances.

When you run an application program, you load one EXE and any DLLs that it references. In Unix, DOS, and traditional Windows, the system searches for the EXE and DLL in directories defined by the PATH environment variable. In recent years, however, it has become more common for Windows applications to be installed as a collection of files in a single directory that is not added to the PATH.

General purpose common library routines are installed with a system utility and are managed by Windows in something called the Global Assembly Cache (GAC). More casual libraries, however, are simply installed in directories on disk.

It is trivial for an application program to find any Assemblies (DLLs) that are in the same directory as the EXE. Any other libraries that the application uses have to be listed in a configuration file for the application, written in XML and packaged with the EXE. Every executable program has its own "CLASSPATH" defined in its own deployment descriptor XML file. How these deployment descriptor files are created, and how they are properly configured on end user machines, remains an unresolved problem.

Java and J# Features

Java Platform: Java source is compiled to byte code in binary class files.
J#: Java source is compiled to MSIL.

Java Platform: Binary class files can be stored individually on disk or combined in a zip archive (often with a ".jar" extension).
J#: Compiler output must be linked into EXE or DLL units called "assemblies". Real Java binary class files can be translated to MSIL, but they must be linked into DLL assemblies before they can be used.

Java Platform: Programs execute in a Java Virtual Machine.
J#: Programs execute under the CLR.

Java Platform: 100% pure Java, although C++ "native" routines can be added to the system libraries.
J#: Java can call or be called by routines written in 20+ other languages. Unmanaged COM code can also be called through generated stubs.

Java Platform: Current version of Java is 1.4 with many new libraries supplied by Sun.
J#: J# supports emulation of the Java 1.1.4 language libraries, but it can also use any class in the standard .NET Framework library.

Java Platform: Class files are loaded individually when first used from directories and jar files in the CLASSPATH. However, specialized environments can store Class files elsewhere. Oracle's Java stores Class files in the database. The Browser downloads class and jar files from the Web Server.
J#: DLLs (assemblies) are loaded with the application from the current directory, Global Assembly Cache, and locations specified in an XML application deployment descriptor file. Assemblies can, in the architecture, be loaded from the Web based on a URL.

Java Platform: JSP supports Web Applications that embed units of Java source inside <% %> delimiters in an HTML page.
J#: ASP.NET supports Web Applications that embed the source of any .NET language inside <% %> delimiters in an HTML page, but the preprocessor only handles one programming language per page.

Conclusions

Managed code will, in the foreseeable future, run more slowly and require more resources than a well written conventional Windows program. However, memory will always get cheaper and CPUs will always get faster.

Microsoft has made a good first step at fully populating an extensive library of common data types and services. Now that the Framework has been released, market forces will drive other companies to expand the library of reusable code. Meanwhile, programmers have not gotten much smarter over the last few decades. If you want to do the best with the Human Resources that you have at hand, the .NET Framework may provide the excuse and opportunity to make significant upgrades in your campus or corporate Software Engineering procedures. This is the first new offering that can simultaneously offer improved code quality, faster development, and ease of use.
