
Harold W. Schranz

Research School of Chemistry,
Australian National University, Canberra, ACT 0200.
Harold.Schranz@anu.edu.au
B. How Big is the Internet?
1. Size Estimates
2. Growth of the Internet
C. Tools for Exploiting the Internet
1. E-mail
2. UseNet (Newsgroups)
3. Telnet
4. FTP & Archie
5. Gopher & Veronica
6. WAIS
7. The World Wide Web
D. Appendix
1. Notes
2. Further Information
3. Postscript
Today's Internet, a global resource connecting millions of users, began as an experiment over 20 years ago by the U.S. Department of Defence. While the networks that make up the Internet are based on a standard set of protocols, the Internet also has gateways to networks and services that are based on other protocols (Fig. 1).

Fig. 1 The Matrix: Internet, BitEARN, FidoNet, UUCP
(Matrix Inc.)
The concept that the Internet is not a network, but a collection of networks, means little to the end user. You want to do something useful: run a program, or access some unique data. You shouldn't have to worry about how it's all stuck together.
Consider the telephone system--it's an internet, too. Telecom, Optus, Pacific Bell, AT&T, MCI, British Telephony, Telefonos de Mexico, and so on, are all separate corporations that run pieces of the telephone system. They worry about how to make it all work together; all you have to do is dial.
2. The Internet and The Matrix
These categories fit inside each other (see Fig. 2):
the Matrix
includes the Consumer Internet,
which includes the Core Internet.
``It's like those Russian dolls,'' said John S. Quarterman, editor of Matrix News and Matrix Maps Quarterly, ``where you open up Yeltsin and find Gorbachev, and inside Gorbachev is Brezhnev.''

Fig. 2 The Matrix
(Matrix Inc.)
To find which category fits your site, apply these simple tests:
* If you can send mail to an address in the Matrix, such as mids@tic.com, you're in the Matrix.
* If you can connect with FTP to ftp.ripe.net, or use Mosaic or Lynx to reach http://www.ripe.net, you are in the Consumer Internet.
* If your computer runs an FTP, Telnet, Gopher, WWW, or other interactive server that users outside your own organisation (your company, university, etc.) can use, then you're in the Core Internet.
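The three tests above can be sketched as a small decision function. This is only an illustration: the function name and the boolean parameters are hypothetical, and the answers to the tests (can you send mail, can you reach ftp.ripe.net, do you run a public server) are assumed to be supplied by the user rather than probed over the network.

```python
def classify_site(can_exchange_mail: bool,
                  can_use_interactive_services: bool,
                  offers_public_servers: bool) -> str:
    """Return the innermost category a site falls into.

    The categories nest, so a Core Internet site is also in the
    Consumer Internet, and a Consumer Internet site is also in the Matrix.
    """
    if offers_public_servers:
        # Runs FTP, Telnet, Gopher, WWW or similar servers for outsiders.
        return "Core Internet"
    if can_use_interactive_services:
        # Can reach services such as FTP or the Web, but offers none.
        return "Consumer Internet"
    if can_exchange_mail:
        # Mail-only connectivity.
        return "Matrix"
    return "outside the Matrix"


print(classify_site(True, True, False))
```

The checks run from the innermost category outwards, so the first test that succeeds names the smallest "doll" that contains the site.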
Starting 15 October 1994, Matrix Information and Directory Services and Texas Internet Consulting sent survey questionnaires by electronic mail to most of the domains representing organizations on the Internet, and tabulated the responses received through 15 December 1994. They received 1468 usable responses and used them to estimate the sizes of the Internet and the Matrix as of October 1994.
These are estimates, not exact and definitive figures. However, they are based on a large sample of the organizations (companies, universities, governmental agencies, individuals, etc.) on the Internet. Comparing the domain names of the responses with those of the original survey list, they calculated a confidence interval of about 38 percent. These are probably the only estimates of the size of the Internet that have any associated confidence interval at all.
Estimates of size of the Internet as of October 1994:
* Core Internet ~ 7.8 million users (people) of ~ 2.5 million computers that can provide interactive services such as Telnet (remote login), FTP (file transfer) or WWW (hypertext).
* Consumer Internet ~ 13.5 million users of ~ 3.5 million computers that can use the interactive services supplied by the Core Internet (e.g. people who can use Mosaic or Lynx to browse the World Wide Web).
* Matrix ~ 27.5 million users who can exchange electronic mail with other users in the Matrix.
Based on the average of various measures (which show growth rates from 55% to 135%), it is estimated that the Internet has been doubling every year and has been growing exponentially since 1988. Each year there are as many new people on the Internet as all the people on the Internet the year before. That's why it is important to cite a date for any estimate of the size of the Internet. ``Here today, lots more tomorrow,'' said Quarterman. ``We'll be measuring it as it grows.'' Extrapolation of current data predicts that the whole planet will be on the Internet by 2003.
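The arithmetic behind such an extrapolation can be sketched as follows. Several things here are assumptions and not taken from the survey: the quoted rates are read as annual growth rates, the starting point is the Consumer Internet estimate of 13.5 million users, and the world population is taken as roughly 5.6 billion. The function names are illustrative only.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed to double at a given annual growth rate (1.0 means 100%)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def years_to_reach(current: float, target: float,
                   annual_growth_rate: float = 1.0) -> float:
    """Years of steady exponential growth needed to go from current to target."""
    return math.log(target / current) / math.log(1 + annual_growth_rate)


WORLD_POPULATION = 5.6e9          # assumption: rough mid-1990s figure
CONSUMER_USERS_OCT_1994 = 13.5e6  # Consumer Internet estimate quoted above

years = years_to_reach(CONSUMER_USERS_OCT_1994, WORLD_POPULATION)
print(f"About {years:.1f} years after October 1994")

# Doubling times at the two extremes of the quoted range of rates.
for rate in (0.55, 1.35):
    print(f"{rate:.0%} per year doubles in {doubling_time_years(rate):.2f} years")
```

Under these assumptions, doubling every year takes a little under nine years to reach the whole population, which is consistent with the 2003 figure. Starting from a different estimate, or using a slower rate, moves the date by a year or more.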
The basic concepts behind e-mail parallel those of regular mail. You send mail to people at their particular addresses. In turn, they write to you at your e-mailbox address. You can subscribe to the electronic equivalent of magazines and newspapers (mailing lists). There is even electronic junk mail.
E-mail has two distinct advantages over regular mail:
- speed.
- ability to access databases and file libraries.
E-mail also has advantages over the telephone:
- You send your message when it's convenient for you.
- Your recipient responds at their convenience.
- E-mail across the country or around the world is far cheaper.
Electronic mail acts as a personal connection to the world of the Internet.
The basic building block of Usenet is the newsgroup, which is a collection of messages with a related theme (on other networks, these would be called conferences, forums, bboards or special-interest groups).
There are now more than 4,500 of these newsgroups.
Some systems let you compile your own "reading list" so that you only see messages in conferences you want.
Newsgroup names start with one of a series of broad topic names. For example, newsgroups beginning with "comp." are about particular computer-related topics. These broad topics are followed by a series of more focused topics (so that comp.unix groups are limited to discussion about Unix).
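The naming scheme can be illustrated by splitting a newsgroup name on its dots. This is a minimal sketch: the function name and the returned field names are hypothetical, and comp.unix.questions is used only as an example name.

```python
def parse_newsgroup(name: str) -> dict:
    """Split a newsgroup name into its broad topic and narrower subtopics."""
    parts = name.strip().lower().split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a hierarchical newsgroup name: {name!r}")
    return {
        "top_level": parts[0],   # broad topic, e.g. "comp"
        "subtopics": parts[1:],  # progressively more focused topics
        # Every enclosing hierarchy, from broadest to narrowest.
        "parents": [".".join(parts[:i]) for i in range(1, len(parts))],
    }


info = parse_newsgroup("comp.unix.questions")
print(info["top_level"], info["parents"])
```

Reading the name from left to right moves from the broad topic to the specific one, which is why a reader interested in Unix can select everything under comp.unix at once.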
Telnet: A program that allows remote login to Internet resources. One step beyond electronic mail is the ability to control a remote computer using Telnet. This feature lets you virtually teleport anywhere on the network and use resources located at that (often physically very remote) host. Further, some hosts have gateways to other hosts, which have further gateways to still more hosts.
FTP or File Transfer Protocol is what to use to retrieve/archive a text file, software, or other item from/on a remote host. Normal practice in downloading/uploading to a public archive site is to ftp to the host you want and login as "anonymous". Some sites use the password "guest" while others require that you put in your network address as the password.
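The anonymous login procedure can be sketched with Python's standard ftplib module. This is an illustration under stated assumptions: the helper name fetch_anonymous, the default e-mail address and the example paths are hypothetical, and the caller is assumed to supply an already connected ftplib.FTP object.

```python
from ftplib import FTP  # used by callers to create the connection


def fetch_anonymous(ftp, remote_path: str, local_path: str,
                    email: str = "guest@example.org") -> None:
    """Log in as "anonymous" and download one file in binary mode.

    `ftp` is expected to behave like ftplib.FTP. The e-mail address is
    sent as the password, as many public archives request.
    """
    ftp.login(user="anonymous", passwd=email)
    with open(local_path, "wb") as out:
        # RETR asks the server to send the named file.
        ftp.retrbinary(f"RETR {remote_path}", out.write)
    ftp.quit()
```

A call might look like fetch_anonymous(FTP("ftp.example.org"), "pub/README", "README"), where the host name and file path are placeholders. For a site that expects the password "guest", pass email="guest".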
Archie is a collection of resource discovery tools that together provide an electronic directory service for locating information in an Internet environment. Originally created to track the contents of anonymous ftp archive sites, the archie service is now being expanded to include a variety of other online directories and resource listings.
Currently, archie tracks the contents of over 800 anonymous FTP archive sites containing some 10^6 (one million) files throughout the Internet. Collectively, these files represent well over 50 Gigabytes of information, with additional information being added daily. Anonymous ftp archive sites offer software, data and other information which can be copied and used without charge by anyone with a connection to the Internet.
Gopher: A gopher (or go-fer) is someone who fetches necessary items from many locations. Gopher is one of the best ways to locate information on and in the Internet.
Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerised Archives): offers a keyword search of most gopher-server menus in the entire gopher web. As Archie is to ftp archives, Veronica is to gopherspace. The result of a Veronica search is an automatically-generated gopher menu, customised according to the user's keyword specification. Items on this menu may be drawn from many gopher servers. Unlike Archie, the search results can connect you directly to the data source.
WAIS (Wide Area Information Servers) (pronounced ways) allows users to get information from a variety of hosts by means of a "client". The user tells the client, in plain English, what to look for out in dataspace. The client then searches various WAIS servers around the globe. The user tells the client how relevant each hit is, and the client can be sent out on the same quest again and again to find new documents. Client software is available for many different types of computers.
The World Wide Web (WWW), a project initiated by CERN, is a client-server network-based document delivery system (see Fig. 3) that links computers world-wide.

Fig. 3 The client-server network paradigm
The WWW world consists of documents and links. Indexes are special documents which, rather than being read, may be searched. The result of such a search is another ('virtual') document containing links to the documents found. The Web contains documents in many formats. Those documents which are hypertext (real or virtual) contain links to other documents, or places within documents. All documents, whether real, virtual or indexes, look similar to the reader and are contained within the same addressing scheme. To follow a link, a reader clicks with a mouse (or types in a number if he or she has no mouse). To search an index, a reader gives keywords (or other search criteria). These are the only operations necessary to access the entire world of data.
A single copy of a set of files (text, graphics, animations) can be shared across the Internet to multiple users by setting up a Web-server. In order to access the information one needs a computer which can run a Web-client program. The WWW browsers can access many existing data systems via existing protocols (FTP, NNTP) or via HTTP (hypertext transfer protocol) and a gateway. Providing information is as simple as running the Web-server and pointing it at an existing directory structure. The server automatically generates a hypertext view of your files to guide the user around.
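The idea of pointing a server at an existing directory can be tried with the http.server module from Python's standard library. This is a present-day stand-in chosen for illustration, not the server software discussed in this article, and the helper name start_directory_server is hypothetical.

```python
import functools
import http.server
import threading


def start_directory_server(directory: str, port: int = 0):
    """Serve `directory` over HTTP on localhost in a background thread.

    Port 0 lets the operating system pick a free port; the chosen port
    is available afterwards as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Requesting the root URL of such a server returns a generated page of links to the files in the directory, which corresponds to the automatically generated hypertext view described above. Call server.shutdown() followed by server.server_close() to stop it.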
Furthermore, any file available by anonymous FTP, or any Internet newsgroup, can be immediately linked into the WWW. The very small start-up effort is designed to allow small contributions. At the other end of the scale, large information providers may provide an HTTP server with full text or keyword indexing. This may allow access to a large existing database without changing the way that database is managed. Such gateways have already been made into Oracle(tm), WAIS, and Digital's VMS/Help systems, to name but a few. The WWW model gets over the frustrating incompatibilities of data format between suppliers and reader by allowing negotiation of format between a smart browser and a smart server. This should provide a basis for extension into multimedia, and allow those who share application standards to make full use of them across the WWW.
The popularity of the WWW is demonstrated by Figs. 4 and 5 which illustrate recent traffic (January-March 1995) on the RSC WWW Server.

Fig. 4 Daily traffic on the RSC WWW Server (curve is smoothing of raw traffic data)

Fig. 5 Number of files sent to each network domain for period in Fig. 4
* Information for this article was gleaned (purloined) from various sources (e.g. CERN, NCSA, MIDS) encountered in a quasi-random walk over the Internet.
* Estimates of the size and growth of the Internet are based on investigations by Matrix Information and Directory Services (MIDS). John S. Quarterman is editor of the MIDS publications, Matrix News and Matrix Maps Quarterly.
* John S. Quarterman and Smoot Carl-Mitchell, The E-Mail Companion: Communicating Effectively via the Internet and Other Global Networks, 1994.
* John S. Quarterman, The Matrix: Computer Networks and Conferencing Systems Worldwide, 1990.
* Glenn Ricart, The Mosaic Internet Browser, Computers in Physics, p. 249, Vol. 8, No. 3, May/Jun 1994.
* J. H. Krieger and D. L. Illman, Internet Offers Alternative Ways for Chemists to Hold Conferences, p. 29, C&EN December 12, 1994.
An on-line HTML version of this article is available at
http://rsc.anu.edu.au/~harry/WWW/overview.html
Room E109, Ext: 3773