Chapter 19
URLs and JavaScript
A Crash Course in URLs
JavaScript features several properties and methods related to URLs. Before discussing
JavaScript’s support, a general description of URLs is in order.
A URL is a Uniform Resource Locator, a standard way to specify the location of an
electronic resource. Its definition is derived from concepts introduced by the World Wide
Web Global Information Initiative; it has been in use since 1990. URLs make Internet
resources available to different Internet protocols. When surfing the net, you often run
into URLs in your browser’s “location” box. Such URLs usually start with
“http:”, but other protocols such as FTP and Gopher are also supported. Even e-mail
addresses can be specified as URLs.
A URL is a very convenient, succinct way to direct people and applications to a file or
other electronic resource.
General URL Syntax
In general, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL consists of the name of the scheme being used, followed by a colon and a
scheme-specific string. Scheme names may contain the lowercase letters “a” to “z”,
digits, and the characters plus (“+”), period (“.”), and hyphen (“-”). For resiliency,
programs should treat uppercase letters as lowercase ones. For example, both HTTP and http
should be accepted. Examples of schemes are “http,” “ftp,” “gopher,” and
“news.” The scheme tells the application or person how to treat that specific
resource.
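The scheme rules above can be sketched in a few lines of JavaScript. This is only an illustration: the function name getScheme is hypothetical, and the pattern also accepts digits, which the URL specification permits in scheme names.

```javascript
// Hypothetical helper: returns the lowercased scheme of a URL string,
// or null if the string does not start with a scheme followed by a colon.
function getScheme(url) {
  // Letters, digits, plus, period, and hyphen, followed by a colon.
  // The "i" flag accepts uppercase letters, which are then lowercased.
  const match = /^([a-z0-9+.-]+):/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

console.log(getScheme("HTTP://www.geocities.com/")); // http
console.log(getScheme("mailto:account@site"));       // mailto
console.log(getScheme("no scheme here"));            // null
```

Lowercasing the result means callers can compare against “http” or “ftp” without worrying about how the URL was typed.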
Most schemes include two different types of information:
- the Internet machine where the resource resides
- the full path to that resource
In such URLs, the scheme and its colon are followed by two slashes (“//”) and the machine
address, whereas the machine address is separated from the full path by a single slash (“/”).
Therefore, the common format is:
scheme://machine.domain/full-path-of-file
As an exercise, let’s take a look at a simple URL:
http://www.geocities.com/SiliconValley/9000/index.html
The URL’s scheme is “http,” for the HyperText Transfer Protocol. The Internet
address of the machine is “www.geocities.com,” and the path to the specific
file is “SiliconValley/9000/index.html.” You will find that the path portion
sometimes ends with a slash. This indicates that the path is pointing to a directory
rather than a file. In this case, the server returns either a directory listing of all the
files or a default file, if one is available. The default filename is usually “index.html”
or “home.html,” but other variants are also used.
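As a rough sketch, the common format can be taken apart with ordinary string methods. The function name splitCommonUrl and the isDirectory property are made up for this illustration, and the code assumes the URL follows the scheme://machine.domain/full-path-of-file pattern exactly.

```javascript
// Hypothetical helper: splits a URL of the common form
// scheme://machine.domain/full-path-of-file into its parts.
function splitCommonUrl(url) {
  const marker = url.indexOf("://");
  if (marker === -1) {
    return null; // not in the common format
  }
  const scheme = url.slice(0, marker).toLowerCase();
  const rest = url.slice(marker + 3); // everything after the two slashes
  const slash = rest.indexOf("/");    // first slash ends the machine address
  const machine = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash + 1);
  // An empty path or a trailing slash points to a directory, not a file.
  const isDirectory = path === "" || path.endsWith("/");
  return { scheme: scheme, machine: machine, path: path, isDirectory: isDirectory };
}

const parts = splitCommonUrl("http://www.geocities.com/SiliconValley/9000/index.html");
console.log(parts.scheme);      // http
console.log(parts.machine);     // www.geocities.com
console.log(parts.path);        // SiliconValley/9000/index.html
console.log(parts.isDirectory); // false
```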
The URL Schemes
HyperText Transfer Protocol (HTTP)
HTTP is the Internet protocol specifically designed for use with the World Wide Web,
and therefore is most often seen by Web surfers. Its general syntax is:
http://<host>:<port>/<path>?<searchpart>
The host is the Internet address of the WWW server, such as www.geocities.com,
and the port is the port number to connect to. In most cases the port can be
omitted along with the colon delimiter, and it defaults to the standard “80.” The path
tells the server which file is requested. The searchpart, which follows the question
mark, is very important. It may be used to pass information on to the server, often to an
executable CGI script. It can also be referenced by other languages, including JavaScript,
as you will soon find out.
Another frequently used character is the pound sign (“#”). It is used for referencing
a named anchor. Anchors are often used on Web pages to enable linking from one section of
the page to another one.
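The parts of an HTTP URL can be inspected as follows. This sketch relies on the standard URL class found in current browsers and Node.js, which is an assumption on our part and not something this chapter covers. The function name describeHttpUrl and the sample port, path, search, and anchor values are made up for illustration.

```javascript
// Hypothetical helper: reports the parts of an http: URL.
// Assumes the standard URL class is available (modern browsers, Node.js).
function describeHttpUrl(text) {
  const url = new URL(text);
  return {
    host: url.hostname,
    // The port property is an empty string when the port is omitted,
    // so fall back to the standard HTTP port.
    port: url.port === "" ? "80" : url.port,
    path: url.pathname,
    searchpart: url.search.slice(1), // drop the leading "?"
    anchor: url.hash.slice(1)        // drop the leading "#"
  };
}

const info = describeHttpUrl(
  "http://www.geocities.com:8080/cgi-bin/search?query=javascript#results"
);
console.log(info.host);       // www.geocities.com
console.log(info.port);       // 8080
console.log(info.path);       // /cgi-bin/search
console.log(info.searchpart); // query=javascript
console.log(info.anchor);     // results
```

Note that the fallback to “80” is only meaningful for the http scheme; other schemes have other default ports.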
File Transfer Protocol (FTP)
FTP is commonly used for distributing and transmitting files over the Internet. Its
general syntax is:
ftp://<user>:<password>@<host>:<port>/<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
When contacting a site providing anonymous login, the user and password
may be omitted, along with the separating colon and the trailing at symbol (“@”). The host
and port are exactly the same as in the HTTP URL specification. The “<cwd1>/<cwd2>/.../<cwdN>”
refers to the series of “change directory” (cd in Unix) commands a client
must use to move from the main directory to the directory in which the desired file
resides. Since most servers use Unix operating systems, you can print the working
(current) directory by typing pwd at the command line. The name is the desired
file’s full name, as it is recognized by the operating system. The portion “;type=<typecode>”
allows you to specify the transmission mode (ASCII vs. binary). Most systems are not
configured to work properly with this trailing specification, and some are even misled by
it.
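To make the syntax concrete, here is a minimal sketch that assembles an FTP URL from its parts. The function name buildFtpUrl, the property names, and the host ftp.example.com are all hypothetical, and the typecode “a” (ASCII) is taken from the URL specification rather than from this chapter. The sketch does not encode special characters.

```javascript
// Hypothetical helper: assembles an FTP URL from its parts.
// Optional parts (user, password, port, dirs, typecode) are skipped when missing.
function buildFtpUrl(parts) {
  let url = "ftp://";
  if (parts.user) {
    url += parts.user;
    if (parts.password) {
      url += ":" + parts.password;
    }
    url += "@"; // the at symbol appears only when a user is given
  }
  url += parts.host;
  if (parts.port) {
    url += ":" + parts.port;
  }
  // One path segment per "change directory" step.
  for (const dir of parts.dirs || []) {
    url += "/" + dir;
  }
  url += "/" + parts.name;
  if (parts.typecode) {
    url += ";type=" + parts.typecode;
  }
  return url;
}

console.log(buildFtpUrl({ host: "ftp.example.com", dirs: ["pub", "docs"], name: "readme.txt" }));
// ftp://ftp.example.com/pub/docs/readme.txt
```

Given the caveat above about the trailing “;type=<typecode>” portion, leaving typecode out is probably the safer default.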
Gopher Protocol (Gopher)
The Gopher protocol is not important for JavaScript scripters. Its syntax is very
similar to HTTP’s:
gopher://<host>:<port>/<gopher-path>
Electronic Mail (Mailto)
The Mailto URL scheme is different from the previous three schemes in that it does not
identify the location of a file but rather someone’s e-mail address. Its syntax is also
quite different:
mailto:<account@site>
The account@site is the Internet e-mail address of the person you wish to mail
to. Most WWW browsers, including the leaders, Navigator and IE, support this scheme when
it appears in an HTML document.
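As a small illustration, a Mailto URL can be split into its account and site. The function name parseMailto and the address webmaster@example.com are hypothetical, and the pattern is deliberately simple; it is not a full e-mail address validator.

```javascript
// Hypothetical helper: extracts the account and site from a mailto: URL,
// or returns null if the string does not have the form mailto:<account@site>.
function parseMailto(url) {
  const match = /^mailto:([^@\s]+)@([^@\s]+)$/i.exec(url);
  return match ? { account: match[1], site: match[2] } : null;
}

const address = parseMailto("mailto:webmaster@example.com");
console.log(address.account); // webmaster
console.log(address.site);    // example.com
```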