Tuesday, October 25, 2011

Understanding the World Wide Web

This tutorial covers the basics of the World Wide Web, focusing on its technical aspects. After all, the Web is a technological phenomenon. Therefore it's useful to understand some of the fundamentals of how it works.

The World Wide Web is a system of Internet servers that supports hypertext and multimedia, giving access to several Internet protocols through a single interface. The World Wide Web is often abbreviated as the Web or WWW.

The World Wide Web was developed in 1989 by Tim Berners-Lee of the European Particle Physics Lab (CERN) in Switzerland. The initial purpose of the Web was to use networked hypertext to facilitate communication among its members, who were located in several countries. Word soon spread beyond CERN, and a rapid growth in the number of both developers and users ensued. In addition to hypertext, the Web began to incorporate graphics, video, and sound. The use of the Web has reached global proportions and has become a defining element of human culture in an amazingly short period of time.

In order for the Web to be accessible to anyone, certain agreed-upon standards must be followed in the creation and delivery of its content. An organization leading the efforts to standardize the Web is the World Wide Web Consortium (W3C). Take a look at the W3C Web site to get an idea of its activities. A lot of the material is technical because, after all, the Web is a technical phenomenon.

Protocols of the Web

The surface simplicity of the Web comes from the fact that many individual protocols can be contained within a single Web site. Internet protocols are sets of rules that allow for intermachine communication on the Internet. These are a few of the protocols you can experience on the Web:


HTTP (HyperText Transfer Protocol): transmits hypertext over networks. This is the protocol of the Web.
E-mail (Simple Mail Transfer Protocol, or SMTP): distributes e-mail messages and attached files to one or more electronic mailboxes.
FTP (File Transfer Protocol): transfers files between an FTP server and a computer, for example, to download software.
VoIP (Voice over Internet Protocol): allows delivery of voice communications over IP networks, for example, phone calls.

The Web provides a single, graphical interface for accessing these and other protocols. This creates a convenient and user-friendly environment. Once upon a time, it was necessary to know how to use protocols within separate, command-level environments. This meant you needed to know the text commands and type them out to make things happen. The Web is much easier, since it gathers these protocols together into a unified graphical system. Because of this feature, and because of the Web's ability to work with multimedia and advanced programming languages, the Web is by far the most popular component of the Internet.
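To give a feel for the text commands that a browser now handles on your behalf, here is a minimal Python sketch that builds the text of a simple HTTP request by hand. The host and path are borrowed from the U.S. Senate address used as an example elsewhere in this tutorial. The function name build_get_request is made up for this illustration, and nothing is actually sent over the network.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return followed by a line feed.
    return "\r\n".join(lines)


request_text = build_get_request("www.senate.gov", "/general/capcam.htm")
print(request_text)
```

A browser composes a message along these lines every time you follow a link, which is why you no longer need to type it yourself.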

Hypertext and links: the motion of the Web
The operation of the Web relies primarily on hypertext as its means of information retrieval. Hypertext is a document containing words that connect to other documents. These words are called links and are selectable by the user. A single hypertext document can contain links to many documents. In the context of the Web, words or graphics may serve as links to other documents, images, video, and sound. Links may or may not follow a logical path, as each connection is created by the author of the source document. Overall, the Web contains a complex virtual web of connections among a vast number of documents, images, videos, and sounds.

Producing hypertext for the Web is accomplished by creating documents with a language called HyperText Markup Language, or HTML. With HTML, tags are placed within the text to accomplish document formatting, visual features such as font size, italics and bold, and the creation of hypertext links.


<p> This is a paragraph that shows the underlying HTML code. <strong>This sentence is rendered in bold text</strong>. <em>This sentence is rendered in italic text.</em> </p>
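Links are also written as tags. As a rough sketch of how software can find the links inside a page, the Python example below uses the standard html.parser module to collect the target of every link tag. The sample page and its addresses are made up for this illustration, and LinkCollector is a hypothetical name.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample page; the link targets are placeholders.
sample_html = (
    '<p>Read the <a href="http://www.example.org/intro.html">introduction</a> '
    'or see the <a href="/images/map.png">map</a>.</p>'
)

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

The first link points to a full Web address, while the second is relative to the site that serves the page.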
HTML is an evolving language, with new tags being added as each new version of the language is developed and released. Nowadays, design features are often separated from the content of the HTML page and placed into Cascading Style Sheets (CSS). This practice has several advantages, including the fact that an external style sheet can centrally control the design of multiple pages. The World Wide Web Consortium (W3C), led by Web founder Tim Berners-Lee, coordinates the efforts of standardizing HTML. The W3C has also defined XHTML, a version of the language that is an application of the XML standard.

Pages on the Web
The backbone of the World Wide Web is made up of its files, called pages or Web pages, which contain information and links to resources, both text and multimedia, throughout the Internet.

Web pages can be created by user activity. For example, if you visit a Web search engine and enter keywords on the topic of your choice, a page will be created containing the results of your search. In fact, a growing amount of information found on the Web today is served from databases, creating temporary Web pages "on the fly" in response to user searches. You can see an example of such a page below, taken from the search engine Hakia. This page only exists as a result of a search.


Access to Web pages can be accomplished in all sorts of ways, including:

    Entering a Web address into your browser and retrieving a page directly
    Browsing through sites and selecting links to move from one page to another both within and beyond the site
    Doing a search on a search engine to retrieve pages on the topic of your choice
    Searching through directories containing links to organized collections of Web pages
    Clicking on links within e-mail messages
    Using apps on social networking sites or your mobile phone to access Web and other online content
    Retrieving updates via RSS feeds and clicking on links within these feeds.

Retrieving documents on the Web: the URL and Domain Name System

URL stands for Uniform Resource Locator. The URL specifies the Internet address of a file stored on a host computer, or server, connected to the Internet. Web browsers use the URL to retrieve the file from the server. This file is downloaded to the user's computer, or client, and displayed on the monitor connected to the machine. Because of this relationship between clients and servers, the Web is a client-server network.

Every file on the Internet, no matter what its protocol, has a unique URL. The host name in a URL is translated into a numeric address using the Domain Name System (DNS). The DNS is a worldwide system of servers that stores location pointers to Web sites. The numeric address, called the IP (Internet Protocol) address, is actually the "real" address of the server. Since numeric strings are difficult for humans to use, alphanumeric addresses are employed by end users. Once the translation is made by the DNS, the browser can contact the Web server and ask for a specific file located on its site.


For example, 207.46.192.254 is also www.microsoft.com.
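The translation step can be sketched in Python with the standard socket module, whose gethostbyname function asks the operating system's resolver (and through it, the DNS) for the numeric address of a host name. So that the example runs without a network connection, it substitutes a small stand-in table that reuses the address quoted above. The resolve function and sample_table are hypothetical names, and the real address may well have changed since this was written.

```python
import socket


def resolve(host_name, lookup=socket.gethostbyname):
    """Translate a host name into an IPv4 address string.

    `lookup` defaults to the operating system's resolver, which in turn
    consults the DNS; it can be swapped out for testing.
    """
    return lookup(host_name)


# Stand-in resolver so the example runs offline.
# The table reuses the address quoted in the text, which may be out of date.
sample_table = {"www.microsoft.com": "207.46.192.254"}
address = resolve("www.microsoft.com", lookup=sample_table.__getitem__)
print(address)

# With a live connection, resolve("www.microsoft.com") would query the real
# DNS and may return a different address.
```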

Anatomy of a URL


This is the format of the URL:
protocol://host/path/filename


For example, this is a URL from the site of the U.S. Senate of a live video stream sent by a camera pointed at the U.S. Capitol:
http://www.senate.gov/general/capcam.htm


This URL is typical of addresses hosted in domains in the United States. The structure of this URL is shown below.

    Protocol: http
    Host computer name: www
    Second-level domain name: senate
    Top-level domain name: gov
    Directory name: general
    File name: capcam.htm

Note how much information about the content of the file is present in this well-constructed URL.
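The same breakdown can be reproduced with Python's standard urllib.parse module. This is only a sketch: splitting the host name into exactly three parts happens to work for this address, but it is an assumption that would not hold for every URL.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.senate.gov/general/capcam.htm"
parts = urlsplit(url)

# The host name reads from most specific (left) to most general (right).
# Unpacking into three names assumes a host of the form name.domain.tld.
host_computer, second_level, top_level = parts.hostname.split(".")

# Split the path into its directory and file name.
directory, file_name = posixpath.split(parts.path)

anatomy = {
    "protocol": parts.scheme,
    "host computer name": host_computer,
    "second-level domain name": second_level,
    "top-level domain name": top_level,
    "directory name": directory.strip("/"),
    "file name": file_name,
}
for label, value in anatomy.items():
    print(f"{label}: {value}")
```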


Several generic top-level domains (gTLDs) are common in the United States, including .com, .edu, .gov, .net, and .org.

In addition, dozens of domain names have been assigned to identify and locate files stored on servers in countries around the world. These are referred to as country codes, and have been standardized by the International Organization for Standardization (ISO) as ISO 3166.

Additional top-level domain names were approved in 2000 by the Internet Corporation for Assigned Names and Numbers (ICANN): .biz, .museum, .info, .pro (for professionals), .name (for individuals), .aero (for the aerospace industry), and .coop (for cooperatives). Some domain names have been marketed in unconventional ways, for example, .tv (the country code of Tuvalu) for sites that offer content similar to television broadcasts. In 2011, ICANN decided to open up domain names without restriction, including in any language or written script. The cost of establishing and maintaining a new name is high ($185,000 for the application fee alone), so the actual effect of this change will be limited.


As the technology of the Web evolves, URLs have become more complex. This is especially the case when content is retrieved from databases and served onto Web pages. The resulting URLs can have a variety of elaborate structures, for example,

http://spills.incidentnews.gov/incidentnews/FMPro?-db=images&-Format=maps.htm
&SpillLink=8&Subject=Waterway%20Closure%20Map&-SortField=EntryDate&
-SortOrder=descend&-SortField=EntryTime&-SortOrder=descend&-Token=8&
-Max=20&-Find

The first part of this URL looks familiar. What follows are search elements that query the database and determine the order of the results. As a growing number of databases serve content to the Web, these types of URLs are appearing more commonly in your browser's address window.
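To see the search elements one by one, the URL can be taken apart with Python's standard urllib.parse module. This is a sketch for inspection only: it shows the name and value pairs in the URL, and what each one means to the database behind the site is not something the URL itself reveals.

```python
from urllib.parse import urlsplit, parse_qsl

# The URL from the text, rejoined into a single line.
url = (
    "http://spills.incidentnews.gov/incidentnews/FMPro?-db=images&-Format=maps.htm"
    "&SpillLink=8&Subject=Waterway%20Closure%20Map&-SortField=EntryDate&"
    "-SortOrder=descend&-SortField=EntryTime&-SortOrder=descend&-Token=8&"
    "-Max=20&-Find"
)

parts = urlsplit(url)
# keep_blank_values=True keeps fields such as "-Find" that carry no value.
fields = parse_qsl(parts.query, keep_blank_values=True)

print(parts.path)
for name, value in fields:
    print(f"{name} = {value!r}")
```

Everything after the question mark is the query. Note that %20 is decoded to a space, so the Subject field reads "Waterway Closure Map".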
Programming languages and environments


The use of programming languages beyond HTML extends the capabilities of the Web. They are used to write software, process Web forms, fetch and display data, and perform all kinds of advanced functions. It is difficult to talk about these languages without getting into too much technical jargon, but here is an attempt. What follows is a brief guide to some of the more common languages in use on the Web today.


CGI (Common Gateway Interface) refers to a specification by which programs can communicate with a Web server. A CGI program, or script, is any program designed to process data that conforms to the CGI specification. The program can be written in any programming language, including C, Perl, and Visual Basic Script (VBScript). In the early days of the Web, CGI scripts were commonly used to process a form on a Web page. Perl is popular with Google, and is also the language of the Movable Type blog platform.
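As a minimal sketch of the idea, the Python program below follows the CGI convention: the Web server hands form data to the program through environment variables such as QUERY_STRING, and the program writes back headers, a blank line, and then the page. The form field called name and the function handle_request are assumptions made up for this example.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the environment a Web server would provide."""
    # Under CGI, form data sent with a GET request arrives in QUERY_STRING.
    form = parse_qs(environ.get("QUERY_STRING", ""))
    name = form.get("name", ["visitor"])[0]
    # escape() keeps user input from being interpreted as HTML tags.
    body = f"<p>Hello, {escape(name)}!</p>"
    # A CGI response is headers, a blank line, then the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # When run by a Web server, os.environ holds the CGI variables.
    print(handle_request(os.environ), end="")
```

Calling handle_request({"QUERY_STRING": "name=Ada"}) returns the header followed by a one-line greeting, which is what the server would pass back to the browser.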


Active Server Pages (ASP): Developed by Microsoft, ASP is a programming environment that processes scripts on a Web server. The programming language VBScript is often used for the scripting. Lightweight programs can be written with this language. Active Server Pages end in the file extension .asp. For an example, check out Databases and Indexes at the University at Albany Libraries.

.NET framework: Also developed by Microsoft, this development framework is a more powerful one than ASP for writing applications for the Web. Programming languages include C# and VB.NET. ASP.NET is a related environment, producing pages with the file extension .aspx. The Microsoft site is a good example of a site created with the .NET framework.


PHP: This is another server-based language. It is frequently the language used to write open source (for example, community-created) programs found on the Web, including MediaWiki (the software that runs Wikipedia) and the popular blog software WordPress. While PHP can be installed on Windows servers, it is most commonly used in the Linux server environment.


Java/Java Applets: Java is a programming language similar to C++. Developed by Sun Microsystems, the aim of Java is to create programs that will be platform independent. The Java motto is, "Write once, run anywhere." A perfect Java program should work equally well on a Windows, Apple, Unix, or Linux server, and so on, without any additional programming. This goal has yet to be realized. Java can be used to write applications for both Web and non-Web use.


Web-based Java programs have taken two main forms. Java applets are small Java programs referenced from within a Web page that are downloaded from a server and run in a Java-compatible Web browser. Java servlets, by contrast, run on the Web server. A JavaServer Page, which has the file extension .jsp, is turned into a servlet on the server.


JavaScript is a very popular programming language created by Netscape Communications. Small programs written in this language are embedded within a Web page, or fetched externally from within the page, to enhance the page's functionality. Examples of JavaScript include drop-down menus, image displays, and mouse-over interactions. The drop-down menus on the site of the UCLA Library shown below are a good example: when you hover your mouse over the menu item, a set of sub-menus opens up below. 


XML: XML (eXtensible Markup Language) is a mark-up language that enables Web designers to create customized tags to provide functionality not available with HTML alone. XML is a language of data structure and exchange, and allows developers to separate form from content. With XML, the same content can be formatted for multiple applications. In May 1999, the W3C announced that HTML 4.0 had been recast as an XML application called XHTML.
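The short Python sketch below illustrates both points, customized tags and one set of content formatted in more than one way, using the standard xml.etree.ElementTree module. The tag names and the sample records are invented for this example.

```python
import xml.etree.ElementTree as ET

# Made-up document: the tag names <catalog>, <item>, <title>, and <year>
# are invented for this example, which is exactly what XML permits.
document = """
<catalog>
  <item id="a1"><title>First Sample</title><year>1999</year></item>
  <item id="a2"><title>Second Sample</title><year>2011</year></item>
</catalog>
"""

root = ET.fromstring(document)

# The same content can feed different presentations.
as_text = [
    f"{item.findtext('title')} ({item.findtext('year')})"
    for item in root.iter("item")
]
as_records = [
    {"id": item.get("id"), "year": int(item.findtext("year"))}
    for item in root.iter("item")
]

print(as_text)
print(as_records)
```

The first list is suited to display on a page, while the second is suited to exchange with another program.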


AJAX stands for Asynchronous JavaScript and XML. This technique is used to create interactive Web applications. Its premise is that it sends data to the browser behind the scenes, so that when it is time to view the information, it is already "there." Google Maps is a well-known example of AJAX. A different kind of example can be found with SurfWax LookAhead, an RSS search tool that retrieves feeds as you type your search.


SQL (Structured Query Language): This is a language that focuses on extracting data from databases. Programmers write statements called queries that retrieve data from the tables in the database. Some Web sites are created extensively or entirely from data stored in database tables. You can often tell that a SQL query has produced data on a page by the presence of a question mark (?) and a record number in the URL, as the example below illustrates. 
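Here is a small sketch of a query, run from Python against an in-memory SQLite database using the standard sqlite3 module. The table name, column names, and rows are all sample data made up for this illustration.

```python
import sqlite3

# In-memory database with a made-up table of pages; all rows are sample data.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, topic TEXT)"
)
connection.executemany(
    "INSERT INTO pages (id, title, topic) VALUES (?, ?, ?)",
    [
        (1, "Oil Spill Overview", "environment"),
        (2, "Waterway Closure Map", "environment"),
        (3, "Library Hours", "campus"),
    ],
)

# A query: select the rows that match a condition, in a chosen order.
# The ? placeholder keeps user-supplied values separate from the SQL text.
rows = connection.execute(
    "SELECT id, title FROM pages WHERE topic = ? ORDER BY title",
    ("environment",),
).fetchall()
connection.close()

print(rows)
```

A database-driven Web site would take rows like these and place them into a page template before sending the result to the browser.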



Mashups


Programs on the Web can be flexible. Sometimes they are combined with each other to form enhanced presentations. These are known as mashups.


A mashup is a Web application or Web page that combines data from two or more external sources. Mashups give you access in one place to information available in multiple places.
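The combining step at the heart of a mashup can be sketched in a few lines of Python. Both data sources here are made-up samples standing in for what would normally be fetched from two separate Web services, and mash_up is a hypothetical function name.

```python
# Source 1 (made-up sample data): recent events keyed by place name.
events = [
    {"place": "Town A", "magnitude": 4.1},
    {"place": "Town B", "magnitude": 5.3},
]

# Source 2 (made-up sample data): map coordinates for place names.
coordinates = {
    "Town A": (34.0, -118.2),
    "Town B": (37.8, -122.4),
}


def mash_up(events, coordinates):
    """Combine each event with its map position, skipping unknown places."""
    combined = []
    for event in events:
        position = coordinates.get(event["place"])
        if position is not None:
            latitude, longitude = position
            combined.append(
                {**event, "latitude": latitude, "longitude": longitude}
            )
    return combined


markers = mash_up(events, coordinates)
print(markers)
```

Each combined record now carries information from both sources, ready to be drawn as a marker on a map.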


There are all kinds of mashups on the Web. One example is Earthquakes in the Last Week, a mashup derived from data from the U.S. Geological Survey along with Google Maps. Another is Mashpedia, a mashup of the Wikipedia encyclopedia along with current information gathered from the social Web.

Last but not least: Applications (apps)


Applications, commonly called apps, are small programs that run within various online environments. These programs allow you to enjoy functionalities that enhance your experience within that environment.


Social networking sites often make use of apps. For example, Facebook is well-known for featuring thousands of apps created by Facebook or outside developers. These apps allow you to play games, shop, form issues-based communities, find family or classmates, etc.


Mobile phones are another environment within which apps are both popular and useful. In fact, no decent mobile phone these days comes without the option to add apps. A good example is the iPhone, which offers hundreds of thousands of apps in all sorts of areas, from work and education to travel, lifestyle, entertainment, and so on. Also take a look at the Android Market site to browse the apps available for Android phones. It is safe to say that apps make the mobile phone what it is today.


Apps are a very fast-growing area of the networked experience. Some observers believe that apps will be a focus of developments in the online world in the coming years.