Raw HTTP Web Requests

Author: Carl Sassenrath

Many examples show how to use REBOL to read pages from the web over HTTP. For example, to read a text (HTML) page:


    page: read http://www.rebol.com/rebolintro.html

and to read binary data, such as an image:


    image: read/binary http://www.rebol.com/graphics/reblets.jpg

and you can even run REBOL programs directly from a web site:


    do http://www.rebol.com/speed.r

However, it is also educational to access web sites directly over TCP (the lower-level Internet protocol that HTTP runs on). When you access a site this way, you can see the normally hidden HTTP header information that the server returns. It can be quite interesting.

To do a "raw" read of a web page, you can use a script like this. Just copy and paste this into your editor or into the REBOL console and run it.


    REBOL [title: "Raw HTTP Read"]

    port: open tcp://www.rebol.com:80
    insert port {GET / HTTP/1.1
    Host: www.rebol.com:80
    User-Agent: REBOL/Core
    Connection: close

    }
    result: copy port
    close port
    print result

This code opens a connection to the web server at www.rebol.com on port 80 (the standard web port). It then sends an HTTP GET request (using INSERT), which asks the web server for the home page of the site. The request also includes the host name of the server (very important for virtual web sites), the name of the accessing program (REBOL), and a header asking that the connection be closed once the request has been processed. COPY then reads the server's reply, and the result is printed to the console.

When running this example, be sure not to put any extra spaces before the lines in the GET request string. Leading spaces become part of the request, and you may get an error message back from the web server.
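
One way to avoid stray leading spaces is to assemble the request from separate strings instead of a multi-line string literal. The following is a minimal sketch, not part of the original recipe; the word REQUEST is introduced here for illustration only. It is intended to build the same request text as the string literal above.

    REBOL [title: "Raw HTTP Read (assembled request)"]

    ; Build the request one line at a time. Each line ends with a
    ; newline, and the extra newline at the end is the blank line
    ; that marks the end of the request header.
    request: rejoin [
        "GET / HTTP/1.1" newline
        "Host: www.rebol.com:80" newline
        "User-Agent: REBOL/Core" newline
        "Connection: close" newline
        newline
    ]

    port: open tcp://www.rebol.com:80
    insert port request
    result: copy port
    close port
    print result

Because each line is a separate string, the way the script is indented in your editor no longer affects what is sent to the server.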

If all goes well, you'll get a result something like this:


    HTTP/1.1 200 OK
    Date: Mon, 19 Jul 2004 16:18:20 GMT
    Server: Apache
    Last-Modified: Sat, 10 Jul 2004 17:29:19 GMT
    ETag: "1d0325-2470-40f0276f"
    Accept-Ranges: bytes
    Content-Length: 9328
    Connection: close
    Content-Type: text/html

    <HTML>
    <HEAD>
    ... the rest of the home page...

What you are seeing is the HTTP header followed by the HTML contents of the web page (for www.rebol.com). The first line of the header tells you that the request was processed successfully (status 200, with no errors or warnings). A blank line separates the header from the page contents. Most of the rest of the header is fairly self-explanatory, but if you want to understand the details, you can find more in documents such as the HTTP 1.1 header definitions at the W3.org site.
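
The status line can also be picked out of the response in code. The following is a minimal sketch, assuming RESULT still holds the response text from the script above; the words MARK, STATUS, and CODE are introduced here for illustration only.

    ; Assumes RESULT holds the response text from the script above.
    ; MARK points at the blank line that ends the header, or is NONE
    ; if the response has no such blank line.
    if mark: find result "^/^/" [
        ; The status line is everything up to the first newline,
        ; for example "HTTP/1.1 200 OK".
        status: copy/part result find result newline

        ; Split the status line on spaces; the second item is the code.
        code: to-integer second parse status none

        print ["Status line:" status]
        print ["Status code:" code]
    ]

For the response shown above, STATUS would hold the first header line and CODE the number 200. Other servers may return a different code, such as a redirect, which this sketch only reports and does not act on.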

The header ends at the first blank line. To get the page contents only, without that blank line in front, add this line:


    page: find/tail result "^/^/"

Then you can save the HTML to a file with:


    write %home.html page

If the web page you fetch contains a REBOL script, you can actually execute it directly. Here is an example:


    REBOL []

    port: open tcp://www.rebol.com:80
    insert port {GET /speed.r HTTP/1.1
    Host: www.rebol.com:80
    User-Agent: REBOL/Core
    Connection: close

    }
    result: copy port
    close port
    do find result "^/^/"

This will run the standard REBOL speed test.
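
Note that DO runs whatever text follows the header, even if the server returned an error page instead of the script. The following is a minimal sketch of a more cautious version, assuming RESULT holds the response text from the script above; the words STATUS and BODY are introduced here for illustration, and the check for " 200" is a simple assumption about what a successful status line looks like.

    ; Assumes RESULT holds the response text from the script above.
    ; STATUS is the first line of the response (or all of it, if the
    ; response contains no newline at all).
    status: copy/part result any [find result newline tail result]

    ; BODY is the text after the blank line, or NONE if there is none.
    body: find/tail result "^/^/"

    ; Run the script only when there is a body and the status line
    ; reports code 200.
    either all [body find status " 200"] [
        do body
    ][
        print ["Not executed. Server said:" status]
    ]

As with any downloaded script, only run code from a site you trust.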

No Redirection or Proxy

A final note: The simple examples above do not follow redirected web pages or deal with other types of responses from the web server, nor do they handle proxy servers. If you need to do those things, just use the http:// scheme that was shown at the top of this document.


2006 REBOL Technologies