Raw HTTP Web Requests
Author: Carl Sassenrath
There are many examples that show how to use REBOL to read pages from the web using HTTP. For example, to read a text (HTML) page:
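A minimal sketch of such a read, assuming REBOL 2 and the REBOL home page as the target:

```rebol
; READ on an http:// URL returns the page contents as a string.
page: read http://www.rebol.com
print page
```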
and to read binary data, such as an image:
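A sketch of the binary case. The image URL and the local file name are assumptions chosen for illustration:

```rebol
; The /binary refinement returns the bytes unmodified, as a binary! value.
image: read/binary http://www.rebol.com/graphics/reb-logo.gif

; Save the bytes to a local file (hypothetical name).
write/binary %reb-logo.gif image
```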
and you can even run REBOL programs from sites with:
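For instance, something along these lines; the script path is an assumption, so substitute any URL that points at a REBOL script you trust:

```rebol
; DO on a URL reads the script from the site and evaluates it.
do http://www.rebol.com/speed.r
```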
However, it is also educational to access web sites directly using TCP, the lower-level Internet protocol that HTTP runs on. When you access a site this way, you can see the hidden HTTP header information that the server returns. It can be quite interesting.
To do a "raw" read of a web page, you can use a script like this. Just copy and paste this into your editor or into the REBOL console and run it.
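Here is a sketch of such a script, not a definitive listing. It assumes REBOL 2, and it assumes that COPY on a port opened this way returns the whole reply once the server closes the connection. The names port and data are just the ones chosen here.

```rebol
; Open a TCP connection to the web server on the standard web port.
port: open tcp://www.rebol.com:80

; Send the request. Each header line starts in the first column, and
; the empty line before the closing brace ends the HTTP header.
insert port {GET / HTTP/1.1
Host: www.rebol.com
User-Agent: REBOL
Connection: close

}

; Read the reply (assumed to arrive in full once the server closes
; the connection), then close our side of the port.
data: copy port
close port

print data
```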
This code opens a connection to the web server at www.rebol.com on port 80 (the standard web port). It then sends an HTTP GET command (using INSERT), which instructs the web server to return the home page for the web site. In addition, the command includes the host name of the server (very important for virtual web sites), the name of the accessing program (REBOL), and a request that the connection be closed after the command has been processed. The results are then printed to the console.
When running this example, be sure not to put any extra spaces before the lines in the GET command string, or you may get an error message back from the web server.
If all goes well, you'll get a result something like this:
What you are seeing is the HTTP header followed by the HTML contents of the web page (for www.rebol.com). The first line of the header, the status line, tells you that the request was processed OK (it had no errors or warnings). Most of the rest of the header is fairly self-explanatory, but if you want to understand the details, you can find more in documents like the HTTP 1.1 Header at the W3.org site.
To get the page contents only, add this line:
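A sketch of that step, assuming the reply is held in a string named data and that its line endings have already been converted to single newlines (which is assumed to be the case for a port not opened with /binary). A made-up sample reply stands in for the real one so the snippet runs on its own:

```rebol
; Stand-in for a real reply: header lines, an empty line, then the page.
data: {HTTP/1.1 200 OK^/Content-Type: text/html^/^/<html>Hello</html>}

; The header ends at the first empty line; keep only what follows it.
; FIND/TAIL returns none if no empty line is present.
data: find/tail data "^/^/"

print data
```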
Then you can save the HTML to a file with:
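For example, with a hypothetical file name and a short stand-in string for the page contents:

```rebol
; Stand-in for the HTML extracted from the reply.
data: {<html>Hello</html>}

; Write the string to a file in the current directory.
write %page.html data
```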
If the web page you fetch contains a REBOL script, you can actually execute it directly. Here is an example:
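A sketch under several assumptions: the script path /speed.r is a guess at where the speed test lives, the server is assumed to return the script as plain, unencoded text (no chunked or compressed transfer), and COPY is assumed to return the full reply as in a raw read under REBOL 2. Only do this with scripts you trust.

```rebol
; Request the script over a raw TCP connection.
port: open tcp://www.rebol.com:80
insert port {GET /speed.r HTTP/1.1
Host: www.rebol.com
User-Agent: REBOL
Connection: close

}
data: copy port
close port

; Skip the HTTP header, then evaluate the rest as REBOL source.
do find/tail data "^/^/"
```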
This will run the standard REBOL speed test.