Simple TCP example: HTTP web page transfer

Carl Sassenrath, CTO
REBOL Technologies
19-Apr-2008 17:58 GMT

Article #0129

(Updated 21-Apr-2008)

Here is a really simple example of how to do low-level TCP networking to fetch a web page from a server using the HTTP protocol.

Normally, to use HTTP you would use a simple line like:

data: read

It would handle things like redirection and HTTP errors. But, for our example below, we will show how to do low-level asynchronous TCP.

Important notes...

To begin, let me point out a few important things, then I will give you the full script, then I will explain how it works.

  1. The recent Unicode installment requires that we use binary for network transfers. Why? Because strings can be encoded in more than one way (UTF-8 is one such format). But don't worry, it's not hard to handle them.
  2. This code is tested under the latest alpha release (2.100.5 - coming soon to servers near you). It should work on older versions, but that has not been tested.
  3. The transfer is asynchronous by default. In other words, when you ask to read data from the network, the read function returns immediately, and a handler function will be called whenever new data arrives.

The code...

Ok, so here is the full script, all 50 lines of it:

REBOL [Title: "Tiny HTTP Reader"]

read-http: func [
    "Perform an HTTP transfer"
    url [url!]
    /local spec port
][
    spec: decode-url url
    spec/2: to-lit-word 'tcp
    port: open spec

    port/awake: func [event] [
        ;print ["Awake-event:" event/type]
        switch/default event/type [
            lookup [open event/port]
            connect [send-http-request event/port]
            wrote [read event/port]
            read  [
                print ["Read" length? event/port/data "bytes"]
                read event/port
            ]
            close [return true]
        ] [
            print ["Unexpected event:" event/type]
            close event/port
            return true
        ]
        false ; returned
    ]
    port ; return the port so the caller can wait on it
]

send-http-request: func [port] [
    write port to-binary ajoin [
        "GET " port/spec/path " HTTP/1.0" crlf
        "Host: " port/spec/host crlf
        crlf ; an empty line ends the HTTP header
    ]
]

print "reading..."
rp: read-http
wait [rp 10]
close rp
print to-string copy/part rp/data 4000

How it works...

The example starts with the main read function:

read-http: func [...

It decodes a URL into a block that we use to open a port:

spec: decode-url url
spec/2: to-lit-word 'tcp
port: open spec

You can add a probe to see what spec is:

[scheme: 'http host: "" path: "/builds/"]

The second value in the block is the scheme type, which we must change from HTTP to TCP for the port to be opened by the TCP scheme (a native device.)

The open function accepts several different types of specifications. Our spec block is just one possibility.
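
If you want to try that step by itself at the console, here is a minimal sketch. The URL is a placeholder I made up for illustration (it is not the address used in this article), and the exact probe output may differ between alpha releases.

```rebol
REBOL [Title: "Spec block sketch"]

; Hypothetical URL, used only for illustration.
spec: decode-url http://www.example.com/builds/
probe spec   ; a block of set-word and value pairs

; The second value in the block is the scheme.
; Replace it so that the TCP scheme opens the port.
spec/2: to-lit-word 'tcp
probe spec
```

Nothing is opened here, so this part is safe to run without a network connection.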

Also keep in mind that the port does not open immediately. When open returns, it only means that the port was created, but not that the TCP connection (socket) has been made.

The fun stuff starts with the line:

port/awake: func [event] [

This defines the port awake handler that will receive TCP events during the transfer. Each time an event arrives, the awake handler will be called. It is a callback function.

This line dispatches the event, based on its type:

switch/default event/type [

For our TCP example, we care about these types of events:

  • lookup - when the host address has been found by DNS.
  • connect - when we have connected to the host (web server). At that point, we send our request. More on that below.
  • wrote - when the write has completed. Now, we start to read data.
  • read - when a packet has been received. We save its data and ask for the next one.
  • close - when the port has closed. The handler returns TRUE, which causes wait to return.

If the event does not match those, then we assume it is an error, and we print a message and terminate the transfer.

Note that the awake function returns either TRUE or FALSE. This is important. A TRUE value is used to say "we are done." It will cause wait to return, as I will show below.
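
To see the TRUE and FALSE convention in isolation, here is a small sketch that needs no network. The function demo-awake and the objects passed to it are hypothetical stand-ins: a real awake handler receives event values from the port, not hand-made objects.

```rebol
REBOL [Title: "Awake return value sketch"]

; demo-awake is a hypothetical stand-in for a port awake handler.
; It accepts any object that has a TYPE field.
demo-awake: func [event] [
    switch/default event/type [
        read  [false]   ; more data expected: keep waiting
        close [true]    ; transfer finished: let WAIT return
    ] [
        true            ; unexpected event: stop as well
    ]
]

; Feed it three fake events and print what the handler returns.
foreach t [read close bogus] [
    print [t "->" demo-awake make object! [type: t]]
]
```

Only the read event returns FALSE here, which is the "keep going" answer. The other two return TRUE, the same way the real handler ends the transfer on a close or on an unexpected event.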

We send the request to the server with the function:

send-http-request: func [port] [

This function builds a little HTTP header. It's about as simple as it can be. We use ajoin to build a string and then we must call to-binary to convert it to binary. The default is UTF-8 encoding. (Other encodings can be done too, but more on that separately.)

You may notice we build an HTTP 1.0 header. You could use a 1.1 header, but if you do that, the TCP socket (the connection) may remain open, and the only way to know you are done is to parse the HTTP reply header. Since we don't want to do that for this little example, we use HTTP 1.0. It closes the socket after the reply has been sent.
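
If you are curious what actually goes over the wire, this sketch builds the same kind of request from fixed strings and shows it as binary. The host and path are placeholders chosen for illustration. In the real function they come from port/spec.

```rebol
REBOL [Title: "HTTP request sketch"]

; Placeholder values, not taken from a real port spec.
host: "www.example.com"
path: "/builds/"

request: ajoin [
    "GET " path " HTTP/1.0" crlf
    "Host: " host crlf
    crlf                ; the empty line ends the header
]

bytes: to-binary request    ; encoded as UTF-8
print request
probe bytes
```

Because the request is plain ASCII, the binary has the same length as the string.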

Finally, we are ready to make it happen:

rp: read-http
wait [rp 10]
close rp

This calls our new function with a URL to read, and it returns a port (rp). We then use that port to wait for the read to finish or for a timeout to occur. Here the timeout is 10 seconds. Then, we close the port. This is done to force any cleanup of the TCP/IP stack state.

And, finally, we print the result with:

print to-string rp/data

We use to-string to decode the page contents to text. It assumes UTF-8 Unicode encoding. But, of course, other encodings are possible too.
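
Here is a quick sketch of that round trip between strings and binary, assuming an R3 build with the Unicode changes. The sample text is made up. The ^(E9) escape is the character U+00E9, which takes two bytes in UTF-8.

```rebol
REBOL [Title: "String and binary sketch"]

text: "caf^(E9)"         ; four characters, the last one is non-ASCII
bytes: to-binary text    ; UTF-8 encoded binary

print length? text       ; number of characters
print length? bytes      ; number of bytes, one more than the characters
print text = to-string bytes
```

This is why the network code deals in binary: the number of bytes is not always the number of characters.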

Ok, so when you run the code, you'll see a progress report, then the HTTP page contents:

Evaluating: web-net.r
Read 252 bytes
Read 1964 bytes
Read 5136 bytes
Read 9768 bytes
Read 15860 bytes
Read 23412 bytes
Read 32424 bytes
Read 42896 bytes
Read 54828 bytes
Read 68220 bytes
Read 83072 bytes
Read 99384 bytes
Read 117156 bytes
Read 136388 bytes
Read 156035 bytes
HTTP/1.1 200 OK
Date: Sat, 19 Apr 2008 17:50:43 GMT
Server: Apache
Last-Modified: Fri, 04 Apr 2008 23:28:12 GMT
ETag: "1ec06f-4bc3-78b87b00"
Accept-Ranges: bytes
Content-Length: 19395
Connection: close
Content-Type: text/html; charset=UTF-8

Notice that the Content-Type line in the HTTP header gives the charset as UTF-8. So, the to-string decoding did the right thing.

Voila! There's the TCP example in R3. Yes, it's just a start, but you can build on it. For instance, you may want to parse the HTTP header to determine the Content-Length and Content-Type.
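
As a starting point for that, here is a rough sketch of pulling fields out of a reply header with parse. The sample reply is abbreviated from the output shown above, with a made-up placeholder body, and header-value is a hypothetical helper of my own, not a built-in function.

```rebol
REBOL [Title: "HTTP header parse sketch"]

; Sample reply: a few header lines plus a placeholder body.
reply: to-binary ajoin [
    "HTTP/1.1 200 OK" crlf
    "Content-Length: 19395" crlf
    "Content-Type: text/html; charset=UTF-8" crlf
    crlf
    "<html>...</html>"
]

; The header ends at the first empty line.
text: to-string reply
header: copy/part text find text join crlf crlf

; header-value is a hypothetical helper.
; It returns the text after "name:" or NONE if the field is missing.
header-value: func [header [string!] name [string!] /local value] [
    parse header [thru name ":" copy value [to crlf | to end] to end]
    if value [trim value]
]

print to-integer header-value header "Content-Length"
print header-value header "Content-Type"
```

A real reader would also need to handle a reply that has no empty line yet, since the header can arrive split across several read events.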

Have fun.


Updated 24-Jun-2024 - Copyright REBOL Technologies