HTTP Introduction

What’s HTTP

HTTP is the protocol used for communication on the World Wide Web. For example, when you want to google something, you enter http://www.google.com in your web browser's address bar. Your browser then talks to Google's server on your behalf, following the rules of the HTTP protocol.

[Figure: HTTP client server]

HTTP is an application-layer protocol in the seven-layer OSI model, so it is a protocol for application-to-application communication. When you access the Google website, it is the protocol that enables communication between the client's web browser and the web server (server software such as Apache or nginx). Several versions of HTTP exist, including HTTP/1.0, HTTP/1.1 and HTTP/2. This article mainly focuses on versions 1.0 and 1.1.

The official specification of HTTP/1.1 is published in the IETF's RFC documents. (Afraid? XD)

Characteristics

  • Connectionless
  • Stateless

HTTP is described as connectionless because, in HTTP/1.0, a new TCP connection is set up for each request/response pair and closed afterwards. Since HTTP/1.1, a TCP connection can be reused by multiple request/response pairs.
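To make connection reuse concrete, here is a minimal Python sketch using only the standard library. It starts a throwaway local server and sends two requests over a single `http.client.HTTPConnection`. The handler class `HelloHandler`, the function name `fetch_twice_on_one_connection` and the paths `/first` and `/second` are made up for this illustration, so treat it as a sketch rather than a reference implementation.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # Answering with HTTP/1.1 lets the server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # A persistent connection needs a known body length.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        results = []
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read()  # drain the body before the next request
            # The local socket address stays the same if the connection is reused.
            results.append((response.status, body, conn.sock.getsockname()))
        conn.close()
        return results
    finally:
        server.shutdown()
        server.server_close()
```

If the connection is really reused, both entries returned by `fetch_twice_on_one_connection()` carry the same local socket address, because no second TCP connection was opened.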

HTTP is stateless because the protocol itself keeps no link between two requests. However, we can use sessions and cookies to keep the state or context of user interactions.
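As a small illustration of how cookies carry state across otherwise unrelated requests, the sketch below uses Python's standard `http.cookies` module to turn the `Set-Cookie` values of one response into the `Cookie` header of the next request. The function name `cookie_header_from_response` and the cookie names in the usage note are invented for this example, and the sketch ignores attributes such as expiry, path and domain that a real browser would check.

```python
from http.cookies import SimpleCookie


def cookie_header_from_response(set_cookie_values):
    """Build a Cookie header value from a list of Set-Cookie header values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # load() parses "name=value; attr=..." and stores the cookie by name.
        jar.load(value)
    # Only name=value pairs are sent back; attributes stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

For instance, calling it with `["session_id=abc123; Path=/; HttpOnly", "theme=dark"]` yields the string `session_id=abc123; theme=dark`, which the client would send in the `Cookie` header of its next request.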

Comparing HTTP/1.0 with HTTP/1.1

  1. HTTP/1.1 allows a TCP connection to be reused (persistent connection) by multiple request/response pairs. In HTTP/1.0, by default you have to open a new TCP connection for each request/response pair.
  2. HTTP/1.1 adds the ETag header (checked with If-None-Match) as a cache validator, alongside Last-Modified/If-Modified-Since.
  3. HTTP/1.1 supports chunked transfer encoding.
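The chunked transfer mentioned in the last item can be sketched in a few lines of Python. Each chunk is sent as its size in hexadecimal, a CRLF, the data and another CRLF, and a zero-size chunk ends the body. The function names `encode_chunked` and `decode_chunked` are made up for this post, and the sketch skips trailers and error handling.

```python
def encode_chunked(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for part in parts:
        if part:  # an empty chunk would be read as the terminator
            out += f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size chunk marks the end of the body
    return bytes(out)


def decode_chunked(body):
    """Decode a complete chunked body (trailers are ignored)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(decoded)
```

With this framing the sender does not need to know the total length up front, which is why a chunked response carries `Transfer-Encoding: chunked` instead of `Content-Length`.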

Request message format

Let’s take a look at a typical example of an HTTP request:

POST /path/to/resource HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.kelvin.ink
Content-Type: text/xml; charset=utf-8
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://hello.com/">string</string>

An HTTP request is composed of three parts: the request line, the headers and the request body, where


Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Here SP represents a space and CRLF represents a carriage return followed by a line feed.
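As a rough illustration of the Request-Line grammar above, the following Python sketch assembles a request message and splits a request line back into its three parts. The helper names `build_request` and `parse_request_line` are invented for this post, and the sketch assumes ASCII-only header values.

```python
def build_request(method, target, headers, body=b"", version="HTTP/1.1"):
    """Assemble a request: request line, header lines, blank line, body."""
    lines = [f"{method} {target} {version}"]  # Method SP Request-URI SP HTTP-Version
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the header section
    return head.encode("ascii") + body


def parse_request_line(line):
    """Split a request line into (method, request URI, HTTP version)."""
    method, target, version = line.rstrip("\r\n").split(" ")
    return method, target, version
```

For example, `build_request("GET", "/path/to/resource", {"Host": "www.kelvin.ink"})` produces the bytes `GET /path/to/resource HTTP/1.1`, CRLF, `Host: www.kelvin.ink`, CRLF, CRLF.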

[Figure: HTTP Request]

The following figure shows the detailed format of an HTTP request message. The method can be one of GET, PUT, POST, DELETE, etc. In the header fields, you can specify basic information about the request, such as Host, Accept-Language and Content-Type. Other fields are for more advanced use. For example, if you want to crawl web content with a spider, you can change the User-Agent field frequently so that the crawler looks like an ordinary browser. You may also want to keep the connection alive by specifying the Connection field.

[Figure: HTTP Request]

After constructing the request message, we establish a TCP connection, send the message to the server, and then receive an HTTP response from it. The format of the HTTP response is covered in a later section.
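The connect, send and receive steps just described can be sketched with Python's `socket` module. To keep the example self-contained, it talks to a tiny one-shot server on the loopback interface instead of a real website. The names `serve_once`, `send_request`, `demo` and the canned response are all assumptions made for this illustration.

```python
import socket
import threading

# A fixed reply used by the toy server below (not a real site's response).
CANNED_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 2\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"ok"
)


def serve_once(listener):
    """Accept one connection, read the request head, send the canned reply."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # the blank line ends the request head
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.sendall(CANNED_RESPONSE)


def send_request(host, port, request):
    """Open a TCP connection, send the raw request, read until the peer closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    request = b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
    raw = send_request("127.0.0.1", port, request)
    worker.join()
    listener.close()
    return raw
```

Reading until the peer closes the socket only works here because the toy server closes the connection after replying. On a persistent connection, a client would instead rely on Content-Length or chunked encoding to find the end of the body.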

Methods? Explain Them!

It’s highly recommended to read through the RFC document on HTTP request methods.

Response message format

The format of HTTP response is similar to that of HTTP request.

Let’s check out a typical HTTP response first:

HTTP/1.1 200 OK
Date: Sun, 30 Jun 2019 14:55:19 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2019-06-30-14; expires=Tue, 30-Jul-2019 14:55:19 GMT; path=/; domain=.google.com
Set-Cookie: NID=186=Igr9gPUb8HL-S-iDsDtkEqrqPdEsWE19BA4R-EZ3gUeCnbE1tdG1t0irtfiiEOL7ZenA3ukMB7l4qG9TBwcKXrra7GLPmMpuShtKWaCrH1nnGJbqysRB8mvtPcAp9LC4nAmuYU6xG78FAnkNCaAOKrIiSqi7rseO9_w4JPBjW8Y; expires=Mon, 30-Dec-2019 14:55:19 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="zh-HK"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/br ..........

An HTTP response includes three major components: the status line, the headers and the response body, where


Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

[Figure: HTTP Response]

Basically, the fields in an HTTP response tell us the outcome of the request, give information about the server, and carry some useful information for later access. For example, Status-Code tells us whether the request succeeded, the Server field identifies the software of the responding server, and Set-Cookie sets a cookie on the client side for later requests.
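To show how these fields can be pulled out of a raw response, here is a minimal Python sketch that follows the Status-Line grammar above. The function name `parse_response` is made up for this post. Headers are kept as a list of pairs rather than a dictionary, since a field such as Set-Cookie may appear more than once.

```python
def parse_response(raw):
    """Split a raw response into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates head and body
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase
    parts = lines[0].split(" ", 2)
    version, status_code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return version, status_code, reason, headers, body
```

A caller could then collect every cookie with `[v for k, v in headers if k.lower() == "set-cookie"]`, since header field names are case-insensitive.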

Conclusion

We have talked about HTTP/1.0 and HTTP/1.1: what they are and what they are for. We have also compared the two. We have not discussed HTTP/2 yet; it may be covered in another post or added to this one later. Thanks!