Introduction to HTTP
HTTP stands for Hypertext Transfer Protocol, the protocol used to transfer hypertext from a World Wide Web server to a local browser. It is a communication protocol that runs over TCP/IP and carries data such as HTML files, image files, and query results.
HTTP is an application-level, object-oriented protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was introduced in 1990 and has been improved and extended continuously since then. HTTP/1.0 was followed by HTTP/1.1, which has long been standardized and remains in wide use; the HTTP-NG (Next Generation of HTTP) proposal was not adopted, and later versions were published as HTTP/2 and HTTP/3.
HTTP follows a client-server architecture. The browser, acting as the HTTP client, sends every request to the HTTP server (the Web server) by means of a URL. The Web server then returns a response message based on the request it received.
Main features
- Simple and fast: to request a service, the client only needs to send the request method and the path. The commonly used request methods are GET, HEAD, and POST, and each method specifies a different kind of interaction between client and server. Because the protocol is simple, HTTP server programs can be small, which keeps communication fast.
- Flexible: HTTP allows any type of data object to be transmitted. The type of the data being transferred is indicated by the Content-Type header.
- Connectionless: each connection handles only one request. Once the server has processed the client’s request and the client has received the response, the connection is closed, which saves transmission time. (This is the HTTP/1.0 model; HTTP/1.1 keeps connections open by default.)
- Stateless: HTTP is a stateless protocol, meaning that it keeps no memory of earlier transactions. If later processing needs information from an earlier request, that information must be sent again, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need any earlier information.
- Supports both B/S (browser/server) and C/S (client/server) modes.
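As a small illustration of the "flexible" point, the sketch below shows one way a server might pick a Content-Type value from a file name. The helper name `content_type_for` and the file names are hypothetical; the lookup uses Python's standard `mimetypes` module, and real servers may decide the type differently.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a Content-Type header value from a file name (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


# Hypothetical file names, used only for illustration.
for name in ("index.html", "logo.png", "data.unknownext"):
    print(name, "->", content_type_for(name))
```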
HTTP URL
HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL (Uniform Resource Locator) is a particular kind of URI that contains enough information to locate a resource; it identifies the address of a resource on the Internet. The following URL is used to introduce the parts of an ordinary URL:
http://www.ftpshop.com:8080/news/index.asp?cateID=9&ID=12321&page=1#product
From the above URL we can see that a complete URL includes the following parts:
- Protocol part: the protocol part of this URL is “http:”, which means the page is served over HTTP. Several protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The “//” after “http:” is a delimiter.
- Domain part: the domain part of this URL is “www.ftpshop.com”. An IP address can be used in place of a domain name.
- Port part: the port follows the domain name, separated from it by “:”; in this example it is 8080. The port is not a required part of a URL; if it is omitted, the default port (80 for HTTP) is used.
- Virtual directory part: everything from the first “/” after the domain name to the last “/” is the virtual directory part. It is not a required part of a URL. The virtual directory in this example is “/news/”.
- File name part: everything from the last “/” after the domain name to the “?” is the file name part. If there is no “?”, it runs from the last “/” to the “#”; if there is neither “?” nor “#”, it runs from the last “/” to the end of the URL. The file name in this example is “index.asp”. The file name is not a required part of a URL; if it is omitted, the server uses a default file name.
- Anchor part: everything from the “#” to the end is the anchor part. The anchor in this example is “product”. The anchor is not a required part of a URL.
- Parameter part: the part between “?” and “#” is the parameter part, also called the search part or query part. The parameter part in this example is “cateID=9&ID=12321&page=1”. A URL may carry several parameters, separated by “&”.
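The breakdown above can be checked with a short sketch that uses Python's standard `urllib.parse` module on the same example URL. Splitting the path into a directory and a file name is done by hand here, since the standard parser treats the path as a single component; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.ftpshop.com:8080/news/index.asp?cateID=9&ID=12321&page=1#product"
parts = urlsplit(url)

# The parser returns one path; split it at the last "/" to separate
# the virtual directory from the file name.
directory, _, filename = parts.path.rpartition("/")
virtual_directory = directory + "/"

print("protocol:  ", parts.scheme)
print("domain:    ", parts.hostname)
print("port:      ", parts.port)
print("directory: ", virtual_directory)
print("file name: ", filename)
print("parameters:", parts.query)
print("parsed:    ", parse_qs(parts.query))
print("anchor:    ", parts.fragment)
```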
The difference between URI and URL
A URI (Uniform Resource Identifier) uniquely identifies a resource. Every resource available on the Web, such as an HTML document, image, video clip, or program, is located by a URI. A URI generally consists of three parts:
- the naming mechanism used to access the resource
- the name of the host that stores the resource
- the name of the resource itself, expressed as a path.
A URL (Uniform Resource Locator) is a specific kind of URI: it not only identifies a resource but also specifies how to locate it. A URL is a string that describes an information resource on the Internet, and it is used by many WWW client and server programs, most famously Mosaic. URLs describe many kinds of information resources, including files, server addresses, and directories, in one uniform format. A URL usually consists of three parts:
- the protocol (or service type)
- the IP address of the host that stores the resource (sometimes with a port number)
- the specific address of the resource on that host, such as a directory and file name.
A URN (Uniform Resource Name) identifies a resource by name, such as mailto:java-net@java.sun.com.
URI is the abstract, high-level concept of a uniform resource identifier, while URL and URN are two concrete ways of identifying a resource; both are kinds of URI. Every URL is therefore a URI, but not every URI is a URL, because URIs also include the subclass of Uniform Resource Names (URNs), which name a resource without specifying how to locate it. The mailto URI above, like news and isbn URIs, is an example of a URN in this sense.
In Java, a URI instance can be either absolute or relative, as long as it conforms to the URI syntax rules. A URL instance, by contrast, must not only be syntactically valid but also contain the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class provides no way to access the resource; its only role is parsing. The URL class, on the other hand, can open a stream to the resource.
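The absolute/relative distinction can be illustrated with a rough Python analogue (not the Java classes themselves). Like a parse-only URI type, `urllib.parse` only parses and resolves references; it never contacts the resource. The helper `is_absolute` and the relative reference `../images/logo.png` are hypothetical and exist only for this sketch.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(reference: str) -> bool:
    """Treat a reference as absolute when it carries its own scheme (hypothetical helper)."""
    return bool(urlsplit(reference).scheme)


base = "http://www.ftpshop.com:8080/news/index.asp"
relative = "../images/logo.png"  # hypothetical relative reference

print(is_absolute(base))      # the base names its own scheme
print(is_absolute(relative))  # the relative reference does not

# A relative reference only becomes locatable once resolved against a base.
resolved = urljoin(base, relative)
print(resolved)

# A name-style URI parses fine but has no host to connect to.
mail = urlsplit("mailto:java-net@java.sun.com")
print(mail.scheme, repr(mail.netloc), mail.path)
```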
HTTP request message
The request message that a client sends to the server has the following format:
request line, request headers, a blank line, and the request data.
The request line begins with the method token, followed by the request URI and the protocol version, each separated by a space.
GET request example:
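As a minimal sketch of what a GET request looks like on the wire, the code below assembles one for the example URL used earlier. The helper `build_get_request` and the header values (User-Agent, Accept, Connection) are illustrative assumptions, not taken from a real capture. Note that the anchor part (`#product`) is not sent to the server.

```python
def build_get_request(host: str, target: str) -> bytes:
    """Assemble a GET request message (hypothetical helper, illustrative headers)."""
    lines = [
        f"GET {target} HTTP/1.1",          # request line: method, request URI, version
        f"Host: {host}",                   # request headers start here
        "User-Agent: example-client/0.1",  # illustrative value
        "Accept: text/html",
        "Connection: close",
        "",                                # blank line ends the headers
        "",                                # a GET request carries no request data
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request(
    "www.ftpshop.com:8080",
    "/news/index.asp?cateID=9&ID=12321&page=1",
)
print(request.decode("ascii"))
```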
POST request example:
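A comparable sketch for POST is shown below. Unlike GET, the request data follows the blank line, and Content-Type and Content-Length describe it. The helper `build_post_request`, the target path `/news/submit.asp`, and the form fields are hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode


def build_post_request(host: str, target: str, fields: dict) -> bytes:
    """Assemble a form-encoded POST request message (hypothetical helper)."""
    body = urlencode(fields).encode("ascii")  # request data
    head = "\r\n".join([
        f"POST {target} HTTP/1.1",            # request line
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",       # size of the request data in bytes
        "Connection: close",
        "",                                   # blank line ends the headers
        "",
    ]).encode("ascii")
    return head + body


# Hypothetical target path and form fields.
form = {"name": "Alice", "comment": "hello world"}
request = build_post_request("www.ftpshop.com:8080", "/news/submit.asp", form)
print(request.decode("ascii"))
```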
HTTP response message
In general, after receiving and processing a request from the client, the server returns an HTTP response message. An HTTP response also consists of four parts: the status line, the message headers, a blank line, and the response body.
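The four parts of a response can be separated with a few lines of Python. This is a simplified sketch: the function `parse_response` is hypothetical, the sample response is made up, and details such as repeated headers or chunked bodies are ignored.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into its parts (simplified, hypothetical helper)."""
    # The blank line (CRLF CRLF) separates the head from the response body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, optional reason phrase.
    pieces = status_line.split(" ", 2)
    version, code = pieces[0], int(pieces[1])
    reason = pieces[2] if len(pieces) > 2 else ""

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


# Made-up sample response for illustration.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 14\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers)
print(body)
```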
HTTP status code
A status code consists of three digits. The first digit defines the class of the response, and there are five classes:
1xx: Informational – the request has been received and processing continues
2xx: Success – the request has been successfully received, understood, and accepted
3xx: Redirection – further action must be taken to complete the request
4xx: Client error – the request contains a syntax error or cannot be fulfilled
5xx: Server error – the server failed to fulfil a valid request
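Because the class is determined by the first digit alone, it can be computed with integer division. The function name `status_category` and the category labels below are illustrative choices for this sketch.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code: int) -> str:
    """Return the response class for a three-digit status code (hypothetical helper)."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # Integer division by 100 keeps only the first digit.
    return CATEGORIES[code // 100]


for code in (100, 200, 301, 404, 503):
    print(code, status_category(code))
```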
Common status codes:
200 OK // the client request succeeded
400 Bad Request // the client request has a syntax error and cannot be understood by the server
401 Unauthorized // the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden // the server received the request but refuses to provide the service
404 Not Found // the requested resource does not exist, e.g. the wrong URL was entered
500 Internal Server Error // the server encountered an unexpected error
503 Service Unavailable // the server cannot currently handle the client’s request; it may return to normal after a period of time
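The standard reason phrases for these codes can be looked up in Python's `http.HTTPStatus` enumeration, which is a convenient way to check spellings such as the phrase for 503. The dictionary name `phrases` is just a local variable for this sketch.

```python
from http import HTTPStatus

# Map each code from the list above to its standard reason phrase.
phrases = {
    code: HTTPStatus(code).phrase
    for code in (200, 400, 401, 403, 404, 500, 503)
}

for code, phrase in phrases.items():
    print(code, phrase)
```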
HTTP request method
According to the HTTP standard, HTTP requests can use multiple request methods.
HTTP/1.0 defines three request methods: GET, POST, and HEAD.
HTTP/1.1 adds five request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.
GET requests the specified page information and returns it in the entity body.
HEAD is similar to GET, except that the response carries no body; it is used to obtain the headers.
POST submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is included in the request body. A POST request may create new resources and/or modify existing ones.
PUT replaces the contents of the specified document with the data the client sends to the server.
DELETE asks the server to delete the specified page.
CONNECT is reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel.
OPTIONS allows the client to query the capabilities of the server.
TRACE echoes back the request received by the server, mainly for testing or diagnostics.
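The difference between GET and HEAD can be observed with a throwaway server on the local machine. The sketch below is self-contained and assumes only that loopback connections are allowed; `DemoHandler`, `fetch`, and the page content are hypothetical names and values invented for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>Hello</h1>"  # hypothetical page content


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny demo server: GET returns the page, HEAD returns only the headers."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()  # same headers, no body

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(method: str, port: int):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        response = conn.getresponse()
        return response.status, response.getheader("Content-Length"), response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = fetch("GET", port)
head_result = fetch("HEAD", port)

server.shutdown()
server.server_close()

print("GET: ", get_result)
print("HEAD:", head_result)
```

Both responses report the same status and Content-Length, but only the GET response carries the page itself.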
How HTTP works
The HTTP protocol defines how a Web client requests a Web page from a Web server and how the server delivers the page to the client. HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line (the protocol version and a success or error code), followed by server information, response headers, and the response data.
The steps of an HTTP request/response are as follows:
1. The client connects to the Web server
An HTTP client, usually a browser, establishes a TCP socket connection to the Web server’s HTTP port (80 by default). For example, http://securityonline.info
2. The client sends an HTTP request
Through the TCP socket, the client sends a text request message to the Web server. A request message consists of the request line, the request headers, a blank line, and the request data.
3. The server accepts the request and returns an HTTP response
The Web server parses the request and locates the requested resource. It then writes a copy of the resource to the TCP socket, where the client reads it. A response consists of the status line, the response headers, a blank line, and the response data.
4. The TCP connection is released
If the connection mode is close, the server actively closes the TCP connection and the client closes it passively, releasing the connection. If the connection mode is keep-alive, the connection stays open for a period of time, during which further requests can be received.
5. The client browser parses the HTML content
The browser first parses the status line and checks the status code to see whether the request succeeded. It then parses each response header; the headers describe what follows, such as the number of bytes in the HTML document and the document’s character set. Finally, the browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.
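The five steps above can be traced with a plain TCP socket. To stay self-contained, the sketch talks to a small demo server started on the local machine rather than a real site; `DemoHandler`, `http_get`, and the page content are hypothetical, and the client uses the close connection mode so that the end of the response is marked by the server closing the connection.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = "<h1>Hello</h1>".encode("utf-8")  # hypothetical page content


class DemoHandler(BaseHTTPRequestHandler):
    """Local stand-in for a Web server, used only for this demonstration."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def http_get(host: str, port: int, target: str = "/"):
    # Step 1: establish a TCP connection to the server's HTTP port.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request line, headers, and blank line.
        request = (
            f"GET {target} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Steps 3 and 4: read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    raw = b"".join(chunks)
    # Step 5: separate the status line and headers from the response data.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return status_line, body


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status_line, body = http_get("127.0.0.1", server.server_address[1])

server.shutdown()
server.server_close()

print(status_line)
print(body.decode("utf-8"))
```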
For example, typing a URL into the browser’s address bar and pressing Enter triggers the following process:
1. The browser asks a DNS server to resolve the domain name in the URL to an IP address.
2. Using the resolved IP address and the default port 80, the browser establishes a TCP connection with the server.
3. The browser sends an HTTP request for the file (the part of the URL after the domain name); the request can be sent to the server together with the third segment of the TCP three-way handshake.
4. The server responds to the browser’s request and sends the corresponding HTML text to the browser.
5. The TCP connection is released.
6. The browser parses the HTML text and displays the content.
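Before any network traffic happens, the typed URL has to be turned into a host to resolve, a port to connect to, and a request target to ask for. The sketch below shows that preparation only; `connection_plan` is a hypothetical helper, and it assumes plain HTTP with the default port 80.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80


def connection_plan(url: str):
    """Derive (host, port, request target) from a typed URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname                      # name handed to DNS in step 1
    port = parts.port or DEFAULT_HTTP_PORT     # port used for the TCP connection in step 2
    target = parts.path or "/"                 # what the request in step 3 asks for
    if parts.query:
        target += "?" + parts.query
    # The anchor part stays in the browser and is not sent to the server.
    return host, port, target


plan_simple = connection_plan("http://securityonline.info")
plan_full = connection_plan(
    "http://www.ftpshop.com:8080/news/index.asp?cateID=9&ID=12321&page=1#product"
)
print(plan_simple)
print(plan_full)

# Step 1 itself could then be performed with the standard resolver call
# socket.getaddrinfo(host, port); it is left out here so the sketch runs offline.
```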