Web Architecture & Technologies

HTTP protocol

Hypertext Transfer Protocol (HTTP) is an application-layer communication protocol for distributed, collaborative, hypermedia information systems, and it is the basis of information exchange on the World Wide Web. HTTP defines a standard for requests and responses between a client (the user) and a server (the website), carried over TCP. The client, referred to as the user agent, may be a Web browser, a web crawler, or another tool. Typically, the HTTP client initiates a request by opening a TCP connection to a specified port on the server (port 80 by default). The HTTP server listens for client requests on that port. Upon receiving a request, the server returns a status line to the client, such as “HTTP/1.1 200 OK”, together with the returned content (the requested file, an error message, or other information).

HTTP method

(1) OPTIONS: Asks the server to return the HTTP request methods supported by the resource. Sending an OPTIONS request with ‘*’ in place of the resource name tests whether the Web server itself is working properly.
(2) HEAD: Like GET, asks the server for the specified resource, but the server returns only the response headers and not the body. This lets a client obtain information about the resource (meta information, or metadata) without transferring its entire contents.
(3) GET: Requests a representation of the specified resource. GET should only be used to read data and should not be used for operations that produce “side effects” in a Web application; one reason is that GET requests may be issued arbitrarily by web spiders (crawlers).
(4) POST: Submits data to the specified resource and asks the server to process it (for example, submitting a form or uploading a file). The data is included in the request body. The request may create new resources, modify existing ones, or both.
(5) PUT: Uploads the latest content to the specified resource location.
(6) DELETE: Asks the server to delete the resource identified by the Request-URI.
(7) TRACE: Echoes back the request as the server received it; used mainly for testing or diagnostics.
(8) CONNECT: Reserved in HTTP/1.1 for proxies that can switch the connection to a tunnel. Commonly used to reach SSL-encrypted servers through an unencrypted HTTP proxy.
The following is a typical HTTP GET request:

GET / HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)
Host: securityonline.info
Accept: text/html
Accept-Language: en
Accept-Encoding: gzip,deflate
Cache-Control: no-cache
Cookie: SessionId=sfksfnkasdfjsfnsafsfkdsafjafassl1442
Connection: Keep-Alive

The first line of the request consists of three space-separated items: the request method, the request path, and the HTTP version. Some other common headers are as follows.
(1) The Referer header indicates the URL of the page from which the request originated.
(2) The User-Agent header provides information about the browser or other client software that generated the request.
(3) The Host header specifies the host name that appeared in the full URL being accessed.
(4) The Cookie header submits additional parameters that the server previously issued to the client.
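
The structure described above can be sketched in Python. The `parse_request` helper below is a hypothetical illustration, not part of any library; it assumes a well-formed request whose lines end in CRLF and is not a complete HTTP parser.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into its request line parts and headers."""
    # A blank line (CRLF CRLF) separates the headers from any body.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line has three space-separated items.
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: securityonline.info\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
method, path, version, headers = parse_request(raw_request)
print(method, path, version)  # GET / HTTP/1.1
print(headers["host"])        # securityonline.info
```

Storing header names in lower case is a simplification that makes lookups independent of how the client capitalised them.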

HTTP status code

An HTTP status code is a 3-digit code that represents the status of the Web server’s HTTP response. The first line of every HTTP response is the status line, which consists of the HTTP version, a 3-digit status code, and a phrase describing the status, separated by spaces.
The first digit of the status code represents the class of the response.
(1) 1xx informational: the request has been received by the server and processing continues.
(2) 2xx success: the request has been successfully received, understood, and accepted by the server.
(3) 3xx redirection: further action is needed to complete the request.
(4) 4xx client error: the request contains a syntax error or cannot be fulfilled.
(5) 5xx server error: the server failed while processing an apparently valid request.
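
As a small illustration of the classes listed above, the first digit can be extracted with integer division. The function name `status_class` and the label strings are assumptions chosen for this sketch.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the response class implied by the first digit of a status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```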

The following is a typical HTTP response:

HTTP/1.1 200 OK
Date: Thu, 10 Aug 2017 08:56:16 GMT
Server: Apache/2.2.23 (Unix) mod_jk/1.2.14
Content-Length: 31300
Keep-Alive: timeout=5,max=99
Connection: Keep-Alive
Content-Type: text/html

The first line of each HTTP response consists of three space-separated items: the HTTP version, the status code, and the reason phrase. Some other common message headers are as follows:
(1) The Server header contains a banner identifying the Web server software in use. It sometimes includes other information such as the installed modules and the server operating system. This information is not necessarily accurate.
(2) The Set-Cookie header sends a cookie to the browser, which returns it in the Cookie header of subsequent requests to the server.
(3) The Pragma header indicates whether the browser should keep the response in its cache.
(4) The Expires header indicates the date and time at which the response expires.
(5) The Content-Length header specifies the length of the message body in bytes.
(6) The Content-Type header indicates the media type of the message body; in the example above, it indicates that the body contains an HTML document.
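
The response layout can be sketched in the same way. The `parse_response` helper and the sample bytes below are made up for illustration; the sketch assumes a complete, well-formed response held in memory and uses Content-Length to decide how many body bytes belong to the message.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line parts, headers, and body."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP version, 3-digit status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length is the size of the message body in bytes;
    # if it is absent, this sketch simply takes everything that is left.
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), reason, headers, rest[:length]


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.2.23 (Unix)\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)    # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
```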

The HTTP protocol is stateless: the server does not remember what the user did in previous requests, which seriously hinders the implementation of interactive Web applications. Cookies are one of the “extra means” used to work around HTTP’s statelessness. The server can set and read the information contained in cookies to maintain the state of the user’s session with the server.
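
The cookie round trip can be illustrated with Python’s standard `http.cookies` module. The cookie names and values below are invented example data, and the sketch only builds and parses header strings; it does not send anything over the network.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for a response.
outgoing = SimpleCookie()
outgoing["SessionId"] = "abc123"
outgoing["SessionId"]["path"] = "/"
outgoing["SessionId"]["httponly"] = True
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: the browser returns the name/value pairs in a Cookie header.
cookie_header = "SessionId=abc123; theme=dark"

# Server side again: parse the Cookie header of a later request.
incoming = SimpleCookie()
incoming.load(cookie_header)
print(incoming["SessionId"].value)  # abc123
```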

HTTPS

Hypertext Transfer Protocol Secure (HTTPS) is the combination of the Hypertext Transfer Protocol with SSL/TLS to provide encrypted communication and authentication of the server’s identity. HTTPS connections are commonly used for payment transactions on the World Wide Web and for transmitting sensitive information in enterprise information systems.
HTTPS provides confidentiality and integrity protection for data transferred between the browser and the server. It helps prevent information leakage and gives the user assurance about the identity of the server being dealt with. However, HTTPS does not protect against attacks aimed directly at the server or client components of an application, and many successful attacks fall into this category. Most Web application security vulnerabilities therefore still exist regardless of whether the server uses HTTPS.
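
On the client side, the encryption and server authentication described above are typically configured through a TLS context. The following is a minimal sketch using Python’s standard `ssl` module; the connection itself is left as a comment, and `example.com` is only a placeholder host.

```python
import ssl

# create_default_context() returns a client-side context with certificate
# verification and host name checking enabled.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True

# Wrapping a TCP socket would then look like this (not executed here):
#   import socket
#   with socket.create_connection(("example.com", 443)) as sock:
#       with context.wrap_socket(sock, server_hostname="example.com") as tls:
#           print(tls.version())
```

Note that this protects only the transport; flaws in the application itself are unaffected.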

Server function

Web applications primarily provide dynamically generated content to users. When a user requests a dynamic resource, the server executes the corresponding server-side script to generate the response and then returns the content to the user. In this respect the server behaves like an ordinary computer program: it accepts input, processes it, and returns the output to the user.
HTTP requests usually pass parameters to the application in the following three ways.
(1) In the URL query string.
(2) In the body of a request that uses the POST method.
(3) In HTTP cookies.
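
The three channels listed above can be read with Python’s standard library, as in the sketch below. The request target, body, and cookie values are invented example data, and the body is assumed to be form-encoded (`application/x-www-form-urlencoded`).

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Made-up pieces of a single request.
request_target = "/search?q=http&page=2"
request_body = "username=alice&comment=hello+world"
cookie_header = "SessionId=abc123"

# (1) Parameters in the URL query string.
query_params = parse_qs(urlsplit(request_target).query)

# (2) Parameters in the body of a POST request.
body_params = parse_qs(request_body)

# (3) Parameters in HTTP cookies.
cookies = SimpleCookie()
cookies.load(cookie_header)
cookie_params = {name: morsel.value for name, morsel in cookies.items()}

print(query_params)   # {'q': ['http'], 'page': ['2']}
print(body_params)    # {'username': ['alice'], 'comment': ['hello world']}
print(cookie_params)  # {'SessionId': 'abc123'}
```

`parse_qs` returns a list for every name because the same parameter may appear more than once.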

Client function

A server-side application that receives user input and actions and returns results to the user must provide a client-side interface. Because Web applications are accessed through a Web browser, these interfaces share a common core of technologies. Client-side technology has continued to change in recent years; the following sections describe some of the common client-side technologies.

HTML

Hypertext Markup Language (HTML) is a markup language designed for “creating web pages and other information that can be viewed in a web browser.” Hyperlinks are a common element in HTML; in fact, much of the communication between server and client is driven by users clicking hyperlinks. Although hyperlinks are extremely convenient, many Web applications need a more flexible form of input, and HTML forms are the common mechanism that allows users to submit arbitrary input.
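
To make the two input mechanisms concrete, the sketch below uses Python’s standard `html.parser` module to list the hyperlinks and form fields in a small page. The class name `LinkAndFormCollector` and the sample markup are assumptions for illustration only.

```python
from html.parser import HTMLParser


class LinkAndFormCollector(HTMLParser):
    """Collect hyperlink targets and the names of form input fields."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "input" and "name" in attrs:
            self.fields.append(attrs["name"])


page = """
<a href="/news?id=7">News</a>
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""
collector = LinkAndFormCollector()
collector.feed(page)
print(collector.links)   # ['/news?id=7']
print(collector.fields)  # ['username', 'password']
```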

CSS

Cascading Style Sheets (CSS) is a style sheet language used to add styles (fonts, spacing, colors, and so on) to structured documents such as HTML documents or XML applications. It is defined and maintained by the W3C, and CSS3 is the latest version. CSS is a design language that genuinely separates the content of a web page from its presentation. Compared with presentational markup in traditional HTML, CSS allows pixel-level control over the layout of objects on a page, supports almost all fonts and font styles, can edit the styles of page objects and models, and supports basic interaction design. It is currently the most capable design language for text-based presentation. CSS can also simplify or adapt the presentation for users with different abilities, which gives pages good readability.

JavaScript

JavaScript is a case-sensitive client-side scripting language, derived from Netscape’s LiveScript, that is dynamically typed, object-oriented, and based on prototype inheritance. Its main purpose was to overcome the slow response of server-side languages such as Perl and so provide a smoother browsing experience.
JavaScript is a relatively simple but powerful programming language that can easily extend a Web interface in ways that HTML alone cannot. JavaScript is often used to perform the following tasks.
(1) Validating user input before it is submitted to the server, to avoid sending data that contains errors and making unnecessary requests.
(2) Dynamically modifying the user interface in response to user input, to reduce the number of round trips between client and server.
(3) Querying and updating the Document Object Model (DOM) within the browser to control browser behavior.

Browser extension technology

In addition to JavaScript, some Web applications use browser extension technologies, which rely on custom code to extend the browser’s built-in capabilities in various ways. These extensions are either executed by a suitable browser plug-in or require an executable program to be installed on the client. Here are some common browser extension technologies.
(1) Java applet
(2) ActiveX controls
(3) Flash objects
(4) Silverlight objects

State and session

The HTTP protocol itself is stateless: the client simply sends a request to the server, and the server answers it with a response message. Neither side records information about the exchange, and each request is independent. As Web applications developed, dynamic, per-user content became more and more important. Cookies were created to address HTTP’s statelessness, and sessions came later as a solution for maintaining state between client and server.
A session is a server-side mechanism; the server generally uses a hash table or similar storage structure to hold session information.
Unlike cookies, which are stored on the client, session state is kept on the server. The server still needs the client to hold an identifier, however, so the session mechanism usually uses a cookie to store the session ID.
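
A minimal sketch of this arrangement follows. A plain Python dict stands in for the server’s hash table, and the cookie name `SessionId` and the helpers `create_session` and `lookup_session` are hypothetical. A real application would also need expiry, persistence, and secure cookie attributes.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session table: session ID -> session data.
sessions = {}


def create_session(user: str) -> str:
    """Store new session data and return the ID to send in a cookie."""
    session_id = secrets.token_hex(16)  # 32 random hex characters
    sessions[session_id] = {"user": user}
    return session_id


def lookup_session(cookie_header: str):
    """Return the session data named by the SessionId cookie, or None."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get("SessionId")
    if morsel is None:
        return None
    return sessions.get(morsel.value)


sid = create_session("alice")
print(lookup_session(f"SessionId={sid}"))  # {'user': 'alice'}
print(lookup_session("SessionId=forged"))  # None
```

Only the random identifier travels to the client; the data it refers to never leaves the server.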

Encoding format

URL encoding

A Uniform Resource Locator (URL, sometimes expanded as Universal Resource Locator), also known as a web address, is the standard address of a resource on the Internet, much like a house number on the network. It was originally invented by Tim Berners-Lee as the addressing scheme of the World Wide Web and has since been standardized as RFC 1738.
The invention of the URL was a fundamental step in the history of the Internet. URL syntax is generic and extensible, and it uses a subset of the ASCII character set to represent Internet addresses. A URL typically begins with a scheme that names the network protocol to be used.
The standard format of a URL is as follows.

scheme://host[:port]/path[?query][#fragment]
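
As a rough illustration, Python’s standard `urllib.parse.urlsplit` breaks a URL into its components. The port, path, query, and fragment in the sample URL are made up.

```python
from urllib.parse import urlsplit

parts = urlsplit(
    "https://securityonline.info:8443/docs/index.html?lang=en&page=2#top"
)
print(parts.scheme)    # https
print(parts.hostname)  # securityonline.info
print(parts.port)      # 8443
print(parts.path)      # /docs/index.html
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # top
```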

URLs may contain only printable characters from the US-ASCII character set (those with ASCII codes in the range 0x20 to 0x7e). In addition, some characters in this range cannot be used literally in a URL because they have special meaning in the URL scheme or in the HTTP protocol. The URL encoding scheme encodes any problematic characters, including those in the extended ASCII character set, so that they can be transmitted safely over HTTP. A URL-encoded character consists of the % prefix followed by the character’s two-digit hexadecimal ASCII code; a space, for example, becomes %20.
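
Python’s standard `urllib.parse` module implements this scheme, so a short sketch is enough to see it at work. The sample strings are arbitrary.

```python
from urllib.parse import quote, unquote

# By default quote() leaves "/" unencoded because it is a path separator.
encoded_path = quote("a b&c=d/e")
# Passing safe="" encodes "/" as well.
encoded_all = quote("a b&c=d/e", safe="")
decoded = unquote("%3Cscript%3E")

print(encoded_path)  # a%20b%26c%3Dd/e
print(encoded_all)   # a%20b%26c%3Dd%2Fe
print(decoded)       # <script>
```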

Unicode encoding

Unicode is a character encoding standard designed to support all of the writing systems used in the world. It employs several encoding schemes, some of which can be used to represent unusual characters in Web applications.
16-bit Unicode encoding works in a similar way to URL encoding. For transmission over HTTP, a 16-bit Unicode-encoded character is written as the %u prefix followed by the character’s Unicode code point in hexadecimal.
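
The %u form is not part of the URL standard and is understood only by some servers, so the Python standard library has no function for it. The helpers below, `percent_u_encode` and `percent_u_decode`, are hypothetical names for a hand-written sketch that handles only characters that fit in 16 bits.

```python
import re


def percent_u_encode(text: str) -> str:
    """Encode every character as %uXXXX (four hexadecimal digits)."""
    out = []
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("character does not fit in 16 bits")
        out.append(f"%u{ord(ch):04x}")
    return "".join(out)


def percent_u_decode(text: str) -> str:
    """Replace each %uXXXX sequence with the character it names."""
    return re.sub(
        r"%u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


print(percent_u_encode("\u00e9"))  # %u00e9
print(percent_u_decode("%u002f"))  # /
```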

HTML encoding

HTML encoding is a scheme for representing problematic characters so that they can be safely incorporated into HTML documents. Many characters have special meaning in HTML (the HTML metacharacters) and are used to define the structure of the document rather than its content. To use these characters safely as part of the document’s content, they must be HTML-encoded.
When attacking Web applications, HTML encoding is mainly of interest when probing for cross-site scripting vulnerabilities.
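
Python’s standard `html` module shows the effect of HTML encoding on metacharacters. The payload string below is just a sample input.

```python
import html

payload = '<script>alert("xss")</script>'

# escape() replaces &, <, > and (by default) both quote characters.
encoded = html.escape(payload)
print(encoded)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

# unescape() resolves named and numeric character references.
decoded = html.unescape("&lt;b&gt; &amp; &#34; &#x27;")
print(decoded)  # <b> & " '
```

Once encoded, the payload is displayed as text by the browser instead of being interpreted as markup.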

Base64 encoding

Base64 is a way of representing binary data using 64 printable characters. Because 2 to the 6th power equals 64, each 6-bit unit corresponds to one printable character. Three bytes contain 24 bits, which correspond to four Base64 units, so every three bytes are represented by four printable characters. Base64 is commonly used to represent, transmit, and store binary data in contexts that handle text, including email via MIME and storing complex data in XML.
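
The arithmetic above can be checked by hand in Python and compared with the standard `base64` module. The input `b"Man"` is an arbitrary three-byte example, and the manual steps are a sketch of the idea rather than a full encoder (padding is handled only by the library call).

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

data = b"Man"                       # 3 bytes = 24 bits
bits = int.from_bytes(data, "big")  # the 24 bits as one integer

# Cut the 24 bits into four 6-bit units, most significant first.
indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
manual = "".join(ALPHABET[i] for i in indices)

print(indices)                 # [19, 22, 5, 46]
print(manual)                  # TWFu
print(base64.b64encode(data))  # b'TWFu'

# Inputs that are not a multiple of 3 bytes are padded with "=".
print(base64.b64encode(b"Ma"))  # b'TWE='
print(base64.b64encode(b"M"))   # b'TQ=='
```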