Everything about HTTPS

2023-10-21 1309 words 7 minutes

During this year’s fall job application process, many interviewers asked me questions about HTTPS, such as:

How is the security of HTTPS implemented compared to HTTP?
In the HTTPS protocol, how is data transmitted?
What are the steps of HTTPS?
(A question that left me speechless) What advantages does ECDH have over HTTPS?

These interviews told me that it’s necessary to delve deeper into HTTPS. However, before we get into HTTPS, it’s important to understand its predecessor, HTTP.

1 The Internet Before HTTP

Before the World Wide Web and HTTP became dominant, people used Gopher to publish and retrieve documents across networks.

The word “gopher” refers to a type of rodent skilled at digging holes. These rodents are good at digging tunnels underground, which is a bit like how networks connect things. I’m not sure whether the name “Gopher” in the computer industry is related to these animals.

The Gopher protocol is a communication protocol for distributing, searching, and retrieving documents over Internet Protocol networks. Its protocol and user interface design were menu-driven. It offered an alternative to the World Wide Web in its early stages but was eventually replaced by HTTP. (Gopher is not free!)

The working process of the Gopher protocol is roughly as follows:

The client establishes a TCP connection with the server on port 70 (the standard Gopher port).
The client sends a string (the selector, that is, the path of the document or directory to retrieve), followed by a carriage return and line feed (CR + LF).
The server replies with the requested items and closes the connection.
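
The three steps above can be sketched in a few lines of Python. This is only an illustration: the function names `build_gopher_request` and `fetch_gopher` are my own, and the host you pass in is assumed to run a Gopher server.

```python
import socket

GOPHER_PORT = 70  # the standard Gopher port mentioned above


def build_gopher_request(selector: str) -> bytes:
    """Return what the client sends: the selector followed by CR + LF."""
    return selector.encode("ascii") + b"\r\n"


def fetch_gopher(host: str, selector: str = "", port: int = GOPHER_PORT) -> bytes:
    """Connect over TCP, send the selector, and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_gopher_request(selector))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

An empty selector asks for the server’s top-level menu, so `fetch_gopher(host)` with a reachable Gopher host would return the root listing.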

Even today, some intranets still use the Gopher protocol, but improper deployment can lead to SSRF vulnerabilities.

2 HTTP

Hypertext Transfer Protocol is an application layer protocol for distributed, collaborative, and hypermedia information systems. HTTP is the foundation for data communication on the World Wide Web.

The initial purpose of designing HTTP was to provide a way to publish and receive HTML pages. Resources requested via the HTTP or HTTPS protocol are identified by Uniform Resource Identifiers (URIs).

HTTP is a standard for requests and responses between a client (user) and a server (website), usually carried over TCP. We refer to the client as the user agent, which can be a web browser, a web crawler, and so on.

The HTTP client initiates a request by establishing a TCP connection to the server on a specified port (by default 80 for HTTP, 443 for HTTPS). The HTTP server listens on that port for client requests. Once a request is received, the server responds with a status line along with the requested content, an error message, or other information.

2.1 HTTP Versions

The development of HTTP versions is as follows:

HTTP/0.9: Supports only the GET method, does not specify a version in communication, and does not support request headers.
HTTP/1.0: The first HTTP version with a version number, default Connection: close.
Each HTTP operation results in a connection being established, but the connection is terminated when the task is complete. A new HTTP session is established for each request for a web resource.
HTTP/1.1: Default Connection: keep-alive.
In HTTP/1.1, the connection between the client and server does not close after a web page is loaded. If the client accesses another web page on the same server, the existing connection is reused. The connection has a keep-alive timeout, which can be configured in server software like Apache. A persistent connection requires support from both the client and the server. Long and short connections in HTTP are essentially long and short connections in TCP.
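HTTP/2: Keeps the semantics of HTTP/1.1 but switches to binary framing, multiplexes multiple requests over a single TCP connection, and compresses headers.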
HTTP/3: Abandons TCP and uses QUIC over UDP for carrying application-layer data.
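
The difference between the HTTP/1.0 and HTTP/1.1 defaults can be written as a small decision function. This is a simplified sketch: the name `is_persistent` is hypothetical, and real servers also consider timeouts and request limits.

```python
from typing import Mapping


def is_persistent(http_version: str, headers: Mapping[str, str]) -> bool:
    """Decide whether the connection stays open after the response."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    tokens = {t.strip() for t in normalized.get("connection", "").split(",") if t.strip()}

    if "close" in tokens:
        return False  # an explicit "close" always wins
    if http_version == "HTTP/1.1":
        return True  # keep-alive is the default in HTTP/1.1
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens  # HTTP/1.0 closes unless keep-alive is requested
    raise ValueError(f"unsupported version for this sketch: {http_version}")
```

For example, `is_persistent("HTTP/1.0", {})` is `False`, while `is_persistent("HTTP/1.1", {})` is `True`.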

As of now, HTTP/1.1 is still the most widely used version (from my personal observation).

HTTP/1.1 originally defined eight methods to operate on specified resources in different ways. PATCH was added later by RFC 5789, which gives the nine methods listed below.

Request Method	Request Has Payload Body	Response Has Payload Body
GET	Optional	Yes
HEAD	Optional	No
POST	Yes	Yes
PUT	Yes	Yes
DELETE	Optional	Yes
CONNECT	Optional	Yes
OPTIONS	Optional	Yes
TRACE	No	Yes
PATCH	Yes	Yes

2.2 Standard Format of Requests and Responses

Request	Response
Request Line (Request Method + Request Target + HTTP Version) + `<CR><LF>`	Status Line (HTTP Version + Status Code + Reason Phrase) + `<CR><LF>`
Request Headers, each followed by `<CR><LF>`	Response Headers, each followed by `<CR><LF>`
`<CR><LF>`	`<CR><LF>`
Message Body (optional)	Message Body (optional)
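
To make the layout concrete, here is a minimal sketch that builds a request and parses a response by hand. The helper names `build_request` and `parse_response` are made up for this post, and the parser assumes a well-formed message with a reason phrase; it is not a full HTTP implementation.

```python
CRLF = "\r\n"


def build_request(method: str, target: str, headers: dict, body: str = "") -> str:
    """Assemble request line + headers + empty line + optional body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body


def parse_response(raw: str) -> dict:
    """Split a response into status line, headers, and body."""
    head, _, body = raw.partition(CRLF + CRLF)  # the empty line ends the headers
    status_line, *header_lines = head.split(CRLF)
    version, status_code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "status": int(status_code),
        "reason": reason,
        "headers": headers,
        "body": body,
    }
```

Calling `build_request("GET", "/index.html", {"Host": "example.com"})` produces the request line, one header line, and the terminating empty line, exactly as in the table above.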

3 HTTPS

Hypertext Transfer Protocol Secure (often referred to as HTTP over TLS, HTTP over SSL, or HTTP Secure) is a transport protocol for secure communication over a computer network. HTTPS uses HTTP for communication but encrypts the data using SSL/TLS.

TLS is the successor to SSL. Strictly speaking, the HTTPS we use today is based on TLS, not SSL. The last version of SSL (SSL 3.0) was deprecated in 2015.

Application layer protocols can transparently run over the TLS protocol, which is responsible for negotiating and authenticating the establishment of an encrypted channel. Data transmitted by application layer protocols is encrypted when passing through the TLS protocol, ensuring the privacy of communication.

The TLS protocol is optional and must be configured on both the client and server to be used. There are two main ways to achieve this: one is to use a dedicated TLS port (e.g., port 443 for HTTPS), and the other is for the client to ask the server to switch an existing connection to TLS through a protocol-specific mechanism (e.g., STARTTLS, commonly used in email). Once both the client and server agree to use TLS, they negotiate a stateful connection to transmit data.

Steps in the TLS handshake (key exchange, using RSA as an example):

The client requests a TLS connection with the server and sends the list of cipher suites it supports.
The server picks a cipher suite from that list and notifies the client.
The server sends back its digital certificate, which includes the server’s public key.
The client verifies the validity of the certificate.
To establish a session key for the secure connection, the client generates a random secret, encrypts it with the server’s public key (an RSA key of 2048 bits is common today and considered secure), and sends it to the server.
The server decrypts the client’s secret using its private key. Both sides now derive the same symmetric session key from it. Handshake complete.
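
The core of the RSA key exchange (the last two steps) can be demonstrated with textbook RSA and tiny numbers. This is a toy sketch only: the primes are far too small to be secure, there is no padding, and real TLS wraps this step in certificates and key derivation. All names below are hypothetical.

```python
import secrets

# Toy RSA key pair for the "server" (tiny textbook primes, hopelessly insecure).
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse, Python 3.8+)


def client_encrypt_secret(secret: int) -> int:
    """Client side: encrypt the random secret with the server's public key (N, E)."""
    return pow(secret, E, N)


def server_decrypt_secret(ciphertext: int) -> int:
    """Server side: recover the secret with the private exponent D."""
    return pow(ciphertext, D, N)


# The client picks a random secret smaller than the modulus and sends it encrypted.
secret = secrets.randbelow(N - 2) + 2
ciphertext = client_encrypt_secret(secret)
recovered = server_decrypt_secret(ciphertext)
assert recovered == secret  # both sides now share the same secret
```

Note that anyone who records `ciphertext` and later learns `D` can recover the secret, which is why this scheme lacks forward secrecy.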

During the key exchange process, other methods can be used: ECDH, DH, DHE, ECDHE, DH_anon.

Among these algorithms, some provide forward secrecy, while others do not. Forward secrecy is a security property of cryptographic communication protocols: even if the long-term private key is compromised later, past session keys are not exposed. RSA key exchange doesn’t have forward secrecy, because anyone who obtains the server’s private key can decrypt the secret from a recorded handshake. Static DH and ECDH share this weakness, since their key pairs are long-lived. In the ephemeral variants (DHE and ECDHE), both parties generate a fresh key pair for each session and compute a shared key from their own private key and the peer’s public key. They then delete the ephemeral private keys and use a hash of the shared key as the AES key. Even if an attacker later obtains the server’s long-term private key, they can’t recompute the shared key of a past session, because the ephemeral private keys have already been deleted on both sides.
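
Here is a rough sketch of an ephemeral Diffie-Hellman exchange over plain integers. The group parameters (`P` is the Mersenne prime 2^127 - 1, `G` is 3) are picked only to keep the example short and are not a vetted group; real deployments use standardized groups or elliptic curves and a proper key derivation function. The function names are my own.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus, an assumption for this sketch
G = 3           # toy generator, also an assumption


def generate_ephemeral_keypair():
    """Create a fresh private/public pair that is used for one session only."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Compute the shared secret and hash it into a 32-byte symmetric key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


client_private, client_public = generate_ephemeral_keypair()
server_private, server_public = generate_ephemeral_keypair()

# Only the public values travel over the network.
client_key = derive_session_key(client_private, server_public)
server_key = derive_session_key(server_private, client_public)
assert client_key == server_key

# Forward secrecy comes from throwing the ephemeral private keys away.
del client_private, server_private
```

After the `del`, nothing that remains on either machine is enough to recompute the session key from a recorded handshake.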

Now let’s return to the question from the very beginning:
What advantages does ECDH have over ~~HTTPS~~ RSA?
Because ECDH is based on elliptic curves, it offers smaller key sizes, faster computation, and lower resource requirements at a comparable security level. And of course, when used with ephemeral keys (ECDHE), it provides forward secrecy.
However, ECDH requires high-quality random number generation to keep the keys secure. Additionally, a careless choice or implementation of curves can introduce security vulnerabilities. (On the other hand, when configured improperly, RSA has its share of security issues as well…)

After the TLS handshake, both parties use symmetric encryption to transmit data. Since application layer protocols run transparently over TLS, the encryption and handshake process is invisible to the application. When we send an HTTPS request, we still write the HTTP message as plain text. The TLS layer encrypts it before it goes on the wire, and the default destination port is 443 instead of 80.
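
This transparency is easy to see with Python’s standard `ssl` module. In the sketch below, the HTTP request is the same plain text as before; only the socket is wrapped in TLS. The helper names `build_https_request` and `fetch_https` are hypothetical, and the host is whatever HTTPS server you choose to query.

```python
import socket
import ssl


def build_https_request(host: str, path: str = "/") -> bytes:
    """The request itself is ordinary plain-text HTTP."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")


def fetch_https(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send a plain-text HTTP request through a TLS-wrapped TCP socket."""
    context = ssl.create_default_context()  # verifies the certificate and hostname
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket performs the TLS handshake before returning.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(build_https_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Compared with a plain HTTP client, the only differences are the port and the `wrap_socket` call; the bytes we hand to `sendall` are not encrypted by our own code.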