What Happens When You Type a URL and Press Enter?

Tudor Barbu
Published in Dev Genius
6 min read · Jul 24, 2020


I have done my fair share of interviewing in my time. So I should be good at it. Practice makes perfect, right? The more we do something, the better we get at it. Or at least, that’s the theory. In practice, I still find it quite difficult to interview someone.

Photo by Sam McGhee on Unsplash

The goal of any interview is to determine whether the candidate can make a valuable contribution to the company and to assess their level. That’s it!

No need for brain teasers, impossible situations, weird algorithms, or synthetic academic-style problems. We need a teammate, not a techno-Rambo.

Leveling a candidate is a difficult topic, as there is no clear definition of a junior, mid, and senior engineer that’s consistently applied across the industry. The definitions differ greatly from one company to another. I have seen “seniors” who couldn’t write a unit test, or who thought that a single 2000+ line file of spaghetti code was the way to go.


Usually, there are several positions available in the company at any given time, in different teams, and at different levels. After completing the interview, I should be able to direct the candidate to the one that fits best. And to do this, I need to calibrate my expectations during the interview.

Am I interviewing a senior person and should I skip the easy questions? Or a mid-level engineer? Is the candidate a fullstack generalist or a domain specialist? This calibration should happen fast, in the first few minutes of the meeting, without taking too much time out of the actual interview.

One easy way to determine this is to ask the candidate a simple question: “What happens between the moment you type something in the browser’s address bar and the moment the page is fully loaded?” A comprehensive answer can be given in less than 5 minutes, and the level of detail makes it easy to gauge the level of the candidate.


Is it a search term or a URL?

Modern browsers allow users to type keywords and perform searches directly from the address bar, without having to navigate to the search engine first. So the first step would be to determine whether the string in the address bar is a URL or a search term. In the second case, the string will be embedded into a search engine URL.
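To make this step concrete, here is a minimal sketch of such a check in Python. It is a deliberately naive heuristic, not how any real browser decides. The search.example endpoint and the function name are made up for illustration.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical search endpoint, used only for illustration.
SEARCH_TEMPLATE = "https://search.example/?q={}"


def resolve_address_bar_input(text: str) -> str:
    """Return the URL a browser might navigate to for the typed text."""
    text = text.strip()
    # An explicit http(s) scheme is the clearest signal that this is a URL.
    if urlsplit(text).scheme in ("http", "https"):
        return text
    # No spaces and a dot inside the host part: treat it as a bare domain.
    host = text.split("/", 1)[0]
    looks_like_domain = (
        " " not in text
        and "." in host
        and not host.startswith(".")
        and not host.endswith(".")
    )
    if looks_like_domain:
        return "https://" + text
    # Everything else gets embedded into the search engine URL.
    return SEARCH_TEMPLATE.format(quote_plus(text))
```

With this sketch, `example.com` becomes `https://example.com`, while `how to bake bread` ends up as the query string of the search URL.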

I would expect this answer from a mid-level engineer onwards, one with a lot of attention to detail. Someone who can work closely with Product.

The DNS resolution

The next step is DNS (Domain Name System) resolution. DNS lets us use human-friendly domain names by translating them into IP addresses. It’s much easier for you and me to remember google.com than 216.58.211.46.

This would be the minimum answer from a junior or mid-level candidate.

I expect senior and fullstack candidates to be able to get into details. Before querying a DNS server, the browser will first query its own cache. If no matching record is found, it will query the cache of the operating system.

The OS has its own cache and, as an artifact from the past, a hosts file:

  • /etc/hosts on *nix systems
  • C:\Windows\System32\drivers\etc\hosts on Windows

which gets queried before initiating an external connection.

If no local answer can be provided, the browser will connect to the DNS server and ask for the IP address associated with the domain it’s trying to connect to.
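The lookup order described above can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the caches are plain dictionaries, `query_dns` is a hypothetical callback standing in for the real network query, and the relative order of the OS cache and the hosts file simply follows the text above rather than any specific operating system.

```python
from typing import Callable, Mapping, Optional


def parse_hosts(text: str) -> dict[str, str]:
    """Parse hosts-file content into a {hostname: ip} mapping."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # the first entry wins
    return table


def resolve(
    name: str,
    browser_cache: Mapping[str, str],
    os_cache: Mapping[str, str],
    hosts: Mapping[str, str],
    query_dns: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Check the local layers in order, then fall back to the DNS server."""
    name = name.lower()
    for layer in (browser_cache, os_cache, hosts):
        if name in layer:
            return layer[name]
    # No local answer: only now do we go out to the network.
    return query_dns(name)
```

The point of the sketch is the short-circuit: as long as one of the local layers has an answer, `query_dns` is never called.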

This would be the answer I’d expect from a senior candidate. Any additional level of detail is just gravy.

DNS operates on port 53. Before the query reaches an authoritative DNS server, the answer can be served from the cache of a resolver along the path, such as the home router or the ISP’s resolver. The DNS resolution can run additional optimizations, such as pointing users to servers that are closer to their geographical position in order to decrease network latency or doing a round-robin to distribute traffic among multiple servers running the same service.

Connecting to the server

Once the IP address is available, the browser will initiate a connection to the server. Based on the protocol used, the connection will be either on port 80 for HTTP or 443 for HTTPS.
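As a small illustration, this is roughly how a client could pick the port from the URL. The helper name `target_port` is my own, and the sketch only knows about the two schemes mentioned here.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}


def target_port(url: str) -> int:
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    # An explicit port in the URL overrides the scheme default.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]
```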

In either case, the HTTP method or “verb” would be GET, which is used to request information from the server. Other commonly used methods are:

  • POST — used to submit data to the server. Under REST it’s used to create new entities
  • PUT — also used to submit data to the server, but with the goal of updating an existing resource
  • DELETE — used to indicate that a resource must be deleted
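For reference, the GET request the browser sends is just text on the wire. Here is a minimal sketch of an HTTP/1.1 request; the function name is hypothetical, and real browsers send many more headers than this.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a bare-bones HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        "Connection: close",     # ask the server to close after responding
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Swapping `GET` for `POST`, `PUT` or `DELETE` in the request line is all it takes to change the method, although the first two would normally also carry a body.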

This would be a good response that I would expect from a mid-level candidate onwards.

Senior candidates should be able to explain the advantages HTTPS has over plain HTTP communications.

HTTPS uses a protocol called Transport Layer Security, or TLS for short, to encrypt connections between the client and the server. Even if the traffic is intercepted, encryption prevents eavesdropping and tampering, protecting privacy and data integrity.
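In Python, for example, the standard `ssl` module can wrap a plain TCP socket in TLS. The sketch below is one way to do it, assuming the defaults of `ssl.create_default_context()` are acceptable. The two helper names are mine.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Create a client-side context that verifies the server certificate."""
    # The default context requires a valid certificate and checks
    # that it matches the host name we are connecting to.
    return ssl.create_default_context()


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and perform the TLS handshake on top of it."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return make_tls_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the socket if the handshake fails
        raise
```

Everything written to the returned socket is encrypted before it leaves the machine, which is what protects the traffic from anyone listening along the way.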

Unless the interview is for a security-sensitive position, I wouldn’t expect even senior candidates to go deeper into the details.

A candidate that’s able to mention more exotic HTTP methods like HEAD, CONNECT, TRACE, OPTIONS or PATCH just got lucky. Unless they can come up with an amazing use case, personally I don’t get any in-depth insight from applicants displaying this knowledge beyond a good memory.

Processing the request and sending back the response

Almost always, the HTTP server passes the request to a handler. The handler is an application that, based on information from the request such as path, cookies, headers, query-string, or a combination of them, generates a specific response and sends it back to the client.
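A toy handler makes the idea tangible. The sketch below is not tied to any framework; the `/hello` route and the function name are invented for the example, and it only looks at the method, the path and the query string.

```python
from urllib.parse import parse_qs, urlsplit

PLAIN_TEXT = {"Content-Type": "text/plain; charset=utf-8"}


def handle_request(method: str, target: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status code, headers, body) for a request."""
    parts = urlsplit(target)
    query = parse_qs(parts.query)
    if method != "GET":
        # This toy handler only supports GET.
        return 405, {"Allow": "GET"}, b""
    if parts.path == "/hello":
        # The response depends on the query string: /hello?name=Ada
        name = query.get("name", ["world"])[0]
        return 200, dict(PLAIN_TEXT), f"Hello, {name}!".encode("utf-8")
    # No route matched the requested path.
    return 404, dict(PLAIN_TEXT), b"Not Found"
```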

I am happy with this level of detail from a junior candidate. From a mid-level one, I would expect the ability to name some HTTP statuses and their meaning.

The response has 3 parts: status line, headers, and body. The most important part of the status line is the response code, a 3-digit number that falls under one of these classes:

  • 1xx — informational, the request was received and the client should continue
  • 2xx — OK responses, the request was handled without errors
  • 3xx — moved codes, the requested resource was moved to a different URL
  • 4xx — client-side error
  • 5xx — server-side error
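Since the class is determined by the first digit alone, mapping a code to its class is a one-liner. A small sketch, with class names of my own choosing:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map an HTTP status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]
```

So 200 and 204 both land in the success class, 301 is a redirection, 404 is a client error and 503 is a server error.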

If you like animals and would like to learn more about HTTP status codes, try HTTP Status Dogs or HTTP Cats, depending on your preference.

The headers contain additional information such as the MIME type of the response’s body, for how long the content should be cached, whether the payload is gzipped or not, and so on.
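To see the three parts side by side, here is a minimal sketch that splits a raw HTTP/1.1 response into status code, headers and body. It assumes a well-formed response held entirely in memory and ignores details such as repeated headers and chunked encoding. The function name is mine.

```python
def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into (status code, headers, body)."""
    # An empty line separates the head (status line + headers) from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    _version, code, *_reason = status_line.split(" ", 2)
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), headers, body
```

For a response carrying `Content-Type: text/html`, `Cache-Control: max-age=3600` and `Content-Encoding: gzip`, the resulting dictionary holds exactly the MIME type, caching and compression information mentioned above.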

I would expect a senior candidate to be able to explain the request/response mechanism and give some examples of HTTP status codes and response headers.

Displaying the page

After the client gets the response, it starts parsing it in order to display the page. The browser will interpret the HTML and create the DOM tree. If CSS is found, the browser will also create the CSSOM tree.

The two trees are created independently of each other, but combined they form the “Render Tree”, a tree structure which encapsulates all the elements that will eventually be rendered on the page. The render tree only includes visible elements, not those hidden with display: none.
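Here is a heavily simplified sketch of how the two trees could be combined. Real browsers match selectors, resolve the cascade and compute inherited styles. In this toy model the CSSOM is just a dictionary from tag name to declared properties, and all the names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A toy DOM node: a tag name and its child nodes."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def build_render_tree(node: Node, cssom: dict[str, dict[str, str]]) -> Optional[dict]:
    """Combine DOM and (toy) CSSOM, skipping elements with display: none."""
    style = cssom.get(node.tag, {})
    if style.get("display") == "none":
        return None  # hidden elements and their subtrees are left out
    rendered_children = []
    for child in node.children:
        rendered = build_render_tree(child, cssom)
        if rendered is not None:
            rendered_children.append(rendered)
    return {"tag": node.tag, "style": style, "children": rendered_children}
```

Given a body containing an h1, an aside and a p, with the aside set to display: none, the resulting tree contains only the h1 and the p.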

CSS is considered a “render-blocking resource”, which means that whenever it’s encountered, it stops the render tree from being assembled until the CSS is parsed. Due to its cascading nature, CSS cannot be used in chunks, which means it must be parsed entirely before the browser can move to the next step.

From an optimization standpoint, it’s important not to send the browser CSS that isn’t used on the current page.

JavaScript code is also a blocking resource. Since JavaScript code can programmatically alter both the CSS and the DOM, once JavaScript is encountered, the rendering stops for both the DOM and the CSSOM until the code has finished executing.

I would expect mid-level engineers and fullstack seniors to know about blocking / non-blocking content, while pure frontend seniors should be able to go into the details.

According to Medium, the time it takes to read this article is roughly 6 minutes, with all the extras. Strip those away and we’re left with about 4–5 minutes, which is the amount of time it would take an experienced candidate to give a comprehensive answer.
