Designing Bit.ly

Designing Bit.ly
Photo by Pawel Czerwinski / Unsplash

Problem: Design a Tiny URL / Bit.ly competitor

Given a long URL, create a short URL.

Given a short URL, redirect to a long URL.

Requirements

  • Very high read, likely 80% of all links are the same 20% of links
  • Not so much write
  • Needs low latency
  • High availabilitiy

Flow

  • When a user goes to bit.ly/ajdjasjd they should be redirected to the original URL
  • A user should be able to create URLs

Let's start with creating a URL

Making shorter URLs

What we need to do:

  1. Given a URL
  2. Encode the URL so its shorter
  3. Store in a key:value database of short URL -> Original URL
  4. Return short URL

Database

We should probably use a key value store for this like DynamoDB, because nosql is very fast to read, and our data structure is obvious.

Since we do not want to store stats, we can just store the 2 values.

Our database will look like:

create table urls;
create column original_url;
create column short_url;

Then we can index on short_url, as users will be looking it up the most

TODO: can nosql be indexed...?

We should create replicas of the database across multiple regions to allow better readability later on.

We can use a master slave config as this is easier to implement.

The user writes to a single DB (master), and that master writes to the other replicas (or the replicas pull it. TODO: which one is it?)

Because the DB is read heavy and not write heavy, we do not have to worry so much about writes failing.

Encoding scheme

For our encoding scheme lets hash the URL.

This is because:

  • Hashing means users cannot work out the original URL without clicking it
  • Hashing is very fast
  • Hashing will restrict the URL to a specific length, if we used base58 the length could be longer if the URL is very long

TODO: should we use base58 or hashing?

We can then return the original URL to the user.

Reading a URL

Now we go onto the other part, taking the shortened URL and turning it back into the original URL.

HTTP response

We should always use a 301 HTTP response (redirect perm). This is because many internet providers, DNS services, CDNs etc will see this and cache it on their side potentially making it faster for the user.

Cache

When the user clicks the URL, the first thing it should look to do is search for the result in the cache.

TODO: how do we make sure cache is local to user physically? Do we need the load balancer first?

Maybe we should put the load balancer first.

  1. So the user clicks the URL
  2. it goes to our load balancer
  3. Our load balancer randomly selects a cache
  4. If its in the cache, return it
  5. Otherwise the cache fetches it from the DB and returns that

Our caching solution could be as simple as redis, which is a very fast key value store.

TODO: how do we make sure load balancer is local to the user? Do we want them to connect to LB from america?

With Redis you can save the cache too, to stop rebuilding it each time

Youtube

Rest API

How long is the URL?

How many URLs per second? 1000 per second = 31.5 billion URLs a year

10 to 1 read writes = 300 billion reads per year

How many characters can we use? Base62

62 chars

If we want to have 7 characters in our URL, thats 762 which is 3.5 trillion possible URL combos

TODO: why dont we just hash the original URL?

Using zookeeper as a count cache, so our URL will just be a counter

Zookeeper is good for the metadata here