Design of a Tinyurl Service

What is tinyurl?

A tinyurl service compresses a long url into a short one. This saves space, and is useful for scenarios such as twitter or weibo, where each character counts. The down side is possible spam use. Concerns include longevity of a short url.

i.e. “http://tiny.me/5ie0V2″. The highlight part can be any string with 6 letters containing [0-9, a-z, A-Z]. That is, 62^6 ~= 56.8 billions unique strings.

On Single Machine

One Simple Solution could be Hashing. Use a hash function to convert long string to short string. In hashing, that may be collisions (2 long urls map to same short url) and we need a unique short url for every long url so that we can access long url back but hash is one way function.

One of the most simple but also effective one, is to have a database table set up this way:

Table T_Url_Conversion (
    ID : int PRIMARY_KEY AUTO_INC,
    Original_url : varchar,
    Short_url : varchar
)

Then the auto-incremental primary key ID is used to do the conversion: (ID, 10) <==> (short_url, BASE). Whenever you insert a new original_url, the query can return the new inserted ID, and use it to derive the short_url, save this short_url and send it to cilent.

Here the BASE can be 62 for a-zA-Z0-9, or 36 for a-z0-9. So this is a conversion of numbers between base-10 and base-62 or base-36. If use base-62, and with a short_url length of 6 characters, that’s a space of 62^6 =~ 57 billion short urls to use.

Implementation:

string idToShortURL(long int n)
{
    // Map to store 62 possible characters
    char map[] = "abcdefghijklmnopqrstuvwxyzABCDEF"
                 "GHIJKLMNOPQRSTUVWXYZ0123456789";
 
    string shorturl;
 
    // Convert given integer id to a base 62 number
    while (n)
    {
        shorturl.push_back(map[n%62]);
        n = n/62;
    }
 
    // Reverse shortURL to complete base conversion
    reverse(shorturl.begin(), shorturl.end());
 
    return shorturl;
}
 
// Function to get integer ID back from a short url
long int shortURLtoID(string shortURL)
{
    long int id = 0; // initialize result
 
    // A simple base conversion logic
    for (int i=0; i < shortURL.length(); i++)
    {
        if ('a' <= shortURL[i] && shortURL[i] <= 'z')
          id = id*62 + shortURL[i] - 'a';
        if ('A' <= shortURL[i] && shortURL[i] <= 'Z')
          id = id*62 + shortURL[i] - 'A' + 26;
        if ('0' <= shortURL[i] && shortURL[i] <= '9')
          id = id*62 + shortURL[i] - '0' + 52;
    }
    return id;
}

On Multiple Machine

Suppose the service gets more and more traffic and thus we need to distributed data onto multiple servers.

We can use Distributed Database. But maintenance for such a db would be much more complicated (replicate data across servers, sync among servers to get a unique id, etc.).

Alternatively, we can use Distributed Key-Value Datastore.
Some distributed datastore (e.g. Amazon’s Dynamo) uses Consistent Hashing to hash servers and inputs into integers and locate the corresponding server using the hash value of the input. We can apply base conversion algorithm on the hash value of the input.

The basic process can be:

Insert
(1) Hash an input long url into a single integer;
(2) Locate a server on the ring and store the key–longUrl on the server;
(3) Compute the shorten url using base conversion (from 10-base to 62-base) and return it to the user.

Retrieve
(1) Convert the shorten url back to the key using base conversion (from 62-base to 10-base);
(2) Locate the server containing that key and return the longUrl.

3 Thoughts on “Design of a Tinyurl Service

  1. Geek Web on April 29, 2016 at 1:14 pm said:

    I suppose that some of those websites gather statistics on the link that was generated like how many times that link had been visited. It makes people who post a link feel special when it gets viewed a lot of times.

  2. Pahlaj on May 4, 2016 at 8:49 pm said:

    To save on the space, probably a BloomFilter would be more space efficient to do a check on the url availability on a given shard server.

    1. First check if the short url exists on your service
    2. Then check on which shard the original url of the service might exist
    3. Check if there is a cache to return the url
    4. On a cache miss, look up the original url in the shard

  3. Pingback: Problems Set For Later Consideration – Lets Code

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Post Navigation