What is tinyurl?
A tinyurl service compresses a long url into a short one. This saves space, and is useful for scenarios such as twitter or weibo, where each character counts. The down side is possible spam use. Concerns include longevity of a short url.
i.e. “http://tiny.me/5ie0V2″. The highlight part can be any string with 6 letters containing [0-9, a-z, A-Z]. That is, 62^6 ~= 56.8 billions unique strings.
On Single Machine
One Simple Solution could be Hashing. Use a hash function to convert long string to short string. In hashing, that may be collisions (2 long urls map to same short url) and we need a unique short url for every long url so that we can access long url back but hash is one way function.
One of the most simple but also effective one, is to have a database table set up this way:
Table T_Url_Conversion ( ID : int PRIMARY_KEY AUTO_INC, Original_url : varchar, Short_url : varchar )
Then the auto-incremental primary key ID is used to do the conversion: (ID, 10) <==> (short_url, BASE). Whenever you insert a new original_url, the query can return the new inserted ID, and use it to derive the short_url, save this short_url and send it to cilent.
Here the BASE can be 62 for a-zA-Z0-9, or 36 for a-z0-9. So this is a conversion of numbers between base-10 and base-62 or base-36. If use base-62, and with a short_url length of 6 characters, that’s a space of 62^6 =~ 57 billion short urls to use.
Implementation:
string idToShortURL(long int n) { // Map to store 62 possible characters char map[] = "abcdefghijklmnopqrstuvwxyzABCDEF" "GHIJKLMNOPQRSTUVWXYZ0123456789"; string shorturl; // Convert given integer id to a base 62 number while (n) { shorturl.push_back(map[n%62]); n = n/62; } // Reverse shortURL to complete base conversion reverse(shorturl.begin(), shorturl.end()); return shorturl; } // Function to get integer ID back from a short url long int shortURLtoID(string shortURL) { long int id = 0; // initialize result // A simple base conversion logic for (int i=0; i < shortURL.length(); i++) { if ('a' <= shortURL[i] && shortURL[i] <= 'z') id = id*62 + shortURL[i] - 'a'; if ('A' <= shortURL[i] && shortURL[i] <= 'Z') id = id*62 + shortURL[i] - 'A' + 26; if ('0' <= shortURL[i] && shortURL[i] <= '9') id = id*62 + shortURL[i] - '0' + 52; } return id; }
On Multiple Machine
Suppose the service gets more and more traffic and thus we need to distributed data onto multiple servers.
We can use Distributed Database. But maintenance for such a db would be much more complicated (replicate data across servers, sync among servers to get a unique id, etc.).
Alternatively, we can use Distributed Key-Value Datastore.
Some distributed datastore (e.g. Amazon’s Dynamo) uses Consistent Hashing to hash servers and inputs into integers and locate the corresponding server using the hash value of the input. We can apply base conversion algorithm on the hash value of the input.
The basic process can be:
Insert
(1) Hash an input long url into a single integer;
(2) Locate a server on the ring and store the key–longUrl on the server;
(3) Compute the shorten url using base conversion (from 10-base to 62-base) and return it to the user.
Retrieve
(1) Convert the shorten url back to the key using base conversion (from 62-base to 10-base);
(2) Locate the server containing that key and return the longUrl.
I suppose that some of those websites gather statistics on the link that was generated like how many times that link had been visited. It makes people who post a link feel special when it gets viewed a lot of times.
Yes they do , even they allow to customise the URL so you can give a user friendly name which is easy to remember
To save on the space, probably a BloomFilter would be more space efficient to do a check on the url availability on a given shard server.
1. First check if the short url exists on your service
2. Then check on which shard the original url of the service might exist
3. Check if there is a cache to return the url
4. On a cache miss, look up the original url in the shard
Pingback: Problems Set For Later Consideration – Lets Code
Hello!
On a single machine how this method would return a tiny url of size 6 characters?
For any given long int (long url), using division by 62 repeatedly doesn’t guarantee that the resulting string will be only 6 characters long?
Betta fish want to rest on objects from the bottom with the container, so some decorations are a fantastic idea.
It is important to use objects that are smooth and don’t have
any sharp edges, because the betta’s fins are incredibly delicate and
could be easily damaged.