How Unique Is Your ID?

Most of us are familiar with the UUID library. Some might use other libraries for this purpose. Or you might already know that Node has built-in support for UUIDs, so you don't need an extra library for it. We use these for unique identifiers when storing data or for various other reasons. Let's dive into how they work, explore different types of random IDs, and see what happens when we scale to distributed systems (yeah, I've been exploring distributed systems lately, so I had to mention😌)

What's an ID and Why Do We Need It?

But can't we go without IDs? Of course we can, depending on the task. If you're building something simple that only stores users' names and surnames, and you know you won't update or delete them (unless manually), then sure, skip the IDs. But real-world apps (think any social media platform you use) need CRUD operations - Create, Read, Update, and Delete.

Here's a fun example: imagine you want to delete your cringe profile picture. You send a request to the server, but how does the server know it's your picture? Without a unique identifier, if you just say "Hey, I'm John, delete my profile pic," the server might go through all the Johns and delete their pictures too! Not cool for your friend John who just lost his only decent photo. 😅

Major JS libraries and frameworks also use something called 'key' that requires unique IDs to improve the rendering process and avoid unnecessary re-renders. Pretty useful, right? Let's check out some ways to generate these helpful IDs.

Auto Increment: The Simple Approach

The easy one! Most databases have auto_increment as the default for ID columns if you don't specify otherwise. It's nice and simple - each new record gets the next number in line (1, 2, 3...). But hold up, there are two main problems here:

Scaling Issues: When your app gets bigger and you need more servers (horizontal scaling), it gets messy. Imagine two servers both trying to count up from 1. Server A gives ID: 7 to Alice, and Server B also gives ID: 7 to Bob. Oops! Not so unique anymore.
Security Problems: Here's a fun (but scary) fact - if your ID is 15, someone could just change that number in the URL to 14 or 16 and potentially access other users' data! This is called an IDOR (Insecure Direct Object Reference) vulnerability, and it's a big no-no in security.

While auto-increment IDs are super simple to use, they might not be the best choice for public-facing applications or systems that need to scale across multiple servers.

UUID: The Universal Solution

UUID (Universally Unique Identifier) is a 128-bit label of random characters that can be used as an ID. 128 bits seems big enough to avoid duplicates, right? Let's see.

History of the UUID

Hmm, you might never have thought about the history behind that uuid() function you keep importing and using randomly, right? Let's take a quick look at the story of UUID. If you're not into history, feel free to skip this part.

From the moment machines started sharing information over networks, they needed a way to identify things uniquely. It all started with telephone networks in the 1870s. Back then, every phone needed its own wire to connect - imagine how messy the streets were with all those copper wires! Then came phone numbers, which were actually the first unique IDs used in networks.

Later, when computers came along, they needed better ways to identify data. In the 1980s, a company called Apollo was working on connected computers and created UIDs (Universal IDs). They used 20 bits for these IDs, which could identify about a million machines - they thought this was enough!

But when Apollo started working with other companies, they found a problem: everyone was using different-sized IDs. To fix this, they created UUID with 128 bits, making it much bigger. This worked so well that we still use it today! Early UUIDs used timestamps (like today's Version 1), but the modern ones we use (Version 4) are totally random.

Let's Build Our Own UUID Generator

If you think it's super complicated, I've got news for you - it's not (at least for a basic version)! While there are various UUID versions using different algorithms based on timestamps and MAC addresses (check out RFC 4122 if you're curious), we can create a simple random one:

./uuid.js

function generateID(length = 8) {
    return Math.random().toString(36).substring(2, 2 + length);
}

// Example usage
console.log("Generated ID:", generateID());
console.log("Generated ID (4 chars):", generateID(4));
console.log("Generated ID (12 chars):", generateID(12));

Here's how it works:

Math.random() generates a random number between 0 and 1
toString(36) converts this number to base 36 (using digits 0-9 and letters a-z)
substring(2, 2 + length) removes the "0." prefix and takes the specified number of characters

Want something more robust? Let's add process ID, MAC address, and optional prefix/suffix:

./uuid.js

import os from 'os';
import crypto from 'crypto';

export function generateUniqueID(prefix = '', suffix = '') {
    const time = Date.now().toString(36);
    const processId = process.pid.toString(36);
    const machineId = crypto.createHash('md5')
        .update(os.hostname())
        .digest('hex')
        .slice(0, 8);
    const randomPart = Math.random().toString(36).substring(2, 5);

    return `${prefix}${time}${processId}${machineId}${randomPart}${suffix}`;
}

// Example usage
console.log("Generated Unique ID:", generateUniqueID());
console.log("Unique ID with prefix:", generateUniqueID("user_"));

What About Duplicates?

Ever wondered if UUIDs can have duplicates? Like, what are the chances of generating the same UUID twice when using UUID v4? We can actually calculate this using probability theory (math is fun! 🤓). It relates to the famous 'Birthday Paradox' (that's a topic for another blog post, maybe I'll write about it!). Using similar probability theory, we can calculate the chances of UUID duplicates using this formula:

p(n)\approx 1-e^{-{\frac {n^{2}}{2\cdot 2^{x}}}}

We can easily turn it into code to check the probability:

./uuid.js

function uuidCollisionEstimate() {
   const x = Math.pow(2, 122);
   console.log('Number of UUIDs', 'Probability of Duplicates');

    for (let i = 35; i < 62; i++) {
        const n = Math.pow(2, i);
        const p = 1 - Math.exp(-(n * n) / (2 * x));
        
        // Printing the result as 2^i and formatted probability
        console.log(`2^${i}=${1n << BigInt(i)} : 0${String(1 + p).substring(1)}`);
    }
}

uuidCollisionEstimate();

You can try it out in your terminal and see a result like this:

Number of UUIDs	Probability of Duplicates
2^36=68'719'476'736	0.000'000'000'000'000'4
2^41=2'199'023'255'552	0.000'000'000'000'4
2^46=70'368'744'177'664	0.000'000'000'4

But what does all that mean?

"2^36 = 68,719,476,736 : 0.0000000000000004" : if you generate about 68.7 billion UUIDs (that's a lot!), the chance of getting a duplicate is only 0.0000000000000004 (or 4 x 10⁻¹⁶)

To give some perspective, the odds of a person being hit by a meteorite in a year are about 1 in 17 billion, which is a probability of roughly 0.00000000006 (6 x 10⁻¹¹). This is similar to the chance of generating trillions of UUIDs over a year and accidentally creating just one duplicate. To put it another way, even if you created 1 billion UUIDs every second for 100 years, the chance of making a single duplicate would only reach 50%.

Unique IDs in Distributed Systems

When it comes to distributed systems (my current favorite topic 🙃), generating unique IDs becomes even more interesting. Here are some popular approaches:

Multi-master replication

In multi-master replication, multiple systems create IDs independently. Each system knows about the others, so they don't generate the same ID. This helps avoid duplicates in large systems where many machines are working together.

Twitter Snowflake

Twitter Snowflake is a system Twitter created to generate unique IDs. It splits the ID into parts, including the time, machine, and a counter, so every ID is different. It's a simple way to ensure uniqueness even when multiple servers are involved.

Ticket server

A ticket server is a central place where systems ask for unique IDs. It gives out numbers to ensure no two systems create the same ID. This approach works well but can slow things down if many systems request IDs at once.

And of course, each method has its own pros, cons, and specific use cases. For example, if you're using multi-master replication, you may face scalability issues (as your servers scale, inconsistency can occur, which isn't ideal). On the other hand, with a ticket server, the problem is having a single point of failure. For more details, you can check out this system design book, which covers each method and their advantages and disadvantages in depth.

Bonus

Here's a simple, tiny app that generates a UUID using version 4:

Click the copy icon to copy UUID

Let's keep it here (even though you might never use it), just in case you need a quick unique ID for something.

Here are also some useful links I referred to while writing this blog (if you're interested in diving deeper):

And if you're into security stuff, definitely check out In GUID We Trust. It shows how the old UUID v1 (the one using timestamps and MAC addresses) could be hacked pretty easily. Pretty scary stuff, right? That's exactly why we prefer v4 nowadays!

Conclusion

Whether you're building a small app or a distributed system, understanding how to generate and use unique IDs is crucial. From simple auto-increment to complex distributed ID generation systems, each approach has its place. The key is choosing the right tool for your specific needs.

Remember, while making your own UUID generator is fun for learning (like we did in this blog!), for serious applications, stick to battle-tested solutions like UUID or other GUIDs.

Happy coding, and may all your IDs stay unique! 🚀