I remember back when I was first learning about hash tables. I thought I had stumbled across a clever optimization for dealing with hash collisions. Before explaining my optimization, let me first briefly explain how hash tables work and what hash collisions are. I also made a video in case you're interested.

Suppose that we have a hash that maps someone's name to their age. `"adam"` is `27`, so we want to enter that into our hash.

Here's what happens:

- `"adam"` gets run through a hash function.
- The hash function spits out `7`.
- `7` is the index of the array where we're going to store `"adam"`'s value.
- So we store `27` into `array[7]`.

If we later call `hash.get("adam")` and want to look up the value for `"adam"`, we:

- Run `"adam"` through the hash function.
- Get `7` as the output.
- Look at `array[7]` to get our value.
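The store and lookup steps above can be sketched as a toy hash table. The `hashCode` function here is a made-up stand-in for a real hash function (and it deliberately ignores collisions, which we get to next):

```javascript
// A toy hash table that ignores collisions entirely.
// hashCode is a hypothetical stand-in for a real hash function.
const hashCode = (key) => {
  let h = 0;
  for (const c of key) h = (h * 31 + c.charCodeAt(0)) % 16;
  return h;
};

const table = new Array(16);

// Store: hash the key to an index, put the value there.
const put = (key, value) => {
  table[hashCode(key)] = value;
};

// Lookup: hash the key again, read the same index.
const get = (key) => table[hashCode(key)];

put("adam", 27);
console.log(get("adam")); // → 27
```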

Hopefully, if we have a key of `"alice"` or `"bob"`, it will hash to a different index. But... what happens if it hashes to the same index? That's called a hash collision.

A common approach is that if there is a collision, you store the colliding values in a linked list.
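A sketch of that chaining approach, using plain JavaScript arrays as the chains and a deliberately bad hash function (an assumption for demonstration, so collisions actually happen):

```javascript
// Separate chaining: each slot holds a list of [key, value] pairs.
const buckets = Array.from({ length: 8 }, () => []);

// Deliberately terrible hash so collisions are easy to trigger.
const hashCode = (key) => key.length % 8;

const put = (key, value) => {
  const chain = buckets[hashCode(key)];
  const entry = chain.find(([k]) => k === key);
  if (entry) entry[1] = value;
  else chain.push([key, value]);
};

const get = (key) => {
  // O(n) scan of the chain, where n is the number of colliding keys.
  const entry = buckets[hashCode(key)].find(([k]) => k === key);
  return entry ? entry[1] : undefined;
};

put("adam", 27);
put("emma", 31); // same length as "adam", so same bucket: a collision
console.log(get("adam"), get("emma")); // → 27 31
```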

Fair enough. That works.

But wait!!! It takes `O(n)` time to look up an item in a linked list. Can't we use a binary search tree to speed that up to `O(log n)`???

Or... can't we take it a step further and use a *hash inside of a hash* to get the lookup time all the way back down to `O(1)`?!

I hate to admit it, but years ago when I had these ideas, there were a few moments where I thought I was a genius. Now when I look back at that former self, I facepalm.

It's true that going from a linked list to a BST to a hash takes you from `O(n)` to `O(log n)` to `O(1)` lookup time. However, since collisions are rare, `n` is going to be small. And when `n` is small, `O(n)` can actually be faster than `O(1)`.

To understand why, think back to what Big-O actually means. I think it helps to think of it as a "math thing" rather than a "programming thing". Consider two functions:

```
f(n) = 4n + 1000
g(n) = 2n^2 + 5
```

Big-O of `f` is `O(n)` and Big-O of `g` is `O(n^2)`. We only focus on the part that "really matters". When `n` gets really large, the fact that it's `4n` doesn't really matter. Nor does the `+ 1000`.

But what about when `n` is small? Well, let's look at what happens when `n` is `5`:

```
f(5) = 4(5) + 1000 = 20 + 1000 = 1020
g(5) = 2(5^2) + 5 = 2(25) + 5 = 50 + 5 = 55
```

Look at that! The `O(n^2)` function is roughly 20 times faster than the `O(n)` function!
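If you're curious, a quick loop finds exactly where the `O(n^2)` function stops winning for these two particular functions:

```javascript
const f = (n) => 4 * n + 1000;      // O(n)
const g = (n) => 2 * n ** 2 + 5;    // O(n^2)

// Find the smallest n where g(n) catches up to f(n).
let n = 1;
while (g(n) < f(n)) n++;
console.log(n); // → 24
```

So for every `n` below 24, the "worse" function is actually the faster one.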

And *that* is basically why hash tables use a linked list rather than a BST or another hash to handle collisions. Since `n` is small, Big-O is no longer the right question to ask.

Here's another example. I wrote the following code while prepping for an interview:

```
const isVowel = (char) => ["a", "e", "i", "o", "u"].includes(char);
```

Normally I wouldn't think twice about it, but since interviewers care so much about time complexity, I stopped to think about whether it could be improved.

And it hit me that it's actually `O(n)`, because we have an array and have to iterate over every element to see if it matches `char`. It's easy to miss this because we're using `includes` rather than writing the loop ourselves.

So then I thought that maybe we could use a hash instead to get an `O(1)` lookup. Something like this:

```
const isVowel = (char) => {
  const vowels = {
    a: true,
    e: true,
    i: true,
    o: true,
    u: true,
  };
  return !!vowels[char];
};
```

But then I realized that that's the same mistake I made with the hash collision stuff years ago. *`n` is small, so Big-O isn't the question we should be asking.* Here `n` is `5`. I suspect that the array approach would actually be faster than the hash approach, even though it's `O(n)` rather than `O(1)`. The reason for that is something called locality.

If you dive deep under the hood, when you look up an element in an array, the CPU actually grabs a bunch of adjacent elements along with the one you wanted, and it stores those adjacent elements in a cache. So here, when we look up `"a"` in the array, it will probably grab `"e"`, `"i"`, `"o"`, and `"u"` as well. And the next time we go to access `"e"`, it can pull it from the cache, which is much faster. This works because the elements of an array are close to each other in physical memory. With a hash, my understanding is that they wouldn't be so close, and thus we wouldn't benefit from this spatial locality effect.

I'm no low-level programming wiz, so I'm not sure about any of this. That's fine though; I think it's beside the point of this post. The real point is that when you have a small `n`, Big-O doesn't matter.

## 5 Comments

## 3np

> const isVowel = (char) => ["a", "e", "i", "o", "u"].includes(char);

> [… ] it hit me that it's actually O(n), because we've got an array and have to iterate over every element to see if the element matches char.

This depends on how you define your `n`. Usually this would be defined as the input size. Unless you would like to be able to run this on arbitrary definitions of what a "vowel" is (and in this case, it's indeed both small and static), the only useful interpretation would be to consider this function O(1), regardless of iteration over this 5-element list. If the list of vowels were considered a parameter, it would indeed be O(n) or O(m).

## F-0X

> And it hit me that it's actually O(n), because we've got an array and have to iterate over every element to see if the element matches char.

No, it's actually O(1). n refers to the _input_, which is always a single character. The iteration over an array (fixed at 5 elements) means a maximum of 5 comparisons. O(isVowel) = 5.

## Tainnor

No, the author didn't really understand how hash tables* work and what the big-O notation really means.

If you do a worst-case analysis, hash tables just degenerate to a single linked list (everything gets hashed to the same bucket), and you don't gain anything by using them. That's where you have O(n) performance.

But hash tables are probabilistic data structures, so we don't actually look at worst-case performance, but at the average case. It's important that for this analysis we need to make some assumptions about the hash function itself (e.g. what its collision probability is). Out of that, we can compute that hash table operations are on average constant.

* There are actually multiple ways to deal with collisions in hash tables; let's focus on open hashing/separate chaining for now.

## ncmncm

Of course if you are looking to see whether a lower-case letter is one of a set, you use a bitmap:

Often O(1) is better than O(n) even for small n.

Often O(1) is better than O(1).

Modern compilers will turn

into sort-of the same code (after a range check).

https://godbolt.org/z/aMcdY6

Looking at the code actually produced, the compiler has noticed that all the vowels are even-numbered characters (0, 4, 8, 14, 20), so it combined the range check with a "rotate-right" so it can compress 0x104111 down to 0x495, and shift that right and check the low bit of the result instead of doing a "bt", or bit-test. It's anybody's guess why that is considered better; shifts are supposed to be constant-time, but checking the low bit is a byte-sized operation. So, maybe

is better: https://godbolt.org/z/M784cK
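For reference, the bitmap trick this comment describes can be sketched in JavaScript as well. The `0x104111` constant from the comment has bits 0, 4, 8, 14, and 20 set, one for each of `a`, `e`, `i`, `o`, `u` offset from `"a"`:

```javascript
// Bitmap lookup: bit (c - 'a') of 0x104111 is set iff c is a vowel.
// 0x104111 has bits 0, 4, 8, 14, 20 set, matching a, e, i, o, u.
const isVowel = (char) => {
  const offset = char.charCodeAt(0) - 97; // 97 is the char code of "a"
  if (offset < 0 || offset > 25) return false; // range check
  return ((0x104111 >> offset) & 1) === 1;
};

console.log(isVowel("e"), isVowel("x")); // → true false
```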

## lolptdr

Newbie question: why would a hash function ever result in hash collision? Why can't a hash function guarantee a unique output value?