Hashing
-Hashing is a method / technique used for storing and retrieving keys in a rapid manner.
-Hashing is also used to index and retrieve items in a database because it is faster to find item
using shorter hashed key than to find it using the original value.
-The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found.
Hash Table
-Hash table is a table (array) where we store the original string, the index of the table
is the hashed key while the value is the original string.
-The size of hash table is usually several orders of magnitude lower than the total number
of possible string, so several string might have a same-hashed key.
Hash Function
-There are many ways to hash a string into a key. The following are some of the important
methods for constructing hash functions.
1) Mid-square
Steps to be taken :
a) Square the value of the key. Make it like (V2).
b) Extract the middle r bits of the result obtained in the first step.
Illustration :
Function : h(V) = s
V = key
s = the hash key obtained by selecting r bits from V2.
2) Division
Steps to be taken :
a) Divide the string / identifier by using the modulus operator.
b) Its the most simple method of hashing an integer x.
Illustration :
Function: h(c) = c mod R
c = key
R = the value which is used to divide the key, usually a prime number or the table size
or the size of the memory used.
3) Folding
Steps to be taken :
a) Divide the key value into a number of parts where each part has the same number of digits except
the last part which may have lesser digits than the other parts.
b) Add the individual parts. That is obtain the sum of part 1 + part 2 + ... + part n. The hash value produced by ignoring last carry if any.
4) Digit Extraction
- A predefined digit of the given number which is considered as the hash address.
Example : M = 12345
- If we extract the 1st , 4th and 5th digit, we will get hash code of : 145.
5) Rotating Hash
Steps to be taken :
a) Use any hash method ( such as division or mid-square method)
b) After getting the hash code/address from the hash method, do rotation.
c) Rotation is performed by shifting the digits to get a new hash address.
Example : M is a hash address which contain the value 20101
The rotation result will be 10102 ( fold the digits ).
Collision
- A collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint or cryptographic digest.
- In general there are 2 ways to handle collisions :'
a) Linear Probing
- The idea of this concept is to process one by one until what we are searching is found, it is similar to linear search.
Example of Linear Probing algorithm :
void
linear_probing(item, h[]) {
hash_key = hash(item);
i = has_key;
while ( strlen(h[i] != 0 ) {
if ( strcmp(h[i], item) == 0 ) {
// DUPLICATE ENTRY
}
}
i = (i + 1) % TABLE_SIZE;
if ( i == hash_key ) {
// TABLE IS FULL
}
}
}
}
h[i] = item;
}
}
b) Chaining
-In
chaining, we store each string
in a chain (linked list). So
if there is collision, we only
need
to iterate on that chain.
Example of Chaining algorithm :
void
chaining(item, h[]) {
hash_key = hash(item);
trail = NULL, lead = h[hash_key];
while ( lead ) {
if ( strcmp(lead->item, item) == 0 ) { // DUPLICATE ENTRY }
trail = lead; lead = lead->next;
}
}
p = malloc new node;
p->item = item;
p->next = NULL;
if ( trail == 0 ) h[hash_key] = p; else trail->next = p;
}