Hash versus hash pointer

1

A hash is the output of a hash function, which is a one-way function: knowing the output doesn't help you figure out what the input was. A hash can serve as a short, practically unique identifier for the data that was passed into the function, and so can be used to reference that data. Hash pointers are just hashes that are used to reference another piece of known information.

In Bitcoin's case, one place where hash pointers are used is the 80-byte block header. The block header is hashed to produce a block ID, which you can see on any block explorer. On such a page there is also a link for the "Previous Block"; that link is a hash, and it points to the block that has that ID.

Bitcoin also uses hashes to reference funds supplied in previous transactions. So, if you want to make a transaction giving some bitcoins to someone, you have to reference (with a hash pointer) the transaction where you were given some...
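The hash-pointer idea itself can be sketched in a few lines of code. This is a toy illustration with invented field names; std::hash stands in purely for Bitcoin's actual double SHA-256 over the 80-byte header:

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

struct Block {
    std::size_t prev_hash;   // hash pointer: the ID of the previous block
    std::string payload;
};

// Stand-in for hashing the real 80-byte header with double SHA-256
std::size_t block_id(const Block& b) {
    return std::hash<std::string>{}(std::to_string(b.prev_hash) + b.payload);
}

int main() {
    Block genesis{0, "genesis"};
    Block next{block_id(genesis), "some transactions"};  // points back via hash
    std::cout << std::hex << block_id(next) << '\n';     // this block's own ID
}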

2

The correct answer from a theoretical point of view is: "Use std::hash which is likely specialized to be as good as it gets, and if that is not applicable, use a good hash function rather than a fast one. The speed of the hash function does not matter so much as its quality".

The practical answer is: "Use std::hash, which is piss-poor, but nevertheless performs surprisingly well."

TL;DR
After becoming intrigued, I ran about 30 hours of benchmarks over the weekend. Among other things, I tried to separate average-case from worst-case behavior, and tried to coerce std::unordered_map into worst-case behavior by giving it deliberately bad hints on bucket counts relative to the set size inserted.

I compared poor hashes (std::hash) to well-known general-purpose hashes of overall good quality (djb2, sdbm), as well as variations of these that account for the very short input length, and to hashes explicitly intended for use in hash tables (murmur2 and murmur3),...
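For context, here is a minimal sketch of what plugging one of those general-purpose hashes into an unordered container looks like. The djb2 transcription below is the widely circulated h = h * 33 + c form; the map contents are invented:

#include <cstddef>
#include <string>
#include <unordered_map>

// djb2, one of the general-purpose string hashes compared above
struct Djb2 {
    std::size_t operator()(const std::string& s) const noexcept {
        std::size_t h = 5381;
        for (unsigned char c : s)
            h = h * 33 + c;          // the classic djb2 step
        return h;
    }
};

int main() {
    // The third template argument replaces std::hash with the custom hasher
    std::unordered_map<std::string, int, Djb2> counts;
    counts["abc"] = 1;
    return counts.at("abc") == 1 ? 0 : 1;
}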

3

One point that I don't think has been addressed is that trees are much better for persistent data structures, that is, immutable structures. A standard hash table (i.e. one that uses a single array of linked lists) cannot be modified without the change being visible through every reference to the table. One situation in which this is relevant is when two concurrent functions both hold a reference to the same hash table and one of them changes the table; if the table is mutable, that change will be visible to the other one as well. Another situation would be something like the following:

def bar(table):
    # some intern stuck this line of code in
    table["hello"] = "world"
    return table["the answer"]

def foo(x, y, table):
    z = bar(table)
    if "hello" in table:
        raise Exception("failed catastrophically!")
    return x + y + z

important_result = foo(1, 2, {
    "the answer": 5,
    "this table": "doesn't contain hello",
    "so it should": "be ok"
})  # catastrophic failure occurs

With a mutable table, we can't guarantee that the...

4

What is Hashing?

Hashing is a technique for storing elements in such a way that searching for them is very fast. It internally uses a Hash Table to store the elements.

What is a Hash Table?

A Hash Table is a kind of data structure used to achieve hashing. It internally maintains an array of buckets, where each bucket can store multiple elements and is mapped to a hash code. The hash code is calculated using a hash function.

Now let's see how hashing works using a Hash Table.

Inserting an element into a Hash Table

Whenever an element is inserted into a Hash Table, its hash code is calculated first, and based on that hash code the bucket in which it will be stored is chosen.

Let's understand this with an example.

For example, suppose we want to store some numbers in a Hash Table, i.e.

12, 23, 56, 67, 89, 43

This Hash Table will internally use 10 buckets.

Our Hash Function that will return the Hash...
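A minimal sketch of the bucketing described above, assuming 10 buckets and a modulo-10 hash function (an assumption, since the original's hash function isn't shown in full):

#include <iostream>
#include <vector>

int main() {
    std::vector<std::vector<int>> buckets(10);   // the 10 buckets
    for (int n : {12, 23, 56, 67, 89, 43}) {
        int hash_code = n % 10;                  // assumed hash function
        buckets[hash_code].push_back(n);         // hash code picks the bucket
    }
    for (int i = 0; i < 10; ++i) {
        std::cout << "bucket " << i << ':';
        for (int n : buckets[i]) std::cout << ' ' << n;
        std::cout << '\n';
    }
}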

5
weak pointers vs temporary hash tables

This discussion we've been having keeps going back and forth between temporary hash tables and weak pointers, and the implication of the latter discussion seemed to be that weak pointers are all you need for temporary hash tables. I'm not sure this is so.

For temporary hash tables, special measures need to be taken when a weak key is GCed. You can't just set the key slot of the table entry to NIL. First of all, it would be nice to also nullify the value slot of an entry whose key is GCed, so that the value can be GCed; this could be done by high-level code outside the GC, though. A more significant problem is that most hash table mechanisms need to be able to distinguish never-used table entries from deleted table entries. Also, just setting the key field to NIL is no good, because NIL is a valid key. The mechanism for invalidating weak pointers to garbage must therefore be cognizant...
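A minimal C++ sketch of the bookkeeping this implies (invented names; the original context is Lisp, so this is only an illustration): a slot must distinguish never-used from deleted, and a null key cannot encode either state, because NIL/null may be a legitimate key.

#include <cassert>

enum class SlotState { Empty, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::Empty;
    void* key = nullptr;     // may legitimately be null while Occupied
    void* value = nullptr;
};

int main() {
    int k = 7;
    Slot s{SlotState::Occupied, &k, &k};
    // When the weak key is collected: mark Deleted (not Empty, so probe
    // chains stay intact) and drop the value so it can be collected too.
    s = {SlotState::Deleted, nullptr, nullptr};
    assert(s.state != SlotState::Empty);
    return 0;
}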
6

A hash table can insert and retrieve elements in O(1) on average. A binary search tree can insert and retrieve elements in O(log(n)), which is quite a bit slower.

A hash table is an unordered data structure

When designing a cell phone, you want to keep as much memory as possible available for data storage. A hash table is an unordered data structure, which means that it does not keep its elements in any particular order. So, if you use a hash table for a cell phone address book, you would need additional memory to sort the values, because you would definitely need to display the values in alphabetical order; it is an address book, after all. So, by using a hash table, you have to set aside memory to sort elements that would otherwise have been used as storage space.

A binary search tree is a sorted data structure

Because a binary search tree is already sorted, there will be no need...
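The comparison is easy to see with the standard containers. In this sketch (contents invented), the tree-based std::map iterates alphabetically for free, while the hash-based std::unordered_map iterates in an unspecified order:

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, std::string> book_h{
        {"Zoe", "555-0100"}, {"Al", "555-0101"}, {"Mia", "555-0102"}};
    std::map<std::string, std::string> book_t(book_h.begin(), book_h.end());

    for (const auto& [name, num] : book_h)   // hash table: unspecified order
        std::cout << name << ' ';
    std::cout << '\n';
    for (const auto& [name, num] : book_t)   // tree: already alphabetical
        std::cout << name << ' ';
    std::cout << '\n';
}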

7

When replacing production C code with test doubles (or fakes), there are three basic approaches: function pointers, preprocessor hash defines, and link-time substitution. All three are examples of what Michael Feathers calls a seam: a way to change the behavior of a program without editing it in that place. We use seams when we unit test software to decouple and localise the software under test. Each of these types of seams has different trade-offs for testing embedded C. Here's what I think.

Function Pointers (run-time substitution)

+ Easy to substitute at runtime for test functions
+ Makes for more decoupled design
– Harder for IDE to jump to definitions, etc.
– Can make static code analysis not work so well
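A minimal sketch of such a seam (all names invented): production code calls through the pointer, and a test re-points it at a fake before exercising the code under test.

#include <cassert>

static int real_read_sensor(void) { return 42; /* would talk to hardware */ }

int (*read_sensor)(void) = real_read_sensor;   // the seam

int sensor_is_hot(void) { return read_sensor() > 100; }

static int fake_hot_sensor(void) { return 150; }

int main(void) {
    read_sensor = fake_hot_sensor;   // run-time substitution
    assert(sensor_is_hot());
    return 0;
}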

Preprocessor hash defines (preprocessor-time substitution)

+ Good for getting legacy code under test
– Needs discipline to structure the hash defines well
– Can make code harder to read and understand
– Definitions can be hard to locate because they can...
8

Introduction

During the development of the Concurrency Kit hash set and hash table, detailed microbenchmarks were used to measure the latency and variability of various operations in relation to various open-source hash table implementations. For read-mostly workloads, the implementation was at least twice as fast as Google Dense Hash on reference machines, even though it provides stronger forward-progress guarantees for concurrent workloads. For example, it is lock-free for readers and wait-free for writers in single-writer many-reader use-cases. However, a recent use-case required the hash table implementations to handle delete-heavy workloads. Like many open-addressing schemes, the implementation failed miserably in this workload due to tombstone accumulation. The strength of the collision resolution mechanism would very quickly lead to the complete elimination of probe sequence termination markers (empty slots) in favor of tombstones. Google Dense Hash...
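To make the tombstone problem concrete, here is a toy linear-probing table (a sketch, not Concurrency Kit's actual scheme): deletion cannot simply empty a slot, because that would cut the probe chain and hide keys inserted after a collision, so every delete leaves a tombstone behind.

#include <array>
#include <cstddef>

enum class State { Empty, Full, Tombstone };
struct Slot { State state = State::Empty; int key = 0; };

struct Table {
    std::array<Slot, 8> slots;

    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots.size();  // assumes key >= 0
    }

    void insert(int key) {
        std::size_t i = home(key);
        while (slots[i].state == State::Full)
            i = (i + 1) % slots.size();       // linear probing
        slots[i] = {State::Full, key};
    }

    void erase(int key) {
        for (std::size_t i = home(key); slots[i].state != State::Empty;
             i = (i + 1) % slots.size()) {
            if (slots[i].state == State::Full && slots[i].key == key) {
                slots[i].state = State::Tombstone;  // keeps probe chain intact
                return;
            }
        }
    }
};

int main() {
    Table t;
    t.insert(1);
    t.insert(9);   // collides with 1, probes to the next slot
    t.erase(1);    // must become Tombstone, not Empty, or 9 becomes unreachable
}

In a delete-heavy workload, the empty slots that terminate probe sequences are steadily replaced by tombstones, which is exactly the accumulation described above.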

9
...
10

Here's the corrected C version:

/* Like in the C++ version, we have a regular
   hash table with a custom hash function */
#include <stdlib.h>
#include <string.h>

struct fht_node {
    char* key;
    void* data;
    struct fht_node* next;
};

typedef struct fht {
    struct fht_node** nodes;
    size_t size;
} FHT;

FHT* fht_create() {
    FHT* ht = malloc(sizeof(struct fht));
    ht->size = 1 << 8;   /* 256 buckets: one per possible first byte */
    ht->nodes = calloc(ht->size, sizeof(struct fht_node*));
    return ht;
}

int fht_put(FHT* hashtbl, char* key, void* data) {
    /* the "custom hash function": the first byte of the key picks the bucket */
    struct fht_node* node = hashtbl->nodes[(unsigned char)key[0]];
    while (node) {
        if (!strcmp(node->key, key)) {
            node->data = data;
            return 0;
        }
        node = node->next;
    }
    node = malloc(sizeof(struct fht_node));
    node->key = strdup(key);
    node->data = data;
    node->next = hashtbl->nodes[(unsigned char)key[0]];
    hashtbl->nodes[(unsigned char)key[0]] = node;
    return 1;
}

void* fht_get(FHT* hashtbl, char* key) {
    struct fht_node* node = hashtbl->nodes[(unsigned char)key[0]];
    while (node) {
        if (!strcmp(node->key, key))
            return node->data;
        node = node->next;
    }
    return NULL;
}
11

Nowadays when discussing switching and routing, we talk about CAMs (Content Addressable Memory). Yet CAMs came into widespread use in networking gear less than 10 years ago, and the Internet is older than that. Before CAMs, switch ASICs used hash-and-match engines to route packets.

This is the first in a series of articles focusing on hash-and-match for networking applications. Why write this? I believe we rely too much on CAMs in networking today, resulting in equipment which is more expensive than it needs to be. Hash engines can fill many roles with aplomb, yet we plop down a CAM and move on. Furthermore, we're in danger of defining new protocols in such a way that effectively mandates a CAM, blocking the ability to use hash tables at all. That would be unfortunate.

This first post provides definitions for the discussion to follow.

Hash tables have been used more or less forever in software. To find data in the table, a function is computed over the key. The...

12

I read that Hash#key? was slower than Hash#[], and it made me sad. Shouldn't Hash#key? generally require less work?

Besides that, there are cases where only Hash#key? will do the trick. For example, if you need to distinguish between these two cases:

Hash key is not present
Hash key is present, value is nil

then you must use Hash#key?.

The Benchmark

So, I wrote a little benchmark:

And here was my result

What’s up with that?!

Why??

Let’s compare the implementation of these methods:

They’re remarkably similar. They both:

Check that self has an ntbl
Look up the value for key in ntbl

But Hash#key? does something a bit unusual: it doesn't capture the value of the key in self. Instead, it uses the return value of st_lookup to detect whether the lookup was a hit or a miss. In the case of a hit, it returns Qtrue (the C name for Ruby's true).

Digging deeper: st_lookup

st.c provides a general purpose...

13
...
14

A long time ago I had a very silly conversation with a developer where we both agreed that we weren't too concerned about algorithm and data structure memorization because, hey, we could look them up, right? We don't need to use that fancy-pants computer-sciency stuff in our jobs because we've never needed to. Of course, that's like saying that because you've never had a driver's license, a driver's license is useless. When you don't have a car, you naturally tend to look at problems and solutions without considering the options a car may provide you.

That's why I've been brushing up on my comp-sci lately. I was working through a bunch of sorting algorithms (insertion sort, merge sort, and quick sort) to better understand their characteristics when I started to think about how I might find out whether an item is in a list. If it's a large list, I typically use a hash. Most of the time this is fine, but I wanted to understand better what I could do.

My general strategy would be...

15
...
16
How do I print a hash that has a hash within it?

You have a hash, within which one element is a reference to another hash:

my %hash = (
    a => {
        1 => "one",
        2 => "two",
        3 => "three",
    },
    b => {
        4 => "four",
        5 => "five",
        6 => "six",
    },
);

Solution 1

There are two simple ways to approach this. Firstly you could use embedded 'for' loops:

#!/usr/bin/perl -w
use strict;

my %hash = (
    a => {
        1 => "one",
        2 => "two",
        3 => "three",
    },
    b => {
        4 => "four",
        5 => "five",
        6 => "six",
    },
);

foreach my $line (keys %hash) {
    print "$line: \n";
    foreach my $elem (keys %{$hash{$line}}) {
        print " $elem: " . $hash{$line}->{$elem} . "\n";
    }
}

exit 0;

This produces the following output:

a:
 1: one
 2: two
 3: three
b:
 4: four
 5: five
 6: six

Solution 2

A simple solution, if you only need to output the data, is to...

17
Hash Tables

Hash tables are a simple and effective method to implement dictionaries. Average time to search for an element is O(1), while worst-case time is O(n). Cormen [1990] and Knuth [1998] both contain excellent discussions on hashing.

Theory

A hash table is simply an array that is addressed via a hash function. For example, in Figure 3-1, HashTable is an array with 8 elements. Each element is a pointer to a linked list of numeric data. The hash function for this example simply divides the data key by 8, and uses the remainder as an index into the table. This yields a number from 0 to 7. Since the range of indices for HashTable is 0 to 7, we are guaranteed that the index is valid.


Figure 3-1: A Hash Table

To insert a new item in the table, we hash the key to determine which list the item goes on, and then insert the item at the beginning of the list. For example, to insert 11, we divide 11 by 8 giving a remainder of 3. Thus, 11 goes...
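A minimal sketch of the scheme in Figure 3-1 (names invented): 8 buckets of linked lists, index = key % 8, with new items inserted at the head of their list.

#include <array>
#include <forward_list>
#include <iostream>

int main() {
    std::array<std::forward_list<int>, 8> HashTable;
    for (int key : {16, 11, 22, 7})
        HashTable[key % 8].push_front(key);   // 11 % 8 == 3, so 11 goes in list 3
    for (int i = 0; i < 8; ++i) {
        std::cout << i << ':';
        for (int k : HashTable[i]) std::cout << ' ' << k;
        std::cout << '\n';
    }
}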

18
Hash functions. © Copyright 2004-2008 by Paul Hsieh

Why look at hash functions?

In a previous job, I was asked to look at hashing functions and got into a dispute with my boss about how such functions should be designed. I had advocated the use of LFSRs or CRCs that would be customized to the size of the table, as discussed in "Numerical Recipes". My boss advocated simply performing a modulo-by-prime operation and cited Knuth's 30-year-old "The Art of Computer Programming". I showed him examples where modulo by prime had extremely bad collisions, but this did not seem to sway him at all. It seems Knuth's rewrites have come too late.

A coworker "settled" the dispute by discovering Bob Jenkins's hash function. This outperformed both of our suggestions while being based on better analysis regarding collisions. I had bookmarked the webpage and occasionally referred to it in later projects, and noticed the two additions of the "One at a time Hash"...
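For reference, here is the one-at-a-time hash the excerpt mentions, as Bob Jenkins published it, transcribed into a small runnable program:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

std::uint32_t one_at_a_time(const unsigned char* key, std::size_t len) {
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < len; ++i) {
        h += key[i];       // mix in one byte at a time
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;           // final avalanche
    h ^= h >> 11;
    h += h << 15;
    return h;
}

int main() {
    const char* s = "hello";
    std::printf("%08x\n",
                (unsigned)one_at_a_time((const unsigned char*)s, std::strlen(s)));
}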

19
...
20
template <
    class Key,                     // unordered_set::key_type/value_type
    class Hash = hash<Key>,        // unordered_set::hasher
    class Pred = equal_to<Key>,    // unordered_set::key_equal
    class Alloc = allocator<Key>   // unordered_set::allocator_type
> class unordered_set;

Unordered sets are containers that store unique elements in no particular order, and which allow for fast retrieval of individual elements based on their value.

In an unordered_set, the value of an element is at the same time its key, which identifies it uniquely. Keys are immutable; therefore, the elements in an unordered_set cannot be modified once in the container, although they can be inserted and removed.

Internally, the elements in the unordered_set are not sorted in any particular order, but organized into buckets depending on their hash values, to allow for fast access to individual elements directly by their values (with constant average time complexity).

unordered_set...
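A minimal usage sketch (contents invented) of the operations described above: insertion, removal, and fast lookup by value.

#include <iostream>
#include <string>
#include <unordered_set>

int main() {
    std::unordered_set<std::string> colors{"red", "green", "blue"};
    colors.insert("cyan");                      // elements can be inserted...
    colors.erase("red");                        // ...and removed
    std::cout << colors.count("green") << '\n'; // fast lookup by value: prints 1
}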

21

class template

template <
    class Key,                                      // unordered_map::key_type
    class T,                                        // unordered_map::mapped_type
    class Hash = hash<Key>,                         // unordered_map::hasher
    class Pred = equal_to<Key>,                     // unordered_map::key_equal
    class Alloc = allocator< pair<const Key, T> >   // unordered_map::allocator_type
> class unordered_map;

Unordered Map

Unordered maps are associative containers that store elements formed by the combination of a key value and a mapped value, and which allow for fast retrieval of individual elements based on their keys.

In an unordered_map, the key value is generally used to uniquely identify the element, while the mapped value is an object with the content associated to this key. Types of key and mapped value may differ.

Internally, the elements in the unordered_map are not sorted in any particular order with respect to either their key or mapped values, but organized into buckets depending on their hash...
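A minimal usage sketch (names and values invented) showing the key/mapped-value split described above:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages{{"alice", 30}};
    ages["bob"] = 25;                       // key identifies the element,
                                            // mapped value holds its content
    std::cout << ages.at("alice") << '\n';  // prints 30
}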

22

malloc() and calloc() are library functions that allocate memory dynamically, meaning that memory is allocated during runtime (execution of the program) from the heap segment.

Initialization: malloc() allocates a memory block of the given size (in bytes) and returns a pointer to the beginning of the block. malloc() doesn't initialize the allocated memory, so if we try to access the contents of the memory block we'll get garbage values.

calloc() allocates the memory and also initializes the allocated memory block to zero. If we try to access the contents of these blocks, we'll get 0.

Number of arguments: Unlike malloc(), calloc() takes two arguments:
1) Number of blocks to be allocated.
2) Size of each block.

Return Value: After successful allocation in malloc() and calloc(), a pointer to the block of memory is returned; otherwise, NULL is returned, which indicates the failure of the allocation.

For instance, if we want to allocate memory for an array of 5...
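A minimal sketch of that comparison for an array of 5 ints (the 5-int choice is an assumption, following the truncated sentence above):

#include <cstdio>
#include <cstdlib>

int main() {
    // malloc: one size argument; contents indeterminate until written
    int* a = (int*)std::malloc(5 * sizeof(int));
    // calloc: two arguments (count, size); memory is zero-initialized
    int* b = (int*)std::calloc(5, sizeof(int));
    if (a && b)                                             // NULL signals failure
        std::printf("b[0] == %d (calloc zero-fills)\n", b[0]);  // prints 0
    std::free(a);
    std::free(b);
}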

23

In simple words, Singleton is a pattern and not a keyword. The Singleton pattern has several advantages over static classes. A singleton allows a class to have just one persistent instance across the lifetime of an application. That means it creates a single instance, and a reference to that instance can be passed as a parameter to other methods and treated as a normal object. A static class, by contrast, allows only static methods, and you cannot pass a static class as a parameter.

A Singleton can implement interfaces, inherit from other classes, and allow inheritance, while a static class cannot. So a Singleton is more flexible than a static class and can maintain state.

A Singleton can be initialized lazily or asynchronously, and loaded automatically by the .NET Framework CLR (common language runtime) when the program or namespace containing the class is loaded, while a static class is generally initialized...
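The discussion above is about .NET, but the shape of the pattern is the same elsewhere; a minimal C++ sketch (names invented) of one lazily created, persistent instance that can be passed around like a normal object:

#include <iostream>

class Config {
public:
    static Config& instance() {
        static Config the_one;   // created lazily, on first use, exactly once
        return the_one;
    }
    int value = 5;
private:
    Config() = default;          // no one else can construct it
};

void use(Config& c) { std::cout << c.value << '\n'; }  // passed like any object

int main() { use(Config::instance()); }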

24

Most of the time, pointer and array accesses can be treated as acting the same, the major exceptions being:

1) the sizeof operator
o sizeof(array) returns the amount of memory used by all elements in array
o sizeof(pointer) only returns the amount of memory used by the pointer variable itself

2) the & operator
o &array is an alias for &array[0] and returns the address of the first element in array
o &pointer returns the address of pointer

3) a string literal initialization of a character array
o char array[] = "abc" sets the first four elements in array to 'a', 'b', 'c', and '\0'
o char *pointer = "abc" sets pointer to the address of the "abc" string (which may be stored in read-only memory and thus unchangeable)

4) Pointer variable can be assigned a value whereas array variable cannot be.

int a[10];
int *p;
p = a; /* legal */
a = p; /* illegal */

5) Arithmetic on pointer variable is allowed.

p++; /* legal */
a++; ...
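A quick check of point 1 (the printed sizes are platform-dependent; the comments assume 4-byte int and 8-byte pointers):

#include <iostream>

int main() {
    int a[10];
    int* p = a;
    std::cout << sizeof(a) << '\n';   // 40: memory used by all 10 elements
    std::cout << sizeof(p) << '\n';   // 8: memory used by the pointer itself
}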
25

Let’s start out our tutorial and explanation of why you would need a database index by going through a very simple example. Suppose that we have a database table called Employee with three columns – Employee_Name, Employee_Age, and Employee_Address. Assume that the Employee table has thousands of rows.

Now, let's say that we want to run a query to find all the details of any employees who are named 'Jesus'. So, we decide to run a simple query like this:

SELECT * FROM Employee WHERE Employee_Name = 'Jesus'

What would happen without an index on the table?

Once we run that query, what exactly goes on behind the scenes to find employees who are named Jesus? Well, the database software would literally have to look at every single row in the Employee table to see if the Employee_Name for that row is 'Jesus'. And, because we want every row with the name 'Jesus' inside it, we cannot stop looking once we find just one row with the name 'Jesus', because...
