tag

algorithms

#70 2017-12-27 16 min
Building a Distributed Log from Scratch, Part 2: Data Replication
In part one of this series we introduced the idea of a message log, touched on why it’s useful, and discussed the storage mechanics behind it. In part two, we discuss data replication. We have our log. We know how to write data to it and read it back, as well as how data is persisted. The caveat is that, although we have a durable log, it’s a single point of failure (SPOF). If the machine where the log data is stored dies, we’re SOL. Recall that one of our three priorities with this system is high availability, so the question is: how do we achieve high availability and fault tolerance?
#58 2016-12-28 13 min
Fast Topic Matching
A common problem in messaging middleware is that of efficiently matching message topics with interested subscribers. For example, assume we have a set of subscribers, numbered 1 to 3:

Subscriber  Match Request
1           forex.usd
2           forex.*
3           stock.nasdaq.msft

And we have a stream of messages, numbered 1 to N:

Message  Topic
1        forex.gbp
2        stock.nyse.ibm
3        stock.nyse.ge
4        forex.eur
5        forex.usd
…        …
N        stock.nasdaq.msft

We are then tasked with routing messages whose topics match the respective subscriber requests, where a “*” wildcard matches any word. This is frequently a bottleneck for message-oriented middleware like ZeroMQ, RabbitMQ, ActiveMQ, TIBCO EMS, et al. Because of this, there are a number of well-known solutions to the problem. In this post, I’ll describe some of these solutions, as well as a novel one, and attempt to quantify them through benchmarking. As usual, the code is available on GitHub.
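To make the matching rule in the excerpt concrete, here is a minimal sketch of the naive approach: compare every subscription against each incoming topic, word by word. It is only a baseline for illustration, not one of the solutions the post benchmarks. The names `matches` and `route` are hypothetical, and the code assumes that “*” stands for exactly one word and that a pattern must have the same number of words as the topic.

```python
def matches(pattern: str, topic: str) -> bool:
    """Return True if `topic` satisfies `pattern`.

    Assumption: words are separated by "." and "*" matches exactly one word.
    """
    pattern_words = pattern.split(".")
    topic_words = topic.split(".")
    if len(pattern_words) != len(topic_words):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_words, topic_words))


def route(subscriptions: dict[int, str], topic: str) -> list[int]:
    """Naive linear scan: check every subscription against the topic."""
    return [sub for sub, pattern in subscriptions.items() if matches(pattern, topic)]


# The three subscribers from the example above.
subscriptions = {1: "forex.usd", 2: "forex.*", 3: "stock.nasdaq.msft"}

for topic in ["forex.gbp", "stock.nyse.ibm", "forex.usd", "stock.nasdaq.msft"]:
    print(topic, "->", route(subscriptions, topic))
```

Each lookup costs time proportional to the number of subscriptions, which hints at why this step can become a bottleneck as the subscriber count grows.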
#46 2015-12-06 1 min
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profit from Tyler Treat
#34 2015-02-13 19 min
Stream Processing and Probabilistic Methods: Data at Scale
Stream processing and related abstractions have become all the rage following the rise of systems like Apache Kafka, Samza, and the Lambda architecture. Applying the idea of immutable, append-only event sourcing means we’re storing more data than ever before. However, as the cost of storage continues to decline, it’s becoming more feasible to store more data for longer periods of time. With immutability, how the data lives isn’t interesting anymore. It’s all about how it moves.
#29 2014-12-06 5 min
Not Invented Here
Engineers love engineering things. The reason is self-evident (and maybe self-fulfilling: why else would you be an engineer?). We like to think we’re pretty good at solving problems. Unfortunately, this mindset can, on occasion, yield undesirable consequences that are not immediately apparent but are damaging all the same. Developers are all in tune with the idea of “don’t reinvent the wheel,” but it seems to be eschewed sometimes, deliberately or otherwise. People don’t generally write their own merge sort, so why would they write their own consensus protocol? Anecdotally speaking, they do.
