From “Hello, World!” to Radiation Monitoring: Exploring HarperDB with a Practical Application

Reading Time: 5 minutes

In my previous blog post, I talked a bit about how I stumbled upon HarperDB and played around with setting it up on my Mac Mini.

I wrote a simple “Hello, World!” application that put some values in a table. (I think it took less than an hour from start to finish.)

Emboldened by this early success, I started thinking about how I might extend this project to explore the other features that HarperDB offers. Of course, when you try to think up an application like this without a use case, it’s difficult to think of something that would be interesting or useful.

HarperDB – First Impressions

Reading Time: 2 minutes

I saw a post on LinkedIn about a platform called HarperDB.

It’s billed as a distributed database, cache, and pub/sub engine rolled into one. That sounded interesting enough to try out.

HarperDB is a single-system platform that combines a SQL-like API, REST endpoints, and MQTT messaging on top of a document-style data model. It runs locally or in the cloud, and the setup takes less than a minute.


Exploring Bach’s BWV Catalog

Reading Time: 2 minutes

If you open any of the popular music streaming services and ask it to play Johann Sebastian Bach, there’s about a 100% chance that it’s going to start with Bach’s unaccompanied Cello Suite No. 1 performed by Yo-Yo Ma. It’s a great piece, of course, but not what I want to hear every time. If you skip forward through that piece, you’ll likely land on Glenn Gould mumbling along to his performance of The Goldberg Variations on piano.

To get around this, I started telling Siri or Alexa to play a random piece, using the Bach catalog number, such as “Alexa, play Bach’s BWV xxx” where xxx is a random number.  (For example, the Goldberg Variations is BWV 988 and the six Cello Suites are BWV 1007–1012.)

The BWV (Bach-Werke-Verzeichnis, or Bach Works Catalog) numbering system, created by musicologist Wolfgang Schmieder in 1950, isn’t strictly chronological. Instead, it organizes Bach’s compositions thematically by genre—starting with cantatas, followed by choral works, keyboard music, and so forth.

It’s an imperfect system, though, as Bach pieces get discovered from time to time. If, for example, they discovered a seventh cello suite, it couldn’t become BWV 1013, as that number is occupied by his Partita in A minor for solo flute, so it would get tacked on to the end of the catalog. (I’d prefer they go with something like BWV 1012.001 to get the piece in the right spot, but whatever…)

Recently, I stumbled upon a Spotify playlist containing all known cataloged works by Johann Sebastian Bach, from BWV 1 to BWV 1128. Intrigued by this, I decided to embark on the arduous task of listening through them sequentially.

Anyway, here’s the playlist for the whole catalog. If you’re not up for a sequential run through the 136 hours of the whole lot, you can put it on random and play any of the 3,196 pieces.

Understanding the CAP Theorem

Reading Time: 3 minutes

Distributed systems are fundamental to modern computing, but they come with inherent trade-offs. The CAP theorem, introduced by Eric Brewer, provides a framework for understanding these trade-offs by defining three key properties: Consistency, Availability, and Partition Tolerance. In this article, we will explore CAP theorem step by step, starting from its basic definition to its deeper implications in distributed system design.

1. What is the CAP Theorem?

Imagine an online banking system with multiple servers across different regions. If a customer deposits money in one branch, that update must reflect accurately across all servers to prevent double spending. However, if a network failure occurs between data centers, the system faces a dilemma:

  • Should it prioritize consistency and prevent withdrawals until all servers are synchronized?
  • Should it prioritize availability and allow transactions to proceed, even if some servers have outdated information?
  • Or should it tolerate network partitions and find a balance between the two?

The CAP theorem provides a framework to understand and resolve such trade-offs in distributed system design.

The CAP theorem states that in a distributed data system, it is impossible to achieve all three of the following simultaneously:

  • Consistency (C) – Every read receives the most recent write or an error.
  • Availability (A) – Every request receives a response (success or failure), even in the presence of failures.
  • Partition Tolerance (P) – The system continues to function despite network partitions.

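In practice, distributed systems must make trade-offs between these properties. Because network partitions cannot be ruled out on a real network, the choice usually comes down to giving up either consistency or availability for as long as a partition lasts.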

2. Breaking Down the CAP Properties

Consistency (C)

Consistency ensures that all nodes return the same data at any given time. If a write is performed on one node, all subsequent reads on any node should reflect that write.

Example: A traditional relational database (e.g., PostgreSQL) running on a single primary node ensures strict consistency, because every write is committed in one place and every later read sees it.

Availability (A)

Availability ensures that every request to the system receives a response, even if some nodes are unavailable.

Example: A DNS system is highly available, meaning queries always receive responses, even if some servers are down.

Partition Tolerance (P)

Partition tolerance means that the system continues to operate even when network failures occur, causing communication issues between nodes.

Example: A global system like Apache Cassandra remains operational even if some nodes lose connectivity due to network failures.

3. CAP Theorem in Practice

Since no distributed system can achieve all three properties, they typically fall into one of the following categories:

  1. CP (Consistency + Partition Tolerance) – Ensures data consistency even during network failures but may sacrifice availability.
    • Example: Apache ZooKeeper prioritizes consistency and partition tolerance; nodes that lose contact with a majority of the ensemble stop serving requests until the partition heals.
  2. AP (Availability + Partition Tolerance) – Ensures availability during network failures but may return stale or inconsistent data.
    • Example: DynamoDB prioritizes availability and partition tolerance, allowing eventual consistency.
  3. CA (Consistency + Availability) – Provides consistency and availability but cannot tolerate network partitions.
    • Example: A traditional relational database like MySQL in a single-node setup.
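
To make the CP and AP categories above a little more concrete, here is a minimal Python sketch of a single replica that has been cut off from its peers. The `Replica` class, the `mode` labels, and the `Unavailable` exception are all made up for illustration; no real database exposes this interface.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP-style replica refuses to answer during a partition."""


@dataclass
class Replica:
    mode: str                                  # "CP" or "AP" (illustrative labels)
    partitioned: bool = False                  # True when cut off from its peers
    data: dict = field(default_factory=dict)   # this replica's local copy

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning a stale value.
            raise Unavailable("cannot confirm the latest value during a partition")
        # AP (or a healthy network): answer with whatever is stored locally.
        return self.data.get(key)


def demo():
    """Read from a partitioned replica in each mode and collect the outcome."""
    results = {}
    for mode in ("CP", "AP"):
        replica = Replica(mode=mode, data={"balance": 100})
        replica.partitioned = True   # a newer write elsewhere never reaches us
        try:
            results[mode] = replica.read("balance")
        except Unavailable:
            results[mode] = "unavailable"
    return results


print(demo())
```

The CP replica gives up availability (the read fails), while the AP replica stays available but may hand back an outdated balance.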

4. Implications of CAP on System Design

When designing a distributed system, understanding CAP helps in making informed trade-offs:

  • If strong consistency is needed, use a CP system.
  • If high availability is critical, an AP system is preferable.
  • If partitioning is not a concern, a CA system can work, but it is rare in distributed environments.

5. CAP Theorem in Real-World Systems

Many modern distributed databases adopt a flexible consistency model rather than strictly following CAP trade-offs:

  • Hazelcast – Prioritizes consistency with partition tolerance in its CP subsystem.
  • MongoDB – Offers tunable consistency to balance between AP and CP properties.
  • Cassandra – Favors availability and partition tolerance, achieving eventual consistency.
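
One common way to reason about tunable consistency in quorum-replicated stores such as Cassandra is the rule R + W > N: if the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (N), every read overlaps with the latest write. The sketch below only checks that arithmetic; the function name is my own and it is not part of any database's API.

```python
def quorums_overlap(replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """Return True when every read quorum shares at least one node with every write quorum."""
    return read_quorum + write_quorum > replicas


# Try a few read/write settings against three replicas.
for r, w in [(1, 1), (2, 2), (1, 3)]:
    if quorums_overlap(3, r, w):
        label = "overlap (reads see the latest write)"
    else:
        label = "no overlap (reads may be stale)"
    print(f"N=3 R={r} W={w}: {label}")
```

Lower values of R and W lean toward availability and speed; higher values lean toward consistency.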


Understanding the RAFT Consensus Algorithm

Reading Time: 3 minutes

A distributed system needs a reliable mechanism for reaching consensus across multiple nodes. The RAFT algorithm is one such consensus algorithm. It was designed to be easier to understand than Paxos (an earlier, more complex protocol for reaching consensus in distributed networks) while maintaining strong fault tolerance.

1. What is RAFT?

RAFT is widely used in distributed systems that require strong consistency. In a distributed environment, multiple nodes work together to store and process data reliably. However, ensuring that all nodes agree on the same state at any given time is a challenge. RAFT helps solve this problem by ensuring that commands are consistently applied across all nodes, preventing conflicts and maintaining data integrity. Unlike traditional databases that rely on a single primary instance for writes, distributed systems using RAFT elect a leader that coordinates operations, ensuring that updates are applied in the same order across all nodes. This makes RAFT a crucial component for building fault-tolerant, highly available distributed applications.

RAFT (usually written “Raft”; the name is loosely derived from “Reliable, Replicated, Redundant, And Fault-Tolerant” rather than being a strict acronym) is a consensus algorithm used to manage a replicated log in distributed systems. It ensures that multiple nodes in a distributed system agree on the same sequence of commands, maintaining consistency even in the face of network failures.

(For comparison, another well-known consensus algorithm is Paxos, which is more complex but serves a similar purpose.)

Key Goals of RAFT:

  • Leader Election – Ensuring that one node acts as the leader at any given time.
  • Log Replication – Maintaining a consistent log across all nodes.
  • Safety & Fault Tolerance – Ensuring that committed entries are never lost, even if some nodes fail.

RAFT is used in distributed databases, coordination services, and in-memory data grids like Hazelcast to ensure consistency.

2. RAFT’s Three Key Roles

RAFT divides nodes into three roles:

  • Leader – The central authority that handles client requests and replicates logs.
  • Followers – Passive nodes that accept updates from the leader.
  • Candidates – Nodes that attempt to become the leader during elections.

In any given term, there is at most one leader; during normal operation, all other nodes function as followers.

3. Leader Election Process

When a RAFT cluster starts, or if the leader fails, an election process takes place:

  1. A follower node times out and transitions to a candidate state.
  2. The candidate requests votes from other nodes.
  3. If a majority of nodes vote for the candidate, it becomes the leader.
  4. The leader then starts sending heartbeat messages to followers, maintaining authority.

Example: Leader Election

Node A (Follower) -> Times out -> Becomes Candidate
Node A requests votes from Nodes B & C
Nodes B & C vote for A (majority wins)
Node A becomes the Leader

This mechanism ensures a stable leadership structure even during network failures.
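
The election steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration: the function names are my own, and it leaves out real RAFT details such as randomized timeouts, the one-vote-per-term rule, and the check that the candidate's log is up to date.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def run_election(candidate, current_term, peers_granting_vote, cluster_size):
    """Simulate one election round started by a follower that timed out."""
    term = current_term + 1                  # a candidate starts a new term
    votes = 1 + len(peers_granting_vote)     # it votes for itself first
    return {
        "candidate": candidate,
        "term": term,
        "votes": votes,
        "leader": votes >= majority(cluster_size),
    }


# Node A times out in a three-node cluster and both peers vote for it.
print(run_election("A", 1, ["B", "C"], cluster_size=3))

# In a five-node cluster, a single extra vote is not enough.
print(run_election("A", 1, ["B"], cluster_size=5))
```

In the first call, A collects three of three votes and becomes leader for term 2; in the second, two of five votes falls short of the three needed.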

4. Log Replication

Once a leader is established, it manages log replication:

  1. The leader receives a client request.
  2. The request is appended to the leader’s log.
  3. The leader sends the log entry to followers.
  4. Once a majority acknowledges the entry, it is committed.
  5. Followers apply the committed log to their state machine.

What is a Log in RAFT?

A log in RAFT is an append-only structure that stores client requests (commands). The leader ensures that all followers maintain an identical sequence of logs so that they reach the same state.

Example: Log Replication

Client -> Sends "Write X=10" to Leader
Leader -> Appends "Write X=10" to log
Leader -> Sends entry to Followers
Majority acknowledges -> Entry is committed

This ensures all nodes eventually apply the same commands in order.
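
Here is a toy Python version of the commit rule described above, where an entry counts as committed once a majority of the cluster (the leader included) holds it. The `Leader` class and its method names are hypothetical, and networking, retries, and follower state machines are all left out.

```python
class Leader:
    """Toy leader that tracks its log and which entries are committed."""

    def __init__(self, follower_count, term=1):
        self.term = term
        self.cluster_size = follower_count + 1   # followers plus the leader itself
        self.log = []                            # list of (term, command) tuples
        self.commit_index = 0                    # 1-based; 0 means nothing committed

    def append(self, command):
        """Accept a client request and append it to the leader's log."""
        self.log.append((self.term, command))
        return len(self.log)                     # index of the new entry

    def record_acks(self, index, follower_acks):
        """Commit the entry once a majority (leader included) holds a copy."""
        copies = 1 + follower_acks
        if copies >= self.cluster_size // 2 + 1:
            self.commit_index = max(self.commit_index, index)
        return self.commit_index >= index


leader = Leader(follower_count=2)
index = leader.append("Write X=10")
print(leader.record_acks(index, follower_acks=0))  # only the leader has the entry
print(leader.record_acks(index, follower_acks=1))  # two of three nodes have it
```

With no acknowledgements the entry stays uncommitted; as soon as one of the two followers acknowledges it, two of three nodes hold the entry and it is committed.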

Log Consistency Rules in RAFT

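  • Leader Enforces Order: Followers accept new entries from the leader only if the entry just before them matches their own log (same index and term).
  • Log Matching Property: If two logs contain an entry with the same index and term, that entry holds the same command in both, and all preceding entries are identical as well.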
  • Commit Rule: A log entry is committed when a majority of nodes replicate it.

Example of Log Entries

Log Index   Term   Command
1           1      Write X = 5
2           2      Write Y = 20
3           2      Write X = 10

  • The Log Index ensures the correct order.
  • The Term tracks the leader’s election cycle.
  • The Command is the action that changes the system state.

Logs serve as the foundation for fault tolerance in RAFT, ensuring that even if a node fails, the system can recover and maintain a consistent state.
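
The consistency rules can be illustrated with a small check a follower might run before accepting new entries: does its own log contain the entry that precedes the new ones, with the same term? The sketch below uses the example log entries from above; the function name and the tuple layout are my own assumptions, not part of any RAFT library.

```python
def accepts_append(follower_log, prev_index, prev_term):
    """Decide whether a follower may append entries that follow (prev_index, prev_term).

    follower_log is a list of (term, command) tuples; indexes are 1-based.
    """
    if prev_index == 0:
        return True                       # appending at the very start of the log
    if prev_index > len(follower_log):
        return False                      # follower is missing earlier entries
    return follower_log[prev_index - 1][0] == prev_term


up_to_date = [(1, "Write X = 5"), (2, "Write Y = 20")]
lagging = [(1, "Write X = 5")]
diverged = [(1, "Write X = 5"), (1, "Write Z = 1")]

# The leader wants to send entry 3, which follows index 2 from term 2.
for name, log in [("up_to_date", up_to_date), ("lagging", lagging), ("diverged", diverged)]:
    print(name, accepts_append(log, prev_index=2, prev_term=2))
```

Only the up-to-date follower accepts the entry; the other two would first need the leader to resend earlier entries.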

5. Handling Failures

RAFT ensures that failures do not cause inconsistencies by following strict rules:

  • Election Timeout: If a leader crashes, a new election starts after a timeout.
  • Log Matching Property: Followers accept only consistent log updates from the leader.
  • Commit Consistency: Entries are only committed when a majority acknowledges them.

This design prevents split-brain scenarios and guarantees system integrity.
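
The majority requirement is what rules out split-brain: two halves of a partitioned cluster can never both hold a majority. A quick Python sketch of the arithmetic (the helper names are my own):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority still remains."""
    return (cluster_size - 1) // 2


def can_elect_leader(partition_size: int, cluster_size: int) -> bool:
    """A partition can elect a leader only if it contains a majority of the cluster."""
    return partition_size >= majority(cluster_size)


# A five-node cluster split 3/2: only the larger side can elect a leader.
print(can_elect_leader(3, 5), can_elect_leader(2, 5))

# A four-node cluster split 2/2: neither side can, so no writes are committed.
print(can_elect_leader(2, 4), can_elect_leader(2, 4))

for size in (3, 4, 5):
    print(f"{size} nodes tolerate {tolerated_failures(size)} failure(s)")
```

This is also why clusters usually have an odd number of nodes: a fourth node adds cost without letting the cluster survive any more failures than three nodes can.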

6. RAFT in Practice

Many distributed systems, including Hazelcast, leverage RAFT for high availability and fault tolerance. Hazelcast’s CP Subsystem implements RAFT to ensure data consistency in distributed environments.


Custom Command Namespace

Reading Time: 2 minutes

If you are a moderate to heavy user of the command line in Linux or OS X, you’ll eventually build up a set of custom commands and shortcuts that help streamline your workflow. The simplest of these are known as “aliases.”

Let’s say you frequently find yourself searching your shell history for a command you used before.

To display your command history, you simply type history in your shell, and every command you’ve typed streams by.

This is useful, of course, but most likely you want to know about a particular command you typed, so you might use a pipe to send the output to another command, in this case grep, to find the lines in your history that are interesting to you. In this example, we’re looking for commands we’ve typed that involved the word “projects”:

$ history | grep projects
 9627  cd projects/ansible_dexter
 9641  cd ~/projects/ansible_dexter
 9675  cd projects/ansible_dexter
 9684  cd ~/projects/sendgrid-test
 9696  source /home/jim/projects/sns_python/venv/bin/activate
 9707  source /home/jim/projects/sns_python/venv/bin/activate
 9713  cd ~/projects/
 9863  cp -r ~/projects/sendgrid-test/lib64 .
 9865  cp -r ~/projects/sendgrid-test/share .
 9962  cd projects
 9972  cd projects/terraform-provider-aws
 9979  cd projects/terraform-provider-aws
10003  cd projects/terraform-provider-aws

For me, this is something I do so frequently that I wanted to shorten that command to the minimum number of characters. I settled upon hg for “history grep”, so I created an alias as a line in my ~/.zshrc file:

alias hg="history | grep"

The ~/.zshrc (or ~/.bashrc, if you use bash as your shell) is a file that gets read every time you start an interactive shell and sets up your preferred environment.

Perfect! Now, if I want to search history for commands that included a particular string, I can just type hg projects and I will get the above output.

Well, nearly perfect…

As it happens, hg is also a command used by the Mercurial Source Code Management system. This is not a big issue for me, as I’m a loyal Git user, but it presents an interesting dilemma:

How do I protect against command name collisions in my aliases?

You see, pretty much any alias you create has the potential to conflict with a command someone else has written, but where this gets even more complicated is when these commands are referenced from a script. (Scripts should, however, only call external programs using the program’s full path, e.g., /bin/ls instead of just ls.)

A blog post by Brandon Rhodes proposes an interesting solution: prepend all of your custom commands and aliases with a comma. Interestingly, a comma is just a normal character to the shell, so naming your command ,hg is no different than naming it jimoconnell_hg, though a comma is of course much shorter.
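
If you want to check a name before committing to it, a few lines of Python will tell you whether an executable with that name is already on your PATH. This is just a sketch: `collides` and `namespaced` are names I made up, and `shutil.which` only sees executables on the PATH, not shell builtins, functions, or other aliases.

```python
import shutil


def collides(name: str) -> bool:
    """Return True if an executable called `name` is already on the PATH."""
    return shutil.which(name) is not None


def namespaced(name: str, prefix: str = ",") -> str:
    """Prepend the namespace prefix (a comma by default) unless it is already there."""
    return name if name.startswith(prefix) else prefix + name


# Compare the bare alias name with its comma-prefixed version.
for candidate in ("hg", namespaced("hg")):
    status = "collides with an existing command" if collides(candidate) else "looks free"
    print(f"{candidate}: {status}")
```

What it prints for hg depends on whether Mercurial is installed on the machine you run it on.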

I found this tip via the excellent ‘/r/commandline’ subreddit.

I haven’t used this technique much at all yet, so let me know what you think!

Create a Directory and `cd` into it in one command

Reading Time: 2 minutes

(This post is meant not only to describe a little shell function, but to introduce you to a powerful, yet simple concept.)

When I’m working with a command line, I will frequently need to create a directory and then immediately change to that directory.

Let’s say I’m starting a new project. I will cd into my ~/projects directory and then create a new folder where I’ll be putting my files. After creating the new folder, I’ll generally need to cd to that folder:

cd ~/projects
mkdir foo
cd foo

My first thought was to write a little alias that combined the two operations, but aliases don’t really handle parameters well, so I wrote it as a shell function. (I use the amazing zsh as my terminal shell, but this approach works just the same in bash.) In the below example, the “$1” refers to the first argument passed. “$2”, “$3” and so on also work. (Be sure to quote them!)

mgo () {
    # Create the directory (and any missing parents), then cd only if that succeeded
    mkdir -p "$1" && cd "$1"
}

You can paste that into your shell and it will work for the duration of your session, but of course the better way of making it available is to put it in your ~/.zshrc or whatever the equivalent startup file is for your shell of choice. For example, in OS X, you have to first create the file:

  1. Start up Terminal.
  2. Type "cd ~/" to go to your home folder.
  3. Type "touch .bash_profile" to create your new file.
  4. Edit .bash_profile with your favorite editor (or you can just type "open -e .bash_profile" to open it in TextEdit).

To use it, you need to have that file be re-read, so you can either close and re-open your terminal, source the file you just edited, or just paste the function in to have it be active for the current session.

I realize that this is an insanely simple thing to worry over, but so much of the Linux/Unix philosophy is about making things just a bit more efficient and shell functions are a very approachable way to craft your own customizations for efficiency.

Another Example: If you’re a Python programmer or student, you are probably aware of venv, the Python module that creates lightweight virtual environments on a directory-by-directory basis. If you have used it, you know that each time you move to a project directory, you need to type source ./venv/bin/activate to set up the environment. A clever function I found a while back takes over the cd command to check for a virtual environment and activate it if it exists:

cd () {
    # Run the real cd first; stop here if it fails
    builtin cd "$@" || return
    # Activate a virtual environment kept in ./venv, if there is one
    if [[ -f ./venv/bin/activate ]]; then
        source ./venv/bin/activate
    fi
    # Activate one kept directly in the project directory, if there is one
    if [[ -f ./bin/activate ]]; then
        source ./bin/activate
    fi
}

Both of the above examples are simple, but useful. Please leave a comment below if you know of any other good examples, or can think of a good use case for a function.

More about iOS Bluetooth radios not shutting off

Reading Time: 3 minutes


The other day I was writing about a method for Proximity Presence Detection using a Raspberry Pi and the Bluetooth radio in your smartphone.

(Basically, when your phone (or other Bluetooth device) gets within range of the Raspberry Pi’s Bluetooth, the MAC Address of the phone is noted and a script sends some data to an MQTT server, so you can determine with some confidence that you are home. It’s quick, lightweight and doesn’t require any software to be installed on the phone. Once the script notes your presence, you can do things such as turn on lights, have your home automation system greet you or do pretty much anything that computers can do.)

While setting it up, I noticed something odd, though–turning off Bluetooth didn’t really turn off Bluetooth, at least on my iPhone.

Here’s what I expected to see. The number right after “confidence” is the value that shows whether the radio is active within range. When the phone disappears from range, the number tapers down over a period of 90 seconds or so:

{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1394","timestamp":"Sun Feb 03 2019 09:39:04 GMT-0500 (EST)"}
{"confidence":"83","name":"J’s iPhone","scan_duration_ms":"5150","timestamp":"Sun Feb 03 2019 09:39:47 GMT-0500 (EST)"}
{"confidence":"66","name":"J’s iPhone","scan_duration_ms":"5153","timestamp":"Sun Feb 03 2019 09:39:55 GMT-0500 (EST)"}
{"confidence":"50","name":"J’s iPhone","scan_duration_ms":"5148","timestamp":"Sun Feb 03 2019 09:40:04 GMT-0500 (EST)"}
{"confidence":"33","name":"J’s iPhone","scan_duration_ms":"5148","timestamp":"Sun Feb 03 2019 09:40:12 GMT-0500 (EST)"}
{"confidence":"16","name":"J’s iPhone","scan_duration_ms":"5149","timestamp":"Sun Feb 03 2019 09:40:20 GMT-0500 (EST)"}
{"confidence":"0","name":"J’s iPhone","scan_duration_ms":"5149","timestamp":"Sun Feb 03 2019 09:40:28 GMT-0500 (EST)"}
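
As a quick sanity check on that “90 seconds or so,” here is a small Python sketch that parses a few of the messages above and measures the time between the first reading at 100 and the first reading at 0. The function names are my own, and the timestamp format is assumed to be exactly what appears in these samples.

```python
import json
from datetime import datetime

# Three of the readings shown above: the first, a middle one, and the last.
SAMPLE = [
    '{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1394","timestamp":"Sun Feb 03 2019 09:39:04 GMT-0500 (EST)"}',
    '{"confidence":"50","name":"J’s iPhone","scan_duration_ms":"5148","timestamp":"Sun Feb 03 2019 09:40:04 GMT-0500 (EST)"}',
    '{"confidence":"0","name":"J’s iPhone","scan_duration_ms":"5149","timestamp":"Sun Feb 03 2019 09:40:28 GMT-0500 (EST)"}',
]


def parse_timestamp(text):
    """Parse a timestamp like 'Sun Feb 03 2019 09:39:04 GMT-0500 (EST)'."""
    # Drop the trailing " (EST)" label; the numeric offset already carries the zone.
    return datetime.strptime(text.split(" (")[0], "%a %b %d %Y %H:%M:%S GMT%z")


def taper_seconds(lines):
    """Seconds between the first reading at 100 and the first reading at 0."""
    readings = [json.loads(line) for line in lines]
    start = next(r for r in readings if r["confidence"] == "100")
    end = next(r for r in readings if r["confidence"] == "0")
    elapsed = parse_timestamp(end["timestamp"]) - parse_timestamp(start["timestamp"])
    return elapsed.total_seconds()


print(taper_seconds(SAMPLE))
```

For the readings above, that works out to 84 seconds.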

To test it, I tapped the home button on my phone and swiped up, revealing the Control Center.


Oddly, when I clicked the Bluetooth icon, it dimmed, as you’d expect, but it was still appearing as 100% in my script:

{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1218","timestamp":"Sun Feb 03 2019 10:00:26 GMT-0500 (EST)"}
{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"385","timestamp":"Sun Feb 03 2019 10:00:57 GMT-0500 (EST)"}
{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1197","timestamp":"Sun Feb 03 2019 10:01:28 GMT-0500 (EST)"}

My first suspicion was that my installation of the script was at fault or that perhaps I was reading the wrong MAC address, but I checked those and quickly started to suspect my phone.
To test this, I shut off my phone and saw the confidence percentage drop gracefully to zero.
I turned the phone back on and the script performed perfectly, detecting the radio instantly.

(“Airplane Mode,” I found, did actually turn off the radio, as you would expect.)

Next, I connected a Bluetooth speaker and started playing music, while watching my script. Hitting the Control Center icon disconnected the speaker, but didn’t shut off the radio.

This struck me as odd behavior, but not surprising, given that it’s Apple and I have little confidence in them to be upfront about something like this.

I dug around a bit this morning and came across an Apple support article titled “Use Bluetooth and Wi-Fi in Control Center with iOS 11 and later.” In that rather vaguely titled article, it says the following:

In iOS 11 and later, when you toggle the Wi-Fi or Bluetooth buttons in Control Center, your device immediately disconnects from Wi-Fi and Bluetooth accessories. Both Wi-Fi and Bluetooth will continue to be available, so you can use these important features:


  • AirDrop
  • AirPlay
  • Apple Pencil
  • Apple Watch
  • Continuity features, like Handoff and Instant Hotspot
  • Instant Hotspot
  • Location Services

Not in the list, but according to another source, Apple’s AirPods apparently also do not disconnect when Bluetooth is deactivated from Control Center. (I have no AirPods to test this.)

Also worthy of note is that if you do it this way, it will automatically restore Bluetooth to full capacity the next day at 5:00 AM local time. Why it does this isn’t clear, but it’s probably so your Bluetooth alarm clock will still function or something.

If you go into the iOS “Settings” app, the Bluetooth controls behave as you would expect: turning off Bluetooth actually does shut off the radio, so I doubt this is anything beyond Apple making their own proprietary technologies and accessories work better than third-party stuff.

Anyway, I thought it was interesting enough to dig into a bit and share with you all.

The big takeaway is that using Control Center is not the same as controlling things via the Settings app and may not behave as expected.

Threat Level: Mostly Harmless ;-)

Learn Byobu while Listening to Mozart

Reading Time: < 1 minute

If you do much of anything using a Linux terminal, you really owe it to yourself to take a look at Byobu, which is, according to Wikipedia, “an enhancement for the GNU Screen terminal multiplexer or tmux used with the GNU/Linux computer operating system that can be used to provide on-screen notification or status, and tabbed multi-window management. It is intended to improve terminal sessions when users connect to remote servers.”

This is one of the first things I install on any of my Linux boxes.

Watch the video for a bit–it’s a bit like watching a master chef slice and dice vegetables.

Proximity Presence Detection with the Raspberry Pi

Reading Time: 2 minutes

There’s some cool code I’ve been playing with the last few days. It uses the Raspberry Pi’s Bluetooth capabilities to determine when you come within range of the Pi with your smartphone. It’s called Presence and explains itself as “Reliable, Multi-User, Distributed BT Occupancy/Presence Detection.”

What it does is run a script as a service on the Pi that watches for a pre-defined list of MAC addresses to enter its Bluetooth radio’s range. It sends out status messages about each MAC address to an MQTT server running on the Pi or elsewhere. The messages each look something like this:


home/bedroom/E4:E4:ZZ:00:XX:XX {"confidence":"100","name":"Jim’s iPhone","scan_duration_ms":"806","timestamp":"Tue Jan 22 2019 13:20:50 GMT-0500 (EST)"}

The first part of that is the MQTT topic (with my MAC address obscured): home/bedroom/E4:E4:ZZ:00:XX:XX

Next is a bit of JSON that tells you the script’s “confidence” that the device is home, as well as the name of the device:
{"confidence":"100","name":"Jim’s iPhone"}

There are several things that make this strategy particularly cool.
The first is that once it’s in MQTT, you can use this piece of information in any number of ways, the most obvious one being a Home Automation system such as Home Assistant or Domoticz. It would be trivial to whip up a script with Python and its paho-mqtt library (https://www.eclipse.org/paho/), for example.
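
As a rough idea of what such a script might do with each message, here is a standard-library-only sketch that splits the topic, parses the JSON payload, and decides whether the device counts as “home.” The function name and the threshold of 50 are my own assumptions; in a real script you would call something like this from your MQTT client’s message callback.

```python
import json


def parse_presence(topic: str, payload: str, threshold: int = 50) -> dict:
    """Turn one presence message into a small summary dictionary.

    The last topic segment is taken to be the MAC address; everything
    before it is treated as the location. `threshold` is an arbitrary
    cutoff chosen for this example.
    """
    location, mac = topic.rsplit("/", 1)
    data = json.loads(payload)
    confidence = int(data["confidence"])   # the script sends it as a string
    return {
        "location": location,
        "mac": mac,
        "name": data["name"],
        "confidence": confidence,
        "home": confidence >= threshold,
    }


message = parse_presence(
    "home/bedroom/E4:E4:ZZ:00:XX:XX",
    '{"confidence":"100","name":"Jim’s iPhone","scan_duration_ms":"806"}',
)
print(message)
```

From there, it is a short step to publishing a command to a light switch or notifying your home automation system.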

Another reason is that it doesn’t require any special software to be running on the phone–the Bluetooth radio broadcasts everything you need by default.

What all this gives you is a way to detect when your phone (you) comes within about 30 feet of a Raspberry Pi or even the $10 Raspberry Pi Zero. This is ideal for, say, turning on some lights when you pull into the driveway (but perhaps not for unlocking an automatic door).
I’ve only tried it with my iPhone, but in theory, you could do this with any Bluetooth device. It should even work with something like a Tile, the key-finding gizmo, something that would be of great interest. (I just need to figure a way to retrieve the MAC address from the Tile.)

Neat stuff.
So far, I like this better than other solutions I’ve found for presence detection.

Finally, something unexpected:

When I was setting this up and testing it, I noticed something unusual. To turn off Bluetooth on my iPhone, I would open Control Center and tap the Bluetooth icon.

The odd thing is that when I did that, the Presence software would still see the iPhone as being nearby. Tapping “Airplane Mode” would actually turn off the Bluetooth radio and the script would see it as gone.

This seems odd at best, nefarious at worst.