From “Hello, World!” to Radiation Monitoring: Exploring HarperDB with a Practical Application

Posted on June 10, 2025 (updated July 2, 2025) by Jim O'Connell. Posted in geiger counter, harperdb, nodeJS.

Reading Time: 5 minutes

In my previous blog post, I talked a bit about how I stumbled upon HarperDB and played around with setting it up on my Mac Mini.

HarperDB – First Impressions

Posted on May 19, 2025 (updated May 21, 2025) by Jim O'Connell. Posted in distributed systems, nodeJS.

Reading Time: 2 minutes

I saw a post on LinkedIn about a platform called HarperDB.

It’s billed as a distributed database, cache, and pub/sub engine rolled into one. That sounded interesting enough to try out.

HarperDB is a single-system platform that combines a SQL-like API, REST endpoints, and MQTT messaging on top of a document-style data model. It runs locally or in the cloud, and the setup takes less than a minute.


Exploring Bach’s BWV Catalog

Posted on March 11, 2025 by Jim O'Connell. Posted in classical music, JSBach, music.

Reading Time: 2 minutes

If you open any of the popular music streaming services and ask it to play Johann Sebastian Bach, there’s close to a 100% chance that it’s going to start with the unaccompanied Cello Suite No. 1 performed by Yo-Yo Ma. It’s a great piece, of course, but not what I want to hear every time. If you skip forward through that piece, you’ll likely land on Glenn Gould mumbling along to his performance of The Goldberg Variations on piano.

To get around this, I started telling Siri or Alexa to play a random piece, using the Bach catalog number, such as “Alexa, play Bach’s BWV xxx” where xxx is a random number. (For example, the Goldberg Variations is BWV 988 and the six Cello Suites are BWV 1007–1012.)

The BWV (Bach-Werke-Verzeichnis, or Bach Works Catalog) numbering system, created by musicologist Wolfgang Schmieder in 1950, isn’t strictly chronological. Instead, it organizes Bach’s compositions thematically by genre—starting with cantatas, followed by choral works, keyboard music, and so forth.

It’s an imperfect system, though, as Bach pieces get discovered from time to time. If, for example, they discovered a seventh cello suite, it couldn’t become BWV 1013, as that number is occupied by his Partita in A minor for solo flute, so it would get tacked on to the end of the catalog. (I’d prefer they go with something like BWV 1012.001 to get the piece in the right spot, but whatever…)

Recently, I stumbled upon a Spotify playlist containing all known cataloged works by Johann Sebastian Bach, from BWV 1 to BWV 1128. Intrigued by this, I decided to embark on the arduous task of listening through them sequentially.

Anyway, here’s the playlist for the whole catalog. If you’re not up for a sequential run through the 136 hours of the whole lot, you can put it on random and play any of the 3,196 pieces.

Understanding the CAP Theorem

Posted on January 14, 2025 (updated February 14, 2025) by Jim O'Connell. Posted in cap theorem, distributed systems, raft algorithm.

Reading Time: 3 minutes

Distributed systems are fundamental to modern computing, but they come with inherent trade-offs. The CAP theorem, introduced by Eric Brewer, provides a framework for understanding these trade-offs by defining three key properties: Consistency, Availability, and Partition Tolerance. In this article, we will explore CAP theorem step by step, starting from its basic definition to its deeper implications in distributed system design.

1. What is the CAP Theorem?

Imagine an online banking system with multiple servers across different regions. If a customer deposits money in one branch, that update must reflect accurately across all servers to prevent double spending. However, if a network failure occurs between data centers, the system faces a dilemma:

Should it prioritize consistency and prevent withdrawals until all servers are synchronized?
Should it prioritize availability and allow transactions to proceed, even if some servers have outdated information?
Or should it tolerate network partitions and find a balance between the two?

The CAP theorem provides a framework to understand and resolve such trade-offs in distributed system design.

The CAP theorem states that in a distributed data system, it is impossible to achieve all three of the following simultaneously:

Consistency (C) – Every read receives the most recent write or an error.
Availability (A) – Every request receives a response (success or failure), even in the presence of failures.
Partition Tolerance (P) – The system continues to function despite network partitions.

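In practice, distributed systems must make trade-offs between these properties, which is often described as optimizing for two at the expense of the third. Because network partitions cannot be ruled out in a real distributed system, the practical choice is usually between consistency and availability while a partition is in progress.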

2. Breaking Down the CAP Properties

Consistency (C)

Consistency ensures that all nodes return the same data at any given time. If a write is performed on one node, all subsequent reads on any node should reflect that write.

Example: A traditional relational database (e.g., PostgreSQL) running on a single primary provides strong consistency, because every write goes through one node inside an ACID transaction.

Availability (A)

Availability ensures that every request to the system receives a response, even if some nodes are unavailable.

Example: A DNS system is highly available, meaning queries always receive responses, even if some servers are down.

Partition Tolerance (P)

Partition tolerance means that the system continues to operate even when network failures occur, causing communication issues between nodes.

Example: A global system like Apache Cassandra remains operational even if some nodes lose connectivity due to network failures.

3. CAP Theorem in Practice

Since no distributed system can achieve all three properties, they typically fall into one of the following categories:

CP (Consistency + Partition Tolerance) – Ensures data consistency even during network failures but may sacrifice availability.
- Example: Apache ZooKeeper prioritizes consistency and partition tolerance; during a partition, nodes cut off from the majority stop accepting writes.
AP (Availability + Partition Tolerance) – Ensures availability during network failures but may return stale or inconsistent data.
- Example: DynamoDB prioritizes availability and partition tolerance, allowing eventual consistency.
CA (Consistency + Availability) – Provides consistency and availability but cannot tolerate network partitions.
- Example: A traditional relational database like MySQL in a single-node setup.
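
The difference between the CP and AP categories above can be sketched as a small shell function. This is only a toy model under stated assumptions: handle_write and its three parameters are hypothetical names, and real systems decide this inside their replication protocol rather than with a single comparison.

```shell
# Hypothetical model: how one side of a partitioned cluster answers a write.
# $1 = mode ("CP" or "AP"), $2 = nodes reachable on this side, $3 = cluster size
handle_write() {
  local mode=$1 reachable=$2 total=$3
  # A strict majority of the full cluster
  local quorum=$(( $total / 2 + 1 ))
  case "$mode" in
    CP)
      # Consistency first: refuse the write without a majority
      if [ "$reachable" -ge "$quorum" ]; then
        echo "accepted"
      else
        echo "rejected: no quorum"
      fi
      ;;
    AP)
      # Availability first: always answer, reconcile later
      echo "accepted: may diverge until the partition heals"
      ;;
    *)
      echo "unknown mode" >&2
      return 1
      ;;
  esac
}

# A 5-node cluster split into groups of 3 and 2:
handle_write CP 3 5
handle_write CP 2 5
handle_write AP 2 5
```

In this model the CP system stays writable only on the three-node side, while the AP system answers on both sides and accepts that the two sides may disagree for a while.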

4. Implications of CAP on System Design

When designing a distributed system, understanding CAP helps in making informed trade-offs:

If strong consistency is needed, use a CP system.
If high availability is critical, an AP system is preferable.
If partitioning is not a concern, a CA system can work, but it is rare in distributed environments.

5. CAP Theorem in Real-World Systems

Many modern distributed databases adopt a flexible consistency model rather than strictly following CAP trade-offs:

Hazelcast – Prioritizes consistency with partition tolerance in its CP subsystem.
MongoDB – Offers tunable consistency to balance between AP and CP properties.
Cassandra – Favors availability and partition tolerance, achieving eventual consistency.


Understanding the RAFT Consensus Algorithm

Posted on January 10, 2025 (updated February 14, 2025) by Jim O'Connell. Posted in distributed systems, raft algorithm.

Reading Time: 3 minutes

A distributed system needs a reliable mechanism for reaching consensus across multiple nodes. The RAFT algorithm is one such consensus algorithm. It was designed to be easier to understand than Paxos (an earlier, more complex protocol for providing consensus in distributed networks) while maintaining strong fault tolerance.

1. What is RAFT?

RAFT is widely used in distributed systems that require strong consistency. In a distributed environment, multiple nodes work together to store and process data reliably. However, ensuring that all nodes agree on the same state at any given time is a challenge. RAFT helps solve this problem by ensuring that commands are consistently applied across all nodes, preventing conflicts and maintaining data integrity. Unlike traditional databases that rely on a single primary instance for writes, distributed systems using RAFT elect a leader that coordinates operations, ensuring that updates are applied in the same order across all nodes. This makes RAFT a crucial component for building fault-tolerant, highly available distributed applications.

RAFT (sometimes glossed as Reliable, Replicated, and Fault-Tolerant, although the name is not formally an acronym) is a consensus algorithm used to manage a replicated log in distributed systems. It ensures that multiple nodes in a distributed system agree on the same sequence of commands, maintaining consistency even in the face of network failures.

(For comparison, another well-known consensus algorithm is Paxos, which is more complex but serves a similar purpose.)

Key Goals of RAFT:

Leader Election – Ensuring that one node acts as the leader at any given time.
Log Replication – Maintaining a consistent log across all nodes.
Safety & Fault Tolerance – Ensuring that committed entries are never lost, even if some nodes fail.

RAFT is used in distributed databases, coordination services, and in-memory data grids like Hazelcast to ensure consistency.

2. RAFT’s Three Key Roles

RAFT divides nodes into three roles:

Leader – The central authority that handles client requests and replicates logs.
Followers – Passive nodes that accept updates from the leader.
Candidates – Nodes that attempt to become the leader during elections.

In any given term, there is at most one leader, while all other nodes function as followers.

3. Leader Election Process

When a RAFT cluster starts, or if the leader fails, an election process takes place:

A follower node times out and transitions to a candidate state.
The candidate requests votes from other nodes.
If a majority of nodes vote for the candidate, it becomes the leader.
The leader then starts sending heartbeat messages to followers, maintaining authority.

Example: Leader Election

Node A (Follower) -> Times out -> Becomes Candidate
Node A requests votes from Nodes B & C
Nodes B & C vote for A (majority wins)
Node A becomes the Leader

This mechanism ensures a stable leadership structure even during network failures.
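
The majority rule from the election steps above can be sketched in a few lines of shell. The function name election_result and its parameters are hypothetical, and the sketch deliberately ignores terms, timeouts, and log checks that a real RAFT implementation applies before granting a vote.

```shell
# Hypothetical helper: prints "leader" if the candidate's votes (its own vote
# included) form a strict majority of the cluster, otherwise "candidate".
# $1 = votes received, $2 = total nodes in the cluster
election_result() {
  local votes=$1 cluster_size=$2
  local needed=$(( $cluster_size / 2 + 1 ))
  if [ "$votes" -ge "$needed" ]; then
    echo "leader"
  else
    echo "candidate"
  fi
}

# Node A votes for itself and also gets B and C: 3 of 3
election_result 3 3
# Only 2 of 5 votes: not enough, so the node stays a candidate
election_result 2 5
```

Because two candidates cannot both collect more than half of the votes in the same term, this rule is what keeps a cluster from ending up with two leaders.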

4. Log Replication

Once a leader is established, it manages log replication:

The leader receives a client request.
The request is appended to the leader’s log.
The leader sends the log entry to followers.
Once a majority acknowledges the entry, it is committed.
Followers apply the committed log to their state machine.

What is a Log in RAFT?

A log in RAFT is an append-only structure that stores client requests (commands). The leader ensures that all followers maintain an identical sequence of logs so that they reach the same state.

Example: Log Replication

Client -> Sends "Write X=10" to Leader
Leader -> Appends "Write X=10" to log
Leader -> Sends entry to Followers
Majority acknowledges -> Entry is committed

This ensures all nodes eventually apply the same commands in order.

Log Consistency Rules in RAFT

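Leader Enforces Order: Followers accept new entries from the leader only if the entry that precedes them matches the follower’s own log.
Log Matching Property: If two entries in different logs share the same index and term, they contain the same command, and the logs are identical in all preceding entries.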
Commit Rule: A log entry is committed when a majority of nodes replicate it.
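
As a rough illustration of the commit rule, the sketch below finds the highest log index that a majority of nodes has stored. The name commit_index is hypothetical, and its arguments are assumed to be the last replicated index of every node, leader included. It leaves out the additional check a real leader makes that the entry belongs to its current term.

```shell
# Hypothetical helper: given the highest replicated log index of each node,
# print the highest index that is present on a majority of nodes.
commit_index() {
  local nodes=$#
  local needed=$(( $nodes / 2 + 1 ))
  # Sort the indexes from highest to lowest; the entry at position "needed"
  # is stored on at least that many nodes.
  printf '%s\n' "$@" | sort -rn | sed -n "${needed}p"
}

# Five nodes: three of them hold index 3 or higher, so index 3 is committed
commit_index 5 5 3 2 1
```

With only one node at index 4 and two at index 1 (commit_index 4 1 1), the result is 1: the newer entries exist, but they are not yet safe to apply.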

Example of Log Entries

Log Index	Term	Command
1	1	Write X = 5
2	2	Write Y = 20
3	2	Write X = 10

The Log Index ensures the correct order.
The Term tracks the leader’s election cycle.
The Command is the action that changes the system state.

Logs serve as the foundation for fault tolerance in RAFT, ensuring that even if a node fails, the system can recover and maintain a consistent state.

5. Handling Failures

RAFT ensures that failures do not cause inconsistencies by following strict rules:

Election Timeout: If a leader crashes, a new election starts after a timeout.
Log Matching Property: Followers accept only consistent log updates from the leader.
Commit Consistency: Entries are only committed when a majority acknowledges them.

This design prevents split-brain scenarios and guarantees system integrity.

6. RAFT in Practice

Many distributed systems, including Hazelcast, leverage RAFT for high availability and fault tolerance. Hazelcast’s CP Subsystem implements RAFT to ensure data consistency in distributed environments.


Enhancing SSH Security with Two-Factor Authentication (2FA)

Posted on December 22, 2024 (updated February 14, 2025) by editor. Posted in Uncategorized.

Reading Time: 2 minutes

Secure Shell (SSH) is a fundamental tool for secure remote access. However, relying solely on SSH keys can still pose a risk if a key is compromised. Adding Two-Factor Authentication (2FA) enhances security by requiring an additional verification step, reducing unauthorized access risks.

Why Combine SSH Keys with 2FA?

Enhanced Security – 2FA ensures that even if an SSH key is stolen, an attacker cannot log in without the second factor.
Reduced Risk – The requirement of a secondary authentication method, such as a time-based one-time password (TOTP), significantly decreases unauthorized access risks.
Compliance – Many security frameworks and regulations, including NIST guidelines, recommend or mandate 2FA for secure system access.

Setting Up 2FA with SSH

1. Install a 2FA Module

To enable 2FA, install a module like Google Authenticator or Duo Security on your server:

sudo apt install libpam-google-authenticator  # For Debian-based systems

2. Configure SSH to Require 2FA

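Edit /etc/ssh/sshd_config to enable challenge-response authentication and to require both factors:

# Newer OpenSSH releases call this option KbdInteractiveAuthentication
ChallengeResponseAuthentication yes
# Require the SSH key first, then the PAM prompt (the TOTP code)
AuthenticationMethods publickey,keyboard-interactive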

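Then, edit /etc/pam.d/sshd to include the line below. (On Debian-based systems, the @include common-auth line in the same file also triggers a password prompt; comment it out if you want the keyboard-interactive step to ask only for the verification code.)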

auth required pam_google_authenticator.so

Restart the SSH service for changes to take effect:

sudo systemctl restart ssh
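
Because a mistake in sshd_config can lock you out, it may be worth checking the file before restarting. The sketch below is one possible sanity check; requires_key_and_totp is a hypothetical helper name, and it only looks for the AuthenticationMethods line rather than validating the whole file. The sshd -t command mentioned in the comment is OpenSSH's own configuration test mode.

```shell
# Hypothetical helper: succeeds if an sshd_config-style file has an
# AuthenticationMethods line that lists publickey followed by
# keyboard-interactive (the key first, then the TOTP prompt).
requires_key_and_totp() {
  grep -Eiq '^[[:space:]]*AuthenticationMethods[[:space:]].*publickey,keyboard-interactive' "$1"
}

# Example use, run manually on the server (keep an existing session open):
#   requires_key_and_totp /etc/ssh/sshd_config \
#     && sudo sshd -t \
#     && sudo systemctl restart ssh
```

Keeping a second SSH session open while you test the new login is a cheap safety net in case the configuration does not behave as expected.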

3. Set Up 2FA for a User

Run the Google Authenticator setup:

google-authenticator

Follow the prompts to generate a QR code and store the recovery codes safely. Use an authenticator app like Google Authenticator or Authy to scan the QR code.

4. Test Your Setup

Attempt to log in using your SSH key first, followed by the TOTP code from your authentication app:

ssh user@your-server

If configured correctly, you will be prompted for your SSH key authentication and then asked for the 2FA code.

Conclusion

By implementing 2FA alongside SSH keys, you significantly strengthen remote access security. It ensures that only authorized users with both factors can gain access, reducing the risk of breaches. Additionally, using 2FA aligns with modern security best practices and compliance standards, offering peace of mind.


Comparing SSH Key Algorithms: RSA, DSA, ECDSA, and Ed25519

Posted on December 14, 2024 (updated February 14, 2025) by editor. Posted in commandline, cryptography, linux, security, ssh, Uncategorized.

Reading Time: 2 minutes

Secure Shell (SSH) authentication relies on key pairs to provide secure access to remote systems. Various key algorithms offer different levels of security, performance, and compatibility. In this article, we’ll compare four commonly used SSH key algorithms: RSA, DSA, ECDSA, and Ed25519.

1. RSA

RSA (Rivest-Shamir-Adleman) is one of the most widely supported and trusted SSH key algorithms. It provides strong security, especially with key lengths of 2048 bits or more.

Generating an RSA Key Pair

To create an RSA key pair, use the following command:

ssh-keygen -t rsa -b 2048 -C "your_email@example.com"

This command generates a 2048-bit RSA key pair and associates it with your email address.

Configuring the SSH Server for RSA Authentication

Modify the SSH server configuration file (/etc/ssh/sshd_config) with these settings:

PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

RSA is a robust and compatible choice for most SSH authentication needs.

2. DSA

DSA (Digital Signature Algorithm) was once a common choice but is now considered outdated due to its fixed 1024-bit key size, which is no longer considered sufficiently secure.

Generating a DSA Key Pair

ssh-keygen -t dsa -b 1024 -C "your_email@example.com"

Since DSA keys are limited to 1024 bits, they are less secure than other algorithms.

Configuring the SSH Server for DSA Authentication

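Add the following lines to /etc/ssh/sshd_config. OpenSSH 7.0 and later disable DSA (ssh-dss) keys by default, so they have to be re-enabled explicitly, and the most recent releases drop DSA support altogether:

PubkeyAuthentication yes
# Named PubkeyAcceptedKeyTypes before OpenSSH 8.5
PubkeyAcceptedAlgorithms +ssh-dss
AuthorizedKeysFile .ssh/authorized_keys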

While DSA offers fast key generation, its security limitations make it less favorable for modern deployments.

3. ECDSA

ECDSA (Elliptic Curve Digital Signature Algorithm) is designed for better performance with smaller key sizes compared to RSA while maintaining strong security.

Generating an ECDSA Key Pair

ssh-keygen -t ecdsa -b 256 -C "your_email@example.com"

ECDSA supports key sizes of 256, 384, and 521 bits, with 256-bit ECDSA offering a good balance of security and performance.

Configuring the SSH Server for ECDSA Authentication

Modify /etc/ssh/sshd_config to include:

PubkeyAuthentication yes

While ECDSA is more efficient than RSA, its security depends on the choice of elliptic curve, and concerns exist about the provenance of the NIST curves it uses and about its sensitivity to weak random numbers during signing.

4. Ed25519

Ed25519 is a modern elliptic curve algorithm designed for high security, fast performance, and resistance to side-channel attacks.

Generating an Ed25519 Key Pair

ssh-keygen -t ed25519 -C "your_email@example.com"

This command generates a compact and efficient key pair suitable for modern SSH authentication.

Configuring the SSH Server for Ed25519 Authentication

Edit /etc/ssh/sshd_config and ensure the following is present:

PubkeyAuthentication yes

Ed25519 is increasingly favored for its speed, security, and compact key sizes.

Choosing the Right SSH Key Algorithm

When selecting an SSH key algorithm, consider the following factors:

Algorithm	Key Size	Security	Performance	Compatibility
RSA	2048+	Strong	Moderate	Widely Supported
DSA	1024	Weak	Fast	Limited Support
ECDSA	256+	Strong	High	Moderate
Ed25519	256	Very Strong	Very High	Growing Adoption

Recommendations:

For general use, RSA (2048-bit or higher) is a reliable and widely supported choice.
For better performance, Ed25519 provides strong security with fast authentication.
For constrained environments, ECDSA offers efficiency but requires careful implementation.
Avoid DSA unless compatibility reasons require it, as it is outdated and less secure.
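
To see which of these recommendations apply to keys you already have, you can look at the algorithm name that starts every public key line. The sketch below is an illustration only: classify_pubkey is a hypothetical helper, the verdict strings simply paraphrase the list above, and it assumes each line starts with the key type (authorized_keys entries that begin with options would show up as "unknown").

```shell
# Hypothetical helper: classify a public key line by its algorithm prefix.
# $1 = one line from a .pub file, e.g. "ssh-ed25519 AAAA... user@host"
classify_pubkey() {
  # ${1%% *} keeps everything before the first space: the key type
  case "${1%% *}" in
    ssh-ed25519)       echo "ed25519: modern and fast" ;;
    ssh-rsa)           echo "rsa: fine at 2048 bits or more" ;;
    ecdsa-sha2-nistp*) echo "ecdsa: acceptable" ;;
    ssh-dss)           echo "dsa: outdated, replace" ;;
    *)                 echo "unknown" ;;
  esac
}

# Example use: review every public key in ~/.ssh
#   for f in ~/.ssh/*.pub; do
#     printf '%s: ' "$f"; classify_pubkey "$(cat "$f")"
#   done
```

For RSA keys the prefix alone does not reveal the key length; ssh-keygen -l -f followed by the file name prints the bit count as well.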

Conclusion

SSH key authentication is a critical aspect of securing remote access. While RSA remains a robust default, Ed25519 is gaining traction due to its efficiency and security. ECDSA is suitable for performance-sensitive environments, while DSA is largely obsolete. Selecting the right algorithm depends on your specific security, compatibility, and performance requirements.

Mastering SSH: The Power of SSH Keys Over Passwords

Posted on December 12, 2024 (updated February 14, 2025) by editor. Posted in commandline, unix.

Reading Time: 2 minutes

SSH is how you log in to remote servers and devices.

Once upon a time, people used telnet, but telnet wasn’t encrypted, so it was replaced with the “Secure Shell”, or ssh.

Like telnet, it’s possible to log in using your username and password, but, if you’re still relying on passwords in this day and age, you’re leaving yourself vulnerable. SSH keys are far superior in terms of security and once you have them set up, they’re much more convenient.

In this article, we’ll talk about why SSH keys are essential, how to create them, and best practices for ensuring their security.

Why SSH Keys are Non-Negotiable

Security You Can’t Ignore: SSH keys are significantly more secure than passwords. If you’re not using them, you’re leaving your systems exposed. They leverage public-key cryptography, making brute-force attacks virtually impossible.
Effortless Access: Once configured, SSH keys enable password-less login, making your workflow more efficient and secure.
Control and Flexibility: SSH keys can be easily managed, revoked, or rotated as needed, giving you better control over access.

Creating SSH Keys

To create SSH keys, follow these steps:

Generate a Key Pair: Use the ssh-keygen command to generate a key pair. You can specify the key type and key length. For example, to create a 4096-bit RSA key:

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

Secure Your Private Key: Store your private key in a secure location, such as ~/.ssh/id_rsa, and set appropriate permissions:

chmod 600 ~/.ssh/id_rsa

Copy the Public Key: Copy your public key to the remote server’s ~/.ssh/authorized_keys file:

ssh-copy-id user@remote_server

Best Practices for SSH Key Security

Use Strong Passphrases: Protect your private key with a strong passphrase to prevent unauthorized access.
Regularly Rotate Keys: Rotate your SSH keys periodically to minimize the risk of compromise.
Limit Key Usage: Use separate keys for different servers and restrict key usage to specific commands or hosts.
Monitor Key Usage: Monitor SSH key usage and access logs to detect any suspicious activity.
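
Restricting a key to specific commands or hosts is done with options placed in front of the key in the server’s ~/.ssh/authorized_keys file. The sketch below builds such a line; restricted_key_line is a hypothetical helper, and the address, script path, and key text in the example are placeholders rather than real values. It relies on the from=, command=, and restrict options that OpenSSH documents for authorized_keys (restrict needs OpenSSH 7.2 or later).

```shell
# Hypothetical helper: print an authorized_keys entry that only works from one
# source address and always runs one fixed command.
# $1 = allowed source pattern, $2 = forced command, $3 = public key line
restricted_key_line() {
  # "restrict" turns off port forwarding, agent forwarding, PTY allocation, etc.
  printf 'restrict,from="%s",command="%s" %s\n' "$1" "$2" "$3"
}

# Placeholder values for illustration only
restricted_key_line '203.0.113.10' '/usr/local/bin/backup.sh' \
  'ssh-ed25519 AAAA...placeholder backup@laptop'
```

The printed line would be appended to authorized_keys on the server. The helper does not escape double quotes, so it assumes the command contains none.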

Don’t compromise on security; follow these steps to make sure your SSH keys are secure and well-managed.

Custom Command Namespace

Posted on April 6, 2020 by Jim O'Connell. Posted in bash, commandline, unix, zsh. Tagged alias, bash, zsh.

Reading Time: 2 minutes

If you are a moderate to heavy user of the command line in Linux or OS X, you’ll eventually build up a set of custom commands and shortcuts that you’ve written that help streamline your workflow. These are known as “aliases.”
Let’s say you frequently find yourself searching your shell history for a command you used before.
To display your command history, you simply type history in your shell, and every command you’ve typed streams by.
This is useful, of course, but most likely you want to know about a particular command you typed, so you might pipe the output to another command, in this case grep, to find the lines in your history that interest you. In this example, we’re looking for commands we’ve typed that involved the word “projects”:

$ history | grep projects
 9627  cd projects/ansible_dexter
 9641  cd ~/projects/ansible_dexter
 9675  cd projects/ansible_dexter
 9684  cd ~/projects/sendgrid-test
 9696  source /home/jim/projects/sns_python/venv/bin/activate
 9707  source /home/jim/projects/sns_python/venv/bin/activate
 9713  cd ~/projects/
 9863  cp -r ~/projects/sendgrid-test/lib64 .
 9865  cp -r ~/projects/sendgrid-test/share .
 9962  cd projects
 9972  cd projects/terraform-provider-aws
 9979  cd projects/terraform-provider-aws
10003  cd projects/terraform-provider-aws

For me, this is something I do so frequently that I wanted to shorten that command to the minimum number of characters. I settled upon hg for “history grep”, so I created an alias as a line in my ~/.zshrc file:

alias hg="history | grep"

The ~/.zshrc (or ~/.bashrc, if you use bash as your shell) is a file that gets read every time you start an interactive shell and sets up your preferred environment.

Perfect! Now, if I want to search history for commands that included a particular string, I can just type hg projects and I will get the above output.

Well, nearly perfect…

As it happens, hg is also a command used by the Mercurial Source Code Management system. This is not a big issue for me, as I’m a loyal Git user, but it presents an interesting dilemma:

How do I protect against command name collisions in my aliases?

You see, pretty much any alias you create has the potential to conflict with a command someone else has written, but where this gets even more complicated is when these commands are referenced from a script. Scripts should, however, only call external programs using the program’s full path, e.g. /bin/ls instead of just ls.

A blog post by Brandon Rhodes proposes an interesting solution: prepend all of your custom commands and aliases with a comma. Interestingly, a comma is just a normal character to the shell, so this is no different from renaming your command jimoconnell_hg, though a comma is of course much shorter.
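
Here is a small sketch of the comma idea in action. Everything in it is made up for the demonstration: the temporary directory stands in for a personal ~/bin, and ,greet is a hypothetical command name.

```shell
# Hypothetical demo directory standing in for ~/bin
demo_bin=$(mktemp -d)

# A tiny custom command whose name starts with a comma
cat > "$demo_bin/,greet" <<'EOF'
#!/bin/sh
echo "hello from a comma command"
EOF
chmod +x "$demo_bin/,greet"

# Put the directory on the PATH and call the command like any other
PATH="$demo_bin:$PATH"
,greet
```

The same naming works for aliases, so the history search from earlier could become alias ,hg="history | grep" and no longer shadow Mercurial. As a bonus, typing a comma and pressing Tab should list every custom command in one go.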

I found this tip via the excellent ‘/r/commandline’ subreddit.

I haven’t used this technique much at all yet, so let me know what you think!

Create a Directory and `cd` into it in one command

Posted on January 9, 2020 (updated June 12, 2025) by Jim O'Connell. Posted in bash, commandline, unix, zsh.

Reading Time: 2 minutes

(This post is meant not only to describe a little shell function, but to introduce you to a powerful, yet simple concept.)

When I’m working with a command line, I will frequently need to create a directory and then immediately change to that directory.

Let’s say I’m starting a new project. I will cd into my ~/projects directory and then create a new folder where I’ll be putting my files. After creating the new folder, I’ll generally need to cd to that folder:

cd ~/projects
mkdir foo
cd foo

My first thought was to write a little alias that combined the two operations, but aliases don’t really handle parameters well, so I wrote it as a shell function. (I use the amazing zsh as my terminal shell, but this approach works just the same in bash.) In the example below, “$1” refers to the first argument passed. “$2”, “$3” and so on also work. (Be sure to quote them!)

mgo () {
  # Create the directory, then move into it only if that succeeded
  mkdir "$1" &&
  cd "$1"
}

You can paste that into your shell and it will work for the duration of your session, but of course the better way of making it available is to put it in your ~/.zshrc or whatever your ~/.profile file is in your shell of choice. For example, in OS X, you have to first create the file:

Start up Terminal.
Type "cd ~/" to go to your home folder.
Type "touch .bash_profile" to create your new file.
Edit .bash_profile with your favorite editor (or you can just type "open -e .bash_profile" to open it in TextEdit).

To use it, you need to have that file be re-read, so you can either close and re-open your terminal, source the file you just edited, or just paste the function in to have it be active for the current session.

I realize that this is an insanely simple thing to worry over, but so much of the Linux/Unix philosophy is about making things just a bit more efficient and shell functions are a very approachable way to craft your own customizations for efficiency.

Another Example: If you’re a Python programmer or student, you are probably aware of venv, the Python module that creates lightweight virtual environments on a directory-by-directory basis. If you have used it, you know that each time you move to a project directory, you need to type source ./venv/bin/activate to set up the environment. A clever function I found a while back takes over the cd command to check for a virtual environment and activate it if it exists:

cd () {
  # Change directory first; stop here if that fails
  builtin cd "$@" || return
  # Activate a virtual environment kept in ./venv, if there is one
  if [[ -f ./venv/bin/activate ]]
  then
    source ./venv/bin/activate
  fi
  # Also handle a project directory that is itself the environment
  if [[ -f ./bin/activate ]]
  then
    source ./bin/activate
  fi
}

Both of the above examples are simple, but useful. Please leave a comment below if you know of any other good examples, or can think of a good use case for a function.