Understanding the CAP Theorem

Reading Time: 3 minutes

Distributed systems are fundamental to modern computing, but they come with inherent trade-offs. The CAP theorem, introduced by Eric Brewer, provides a framework for understanding these trade-offs by defining three key properties: Consistency, Availability, and Partition Tolerance. In this article, we will explore the CAP theorem step by step, from its basic definition to its deeper implications for distributed system design.

1. What is the CAP Theorem?

Imagine an online banking system with multiple servers across different regions. If a customer deposits money in one branch, that update must reflect accurately across all servers to prevent double spending. However, if a network failure occurs between data centers, the system faces a dilemma:

  • Should it prioritize consistency and prevent withdrawals until all servers are synchronized?
  • Should it prioritize availability and allow transactions to proceed, even if some servers have outdated information?
  • Or should it tolerate network partitions and find a balance between the two?

The CAP theorem provides a framework to understand and resolve such trade-offs in distributed system design.

The CAP theorem states that in a distributed data system, it is impossible to achieve all three of the following simultaneously:

  • Consistency (C) – Every read receives the most recent write or an error.
  • Availability (A) – Every request to a non-failing node receives a non-error response, though that response may not reflect the most recent write.
  • Partition Tolerance (P) – The system continues to function despite network partitions.

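In practice, network partitions cannot be ruled out in a distributed system, so the real choice arises when a partition occurs: the system must give up either consistency or availability until the partition heals. This is often summarized as picking two of the three properties.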

2. Breaking Down the CAP Properties

Consistency (C)

Consistency ensures that all nodes return the same data at any given time. If a write is performed on one node, all subsequent reads on any node should reflect that write.

Example: A single-primary relational database (e.g., PostgreSQL) provides strong consistency because every write goes through ACID transactions on one node.

Availability (A)

Availability ensures that every request to the system receives a response, even if some nodes are unavailable.

Example: A DNS system is highly available, meaning queries always receive responses, even if some servers are down.

Partition Tolerance (P)

Partition tolerance means that the system continues to operate even when network failures occur, causing communication issues between nodes.

Example: A global system like Apache Cassandra remains operational even if some nodes lose connectivity due to network failures.

3. CAP Theorem in Practice

Since no distributed system can guarantee all three properties at once, systems typically fall into one of the following categories:

  1. CP (Consistency + Partition Tolerance) – Ensures data consistency even during network failures but may sacrifice availability.
    • Example: Apache Zookeeper prioritizes consistency and partition tolerance but may become unavailable if a partition occurs.
  2. AP (Availability + Partition Tolerance) – Ensures availability during network failures but may return stale or inconsistent data.
    • Example: DynamoDB prioritizes availability and partition tolerance, allowing eventual consistency.
  3. CA (Consistency + Availability) – Provides consistency and availability but cannot tolerate network partitions.
    • Example: A traditional relational database like MySQL in a single-node setup.
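
To make the CP versus AP difference concrete, here is a minimal sketch of how a single replica might answer a read while it is cut off from its peers. The `handle_read` function, the `local_value` variable, and the response strings are all hypothetical names invented for this illustration; real systems are far more involved.

```shell
# Hypothetical replica state: the last value this node saw.
local_value="balance=100"

# handle_read MODE PARTITIONED
#   MODE is "CP" or "AP"; PARTITIONED is "yes" or "no".
handle_read() {
    mode=$1
    partitioned=$2
    if [ "$partitioned" = "no" ]; then
        # No partition: the replica is in sync, so both designs answer normally.
        echo "OK $local_value"
    elif [ "$mode" = "CP" ]; then
        # CP: refuse to answer rather than risk returning stale data.
        echo "ERROR unavailable during partition"
        return 1
    else
        # AP: answer from local state, which may be out of date.
        echo "OK $local_value (possibly stale)"
    fi
}

handle_read CP yes || true
handle_read AP yes
```

The CP call gives up availability (it returns an error), while the AP call gives up consistency (it answers, but the value may be stale).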

4. Implications of CAP on System Design

When designing a distributed system, understanding CAP helps in making informed trade-offs:

  • If strong consistency is needed, use a CP system.
  • If high availability is critical, an AP system is preferable.
  • If partitioning is not a concern, a CA system can work, but it is rare in distributed environments.

5. CAP Theorem in Real-World Systems

Many modern distributed databases adopt a flexible consistency model rather than strictly following CAP trade-offs:

  • Hazelcast – Prioritizes consistency with partition tolerance in its CP subsystem.
  • MongoDB – Offers tunable consistency to balance between AP and CP properties.
  • Cassandra – Favors availability and partition tolerance, achieving eventual consistency.
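
One common way Dynamo-style stores such as Cassandra expose tunable consistency is through replica counts: with N replicas, a write waits for W acknowledgements and a read consults R replicas. The sketch below checks the usual rule of thumb that reads are guaranteed to overlap the latest write when R + W > N. The function name `quorum_overlaps` is made up for this example, and the snippet assumes bash or a POSIX sh (it relies on word splitting, which zsh does not do by default).

```shell
# quorum_overlaps N W R
#   N = replicas per item, W = acknowledgements required for a write,
#   R = replicas consulted on a read.
#   Succeeds when R + W > N, i.e. every read set overlaps every write set.
quorum_overlaps() {
    [ $(( $2 + $3 )) -gt "$1" ]
}

for setting in "3 2 2" "3 1 1" "3 3 1"; do
    # Split "N W R" into the positional parameters $1 $2 $3.
    set -- $setting
    if quorum_overlaps "$1" "$2" "$3"; then
        echo "N=$1 W=$2 R=$3: reads overlap the latest write (stronger consistency)"
    else
        echo "N=$1 W=$2 R=$3: reads may miss recent writes (favors availability)"
    fi
done
```

Lower values of R and W keep the system responsive when replicas are unreachable; higher values trade that away for fresher reads.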


Understanding the RAFT Consensus Algorithm

Reading Time: 3 minutes

A distributed system needs a reliable mechanism for reaching consensus across multiple nodes. The RAFT algorithm is one such consensus algorithm. It was designed to be easier to understand than Paxos (an earlier, more complex protocol for reaching consensus in distributed networks) while maintaining strong fault tolerance.

1. What is RAFT?

RAFT is widely used in distributed systems that require strong consistency. In a distributed environment, multiple nodes work together to store and process data reliably. However, ensuring that all nodes agree on the same state at any given time is a challenge. RAFT helps solve this problem by ensuring that commands are consistently applied across all nodes, preventing conflicts and maintaining data integrity. Unlike traditional databases that rely on a single primary instance for writes, distributed systems using RAFT elect a leader that coordinates operations, ensuring that updates are applied in the same order across all nodes. This makes RAFT a crucial component for building fault-tolerant, highly available distributed applications.

RAFT is a consensus algorithm used to manage a replicated log in distributed systems. (The name is usually written “Raft” and is not an official acronym, although it is sometimes expanded as “Reliable, Replicated, and Fault-Tolerant.”) It ensures that multiple nodes in a distributed system agree on the same sequence of commands, maintaining consistency even in the face of network failures.

(For comparison, another well-known consensus algorithm is Paxos, which is more complex but serves a similar purpose.)

Key Goals of RAFT:

  • Leader Election – Ensuring that one node acts as the leader at any given time.
  • Log Replication – Maintaining a consistent log across all nodes.
  • Safety & Fault Tolerance – Ensuring that committed entries are never lost, even if some nodes fail.

RAFT is used in distributed databases, coordination services, and in-memory data grids like Hazelcast to ensure consistency.

2. RAFT’s Three Key Roles

RAFT divides nodes into three roles:

  • Leader – The central authority that handles client requests and replicates logs.
  • Followers – Passive nodes that accept updates from the leader.
  • Candidates – Nodes that attempt to become the leader during elections.

In any given term, there is at most one leader; during normal operation, all other nodes function as followers.

3. Leader Election Process

When a RAFT cluster starts, or if the leader fails, an election process takes place:

  1. A follower that hears no heartbeat from a leader within its election timeout increments its term, transitions to the candidate state, and votes for itself.
  2. The candidate requests votes from other nodes.
  3. If a majority of nodes vote for the candidate, it becomes the leader.
  4. The leader then starts sending heartbeat messages to followers, maintaining authority.

Example: Leader Election

Node A (Follower) -> Times out -> Becomes Candidate
Node A requests votes from Nodes B & C
Nodes B & C vote for A (majority wins)
Node A becomes the Leader

This mechanism ensures a stable leadership structure even during network failures.
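
The majority rule in the election above comes down to a little integer arithmetic. The following is a minimal sketch; the function names `majority` and `wins_election` are invented for this illustration and do not come from any RAFT implementation.

```shell
# majority N: the smallest number of votes that is more than half of N nodes.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# wins_election VOTES CLUSTER_SIZE: succeeds if VOTES reaches a majority.
# VOTES includes the candidate's vote for itself.
wins_election() {
    [ "$1" -ge "$(majority "$2")" ]
}

# Three-node cluster from the example: A votes for itself, B and C vote for A.
if wins_election 3 3; then echo "Node A becomes the leader"; fi

# In a five-node cluster, two votes are not enough.
if ! wins_election 2 5; then echo "No leader yet, wait for another election"; fi
```

Because two candidates cannot both collect more than half of the votes in the same term, this check is what keeps a cluster from electing two leaders at once.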

4. Log Replication

Once a leader is established, it manages log replication:

  1. The leader receives a client request.
  2. The request is appended to the leader’s log.
  3. The leader sends the log entry to followers.
  4. Once a majority acknowledges the entry, it is committed.
  5. Followers apply the committed log to their state machine.

What is a Log in RAFT?

A log in RAFT is an append-only structure that stores client requests (commands). The leader ensures that all followers maintain an identical sequence of logs so that they reach the same state.

Example: Log Replication

Client -> Sends "Write X=10" to Leader
Leader -> Appends "Write X=10" to log
Leader -> Sends entry to Followers
Majority acknowledges -> Entry is committed

This ensures all nodes eventually apply the same commands in order.

Log Consistency Rules in RAFT

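  • Leader Enforces Order: A follower accepts new entries from the leader only if the entry just before them (same index and term) matches its own log; otherwise it rejects them and the leader retries from an earlier index.
  • Log Matching Property: If entries in two logs share the same index and term, they contain the same command, and all preceding entries in the two logs are identical.
  • Commit Rule: A log entry is committed once the leader has replicated it on a majority of nodes.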

Example of Log Entries

Log Index   Term   Command
1           1      Write X = 5
2           2      Write Y = 20
3           2      Write X = 10

  • The Log Index ensures the correct order.
  • The Term tracks the leader’s election cycle.
  • The Command is the action that changes the system state.

Logs serve as the foundation for fault tolerance in RAFT, ensuring that even if a node fails, the system can recover and maintain a consistent state.
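
As a rough illustration of how index and term are used together, the sketch below compares a leader's log with a follower's and reports the first index where the follower is missing an entry or holds a conflicting one. It is a simplification: in real RAFT the leader discovers this point step by step through the consistency check on each replication request, not by comparing whole logs. The `first_divergence` function and the `term:command` entry format are assumptions made for this example, and the code relies on bash or POSIX sh word splitting.

```shell
# first_divergence LEADER_LOG FOLLOWER_LOG
#   Each log is a space-separated list of "term:command" entries.
#   Prints the 1-based index of the first leader entry that the follower
#   lacks or disagrees with, or 0 if the follower has every leader entry.
first_divergence() {
    leader_log=$1
    follower_log=$2
    index=0
    # Load the follower's entries into the positional parameters.
    set -- $follower_log
    for leader_entry in $leader_log; do
        index=$(( index + 1 ))
        # $1 is the follower's entry at this index (empty if its log is shorter).
        if [ "$leader_entry" != "${1:-}" ]; then
            echo "$index"
            return
        fi
        shift
    done
    echo 0
}

leader="1:X=5 2:Y=20 2:X=10"
echo "missing tail:  $(first_divergence "$leader" "1:X=5 2:Y=20")"
echo "stale entry:   $(first_divergence "$leader" "1:X=5 1:Y=7")"
echo "fully in sync: $(first_divergence "$leader" "$leader")"
```

With the three-entry log from the table above, the three calls print 3, 2, and 0: the first follower only needs entry 3, the second has a conflicting entry at index 2 that must be replaced, and the third is already up to date.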

5. Handling Failures

RAFT ensures that failures do not cause inconsistencies by following strict rules:

  • Election Timeout: If a leader crashes, a new election starts after a timeout.
  • Log Matching Property: Followers accept only consistent log updates from the leader.
  • Commit Consistency: Entries are only committed when a majority acknowledges them.

This design prevents split-brain scenarios and guarantees system integrity.

6. RAFT in Practice

Many distributed systems, including Hazelcast, leverage RAFT for high availability and fault tolerance. Hazelcast’s CP Subsystem implements RAFT to ensure data consistency in distributed environments.


Custom Command Namespace

Reading Time: 2 minutes

If you are a moderate to heavy user of the command line in Linux or OS X, you’ll eventually build up a set of custom commands and shortcuts that you’ve written that help streamline your workflow. These are known as “aliases.”
Let’s say you frequently find yourself searching your shell history for a command you used before.
To display your command history, you simply type `history` in your shell, and every command you’ve typed streams by.
This is useful, of course, but most likely you want to know about a particular command you typed, so you might use a pipe to send the output to another command, in this case `grep`, to find lines in history that are interesting to you. In this example, we’re looking for commands we’ve typed that involved the word “projects”:

$ history | grep projects
9627 cd projects/ansible_dexter
9641 cd ~/projects/ansible_dexter
9675 cd projects/ansible_dexter
9684 cd ~/projects/sendgrid-test
9696 source /home/jim/projects/sns_python/venv/bin/activate
9707 source /home/jim/projects/sns_python/venv/bin/activate
9713 cd ~/projects/
9863 cp -r ~/projects/sendgrid-test/lib64 .
9865 cp -r ~/projects/sendgrid-test/share .
9962 cd projects
9972 cd projects/terraform-provider-aws
9979 cd projects/terraform-provider-aws
10003 cd projects/terraform-provider-aws

For me, this is something I do so frequently that I wanted to shorten that command to the minimum number of characters. I settled upon `hg` for “history grep”, so I created an alias as a line in my `~/.zshrc` file: `alias hg='history | grep'`
The `~/.zshrc` (or `~/.bashrc` if you use `bash` as your shell) is a file that gets read every time you start an interactive shell and sets up your preferred environment.
Perfect! Now, if I want to search history for commands that included a particular string, I can just type `hg projects` and I will get the above output.
Well, nearly perfect…

As it happens, hg is also a command used by the Mercurial Source Code Management system. This is not a big issue for me, as I’m a loyal Git user, but it presents an interesting dilemma:


How do I protect against command name collisions in my aliases?

You see, pretty much any alias you create has the potential to conflict with a command someone else has written. Where this gets even more complicated is when these commands are referenced from a script. Scripts should, however, only call external programs using the program’s full path, e.g., /bin/ls instead of just ls.
A blog post by Brandon Rhodes proposes an interesting solution:
Prepend all of your custom commands and aliases with a comma. Interestingly, a comma is just a normal letter to the shell, no different than renaming your command jimoconnell_hg, though a comma is of course much shorter.
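
Here is a minimal sketch of the idea, not taken from the blog post itself. The helper `name_is_taken` is a made-up name; it uses the standard `command -v` lookup to see whether a name already resolves to something before you claim it for an alias.

```shell
# name_is_taken NAME: succeeds if NAME already resolves to an alias,
# function, builtin, or program on $PATH.
name_is_taken() {
    command -v "$1" >/dev/null 2>&1
}

# "ls" exists nearly everywhere; a comma-prefixed name almost never does.
for name in ls ,hg; do
    if name_is_taken "$name"; then
        echo "$name: already in use"
    else
        echo "$name: free to use"
    fi
done

# Define the comma-prefixed version of the history-grep alias.
alias ,hg='history | grep'
```

Put the `alias` line in your `~/.zshrc` or `~/.bashrc`, and `,hg projects` will behave like the original `hg projects` without shadowing Mercurial.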

I found this tip via the excellent ‘/r/commandline’ subreddit.

I haven’t used this technique much at all yet, so let me know what you think!

Create a Directory and `cd` into it in one command

Reading Time: 2 minutes

(This post is meant not only to describe a little shell function, but to introduce you to a powerful, yet simple concept.)

When I’m working with a command line, I will frequently need to create a directory and then immediately change to that directory.

Let’s say I’m starting a new project. I will `cd` into my ~/projects directory and then create a new folder where I’ll be putting my files. After creating the new folder, I’ll generally need to `cd` to that folder.

cd ~/projects
mkdir foo
cd foo

My first thought was to write a little `alias` that combined the two operations, but aliases don’t really handle parameters well, so I wrote it as a shell function.  (I use the amazing `zsh` as my terminal shell, but this approach works just the same in bash.)
In the below example, the “$1” refers to the argument passed. “$2”, “$3” and so on also work. (Be sure to quote them!)

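mgo () {
  # Create the directory (and any missing parents), then enter it
  # only if the mkdir succeeded.
  mkdir -p "$1" && cd "$1"
}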

You can paste that into your shell and it will work for the duration of your session, but of course the better way of making it available is to put it in your `~/.zshrc` or whatever your ~/.profile file is in your shell of choice. For example, in OS X, you have to first create the file:

Start up Terminal.
Type "cd ~/" to go to your home folder.
Type "touch .bash_profile" to create your new file.
Edit .bash_profile with your favorite editor (or you can just type "open -e .bash_profile" to open it in TextEdit).

To use it, you need to have that file be re-read, so you can either close and re-open your terminal, `source` the file you just edited, or just paste the function in to have it be active for the current session.

I realize that this is an insanely simple thing to worry over, but so much of the Linux/Unix philosophy is about making things just a bit more efficient and shell functions are a very approachable way to craft your own customizations for efficiency.

Another Example:
If you’re a Python programmer or student, you are probably aware of venv, the Python module that creates lightweight virtual environments on a directory-by-directory basis. If you have used it, you know that each time you move to a project directory, you need to type `source ./venv/bin/activate` to set up the environment. A clever function I found a while back takes over the `cd` command to check for a virtual environment and activate it if it exists:
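cd () {
  # Run the real cd with whatever arguments were given; stop if it fails.
  builtin cd "$@" || return
  # Activate a virtual environment in ./venv if there is one.
  if [[ -f ./venv/bin/activate ]]
  then
    source ./venv/bin/activate
  fi
  # Also handle an environment created in the project root itself.
  if [[ -f ./bin/activate ]]
  then
    source ./bin/activate
  fi
}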

Both of the above examples are simple, but useful. Please leave a comment below if you know of any other good examples, or can think of a good use case for a function.

More about iOS Bluetooth radios not shutting off

Reading Time: 3 minutes


The other day I was writing about a method for Proximity Presence Detection using a Raspberry Pi and the Bluetooth radio in your smartphone.

(Basically, when your phone (or other Bluetooth device) gets within range of the Raspberry Pi’s Bluetooth, the MAC Address of the phone is noted and a script sends some data to an MQTT server, so you can determine with some confidence that you are home. It’s quick, lightweight and doesn’t require any software to be installed on the phone. Once the script notes your presence, you can do things such as turn on lights, have your home automation system greet you or do pretty much anything that computers can do.)

While setting it up, I noticed something odd, though–turning off Bluetooth didn’t really turn off Bluetooth, at least on my iPhone.

Here’s what I expected to see. The number right after “confidence” is the value that shows that the radio is active within range. When it disappears from range, the number tapers down over a period of 90 seconds or so:

{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1394","timestamp":"Sun Feb 03 2019 09:39:04 GMT-0500 (EST)"}
{"confidence":"83","name":"J’s iPhone","scan_duration_ms":"5150","timestamp":"Sun Feb 03 2019 09:39:47 GMT-0500 (EST)"}
{"confidence":"66","name":"J’s iPhone","scan_duration_ms":"5153","timestamp":"Sun Feb 03 2019 09:39:55 GMT-0500 (EST)"}
{"confidence":"50","name":"J’s iPhone","scan_duration_ms":"5148","timestamp":"Sun Feb 03 2019 09:40:04 GMT-0500 (EST)"}
{"confidence":"33","name":"J’s iPhone","scan_duration_ms":"5148","timestamp":"Sun Feb 03 2019 09:40:12 GMT-0500 (EST)"}
{"confidence":"16","name":"J’s iPhone","scan_duration_ms":"5149","timestamp":"Sun Feb 03 2019 09:40:20 GMT-0500 (EST)"}
{"confidence":"0","name":"J’s iPhone","scan_duration_ms":"5149","timestamp":"Sun Feb 03 2019 09:40:28 GMT-0500 (EST)"}

To test it, I tap the home button on my phone and swipe up, revealing the Control Center:


Oddly, when I clicked the Bluetooth icon, it dimmed, as you’d expect, but it was still appearing as 100% in my script:

{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1218","timestamp":"Sun Feb 03 2019 10:00:26 GMT-0500 (EST)"}
{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"385","timestamp":"Sun Feb 03 2019 10:00:57 GMT-0500 (EST)"}
{"confidence":"100","name":"J’s iPhone","scan_duration_ms":"1197","timestamp":"Sun Feb 03 2019 10:01:28 GMT-0500 (EST)"}

My first suspicion was that my installation of the script was at fault or that perhaps I was reading the wrong MAC address, but I checked those and quickly started to suspect my phone.
To test this, I shut off my phone and saw the confidence percentage drop gracefully to zero.
I turned the phone back on and the script performed perfectly, detecting the radio instantly.

(“Airplane Mode,” I found, did actually turn off the radio, as you would expect.)

Next, I connected a Bluetooth speaker and started playing music, while watching my script. Hitting the Control Center icon disconnected the speaker, but didn’t shut off the radio.

This struck me as odd behavior, but not surprising, given that it’s Apple and I have little confidence in them to be upfront about something like this.

I dug around a bit this morning and came across an Apple support article titled “Use Bluetooth and Wi-Fi in Control Center with iOS 11 and later.” In that rather vaguely titled article, it says the following:

In iOS 11 and later, when you toggle the Wi-Fi or Bluetooth buttons in Control Center, your device immediately disconnects from Wi-Fi and Bluetooth accessories. Both Wi-Fi and Bluetooth will continue to be available, so you can use these important features:


AirDrop
AirPlay
Apple Pencil
Apple Watch
Continuity features, like Handoff and Instant Hotspot
Instant Hotspot
Location Services

Not in the list, but according to another source, Apple’s AirPod earphones apparently also do not disconnect when Bluetooth is deactivated from Control Center. (I have no AirPods to test this.)

Also worthy of note is that if you do it this way, it will automatically restore Bluetooth to full capacity the next day at 5:00 AM local time. Why it does this isn’t clear, but it’s probably so your Bluetooth alarm clock will still function or something.

If you go into iOS’s “Settings” app, the Bluetooth controls behave as you would expect–turning off Bluetooth actually does shut off the radio, so I doubt this is anything beyond Apple making their own proprietary technologies and accessories work better than third-party stuff.

Anyway, I thought it was interesting enough to dig into a bit and share with you all.

The big takeaway is that using Control Center is not the same as controlling things via the Settings app and may not behave as expected.

Threat Level: Mostly Harmless ;-)

Learn Byobu while Listening to Mozart

Reading Time: < 1 minute

If you do much of anything using a Linux terminal, you really owe it to yourself to take a look at Byobu, which is, according to Wikipedia, “an enhancement for the GNU Screen terminal multiplexer or tmux used with the GNU/Linux computer operating system that can be used to provide on-screen notification or status, and tabbed multi-window management. It is intended to improve terminal sessions when users connect to remote servers.”

This is one of the first things I install on any of my Linux boxes.

Watch the video for a bit–it’s a bit like watching a master chef slice and dice vegetables.

Proximity Presence Detection with the Raspberry Pi

Reading Time: 2 minutes

There’s some cool code I’ve been playing with the last few days. It uses the Raspberry Pi’s Bluetooth capabilities to determine when you come within range of the Pi with your smartphone. It’s called Presence and explains itself as “Reliable, Multi-User, Distributed BT Occupancy/Presence Detection.”

What it does is run a script as a service on the Pi that watches for a pre-defined list of MAC addresses to enter its Bluetooth radio’s range. It sends out statuses on each MAC address to an MQTT server running on the Pi or elsewhere. The messages each look something like this:


home/bedroom/E4:E4:ZZ:00:XX:XX {"confidence":"100","name":"Jim’s iPhone","scan_duration_ms":"806","timestamp":"Tue Jan 22 2019 13:20:50 GMT-0500 (EST)"}

The first part of that is the MQTT topic (with my MAC address obscured): home/bedroom/E4:E4:ZZ:00:XX:XX

Next is a bit of JSON that tells you the script’s “confidence” that the device is home, as well as the name of the device:
{"confidence":"100","name":"Jim’s iPhone"}
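
Because the payload is such a small, flat piece of JSON, even a shell script can pull out the confidence value. The sketch below is only an illustration: `confidence_of` and `is_home` are hypothetical helper names, the threshold of 50 is an arbitrary choice, and the `sed` pattern assumes the field looks exactly like the messages shown above. A real script would be better served by a proper JSON parser.

```shell
# confidence_of MESSAGE: print the numeric "confidence" value from one
# Presence-style JSON payload (prints nothing if the field is absent).
confidence_of() {
    printf '%s\n' "$1" | sed -n 's/.*"confidence":"\([0-9][0-9]*\)".*/\1/p'
}

# is_home MESSAGE [THRESHOLD]: succeeds when confidence >= THRESHOLD.
# The default threshold of 50 is an arbitrary value for this sketch.
is_home() {
    value=$(confidence_of "$1")
    [ -n "$value" ] && [ "$value" -ge "${2:-50}" ]
}

msg='{"confidence":"83","name":"phone","scan_duration_ms":"5150"}'
if is_home "$msg"; then
    echo "home (confidence $(confidence_of "$msg"))"
fi
```

Feeding each MQTT message through `is_home` gives you a simple yes/no signal you could use to trigger something like a light switch.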

There are several things that make this strategy particularly cool.
The first is that once it’s in MQTT, you can use this piece of information in any number of ways, the most obvious one being a Home Automation system such as Home Assistant or Domoticz. It would be trivial to whip up a script with Python and its paho-mqtt library, for example.

Another reason is that it doesn’t require any special software to be running on the phone–the Bluetooth radio broadcasts everything you need by default.

What all this gives you is a way to detect when your phone (you) comes within about 30 feet of a Raspberry Pi or even the $10 Raspberry Pi Zero. This is ideal for, say, turning on some lights when you pull into the driveway (but perhaps not for unlocking an automatic door).
I’ve only tried it with my iPhone, but in theory, you could do this with any Bluetooth device. It should even work with something like a Tile, the key-finding gizmo, something that would be of great interest. (I just need to figure a way to retrieve the MAC address from the Tile.)

Neat stuff.
So far, I like this better than other solutions I’ve found for presence detection.

Finally, something unexpected:

When I was setting this up and testing it, I noticed something unusual. To turn off Bluetooth on my iPhone, I would open this panel and tap the Bluetooth icon:

The odd thing is that when I did that, the Presence software would still see the iPhone as being nearby. Tapping “Airplane Mode” would actually turn off the Bluetooth radio and the script would see it as gone.

This seems odd at best, nefarious at worst.

Managing DotFiles with a “bare” Repository

Reading Time: 5 minutes

Note: This is probably only of use to those using “Unix-like” systems such as OS X and Linux. It may or may not work on Windows, using Git Bash. Probably not useful on Windows, otherwise.

In this post, we’ll be covering a few things. First up is the concept of “dotfiles,” the configuration files you keep in your home directory that provide customization and configuration.

After that, we’ll take a look at some strategies for maintaining these configuration files using Git, the version control package that powers the Open Source movement.

Finally, we’ll be looking at so-called “bare” repositories as well as some clever tweaks that you’ll be able to apply not only to your dotfiles, but to anywhere you might want to leverage Git for files that may not live inside a traditional repository.

First up, the Dotfiles

On a Unix/Linux system, when you don’t want to see a file by default, you can hide it from view by starting its filename with a “dot,” or period character. This is a convenience, not a security measure–it’s a way of keeping your view of a directory relatively uncluttered.

For example, if I take a look at my home directory using ls, here’s what I see:

$ ls                                                               
AppData Documents Music Pictures Templates
bin Downloads node_modules projects Videos
Desktop ember-quickstart package-lock.json Public

But that’s not the complete list of files in that directory. There are a lot more, visible if you add the -a flag (shown here with -l and -h for a long, human-readable listing):

$ ls -lah

drwxr-xr-x. 4 jim jim 4.0K Nov 9 14:00 .ansible
drwxr-xr-x. 3 jim jim 4.0K Jan 16 15:06 AppData
drwxrwxr-x. 6 jim jim 4.0K Dec 9 11:22 .atom
-rw-------. 1 jim jim 593 Nov 2 09:52 .bash_history
-rw-r--r--. 1 jim jim 18 Oct 8 09:41 .bash_logout
-rw-r--r--. 1 jim jim 141 Oct 8 09:41 .bash_profile
-rw-r--r--. 1 jim jim 389 Nov 2 00:21 .bashrc
drwxr-xr-x. 3 jim jim 4.0K Jan 6 15:06 bin
drwxrwxr-x. 3 jim jim 4.0K Jan 12 11:33 .byobu
drwx------. 23 jim jim 4.0K Jan 14 12:25 .cache
drwx------. 21 jim jim 4.0K Dec 19 12:58 .config
drwxr-xr-x. 4 jim jim 4.0K Dec 19 14:58 Desktop
drwxr-xr-x. 2 jim jim 4.0K Nov 2 00:01 Documents
drwxr-xr-x. 4 jim jim 4.0K Jan 16 15:06 Downloads
-rw-rw-r--. 1 jim jim 85 Dec 10 14:07 .gitconfig
-rw-r--r--. 1 jim jim 12 Dec 10 14:08 .gitignore
drwx------. 3 jim jim 4.0K Nov 19 08:39 .gnome
drwxr-xr-x. 3 jim jim 4.0K Dec 19 17:32 .grip
-rw-------. 1 jim jim 10K Jan 16 19:15 .ICEauthority
drwx------. 2 jim jim 4.0K Dec 15 11:01 .irssi
drwx------. 5 jim jim 4.0K Jan 6 18:36 .local
drwxr-xr-x. 6 jim jim 4.0K Nov 2 00:10 .mozilla
drwxr-xr-x. 2 jim jim 4.0K Nov 2 00:01 Music
drwxr-xr-x. 115 root root 4.0K Jan 14 12:24 node_modules
drwxr-xr-x. 6 jim jim 4.0K Dec 8 09:50 .npm
drwxr-xr-x. 6 jim jim 4.0K Dec 8 09:47 .nvm
drwxr-xr-x. 11 jim jim 4.0K Dec 27 14:06 .oh-my-zsh
-rw-r--r--. 1 root root 30K Jan 14 12:24 package-lock.json
drwxr-xr-x. 2 jim jim 4.0K Dec 9 11:22 Pictures
drwxrw----. 3 jim jim 4.0K Nov 2 00:01 .pki
drwxr-xr-x. 4 jim jim 4.0K Dec 8 10:57 projects
drwxr-xr-x. 2 jim jim 4.0K Nov 2 00:01 Public
drwx------. 2 jim jim 4.0K Jan 11 10:40 .ssh
drwxr-xr-x. 2 jim jim 4.0K Nov 2 00:01 Templates
drwxr-xr-x. 2 jim jim 4.0K Nov 2 00:01 Videos
drwxrwxr-x. 3 jim jim 4.0K Nov 2 00:20 .vim
-rw-------. 1 jim jim 21K Jan 12 11:34 .viminfo
-rw-rw-r--. 1 jim jim 4.1K Dec 13 10:33 .vimrc
drwxr-xr-x. 3 jim jim 4.0K Dec 9 16:37 .vscode
-rw-r--r--. 1 jim jim 215 Jan 6 15:02 .wget-hsts
-rw-------. 1 jim jim 24K Jan 17 13:38 .zsh_history
-rw-------. 1 jim jim 6.3K Dec 4 14:13 .zsh_history.corrupt
-rw-r--r--. 1 jim jim 3.4K Jan 12 11:34 .zshrc

There are both dot files and dot folders in that list. (Hiding a directory is done the same way, just prepend its name with a dot.)

Many of these are configuration files and directories, especially for command line utilities. The files .zshrc and .zsh_history and the folder .oh-my-zsh, for example, are the configuration files that hold all of my customization directives for ZSH, the shell I use, as well as the incredible Oh My Zsh.

To edit files, I use Vim, also with a ton of customizations found in the files .vimrc , .viminfo and the directory .vim

These files are what make your working environment your own. When you’ve gotten them just how you like, you don’t want to lose them, or have them get out of sync with other machines you may work on. It makes sense to keep them in a structured version control system.

Git to the rescue.

If you’re reading this, you’re probably familiar with Git. You may have cloned a repository from github.com or even created and shared repositories of your own.
Typically, you’ll clone a project, which will exist as a directory filled with source code that you’ll edit or compile or whatever. When you make changes, you can commit them and push them back up to GitHub or Bitbucket, or somewhere else.
Git’s great–you can work with highly distributed teams of collaborators or work offline. You can (if you’ve set it up correctly) jump back to any save point and step through every commit.
When you’re a single user, you can use it to sync your project across many machines, which makes it ideal for synchronizing your dotfiles.
What always stopped me from doing this is that I didn’t want to put my whole home directory into Git – there’s a lot of stuff I don’t want to sync and the only way I knew how to exclude it was by explicitly listing it in the “Git Ignore” file, which seemed like too much work.
What some people started doing was to create a git repo, one level down, that contained the files, along with a script that would create symbolic links from that directory to your home directory. To me, this seemed an inelegant solution.

An Elegant Solution

A while back, I decided to revisit the problem and did some Googling, eventually coming across a comment on Hacker News (news.ycombinator.com):

========= Start of Quoted post ==========

StreakyCobra on Feb 10, 2016
I use:

    git init --bare $HOME/.myconf
    alias config='/usr/bin/git --git-dir=$HOME/.myconf/ --work-tree=$HOME'
    config config status.showUntrackedFiles no

where my ~/.myconf directory is a git bare repository. Then any file within the home folder can be versioned with normal commands like:

    config status
    config add .vimrc
    config commit -m "Add vimrc"
    config add .config/redshift.conf
    config commit -m "Add redshift config"
    config push

And so on…

No extra tooling, no symlinks, files are tracked on a version control system, you can use different branches for different computers, you can replicate you configuration easily on new installation.

========= End of quoted post ==========

Let’s take a look at those lines that do the work. First, a bare Git repo is created in your home directory:

 git init --bare $HOME/.myconf

That’s simple enough, but what about that next line?

alias config='/usr/bin/git --git-dir=$HOME/.myconf/ --work-tree=$HOME'

This part is really, really cool–it creates a new command called config that you use just like the command git, that adds some special options that apply only to the dotfiles problem.
It’s a wrapper for Git that says that the repository data (what would normally be the .git directory) lives in ~/.myconf, while the working tree, where the actual versioned files live, is your home directory.
Read that a couple times until you completely understand the sheer awesome power of what that simple line does.
Elegant AF.

The last line uses our newly-minted command to tell Git not to list untracked files in `config status`, so only files we have explicitly added show up, something that makes a lot of sense, given how much junk gets stashed in $HOME:
config config status.showUntrackedFiles no
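
If you would like to try the technique without touching your real home directory, the following sketch wraps the same three steps in a function that works inside a throwaway directory. Everything named here (`dotfiles_demo`, `demo_config`, the temporary directory, the sample files, and the placeholder committer identity) is invented for the demonstration; only the `git` options themselves come from the quoted post.

```shell
# dotfiles_demo: build a throwaway "home" directory, track one dotfile in a
# bare repository, and print the list of tracked files.
# The body runs in a subshell so the cd does not affect the calling shell.
dotfiles_demo() (
    demo_home=$(mktemp -d) || exit 1
    cd "$demo_home" || exit 1
    git init --quiet --bare "$demo_home/.myconf" || exit 1

    # Same idea as the `config` alias, but pointed at the throwaway directory.
    demo_config() {
        git --git-dir="$demo_home/.myconf" --work-tree="$demo_home" "$@"
    }
    demo_config config status.showUntrackedFiles no

    echo 'set number' > .vimrc     # a file we want to track
    echo 'scratch' > notes.txt     # a file we do not want to track

    demo_config add .vimrc
    # Placeholder identity so the commit works on a machine with no Git setup.
    demo_config -c user.name=Demo -c user.email=demo@example.invalid \
        commit --quiet -m "Add vimrc"

    demo_config ls-files           # lists only the file that was added
    cd / && rm -rf "$demo_home"
)
```

Paste the function into a shell and run `dotfiles_demo`; it should print the one tracked file and leave nothing behind, while `notes.txt` is never reported because untracked files are hidden.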

Anyway, what was supposed to be a quick post has grown a bit, so I’m going to go ahead and publish it. Let me know if you have any questions or see any errors.

Please note! Your .ssh directory is kind of special in that while you may want to back up your encryption keys, doing so on a site such as GitHub is a tremendously bad idea.
Just don’t.
Feel free to back up .ssh/config though, if you’ve customized it.

Everything But The Code

Reading Time: < 1 minute

I’ve been working on a side project lately that, while it isn’t anywhere near completion, is probably ready to start sharing.
It’s a course, of sorts, that I’ve been calling “Everything but the Code.” When I get a feel for how much time each section will take, I’ll begin the process of finding students for a face-to-face class.

It’s an introduction to some of the technologies I use that aren’t programming, but are all tools in a programmer’s tool box. Things like Unix and ‘Unix-like’ filesystems, Unix-like command interpreters, (AKA, “Shells,”) Git and GitHub–stuff that’s pretty much assumed that you know how to do, yet so many good programmers don’t.

So take a look at it and let me know what you think:
Everything But The Code

If you’d like to add something to the course, you can submit a pull request to This Repo.

What is a Bubble Sort?

Reading Time: 3 minutes

Introduction

In this post, we’re going to be taking a look at sorting, with a particular focus on what’s known as a “Bubble Sort.”

Sorting

In computing, sorting is one of those things you rarely think about. Applications that present users with lists usually have a button to sort search results. Your Amazon or eBay searches let you sort by price or distance, for example.

Linux has a handy command-line utility called ‘sort’. Let’s say you wanted to know about a certain subset of programs you were running and you started with:

$ ps aux |grep $USER

which dumps a whole lot of information to the screen, so you start piping that through other utilities, (often called ‘filters’ when used this way,) until you come up with something like this:

$ ps aux |grep $USER| grep bin| awk '{print $11}' | awk -F '/' '{print $NF}'|sort| uniq

(Note: This is not a useful set of filters. I just cobbled it together as an example.)
It works and works well, without us having to think about how a computer performs the many, many operations needed to actually do the sort.
There are many algorithms, in many different languages and perhaps the simplest is known as a Bubble Sort.

So, what’s a Bubble Sort?

A “Bubble Sort” is an algorithm where the desired results “bubble up,” based upon the comparisons you do to pairs of elements, as you iterate through a list of numbers. Basically, you step through your array and compare each value with the adjoining value. If the value on the right is the lower of the two, swap them so that it moves to the left. Keep iterating until you make a full pass without swapping anything, at which point your array is sorted.
Gary Sims, from the excellent Gary Explains channel on YouTube, explains it really well.

If you want to play with Gary’s code, but don’t feel like typing it in, you can grab it from my GitHub copy of it:

$ git clone https://github.com/jimoconnell/BubbleSortSaga.git

This is really a great way to understand what happens in a bubble sort, but of course, written in Python, it’s probably only useful as an educational exercise. Python, after all, is a very high-level language.
Take a look at this example, written in Assembler, a very low-level computer language that’s as close to machine code as anyone is likely to be working with:
(Example code by @stephenpaulger on GitHub)


; Runs on https://schweigi.github.io/assembler-simulator/
; https://en.wikipedia.org/wiki/Bubble_sort

	MOV C, data	; C tracks the address of end
	ADD C, 16	; of the unsorted section of data.

start:
	DEC C           ; One byte is sorted each pass
	MOV D, data	; Start of the data
bubble:
	MOV A, [D]
	CMP A, [D+1]    ; Compare two adjacent values
	JNA noswap      ; jump if they don't need swapping
	MOV B, [D+1]	; swap the two values
	MOV [D+1], A
	MOV [D], B
noswap:	
	INC D		; Move along the data one byte
	CMP D, C
	JB bubble	; loop if there's more unsorted data
	CMP C, data
	JNB start	; start again unless we've sorted all values.
	HLT


	DB "_______"	; Align the sorted data nicely.
data:
	DB "QWERTYUIOPASDFGH"  ; 16 characters to sort.

Even if you’ve never looked at Assembler before, with the comments, it’s not difficult to see what’s going on and that it’s not too different from Gary’s Python example.

This is all something that you could do with playing cards: take eight or so number cards from the deck and arrange them in random order, face up. Take the first two cards and compare them. If the card on the right is the lower of the two, swap the two cards. Move one spot to the right and repeat. Do this for the whole row. When you get to the far right, start again from the left and repeat the process, until you have no more cards to swap.
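
As a rough sketch, the card procedure above might look something like this in Python. The function name bubble_sort and the sample hand of eight cards are made up for illustration. This is not Gary's code, just one plain way to write down "keep making passes until nothing gets swapped":

```python
def bubble_sort(cards):
    """Return a sorted copy of cards using repeated adjacent swaps."""
    row = list(cards)      # work on a copy so the caller's row is left alone
    swapped = True
    while swapped:         # keep making passes until one finishes with no swaps
        swapped = False
        for i in range(len(row) - 1):
            # If the card on the right is the lower of the two, swap them.
            if row[i + 1] < row[i]:
                row[i], row[i + 1] = row[i + 1], row[i]
                swapped = True
    return row


# A hypothetical hand of eight number cards, in random order.
print(bubble_sort([7, 2, 9, 4, 10, 3, 8, 5]))  # prints [2, 3, 4, 5, 7, 8, 9, 10]
```

Each pass walks the whole row from left to right, exactly like the cards. The Assembler version adds one refinement: it shrinks the unsorted section by one after every pass, since the largest remaining value has already bubbled to the end.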

I hope that gives you a better understanding of one of those basic concepts that are too often neglected in computer education.

Jim