A Brief Introduction to Category Theory



This is meant to be a relatively short and non-rigorous introduction to Category Theory. Although I will be defining and using a lot of the technical terminology that is commonly used, this talk is primarily aimed at introducing these concepts, explaining why they exist, and showing where they’re useful and commonly used.

In fact, most of the results used here will be stated with very minimal proof – this is partly due to time constraints, and diving into them here would obfuscate the more high-level points I’d like to make. However, if you are interested in seeing and working through some of these types of proofs yourself, I’ve included some references that I’d recommend near the end.


Of course, if we’re going to talk about Category Theory, I should probably start by telling you what it is! However, instead of diving into the definitions immediately, I think it helps to have some motivation for why such a thing should even exist in the first place.

Category Theory was conceived (or invented, or discovered; whichever you prefer) in 1945 by Samuel Eilenberg and Saunders Mac Lane while working on something called Čech cohomology, which is central in the field of algebraic topology.

One of Eilenberg and Mac Lane’s motivations was that it was (and still is!) common among mathematicians to refer to certain constructions as “natural” and “canonical” - broadly speaking, these terms are used to denote constructions that are somehow “choice-free”. For example, one might want to study vector spaces without explicitly choosing a basis. In this way, one can discover properties that don’t actually depend on a particular frame of reference, and in some sense are more “universal” and intrinsic to the object being studied.

In particular, Eilenberg and Mac Lane wanted to formalize the notion of a natural transformation and things that were “naturally isomorphic.”

Interlude - What does “natural” mean?

A canonical example from mathematics is that, given a finite-dimensional vector space $V$ over a field $k$ (you can just take $k=\mathbb{R}$ here if you’d like), one can look at its dual space, denoted $V^{\ast}$, which is the space of all functions $f : V \rightarrow k$ that take vectors in $V$ as input and output scalars in the base field $k$. It turns out that $V^{\ast}$ is also a vector space, with the same dimension as $V$, and one result you might remember from linear algebra is that $\text{dim} V = n \Rightarrow V \cong \mathbb{R}^n$ - that is, all vector spaces of finite dimension $n$ are indistinguishable (as vector spaces) from $\mathbb{R}^n$.

In particular, we have $\text{dim} V^{\ast} = n$, so $V^{\ast} \cong \mathbb{R}^n \cong V$. So $V$ is isomorphic to its dual.

But $V^{\ast}$ is a vector space in its own right, so we can look at its dual too! This is denoted $V^{\ast\ast}$, and sometimes referred to as the “double dual” of $V$. In exactly the same way, we find that $\text{dim} V^{\ast\ast} = n$ as well, and so $V^{\ast\ast} \cong V^{\ast}$, and so we can conclude that $V \cong V^{\ast\ast}$ - that is, $V$ is isomorphic to its double dual.

So $V$ is isomorphic to $V^{\ast}$, and it is also isomorphic to $V^{\ast\ast}$. However, when one goes through the process of actually constructing these isomorphisms, one finds that the map $V \rightarrow V^{\ast}$ truly depends on choosing a basis for $V$; on the other hand, the map from $V$ to $V^{\ast\ast}$ requires no such choice. In this way, we say that $V$ is isomorphic to its dual, but $V$ is naturally isomorphic to its double dual.
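To make the distinction concrete, here is a small Python sketch. The representations are made up for illustration: vectors in $\mathbb{R}^2$ are tuples and functionals are plain functions. The point is that the map into the double dual is just “evaluation at $v$” and never mentions a basis, while any map into the dual has to pick one.

```python
# A sketch of why V -> V** is "choice-free", assuming (hypothetically)
# that vectors in R^2 are tuples and functionals are plain functions.

def eval_at(v):
    """The natural map V -> V**: send v to 'evaluation at v'."""
    return lambda f: f(v)   # an element of V**: it eats a functional

def to_dual(v):
    """A map V -> V* -- but note it hard-codes the standard basis."""
    a, b = v
    return lambda w: a * w[0] + b * w[1]

v = (1, 2)
f = lambda w: 3 * w[0] + 4 * w[1]   # a sample functional in V*

double_dual_v = eval_at(v)
print(double_dual_v(f))             # 11, i.e. f(v) -- no basis appeared
print(to_dual(v)((5, 6)))           # 17, but only relative to a basis
```

Notice that `eval_at` would work unchanged for vectors of any dimension, or indeed any representation at all - that insensitivity to representation is the informal content of “natural”.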

This idea of “naturality” is part of what category theory sets out to make precise.

A secondary motivation was to abstract away properties that are really only a result of some particular structure or construction, and don’t actually have much to do with the specific kind of object you’re working with. (If you’ve programmed much, the analog here would be “refactoring” commonly used pieces of code into a more general interface.)

A few such constructions would be things like products or quotients of objects, which are ubiquitous in mathematics. With products, for example, it is possible to construct a product of sets (which have very little structure), but we can also construct a product of vector spaces (which have a very rich structure). It’s then natural to ask, what commonalities do these constructions share? Which properties of a product of vector spaces are due to them being vector spaces, and which are just a result of its construction as a product? This is another area where category theory shines; notions such as products and quotients can be described in terms of universal properties, which pay no heed to what the underlying objects really are at all.

As a result, Category Theory provides a way of describing things in ways that are general enough to be applied very broadly. It is useful as both an organizational tool, and also as a general language and logical framework which has found use not only in various branches of mathematics, but also in logic, computer science, physics, philosophy, linguistics, and a host of other fields.

On one hand, it serves as “simplification through abstraction” – we move from studying individual trees to studying the forest as a whole. On the other hand, it also allows us to reason about entire collections of forests, and how to transport our findings from one forest to another.

In a nutshell, categories were invented as a framework to support functors, which were in turn invented to describe natural transformations between functors, which are in turn used to define adjoints. Of course, many other useful categorical tools have since been developed, but adjunction is really one of the key notions that category theory is meant to support.

Interlude - What is an adjoint?

Adjunction is a slightly complicated concept, but informally speaking, functors map categories into other categories, and adjoints allow you to “approximate” one category by another. And in some cases, there is also an “inverse” to this approximation which takes you back to the original category.

For example, consider groups and sets; there are categories $\mathbf{Grp}$ and $\mathbf{Set}$ in which these objects live. A group is really just a set that is decorated with some additional structure - in this case, a binary operation satisfying a few axioms (associativity, an identity element, and inverses). Usually groups are given to you with an a priori notion of what this operation is, but what if this weren’t the case? If you were just given a set, is there any way to “upgrade” it to a group?

The answer is yes; if $X$ is any set, there is a construction called the free group on $X$, denoted $F(X)$, which goes something like this: given a set like $A = \{a, b\}$, one thinks of $A$ as a formal alphabet of symbols, and makes another set of “formal inverses” of $A$, say $B = \{a^{-1}, b^{-1}\}$. Then, take the set $G = A \coprod B = \{a, b, a^{-1}, b^{-1}\}$, add an element $\varepsilon$ to denote an empty symbol, and define a group operation $\bigstar$ that is simply the concatenation of symbols together (subject to no rules or relations other than $x \bigstar \varepsilon = x$). We then stipulate that whenever something like $aa^{-1}$ occurs in a string (again, strictly as formal symbols over the alphabet $G$), there is a reduction operation that replaces this with $\varepsilon$. After quotienting out by an equivalence under these reductions, we produce something that is a well-defined group, and is somehow the minimal group that could have been made from the original set and no other information.
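The reduction step above is easy to play with in code. This is a toy model, assuming my own encoding where the formal inverses $a^{-1}, b^{-1}$ are written as the capital letters `'A'`, `'B'`:

```python
# A toy model of the free group F({a, b}): a group element is a fully
# reduced string, and the operation is concatenation + cancellation.

INVERSE = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

def reduce_word(word):
    """Cancel adjacent inverse pairs like 'aA' until none remain."""
    out = []
    for ch in word:
        if out and out[-1] == INVERSE[ch]:
            out.pop()           # x followed by x^{-1} reduces away
        else:
            out.append(ch)
    return ''.join(out)         # '' plays the role of epsilon

def star(u, v):
    """The group operation: concatenate, then fully reduce."""
    return reduce_word(u + v)

print(star('ab', 'BA'))   # '' -- ab times b^{-1}a^{-1} is the identity
print(star('ab', 'a'))    # 'aba' -- nothing cancels
```

The stack in `reduce_word` handles nested cancellations like `abBA` in one pass, which is why quotienting by reductions is well-defined here.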

Then, there is something called a “forgetful functor” $\mathcal{F}$ from $\mathbf{Grp}$ into $\mathbf{Set}$ that takes a group and gives you only the underlying set, “forgetting” everything about its structure as a group. For example, if one took the group $\mathbb{Z}_2 = \{0,1\}$ with the group operation $0+1 = 1+0 = 1$, $0+0 = 1+1 = 0$ (i.e., the $XOR$ operation), then applying $\mathcal{F}$ to $(\mathbb{Z}_2, XOR)$ just gives you a two-element set $\{a_0, a_1\}$.

Then $\mathcal{F}$ has an adjoint $\mathcal{G}$, which creates the free group on that set, $F(\{a_0, a_1\})$. So if you apply $\mathcal{G} \circ \mathcal{F}$ to $\mathbb{Z}_2$, you end up back in $\mathbf{Grp}$, but you don’t get back the same group you started with - indeed, the free group consists of infinitely many strings over the alphabet $a_0, a_1, a_0^{-1}, a_1^{-1}$, while $\mathbb{Z}_2$ had only two elements. So this adjoint, the free group construction, provides a way to build the minimal group recoverable from the information that survives applying $\mathcal{F}$. For this reason, you’ll often hear of adjunction as “the most efficient” solution to a given problem, or as a form of “optimization”.

(In this case, however, there is only one group, up to isomorphism, with an underlying set of two elements, so if we knew the adjunction was applied, we could deduce what the original group was!)

Definition of a Category

  • Informally, a category is a collection of objects and arrows between them.
    • Each arrow has a unique source and a target, both of which are objects, and arrows can be composed (definition later)
    • For example, there is a category Set where the objects are just normal sets, and the arrows are functions between sets. Composition of arrows is just the usual composition of functions.
  • Simply put, categories are Directed Graphs (diagrams) with certain constraints on the edges
    • Nodes are objects in the category, edges are morphisms between objects, subject to:
      • Every node has an edge to itself
      • For any path (sequence of composable edges) between two nodes, there is a direct edge between them.
    • Show free category construction
  • The objects are black boxes - not allowed to “look into” them!
  • Formally, a category $C$ is two pieces of data:
    • $Ob(C)$, the class of objects of $C$,
    • $Hom(C)$, the class of morphisms between objects in $Ob(C)$
      • The morphisms from $X$ to $Y$, where $X,Y \in Ob(C)$, form the hom-set $Hom_C(X,Y)$.
    • Along with a binary operation $\circ$ which composes morphisms:
      • $\forall X,Y,Z \in Ob(C)$ where $f: X \rightarrow Y$ and $g: Y \rightarrow Z$, there exists the composition $h$ of $f$ and $g$, denoted $h = g \circ f$, where $h: X \rightarrow Z$.
      • Using types:
    • And two rules:
      • Associativity of $\circ$, given by $f\circ(g\circ h) = (f\circ g)\circ h$
      • Existence of unique two-sided identities: $\forall X \in Ob(C), \exists id_X \in Hom(C)​$ where $id_X: X \rightarrow X​$. These satisfy
        • $\forall f: A\rightarrow X \in Hom_C(A,X), id_X \circ f = f$ and
        • $\forall g: X \rightarrow B \in Hom_C(X, B), g \circ id_X = g$.
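To see these axioms in action, here is a minimal Python sketch, assuming objects are finite sets and morphisms are dicts from source elements to target elements (i.e. a tiny fragment of Set):

```python
# Checking the category axioms on a tiny fragment of Set, where a
# morphism is a dict sending each element of the source to the target.

def compose(g, f):
    """g o f: first apply f, then g (note the right-to-left order)."""
    return {x: g[f[x]] for x in f}

def identity(obj):
    return {x: x for x in obj}

X, Y = {1, 2}, {'a', 'b'}
f = {1: 'a', 2: 'b'}            # f : X -> Y
g = {'a': 10, 'b': 10}          # g : Y -> Z, with Z = {10}
h = {10: 'done'}                # h : Z -> W

# Associativity: h o (g o f) == (h o g) o f
print(compose(h, compose(g, f)) == compose(compose(h, g), f))   # True

# Identity laws: id_Y o f == f and f o id_X == f
print(compose(identity(Y), f) == f)   # True
print(compose(f, identity(X)) == f)   # True
```

Note that associativity here is inherited “for free” from function composition - that will be a recurring theme in the concrete categories below.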

Foundational Issues

  • Certain collections of objects are “too big” to be sets
    • Example: $(\exists R = \{ x : x \not\in x \}) \Rightarrow (R\in R \iff R\not\in R)$ (Contradiction)
    • So one can’t have a “set of all sets”, but we’d like to study such things
      • e.g. we’d like a category Set that contains all sets
    • So we use classes (sets with restricted operations)
      • Classes that are sets: small classes
      • Classes that are not sets: proper classes
      • The collection of objects in a category may form a proper class.
      • The collections of morphisms between two objects (the hom-sets) are usually sets, but can be proper classes.
  • In a sense, Category theory subsumes and generalizes set theory
    • There are mathematical camps that see it as a possible alternative to set theory for the foundations of math (see homotopy type theory)
    • In certain types of categories, all of mathematics can be formulated (see topos)


Since a category can be quite an abstract object in and of itself, it’s useful to have a few concrete categories in mind to check new definitions and theorems against. Here are a number of toy examples you can use.

Here, I’ll just cover what I think are the three most important parts of recognizing that some structure you’ve used is a category - the objects, the morphisms, and what kind of morphisms are called isomorphisms in that category. Checking the categorical axioms is pretty routine and perhaps not as enlightening, so we’ll skip that for now.

That being said, here’s how the examples are formatted:

  • : A somewhat informal name I’ve given to the category as a whole. Some names are more “official”, but these vary a lot across the literature. Some categories aren’t named at all, so I’ve supplied arbitrary names in some cases. Note that some categories are named after their object classes (), while others are actually named after their morphism classes (). Category names are usually typeset in mathbf.
    • Objects: Describes the entire class , and gives an example of what the full data of two distinct members might look like. I’ve tried to match the notation to the domain-specific notation one might use when working in each individual category.
    • Morphisms: Denotes what the entire class looks like, as well as what a morphism looks like.
    • Isomorphisms: Denotes what conditions one puts on a morphism , and perhaps a corresponding morphism , in order to recognize as isomorphic objects in this category. (Often denoted )

Notes: Some of these categories are built up step by step, and their construction is easier to demonstrate on a blackboard. I’ve included notes to explain how this is done for a few examples.


Here, I’ll explicitly describe the full set of objects, and the full set of morphisms.

  • (The minimal category on two objects)
    • Objects: (A category made out of two arbitrary objects)
    • Morphisms:
    • Isomorphisms: None (There is no morphism from to .)
    • Objects:
    • Morphisms:
    • Isomorphisms: None (There is no morphism from to .)

Notes: Here I just took and added in a single extra morphism. The star symbol is used here just to denote the fact that this mapping is completely made up, and that arrows in categories don’t have to be “functions” in the traditional sense at all. Each arrow is just some way to associate a source object with a target object.

  • (The minimal category on objects)
    • Objects: (A category made out of arbitrary objects)
    • Morphisms:
    • Isomorphisms: None (There are no morphisms from to for any )

Notes: This just shows that you can make a category out of any set of objects by only supplying identity morphisms - such a category is called discrete. Also note that since every object must have an identity morphism anyway, the objects themselves don’t really matter at all. If we wanted, we could just identify every object with its identity morphism and define categories entirely in terms of morphisms. Practically speaking, though, keeping the notion of objects around makes categories a little easier to work with.

Also, note that it didn’t matter that was finite here - this construction works for any set , yielding (the discrete category on )
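As a quick sketch of this, assuming we represent a category as its set of objects plus a dict of morphisms keyed by (source, target) pairs:

```python
# The discrete category on a set: every object gets exactly its identity
# morphism and nothing else, so the category carries no extra information
# beyond the set itself.

def discrete_category(objects):
    morphisms = {(x, x): f'id_{x}' for x in objects}
    return set(objects), morphisms

obs, homs = discrete_category({'p', 'q', 'r'})
print(len(homs))                                  # 3: one identity each
print(all(src == tgt for (src, tgt) in homs))     # True: no other arrows
```

Nothing about `discrete_category` cares what the objects are, which mirrors the remark above that the objects could be identified with their identity morphisms.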

    • Objects: (A “minimally interesting” extension of )
    • Morphisms:
    • Isomorphisms: None (There is no map from to , to , or to .)

Notes: The wacky symbols are again used to denote that these mappings are absolutely arbitrary.

A quick explanation of what I mean by “minimally interesting”, though: Given , note that there are really only a few things we can do with it at this point. We could add another morphism, , and we would get a category where .

The other thing we can do is add in a single object . We are forced to add an identity morphism for this to be a category, which is what the first union in the morphisms section supplies.

At this point, we just have , so we look to modify the morphisms a bit to get something slightly different. There are a few choices here, but we’ll go with one of the more interesting ones: a morphism from an existing object to the new object .

However, this won’t be a category unless it satisfies the axiom of composition, so we’re forced to add in a morphism that looks like .

Denote ​ by by by ,

and you get something that perhaps looks a little more familiar:

The Category 3

If you haven’t seen this before, don’t worry - you will! This particular kind of diagram shows up in many algebraic constructions (quotients and products, to name a few), and understanding it is the first step in getting a handle on things like universal properties.
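Here is the finished diagram as data, in a hypothetical encoding where each arrow is stored with its (source, target) pair. The check below confirms that every composable pair of arrows has a composite available - exactly the axiom that forced us to add the last morphism:

```python
# The category 3: objects A, B, C, non-identity arrows f : A -> B and
# g : B -> C, plus the composite gf : A -> C forced by the axioms.

arrows = {
    'id_A': ('A', 'A'), 'id_B': ('B', 'B'), 'id_C': ('C', 'C'),
    'f': ('A', 'B'), 'g': ('B', 'C'), 'gf': ('A', 'C'),
}

def composable(second, first):
    """second o first is defined when first's target is second's source."""
    return arrows[first][1] == arrows[second][0]

# For every composable pair, some arrow must have the composite's
# source and target -- otherwise this diagram would not be a category.
needed = {(arrows[first][0], arrows[second][1])
          for second in arrows for first in arrows
          if composable(second, first)}
print(needed <= set(arrows.values()))   # True: composition is closed
```

Try deleting `'gf'` from the dict: the check fails, because the pair $(g, f)$ then demands an arrow $A \rightarrow C$ that no longer exists.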

More Standard Examples

Here are some common examples of categories that arise in various contexts, roughly in increasing order of complexity.

    • Objects: Sets
    • Morphisms: Set functions $f: A \rightarrow B$
    • Isomorphisms: Bijective set functions
      • $f$ is bijective iff $f$ is both
        • injective: $f(a) = f(a') \Rightarrow a = a'$
          • This lets you construct a left inverse
        • surjective: $\forall b \in B, \exists a \in A$ such that $f(a) = b$
          • This lets you construct a right inverse

    Notes: These conditions allow you to construct a $g: B\rightarrow A$ such that

    • $\forall b\in B, (f\circ g)(b) = b$
      • (i.e. $f\circ g = id_B$ as a function)
    • $\forall a \in A, (g\circ f)(a) = a$
      • (i.e. $g\circ f = id_A$ as a function)

    So we refer to $g$ as the two-sided inverse of $f$, which is unique when it exists. In many common cases, the objects in a category are “built” out of sets. These categories are called concrete, and the isomorphisms in these categories end up just being isomorphisms of the underlying sets, along with some other structure-preserving conditions. Thus understanding how morphisms and isomorphisms in $\mathbf{Set}$ are constructed is a key first step.
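As code, again taking a morphism in Set to be a dict, the two-sided inverse of a bijection is just the flipped dict, and it satisfies both equations above:

```python
# Building the two-sided inverse of a bijection f : A -> B (a dict here).

def compose(g, f):
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Flip the pairs; this is only well-defined because f is bijective."""
    return {b: a for a, b in f.items()}

A, B = {1, 2, 3}, {'x', 'y', 'z'}
f = {1: 'x', 2: 'y', 3: 'z'}
g = inverse(f)

print(compose(g, f) == {a: a for a in A})   # True: g o f = id_A
print(compose(f, g) == {b: b for b in B})   # True: f o g = id_B
```

If `f` failed to be injective, flipping would silently drop pairs; if it failed to be surjective, `g` would miss elements of `B` - matching the left/right inverse remarks above.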

    • Objects: Partially-ordered sets ,
      • Recall that partial orders are reflexive, transitive, antisymmetric binary relations.
    • Morphisms: Set functions
    • Isomorphisms: Bijective functions such that if and , then
    • Objects: Binary Relations
      • Here is just a set and is a binary relation.
    • Morphisms: Relation-preserving set functions such that
    • Isomorphisms: Bijective set functions (as in ) with an inverse such that .

    Notes: This works for any binary relation - for example, take with iff divides .

    Also notice that to get an isomorphism, all we really did was take an isomorphism of the underlying set and require that the inverse also satisfy the conditions of the morphisms in this category. So really, it required to be bijective in , and then just needed to also be a morphism in . We’ll see this pattern in almost every concrete category!

    • Objects: Groups
    • Morphisms: Group homomorphisms where $\forall x,y \in G, \phi(x\star y) = \phi(x) \diamond \phi(y)$
    • Isomorphisms: Bijective group homomorphisms.
      • These are found by finding a that is bijective as a set function (as described in ) and that is also a homomorphism (as described above).
      • Then can be constructed as a set function, and a result from group theory shows that is also a homomorphism.

    Notes: This is the first case in a very common pattern - the isomorphisms in this category were just set bijections, but they preserve the structure of the objects. In this case, homomorphisms end up being the kind of morphisms you need to preserve the fundamental pieces of a group’s structure. They preserve the binary operation (by definition) and associativity (from function composition), but they also end up preserving inverses, identities, and information about the elements themselves like order.

    This can be summed up with a wave of the hand by saying that the isomorphisms in a category are just invertible structure-preserving morphisms.
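A concrete homomorphism check, using a hypothetical example of the map $x \mapsto x \bmod 2$ from $\mathbb{Z}_4$ to $\mathbb{Z}_2$, both under modular addition:

```python
# Verifying that phi(x) = x mod 2 is a homomorphism from Z_4 to Z_2:
# it must turn addition mod 4 into addition mod 2, for every pair.

Z4 = range(4)
phi = lambda x: x % 2

is_hom = all(phi((x + y) % 4) == (phi(x) + phi(y)) % 2
             for x in Z4 for y in Z4)
print(is_hom)       # True
print(phi(0))       # 0 -- the identity is carried to the identity
```

Only the binary-operation condition is checked explicitly; preservation of the identity and inverses then comes along for free, just as the notes above describe.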

    • Objects: Rings
    • Morphisms: Ring homomorphisms $\varphi$ preserving both operations: $\varphi(a + b) = \varphi(a) \diamond \varphi(b)$ and $\varphi(a \times b) = \varphi(a) \star \varphi(b)$
    • Isomorphisms: Bijective ring homomorphisms
    • Objects: Abelian groups (left $\mathbb{Z}$-modules)
    • Morphisms: Homomorphisms of abelian groups
    • Isomorphisms: Bijective group homomorphisms


    • Objects: Vector spaces over a field , say
    • Morphisms: -linear maps
      • These are maps such that , we have .
    • Isomorphisms: Invertible linear maps
      • These are maps between sets of vectors of and which are bijective functions on these sets (again, just as in ) with the restriction that they obey the linearity condition from above.

    Notes: The presence of is just a generalization - if you haven’t seen a lot of algebra, you can just take and think of the category of all vector spaces over . Then an object in this category is just for some , and the maps are just the usual linear maps you’d see in an undergraduate course on linear algebra.

    Notice how the pattern seen in continues here - to get an isomorphism, you just look at all of the functions between the underlying sets (this is a large set!), take only the bijections, then filter it even further by taking the bijections which preserve the structure you care about.

    Here, the structure-preserving maps in vector spaces end up being linear maps. You might notice that condition of linearity looks very similar to the condition for homomorphisms - only now, the operations in both the source and target are the same.

    Informally, this is because you essentially get vector spaces by taking a group, tacking on a field , then adding a few more axioms - so the linearity condition is really just a souped-up homomorphism on the underlying group (here, vectors under addition) that takes into account the remaining axioms (namely, scalar multiplication).

    The point of this example is to show that (generally speaking) as more structure is put on the objects, more restrictions will need to be put on the morphisms to retain that structure.
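As a worked check of the linearity condition $T(a u + b v) = a\,T(u) + b\,T(v)$, here is a made-up matrix map on pairs; exact fractions keep the comparison honest:

```python
# Checking linearity T(a*u + b*v) == a*T(u) + b*T(v) for a map given by
# a 2x2 matrix (an arbitrary example matrix, chosen for illustration).

from fractions import Fraction as F

M = [[F(2), F(1)], [F(0), F(3)]]

def T(v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(a, v):
    return (a * v[0], a * v[1])

u, v, a, b = (F(1), F(2)), (F(3), F(-1)), F(5), F(-2)
lhs = T(add(scale(a, u), scale(b, v)))
rhs = add(scale(a, T(u)), scale(b, T(v)))
print(lhs == rhs)   # True: the matrix map satisfies the linearity condition
```

This is one instance rather than a proof, but it is exactly the condition the morphisms of this category must satisfy.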

  • (Propositional / “0-order” Logic)
    • Objects: Propositions
    • Morphisms: Deductions $P \Rightarrow Q$, read “P implies Q”
    • Isomorphisms: Tautologies – equivalent propositions such that $P \iff Q$

    Notes: This can be thought of as “the category of proofs”, and such a category can be derived from any deductive system. The isomorphisms here are “if and only if” statements, and they are often exploited in Mathematics to create definitions. (In other words, every Mathematical definition is an iff statement, and any proposition isomorphic to a definition in this category can be taken as an equivalent definition.)

  • (Finite state automata)

    • Objects: Finite state automata

      • is the set of states
      • is the input alphabet
      • is a transition map
      • is the initial state
      • is the set of final/accepting states
    • Morphisms: Simulations such that

        • (Transitions are preserved)
        • (Initial states are mapped to each other)
        • (Accepting states are preserved)
    • Isomorphisms: Simulations that are bijective on the underlying state sets. Note that this forces an inverse simulation to exist, which preserves transitions, initial states, and accepting states in the other direction.
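To make the simulation conditions concrete, here is a made-up pair of machines over the alphabet $\{0, 1\}$: the first counts 1s mod 4, the second counts them mod 2, and $h(s) = s \bmod 2$ collapses the first onto the second:

```python
# A simulation h between two automata: machine 1 tracks (number of 1s)
# mod 4, machine 2 tracks it mod 2; h collapses states 0,2 and 1,3.

states1 = range(4)
delta1 = lambda s, a: (s + a) % 4
delta2 = lambda s, a: (s + a) % 2
accept1, accept2 = {0, 2}, {0}      # both accept an even count of 1s
h = lambda s: s % 2

# The three simulation conditions:
transitions_ok = all(h(delta1(s, a)) == delta2(h(s), a)
                     for s in states1 for a in (0, 1))
initial_ok = h(0) == 0
accepting_ok = {h(s) for s in accept1} <= accept2

print(transitions_ok, initial_ok, accepting_ok)   # True True True
```

This particular `h` is not a bijection, so it is a morphism but not an isomorphism - the two machines accept the same language, yet they are distinct objects in this category.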
    • Objects: Graphs where
    • Morphisms: maps f: V_1 \rightarrow V_2 where (v,w) \in E_1 \Rightarrow (f(v), f(w)) \in E_2
      • i.e. maps between vertex sets that preserve incidence relations.
    • Isomorphisms: Bijective graph morphisms
    • Objects: Natural numbers
    • Morphisms: A morphism $m \rightarrow n$ is an $n \times m$ matrix with entries in the underlying field
    • Isomorphisms: Natural numbers for which there exists a , i.e. an matrix, such that
      • Note that this can only possibly happen when , so are square. But then we can always just take the identity matrix. So isomorphisms are just equalities of natural numbers.

    Notes: This category is a little different - the objects don’t matter much, since they’re really just keeping track of matrix dimensions. Instead, the morphisms themselves are the data this category encodes.

    While this seems like an odd category to consider, the kicker is that it’s possible to prove that there is a “full, faithful, essentially surjective functor from to ” - in other words, one can move between these categories without losing any vital information. In this case, this tells us that when working with (finite dimensional) vector spaces, it doesn’t matter whether you study abstract linear maps or the matrices that represent them!
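The slogan in code: composing the linear maps that two matrices represent is the same as multiplying the matrices. A sketch, with matrices as nested lists:

```python
# In Mat, composition of morphisms is matrix multiplication. Applying
# the product matrix to a vector agrees with applying the two maps in
# sequence -- the objects (dimensions) just bookkeep what composes.

def matmul(A, B):
    """Result is a len(A) x len(B[0]) matrix; needs cols(A) == rows(B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]   # a morphism 2 -> 3
B = [[7, 8], [9, 10]]          # a morphism 2 -> 2
v = [1, 1]

print(apply(matmul(A, B), v) == apply(A, apply(B, v)))   # True
```

Note the dimension bookkeeping: `A` composes with `B` only because `B`’s target (2) matches `A`’s source (2), which is exactly the role the natural-number objects play.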

  • (pseudo-category)
    • Objects: Haskell types
    • Morphisms: Functions
    • Isomorphisms: Type for which there exist functions such that and
      • Note: From the compiler’s point of view, function equivalence is perhaps the more interesting/important thing to look at!
    • Objects: Typed lambda calculi
    • Morphisms: Translations that map types to types, terms to terms, and preserve equations ( conversions, reductions, etc)
    • Objects: Smooth manifolds ,where is a topological manifold (locally homeomorphic to ), and is a maximal smooth atlas on .
    • Morphisms: Smooth maps (where ) such that is continuous for all , and if is a chart on , then and is a chart on
    • Isomorphisms: Diffeomorphisms - morphisms with a smooth inverse .

    Notes: This is where differential geometry and a fair amount of topology take place, as well as certain branches of analysis, partial differential equations, and physics.

    • Objects: Measurable spaces
      • (Where the are -algebras over their respective sets, and the members of are denoted the measurable sets)
      • Note that these are measurable spaces, not measure spaces - this is a space to which a measure can be assigned. The triple $(X, \Sigma, \mu_X)$ would be a measure space.
    • Morphisms: Measurable functions such that

      (Where denotes the preimage or pullback of )

    • Isomorphisms: Measurable functions f with measurable inverses where

    Notes: This is where probability theory happens.

    Also, it turns out to actually be very tricky to formulate measure theory in a categorical way! If we try to look at the category of measure spaces, it turns out that adding the actual measure to a measurable space is in some sense “too strong” of a condition, and the resulting category lacks many useful properties.

    (In particular, it precludes the possibility of having a structure called the “categorical product”. Attempts to formalize measure theory and probability in categorical terms are a topic of relatively current research.)

    • Objects: Topological Spaces
    • Morphisms: Continuous functions $f: (X, \mathcal{T}_X) \rightarrow (Y, \mathcal{T}_Y)$ such that if is open in , then is open in .
      • Note that this is equivalent to
    • Isomorphisms: Homeomorphisms where has an inverse (as in ) where is also a continuous function.
    • Objects: Uniform Spaces
    • Morphisms: Uniformly continuous maps
    • Isomorphisms: Uniformic maps, i.e. uniformly continuous maps admitting a uniformly continuous inverse.
      • These can be thought of as homeomorphisms, along with an added condition of uniformity on the maps and their inverses.

    Notes: A uniform space is a topological space, equipped with some notion of “$\varepsilon$-closeness”. Things like metric spaces and topological groups fit this description, so most analysis technically happens in this category.

    • Objects: Metric spaces
      • is denoted the metric on .
    • Morphisms: Contractions $f: (X, d_X) \rightarrow (Y, d_Y)$ such that $\forall x, x' \in X$, we have $d_Y(f(x), f(x')) \leq d_X(x, x')$.
    • Isomorphisms: Isometries
      • These are just the bijective contractions whose inverses are also contractions, which forces $d_Y(f(x), f(x')) = d_X(x, x')$

    Notes: The distance function has to satisfy a few more axioms than in

    Starting here, there are actually many choices we could make for the morphisms - for example, we could have chosen uniformly continuous functions, Lipschitz functions, or a few others. Here I’ve just chosen one of the stronger conditions - Lipschitz functions with constant 1.

    Since every metric space is a topological space, the morphisms here need to extend the morphisms on . This is in fact the case in , since contractions on metric spaces end up being continuous.

    • Objects: Normed spaces
    • Morphisms: Continuous and linear maps
      • i.e.,
    • Isomorphisms: Continuous linear bijective maps with continuous linear inverses
  • (Complete normed spaces)
    • Objects: Banach spaces
    • Morphisms: Bounded linear maps such that $\| f \|_{\text{sup}}$ is finite.
      • If , these are usually referred to as bounded linear operators
    • Isomorphisms: Bounded linear bijective maps with bounded linear inverses.

      Notes: A Banach space is a vector space that is also a (complete) metric space, so the morphisms simply reflect that these two structures “play nicely” together. This is the case, since the following three are equivalent in Banach spaces:

      • Bounded linear maps
      • Continuous linear maps
      • Uniformly continuous linear maps

      This is where much of functional analysis happens.

  • (Complete inner product spaces)
    • Objects: Hilbert Spaces
    • Morphisms: Bounded linear maps such that $\| T \|_{\text{sup}}$ is finite.
    • Isomorphisms: Bounded linear maps with bounded linear inverses.

    Notes: It might seem a bit simplistic at first to characterize something like a Hilbert space as essentially an “enriched vector space”, but this turns out to be reflected in its categorical structure - the forgetful functor from to given by forgetting the inner product is faithful (roughly, the categorical analog of “injective on morphisms”)

    Similarly, one can think of Hilbert spaces like Banach spaces where the norm is induced by the inner product.

    • Objects: Small categories
    • Morphisms: Functors
      • Functors map:
        • objects to objects ,
        • morphisms to morphisms ,
          • Or to clean up notation a bit, morphisms that look like .
      • In words, this just sends the objects and arrows of one category to another, preserving the way arrows connect objects
    • Isomorphisms: Equivalences of categories - functors $F$ together with a functor $G$ going the other way such that $F \circ G \cong Id_D$ and $G \circ F \cong Id_C$ (where the $\cong$ are natural isomorphisms)
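As a final sketch, assuming the source is the category 2 (objects $A$, $B$ and a single non-identity arrow $f: A \rightarrow B$) and the target is the dict-based Set fragment used earlier, a functor just picks an image for each object and each arrow, and the functor laws must hold:

```python
# A toy functor F from the category 2 (objects A, B, one arrow f: A -> B)
# into a fragment of Set. F(A), F(B) are sets; F(f) is a dict.

def compose(g, f):
    return {x: g[f[x]] for x in f}

FA, FB = {1, 2}, {'x', 'y'}
Ff = {1: 'x', 2: 'x'}           # F applied to the arrow f

id_FA = {a: a for a in FA}      # F must send id_A to id_{F(A)}
id_FB = {b: b for b in FB}      # ... and id_B to id_{F(B)}

# Functor laws on the only composites available in the category 2:
# F(f o id_A) = F(f) o F(id_A) and F(id_B o f) = F(id_B) o F(f).
print(compose(Ff, id_FA) == Ff)   # True
print(compose(id_FB, Ff) == Ff)   # True
```

The laws look trivial here precisely because 2 has so few arrows - but they are the same laws that constrain functors between any pair of categories, small or large.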