The GPU Backstory: How a Bedroom in Toronto Created the Modern Era of AI Compute

By Pete Paisley
The most consequential hands-on AI engineering work of the modern era was done by a PhD student in his bedroom at his parents’ house in the Toronto area, on two NVIDIA GTX 580s. There, three fundamental elements of modern AI converged for the first time, and AlexNet opened the modern era of AI compute.
The three people behind that result were Geoffrey Hinton, one of the world’s foremost authorities on neural networks with thirty-five years of foundational research behind him, and two of his graduate students — Alex Krizhevsky and Ilya Sutskever (who would go on to co-found OpenAI) — neither of whom had yet completed their doctorate. Hinton provided the intellectual framework built over decades, Sutskever provided the conviction that the approach would work, and Krizhevsky wrote the code and ran the hardware. As Hinton later put it: “Ilya thought we should do it, Alex made it work, and I got the Nobel Prize.”
Those three converging elements look deceptively simple in hindsight: a massive labeled dataset in ImageNet, a GPU platform made programmable by NVIDIA’s CUDA, and a deep neural network architecture with the capacity to exploit both. Alex Krizhevsky didn’t hand-code rules for recognizing images. He built a system, fed it 1.2 million labeled photographs, and let it teach itself. Over five to six days of continuous training, spread across two gaming graphics cards with 3GB of memory each, the network learned to see. On September 30, 2012, it was entered in the ImageNet Large Scale Visual Recognition Challenge and won by a margin so large (a top-5 error rate more than 10.8 percentage points below the nearest competitor’s) that the entire computer vision research community was forced to confront a single uncomfortable conclusion: everything they had been doing was wrong.
The Technical Convergence: Architecture and the GTX 580
To understand why AlexNet hit so hard, you need to understand the plumbing underneath it. The architecture Krizhevsky used was a convolutional neural network, or CNN, a design with roots stretching back to Yann LeCun’s work in the late 1980s. Where a fully connected network wires every pixel to every unit in the next layer, a CNN works loosely the way the visual system does: scanning small regions, detecting local patterns, building from simple features to complex ones layer by layer. Early layers learn to detect edges and corners. Middle layers learn shapes and textures. Later layers learn objects and concepts. Nobody programs these stages explicitly; the network discovers them through training. Because the same small filter is reused at every position in the image, this parameter-sharing design needs far fewer weights than a fully connected layer, and it builds in something those layers lack entirely: spatial awareness, the understanding that a feature detected in one corner of an image is the same feature wherever it appears. What made AlexNet specifically decisive was scale (five convolutional layers, three fully connected layers, 60 million parameters) running on GPU hardware fast enough to train the whole system in days rather than months. The CNN was the architecture. The GPU was what finally let it breathe.
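The parameter-sharing idea described above can be illustrated with a small sketch. It is not Krizhevsky's code: the helper names, the 3x3 "corner" pattern, and the 8x8 test images are all hypothetical, and the layer sizes in the parameter count (a 224x224x3 input, 96 filters of 11x11x3, a 55x55x96 output) are assumptions taken from the figures usually quoted for AlexNet's first layer, not from this article. The sketch slides one shared filter over an image, shows that the filter responds most strongly wherever the pattern sits, and compares the weight count with that of a fully connected layer of the same shape.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def place_patch(size, patch, top, left):
    """Return a size x size image of zeros with `patch` pasted at (top, left)."""
    img = [[0] * size for _ in range(size)]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            img[top + i][left + j] = value
    return img


def argmax2d(grid):
    """(row, col) of the first occurrence of the largest value in a 2D list."""
    best = max(max(row) for row in grid)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == best:
                return (r, c)


def conv_params(kh, kw, in_ch, out_ch):
    """Weights plus one bias per filter; independent of image size."""
    return (kh * kw * in_ch + 1) * out_ch


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit in a fully connected layer."""
    return (n_in + 1) * n_out


# Hypothetical pattern, also used as the filter that detects it.
CORNER = [[1, 1, 1],
          [1, 0, 0],
          [1, 0, 0]]

# The same nine weights find the pattern at two different positions.
for top, left in [(1, 2), (4, 0)]:
    response = conv2d_valid(place_patch(8, CORNER, top, left), CORNER)
    print("pattern at", (top, left), "-> strongest response at", argmax2d(response))

# Assumed first-layer sizes: 96 filters of 11x11x3 versus a dense layer
# mapping 224*224*3 inputs to 55*55*96 outputs.
print("conv parameters: ", conv_params(11, 11, 3, 96))
print("dense parameters:", dense_params(224 * 224 * 3, 55 * 55 * 96))
```

The convolutional count depends only on the filter size and the number of filters, which is why the layer stays small no matter how large the image is, while the dense count grows with the product of input and output sizes.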
Capital Signals and the Shift to Data Center Compute
The industry didn’t take long to understand what had happened. Within weeks of the ImageNet result, Hinton had consulted a lawyer about how to maximize the value of a company with three employees, no products, and no revenue. The answer was an auction. Four bidders submitted offers for DNNresearch Inc.: Google, Microsoft, Baidu, and a young London-based startup called DeepMind. One evening in late 2012, as the bidding approached $44 million near midnight, Hinton suspended the auction and went to sleep. The next morning he sold to Google. Within months of a PhD student training a neural network in his bedroom, the technology industry had put a price on what it meant. What followed was a rapid reorganization of research priorities across every major lab. GPU clusters began appearing in data centers. NVIDIA, which had built its business selling graphics cards to gamers, watched its data center revenue begin a climb that would not stop for over a decade. The GTX 580 had proved the concept. The question the industry now faced was what to build next.
The Transformer Era: From Vision to Massive Scaling
The answer to that question arrived in 2017, and it expanded the problem entirely. A team of Google researchers published a paper titled “Attention Is All You Need,” introducing the transformer architecture, a design that shifted AI’s center of gravity from images to sequences. Language. Audio. Code. Anything ordered in time. Where CNNs had given machines the ability to see, transformers gave them the ability to read, reason, and generate. The implications for compute were immediate and profound. Training a convolutional network on ImageNet required days on two consumer GPUs. Training a transformer-based large language model required weeks on thousands of specialized ones. The compute demands of the transformer era didn’t just grow; they grew with both model size and the amount of training data, so every step up in scale multiplied the bill. The research community had already internalized the conviction Ilya Sutskever had brought to the AlexNet project: scale is the mechanism. More data, more compute, deeper networks, better results. Each major jump in capability unlocked by the transformer architecture seemed to demand roughly another order of magnitude of compute. The GPU era had begun in 2012. By 2017 it had grown into something no one training on a GTX 580 could have imagined.
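To make the scaling arithmetic concrete, here is a back-of-the-envelope sketch. It relies on a widely used rule of thumb that the article itself does not state: training a dense transformer costs very roughly 6 floating-point operations per parameter per training token. The function names, the model and dataset sizes, and the GPU throughput and utilization figures are all hypothetical placeholders chosen for illustration, not measurements of any real system.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params, n_tokens):
    """Rough training cost, ASSUMING about 6 FLOPs per parameter per token.

    This is a rule-of-thumb approximation, not an exact law.
    """
    return 6 * n_params * n_tokens


def training_days(total_flops, peak_flops_per_gpu, n_gpus, utilization):
    """Wall-clock days needed, given hypothetical hardware figures.

    `utilization` is the fraction of peak throughput actually achieved.
    """
    sustained = peak_flops_per_gpu * n_gpus * utilization
    return total_flops / sustained / SECONDS_PER_DAY


# Hypothetical small model: 1 billion parameters, 20 billion tokens.
small = training_flops(10**9, 2 * 10**10)

# Scale parameters and data by 10x each: compute grows 100x, not 10x.
large = training_flops(10**10, 2 * 10**11)
print("compute ratio after 10x params and 10x data:", large // small)

# Hypothetical accelerator: 1e14 FLOP/s peak at 40% utilization.
for n_gpus in (2, 1_000):
    days = training_days(large, 10**14, n_gpus, 0.4)
    print(f"{n_gpus} GPUs -> about {days:.1f} days")
```

Under these assumptions, growing the model and the dataset together multiplies the cost, which is one way to read the jump from a pair of consumer cards to clusters of thousands of accelerators.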
Semiconductor Response and the Expansion of AI Infrastructure
What the transformer made inevitable, the semiconductor industry built. Google had already deployed its first Tensor Processing Unit internally in 2015, purpose-built for the matrix multiplication operations that neural networks demand, a direct institutional response to the compute trajectory AlexNet had set in motion. As transformer-scale training became the dominant workload, other hyperscalers followed. AWS built Trainium and Inferentia. Device makers moved as well: Apple has shipped a Neural Engine in its iPhone chips since 2017, and Qualcomm built AI acceleration into mobile processors reaching billions of devices. NVIDIA, meanwhile, had consciously transformed itself, introducing Tensor Cores in its Volta architecture specifically to accelerate the matrix operations at the heart of deep learning, then evolving from the V100 through the A100 to the H100 and beyond. A company that took in roughly $4 billion in annual revenue in 2012, largely from gaming and graphics, was generating over $90 billion in data center revenue by 2024. That entire arc, from a $500 graphics card in a Toronto-area bedroom to the most valuable semiconductor company in history, traces to a training run of less than a week on two GTX 580s. The hyperscaler AI infrastructure that now consumes billions of dollars in capital expenditure quarterly, the GPU clusters that power every large language model in production today, the purpose-built silicon racing to meet demand that shows no sign of slowing: all of it is downstream of three elements converging for the first time in a bedroom in the Toronto area in 2012. AlexNet didn’t just win a competition. It defined what computing would become.
