It was early 2025.
Standard-issue code completion in GitHub Copilot was firmly established. ChatGPT was replacing Google Search for code reference lookups—and even making pretty useful code block suggestions—but we mostly chuckled at AI-assisted coding because its output was either too simplistic, missing obvious code context, or laughably overengineered.
All this changed in the Spring of 2025, when significant upgrades to the foundation models made AI coding tools like Cursor, GitHub Copilot Agent, Windsurf, Claude Code, and Cline much more viable. Our criticisms no longer seemed to hold, and if we did not get with the program, the joke was going to be on us.
At Bitly Engineering, we realized that we either had to dive in headfirst or we’d be left behind.
In May 2025, we launched a Bitly Engineering learning initiative. The goal: to upskill our 60 engineers from GitHub Copilot autocomplete to full AI Agent-assisted coding practices as quickly as we could.
We knew we couldn’t stop the world and send everybody to training, and in fact, formal training didn’t even exist yet, so we decided to make our own.
Our approach was to develop an “AI in Engineering” program with a mix of structure, informal education, experimentation, assessments, and the assumption that we would iterate as we went. We would need to pave the road while driving on it.
This is our story—what worked, what didn’t, and what we’ve been learning along the way.
Selecting AI tools
Our AI tool selection criteria were straightforward: get state-of-the-art tools into engineers’ hands. With the best tools, we could explore capabilities, learn as a group, and iterate rapidly. We selected tools that balanced maturity and usability and had the least potential impact on our development lifecycle. The shortlist included:
- Cursor (Business tier with privacy mode enforced)
- ChatGPT (Team plan for general-purpose AI chat)
Privacy was non-negotiable. We held firm to our privacy principles: SOC 2 compliance was a hard requirement, we enforced privacy mode in Cursor to ensure zero data retention, and we verified that prompt data wouldn’t be used for model training.
The launch
We kicked off in May 2025 and laid out the following goals:
- Develop AI fluency across the team—everyone should know how to leverage AI coding tools in their workflows
- Share experiences and document best practices
- Continuously evaluate impact
- Create a culture of learning and experimentation
An early strength: “Over the Shoulder” sessions
One of our most effective tactics was establishing “Over the Shoulder” (OTS) sessions—working sessions where engineers shared their screens and demonstrated real-world, real-time AI coding workflows.
We asked volunteers already using AI agents to share their work, from tool setup, prompts, and rules to how they refined results when things went wrong. For the first two weeks, we ran daily sessions across global and regional time zones. As adoption grew, we shifted to weekly and then monthly sessions.
These were authentic working sessions on real projects like feature builds, refactors, tests, and debugging, not polished demos. We tried to encourage a sense of openness and vulnerability in these sessions by having more senior engineers who were also new to the AI coding tools run the early sessions. That helped to create a safe space where people could wrestle with uncertainty in public. It reinforced an important truth: there are no experts here yet, only learners.
What we learned: The good
The tools that won:
- Cursor became popular for its seamless IDE integration, full codebase context, and chat functionality directly within the development environment.
- ChatGPT became the go-to for architectural discussions, documentation, technical writing, and as an alternative when other tools got “stuck.”
- GitHub Copilot proved useful for PR reviews, catching minor issues human reviewers might miss. As the program evolved, Copilot added Agent Tasks and more advanced models, demonstrating its continued power and relevance.
- Claude Code emerged later as a top-tier option for those who adopted it, excelling at reasoning through complex problems.
We also experimented with ChatGPT Codex and were enthusiastic about its potential, but the lack of custom images made it unusable for our needs. OpenAI, we’re eagerly waiting…
Where AI excelled:
- Repetitive tasks and find-and-replace operations across large codebases
- Generating test coverage for existing code
- Working with unfamiliar areas of the codebase
- Documentation and technical writing
- Brainstorming and architectural discussions
- Backend unit tests following established patterns
One engineer managed to refactor 275 files in a single hackweek project. Another two used AI to build the draft of an entire internal admin dashboard in a week. These weren’t minor productivity gains. They were proving transformative in some cases.
That said, the experience also highlighted some challenges.
What we learned: The bad
Not everything went smoothly, but the obstacles taught us as much as the successes.
Quality issues surfaced early. AI could generate code fast, but much of it required heavy refactoring to meet our standards. Engineers often spent as much time reviewing and fixing as they would have writing the code themselves.
Human review remained essential. AI-generated code should not be submitted for review without the human “author” reviewing it first.
Context limits were painful. Tools frequently lost track of long conversations, forcing restarts or repeated explanations, which was a real barrier for complex changes.
Frontend work was tougher. AI tended to over-engineer solutions, repeat code, and miss UI standards, while backend tasks fared better.
What we learned: The uncertain
Beyond the technical issues, we encountered cultural friction.
Some people felt pressure to use AI and worried about being replaced. Others interpreted the program as “forced” adoption rather than exploration. Some engineers worried about losing touch with their craft, especially juniors still building core skills.
For non-developers, particularly QA engineers and analysts, guidance was unclear. How should a QA engineer use AI coding tools? What about data analysts? Knowledge sharing was harder than expected. Even today, the flow of prompts and best practices still isn’t where we want it.
But we found a metaphor that helped: Think of AI as a capable engineer early in their career. Fast, sometimes wrong, always needs review. That stuck. It set the right expectations and shifted the conversation from “replacement” to “collaboration.”
The challenge of measuring success
Measuring the impact of AI Coding Tools is hard. Is it based on “before and after” comparisons of lines of code written? Number of PRs? Number of commits? Team velocity?
We know there is an impact, but we’re not sure how much. Saying “30% of our code is now written by AI” would give a false sense of certainty. We took a mixed approach to evaluation, combining subjective and objective assessments.
Subjectively, we surveyed engineers about accuracy, speed, confidence in unfamiliar code, cognitive load, and overall satisfaction. We asked about AI sentiment before and after the program.
Objectively, we’re tracking the metrics mentioned above before and after AI adoption, though we acknowledge the numbers are directional rather than definitive.
Measuring productivity gains is messy—too many variables, not enough consistent data. For now, directional trends and team sentiment are our best indicators.
Six months in…
The Tools Race
Six months in, there’s still no single winner in the tools race. Different tools shine for different tasks, and many engineers use more than one tool. I personally prefer Claude Code for heavy coding and Cursor when I need it to evaluate code and provide me with nicely formatted output.
New tools have emerged, too: Codex launched, GitHub Copilot Agent Tasks arrived, and the landscape continues to shift beneath our feet. To this day, we remain hesitant to lock into annual contracts for AI coding tools for fear that a different one will emerge as the clear winner.
Rule Sets
We’re working on Rule Sets, which are configuration files that teach the AI about our coding standards, architecture patterns, and common pitfalls. We’re developing rule sets for both in-house coding standards as well as for specific tasks, like evaluating Jira tickets or doing the equivalent of “manual QA” based on code changes.
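To make this concrete, here is a minimal sketch of what one of these rule files might look like, assuming Cursor’s Markdown-based project-rules format (a file under .cursor/rules/). The standards and file globs shown are illustrative placeholders, including the assumption of a Go backend, and are not our actual rule set.

```markdown
---
# Hypothetical rule file: .cursor/rules/backend-standards.mdc
# The description and globs tell the agent when to pull this rule into context.
description: Backend service coding standards
globs: ["**/*.go"]
alwaysApply: false
---

# Backend coding standards (illustrative placeholder, not our real rules)

- Wrap errors with context (e.g., `fmt.Errorf("fetching link %s: %w", id, err)`); never discard them.
- New endpoints follow the existing handler -> service -> repository layering; do not query the database directly from handlers.
- Every new exported function needs a table-driven unit test next to it.
- Prefer the standard library; propose any new third-party dependency in the PR description rather than adding it silently.
```

Task-specific rule sets, like the Jira-ticket evaluation or “manual QA” rules mentioned above, follow the same shape, just with different instructions and scoping.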
Setting expectations
We’re also being more explicit about expectations. While AI was encouraged but not mandated at the beginning, we’re evolving as an organization to require the use of AI as a skill for all engineers—and all employees.
Continuous adaptation
By August 2025, AI tools had subtly shifted from being a “special project” to simply being part of how we work, mainly because of the immediate value people saw once they ramped up on them.
A good indicator was that when someone ran out of Premium Requests on a coding tool, they suddenly felt like they couldn’t do their job as effectively. Of course, we haven’t achieved universal adoption, but there is now close to 95% utilization of AI coding tools by Bitly engineers on a regular basis.
We stopped treating AI coding tools as experimental and started baking them into everyday engineering workflows: planning, coding, refactoring, introspection, documentation, and code review. The goal now isn’t to “finish the rollout” of AI coding tools; it’s to keep evolving with them. The task now is continuous adaptation.
What’s next
We chose the “learn as you go” approach, and in that vein, we are still learning.
We have cost considerations to contend with. We’re a relatively small engineering team, and paying for three tools with material overlap isn’t a good long-term approach. We anticipate paring down soon.
Next, we’re focusing on deepening skills and institutionalizing what we’ve learned. That means developing formal training for those who want more hands-on guidance, refining rule sets to codify standards, and improving how we measure impact.
That also involves pushing the envelope on productivity gains:
- How do we best manage multiple agent sessions at once?
- How do we balance AI coding agent output with reviewability?
- How can we combine multiple agents to orchestrate parallel tasks?
- How much autonomy can we give our AI coding agents?
I used to scoff at the output of AI coding tools. Now I’m usually astounded (and only sometimes scoff).
For better or worse, this is the new reality of software engineering, and the tools will only get more capable from here. To me, the question isn’t whether to adopt AI coding tools—it’s how to continue to evolve with them and maximize their value.
