As developers, we continually look for ways to enhance our development experience. Since the introduction of ChatGPT and GitHub Copilot, many developers report that they are more efficient and write better code. Sounds good, doesn’t it? In reality, there are also disadvantages and risks, such as the unclear legal aspects of using these tools: they are cloud services that send your data to external servers. Today we explore Ollama as a potential local AI alternative to GitHub Copilot.
Overview
In this post, we will be exploring Ollama, a powerful local AI alternative to cloud-based solutions like GitHub Copilot or ChatGPT. We will walk through the steps to set up Ollama on macOS, delve into the different AI models it supports, and demonstrate how to integrate it with Visual Studio Code for enhanced code completion and suggestions.
Key sections:
- Setting up Ollama on macOS: You learn how to install Ollama using two different methods – the macOS installer and Homebrew.
- Exploring Ollama and the models we can use with it: Learn about the various AI models available, including phi3 and codegemma. While it’s not specifically for coding, we still explore how it can assist with coding tasks.
- Integrating Ollama with Visual Studio Code for code completion: Step-by-step guidance on configuring Ollama to work with Visual Studio Code using the CodeGPT extension, including setting up autocomplete for a smoother coding experience.
By the end of this post, you’ll have a comprehensive understanding of how to set up and utilise Ollama to enhance your development workflow while maintaining data privacy and security.
1. What is Ollama?
Ollama is a powerful and versatile software designed to offer a local AI alternative to cloud-based solutions like GitHub Copilot or ChatGPT. This software provides anyone with the ability to leverage artificial intelligence for asking questions, code completion, suggestions, and other development tasks, all while keeping data secure and local.
1.1 Simple and local alternative
What’s cool here is that Ollama bundles models, configuration and data into a single package. It basically optimises your setup and configuration details for you, including the GPU configuration. This eliminates the need to build your own configuration, which makes it user-friendly and easy to set up. But the best thing is that it’s completely local and free to use.
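To make that bundling idea concrete, here is a minimal sketch of a custom Modelfile, the packaging format Ollama uses for a model and its settings. The name my-reviewer, the temperature value and the system prompt are placeholders invented for illustration; only phi3 and the ollama create and ollama run commands come from Ollama itself.
# Write a simple Modelfile that bundles a base model with custom settings.
# The name "my-reviewer" and the values below are placeholders for illustration.
cat > Modelfile <<'EOF'
FROM phi3
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that reviews code."
EOF
# Build the packaged model and start chatting with it.
ollama create my-reviewer -f Modelfile
ollama run my-reviewer
You won’t need a custom Modelfile for the rest of this post; the stock models work out of the box.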
1.2 Key features of Ollama
- Local AI processing: Ensures all data remains on your local machine, providing enhanced security and privacy.
- Integration with development tools: Seamlessly integrates with popular development environments such as Visual Studio Code.
- Support for robust AI models: Offers access to high-quality models like phi3 or codegemma that can assist in various coding tasks. Since Ollama’s introduction, many more models have become available on their website.
- Interactive shell: Run Ollama as a shell (terminal window) to interact with models. You will be able to chat with it and simulate a conversation.
- REST API: It has a built-in REST API which you can run as a service and send requests to.
- User-friendly interface: Coupled with Open WebUI, it provides an intuitive frontend that simplifies interaction with the AI just like ChatGPT.
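If you want to try the Open WebUI frontend mentioned above, it is typically run as a Docker container that talks to your local Ollama instance. The sketch below roughly follows the Open WebUI quick-start command at the time of writing; the image name, port and flags may change, so verify them against the Open WebUI documentation before running it.
# Run Open WebUI in Docker and let it reach the Ollama API on the host machine.
# Verify the image name and flags against the current Open WebUI documentation.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
# The web interface should then be reachable at http://localhost:3000.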
2. Performance required!
Running local AI models with Ollama demands considerable computational resources, particularly when it comes to hardware components like GPUs (Graphics Processing Units) and NPUs (Neural Processing Units). These components, paired with sufficient dedicated (V)RAM (Video RAM), are crucial to ensure smooth functioning and efficient processing of AI tasks.
Graphics Processing Unit (GPU):
- GPUs are designed to handle parallel processing tasks, making them highly efficient at performing complex calculations required by AI models.
- They excel in accelerating deep learning workloads, such as neural network training and inference, which are fundamental processes for AI model operations.
- A high-performance GPU can significantly reduce the time required for model training and inferencing, improving the responsiveness and usability of AI tools like Ollama.
Neural Processing Unit (NPU):
You might have heard something about Copilot+ PCs, designed specifically for the use of AI. This type of branding revolves around the use of a Neural Processing Unit (NPU).
- NPUs are specialized processors tailored specifically for neural network computations. They provide dedicated hardware for AI workloads, often outperforming general-purpose GPUs in specific tasks.
- They are optimized for executing AI operations, offering better performance per watt, which translates to more efficient processing and lower power consumption.
- Using an NPU can enhance the real-time capabilities of AI applications, making tasks such as code completion and suggestions faster and more accurate.
2.1 Importance of dedicated (V)RAM
Dedicated (V)RAM plays a pivotal role in running local AI models:
- Memory Bandwidth: (V)RAM provides the necessary memory bandwidth to handle large datasets and model parameters efficiently. AI models, especially large ones, require substantial memory to store various elements during processing.
- Performance Stability: Having sufficient (V)RAM ensures that the system can manage and process data without bottlenecks, leading to more stable and predictable performance.
- Handling Large Models: Higher (V)RAM capacity allows the system to load and run larger models, which are typically more accurate and capable of handling complex tasks.
In conclusion, leveraging powerful GPUs and NPUs, coupled with sufficient dedicated (V)RAM, is essential for running local AI models efficiently with Ollama. These components ensure that the system can handle complex computations, maintain performance stability, and provide an enhanced user experience while processing AI tasks locally. Keep this in mind when you start exploring Ollama.
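As a rough guideline, the Ollama README at the time of writing recommends at least 8GB of RAM for 7-billion-parameter models, 16GB for 13B models and 32GB for 33B models. A quick way to gauge how heavy your local models are is to check their size on disk, which is a reasonable lower bound for the memory they need at runtime:
# List all pulled models together with their size on disk.
ollama list
# Show details for a specific model, such as parameter count and quantisation.
# On older Ollama versions this command requires a flag such as --modelfile.
ollama show phi3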
3. Install Ollama
For this post, I will be using my MacBook Pro M1 (2020) with 16GB of RAM. During testing, this machine performed well when running local models. Ollama is also compatible with Windows, Linux, and Docker; you can find detailed instructions for all operating systems on their GitHub page.
Note: If you are using something other than macOS, please skip ahead to ‘4. Getting started with Ollama’ to continue with this tutorial.
3.1 Installing Ollama using the macOS installer
- Download the Installer: Visit the official Ollama website to download the installer.
- Run the Installer: Once downloaded, locate the .dmg file in your Downloads folder and double-click it to open.
- Install Ollama: Drag the Ollama application icon to your Applications folder. When prompted, enter your macOS administrative password to complete the installation.
- Launch Ollama: Navigate to the Applications folder and double-click on the Ollama app to launch it.
3.2 Installing Ollama using Homebrew
- Install Homebrew: If you haven’t already installed Homebrew, open the Terminal and enter the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Ollama using Homebrew: Open the Terminal and run the following command:
brew install ollama
- Verify the installation: To ensure Ollama was installed correctly, you can check its version by running:
ollama --version
By following either of these methods, you will have Ollama installed and be ready to start downloading some local AI models.
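One note for the Homebrew route: the macOS app starts the Ollama server for you in the background, but with a command-line-only install you may need to start the server yourself before the commands in the next sections will work. A minimal way to do that:
# Start the Ollama server in the foreground (press Ctrl+C to stop it).
ollama serve
# Alternatively, assuming the Homebrew formula ships a service definition,
# run it as a background service that starts automatically.
brew services start ollama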
4. Getting started with Ollama
After installation, you interact with Ollama using the ollama command. Open up a terminal to explore some of the commands below:
# List all images pulled by Ollama.
ollama list
# Run an image locally (phi3 by Microsoft).
ollama run phi3
# Show available commands for Ollama.
ollama --help
# Get Ollama version.
ollama --version
For demonstration purposes, let’s download and run phi3, a lightweight model created by Microsoft. Since it is relatively small, with a focus on high quality and reasoning-dense output, it is a good model to demonstrate Ollama. Start by simply running this command:
ollama run phi3
When you use Ollama for the first time, the model doesn’t exist on your computer yet; it will be downloaded first and then run. Once that finishes, you will see a prompt. Enter a question straight away to find out what phi3 thinks.
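Inside the interactive prompt you can also use a few slash commands: /? lists them and /bye ends the session. A session looks roughly like this (the question and the answers are just an example):
>>> Who is the current CEO of Microsoft?
(the model streams its answer here)

>>> /?
(lists the available commands, such as /show, /set and /bye)

>>> /bye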
Cool! You are running your own local AI model without sending a single byte to the internet. Unlike GitHub Copilot, you can use Ollama completely offline.
5. Use the built-in REST API
Ollama comes with a built-in REST API to which you can send requests. See the example below, where we use curl to send the request, but other tools such as Postman work fine too.
curl http://localhost:11434/api/generate -d '{
"model": "phi3",
"prompt": "Who is the current CEO of Microsoft?",
"stream": false
}'
The following response is given:
{
"model":"phi3",
"created_at":"2024-06-02T10:52:16.725475Z",
"response":" As of my knowledge cutoff date in early 2023, the current Chief Executive Officer (CEO) of Microsoft is Satya Nadella. He has been leading Microsoft since February 4, 2014, succeeding Steve Ballmer and following Bill Gates' transition out of day-to-day operations at the company.",
"done":true,
"done_reason":"stop",
"context":[32010,11644,338,278,1857,14645,29949,310,7783,29973,32007,32001,1094,310,590,7134,5700,2696,2635,297,4688,29871,29906,29900,29906,29941,29892,278,1857,14546,28841,28288,313,4741,29949,29897,310,7783,338,12178,3761,18496,3547,29889,940,756,1063,8236,7783,1951,6339,29871,29946,29892,29871,29906,29900,29896,29946,29892,9269,292,13981,13402,1050,322,1494,6682,402,1078,29915,9558,714,310,2462,29899,517,29899,3250,6931,472,278,5001,29889,32007],
"total_duration":3926158792,
"load_duration":3453625,
"prompt_eval_count":11,
"prompt_eval_duration":525719000,
"eval_count":73,
"eval_duration":3394643000
}
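Besides /api/generate, Ollama also exposes a chat-style endpoint that accepts a list of messages, which is convenient when you want to keep conversation history across requests. A minimal example with the same phi3 model:
curl http://localhost:11434/api/chat -d '{
  "model": "phi3",
  "messages": [
    { "role": "user", "content": "Explain in one sentence what a REST API is." }
  ],
  "stream": false
}'
The response then contains a message object with the assistant’s reply instead of a plain response string; see the Ollama API documentation for the full schema.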
6. Configure Ollama as Copilot in Visual Studio Code
Once you have your model up and running, let’s set it up with Visual Studio Code. We use the CodeGPT extension to connect Ollama to it.
- Install extension: Open Visual Studio Code, search for CodeGPT and install it from the marketplace.
- Open CodeGPT: Once installed, select the CodeGPT icon in the menu bar.
- Setup the provider: From the CodeGPT icon in the menu bar, select Ollama as the provider.
- Select model: Now search for the model you want to use. For this tutorial, we use codegemma.
- Ask your question: Now, in the chat dialogue, ask your code question and wait for the model to do its work. Since we use our own local models, there are no restrictions on using CodeGPT (yet).
Note: The extension isn’t aware of which models are available on your local machine. Make sure the model is downloaded within Ollama first, because the extension will not do this for you.
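Downloading the model up front is a single command:
# Pull the model so that CodeGPT can find it locally.
ollama pull codegemma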
6.1 Configure autocomplete
With autocomplete, the model provides suggestions directly in your code file. This feature significantly enhances productivity by reducing the amount of manual typing required. It also minimises errors by offering accurate code snippets and suggestions based on context.
- Go to Settings: Open the CodeGPT chat window once again and navigate to the Menu.
- Enable Autocomplete: Select Autocomplete and choose codegemma:code as the AI model.
- Set Status: Make sure to set the status of Autocomplete to Enabled.
- Preferences: Set your preferences, such as the max tokens and the delay for autocomplete suggestions.
Now, when writing code, suggestions pop up automatically. You can either use or discard these suggestions, similar to the functionality provided by GitHub Copilot.
7. Conclusion
Today we explored Ollama and saw how this powerful local AI alternative to GitHub Copilot can enhance your development experience. Unlike cloud-based solutions, Ollama ensures that all data remains on your local machine, providing heightened security and privacy. By seamlessly integrating with development tools such as Visual Studio Code, and by supporting high-quality models like phi3 and codegemma, Ollama offers a reliable and robust solution for developers.
7.1 GitHub Copilot vs. Ollama
- Data privacy: While GitHub Copilot relies on cloud services which may raise data privacy concerns, Ollama processes everything locally, ensuring that no data is sent to external servers.
- Cost: GitHub Copilot requires a subscription fee, whereas Ollama is completely free to use.
- Internet connectivity: GitHub Copilot necessitates an internet connection to function, while Ollama operates entirely offline, making it a more reliable option in environments with limited connectivity.
- Customisation: Since Ollama is open-source, it can be customised and optimised for specific hardware and use cases, whereas customisation options for GitHub Copilot are limited.
7.2 Performance and battery life concerns
Running AI models locally, as with Ollama, does require substantial processing power and can have a noticeable impact on performance and battery life. On my MacBook Pro M1 (2020) with 16GB of RAM, the performance was satisfactory, but developers with older or less powerful hardware may experience slower response times and increased battery drain. It is essential to balance the benefits of local processing with these potential drawbacks.
7.3 Use a model with fewer parameters
When performance is slow, you should consider using a version of the codegemma model with fewer parameters (2 billion instead of the default 7 billion). Models with fewer parameters require less computational power, thus improving response times and reducing the strain on your machine’s resources.
However, the downside of using a model with fewer parameters is that it may generate less accurate and less nuanced code suggestions, as it lacks the depth and complexity of larger models. Consequently, while enhancing performance, you might find that the suggestions are not as precise or as helpful in handling more intricate coding tasks. To download codegemma (2B), run:
ollama run codegemma:2b
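Once the download finishes, you can compare both variants side by side; the size column gives a rough feel for how much lighter the 2B model is:
# Both codegemma variants now show up with their size on disk.
ollama list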
8. Final thoughts
Ollama stands out as a compelling alternative to GitHub Copilot, especially for those who prioritize privacy, local control, and cost-effectiveness. By keeping your data secure and offline, and by providing a free and open-source solution, Ollama aligns with the needs of developers who seek both efficiency and autonomy in their workflow. While performance and battery life considerations should be taken into account, the advantages of using Ollama make it a worthy contender in the realm of AI-assisted development tools.