As developers, we continually look for ways to enhance our development experience. Since the introduction of ChatGPT and GitHub Copilot, many developers report that they are more efficient and write better code. Sounds good, doesn’t it? In reality, there are also disadvantages and risks, such as the unclear legal aspects of using these tools: they are cloud services that send your data to external servers. Today we explore Ollama as a potential local AI alternative to GitHub Copilot.
Overview
In this post, we will be exploring Ollama, a powerful local AI alternative to cloud-based solutions like GitHub Copilot or ChatGPT. We will walk through the steps to set up Ollama on macOS, delve into the different AI models it supports, and demonstrate how to integrate it with Visual Studio Code for enhanced code completion and suggestions.
Key sections:
- Setting up Ollama on macOS: You learn how to install Ollama using two different methods – the macOS installer and Homebrew.
- Exploring Ollama and the models we can use with it: Learn about the various AI models available, including phi3 and codegemma. While it’s not specifically for coding, we still explore how it can assist with coding tasks.
- Integrating Ollama with Visual Studio Code for code completion: Step-by-step guidance on configuring Ollama to work with Visual Studio Code using the CodeGPT extension, including setting up autocomplete for a smoother coding experience.
By the end of this post, you’ll have a comprehensive understanding of how to set up and utilise Ollama to enhance your development workflow while maintaining data privacy and security.
1. What is Ollama?
Ollama is a powerful and versatile software designed to offer a local AI alternative to cloud-based solutions like GitHub Copilot or ChatGPT. This software provides anyone with the ability to leverage artificial intelligence for asking questions, code completion, suggestions, and other development tasks, all while keeping data secure and local.
1.1 Simple and local alternative
What’s cool here is that Ollama bundles models, configuration and data into a single package. It basically optimises your setup and configuration details for you, including the GPU configuration. This eliminates the need to build your own configuration, which makes it user-friendly and easy to set up. But the best thing is that it’s completely local and free to use.
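To make that bundling idea concrete, here is a minimal sketch of a custom Modelfile, the packaging format Ollama uses for a model and its settings. The name my-reviewer, the temperature value and the system prompt are placeholders invented for illustration; only phi3 and the ollama create and ollama run commands come from Ollama itself.
# Write a simple Modelfile that bundles a base model with custom settings.
# The name "my-reviewer" and the values below are placeholders for illustration.
cat > Modelfile <<'EOF'
FROM phi3
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that reviews code."
EOF
# Build the packaged model and start chatting with it.
ollama create my-reviewer -f Modelfile
ollama run my-reviewer
You won’t need a custom Modelfile for the rest of this post; the stock models work out of the box.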
1.2 Key features of Ollama
- Local AI processing: Ensures all data remains on your local machine, providing enhanced security and privacy.
- Integration with development tools: Seamlessly integrates with popular development environments such as Visual Studio Code.
- Support for robust AI models: Offers access to high-quality models like phi3 or codegemma that can assist in various coding tasks. Since Ollama’s introduction, many more models have become available on their website.
- Interactive shell: Run Ollama as a shell (terminal window) to interact with models. You will be able to chat with it and simulate a conversation.
- REST API: It has a built-in REST API which you can run as a service and send requests to.
- User-friendly interface: Coupled with Open WebUI, it provides an intuitive frontend that simplifies interaction with the AI just like ChatGPT.
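If you want to try the Open WebUI frontend mentioned above, it is typically run as a Docker container that talks to your local Ollama instance. The sketch below roughly follows the Open WebUI quick-start command at the time of writing; the image name, port and flags may change, so verify them against the Open WebUI documentation before running it.
# Run Open WebUI in Docker and let it reach the Ollama API on the host machine.
# Verify the image name and flags against the current Open WebUI documentation.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
# The web interface should then be reachable at http://localhost:3000.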
2. Performance required!
Running local AI models with Ollama demands considerable computational resources, particularly when it comes to hardware components like GPUs (Graphics Processing Units) and NPUs (Neural Processing Units). These components, paired with sufficient dedicated (V)RAM (Video RAM), are crucial to ensure smooth functioning and efficient processing of AI tasks.
Graphics Processing Unit (GPU):
- GPUs are designed to handle parallel processing tasks, making them highly efficient at performing complex calculations required by AI models.
- They excel in accelerating deep learning workloads, such as neural network training and inference, which are fundamental processes for AI model operations.
- A high-performance GPU can significantly reduce the time required for model training and inferencing, improving the responsiveness and usability of AI tools like Ollama.
Neural Processing Unit (NPU):
You might have heard something about Copilot+ PCs, designed specifically for the use of AI. This type of branding revolves around the use of a Neural Processing Unit (NPU).
- NPUs are specialized processors tailored specifically for neural network computations. They provide dedicated hardware for AI workloads, often outperforming general-purpose GPUs in specific tasks.
- They are optimized for executing AI operations, offering better performance per watt, which translates to more efficient processing and lower power consumption.
- Using an NPU can enhance the real-time capabilities of AI applications, making tasks such as code completion and suggestions faster and more accurate.
2.1 Importance of dedicated (V)RAM
Dedicated (V)RAM plays a pivotal role in running local AI models:
- Memory Bandwidth: (V)RAM provides the necessary memory bandwidth to handle large datasets and model parameters efficiently. AI models, especially large ones, require substantial memory to store various elements during processing.
- Performance Stability: Having sufficient (V)RAM ensures that the system can manage and process data without bottlenecks, leading to more stable and predictable performance.
- Handling Large Models: Higher (V)RAM capacity allows the system to load and run larger models, which are typically more accurate and capable of handling complex tasks.
In conclusion, leveraging powerful GPUs and NPUs, coupled with sufficient dedicated (V)RAM, is essential for running local AI models efficiently with Ollama. These components ensure that the system can handle complex computations, maintain performance stability, and provide an enhanced user experience while processing AI tasks locally. Keep this in mind when you start exploring Ollama.
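As a rough guideline, the Ollama README at the time of writing recommends at least 8GB of RAM for 7-billion-parameter models, 16GB for 13B models and 32GB for 33B models. A quick way to gauge how heavy your local models are is to check their size on disk, which is a reasonable lower bound for the memory they need at runtime:
# List all pulled models together with their size on disk.
ollama list
# Show details for a specific model, such as parameter count and quantisation.
# On older Ollama versions this command requires a flag such as --modelfile.
ollama show phi3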
3. Install Ollama
For this post, I will be using my MacBook Pro M1 (2020) with 16GB of RAM. During testing, this machine performed well when running local models. Ollama is also compatible with Windows, Linux, and Docker; you can find detailed instructions for all operating systems on their GitHub page.
Note: If you are using something other than macOS, please skip ahead to ‘4. Getting started with Ollama’ to continue with this tutorial.
3.1 Installing Ollama using the macOS installer
- Download the Installer: Visit the official Ollama website to download the installer.
- Run the Installer: Once downloaded, locate the .dmg file in your Downloads folder and double-click it to open.
- Install Ollama: Drag the Ollama application icon to your Applications folder. When prompted, enter your macOS administrative password to complete the installation.
- Launch Ollama: Navigate to the Applications folder and double-click on the Ollama app to launch it.
3.2 Installing Ollama using Homebrew
- Install Homebrew: If you haven’t already installed Homebrew, open the Terminal and enter the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Ollama using Homebrew: Open the Terminal and run the following command:
brew install ollama
- Verify the installation: To ensure Ollama was installed correctly, you can check its version by running:
ollama --version
By following either of these methods, you will have Ollama installed and be ready to start downloading some local AI models.
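One note for the Homebrew route: the macOS app starts the Ollama server for you in the background, but with a command-line-only install you may need to start the server yourself before the commands in the next sections will work. A minimal way to do that:
# Start the Ollama server in the foreground (press Ctrl+C to stop it).
ollama serve
# Alternatively, assuming the Homebrew formula ships a service definition,
# run it as a background service that starts automatically.
brew services start ollama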
4. Getting started with Ollama
After installation, you interact with Ollama using the ollama command. Open up a terminal to explore some of the commands below:
# List all images pulled by Ollama.
ollama list
# Run an image locally (phi3 by Microsoft).
ollama run phi3
# Show available commands for Ollama.
ollama --help
# Get Ollama version.
ollama --version
For demonstration purposes, let’s download and run phi3, a lightweight model created by Microsoft. Since it is relatively small, with a focus on high quality and reasoning-dense output, it is a good model to demonstrate Ollama. Start by simply running this command:
ollama run phi3
When you use Ollama for the first time, the model doesn’t exist on your computer yet; it will be downloaded first and then run. Once that finishes, you will see a prompt. Enter a question straight away to find out what phi3 thinks.
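Inside the interactive prompt you can also use a few slash commands: /? lists them and /bye ends the session. A session looks roughly like this (the question and the answers are just an example):
>>> Who is the current CEO of Microsoft?
(the model streams its answer here)

>>> /?
(lists the available commands, such as /show, /set and /bye)

>>> /bye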
Cool! You are running your own local AI model without sending a single byte to the internet. Unlike GitHub Copilot, you can use Ollama completely offline.
5. Use the built-in REST API
Ollama comes with a built-in REST API to which you can send requests. See the example below, where we use curl to send the request, but other tools such as Postman work fine too.
curl http://localhost:11434/api/generate -d '{
"model": "phi3",
"prompt": "Who is the current CEO of Microsoft?",
"stream": false
}'
The following response is given:
{
"model":"phi3",
"created_at":"2024-06-02T10:52:16.725475Z",
"response":" As of my knowledge cutoff date in early 2023, the current Chief Executive Officer (CEO) of Microsoft is Satya Nadella. He has been leading Microsoft since February 4, 2014, succeeding Steve Ballmer and following Bill Gates' transition out of day-to-day operations at the company.",
"done":true,
"done_reason":"stop",
"context":[32010,11644,338,278,1857,14645,29949,310,7783,29973,32007,32001,1094,310,590,7134,5700,2696,2635,297,4688,29871,29906,29900,29906,29941,29892,278,1857,14546,28841,28288,313,4741,29949,29897,310,7783,338,12178,3761,18496,3547,29889,940,756,1063,8236,7783,1951,6339,29871,29946,29892,29871,29906,29900,29896,29946,29892,9269,292,13981,13402,1050,322,1494,6682,402,1078,29915,9558,714,310,2462,29899,517,29899,3250,6931,472,278,5001,29889,32007],
"total_duration":3926158792,
"load_duration":3453625,
"prompt_eval_count":11,
"prompt_eval_duration":525719000,
"eval_count":73,
"eval_duration":3394643000
}
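Besides /api/generate, Ollama also exposes a chat-style endpoint that accepts a list of messages, which is convenient when you want to keep conversation history across requests. A minimal example with the same phi3 model:
curl http://localhost:11434/api/chat -d '{
  "model": "phi3",
  "messages": [
    { "role": "user", "content": "Explain in one sentence what a REST API is." }
  ],
  "stream": false
}'
The response then contains a message object with the assistant’s reply instead of a plain response string; see the Ollama API documentation for the full schema.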
6. Configure Ollama as Copilot in Visual Studio Code
Once you have your model up and running, let’s set it up with Visual Studio Code. We use the CodeGPT extension to connect Ollama to it.
- Install extension: Open Visual Studio Code, search for CodeGPT and install it from the marketplace.
- Open CodeGPT: Once installed, select the CodeGPT icon in the menu bar.
- Setup the provider: From the CodeGPT icon in the menu bar, select Ollama as the provider.
- Select model: Now search for the model you want to use. For this tutorial, we use codegemma.
- Ask your question: Now, in the chat dialogue, ask your code question and wait for the model to do its work. Since we use our own local models, there are no restrictions on using CodeGPT (yet).
Note: The extension isn’t aware of which models are available on your local machine. Make sure the model is downloaded within Ollama first, because the extension will not do this for you.
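Downloading the model up front is a single command:
# Pull the model so that CodeGPT can find it locally.
ollama pull codegemma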
6.1 Configure autocomplete
With autocomplete, the model provides suggestions directly in your code file. This feature significantly enhances productivity by reducing the amount of manual typing required. It also minimises errors by offering accurate code snippets and suggestions based on context.
- Go to Settings: Open the CodeGPT chat window once again and navigate to the Menu.
- Enable Autocomplete: Select Autocomplete and choose codegemma:code as the AI model.
- Set Status: Make sure to set the status of Autocomplete to Enabled.
- Preferences: Set your preferences, such as the max tokens and the delay for autocomplete suggestions.
Now, when writing code, suggestions pop up automatically. You can either use or discard these suggestions, similar to the functionality provided by GitHub Copilot.
7. Conclusion
Today we explored Ollama and saw how this powerful local AI alternative to GitHub Copilot can enhance your development experience. Unlike cloud-based solutions, Ollama ensures that all data remains on your local machine, providing heightened security and privacy. By seamlessly integrating with development tools such as Visual Studio Code, and by supporting high-quality models like phi3 and codegemma, Ollama offers a reliable and robust solution for developers.
7.1 GitHub Copilot vs. Ollama
- Data privacy: While GitHub Copilot relies on cloud services which may raise data privacy concerns, Ollama processes everything locally, ensuring that no data is sent to external servers.
- Cost: GitHub Copilot requires a subscription fee, whereas Ollama is completely free to use.
- Internet connectivity: GitHub Copilot necessitates an internet connection to function, while Ollama operates entirely offline, making it a more reliable option in environments with limited connectivity.
- Customisation: Since Ollama is open-source, it can be customised and optimised for specific hardware and use cases, whereas customisation options for GitHub Copilot are limited.
7.2 Performance and battery life concerns
Running AI models locally, as with Ollama, does require substantial processing power and can have a noticeable impact on performance and battery life. On my MacBook Pro M1 (2020) with 16GB of RAM, the performance was satisfactory, but developers with older or less powerful hardware may experience slower response times and increased battery drain. It is essential to balance the benefits of local processing with these potential drawbacks.
7.3 Use a model with fewer parameters
When performance is slow, you should consider using a version of the codegemma model with fewer parameters (2 billion instead of the default 7 billion). Models with fewer parameters require less computational power, thus improving response times and reducing the strain on your machine’s resources.
However, the downside of using a model with fewer parameters is that it may generate less accurate and less nuanced code suggestions, as it lacks the depth and complexity of larger models. Consequently, while enhancing performance, you might find that the suggestions are not as precise or as helpful in handling more intricate coding tasks. To download codegemma (2B), run:
ollama run codegemma:2b
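Once the download finishes, you can compare both variants side by side; the size column gives a rough feel for how much lighter the 2B model is:
# Both codegemma variants now show up with their size on disk.
ollama list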
8. Final thoughts
Ollama stands out as a compelling alternative to GitHub Copilot, especially for those who prioritize privacy, local control, and cost-effectiveness. By keeping your data secure and offline, and by providing a free and open-source solution, Ollama aligns with the needs of developers who seek both efficiency and autonomy in their workflow. While performance and battery life considerations should be taken into account, the advantages of using Ollama make it a worthy contender in the realm of AI-assisted development tools.