TL;DR
In this episode, you’ll discover three AI tools that run locally on a Mac and thus keep your business data 100% private.
Here are the important links mentioned in the episode
- Ollama: https://ollama.ai
- Raycast: https://www.raycast.com/
- Ollama AI extension for Raycast: https://www.raycast.com/massimiliano_pasquini/raycast-ollama
- Ollamac: https://github.com/kevinhermawan/Ollamac
- ChatBot Ollama: https://github.com/ivanfioravanti/chatbot-ollama
- DiffusionBee: https://diffusionbee.com/
- Most downloaded Stable Diffusion models: https://huggingface.co/models?other=stable-diffusion&sort=downloads
- Stable Diffusion Prompt guide: https://stable-diffusion-art.com/prompt-guide/
- DiffusionBee images on Arthub.AI: https://arthub.ai/tags/diffusion%20bee
- MacWhisper (referral link): https://goodsnooze.gumroad.com/l/macwhisper?a=72161139
- Whisper Transcriptions on Mac App Store: https://apps.apple.com/app/whisper-transcription/id1668083311?mt=12
Visit https://macpreneur.com/ai to grab your own copy of the Top 10 AI Tools Cheat Sheet that will help you boost your solo business in this fast-paced world.
Affiliate disclosure
Hey there! Quick heads-up: Some of the links in this post might be special. Why? Because if you click on them and make a purchase, I earn a small commission at no extra cost to you. It’s like a virtual high-five for recommending stuff I love! So, thank you for supporting me and the Macpreneur podcast! Remember, I only promote products that I genuinely believe in. Now, let’s dive back into the fun stuff!
Summary:
In this episode, you’ll discover three Mac AI tools that can be used for various purposes while maintaining data privacy and without requiring an internet connection.
The tools explored are Ollama for text generation, DiffusionBee for image generation, and MacWhisper for transcriptions.
Each tool has its advantages and limitations, and they are recommended for specific use cases.
I also highlight the requirements and considerations for using these tools effectively on your Mac.
Takeaways:
- Ollama is a free tool for running open source language models on Mac, but it lacks a user-friendly interface.
- DiffusionBee is a simple and easy-to-install tool for local image generation, but it may be slower on Intel-based Macs.
- MacWhisper is a freemium tool for transcriptions that supports multiple languages and offers advanced features with a paid license.
- These tools are resource-intensive and perform better on Macs with Apple Silicon chips and sufficient RAM.
- Local AI tools provide privacy benefits by not requiring data to be processed on third-party servers.
- Large language models may have limitations in knowledge cutoff dates and lack direct internet access.
- Prompting techniques are important for effective image generation with DiffusionBee.
- MacWhisper is a recommended tool for transcription tasks, especially for YouTube links, voice memos, and Zoom meetings.
Teaser
Have you ever wanted to chat with AI, create amazing images, or transcribe audio on your Mac? Today, we’re exploring the tools that make it possible while keeping your business data entirely private and without requiring an internet connection.
Stay tuned as we dive into Ollama, DiffusionBee, and MacWhisper. You’ll discover what you can do with them, their advantages, and any limitations that you should be aware of. By the end of this episode, you will know more about local AI tools for Mac than most solopreneurs.
Problem
Using AI directly from a web browser or through an API (Application Programming Interface) is great. However, despite the security measures put in place by the AI providers, data still needs to be processed by a third-party server. Therefore, our AI usage is not 100 percent private, even if it’s not used to further train the AI model.
Solution
And so, the solution to this problem is very simple: using AI tools that can run locally on a Mac without needing an internet connection.
This is especially useful when needing help involving proprietary information.
The goal here is not to replace web-based AI tools but rather to complement them in specific situations.
The main reasons that we still need web-based AI tools come from the limitations of locally run ones.
So, the first limitation is that AI models take up several gigabytes of space each, and we often need to download more than one, which means they require a lot of hard disk space.
AI also requires a lot of memory. Eight gigabytes of memory is okay for image generation and transcription, but it’s the bare minimum for text generation. If you want to run a large language model with 13 billion parameters, you need at least 16 gigabytes of RAM, and even larger models need more than that.
On top of that, AI tools require powerful processors, especially GPUs (graphics processing units).
And so we usually get much better performance on Apple Silicon Macs than on Intel-based Macs.
And because better accuracy requires larger models, it also means more resources: more storage, more memory, and more compute power.
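To make those memory numbers a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The 4-bit quantization and the 20 percent overhead factor are my own assumptions (typical for locally run models), not figures from the episode.

```python
# Rough rule-of-thumb RAM estimate for a quantized large language model.
# Assumptions: 4-bit weights plus ~20% overhead for context and runtime buffers.

def estimated_ram_gb(params_billion: float, bits_per_weight: int = 4, overhead: float = 1.2) -> float:
    bytes_for_weights = params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights * overhead / 1e9

for size in (7, 13, 70):
    print(f"{size}B parameters -> roughly {estimated_ram_gb(size):.1f} GB of RAM")
# 7B  -> ~4.2 GB (fits in 8 GB of RAM)
# 13B -> ~7.8 GB (hence the 16 GB recommendation)
# 70B -> ~42 GB  (needs a high-memory Mac)
```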
Ollama introduction
Now, with that in mind, let’s explore a local tool that can do text generation, and that is called Ollama. O-L-L-A-M-A.
It’s a free tool that can run open-source large language models. The name comes from Llama2, L-L-A-M-A two, which is an open-source model that was released by Meta in July 2023.
It’s an alternative to ChatGPT, Google Bard, Bing Chat, and Claude, and it’s a lightweight method to download and run a bunch of open-source models.
Obviously, Llama2, but Meta also released another one called Code Llama, which has been trained on coding and programming stuff.
But then there is a French startup that created another one called Mistral, which in some situations performs better than Llama2.
And then you have OpenChat, Orca, and more; the open-source community around large language models is really a vibrant one.
Ollama Interface
Now, in terms of interface, Ollama provides the bare minimum because Ollama is basically a menu bar application that runs in the background all the time to check for updates.
And if you see the icon in the menu bar, it simply confirms that Ollama is running and that the models you install can be used.
There is no user interface for Ollama at all.
The basic way to interact with Ollama and to chat with the models is through the terminal. This is usually what we use as developers or when we want to run terminal commands on our Mac.
Now, the big disadvantage is that you don’t have any chat history, so you need to copy-paste all the time.
And yes, it’s not super user-friendly.
So this is why there are other methods to interact with the models that have been downloaded with Ollama.
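One of those methods, for the technically inclined, is Ollama’s local HTTP API (it listens on port 11434 by default), which is also what the graphical front-ends below typically talk to. Here is a minimal Python sketch, assuming the llama2 model has already been downloaded:

```python
# Minimal sketch: asking a question to a locally running Ollama model through
# its local HTTP API. Assumes Ollama is running and "llama2" has been pulled.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Summarize the benefits of running AI models locally on a Mac.",
        "stream": False,  # return one complete answer instead of a token stream
    },
    timeout=300,
)
print(response.json()["response"])
```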
Ollama AI Raycast extension
And one way is through a Raycast extension. So, if you remember, Raycast is a launcher like Spotlight. You can install an extension called Ollama AI; it’s totally free. Another advantage of this extension is that you can manage models: you can delete models, and you can pull (download) new ones as well.
You have a chat history, and it comes with built-in presets.
So, like I explained in episode 72, you can fix spelling, you can translate, you can rephrase, you can change the tone, and you can improve writing.
Usually, what happens is that you select the text and copy it to the clipboard. Then, you invoke Raycast, look for the Ollama AI action that you want to perform, press Command + Enter, and then it will perform the action.
The big disadvantage of this method is that a conversation is just one prompt and one answer. There is no follow-up possible.
Ollamac (macOS 14+)
So, if you want a longer conversation with the chatbot, either use the terminal or, if you have a Mac running macOS 14 Sonoma, then you can use an application called Ollamac.
So, it’s like Ollama with a C at the end. It’s a native Mac application. It has a full chat history. It’s very easy to copy-paste the output as well.
Now, I’ve tested it and noticed that it is still buggy at the time of recording. For instance, when I put in a prompt, the first response was blank or only one letter.
So, I needed to paste the prompt again, ask it a second time, and then it gave me an answer.
So, if you have a Mac running at least Sonoma, Ollamac is the most user-friendly way to interact with the models.
You can’t download models with that application, at least at the moment, but you can interact with the models.
Chatbot Ollama
And then there is another method which is called Chatbot Ollama.
And this is a web interface that looks a lot like ChatGPT. So if you’re familiar with ChatGPT, you will recognize the interface. When preparing for this episode, my daughter came into my office, looked at the screen, and said, “Oh, ChatGPT.”
And I said, “No, it’s not ChatGPT, it’s Chatbot Ollama.”
But yeah, it looks so much like ChatGPT that if you’re used to the ChatGPT interface, you will love Chatbot Ollama.
It offers chat history. You can even create your own library of prompts on the right side and then reuse them easily.
For every conversation, you can configure a system prompt, which is some information that will be used as context, and you can even configure how much creativity you give the model (its temperature) for each separate conversation.
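Under the hood, a system prompt and a temperature are simply extra parameters sent along with each request. This is not Chatbot Ollama’s actual code, just an illustrative sketch of the equivalent request against Ollama’s local API (the business wording is invented):

```python
# Illustrative sketch: a system prompt (context) and a temperature (creativity)
# passed directly to Ollama's local API. Not Chatbot Ollama's internal code.
import requests

payload = {
    "model": "mistral",
    "system": "You are a marketing assistant for a solopreneur who helps "
              "Mac-based business owners save time.",  # context for the conversation
    "prompt": "Draft three subject lines for my next newsletter.",
    "options": {"temperature": 0.9},  # higher = more creative, lower = more predictable
    "stream": False,
}
reply = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
print(reply.json()["response"])
```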
Now, the downside of Chatbot Ollama is that it requires developer tools like Homebrew, Git, and npm.
So, if you’re a techie or experienced with the terminal, and when I say the word Homebrew, you understand completely what I mean, then I recommend going with Chatbot Ollama. If you have no clue what Homebrew means, I would say don’t necessarily go that way.
Pros of Ollama
The big advantage of a tool like Ollama for local chatbot capabilities is that you can use it offline and it is 100 percent private.
The open-source community around large language models is very active, so there are constantly new models being developed.
Another thing I didn’t mention yet is that it’s possible to create a custom model using a text file.
This model file allows you to specify the AI model you want, such as Llama2 with 13 billion parameters. You can also configure another parameter called temperature, which determines the level of creativity the model exhibits. Additionally, you can provide some text as the context for the model.
One approach is to define your ideal customer avatar and how you can best serve them. This includes all relevant information about your business, products, and services. Essentially, you create your own GPT.
This functionality is very similar to what OpenAI introduced with ChatGPT Plus, which requires a subscription.
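For reference, here is a minimal sketch of what such a text file (Ollama calls it a Modelfile) can look like. The FROM, PARAMETER, and SYSTEM directives are Ollama’s documented syntax; the business-specific wording is invented for illustration.

```
# Modelfile — minimal sketch of a custom Ollama model definition.
FROM llama2:13b

# Temperature controls creativity: lower = more focused, higher = more creative.
PARAMETER temperature 0.7

SYSTEM """
You are the marketing assistant of a solopreneur who teaches Mac productivity.
The ideal customer is a busy solo business owner who works primarily on a Mac.
Always answer in a friendly, practical tone with concrete, Mac-based examples.
"""
```

You would then register it with `ollama create my-assistant -f Modelfile` and chat with it like any other model.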
Cons of Ollama
Now, the limitations and disadvantages of Ollama are that most of the open source models that you can download today have limited knowledge up to 2021 or 2022.
So it’s best to always start by asking the model, “What is your world knowledge cutoff date?” And then it will tell you, “Okay, it’s October 2021 or something like that.”
Another limitation is that those models do not have direct access to the internet, so they can’t fetch information. If you give them a URL, they won’t open and look at what is behind the URL.
Also, it’s very resource-hungry. I’ve used it on my 2020 iMac with an Intel processor, and even though it has 40 gigabytes of RAM, it’s super slow.
On the flip side, on my MacBook Pro, which has only 16 gigs of RAM but an M2 Pro chip, it’s twice as fast as on the iMac.
DiffusionBee intro
Okay, so now let’s turn our attention to image generation, and the tool that I recommend using is called DiffusionBee.
It’s a totally free tool, and it allows you to run open-source image generation models locally on a Mac.
So the name DiffusionBee comes from Stable Diffusion. It’s an open-source model that was released by Stability AI in August 2022.
It’s an alternative to DALL-E, Midjourney, and Adobe Firefly.
Now, the application is about 600 megabytes, but after the first launch, it will download roughly 5 gigabytes worth of models.
It has a simplified user interface compared to other web-based tools that are trickier to install.
And the current official version is version 1.7.4, downloadable directly from the website.
It’s available for both Intel-based and Apple Silicon Macs, and it’s compatible with macOS Monterey.
DiffusionBee functionalities
Now, the main functionalities of version 1.7.4 are that it offers traditional text-to-image generation. So, you have a prompt area where you explain what you want. You can choose models, define different parameters, and also define what we call negative prompts, which are the things that we don’t want in the picture.
Then we also have image-to-image. So, you give it an existing photo, an existing image, and a prompt, and it will use both the prompt and the image to create your new image.
Then you can do inpainting, so basically you have an existing image or the result of a text-to-image generation, and then you will highlight certain parts of the image. With the prompt, you will ask the model to modify that part of the image, which is called inpainting.
And then you have outpainting. So, you give it an image, and the prompt is all about what’s outside the image. So, if you have a square image, you can make it portrait mode, landscape, or add stuff using AI.
Then we have what is called ControlNet. With ControlNet, you use a second image and a second AI neural network to control the output.
For instance, you can give it a picture of a person with a certain pose, and the ControlNet, which is a special kind of neural network, will analyze that image and realize, “Ah, the legs are crossed, the arms are crossed, the person is looking to the left or to the right,” or something like that.
And then it will create a wireframe diagram linking the head, the arms, and the legs, and use that information, the pose of the person, to guide the generation of the image afterwards. It’s a nice trick if you want to create an image of a person in a certain position or pose.
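DiffusionBee hides all of this behind its interface, but if you’re curious what pose-guided generation looks like in code, here is a hedged sketch using the Hugging Face diffusers library. This is not DiffusionBee’s implementation, and the model IDs and file names are assumptions for illustration.

```python
# Sketch of OpenPose-guided image generation with Hugging Face diffusers,
# not DiffusionBee's own code. Assumes an Apple Silicon Mac ("mps" device) and
# that "pose_map.png" is already an OpenPose skeleton image (e.g. produced with
# the OpenposeDetector from the controlnet_aux package).
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

pose_map = load_image("pose_map.png")  # hypothetical file name

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
).to("mps")

image = pipe(
    prompt="photo of a woman sitting cross-legged in a park, natural light",
    image=pose_map,  # the pose wireframe guides the composition
    negative_prompt="blurry, deformed, extra limbs",
    num_inference_steps=30,
).images[0]
image.save("posed_photo.png")
```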
The final thing that it can do is provide a history of all your prompts with a small thumbnail for each image. You have everything, not only the prompt, but also which model you used, the size of the picture, and all the parameters for the image generation. There is even a search box, so you can quickly find a previous image that you generated.
Now, at the time of recording, there is a technical preview of version 2 of DiffusionBee. It’s still in beta, and the improvements that are already implemented include the ability to manage and download models directly from the application.
With version 1.7.4, you need to go to the website, download the models, and import them afterwards.
But hopefully, with version 2, it will be easier to do that.
The developer has already planned new features, but they are not activated yet.
So, in the future, it will be possible to train a model on our own images and create our own Stable Diffusion model.
Video creation will also be implemented in the future in the application.
Pros of DiffusionBee
And what are the major advantages of DiffusionBee?
Well, it’s very easy to install.
The interface is extremely simple.
Unlike DiffusionBee, tools like DALL-E and Midjourney have implemented restrictions to prevent the reuse of copyrighted characters or real individuals.
So, I attempted to create a Star Wars character using ChatGPT 4 and DALL-E 3, but I received a message stating that I couldn’t do so because it involved copyrighted material.
However, with a tool like DiffusionBee and an open source model, you can freely mention Star Wars, Pixar, as well as the names of actors and actresses, and incorporate them into your prompt.
Cons of DiffusionBee
What makes tools like DiffusionBee not so good is that it’s very slow on Intel machines. So, it’s really better to have an Apple Silicon Mac with an M1 chip, M2 chip, or M3 chip.
There is a competitor to DiffusionBee: the Stable Diffusion web UI, maintained by a developer known as AUTOMATIC1111.
And so I’ve tried both, and yes, I must admit the web UI from AUTOMATIC1111 offers many more capabilities than DiffusionBee, but at the same time, DiffusionBee is much easier to install than the web UI.
Again, you need Homebrew and the terminal to install the web UI.
There is also quite a steep learning curve regarding prompting. So creating effective image generation prompts is not easy.
For that, I will put a link at the top of the blog post to a website called Stable Diffusion Art, which has a prompt guide, so that you can learn how to prompt Stable Diffusion.
And on another website called Arthub, you can filter images that have been created with DiffusionBee.
So those pictures have been tagged DiffusionBee. If you look at those, what you get is not only the picture that was generated with DiffusionBee, but you also have the prompt, the model, and all the parameters.
So if you see images that you like, you can then easily learn from those images and reproduce the prompts and make some variations on the prompt.
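To make that concrete, here is the kind of structured prompt those guides and galleries teach. The wording and settings below are my own illustrative example, not a prompt from the episode or from Arthub.

```
Prompt: photo of a vintage sports car flying over a European old town at sunset,
        photorealistic, highly detailed, cinematic lighting, 35mm lens, sharp focus
Negative prompt: blurry, low quality, cartoon, watermark, text, deformed
Typical settings: 30 sampling steps, guidance scale around 7
```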
Now, the last thing about DiffusionBee is that it’s very easy to fall into a rabbit hole.
For a presentation, I wanted to have the DeLorean from Back to the Future flying over Luxembourg City.
And I’ve spent at least an hour trying to get a photorealistic picture, and yeah, it was not easy.
And so that’s, I would say, maybe the opposite of efficiency in that case.
MacWhisper Intro
Let’s now look at the last tool for transcriptions called MacWhisper.
It’s a freemium tool, meaning that there is a free version with some limitations, and then if you purchase a pro license, you can do more.
The name comes from Whisper, which is an open-source speech recognition model that was released by OpenAI in September 2022.
The particularity of Whisper and MacWhisper is that it understands 100 languages and can do translation directly within the tool. So, you can give it audio in English, say; it will transcribe it in English and then translate it into French or any of the 100 languages that it knows.
The free version of MacWhisper comes with two models called Tiny and Base. They are obviously the least accurate models, but transcription with them is very quick.
To give you an idea, one minute of audio is done in two seconds with the Tiny model and in four seconds with the Base model. So it’s very, very quick.
Now, if you want better transcription accuracy, you need the paid version, which offers three more models: Small, Medium, and Large.
The higher the accuracy, the slower the transcription.
The Large model, for instance, does not offer any speed-up whatsoever: if you have a one-hour video, it will take one hour to transcribe.
It will be super good quality, but obviously it will take a lot of time.
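To put those speeds into perspective, here is a tiny sketch that converts them into waiting times for a one-hour recording, using only the figures quoted above:

```python
# Waiting times for a 60-minute recording, based on the speeds quoted above:
# Tiny ~ 1 minute of audio in 2 s, Base ~ 4 s, Large ~ real time (60 s).
audio_minutes = 60
seconds_per_audio_minute = {"Tiny": 2, "Base": 4, "Large": 60}

for model, secs in seconds_per_audio_minute.items():
    total_seconds = audio_minutes * secs
    print(f"{model}: about {total_seconds // 60} minutes for a {audio_minutes}-minute file")
# Tiny: about 2 minutes, Base: about 4 minutes, Large: about 60 minutes
```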
It is available from Gumroad, from the developer’s page. I will put a link at the top of the blog post.
But the author has also put a version on the Mac App Store.
So if you search for Whisper Transcription, you will find the same application. And to go from the free version to the paid version, it’s an in-app purchase from the Mac App Store.
A requirement for MacWhisper is macOS 13 Ventura.
MacWhisper input & output
So how do you use it?
You can directly record into the app. You click on a button. It uses the microphone, you speak, and it does the transcription on the fly.
You can give it a URL, for instance, a YouTube link or a Vimeo link, and it will then do the transcription using the URL.
You can also give it an existing file: it understands audio files like MP3, WAV, and M4A, as well as video files like MOV and MP4.
So you can take a voice memo, drag and drop it on top of the interface, and then it will give you a transcription of your voice memo.
Batch processing, i.e., dragging and dropping multiple files at once, requires the paid version.
The same goes for podcast mode. If you use Zoom to record your podcast, it’s possible to configure Zoom so that each speaker has their own audio file.
The transcribe podcast option will then automatically assign the speaker names and position the text for each speaker.
And then another feature that requires the paid version is the ability to transcribe the system audio of macOS.
So whatever sound or music is on your Mac will be transcribed.
Or you can even choose an individual application.
In terms of output, you can generate plain text files (.txt) and Word documents (.docx), as well as CSV and PDF files.
If you need subtitles, for instance for YouTube or Vimeo, it can generate both SRT and VTT files.
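If you’ve never opened a subtitle file, here is what a couple of SRT cues look like (the timestamps and text are invented for illustration; VTT is very similar but starts with a WEBVTT header and uses a period instead of a comma in the timestamps):

```
1
00:00:00,000 --> 00:00:04,500
Welcome back to the Macpreneur podcast.

2
00:00:04,500 --> 00:00:09,200
Today we're looking at AI tools that run locally on your Mac.
```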
Pros of MacWhisper
So, what are the big advantages of MacWhisper?
It’s lightweight; the application doesn’t take up much space.
It’s multilingual, and it’s very easy to download the desired models directly from within the interface: you go to the settings, and you can download them from there.
Cons of MacWhisper
Now, the disadvantages of MacWhisper are related to the pricing model.
If you want advanced features like batch mode, for instance, it requires purchasing a license key. But apart from that, it’s a very good application.
Recap
So to recap, I have explored three AI tools that can help you generate text, images, and transcriptions locally on your Mac without an internet connection, thus keeping your business data private.
These tools run faster on newer Macs equipped with Apple Silicon chips, and even then, it’s recommended to have at least 16 gigabytes of RAM and be able to allocate 50 to 100 gigabytes of storage to store the models.
Now, talking specifically about large language models, the quality of the output may be inferior to the latest versions of ChatGPT or Claude, but it’s the price to pay at the moment to guarantee 100 percent privacy.
And when it comes to image generation, my experience tells me that the integration of DALL-E 3 inside ChatGPT 4 drastically reduces the need to learn good image prompting techniques.
And therefore, if you quickly need an image, a tool like DiffusionBee might not be the best option.
However, if you want to experiment and in the near future, if you want to train your own model with your images, then DiffusionBee is the right tool to install on your Mac and then start learning how to use it.
And finally, on the transcription front, MacWhisper is, in my opinion, a must-have AI tool, especially if you’re not using Descript.
So you can copy-paste YouTube links, drag and drop voice memos, and you can even transcribe Zoom meetings on the fly.
Now, I purchased a pro license simply because the batch transcription mode is a great time saver.
If you’ve never used any of the tools that I’ve covered today, I hope this episode has motivated you to start experimenting with at least one of them.
And if you’ve already started, I’m curious to know what specific use cases have helped you so far.
You can DM me on Instagram; my handle is MacpreneurFM.
Next and outro
In the next episode, I will explore different ways that we can use AI to generate content ideas.
So that’s it for today. If you haven’t done it yet, visit macpreneur.com/AI to grab your own copy of the top 10 AI tools cheat sheet.
This PDF will give you the edge that you need to boost your solo business in this fast-paced world.
Once again, it’s macpreneur.com/AI.
And until next time, I’m Damien Schreurs, wishing you a great day.