Off Prompt

AI Tools for Small Business

Productivity

How to use AI to build a simple voice-to-text system for capturing job notes, site visits, and client calls without typing anything up

How to record job notes with AI without typing — a complete voice-to-text setup for field workers using tools costing $0–$20/month.

Mara Chen 9 min read
How to use AI to build a simple voice-to-text system for capturing job notes, site visits, and client calls without typing anything up

Field service workers lose an estimated 15–30 minutes per day to post-job admin — that's up to 100 hours of non-billable time per employee per year, just from writing up notes. This post walks you through a complete voice-to-text system for capturing job notes, site visits, and client calls using tools that cost between $0 and $20/month. Get the setup right once and you can go from spoken observation to formatted, sendable job note in under 60 seconds.

What You Need to Record Job Notes with AI

Talknotes{:target="_blank"} — converts free-spoken voice recordings into clean, structured notes using AI, stripping filler words and organizing content automatically. Pricing: plans start at approximately $8–$12/month as of early 2026; check Talknotes pricing{:target="_blank"} before subscribing — there is a limited free trial.

ChatGPT{:target="_blank"} (optional, for formatting) — if you want more control over output structure, ChatGPT Plus at $20/month includes Voice Mode on iOS and Android. The free tier also supports voice input but with usage limits.

Otter.ai{:target="_blank"} (optional, for team use) — real-time transcription with team sharing features; Pro plan is approximately $16.99/month per user as of early 2026, per Otter.ai pricing{:target="_blank"}.

Apple Dictation (iOS 16+) or Google Voice Typing (Android) — both are free and work offline. Accuracy is lower than Whisper-based tools, but they're a practical fallback when you have no signal.

Time required: Basic single-tool setup: 20–30 minutes. Full setup with CRM or Google Sheets integration via Zapier: 45–90 minutes.

Skill level: No coding required. For the Zapier integration in the final section, you'll need a Zapier{:target="_blank"} account (free tier available, paid plans from $19.99/month as of early 2026) and an OpenAI API{:target="_blank"} key.


Step 1: Choose the Right AI Voice Notes App for Field Work

Before setting up anything, match your situation to the right tool. The wrong choice here costs you accuracy and money.

  1. Assess your connectivity. If you regularly work in basements, rural properties, or areas with poor signal, you need an offline-capable option. Apple Dictation and Google Voice Typing both run on-device. Talknotes and Otter.ai require a connection to process.
  2. Assess your noise level. Construction sites and outdoor landscaping are genuinely difficult for any voice-to-text tool. If you work in loud environments, shortlist a directional Bluetooth headset or a clip-on lapel mic before picking your app — hardware quality affects accuracy more than software choice in noisy conditions.
  3. Decide if you work solo or with a team. Solo field workers get the most value from Talknotes or ChatGPT Voice. Teams with 2+ people doing site visits benefit from Otter.ai's AI Channels feature (introduced in 2025), which lets the whole team search and share transcripts across jobs.
  4. Decide on your output destination. If notes just go into your own records or get emailed to clients, Talknotes alone handles this. If you need notes to push into a CRM or invoicing tool automatically, you'll need the Zapier + OpenAI Whisper workflow described in the final section.

The trade-off is this: Talknotes is the fastest path to clean notes with minimal setup, but it's optimized for solo dictation. Otter.ai has more team infrastructure, but at $16.99/month per user it's meaningfully more expensive and was designed for meetings, not field dictation.


Step 2: Set Up Talknotes to Record Job Notes with AI

This is the recommended starting point for most field workers — contractors, inspectors, landscapers, cleaners — working alone.

  1. Go to talknotes.io{:target="_blank"} and create an account. Start the free trial before committing to a paid plan.
  2. Open the app on your phone (iOS or Android). Grant microphone permissions when prompted.
  3. Configure your output format. In settings, select your preferred note structure. For job notes, "structured list" output works better than "paragraph" — it forces the AI to break content into discrete fields rather than flowing prose.
  4. Set a custom prompt (if the option is available on your plan) to tell the AI what fields to extract. Paste this into the custom instructions field:

Always extract and label the following fields from my voice note:

  • Job address
  • Date
  • Client name
  • Work performed
  • Materials used
  • Issues encountered
  • Next steps / follow-up required
  • Estimated billable hours

Clean up filler words. Do not add information I didn't mention. If a field is missing, write "not mentioned."

  1. Record a test note. Speak for 30–45 seconds describing a sample job. Tap stop and wait 10–15 seconds for the formatted output.
  2. Review the output against your spoken note. Check that all fields populated correctly and that no information was fabricated. The "not mentioned" placeholder in the prompt prevents the AI from guessing — this is critical for billing and liability accuracy.

The custom prompt is doing most of the work here. Without it, you get a cleaned-up paragraph that still requires manual sorting. With it, you get a document you can forward directly to a client or paste into an invoice.


Step 3: Build Your 60-Second Field Dictation Habit

The technology works only if the spoken input contains the right information. A 2024 ServiceTitan survey found that over 60% of field technicians regularly skip or delay job notes due to time pressure — the fix isn't a better app, it's a faster verbal routine.

Use this script as your default opening every time you record:

"Job note. [Date]. [Client name] at [full address]. Today I [describe work performed in plain language]. Materials used: [list]. Issues encountered: [describe or say 'none']. Follow-up required: [describe or say 'none']. Billable hours: [number]."

The whole thing takes 30–45 seconds to say. You will get better at it within a week. The AI does the formatting — your job is to say the facts in roughly the right order.


When Something Goes Wrong

Symptom: Output fields are consistently missing or labeled incorrectly. Root cause: The app isn't retaining your custom prompt between sessions, or the plan you're on doesn't support custom instructions. Fix: Paste your prompt into a note on your phone and copy-paste it each time, or upgrade to a plan that supports persistent custom instructions. Verify which plan tier includes this feature on Talknotes' pricing page before paying.

Symptom: Transcription accuracy drops significantly on job sites. Root cause: Ambient noise overwhelming the phone microphone, not an AI model failure. Fix: Switch to a directional Bluetooth headset or a lapel microphone that clips near your mouth. OpenAI's Whisper{:target="_blank"} — the transcription engine behind most of these tools — achieves 95%+ accuracy on clear speech in quiet conditions; noisy environments degrade this significantly, but the hardware fix recovers most of that accuracy.

Symptom: Notes are being processed but the app stalls or fails to return output. Root cause: Poor or no internet connection; cloud-based tools can't complete the AI formatting step offline. Fix: Switch to Apple Dictation (iOS) or Google Voice Typing (Android) as a fallback. You'll get raw transcript without AI formatting, but the core information is captured. Format it later when you're back on signal.


How to Connect Voice Notes to Your Existing Tools

This is where the system earns back real time. A basic Zapier + Whisper + GPT-4o pipeline requires no coding and takes roughly 45–90 minutes to build.

The workflow: record voice memo on phone → file uploads to a cloud folder (Google Drive or Dropbox) → Zapier detects new file → triggers OpenAI Whisper API transcription at $0.006/minute (as of 2025, per OpenAI's API pricing{:target="_blank"}) → output passed to GPT-4o for structured formatting → formatted note sent to your CRM, Google Sheets, or email.

  1. Create an OpenAI API account at platform.openai.com{:target="_blank"} and generate an API key. Keep this private.
  2. Set up a watched folder in Google Drive or Dropbox specifically for voice memos.
  3. Build a Zap in Zapier: Trigger = "New file in folder." Action 1 = OpenAI "Transcribe audio" using the Whisper model. Action 2 = OpenAI "Chat with GPT-4o" using your structured job note prompt. Action 3 = send output to your destination (Google Sheets row, CRM contact note, or email).
  4. Test with a real voice memo before relying on it. Verify the Whisper transcription is accurate, then verify GPT-4o's formatting matches your expected fields.

At $0.006/minute, transcribing a 45-second job note costs roughly $0.005 — essentially free at any realistic volume. A field worker recording 10 notes per day, 250 days per year, runs up about $12.50 annually in Whisper API costs. The formatting via GPT-4o adds a small additional cost, but at typical usage rates for job notes, total API costs for one worker stay well under $25/year.

For jobs involving sensitive client data — home inspectors, security contractors — review the data processing terms for any cloud-based tool before routing client information through it. If that's a concern, Whisper's open-source model{:target="_blank"} can run locally on a machine with 8GB RAM minimum, which keeps everything off external servers entirely.


What to Do Next

Start with Talknotes and the custom prompt in Step 2. Run it for two weeks on real jobs before deciding whether you need the Zapier pipeline. Most solo field workers won't need automation beyond the app itself — the Zapier setup pays off at 5+ workers or when notes need to feed directly into billing.

When you're ready to push notes into a CRM automatically, read how to connect AI tools to your existing business systems without coding.


FAQ

How accurate is AI voice-to-text on a real job site? On clear speech in quiet conditions, OpenAI's Whisper engine achieves around 95%+ accuracy — near-human level. In noisy outdoor environments, accuracy drops noticeably, but using a directional Bluetooth headset or lapel mic recovers most of that loss. The hardware matters more than the app when you're working in loud conditions.

What does it actually cost to set up a voice note system for one worker? The cheapest viable setup is free: Apple Dictation or Google Voice Typing for capture, with manual copy-paste into whatever system you use. The most useful paid setup is Talknotes at $8–$12/month as of early 2026. The full Zapier + Whisper + GPT-4o pipeline adds roughly $20–$40/month in Zapier costs (depending on task volume) plus under $25/year in OpenAI API fees. Total annual cost for one worker with a full automated pipeline: under $600. Against 60–100 hours of recovered non-billable admin time, the ROI calculation is straightforward.

Is Otter.ai worth $16.99/month for a solo contractor? Probably not for pure field note capture. Otter.ai is genuinely strong for meeting transcription and team-wide search across shared transcripts (AI Channels, added in 2025). For a solo worker dictating job notes, Talknotes is cheaper and purpose-built for that use case. The honest answer is: Otter.ai earns its cost at 3+ person teams who also use it for client calls or internal meetings.

What if I work in areas with no cell service? Use Apple Dictation (iOS 16+) or Google Voice Typing — both run on-device and don't need a connection. You'll get a raw transcript without AI cleanup, but the information is captured. Once you're back on signal, paste the raw text into ChatGPT or Talknotes to format it. This two-step process adds a minute of work but handles dead zones reliably.

Can I use this for client calls, not just site visits? Yes. Otter.ai integrates directly with Zoom, Google Meet, and Microsoft Teams and can transcribe calls automatically. For phone calls not on a video platform, you can use a call recording app (check your state's consent laws before recording calls) and then run the audio file through Whisper or Talknotes. ChatGPT Voice Mode also works during a call if you dictate a summary immediately after hanging up — 60 seconds of verbal recap turns into a structured client note in under 30 seconds.

Was this useful? ·