As teased/promised, I wooed my dear friend Maija into letting me interview her about her nine-year flute career and her adjustment to high standards at a remarkably young age, and I got some advice on how not to lose the spark of playing music. (She agreed in advance, but I expect bringing over a couple of tall cans and complimenting her dinner also helped greatly.) (It was a phenomenal soup.)

I’d wanted to talk to a capital-F Flautist sometime before the end of the semester. I’ve had the sense that I needed to ground myself, to come back to earth (instead of raising the stakes, as I so often do – especially when I look at my practice schedule on the calendar). This is paired, in part, with hearing something a couple of weeks ago that resonated with me deeply: be teachable. By its nature, this project has been a pretty solitary venture. I knew that I could likely get away with keeping my circle small, but I didn’t think I would be better for it. Even though I’ve been approaching my practice with all sorts of mindfulness, considering my fixed and growth mindsets, it can still be a tiring journey. Part of me can’t believe that people do this professionally.

The catch is that Maija has no interest in being in front of a camera (though she is a talented photographer). Because this project has been heavy on video content, I thought I’d have poor luck getting her on for a mock lesson. After working on a group project to record and transcribe a podcast, I had a lightbulb moment – Maija and I talk for hours on end; we just needed AI to keep up with us so that I wouldn’t have a mountain of transcription homework on my plate. I ended up choosing the app Transcribe, honestly because my preferred app, Descript, didn’t have an iPhone-compatible version, and the built-in option on my phone seemed a little lame. We sat down for 15 minutes on her couch, wired headphone mic passed from hand to hand, and:

Fortunately, formatting the transcript as a PDF file block has an ability that a JPG image block lacks: the ability to zoom. Fitting a 15-minute conversation – which, if you’re Maija, means 2,500 words – into just a few pages meant a smaller font.

On the left, I have the transcribed conversation, for your enjoyment (and mine). I was surprised that even though we were coming from different starting points – Maija: 8 years old, passionate and naive, climbing through the ranks of expertise;
Me: 27 years old, mindful and suspicious, working slowly, slowly, slowly –
we found similar heart in the purpose of playing music. It’s intangible; it’s exciting, tiring, devastating; you do it because it offers nothing but gives all. I felt so in touch with her when she said, in response to how to move forward, that you have to be OK with being bad at it to continue to love it. Like hearing my professor affirm the challenges of picking up the flute, I needed to hear I wasn’t alone.

there is yet a little bit more to say, but that’s all the romantic wrap-up that i have to add, in case you aren’t fussed about ai.

Looking at the process of recording and transcribing, I was quite pleased while we were talking – the AI was mere seconds behind us in our conversation, and we had a clear timeline of how much gabbing we could do before my free trial ran out. Each section proudly included an icon: 97%. I think this means accuracy. Would I agree? No. It took an immense load off my to-do list by auto-eavesdropping, but the right side of the doc is the original transcription, with mistakes highlighted. As mentioned, we had a high word count, but the mistakes were frequent and hilarious. I knew we were off to a good start when it transcribed “flute career” as “flu clear.” (The next runner-up was “if Suzie Azim” for “enthusiasm.”)

The biggest thing was that it didn’t differentiate between speakers, and there was one glaring headache in the making: no structural punctuation to be seen. It took a lot of editing and a lot of staring to get it right – I’m glad I opted to save the audio as well as the text, because I listened back to the whole thing to pick up the loose ends. I would use it again, but I expect there is higher-level AI working to really reduce the workload of speech-to-text. If someone with hearing loss were tuning in to this live, they would have to do a lot of inference to figure out which transcribed words were originally just “um” and which were content, what might have been said during lost passages, and who said what. It’s growing, and that’s very cool. I love closed captioning, so my standards are high, but I expect that these tools are like most other practices: you have to keep doing it for it to get better.