
Record and transcribe group conversations
Thorsten Pehl
October 09, 2020

Transcribing a group discussion is very time-consuming and demanding, and it requires a high level of concentration. The noise level in group recordings is often higher, and the conversation is usually more turbulent than in a "contemplative" one-to-one interview.

For comparison: for a conversation between two people, we calculate a transcription time of 5–6 hours per hour of recording; for group interviews, 8–10 hours per hour of recording.
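To plan a project, the rule-of-thumb ratios above can be turned into a quick estimate. The Python sketch below simply multiplies the recording length by the quoted ranges; the function name and the `"one_to_one"` / `"group"` keys are illustrative assumptions, and real effort will vary with audio quality and transcription rules.

```python
# Rough transcription-effort estimate using the article's rule-of-thumb
# ratios (hours of typing per hour of recording). Names are hypothetical.

RATIOS = {
    "one_to_one": (5, 6),  # conversation between two people
    "group": (8, 10),      # group interview / group discussion
}


def estimate_hours(recording_minutes: float, kind: str) -> tuple[float, float]:
    """Return (low, high) transcription hours for a recording of this length."""
    low_ratio, high_ratio = RATIOS[kind]
    recording_hours = recording_minutes / 60
    return recording_hours * low_ratio, recording_hours * high_ratio


if __name__ == "__main__":
    low, high = estimate_hours(90, "group")
    print(f"90-minute group discussion: {low:.1f} to {high:.1f} hours")
```

For a 90-minute group discussion this gives 12 to 15 hours of transcription work, compared with 7.5 to 9 hours for a one-to-one interview of the same length.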

With a few precautions, however, you can significantly reduce the later transcription effort and avoid wasting time. In this post we give you some useful tips and use sound examples to show how severely some of these effects can impair sound quality.

1. Avoid rumbling tables

There is often a table in the middle of a discussion group, so it seems sensible to place the recording device there. But be careful: you probably know what happens when you place a vibrating tuning fork on a tabletop. The table picks up the vibration and resonates with it. The same thing happens to a recording device placed on the table. Rustling paper, knocking, note-taking, a projector or laptop nearby, spoons in coffee cups, pouring water: all these noises are transmitted directly to the recording device. Even if you do not consciously notice them during the conversation, the recording device captures every one of them, and listening to them afterwards is genuinely unpleasant. Our suggestions:

  • Use a mini tripod or place the recording device on a book, sweater, hat or something similar to absorb vibrations from the table (writing, knocking).
  • If dishes are unavoidable, keep the recording device as far away as possible from rattling cups.
  • Do not place a projector or similar equipment near the recording device.

Here you can hear rustling paper and the minute-taker's hurried note-taking on the table: DM-650 on the table (download)
And this is how it sounds with a tripod: LS-12 on a tripod (download)

2. Choose the correct recording level and format

The right recording device and its settings are among the most important factors for recording a group as well as possible. Please keep in mind:

  • If your device supports manual level control, switch it on and set the level carefully. If in doubt, set it too low rather than too high: a quiet recording can later be raised (made louder) with Audacity or MP3Gain. Once a recording is too loud and therefore overdriven, it is almost impossible to repair and remains unpleasant to listen to.
  • The recording format should be .mp3, with a bit rate between 192 and 320 kbit/s. Please do not use .wma, .dss or .m4a! Most recording devices, e.g. the Philips DVT 4000, Olympus DM-650 and LS-12, offer several bit rates in mp3 mode. If you have the choice, always select a constant (fixed) bit rate rather than a variable one. Variable bit rates can cause time stamps to drift in transcription programs, which makes it harder to re-listen to passages or correct transcripts.
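Both recommendations rest on simple arithmetic. The Python sketch below assumes 16-bit PCM samples (as in an uncompressed WAV file) supplied as a plain list of integers; all function names are hypothetical and not part of Audacity, MP3Gain or any transcription program. It illustrates why a quiet recording can be raised with a single gain value (which also raises the noise floor), why clipped samples cannot be restored, and why a constant bit rate lets software map a byte position to a time position with one division.

```python
import math

FULL_SCALE = 32767  # largest positive value of a 16-bit PCM sample


def peak_dbfs(samples: list[int]) -> float:
    """Peak level relative to full scale in dBFS (0 dBFS = loudest possible)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")


def gain_to_target(samples: list[int], target_dbfs: float = -1.0) -> float:
    """Gain in dB that would bring the peak to target_dbfs."""
    return target_dbfs - peak_dbfs(samples)


def clipped_count(samples: list[int]) -> int:
    """Count samples stuck at the 16-bit limits; their true shape is lost."""
    return sum(1 for s in samples if s >= FULL_SCALE or s <= -FULL_SCALE - 1)


def cbr_seconds_at(byte_offset: int, bitrate_kbps: int) -> float:
    """Approximate time position of a byte offset in a constant-bit-rate stream.

    Ignores headers and tags. With a variable bit rate the number of bytes
    per second changes, so a linear mapping like this one drifts.
    """
    return byte_offset * 8 / (bitrate_kbps * 1000)
```

A recording whose peak sits around -20 dBFS needs roughly 19 dB of gain to reach -1 dBFS, which is easy to apply afterwards; a clipped recording offers no such fix. For storage planning: one hour at 192 kbit/s is about 86.4 MB (192,000 / 8 × 3,600 bytes), and at 320 kbit/s about 144 MB.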

Here you can hear an example in m4a format recorded on a smartphone: iPhone (download)
And here is an mp3 of the same situation: LS-12 (download)

3. Quiet, please: the room has a sound too

The choice of location or room is extremely important. However cozy a beer garden may be, its many background noises make it one of the least suitable places for a group recording if you want a reasonably efficient transcription. You should also avoid very large, reverberant rooms: as soon as participants speak at the same time, the recording turns into an indistinct “mush” that is hard to untangle. It is almost as important to keep background noise, such as street noise from open windows or sounds from neighboring rooms, out of the recording. Therefore:

  • Close windows and doors, and if possible air the room only intermittently.
  • Avoid outdoor recordings or, if they cannot be avoided, use a fur windshield and manual level control.
  • Keep the environment quiet. Background conversations or a café atmosphere, for example, are detrimental.
  • Choose a room that suits the size of the group and has as little reverberation as possible (carpets, curtains or a well-filled room help).
  • Make sure that quietly spoken participants are not seated too far from the recording device.

Supporting the conversation

Once you have completed the preliminary planning for the recording, there is still a lot you can do on the day itself to make the later transcription as painless as possible. This is especially true if speakers are to be identified and labeled consistently throughout (e.g. Anna always as B1, Peter always as B2, etc.). From our experience as a transcription service, we recommend the following:

  • Ask the participants to introduce themselves briefly in turn, giving their name and role. The recording then provides reference points that make it easier to identify speakers later (especially important when the transcription is outsourced).
  • Prepare a seating plan so that voices on the recording can later be located more easily based on the positions of the recording device, the moderators/interviewers and the other participants.
  • "Luxury" option: keep a speaker log noting the order in which people speak. CAUTION: this is only useful if it is kept consistently and in sufficient detail.
  • If the topic and/or the research question allows, moderate the conversation to minimize overlapping speech and, above all, chaotic cross-talk.
  • For larger groups, run a second recording device in parallel. This lets you re-listen to unclear passages recorded from a different position.
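Consistent speaker labels (Anna always B1, Peter always B2) amount to a small lookup table filled in the order of the introduction round, optionally annotated with each person's seat from the seating plan. The Python sketch below is purely illustrative: the `SpeakerRegistry` class and its methods are hypothetical and do not describe any particular transcription tool.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerRegistry:
    """Hypothetical helper: assigns B1, B2, ... in order of first appearance."""

    labels: dict[str, str] = field(default_factory=dict)  # name -> label
    seats: dict[str, str] = field(default_factory=dict)   # label -> seat note

    def register(self, name: str, seat: str = "") -> str:
        """Return the existing label for name, or assign the next free one."""
        if name not in self.labels:
            label = f"B{len(self.labels) + 1}"
            self.labels[name] = label
            if seat:
                self.seats[label] = seat
        return self.labels[name]

    def label_turn(self, name: str, text: str) -> str:
        """Format one transcript line using the speaker's fixed label."""
        return f"{self.register(name)}: {text}"


if __name__ == "__main__":
    reg = SpeakerRegistry()
    reg.register("Anna", seat="left of the recorder")   # introduction round
    reg.register("Peter", seat="opposite the moderator")
    print(reg.label_turn("Anna", "I agree with that."))
```

Because a label is assigned once and then only looked up, the same person keeps the same label for the whole transcript, which is exactly what the introduction round and seating plan are meant to support.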

Here is an example of such a moderated discussion: DM-650 on the table (download)

All recordings for comparison

For the following test recordings, we chose four commonly used recording devices: an Olympus DM-650 (automatic level control), an Olympus DS-7000 (dictation device recording in dss format), an iPhone 5S (m4a format) and an LS-12 (manual level control, mounted on a tripod). As a reference, we used the Sony PCM-D100, a comparatively expensive recorder that delivers very high quality.

In the first set of sound samples, there is a lot of background noise and the speakers frequently interrupt each other. The second group discussion is a moderated one. What differences do you hear between the recordings? What do you find distracting?

"Orderly" discussion

"Lively" discussion