<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://web.dev/</id>
  <title>Dale Curtis on web.dev</title>
  <updated>2026-04-15T23:21:06Z</updated>
  <author>
    <name>Dale Curtis</name>
  </author>
  <link href="https://web.dev/authors/dalecurtis/feed.xml" rel="self"/>
  <link href="https://web.dev/"/>
  <icon>https://web-dev.imgix.net/image/T4FyVKpzu4WKF1kBNvXepbi08t52/htLpTCWkW3Z6kcuo1Hef.jpeg?auto=format</icon>
  <logo>https://web.dev/images/shared/rss-banner.png</logo>
  <subtitle>Dale is a Senior Software Engineer</subtitle>
  
  
  <entry>
    <title>Media Source Extensions for Audio</title>
    <link href="https://web.dev/mse-seamless-playback/"/>
    <updated>2015-06-11T00:00:00Z</updated>
    <id>https://web.dev/mse-seamless-playback/</id>
    <content type="html" mode="escaped">&lt;h2 id=&quot;introduction&quot;&gt;Introduction &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#introduction&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://developer.mozilla.org/docs/Web/API/Media_Source_Extensions_API&quot; rel=&quot;noopener&quot;&gt;Media Source Extensions (MSE)&lt;/a&gt;
provide extended buffering and playback control for the HTML5 &lt;code&gt;&amp;lt;audio&amp;gt;&lt;/code&gt; and
&lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt; elements. While originally developed to facilitate
&lt;a href=&quot;http://dashif.org/about/&quot; rel=&quot;noopener&quot;&gt;Dynamic Adaptive Streaming over HTTP (DASH)&lt;/a&gt;
based video players, they can also be used for audio, as we&#39;ll see below, specifically for
&lt;a href=&quot;http://en.wikipedia.org/wiki/Gapless_playback&quot; rel=&quot;noopener&quot;&gt;gapless playback&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You&#39;ve likely listened to a music album where songs flowed seamlessly across
tracks; you may even be listening to one right now. Artists create these
&lt;a href=&quot;https://en.wikipedia.org/wiki/Gapless_playback&quot; rel=&quot;noopener&quot;&gt;gapless playback&lt;/a&gt; experiences
both as an artistic choice and as an artifact of
&lt;a href=&quot;https://en.wikipedia.org/wiki/Gramophone_record&quot; rel=&quot;noopener&quot;&gt;vinyl records&lt;/a&gt; and
&lt;a href=&quot;https://en.wikipedia.org/wiki/Compact_disc&quot; rel=&quot;noopener&quot;&gt;CDs&lt;/a&gt; where audio was written as one
continuous stream. Unfortunately, due to the way modern audio codecs like
&lt;a href=&quot;https://en.wikipedia.org/wiki/MP3&quot; rel=&quot;noopener&quot;&gt;MP3&lt;/a&gt; and
&lt;a href=&quot;https://en.wikipedia.org/wiki/Advanced_Audio_Coding&quot; rel=&quot;noopener&quot;&gt;AAC&lt;/a&gt; work, this seamless
aural experience is often lost today.&lt;/p&gt;
&lt;p&gt;We&#39;ll get into the details of why below, but for now let&#39;s start with a
demonstration. Below is the first thirty seconds of the excellent
&lt;a href=&quot;http://www.sintel.org/&quot; class=&quot;external&quot; rel=&quot;noopener&quot;&gt;Sintel&lt;/a&gt; chopped into five separate MP3
files and reassembled using MSE. The red lines indicate gaps introduced during
the creation (encoding) of each MP3; you&#39;ll hear glitches at these points.&lt;/p&gt;
&lt;p&gt;&lt;video&gt;      &lt;source src=&quot;https://storage.googleapis.com/web-dev-uploads/video/C47gYyWYVMMhDmtYSLOWazuyePF2/P8hasF93W3pJc3LXWDRc.webm&quot; type=&quot;video/webm&quot; /&gt;    &lt;/video&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://simpl.info/mse/audio/gap&quot; rel=&quot;noopener&quot;&gt;Demo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yuck! That&#39;s not a great experience; we can do better. With a little more work,
using the exact same MP3 files in the above demo, we can use MSE to remove those
annoying gaps. The green lines in the next demo indicate where the files have
been joined and the gaps removed. On Chrome 38+ this will play back seamlessly!&lt;/p&gt;
&lt;p&gt;&lt;video&gt;      &lt;source src=&quot;https://storage.googleapis.com/web-dev-uploads/video/C47gYyWYVMMhDmtYSLOWazuyePF2/jRdAuBMz51Lplg6RvUIF.webm&quot; type=&quot;video/webm&quot; /&gt;    &lt;/video&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://simpl.info/mse/audio/gapless&quot; rel=&quot;noopener&quot;&gt;Demo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are a
&lt;a href=&quot;https://web.dev/mse-seamless-playback/#appendix-a-creating-gapless-content&quot;&gt;variety of ways to create gapless content&lt;/a&gt;.
For the purposes of this demo, we&#39;ll focus on the type of files a normal user
might have lying around, where each file has been encoded separately without
regard for the audio segments before or after it.&lt;/p&gt;
&lt;h2 id=&quot;basic-setup&quot;&gt;Basic Setup &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#basic-setup&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, let&#39;s backtrack and cover the basic setup of a &lt;code&gt;MediaSource&lt;/code&gt; instance.
Media Source Extensions, as the name implies, are just extensions to the
existing media elements. Below, we&#39;re assigning an
&lt;a href=&quot;https://developer.mozilla.org/docs/Web/API/URL.createObjectURL&quot; rel=&quot;noopener&quot;&gt;&lt;code&gt;Object URL&lt;/code&gt;&lt;/a&gt;,
representing our &lt;code&gt;MediaSource&lt;/code&gt; instance, to the source attribute of an audio
element, just as you would set a standard URL.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; audio &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createElement&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;audio&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; mediaSource &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;MediaSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;SEGMENTS&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;mediaSource&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;addEventListener&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;sourceopen&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; sourceBuffer &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; mediaSource&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;addSourceBuffer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;audio/mpeg&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;onAudioLoaded&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; index&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Append the ArrayBuffer data into our new SourceBuffer.&lt;/span&gt;&lt;br /&gt;    sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;appendBuffer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Retrieve an audio segment via XHR.  For simplicity, we&#39;re retrieving the&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// entire segment at once, but we could also retrieve it in chunks and append&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// each chunk separately.  MSE will take care of assembling the pieces.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token constant&quot;&gt;GET&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;sintel/sintel_0.mp3&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;onAudioLoaded&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;audio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;src &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;URL&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createObjectURL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;mediaSource&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;Once the &lt;code&gt;MediaSource&lt;/code&gt; object is connected, it will perform some initialization
and eventually fire a &lt;code&gt;sourceopen&lt;/code&gt; event, at which point we can create a
&lt;a href=&quot;http://www.w3.org/TR/media-source/#sourcebuffer&quot; rel=&quot;noopener&quot;&gt;&lt;code&gt;SourceBuffer&lt;/code&gt;&lt;/a&gt;. In the
example above, we&#39;re creating an &lt;code&gt;audio/mpeg&lt;/code&gt; one, which is able to parse and
decode our MP3 segments; there are several
&lt;a href=&quot;http://www.w3.org/2013/12/byte-stream-format-registry/&quot; rel=&quot;noopener&quot;&gt;other types&lt;/a&gt; available.&lt;/p&gt;
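&lt;p&gt;The snippet above calls a &lt;code&gt;GET()&lt;/code&gt; helper that isn&#39;t shown. A minimal sketch of what such a helper might look like, assuming it simply wraps &lt;code&gt;XMLHttpRequest&lt;/code&gt; and hands the response to the callback as an &lt;code&gt;ArrayBuffer&lt;/code&gt;; the error handling is an illustrative addition rather than something taken from the demo:&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical GET() helper (an assumption, not the demo&#39;s actual code):
// fetches a URL and passes the raw bytes to the callback.
function GET(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open(&#39;GET&#39;, url, true);
  // Request binary data; appendBuffer() expects an ArrayBuffer or a view on one.
  xhr.responseType = &#39;arraybuffer&#39;;
  xhr.onload = function () {
    if (xhr.status !== 200) {
      console.error(&#39;Unexpected status &#39; + xhr.status + &#39; for &#39; + url);
      return;
    }
    callback(xhr.response);
  };
  xhr.onerror = function () {
    console.error(&#39;Network error while fetching &#39; + url);
  };
  xhr.send();
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;Because the response type is set to &lt;code&gt;arraybuffer&lt;/code&gt;, the value passed to the callback can be given to &lt;code&gt;appendBuffer()&lt;/code&gt; directly.&lt;/p&gt;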
&lt;h2 id=&quot;anomalous-waveforms&quot;&gt;Anomalous Waveforms &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#anomalous-waveforms&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We&#39;ll come back to the code in a moment, but let&#39;s now look more closely at the
file we&#39;ve just appended, specifically at the end of it. Below is a graph of
the last 3000 samples averaged across both channels from the
&lt;code&gt;sintel_0.mp3&lt;/code&gt;
track. Each pixel on the red line is a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Audio_bit_depth&quot; rel=&quot;noopener&quot;&gt;floating point sample&lt;/a&gt;
in the range of &lt;code&gt;[-1.0, 1.0]&lt;/code&gt;.&lt;/p&gt;
&lt;img alt=&quot;mp3 gap&quot; decoding=&quot;async&quot; height=&quot;300&quot; loading=&quot;lazy&quot; sizes=&quot;(min-width: 750px) 750px, calc(100vw - 48px)&quot; src=&quot;https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&quot; srcset=&quot;https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=200 200w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=228 228w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=260 260w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=296 296w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=338 338w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=385 385w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=439 439w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=500 500w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=571 571w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=650 650w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=741 741w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=845 845w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=964 964w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=1098 1098w, 
https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=1252 1252w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=1428 1428w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/lQBjRIWPBMcEe27W7lUk.png?auto=format&amp;w=1500 1500w&quot; width=&quot;750&quot; /&gt;
&lt;p&gt;What&#39;s with all those zero (silent) samples!? They&#39;re actually due to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Gapless_playback#Compression_artifacts&quot; rel=&quot;noopener&quot;&gt;compression artifacts&lt;/a&gt;
introduced during encoding. Almost every encoder introduces some type of
padding. In this case &lt;a href=&quot;http://lame.sourceforge.net/&quot; class=&quot;external&quot; rel=&quot;noopener&quot;&gt;LAME&lt;/a&gt; added
exactly 576 padding samples to the end of the file.&lt;/p&gt;
&lt;p&gt;In addition to the padding at the end, each file also had padding added to the
beginning. If we peek ahead at the
&lt;code&gt;sintel_1.mp3&lt;/code&gt;
track, we&#39;ll see that another 576 samples of padding exist at the front. The amount
of padding varies by encoder and content, but we know the exact values based on
&lt;a href=&quot;https://web.dev/mse-seamless-playback/#appendix-b-parsing-gapless-metadata&quot;&gt;&lt;code&gt;metadata&lt;/code&gt;&lt;/a&gt; included within each file.&lt;/p&gt;
&lt;img alt=&quot;mp3 gap end&quot; decoding=&quot;async&quot; height=&quot;300&quot; loading=&quot;lazy&quot; sizes=&quot;(min-width: 750px) 750px, calc(100vw - 48px)&quot; src=&quot;https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&quot; srcset=&quot;https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=200 200w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=228 228w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=260 260w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=296 296w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=338 338w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=385 385w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=439 439w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=500 500w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=571 571w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=650 650w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=741 741w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=845 845w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=964 964w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=1098 1098w, 
https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=1252 1252w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=1428 1428w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/YZNsLpiUn09qoM2K5MdE.png?auto=format&amp;w=1500 1500w&quot; width=&quot;750&quot; /&gt;
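&lt;p&gt;To get a feel for how long these runs of padding last, a sample count can be converted to seconds by dividing by the sample rate. The sketch below assumes a 44.1 kHz sample rate, which is not stated here and would normally be read from the file itself; &lt;code&gt;samplesToSeconds()&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Assumed sample rate; in practice, read it from the MP3 header.
var ASSUMED_SAMPLE_RATE = 44100;

// Hypothetical helper: converts a sample count to a duration in seconds.
function samplesToSeconds(sampleCount, sampleRate) {
  return sampleCount / sampleRate;
}

// 576 padding samples at the end of sintel_0.mp3 and 576 more at the
// start of sintel_1.mp3, as described above.
var endPadding = samplesToSeconds(576, ASSUMED_SAMPLE_RATE);
var frontPadding = samplesToSeconds(576, ASSUMED_SAMPLE_RATE);

// Total silence at the join between the two files, in seconds.
var gapSeconds = endPadding + frontPadding;
console.log((gapSeconds * 1000).toFixed(1) + &#39; ms of silence at the join&#39;);&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;Under that assumption, the two 576-sample runs add up to roughly 26 ms of silence at each join.&lt;/p&gt;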
&lt;p&gt;The sections of silence at the beginning and end of each file are what cause the
&lt;em&gt;glitches&lt;/em&gt; between segments in the previous demo. To achieve gapless playback,
we need to remove these sections of silence. Luckily, this is easily done with
&lt;code&gt;MediaSource&lt;/code&gt;. Below, we&#39;ll modify our &lt;code&gt;onAudioLoaded()&lt;/code&gt; method to use an
&lt;a href=&quot;https://w3c.github.io/media-source#definitions&quot; rel=&quot;noopener&quot;&gt;append window&lt;/a&gt; and a &lt;a href=&quot;https://w3c.github.io/media-source#definitions&quot; rel=&quot;noopener&quot;&gt;timestamp
offset&lt;/a&gt; to remove this silence.&lt;/p&gt;
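&lt;p&gt;Before turning to the full code, it may help to see the arithmetic on its own. The sketch below assumes each segment&#39;s metadata has already been parsed into &lt;code&gt;audioDuration&lt;/code&gt; and &lt;code&gt;frontPaddingDuration&lt;/code&gt; (both in seconds) and that every append ends exactly at the end of its append window; &lt;code&gt;planAppends()&lt;/code&gt; and the sample durations are hypothetical.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical helper: computes, for each segment in order, the append
// window and timestamp offset that would trim its padding.
function planAppends(segments) {
  var appendTime = 0;
  return segments.map(function (segment) {
    var plan = {
      // Keep only the real audio: from the end of the previous segment...
      appendWindowStart: appendTime,
      // ...to the end of this segment&#39;s non-padding audio.
      appendWindowEnd: appendTime + segment.audioDuration,
      // Place the front padding just before the window so it is discarded.
      timestampOffset: appendTime - segment.frontPaddingDuration
    };
    appendTime = plan.appendWindowEnd;
    return plan;
  });
}

// Made-up durations, purely for illustration.
var plans = planAppends([
  { audioDuration: 6.0, frontPaddingDuration: 0.025 },
  { audioDuration: 6.0, frontPaddingDuration: 0.025 }
]);
console.log(plans);&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;In a real player the start time for each append would come from the buffered range rather than from a running sum, since the browser decides exactly where the previous append ended.&lt;/p&gt;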
&lt;h2 id=&quot;example-code&quot;&gt;Example Code &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#example-code&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;onAudioLoaded&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; index&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Parsing gapless metadata is unfortunately nontrivial and a bit messy, so&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// we&#39;ll gloss over it here; see the appendix for details.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// ParseGaplessData() will return a dictionary with two elements:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;//&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;//    audioDuration: Duration in seconds of all non-padding audio.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;//    frontPaddingDuration: Duration in seconds of the front padding.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;//&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; gaplessMetadata &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ParseGaplessData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Each appended segment must be appended relative to the previous one.  To avoid any&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// overlaps, we&#39;ll use the end timestamp of the last append as the starting&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// point for our next append or zero if we haven&#39;t appended anything yet.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; appendTime &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;?&lt;/span&gt; sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;buffered&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Simply put, an append window allows you to trim off audio (or video) frames&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// which fall outside of a specified time range.  Here, we&#39;ll use the end of&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// our last append as the start of our append window and the end of the real&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// audio data for this segment as the end of our append window.&lt;/span&gt;&lt;br /&gt;  sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;appendWindowStart &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; appendTime&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;appendWindowEnd &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; appendTime &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; gaplessMetadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;audioDuration&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// The timestampOffset field essentially tells MediaSource where in the media&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// timeline the data given to appendBuffer() should be placed.  I.e., if the&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// timestampOffset is 1 second, the appended data will start 1 second into&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// playback.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;//&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// MediaSource requires that the media timeline starts from time zero, so we&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// need to ensure that the data left after filtering by the append window&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// starts at time zero.  We&#39;ll do this by shifting all of the padding we want&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// to discard before our append time (and thus, before our append window).&lt;/span&gt;&lt;br /&gt;  sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;timestampOffset &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;br /&gt;    appendTime &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; gaplessMetadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;frontPaddingDuration&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// When appendBuffer() completes, it will fire an updateend event signaling&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// that it&#39;s okay to append another segment of media.  Here, we&#39;ll chain the&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// append for the next segment to the completion of our current append.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;index &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;addEventListener&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;updateend&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;index &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;SEGMENTS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token constant&quot;&gt;GET&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;sintel/sintel_&#39;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&#39;.mp3&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token function&quot;&gt;onAudioLoaded&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// We&#39;ve loaded all available segments, so tell MediaSource there are no&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// more buffers which will be appended.&lt;/span&gt;&lt;br /&gt;        mediaSource&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;endOfStream&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token constant&quot;&gt;URL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;revokeObjectURL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;audio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;src&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// appendBuffer() will now use the timestamp offset and append window settings&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// to filter and timestamp the data we&#39;re appending.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;//&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Note: While this demo uses very little memory, more complex use cases need&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// to be careful about memory usage or garbage collection may remove ranges of&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// media in unexpected places.&lt;/span&gt;&lt;br /&gt;  sourceBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;appendBuffer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;h2 id=&quot;a-seamless-waveform&quot;&gt;A Seamless Waveform &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#a-seamless-waveform&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let&#39;s see what our shiny new code has accomplished by taking another look at the
waveform after we&#39;ve applied our append windows. Below, you can see that the
silent section at the end of
&lt;code&gt;sintel_0.mp3&lt;/code&gt;
(in red) and the silent section at the beginning of
&lt;code&gt;sintel_1.mp3&lt;/code&gt;
(in blue) have been removed, leaving us with a seamless transition between
segments.&lt;/p&gt;
&lt;img alt=&quot;mp3 mid&quot; decoding=&quot;async&quot; height=&quot;300&quot; loading=&quot;lazy&quot; sizes=&quot;(min-width: 750px) 750px, calc(100vw - 48px)&quot; src=&quot;https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&quot; srcset=&quot;https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=200 200w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=228 228w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=260 260w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=296 296w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=338 338w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=385 385w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=439 439w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=500 500w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=571 571w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=650 650w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=741 741w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=845 845w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=964 964w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=1098 1098w, 
https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=1252 1252w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=1428 1428w, https://web-dev.imgix.net/image/C47gYyWYVMMhDmtYSLOWazuyePF2/bmQVIc4ng7YwkD0mAXfm.png?auto=format&amp;w=1500 1500w&quot; width=&quot;750&quot; /&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#conclusion&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With that, we&#39;ve stitched all five segments seamlessly into one and have
subsequently reached the end of our demo. Before we go, you may have noticed
that our &lt;code&gt;onAudioLoaded()&lt;/code&gt; method has no consideration for containers or codecs.
That means all of these techniques will work irrespective of the container or
codec type. Below you can replay the original demo using DASH-ready fragmented
MP4 instead of MP3.&lt;/p&gt;
&lt;p&gt;&lt;video&gt;      &lt;source src=&quot;https://storage.googleapis.com/web-dev-uploads/video/C47gYyWYVMMhDmtYSLOWazuyePF2/EBCdw0qztJlW109jxf6x.webm&quot; type=&quot;video/webm&quot; /&gt;    &lt;/video&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://simpl.info/mse/audio/mp4gapless&quot; rel=&quot;noopener&quot;&gt;Demo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you&#39;d like to know more, check the appendices below for a deeper look at
gapless content creation and metadata parsing. You can also explore
&lt;a href=&quot;https://simpl.info/mse/audio/js/gapless.js&quot; rel=&quot;noopener&quot;&gt;&lt;code&gt;gapless.js&lt;/code&gt;&lt;/a&gt; for a closer look at
the code powering this demo.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;
&lt;h2 id=&quot;appendix-a-creating-gapless-content&quot;&gt;Appendix A: Creating Gapless Content &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#appendix-a-creating-gapless-content&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Creating gapless content can be hard to get right. Below we&#39;ll walk through
creation of the &lt;a href=&quot;http://www.sintel.org/&quot; class=&quot;external&quot; rel=&quot;noopener&quot;&gt;Sintel&lt;/a&gt; media used in
this demo. To start you&#39;ll need a copy of the
&lt;a href=&quot;http://media.xiph.org/sintel/Jan_Morgenstern-Sintel-FLAC.zip&quot; rel=&quot;noopener&quot;&gt;lossless FLAC soundtrack&lt;/a&gt;
for Sintel; for posterity, the SHA1 is included below. For tools, you&#39;ll need
&lt;a href=&quot;http://ffmpeg.org/&quot; rel=&quot;noopener&quot;&gt;FFmpeg&lt;/a&gt;, &lt;a href=&quot;http://gpac.wp.mines-telecom.fr/mp4box/&quot; rel=&quot;noopener&quot;&gt;MP4Box&lt;/a&gt;,
&lt;a href=&quot;http://lame.sourceforge.net/&quot; rel=&quot;noopener&quot;&gt;LAME&lt;/a&gt;, and an OS X installation with
&lt;a href=&quot;https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man1/afconvert.1.html&quot; rel=&quot;noopener&quot;&gt;afconvert&lt;/a&gt;.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    unzip Jan_Morgenstern-Sintel-FLAC.zip&lt;br /&gt;    sha1sum 1-Snow_Fight.flac&lt;br /&gt;    # 0535ca207ccba70d538f7324916a3f1a3d550194  1-Snow_Fight.flac&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;First, we&#39;ll split out the first 31.5 seconds of the &lt;code&gt;1-Snow_Fight.flac&lt;/code&gt; track. We
also want to add a 2.5 second fade out starting at 28 seconds in to avoid any
clicks once playback finishes. Using the FFmpeg command line below we can
accomplish all of this and put the results in &lt;code&gt;sintel.flac&lt;/code&gt;.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    ffmpeg -i 1-Snow_Fight.flac -t 31.5 -af &quot;afade=t=out:st=28:d=2.5&quot; sintel.flac&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;Next, we&#39;ll split the file into 5 &lt;a href=&quot;https://en.wikipedia.org/wiki/WAV&quot; rel=&quot;noopener&quot;&gt;wave&lt;/a&gt;
files of up to 6.5 seconds each; it&#39;s easiest to use wave since almost every encoder
supports ingestion of it. Again, we can do this precisely with FFmpeg, after
which we&#39;ll have: &lt;code&gt;sintel_0.wav&lt;/code&gt;, &lt;code&gt;sintel_1.wav&lt;/code&gt;, &lt;code&gt;sintel_2.wav&lt;/code&gt;,
&lt;code&gt;sintel_3.wav&lt;/code&gt;, and &lt;code&gt;sintel_4.wav&lt;/code&gt;.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    ffmpeg -i sintel.flac -acodec pcm_f32le -map 0 -f segment \&lt;br /&gt;           -segment_list out.list -segment_time 6.5 sintel_%d.wav&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;Next, let&#39;s create the MP3 files. LAME has several options for creating gapless
content. If you&#39;re in control of the content you might consider using &lt;code&gt;--nogap&lt;/code&gt;
with a batch encoding of all files to avoid padding between segments altogether.
For the purposes of this demo though, we want that padding so we&#39;ll use a
standard high quality VBR encoding of the wave files.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    lame -V=2 sintel_0.wav sintel_0.mp3&lt;br /&gt;    lame -V=2 sintel_1.wav sintel_1.mp3&lt;br /&gt;    lame -V=2 sintel_2.wav sintel_2.mp3&lt;br /&gt;    lame -V=2 sintel_3.wav sintel_3.mp3&lt;br /&gt;    lame -V=2 sintel_4.wav sintel_4.mp3&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;That&#39;s all that&#39;s necessary to create the MP3 files. Now let&#39;s cover the
creation of the fragmented MP4 files. We&#39;ll follow Apple&#39;s directions for
creating media which is
&lt;a href=&quot;http://www.apple.com/itunes/mastered-for-itunes/&quot; rel=&quot;noopener&quot;&gt;mastered for iTunes&lt;/a&gt;.
Below, we&#39;ll convert the wave files into intermediate
&lt;a href=&quot;https://en.wikipedia.org/wiki/Core_Audio_Format&quot; rel=&quot;noopener&quot;&gt;CAF&lt;/a&gt; files, per the
instructions, before encoding them as
&lt;a href=&quot;https://en.wikipedia.org/wiki/Advanced_Audio_Coding&quot; rel=&quot;noopener&quot;&gt;AAC&lt;/a&gt; in an
&lt;a href=&quot;https://en.wikipedia.org/wiki/MP4&quot; rel=&quot;noopener&quot;&gt;MP4&lt;/a&gt; container using the recommended
parameters.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    afconvert sintel_0.wav sintel_0_intermediate.caf -d 0 -f caff \&lt;br /&gt;              --soundcheck-generate&lt;br /&gt;    afconvert sintel_1.wav sintel_1_intermediate.caf -d 0 -f caff \&lt;br /&gt;              --soundcheck-generate&lt;br /&gt;    afconvert sintel_2.wav sintel_2_intermediate.caf -d 0 -f caff \&lt;br /&gt;              --soundcheck-generate&lt;br /&gt;    afconvert sintel_3.wav sintel_3_intermediate.caf -d 0 -f caff \&lt;br /&gt;              --soundcheck-generate&lt;br /&gt;    afconvert sintel_4.wav sintel_4_intermediate.caf -d 0 -f caff \&lt;br /&gt;              --soundcheck-generate&lt;br /&gt;    afconvert sintel_0_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \&lt;br /&gt;              -b 256000 -q 127 -s 2 sintel_0.m4a&lt;br /&gt;    afconvert sintel_1_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \&lt;br /&gt;              -b 256000 -q 127 -s 2 sintel_1.m4a&lt;br /&gt;    afconvert sintel_2_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \&lt;br /&gt;              -b 256000 -q 127 -s 2 sintel_2.m4a&lt;br /&gt;    afconvert sintel_3_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \&lt;br /&gt;              -b 256000 -q 127 -s 2 sintel_3.m4a&lt;br /&gt;    afconvert sintel_4_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \&lt;br /&gt;              -b 256000 -q 127 -s 2 sintel_4.m4a&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;We now have several M4A files which we need to
&lt;a href=&quot;http://gpac.wp.mines-telecom.fr/mp4box/dash/&quot; rel=&quot;noopener&quot;&gt;fragment&lt;/a&gt;
appropriately before they can be used with
&lt;code&gt;MediaSource&lt;/code&gt;. For our purposes, we&#39;ll use a fragment size of one second. MP4Box
will write out each fragmented MP4 as &lt;code&gt;sintel_#_dashinit.mp4&lt;/code&gt; along with an
MPEG-DASH manifest (&lt;code&gt;sintel_#_dash.mpd&lt;/code&gt;) which can be discarded.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    MP4Box -dash 1000 sintel_0.m4a &amp;&amp; mv sintel_0_dashinit.mp4 sintel_0.mp4&lt;br /&gt;    MP4Box -dash 1000 sintel_1.m4a &amp;&amp; mv sintel_1_dashinit.mp4 sintel_1.mp4&lt;br /&gt;    MP4Box -dash 1000 sintel_2.m4a &amp;&amp; mv sintel_2_dashinit.mp4 sintel_2.mp4&lt;br /&gt;    MP4Box -dash 1000 sintel_3.m4a &amp;&amp; mv sintel_3_dashinit.mp4 sintel_3.mp4&lt;br /&gt;    MP4Box -dash 1000 sintel_4.m4a &amp;&amp; mv sintel_4_dashinit.mp4 sintel_4.mp4&lt;br /&gt;    rm sintel_{0,1,2,3,4}_dash.mpd&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;That&#39;s it! We now have fragmented MP4 and MP3 files with the correct metadata
necessary for gapless playback. See Appendix B for more details on just what
that metadata looks like.&lt;/p&gt;
&lt;h2 id=&quot;appendix-b-parsing-gapless-metadata&quot;&gt;Appendix B: Parsing Gapless Metadata &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#appendix-b-parsing-gapless-metadata&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Just like creating gapless content, parsing the gapless metadata can be tricky
since there&#39;s no standard method for storage. Below we&#39;ll cover how the two most
common encoders, LAME and iTunes, store their gapless metadata. Let&#39;s start by
setting up some helper methods and an outline for the &lt;code&gt;ParseGaplessData()&lt;/code&gt; method used
above.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;// Since most MP3 encoders store the gapless metadata in binary, we&#39;ll need a&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// method for turning bytes into integers.  Note: This doesn&#39;t work for values&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// larger than 2^30 since we&#39;ll overflow the signed integer type when shifting.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ReadInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; buffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;charCodeAt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; buffer&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        result &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        result &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; buffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;charCodeAt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ParseGaplessData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;arrayBuffer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// Gapless data is generally within the first 512 bytes, so limit parsing.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; byteStr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token 
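keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Uint8Array&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;arrayBuffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token comment&quot;&gt;// Map each byte to one character so charCodeAt() returns raw byte values;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token comment&quot;&gt;// decoding as UTF-8 text would alter bytes above 0x7F.&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;reduce&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;str&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; code&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; str &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; String&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;fromCharCode&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;code&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&#39;&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; frontPadding &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; endPadding &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; realSamples &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// ... we&#39;ll fill this in as we go below.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;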
&lt;/div&gt;&lt;p&gt;We&#39;ll cover Apple&#39;s iTunes metadata format first since it&#39;s the easiest to parse
and explain. Within MP3 and M4A files, iTunes (and afconvert) write a short
section in ASCII like so:&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;    iTunSMPB[ 26 bytes ]0000000 00000840 000001C0 0000000000046E00&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;This is written inside an ID3 tag within the MP3 container and within a metadata
atom inside the MP4 container. For our purposes, we can ignore the first
&lt;code&gt;0000000&lt;/code&gt; token. The next three tokens are hexadecimal values for the front
padding, end padding, and total non-padding sample count. Dividing each of these
by the sample rate of the audio gives us the duration of each.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// iTunes encodes the gapless data as hex strings like so:&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;//&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;//    &#39;iTunSMPB[ 26 bytes ]0000000 00000840 000001C0 0000000000046E00&#39;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;//    &#39;iTunSMPB[ 26 bytes ]####### frontpad  endpad    real samples&#39;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;//&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// The approach here elides the complexity of actually parsing MP4 atoms. It&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// may not work for all files without some tweaks.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; iTunesDataIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;indexOf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;iTunSMPB&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;iTunesDataIndex &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; frontPaddingIndex 
&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; iTunesDataIndex &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;34&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  frontPadding &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parseInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;frontPaddingIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; endPaddingIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; frontPaddingIndex &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  endPadding &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parseInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;endPaddingIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token 
number&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; sampleCountIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; endPaddingIndex &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  realSamples &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parseInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sampleCountIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
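&lt;p&gt;As a worked example of that arithmetic, the sketch below converts the sample
tokens shown above into sample counts and durations. The helper name
&lt;code&gt;GaplessTokensToSeconds()&lt;/code&gt; and the 44.1 kHz sample rate are assumptions
made for illustration; substitute the real sample rate of your media.&lt;/p&gt;

```javascript
// Hypothetical helper: converts iTunSMPB hex tokens into durations in seconds.
// sampleRate is an assumption supplied by the caller (44100 is used below).
function GaplessTokensToSeconds(frontPadHex, endPadHex, realSamplesHex, sampleRate) {
  var frontPadding = parseInt(frontPadHex, 16);
  var endPadding = parseInt(endPadHex, 16);
  var realSamples = parseInt(realSamplesHex, 16);
  return {
    frontPadding: frontPadding,
    endPadding: endPadding,
    realSamples: realSamples,
    frontPaddingDuration: frontPadding / sampleRate,
    endPaddingDuration: endPadding / sampleRate,
    audioDuration: realSamples / sampleRate
  };
}

// Tokens from the example string above, assuming 44.1 kHz audio.
var info = GaplessTokensToSeconds('00000840', '000001C0', '0000000000046E00', 44100);
console.log(info.frontPadding, info.endPadding, info.realSamples);
// 2112 448 290304
```

&lt;p&gt;Under that assumed sample rate, 2112 samples of front padding works out to
roughly 48 milliseconds of audio to trim.&lt;/p&gt;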
&lt;/div&gt;&lt;p&gt;On the flip side, most open source MP3 encoders will store the gapless metadata
within a special &lt;a href=&quot;http://gabriel.mp3-tech.org/mp3infotag.html&quot; rel=&quot;noopener&quot;&gt;Xing header&lt;/a&gt;
placed inside of a silent MPEG frame (it&#39;s silent so decoders which don&#39;t
understand the Xing header will simply play silence). Sadly this tag is not
always present and has a number of optional fields. For the purposes of this
demo, we have control over the media, but in practice some additional sanity
checks will be required to know when gapless metadata is actually available.&lt;/p&gt;
&lt;p&gt;First we&#39;ll parse the total sample count. For simplicity we&#39;ll read this from
the Xing header, but it could be constructed from the normal
&lt;a href=&quot;http://www.codeproject.com/Articles/8295/MPEG-Audio-Frame-Header&quot; rel=&quot;noopener&quot;&gt;MPEG audio header&lt;/a&gt;.
Xing headers can be marked by either a &lt;code&gt;Xing&lt;/code&gt; or &lt;code&gt;Info&lt;/code&gt; tag. Exactly 4 bytes
after this tag there are 32 bits representing the total number of frames in the
file; multiplying this value by the number of samples per frame will give us the
total samples in the file.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;// Xing padding is encoded as 24bits within the header.  Note: This code will&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// only work for Layer3 Version 1 and Layer2 MP3 files with XING frame counts&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// and gapless information.  See the following document for more details:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// http://www.codeproject.com/Articles/8295/MPEG-Audio-Frame-Header&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; xingDataIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;indexOf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;Xing&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;xingDataIndex &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; xingDataIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;indexOf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;Info&#39;&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;xingDataIndex &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// See section 2.3.1 in the link above for the specifics on parsing the Xing&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// frame count.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; frameCountIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; xingDataIndex &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; frameCount &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ReadInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;frameCountIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token 
comment&quot;&gt;// For Layer3 Version 1 and Layer2 there are 1152 samples per frame.  See&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// section 2.1.5 in the link above for more details.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; paddedSamples &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; frameCount &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1152&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// ... we&#39;ll cover this below.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
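&lt;p&gt;The frame count can also be read straight from the bytes. The sketch below is
one possible alternative, not the demo&#39;s method: it uses the standard
&lt;code&gt;DataView.getUint32()&lt;/code&gt; call, which reads an unsigned big-endian value and so
sidesteps the signed overflow noted for &lt;code&gt;ReadInt()&lt;/code&gt;. The helper name
&lt;code&gt;ReadFrameCount()&lt;/code&gt; and the sample bytes are made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: reads the Xing frame count as an unsigned 32-bit
// big-endian integer located frameCountIndex bytes into the buffer.
function ReadFrameCount(arrayBuffer, frameCountIndex) {
  // getUint32() reads big-endian when the littleEndian argument is omitted.
  return new DataView(arrayBuffer).getUint32(frameCountIndex);
}

// Made-up 12 byte buffer: 'Xing', 4 flag bytes, then a frame count of 250.
var xingBytes = new Uint8Array([0x58, 0x69, 0x6E, 0x67, 0, 0, 0, 0x0F, 0, 0, 0, 0xFA]);
var exampleFrameCount = ReadFrameCount(xingBytes.buffer, 8);

// 1152 samples per frame, as for Layer3 Version 1 and Layer2 above.
var examplePaddedSamples = exampleFrameCount * 1152;
console.log(exampleFrameCount, examplePaddedSamples);
// 250 288000
```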
&lt;/div&gt;&lt;p&gt;Now that we have the total number of samples we can move on to reading out the
number of padding samples. Depending on your encoder this may be written under a
LAME or Lavf tag nested in the Xing header. Exactly 17 bytes after this 4-byte
tag (21 bytes from its start) there are 3 bytes holding the front and end padding,
12 bits each.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;        xingDataIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;indexOf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;LAME&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;xingDataIndex &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; xingDataIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;indexOf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;Lavf&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;xingDataIndex &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token comment&quot;&gt;// See 
http://gabriel.mp3-tech.org/mp3infotag.html#delays for details of&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token comment&quot;&gt;// how this information is encoded and parsed.&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; gaplessDataIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; xingDataIndex &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; gaplessBits &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ReadInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;byteStr&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;gaplessDataIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;          &lt;span class=&quot;token comment&quot;&gt;// Upper 12 bits are the front padding, lower are the end padding.&lt;/span&gt;&lt;br /&gt;          frontPadding &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; gaplessBits &lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          endPadding &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; gaplessBits &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span 
class=&quot;token number&quot;&gt;0xFFF&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;        realSamples &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; paddedSamples &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;frontPadding &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; endPadding&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token literal-property property&quot;&gt;audioDuration&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; realSamples &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;SECONDS_PER_SAMPLE&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token literal-property property&quot;&gt;frontPaddingDuration&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; frontPadding &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;SECONDS_PER_SAMPLE&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;With that we have a complete function for parsing the vast majority of gapless
content. Edge cases certainly abound though, so caution is recommended before
using similar code in production.&lt;/p&gt;
&lt;h2 id=&quot;appendix-c-on-garbage-collection&quot;&gt;Appendix C: On Garbage Collection &lt;a class=&quot;headline-link&quot; href=&quot;https://web.dev/mse-seamless-playback/#appendix-c-on-garbage-collection&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Memory belonging to &lt;code&gt;SourceBuffer&lt;/code&gt; instances is actively
&lt;a href=&quot;https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)&quot; rel=&quot;noopener&quot;&gt;garbage collected&lt;/a&gt;
according to content type, platform-specific limits, and the current play
position. In Chrome, memory is first reclaimed from already played buffers.
However, if memory usage exceeds platform-specific limits, Chrome will reclaim
memory from unplayed buffers.&lt;/p&gt;
&lt;p&gt;When playback reaches a gap in the timeline due to reclaimed memory, it may
glitch if the gap is small enough or stall completely if the gap is too large.
Neither is a great user experience, so it&#39;s important to avoid appending too
much data at once and to manually remove ranges from the media timeline that are
no longer necessary.&lt;/p&gt;
&lt;p&gt;Ranges can be removed via the
&lt;a href=&quot;https://w3c.github.io/media-source/#widl-SourceBuffer-remove-void-double-start-unrestricted-double-end&quot; rel=&quot;noopener&quot;&gt;&lt;code&gt;remove()&lt;/code&gt;&lt;/a&gt;
method on each &lt;code&gt;SourceBuffer&lt;/code&gt;, which takes a &lt;code&gt;[start, end]&lt;/code&gt; range in seconds.
Similar to &lt;code&gt;appendBuffer()&lt;/code&gt;, each &lt;code&gt;remove()&lt;/code&gt; will fire an &lt;code&gt;updateend&lt;/code&gt; event once
it completes. Other removes or appends should not be issued until the event
fires.&lt;/p&gt;
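&lt;p&gt;As a minimal sketch of that sequencing, the hypothetical &lt;code&gt;removeRange()&lt;/code&gt; helper
below (not part of the original demo) wraps &lt;code&gt;remove()&lt;/code&gt; in a promise that settles
once &lt;code&gt;updateend&lt;/code&gt; fires. It assumes a browser with promise support and refuses to
start while the buffer&#39;s &lt;code&gt;updating&lt;/code&gt; attribute is still true.&lt;/p&gt;
&lt;div&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical helper, not from the original demo: removes the given range&lt;br /&gt;// (in seconds) and resolves once the updateend event fires.&lt;br /&gt;function removeRange(sourceBuffer, start, end) {&lt;br /&gt;  return new Promise(function(resolve, reject) {&lt;br /&gt;    // Another append or remove is still running; the caller should wait.&lt;br /&gt;    if (sourceBuffer.updating) {&lt;br /&gt;      reject(new Error(&#39;SourceBuffer is still updating&#39;));&lt;br /&gt;      return;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    function onUpdateEnd() {&lt;br /&gt;      sourceBuffer.removeEventListener(&#39;updateend&#39;, onUpdateEnd);&lt;br /&gt;      resolve();&lt;br /&gt;    }&lt;br /&gt;    sourceBuffer.addEventListener(&#39;updateend&#39;, onUpdateEnd);&lt;br /&gt;&lt;br /&gt;    try {&lt;br /&gt;      sourceBuffer.remove(start, end);&lt;br /&gt;    } catch (e) {&lt;br /&gt;      // remove() throws synchronously on invalid arguments or state.&lt;br /&gt;      sourceBuffer.removeEventListener(&#39;updateend&#39;, onUpdateEnd);&lt;br /&gt;      reject(e);&lt;br /&gt;    }&lt;br /&gt;  });&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;p&gt;For example, &lt;code&gt;removeRange(sourceBuffer, 0, audio.currentTime - 30)&lt;/code&gt; would drop
everything more than 30 seconds behind the play position once at least 30 seconds
have played. Here &lt;code&gt;audio&lt;/code&gt; stands for the media element, and the 30 second window
is an arbitrary choice for illustration.&lt;/p&gt;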
&lt;p&gt;On desktop Chrome, you can keep approximately 12 megabytes of audio content and
150 megabytes of video content in memory at once. You should not rely on these
values across browsers or platforms; e.g., they are most certainly not
representative of mobile devices.&lt;/p&gt;
&lt;p&gt;Garbage collection affects only data added to &lt;code&gt;SourceBuffer&lt;/code&gt; instances; these limits
do not apply to data you keep buffered in JavaScript variables. You may
also re-append the same data in the same position if necessary.&lt;/p&gt;
</content>
    <author>
      <name>Dale Curtis</name>
    </author>
  </entry>
</feed>
