This article is a summary of the presentation available here. The resulting demo file can be downloaded at the end of the article. The repository can be found at

Introduction

SingleFile, a web archiving tool, usually stores the resources of a web page as data URIs. However, this approach can be inefficient for large resources, because binary data has to be base64-encoded, which makes it about a third larger. A more elegant solution combines the flexible structure of the ZIP format with HTML. We will then go one step further and wrap this entire structure in a PNG file.

The Power of ZIP

The ZIP format provides an organized structure for storing multiple files. It consists of file entries followed by a central directory. The central directory acts as a table of contents and contains a header with metadata for each file entry, including its name, size, CRC-32 checksum, and offset. What makes ZIP so versatile is its flexibility in data placement. Since readers locate the central directory by searching backward from the end of the file, data can be prepended to the ZIP content by giving the first file entry an offset greater than 0, and up to 64 KB of data (65,535 bytes, stored as the ZIP comment) can be appended after it. This makes the format well suited to creating polyglot files.
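The backward search that makes prepended and appended data harmless can be sketched in JavaScript as follows. This is a minimal illustration, not code from the project: the function name <code class="language-plaintext highlighter-rouge">findEndOfCentralDirectory</code> is hypothetical, the field offsets follow the End of Central Directory (EOCD) record of the ZIP format, and ZIP64 archives are ignored.

```javascript
// Sketch (hypothetical helper): locate the End of Central Directory (EOCD)
// record of a ZIP archive that may be preceded by arbitrary data and
// followed by a comment. ZIP64 is not handled.
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as a little-endian uint32
const EOCD_FIXED_SIZE = 22;        // size of the EOCD record without its comment
const MAX_COMMENT_SIZE = 0xffff;   // the comment length is a 16-bit field

function findEndOfCentralDirectory(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const lowest = Math.max(0, bytes.length - EOCD_FIXED_SIZE - MAX_COMMENT_SIZE);
  // Scan backward from the last position where the record could start.
  for (let offset = bytes.length - EOCD_FIXED_SIZE; offset >= lowest; offset--) {
    if (view.getUint32(offset, true) !== EOCD_SIGNATURE) continue;
    const commentLength = view.getUint16(offset + 20, true);
    // Reject false positives (e.g. a signature inside the comment): the
    // comment must end exactly at the end of the file.
    if (offset + EOCD_FIXED_SIZE + commentLength !== bytes.length) continue;
    return {
      offset,
      entryCount: view.getUint16(offset + 10, true),
      centralDirectorySize: view.getUint32(offset + 12, true),
      // Absolute offset from the start of the file, so it already accounts
      // for any data prepended before the archive.
      centralDirectoryOffset: view.getUint32(offset + 16, true),
      commentLength,
    };
  }
  return null; // not a ZIP archive, or the comment is too long
}
```

Because the central directory offset is absolute, data placed before the archive only has to be counted in the offsets, while the comment absorbs the data placed after it.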

Creating HTML/ZIP Polyglot Files

Building on this, we can create a self-extracting archive that works in a web browser. The page to display and its resources are stored in a ZIP file. By storing the ZIP data in an HTML comment, we can build a self-extracting page that extracts and displays the content of the ZIP file.
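A minimal JavaScript sketch of how such a file could be assembled is shown below, assuming a single uncompressed ("stored") entry and no ZIP64. The HTML preceding the ZIP data (ending with the opening of an HTML comment) is written first, so the first entry's offset equals its length, and the HTML following the ZIP data (starting with the comment's closing delimiter) is written as the ZIP comment. The names <code class="language-plaintext highlighter-rouge">buildHtmlZipPolyglot</code> and <code class="language-plaintext highlighter-rouge">crc32</code> are hypothetical; a real implementation would more likely use a ZIP library with compression.

```javascript
// Sketch (hypothetical helpers): place a one-entry, uncompressed ZIP archive
// between two HTML fragments. `prefix` ends by opening an HTML comment;
// `suffix` starts by closing it and is stored as the ZIP comment.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

function buildHtmlZipPolyglot(prefix, name, data, suffix) {
  if (suffix.length > 0xffff) throw new RangeError("the ZIP comment is limited to 65535 bytes");
  const nameBytes = new TextEncoder().encode(name);
  const crc = crc32(data);
  const localOffset = prefix.length; // > 0: the entry starts after the HTML prefix
  const centralOffset = localOffset + 30 + nameBytes.length + data.length;
  const centralSize = 46 + nameBytes.length;
  const eocdOffset = centralOffset + centralSize;
  const out = new Uint8Array(eocdOffset + 22 + suffix.length);
  const view = new DataView(out.buffer);
  out.set(prefix, 0);

  // Local file header, then the file name and the raw data.
  // Flags, method (0 = stored), time and extra field length stay 0.
  view.setUint32(localOffset, 0x04034b50, true);
  view.setUint16(localOffset + 4, 10, true);           // version needed: 1.0
  view.setUint16(localOffset + 12, 0x0021, true);      // DOS date 1980-01-01
  view.setUint32(localOffset + 14, crc, true);
  view.setUint32(localOffset + 18, data.length, true); // compressed size
  view.setUint32(localOffset + 22, data.length, true); // uncompressed size
  view.setUint16(localOffset + 26, nameBytes.length, true);
  out.set(nameBytes, localOffset + 30);
  out.set(data, localOffset + 30 + nameBytes.length);

  // Central directory header pointing back at the local header.
  view.setUint32(centralOffset, 0x02014b50, true);
  view.setUint16(centralOffset + 4, 20, true);         // version made by
  view.setUint16(centralOffset + 6, 10, true);         // version needed
  view.setUint16(centralOffset + 14, 0x0021, true);    // DOS date
  view.setUint32(centralOffset + 16, crc, true);
  view.setUint32(centralOffset + 20, data.length, true);
  view.setUint32(centralOffset + 24, data.length, true);
  view.setUint16(centralOffset + 28, nameBytes.length, true);
  view.setUint32(centralOffset + 42, localOffset, true); // offset of the entry
  out.set(nameBytes, centralOffset + 46);

  // End of central directory record, followed by the comment (the suffix).
  view.setUint32(eocdOffset, 0x06054b50, true);
  view.setUint16(eocdOffset + 8, 1, true);             // entries on this disk
  view.setUint16(eocdOffset + 10, 1, true);            // total entries
  view.setUint32(eocdOffset + 12, centralSize, true);
  view.setUint32(eocdOffset + 16, centralOffset, true);
  view.setUint16(eocdOffset + 20, suffix.length, true);
  out.set(suffix, eocdOffset + 22);
  return out;
}
```

The only ZIP-specific consequence of the HTML prefix is the non-zero entry offset; the HTML suffix costs nothing beyond the 65,535-byte limit of the comment.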

Here is the basic structure of the self-extracting page:

<div class="language-html highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
  &lt;head&gt;
    &lt;meta charset="utf-8"&gt;
    &lt;title&gt;Please wait...&lt;/title&gt;
    &lt;script src="lib/zip.min.js"&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p&gt;Please wait...&lt;/p&gt;
    &lt;!-- (ZIP data) --&gt;
    &lt;script&gt;&lt;!-- Content of assets/main.js --&gt;&lt;/script&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</div>
</div>
The <code class="language-plaintext highlighter-rouge">assets/main.js</code> script on this “bootstrap page” reads the ZIP data by calling <code class="language-plaintext highlighter-rouge">fetch("")</code>, which requests the page's own URL, and uses the <code class="language-plaintext highlighter-rouge">lib/zip.min.js</code> JavaScript library to extract it. The bootstrap page is then replaced by the extracted page and its resources. However, there is a problem: because of the same-origin policy, retrieving the ZIP data with <code class="language-plaintext highlighter-rouge">fetch("")</code> fails when the page is opened from the filesystem (except in Firefox).
<h2 id="reading-zip-data-from-the-dom">Reading ZIP Data from the DOM</h2>
To overcome this filesystem limitation, the ZIP data can be read directly from the DOM instead. This approach requires careful handling of character encoding. The bootstrap page is now encoded in <code class="language-plaintext highlighter-rouge">windows-1252</code>, which maps every byte value to a character, so the data can be read back from the DOM with minimal degradation. A few encoding challenges remain:
<ol>
<li>DOM text content is exposed as <code class="language-plaintext highlighter-rouge">UTF-16</code> strings rather than <code class="language-plaintext highlighter-rouge">windows-1252</code> bytes, so most bytes between <code class="language-plaintext highlighter-rouge">0x80</code> and <code class="language-plaintext highlighter-rouge">0x9F</code> come back as different code points (e.g. <code class="language-plaintext highlighter-rouge">0x80</code> becomes <code class="language-plaintext highlighter-rouge">U+20AC</code>, €)</li>
<li>The <code class="language-plaintext highlighter-rouge">NULL</code> character (<code class="language-plaintext highlighter-rouge">U+0000</code>) is replaced by the replacement character (<code class="language-plaintext highlighter-rouge">U+FFFD</code>)</li>
<li>Carriage returns (<code class="language-plaintext highlighter-rouge">\r</code>) and carriage return + line feed pairs (<code class="language-plaintext highlighter-rouge">\r\n</code>) are normalized to line feeds (<code class="language-plaintext highlighter-rouge">\n</code>)</li>
</ol>
The first two issues can be fixed with an association table that maps characters back to their <code class="language-plaintext highlighter-rouge">windows-1252</code> byte values. For the last one, “consolidation data” is added to the bootstrap page in a JSON script tag. This data records the offsets of carriage returns and carriage return + line feed pairs, which allows the original content to be reconstructed accurately when the ZIP data is extracted.
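The decoding side can be sketched as follows. This is an illustration under stated assumptions, not the project's <code class="language-plaintext highlighter-rouge">main.js</code>: the helper names <code class="language-plaintext highlighter-rouge">restoreNewlines</code> and <code class="language-plaintext highlighter-rouge">zipTextToBytes</code> are hypothetical, and the consolidation data is assumed to be an object <code class="language-plaintext highlighter-rouge">{ cr: [...], crlf: [...] }</code> listing the indexes, in the text read from the DOM, of each line feed that was originally a carriage return or a CR LF pair. The association table follows the <code class="language-plaintext highlighter-rouge">windows-1252</code> index of the WHATWG Encoding Standard.

```javascript
// Sketch (hypothetical helpers): rebuild the original bytes from the text of
// the HTML comment that holds the ZIP data, in a page decoded as windows-1252.

// Association table for the windows-1252 bytes 0x80-0x9F whose character is
// not the code point of the same value. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D,
// like 0x00-0x7F and 0xA0-0xFF, decode to the code point of the same value.
const WINDOWS_1252_TO_BYTE = new Map([
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
]);

// Put back the "\r" and "\r\n" sequences that the HTML parser turned into "\n".
// Assumed consolidation format: { cr: [indexes], crlf: [indexes] }.
function restoreNewlines(text, { cr, crlf }) {
  const crIndexes = new Set(cr);
  const crlfIndexes = new Set(crlf);
  let result = "";
  for (let i = 0; i < text.length; i++) {
    if (crIndexes.has(i)) result += "\r";
    else if (crlfIndexes.has(i)) result += "\r\n";
    else result += text[i];
  }
  return result;
}

function zipTextToBytes(text, consolidation) {
  const restored = restoreNewlines(text, consolidation);
  const bytes = new Uint8Array(restored.length); // one character per byte
  for (let i = 0; i < restored.length; i++) {
    const code = restored.charCodeAt(i);
    // windows-1252 decoding never produces U+FFFD, so it can only stand for
    // a NULL byte replaced by the HTML parser.
    if (code === 0xfffd) bytes[i] = 0x00;
    else if (code <= 0xff) bytes[i] = code; // same value as the byte
    else if (WINDOWS_1252_TO_BYTE.has(code)) bytes[i] = WINDOWS_1252_TO_BYTE.get(code);
    else throw new Error(`unexpected character U+${code.toString(16)}`);
  }
  return bytes;
}
```

In the bootstrap page, <code class="language-plaintext highlighter-rouge">text</code> would presumably be the <code class="language-plaintext highlighter-rouge">data</code> of the comment node and the consolidation data the result of <code class="language-plaintext highlighter-rouge">JSON.parse()</code> on the JSON script tag; how the project wires these together is not shown here.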
Here is the resulting structure:
<div class="language-html highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
  &lt;head&gt;
    &lt;meta charset="windows-1252"&gt;
    &lt;title&gt;Please wait...&lt;/title&gt;
    &lt;script src="lib/zip.min.js"&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p&gt;Please wait...&lt;/p&gt;
    &lt;!-- (ZIP data) --&gt;
    &lt;script type="application/json"&gt;
    (consolidation data)
    &lt;/script&gt;
    &lt;script&gt;&lt;!-- Content of assets/main.js --&gt;&lt;/script&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</div>
</div>
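On the build side, the consolidation data embedded in the <code class="language-plaintext highlighter-rouge">application/json</code> script tag could be computed from the raw ZIP bytes as sketched below. The helper name <code class="language-plaintext highlighter-rouge">computeConsolidationData</code> and the <code class="language-plaintext highlighter-rouge">{ cr, crlf }</code> format are assumptions made for this illustration; each index is the position of the resulting line feed once the HTML parser has normalized newlines.

```javascript
// Sketch (hypothetical helper): record where the HTML parser will turn
// "\r" or "\r\n" into "\n", so that the reader can undo it.
// Assumed format: { cr: [...], crlf: [...] }, indexes into the normalized text.
function computeConsolidationData(bytes) {
  const cr = [];
  const crlf = [];
  let normalizedIndex = 0; // position in the text as the DOM will expose it
  for (let i = 0; i < bytes.length; i++, normalizedIndex++) {
    if (bytes[i] !== 0x0d) continue; // not a carriage return
    if (bytes[i + 1] === 0x0a) {
      crlf.push(normalizedIndex);
      i++; // "\r\n" collapses into a single "\n": skip the line feed
    } else {
      cr.push(normalizedIndex);
    }
  }
  return { cr, crlf };
}
```

<code class="language-plaintext highlighter-rouge">JSON.stringify(computeConsolidationData(zipBytes))</code> would then provide the content of the JSON script tag.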
<h2 id="adding-png-to-the-mix">Adding PNG to the Mix</h2>
The PNG format consists of a signature followed by chunks. Each chunk contains the following fields:
<ul>
<li>Length (4 bytes)</li>
<li>Type identifier (4 bytes) e.g., <code class="language-plaintext highlighter-rouge">IHDR</code> (header), <code class="language-plaintext highlighter-rouge">IDAT</code> (data), <code class="language-plaintext highlighter-rouge">IEND</code> (end of file), <code class="language-plaintext highlighter-rouge">tEXt</code> (custom data)…</li>
<li>Data content (n bytes)</li>
<li>CRC-32 checksum of the type and data fields (4 bytes)</li>
</ul>
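The chunk layout can be illustrated with a small JavaScript sketch. <code class="language-plaintext highlighter-rouge">makePngChunk</code> and <code class="language-plaintext highlighter-rouge">crc32</code> are hypothetical helper names, not functions from the project; as in the PNG specification, the length is a big-endian unsigned 32-bit integer that counts only the data bytes, and the CRC covers the type and data fields.

```javascript
// Sketch (hypothetical helpers): serialize one PNG chunk.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) {
      // Reflected CRC-32 (polynomial 0xEDB88320), as used by PNG and ZIP.
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

function makePngChunk(type, data = new Uint8Array(0)) {
  const typeBytes = new TextEncoder().encode(type); // 4 ASCII letters
  if (typeBytes.length !== 4) throw new Error("chunk type must be 4 bytes");
  const chunk = new Uint8Array(12 + data.length);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, data.length); // length of the data only (big-endian)
  chunk.set(typeBytes, 4);
  chunk.set(data, 8);
  // The CRC covers the type and data fields, not the length.
  view.setUint32(8 + data.length, crc32(chunk.subarray(4, 8 + data.length)));
  return chunk;
}
```

For example, <code class="language-plaintext highlighter-rouge">makePngChunk("IEND")</code> yields the 12 bytes <code class="language-plaintext highlighter-rouge">00 00 00 00 49 45 4E 44 AE 42 60 82</code> that end every PNG file.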
Here is the minimum structure of a PNG file:
<ol>
<li>PNG signature (8 bytes)</li>
<li><code class="language-plaintext highlighter-rouge">IHDR</code> chunk (13 bytes of data, 25 bytes in total)</li>
<li>One or more <code class="language-plaintext highlighter-rouge">IDAT</code> chunks</li>
<li><code class="language-plaintext highlighter-rouge">IEND</code> chunk (12 bytes, with no data)</li>
</ol>
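The minimum structure can be checked by walking the chunk list. <code class="language-plaintext highlighter-rouge">listPngChunks</code> and <code class="language-plaintext highlighter-rouge">hasMinimalPngStructure</code> are hypothetical helpers written for this illustration; they only read the length and type fields and do not verify CRCs or image data.

```javascript
// Sketch (hypothetical helpers): list the chunks of a PNG file and check the
// minimum structure. CRC values and image data are not verified.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function listPngChunks(bytes) {
  if (!PNG_SIGNATURE.every((value, i) => bytes[i] === value)) {
    throw new Error("missing PNG signature");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const chunks = [];
  let offset = PNG_SIGNATURE.length;
  while (offset + 12 <= bytes.length) {
    const length = view.getUint32(offset); // big-endian data length
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8));
    chunks.push({ type, length, offset });
    offset += 12 + length; // length + type + data + CRC
    if (type === "IEND") break;
  }
  return chunks;
}

function hasMinimalPngStructure(bytes) {
  const chunks = listPngChunks(bytes);
  const types = chunks.map((chunk) => chunk.type);
  return (
    types[0] === "IHDR" &&
    chunks[0].length === 13 &&
    types.includes("IDAT") &&
    types[types.length - 1] === "IEND"
  );
}
```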
<h2 id="the-final-form-htmlzippng-polyglot-files">The Final Form: HTML/ZIP/PNG Polyglot Files</h2>
The final implementation combines all three formats in a single file. This complex structure is possible because HTML parsers tolerate the binary PNG data surrounding the markup. However, this approach introduces new challenges:
<ol>
<li>The signature and the <code class="language-plaintext highlighter-rouge">IHDR</code> and <code class="language-plaintext highlighter-rouge">IEND</code> chunks briefly appear as text nodes and should be removed as soon as the page is parsed</li>
<li>Because the file starts with the PNG signature rather than a doctype, the page is rendered in quirks mode, which requires specific handling through <code class="language-plaintext highlighter-rouge">document.write()</code> and related methods to parse the displayed page</li>
</ol>
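Embedding the page in the PNG can be illustrated by wrapping HTML bytes in a <code class="language-plaintext highlighter-rouge">tEXt</code> chunk. Per the PNG specification, <code class="language-plaintext highlighter-rouge">tEXt</code> data is a Latin-1 keyword (1 to 79 bytes), a NUL separator, and the text. The helper name <code class="language-plaintext highlighter-rouge">makeTextChunk</code> and the keyword <code class="language-plaintext highlighter-rouge">"html"</code> used in the example are assumptions for this sketch; the article does not say which keyword the demo uses.

```javascript
// Sketch (hypothetical helpers): wrap HTML bytes in a PNG tEXt chunk.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

function makeTextChunk(keyword, text) {
  // Simplified check: printable ASCII only (a subset of the Latin-1 allowed).
  if (!/^[\x20-\x7e]{1,79}$/.test(keyword)) throw new RangeError("invalid keyword");
  const keywordBytes = new TextEncoder().encode(keyword); // 1 byte per character
  const dataLength = keywordBytes.length + 1 + text.length;
  const chunk = new Uint8Array(12 + dataLength);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, dataLength);                   // big-endian data length
  chunk.set([0x74, 0x45, 0x58, 0x74], 4);          // "tEXt"
  chunk.set(keywordBytes, 8);
  chunk[8 + keywordBytes.length] = 0x00;           // NUL separator
  chunk.set(text, 8 + keywordBytes.length + 1);    // the HTML document
  view.setUint32(8 + dataLength, crc32(chunk.subarray(4, 8 + dataLength)));
  return chunk;
}
```

For example, <code class="language-plaintext highlighter-rouge">makeTextChunk("html", new TextEncoder().encode(page))</code> returns a chunk that can be placed between <code class="language-plaintext highlighter-rouge">IHDR</code> and <code class="language-plaintext highlighter-rouge">IEND</code>.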
Here is the resulting structure viewed as PNG chunks:
<div class="language-plaintext highlighter-rouge">
<div class="highlight">
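<pre class="highlight"><code>(PNG signature)
(IHDR chunk)
(tEXt chunk
  &lt;!DOCTYPE html&gt;
  &lt;html&gt;
    &lt;head&gt;
      &lt;meta charset="windows-1252"&gt;
      &lt;title&gt;Please wait...&lt;/title&gt;
      ...
) (IEND chunk)
</code></pre>
</div>
</div>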

Optimization Through Image Reuse

A final optimization removes the image from the ZIP file and instead reuses the page itself, interpreted as a PNG file, in its place in the displayed page.

Resulting File

Download demo.png.zip.html.

Sources

Krystian Wiśniewski
Krystian Wiśniewski is a dedicated Sports Reporter and Editor. He graduated with a degree in Journalism from the University of Warsaw. Bringing over 14 years of international reporting experience, Krystian has covered major sports events across Europe, Asia, and the United States of America. Known for his dynamic storytelling and in-depth analysis, he is passionate about capturing the excitement of sports for global audiences and currently leads sports coverage and editorial projects at Agen BRILink dan BRI.