Structured Storage Viewer Explained: Features, Uses, and Examples

Structured Storage Viewer: A Complete Guide for Developers

Structured storage, whose on-disk form is the Compound File Binary Format (also called compound files or OLE2 compound documents), is a Microsoft technology that stores multiple streams and storages within a single file. Developers encounter it most often when working with legacy Office binary formats (e.g., .doc, .xls, .ppt) and other COM-based compound files such as .msi installers and Outlook .msg messages. A Structured Storage Viewer is a tool that lets you inspect the internal tree of storages and streams, view stream contents, extract streams, and diagnose corruption. This guide explains when and why to use such a viewer, how the file format is organized, common features of viewers, practical workflows, and tips for building or integrating a viewer into your developer tools.

Who should read this

  • Application developers debugging legacy Microsoft Office files.
  • Forensic analysts and malware researchers examining compound files.
  • Tooling engineers building file inspectors or document converters.
  • Developers implementing parsers for compound file formats.

What is a Structured Storage Viewer

A Structured Storage Viewer visualizes a compound file’s internal hierarchy. Compound files are similar to a file-system inside one file: nodes are either storages (like folders) or streams (like files). A viewer displays that tree, allows reading stream contents as text or binary, and often supports export, search, and simple edits.

Why it matters

  • Debugging: Inspect embedded objects, macros, and metadata inside old Office documents.
  • Data recovery: Extract undamaged streams from partially corrupted files.
  • Security: Locate suspicious macros, embedded executables, or anomalous streams.
  • Interoperability: Understand how third-party apps store data in compound files.

Compound file basics (high-level)

  • File header: identifies the file as a compound file and points to allocation structures.
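  • FAT (File Allocation Table) / DIFAT: the FAT links sectors into chains, one chain per stream; the DIFAT records which sectors hold the FAT itself.
  • Directory entries: fixed-size 128-byte records describing storages and streams (name, type, size, starting sector). The children of each storage are organized as a red-black tree, not a flat list.
  • Mini FAT and mini stream: streams smaller than the cutoff in the header (4096 bytes by default) are stored in 64-byte mini sectors inside a mini stream owned by the root entry.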
  • Streams: the actual byte content of items (documents, metadata, embedded objects).
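As a rough illustration of the layout above, the following Python sketch decodes the header fields a viewer typically needs first. The field offsets follow the published compound file layout; the function name and the keys of the returned dictionary are assumptions made for this example, not part of any library.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# Fixed part of the header (76 bytes); 109 DIFAT entries follow it.
HEADER_FORMAT = "<8s16sHHHHH6sIIIIIIIII"
FREESECT = 0xFFFFFFFF  # marks an unused DIFAT slot


def parse_cfb_header(data: bytes) -> dict:
    """Decode the header fields needed to locate the FAT and directory."""
    if len(data) < 512:
        raise ValueError("a compound file header is 512 bytes long")
    (signature, _clsid, _minor, major, _byte_order, sector_shift,
     mini_sector_shift, _reserved, _num_dir_sectors, num_fat_sectors,
     first_dir_sector, _transaction, mini_cutoff, first_minifat_sector,
     _num_minifat_sectors, first_difat_sector, _num_difat_sectors,
     ) = struct.unpack_from(HEADER_FORMAT, data, 0)
    if signature != CFB_SIGNATURE:
        raise ValueError("not a compound file: bad signature")
    # The first 109 FAT sector locations live in the header itself.
    difat = struct.unpack_from("<109I", data, 76)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,            # 512 (v3) or 4096 (v4)
        "mini_sector_size": 1 << mini_sector_shift,  # normally 64
        "mini_stream_cutoff": mini_cutoff,           # normally 4096
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
        "first_minifat_sector": first_minifat_sector,
        "first_difat_sector": first_difat_sector,
        "fat_sector_locations": [s for s in difat if s != FREESECT],
    }
```

A viewer would call this on the first 512 bytes of the file and report a clear error when the signature check fails, rather than attempting to parse further.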

Common Structured Storage Viewer features

  • Tree view of storages and streams with names, types, sizes.
  • Hex/ASCII viewer for raw stream bytes.
  • Text/Unicode/UTF-16 rendering for readable streams.
  • Export single streams or whole storages to files.
  • Search within streams (text or hex).
  • Detect and follow mini-streams and show when a stream is stored in mini FAT.
  • Integrity checks and simple repair or recovery options.
  • Plugins or file-type detectors to automatically interpret common stream formats (e.g., OLE10Native, VBAProject).
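The hex/ASCII pane is the simplest of these features to prototype. The sketch below is one possible rendering, assuming a 16-byte row width; the function name and layout are illustrative choices rather than a convention taken from any particular viewer.

```python
def hex_dump(data: bytes, width: int = 16, start: int = 0) -> str:
    """Render bytes as 'offset  hex bytes  printable text' rows."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as most hex viewers do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on short rows.
        lines.append(f"{start + offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

The `start` parameter lets a chunked viewer display the true stream offset when it renders only part of a large stream.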

Practical workflows

1) Inspecting a suspicious .doc file

  1. Open the .doc file in the viewer.
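  2. Expand the tree and locate the macro storage (commonly “Macros” in Word files or “_VBA_PROJECT_CUR” in Excel files) and its “VBA” sub-storage.
  3. Inspect the module streams for obfuscated code or auto-executing macros (e.g., AutoOpen, Document_Open). Module source is stored compressed, so plain text mode shows little; use a viewer plugin or a tool such as olevba to decompress it.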
  4. Export the stream for deeper static analysis in a code editor or deobfuscator.
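A first triage pass over the tree can be automated by matching stream paths against names that commonly indicate macros or embedded objects. The sketch below assumes the viewer can produce a list of slash-separated stream paths; that listing format, the function name, and the hint sets are all assumptions for illustration, and the hint names are typical rather than exhaustive.

```python
# Lower-cased storage/stream names that commonly indicate VBA content.
MACRO_HINTS = {"macros", "_vba_project_cur", "vba", "_vba_project"}
# Lower-cased names that commonly indicate embedded objects.
EMBED_HINTS = {"\x01ole10native", "package", "objectpool"}


def triage_streams(paths):
    """Return (path, reason) pairs for streams worth a closer look."""
    findings = []
    for path in paths:
        parts = [part.lower() for part in path.split("/")]
        if any(part in MACRO_HINTS for part in parts):
            findings.append((path, "macro"))
        elif any(part in EMBED_HINTS for part in parts):
            findings.append((path, "embedded-object"))
    return findings
```

Name matching only narrows the search. A stream with an innocuous name can still carry a payload, so flagged and unflagged streams both deserve a look in a real investigation.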

2) Recovering data from partially corrupted files

  1. Load the file; note any sector/FAT errors reported by the viewer.
  2. Identify large intact streams (e.g., WordDocument) and export them.
  3. For missing directory entries, scan raw sectors for known headers (e.g., PK for embedded ZIP) and carve streams.
  4. If the viewer supports writing, reconstruct a minimal compound file by creating a new container and inserting the recovered streams.
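The carving step can be sketched as a signature scan over the raw file bytes. The signature table and function name below are assumptions for this example. Short signatures such as MZ produce many false positives, so the sketch also reports whether each hit falls on a sector boundary, where regular streams begin.

```python
SIGNATURES = {
    "zip": b"PK\x03\x04",
    "pe": b"MZ",  # only two bytes: expect false positives
    "pdf": b"%PDF-",
    "cfb": bytes.fromhex("D0CF11E0A1B11AE1"),
}


def find_signatures(raw: bytes, sector_size: int = 512):
    """Return sorted (offset, kind, sector_aligned) tuples for known signatures."""
    hits = []
    for kind, magic in SIGNATURES.items():
        pos = raw.find(magic)
        while pos != -1:
            hits.append((pos, kind, pos % sector_size == 0))
            pos = raw.find(magic, pos + 1)
    return sorted(hits)
```

A file always matches the "cfb" signature at offset 0 when it is itself a compound file; hits at later offsets are the interesting ones, since they may indicate nested containers.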

3) Extracting embedded files

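  1. Find streams that hold embedded objects, such as “Package” or “Ole10Native” (in Word files these usually sit under an “ObjectPool” storage).
  2. Check the start of the stream for an OLE10Native wrapper or a PK (ZIP) header.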
  3. Export as a separate file with the correct extension for downstream tools.

Building or integrating a Structured Storage Viewer (developer notes)

Libraries and formats

  • Use existing libraries when possible:
    • libolecf / libole (forensic libraries)
    • Apache POI (Java), whose POIFS component reads compound files and older Office binary formats
    • OpenMcdf (.NET) for reading/writing compound files
  • Understand the two allocation paths: regular FAT and MiniFAT. Implement logic to read mini streams and map sector chains.
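The regular FAT path can be sketched as follows, assuming the FAT has already been loaded into a Python list of integers and the whole file is held in memory. The function names are hypothetical. Streams below the mini stream cutoff follow the same chain logic but use the mini FAT and 64-byte sectors located inside the root entry's mini stream.

```python
ENDOFCHAIN = 0xFFFFFFFE


def sector_offset(sector: int, sector_size: int = 512) -> int:
    """File offset of a sector; the header occupies the first sector-sized slot."""
    return (sector + 1) * sector_size


def follow_chain(fat, start: int):
    """Return the list of sector numbers in a chain, rejecting loops and bad links."""
    chain, seen = [], set()
    sector = start
    while sector != ENDOFCHAIN:
        # Out-of-range values also cover special markers such as FREESECT.
        if sector >= len(fat) or sector in seen:
            raise ValueError(f"corrupt FAT chain at sector {sector:#x}")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain


def read_stream(data: bytes, fat, start: int, size: int, sector_size: int = 512) -> bytes:
    """Concatenate the sectors of a chain and trim to the declared stream size."""
    out = bytearray()
    for sector in follow_chain(fat, start):
        offset = sector_offset(sector, sector_size)
        out += data[offset:offset + sector_size]
    return bytes(out[:size])
```

The loop check matters for a viewer that opens untrusted or damaged files: a FAT entry that points back into its own chain would otherwise cause an endless read.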

UI considerations

  • Tree control with lazy loading for large containers.
  • Dual-pane content viewers (text + hex).
  • Quick export buttons and context menus.
  • Highlight suspect stream types (e.g., macros, embedded executables).

Performance tips

  • Decode streams lazily; avoid loading all stream bytes upfront.
  • Cache parsed directory to avoid repeated FAT traversals.
  • Limit rendering for very large streams; provide chunked viewing.
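Chunked viewing can be prototyped with a small generator over any seekable file-like object. This is a minimal sketch: the helper name, the 64 KiB default, and the `start`/`limit` parameters are assumptions for illustration.

```python
import io


def iter_chunks(stream, chunk_size: int = 64 * 1024, start: int = 0, limit=None):
    """Yield successive chunks from a seekable binary stream.

    `limit` caps the total number of bytes yielded (None means read to the end).
    """
    stream.seek(start)
    remaining = limit
    while remaining is None or remaining > 0:
        want = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = stream.read(want)
        if not chunk:
            break
        yield chunk
        if remaining is not None:
            remaining -= len(chunk)
```

A content pane would request only the chunk that is currently visible, for example `next(iter_chunks(f, start=offset, limit=4096))`, instead of materializing the whole stream.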

Security considerations

  • Treat streams as untrusted input—do not auto-execute embedded code or load embedded files.
  • Sanitize filenames when exporting.
  • Offer a sandboxed export option or explicit user confirmation before opening exported files in external apps.
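Filename sanitization deserves particular care because stream names and embedded filenames come straight from the untrusted file. The sketch below is one conservative approach, assuming exports may land on Windows; the function name, the `fallback` parameter, and the 200-character cap are choices made for this example.

```python
import re

# Device names that Windows refuses to use as ordinary filenames.
_WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def safe_export_name(stream_name: str, fallback: str = "stream") -> str:
    """Turn an untrusted stream or embedded filename into a safe local filename."""
    # Keep only the last path component so "..\\..\\x" cannot escape the export folder.
    name = re.split(r"[\\/]", stream_name)[-1]
    # Replace control characters and characters that are invalid on Windows.
    name = re.sub(r'[\x00-\x1f<>:"|?*]', "_", name)
    # Leading or trailing dots and spaces cause trouble on several platforms.
    name = name.strip(" .")
    if not name:
        return fallback
    if name.split(".")[0].upper() in _WINDOWS_RESERVED:
        name = f"{fallback}_{name}"
    return name[:200]
```

Sanitizing the name does not make the content safe; it only ensures the export lands where the user expects.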

Examples of tools

  • OleView (Microsoft's OLE/COM Object Viewer): a classic tool for viewing OLE structures.
  • 7-Zip: can open some compound files as archives (useful for simple extraction).
  • libolecf-based forensic tools: for deeper analysis and carving.
  • Custom scripts using Apache POI or OpenMcdf: for automated extraction/conversion.

Quick reference: common stream names and meanings

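  • WordDocument plus 0Table or 1Table: the main Word binary stream and its companion table stream.
  • Workbook (Excel 97-2003) / Book (Excel 5.0/95): Excel workbook streams.
  • VBA macros: names vary by host application, but typically a storage such as “Macros” (Word) or “_VBA_PROJECT_CUR” (Excel) holding a “PROJECT” stream and a “VBA” sub-storage with the “dir”, “_VBA_PROJECT”, and per-module streams.
  • SummaryInformation / DocumentSummaryInformation: metadata property sets. The stored names begin with a 0x05 control character.
  • Ole10Native: wrapper for embedded files (often contains filename and payload). The stored name begins with a 0x01 control character.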
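These names are stored as UTF-16LE text inside 128-byte directory entries, and some carry a leading control character (for example, byte 0x05 before SummaryInformation). The sketch below decodes one entry so a viewer can display the name, type, and size; the field layout follows the published compound file format, while the function name and returned keys are assumptions for this example.

```python
import struct

# name, name length, type, color, left, right, child, CLSID, state bits,
# created, modified, starting sector, stream size  (128 bytes in total)
DIR_ENTRY_FORMAT = "<64sHBBIII16sIQQIQ"
OBJECT_TYPES = {0: "unallocated", 1: "storage", 2: "stream", 5: "root"}


def parse_dir_entry(entry: bytes) -> dict:
    """Decode one 128-byte directory entry."""
    if len(entry) != 128:
        raise ValueError("a directory entry is exactly 128 bytes")
    (raw_name, name_len, obj_type, _color, left, right, child, _clsid,
     _state, _created, _modified, start_sector, size,
     ) = struct.unpack(DIR_ENTRY_FORMAT, entry)
    # name_len is in bytes and includes the 2-byte terminating NUL.
    name = raw_name[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
    return {
        "name": name,
        "type": OBJECT_TYPES.get(obj_type, "unknown"),
        "left": left, "right": right, "child": child,
        "start_sector": start_sector,
        "size": size,
    }
```

When displaying names, a viewer can render the control prefix as an escape such as `\x05` so that it stays visible without corrupting the tree view.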

Troubleshooting tips

  • If the viewer shows “invalid header”, check whether the file is actually a compound file or a different format (e.g., an Office Open XML .docx is a ZIP archive).
  • If streams appear empty but size > 0, verify mini FAT handling: small streams may be stored in the mini stream.
  • Use hex search for known signatures (PK for ZIP, MZ for Windows executables, D0 CF 11 E0 A1 B1 1A E1 for nested compound files) to locate embedded payloads.
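The first check in this list is easy to automate by looking at the leading bytes of the file. The following sketch distinguishes a few container types that are commonly mistaken for one another; the function name and return labels are assumptions for illustration.

```python
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def identify_container(head: bytes) -> str:
    """Classify a file by its first bytes: 'cfb', 'zip', 'rtf', or 'unknown'."""
    if head.startswith(CFB_MAGIC):
        return "cfb"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # e.g. Office Open XML (.docx, .xlsx)
    if head.startswith(b"{\\rtf"):
        return "rtf"  # RTF files are sometimes saved with a .doc extension
    return "unknown"
```

Reading the first 8 bytes of the file is enough for all three checks, so the test can run before any parser is invoked.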

Conclusion

A Structured Storage Viewer is an essential tool when working with legacy compound-file formats: it accelerates debugging, aids recovery, and enhances security inspection. Developers should use established libraries, follow safe handling practices for untrusted content, and design UI/UX to present hierarchical structures and raw data efficiently.
