Structured Storage Viewer: A Complete Guide for Developers
Structured storage (COM Structured Storage; the on-disk format is the Compound File Binary File Format, often called OLE2 compound files) is a Microsoft technology that stores multiple streams and storages within a single file. Developers encounter it most often when working with legacy Office binary formats (e.g., .doc, .xls, .ppt) and other compound-file formats such as Windows Installer (.msi) packages and Outlook (.msg) messages. A Structured Storage Viewer is a tool for inspecting the internal tree of storages and streams, viewing stream contents, extracting streams, and diagnosing corruption. This guide explains when and why to use such a viewer, how the file format is organized, common viewer features, practical workflows, and tips for building a viewer or integrating one into your developer tools.
Who should read this
- Application developers debugging legacy Microsoft Office files.
- Forensic analysts and malware researchers examining compound files.
- Tooling engineers building file inspectors or document converters.
- Developers implementing parsers for compound file formats.
What is a Structured Storage Viewer
A Structured Storage Viewer visualizes a compound file’s internal hierarchy. A compound file works like a file system inside a single file: nodes are either storages (like folders) or streams (like files). A viewer displays that tree, lets you read stream contents as text or binary, and often supports export, search, and simple edits.
Why it matters
- Debugging: Inspect embedded objects, macros, and metadata inside old Office documents.
- Data recovery: Extract undamaged streams from partially corrupted files.
- Security: Locate suspicious macros, embedded executables, or anomalous streams.
- Interoperability: Understand how third-party apps store data in compound files.
Compound file basics (high-level)
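- File header: a 512-byte header that starts with the signature D0 CF 11 E0 A1 B1 1A E1, records the format version and sector size (512 bytes in version 3, 4096 bytes in version 4), and points to the allocation structures.
- FAT (File Allocation Table) / DIFAT: the FAT stores, for each sector, the number of the next sector in its chain, so every stream is a linked chain of sectors; the DIFAT records which sectors hold the FAT itself.
- Directory entries: 128-byte records describing storages and streams (name, type, size, starting sector). The children of each storage form a red-black tree linked through left-sibling, right-sibling, and child IDs.
- Mini FAT and mini stream: streams smaller than the mini stream cutoff (4096 bytes) are stored in 64-byte mini sectors inside the mini stream, which is itself held by the root entry; the Mini FAT chains those mini sectors.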
- Streams: the actual byte content of items (documents, metadata, embedded objects).
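As a concrete illustration of the header fields listed above, here is a minimal Python sketch that validates and decodes the 512-byte header using only the standard library. The field offsets follow the MS-CFB specification; the names parse_header and CfbHeader are hypothetical, and the checks are deliberately minimal rather than a complete validator.

```python
import struct
from dataclasses import dataclass

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# 512-byte header (MS-CFB 2.2): signature, CLSID, 5 x uint16, 6 reserved bytes,
# 9 x uint32, then the first 109 DIFAT entries.
HEADER_FORMAT = "<8s16s5H6s9I109I"


@dataclass
class CfbHeader:
    major_version: int
    sector_size: int
    mini_sector_size: int
    num_fat_sectors: int
    first_dir_sector: int
    mini_stream_cutoff: int
    first_minifat_sector: int
    num_minifat_sectors: int
    first_difat_sector: int
    num_difat_sectors: int
    difat: list  # FAT sector numbers stored directly in the header


def parse_header(data: bytes) -> CfbHeader:
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("too short for a compound file header")
    fields = struct.unpack_from(HEADER_FORMAT, data, 0)
    (signature, _clsid, _minor, major, byte_order, sector_shift, mini_shift,
     _reserved, _num_dir_sectors, num_fat, first_dir, _txn_signature,
     cutoff, first_minifat, num_minifat, first_difat, num_difat) = fields[:17]
    if signature != CFB_SIGNATURE:
        raise ValueError("invalid header: not a compound file")
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    # Version 3 files use 512-byte sectors, version 4 files use 4096-byte sectors.
    if (major, sector_shift) not in ((3, 9), (4, 12)):
        raise ValueError(f"unsupported version {major} / sector shift {sector_shift}")
    return CfbHeader(
        major_version=major,
        sector_size=1 << sector_shift,
        mini_sector_size=1 << mini_shift,
        num_fat_sectors=num_fat,
        first_dir_sector=first_dir,
        mini_stream_cutoff=cutoff,
        first_minifat_sector=first_minifat,
        num_minifat_sectors=num_minifat,
        first_difat_sector=first_difat,
        num_difat_sectors=num_difat,
        difat=list(fields[17:]),
    )
```

Usage: parse_header(open(path, "rb").read(512)). The difat list holds only the first 109 FAT sector numbers; larger files continue the list in DIFAT sectors starting at first_difat_sector, which this sketch does not follow.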
Common Structured Storage Viewer features
- Tree view of storages and streams with names, types, sizes.
- Hex/ASCII viewer for raw stream bytes.
- Text rendering (ANSI and UTF-16LE) for readable streams.
- Export single streams or whole storages to files.
- Search within streams (text or hex).
- Detect and follow mini-streams and show when a stream is stored in mini FAT.
- Integrity checks and simple repair or recovery options.
- Plugins or file-type detectors that automatically interpret common stream formats (e.g., Ole10Native wrappers, VBA project storages).
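A tree view is essentially a traversal of the directory entries. The sketch below, assuming the directory stream has already been read by following its sector chain, decodes the 128-byte entries (layout per MS-CFB) and walks each storage's sibling tree in order. Names such as parse_entries, walk_tree, and list_tree are illustrative, not taken from any library.

```python
import struct
from dataclasses import dataclass

NOSTREAM = 0xFFFFFFFF
STORAGE, STREAM, ROOT = 1, 2, 5
# 128-byte directory entry (MS-CFB 2.6.1): UTF-16LE name, name length, type,
# color, left/right/child IDs, CLSID, state bits, two FILETIMEs, start sector, size.
ENTRY_FORMAT = "<64sHBBIII16sIQQIQ"


@dataclass
class DirEntry:
    name: str
    obj_type: int
    left: int
    right: int
    child: int
    start_sector: int
    size: int


def parse_entries(dir_stream: bytes, major_version: int = 3) -> list:
    entries = []
    for off in range(0, len(dir_stream) - 127, 128):
        (raw_name, name_len, obj_type, _color, left, right, child,
         _clsid, _state, _ctime, _mtime, start, size) = struct.unpack_from(ENTRY_FORMAT, dir_stream, off)
        # name_len counts bytes including the 2-byte UTF-16 terminator.
        name = raw_name[:max(0, min(name_len, 64) - 2)].decode("utf-16-le", errors="replace")
        if major_version == 3:
            size &= 0xFFFFFFFF  # MS-CFB recommends ignoring the high 32 bits in v3 files
        entries.append(DirEntry(name, obj_type, left, right, child, start, size))
    return entries


def walk_tree(entries: list, node_id: int, prefix: tuple = (), out=None, seen=None) -> list:
    """In-order walk of one sibling tree, recursing into child storages."""
    out = [] if out is None else out
    seen = set() if seen is None else seen
    if node_id == NOSTREAM:
        return out
    if node_id >= len(entries) or node_id in seen:
        raise ValueError(f"corrupt directory: bad or repeated entry {node_id}")
    seen.add(node_id)
    entry = entries[node_id]
    walk_tree(entries, entry.left, prefix, out, seen)
    path = prefix + (entry.name,)
    out.append((path, entry))
    if entry.obj_type == STORAGE:
        walk_tree(entries, entry.child, path, out, seen)
    walk_tree(entries, entry.right, prefix, out, seen)
    return out


def list_tree(entries: list) -> list:
    # Entry 0 is the root storage; its child ID points at the top-level sibling tree.
    return walk_tree(entries, entries[0].child)
```

The seen set rejects entries that are reached twice, which is a cheap defense against cyclic directories in corrupted or malicious files. Very deep trees would still hit Python's recursion limit; an iterative walk avoids that.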
Practical workflows
1) Inspecting a suspicious .doc file
- Open the .doc file in the viewer.
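- Expand the tree and locate the Macros storage (Word) or the _VBA_PROJECT_CUR storage (Excel); each contains a VBA storage with a dir stream and one stream per code module.
- Open the module streams to look for obfuscated code or auto-executing macros. Module source is compressed with the MS-OVBA algorithm, so plain text mode shows mostly noise; use a viewer that decompresses VBA or a dedicated tool such as olevba.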
- Export the stream for deeper static analysis in a code editor or deobfuscator.
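VBA module source is stored compressed using the algorithm defined in MS-OVBA, so a viewer plugin needs a decompressor before it can show readable code. The following is a sketch of that algorithm, assuming the input starts at a compressed container (signature byte 0x01). Finding where a module's source begins requires the MODULEOFFSET value from the (also compressed) dir stream, which is out of scope here; the function name ovba_decompress is hypothetical.

```python
def ovba_decompress(data: bytes) -> bytes:
    """Decompress an MS-OVBA compressed container (sketch, minimal validation)."""
    if not data or data[0] != 0x01:
        raise ValueError("missing MS-OVBA signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos + 2 <= len(data):
        header = int.from_bytes(data[pos:pos + 2], "little")
        if (header >> 12) & 0b111 != 0b011:
            raise ValueError(f"bad chunk signature at offset {pos}")
        # Low 12 bits store (chunk size including the 2-byte header) - 3.
        chunk_end = min(pos + (header & 0x0FFF) + 3, len(data))
        is_compressed = bool(header & 0x8000)
        pos += 2
        chunk_start = len(out)
        if not is_compressed:
            # Raw chunk: 4096 literal bytes follow the header.
            out += data[pos:pos + 4096]
            pos += 4096
            continue
        while pos < chunk_end:
            flags = data[pos]  # one flag bit per following token, LSB first
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(data[pos])  # literal token
                    pos += 1
                    continue
                token = int.from_bytes(data[pos:pos + 2], "little")  # copy token
                pos += 2
                # The offset/length bit split depends on how much of this
                # chunk has been decompressed so far (minimum 4 offset bits).
                difference = len(out) - chunk_start
                bit_count = 4
                while (1 << bit_count) < difference:
                    bit_count += 1
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                src = len(out) - offset
                if src < chunk_start:
                    raise ValueError("copy token points before chunk start")
                for i in range(length):  # byte by byte so overlapping copies work
                    out.append(out[src + i])
        pos = chunk_end
    return bytes(out)
```

For example, the 7-byte container 01 03 B0 02 61 06 00 holds one literal "a" followed by a copy token (offset 1, length 9), and decompresses to ten "a" characters.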
2) Recovering data from partially corrupted files
- Load the file; note any sector/FAT errors reported by the viewer.
- Identify large intact streams (e.g., WordDocument) and export them.
- For missing directory entries, scan raw sectors for known headers (e.g., PK for embedded ZIP) and carve streams.
- Reconstruct a minimal compound file by creating a new container and inserting the recovered streams, if the viewer supports writing compound files.
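Signature scanning is the first step of carving. This sketch searches raw file bytes for a few well-known signatures and reports the file offset and the CFB sector containing each hit (sector N starts at file offset (N + 1) * sector size, so -1 means the header area). The signature table is a small assumed sample, and the function name is hypothetical; "MZ" in particular is short and will produce false positives.

```python
SIGNATURES = {
    "zip": b"PK\x03\x04",
    "cfb": bytes.fromhex("D0CF11E0A1B11AE1"),
    "pe": b"MZ",  # two bytes only: expect false positives
}


def scan_signatures(data: bytes, sector_size: int = 512) -> list:
    """Return (file offset, sector number, kind) for every signature hit."""
    hits = []
    for kind, sig in SIGNATURES.items():
        pos = data.find(sig)
        while pos != -1:
            # Sector N begins at (N + 1) * sector_size; -1 means inside the header.
            sector = pos // sector_size - 1
            hits.append((pos, sector, kind))
            pos = data.find(sig, pos + 1)
    hits.sort(key=lambda hit: hit[0])
    return hits
```

Hits are not required to be sector-aligned, because embedded files usually start partway into a stream. The sector number tells you where to start reading when you carve the surrounding data.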
3) Extracting embedded files
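- Find the storages and streams that hold embedded objects, such as the ObjectPool storage in Word documents, Package streams, or Ole10Native streams.
- Inspect the stream header: an Ole10Native wrapper carries the original filename and payload, and a PK header indicates an embedded ZIP-based file such as .docx.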
- Export as a separate file with the correct extension for downstream tools.
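The inner layout of the packaged-file data in an Ole10Native stream is not formally documented. The parser below uses the commonly reverse-engineered layout (a total-size field, a 2-byte field, NUL-terminated label and source path, 8 bytes of uncertain meaning, a NUL-terminated temp path, then a size-prefixed payload), so treat it as an assumption-laden sketch. The names parse_ole10native and PackagedFile are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class PackagedFile:
    label: str
    source_path: str
    temp_path: str
    payload: bytes


def _read_cstring(data: bytes, pos: int):
    """Read a NUL-terminated ANSI string; latin-1 maps every byte, whatever the code page was."""
    end = data.index(b"\x00", pos)  # ValueError if the terminator is missing
    return data[pos:end].decode("latin-1"), end + 1


def parse_ole10native(data: bytes) -> PackagedFile:
    # Assumed layout (reverse-engineered, not from an official spec):
    # uint32 total size | uint16 unknown | label\0 | source path\0 |
    # 8 bytes of uncertain meaning | temp path\0 | uint32 payload size | payload
    pos = 4 + 2
    label, pos = _read_cstring(data, pos)
    source_path, pos = _read_cstring(data, pos)
    pos += 8
    temp_path, pos = _read_cstring(data, pos)
    (payload_size,) = struct.unpack_from("<I", data, pos)
    pos += 4
    payload = data[pos:pos + payload_size]
    if len(payload) != payload_size:
        raise ValueError("truncated Ole10Native payload")
    return PackagedFile(label, source_path, temp_path, payload)
```

The label usually holds the original file name, which is a reasonable default for the exported file once it has been sanitized.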
Building or integrating a Structured Storage Viewer (developer notes)
Libraries and formats
- Use existing libraries when possible:
  - libolecf / libole (forensic libraries)
  - Apache POI (Java) for reading older Office binary formats
  - OpenMcdf (.NET) for reading/writing compound files
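- Understand the two allocation paths: regular FAT and MiniFAT. Streams smaller than the mini stream cutoff (normally 4096 bytes) are read through the MiniFAT from the mini stream; larger streams follow sector chains in the FAT. Implement both, and detect loops in corrupted chains.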
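Both allocation paths reduce to following a chain through a table of next-sector numbers. The sketch below assumes the FAT and MiniFAT have already been loaded as lists of 32-bit integers and the mini stream (the root entry's stream) has already been read; it then picks the path by stream size. The helper names follow_chain, read_chain_bytes, and read_stream are hypothetical.

```python
ENDOFCHAIN = 0xFFFFFFFE


def follow_chain(table: list, start: int) -> list:
    """Return the sector numbers of one chain in a FAT or MiniFAT, rejecting loops."""
    chain, seen = [], set()
    current = start
    while current != ENDOFCHAIN:
        # Special markers such as FREESECT (0xFFFFFFFF) exceed any real table
        # length, so they fail this bounds check along with corrupt pointers.
        if current >= len(table) or current in seen:
            raise ValueError(f"broken or cyclic chain at sector {current:#x}")
        seen.add(current)
        chain.append(current)
        current = table[current]
    return chain


def read_chain_bytes(container: bytes, chain: list, unit: int, skip: int, size: int) -> bytes:
    """Concatenate the sectors of a chain; skip=1 accounts for the file header."""
    out = bytearray()
    for sector in chain:
        offset = (sector + skip) * unit
        out += container[offset:offset + unit]
    if len(out) < size:
        raise ValueError("chain is shorter than the declared stream size")
    return bytes(out[:size])


def read_stream(file_bytes: bytes, fat: list, mini_stream: bytes, minifat: list,
                start: int, size: int, sector_size: int = 512, cutoff: int = 4096) -> bytes:
    if size < cutoff:
        # Small stream: 64-byte mini sectors addressed from offset 0 of the mini stream.
        return read_chain_bytes(mini_stream, follow_chain(minifat, start), 64, 0, size)
    # Regular stream: sector N begins at file offset (N + 1) * sector_size.
    return read_chain_bytes(file_bytes, follow_chain(fat, start), sector_size, 1, size)
```

One exception to the size rule: the root entry's own stream (the mini stream itself) is always read through the regular FAT, regardless of its size.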
UI considerations
- Tree control with lazy loading for large containers.
- Dual-pane content viewers (text + hex).
- Quick export buttons and context menus.
- Highlight suspect stream types (e.g., macros, embedded executables).
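Highlighting can start as a simple name-and-content heuristic. This hypothetical helper flags paths that contain VBA-related storages, common embedded-object stream names, and stream data that begins with an executable header. The name lists are an assumed starting set, compared case-insensitively as CFB names are; a real tool would refine them.

```python
# Assumed starting sets; extend them as you encounter other producers.
MACRO_STORAGES = {"MACROS", "_VBA_PROJECT_CUR", "VBA"}
EMBED_STREAMS = {"\x01OLE10NATIVE", "PACKAGE"}


def flag_entry(path: tuple, head: bytes = b"") -> list:
    """Return reasons to highlight a tree node; head is the first bytes of the stream."""
    reasons = []
    upper = [part.upper() for part in path]  # CFB name comparison ignores case
    if any(part in MACRO_STORAGES for part in upper):
        reasons.append("macro")
    if upper and upper[-1] in EMBED_STREAMS:
        reasons.append("embedded object")
    if head.startswith(b"MZ"):
        reasons.append("executable header")
    return reasons
```

Returning a list of reasons, rather than a single flag, lets the UI show why a node was highlighted.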
Performance tips
- Decode streams lazily; avoid loading all stream bytes upfront.
- Cache the parsed directory to avoid repeated FAT traversals.
- Limit rendering for very large streams; provide chunked viewing.
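Chunked viewing means rendering only the window the user is looking at. Here is a minimal sketch of a hex/ASCII renderer that formats one slice of a stream lazily, one row at a time; the function name and the 16-byte row width are arbitrary choices.

```python
from typing import Iterator


def hexdump_window(data: bytes, offset: int = 0, length: int = 256, width: int = 16) -> Iterator[str]:
    """Yield hex/ASCII rows for data[offset:offset + length] only."""
    end = min(len(data), offset + length)
    for row in range(offset, end, width):
        chunk = data[row:min(row + width, end)]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"{row:08x}  {hex_part:<{width * 3 - 1}}  {text}"
```

A viewer would call it with the offset of the visible page, for example hexdump_window(stream, offset=0x1000), and re-render when the user scrolls. Because only the requested slice is touched, data could also be an mmap of a large exported stream.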
Security considerations
- Treat streams as untrusted input—do not auto-execute embedded code or load embedded files.
- Sanitize filenames when exporting.
- Offer a sandboxed export option or explicit user confirmation before opening exported files in external apps.
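Stream names are attacker-controlled and may contain control characters (such as the 0x01 prefix of Ole10Native) or path separators. This sketch reduces a name to a safe file name by keeping the last path component, replacing characters that are invalid on Windows, and avoiding reserved device names. The function name and rules are an assumed baseline, not an exhaustive policy.

```python
import re

# Control characters plus characters that Windows forbids in file names.
_UNSAFE = re.compile(r'[\x00-\x1f<>:"/\\|?*]')
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)), *(f"LPT{i}" for i in range(1, 10))}


def safe_export_name(stream_name: str, fallback: str = "stream.bin") -> str:
    # Keep only the last path component, whichever separator the name uses.
    base = re.split(r"[\\/]", stream_name)[-1]
    base = _UNSAFE.sub("_", base).strip(" .")
    if not base:
        return fallback
    if base.split(".")[0].upper() in _RESERVED:
        base = "_" + base  # e.g. CON.txt would otherwise address a device on Windows
    return base[:255]
```

Join the result onto a dedicated export directory rather than trusting any directory component from the file.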
Examples of tools
- OleView (Microsoft's OLE/COM Object Viewer): a classic tool for viewing OLE structures.
- 7-Zip: can open some compound files as archives (useful for simple extraction).
- libolecf-based forensic tools: for deeper analysis and carving.
- Custom scripts using Apache POI or OpenMcdf: for automated extraction and conversion.
Quick reference: common stream names and meanings
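- WordDocument with 0Table or 1Table: main Word binary content (the FIB in WordDocument indicates which table stream is used).
- Workbook (Excel 97 and later) / Book (Excel 5.0/95): Excel workbook streams.
- Macros (Word) / _VBA_PROJECT_CUR (Excel): storages holding embedded VBA macros.
- SummaryInformation / DocumentSummaryInformation: metadata property sets; the stored names begin with the control character 0x05.
- Ole10Native: wrapper for embedded files (often contains the original filename and payload); the stored name begins with the control character 0x01.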
Troubleshooting tips
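- If the viewer shows “invalid header”, check whether the file is actually a compound file (its first eight bytes are D0 CF 11 E0 A1 B1 1A E1) or a different format; Office Open XML files such as .docx are ZIP archives and start with PK.
- If streams appear empty even though their size is greater than 0, verify MiniFAT handling: streams below the mini stream cutoff are stored in the mini stream, not in regular sectors.
- Use hex search for known signatures (PK for ZIP, MZ for Windows executables, D0 CF 11 E0 for nested compound files) to locate embedded payloads.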
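The header check in the first tip can be automated by sniffing the first bytes of the file. Here is a minimal sketch (the function name is hypothetical) that distinguishes a compound file from a ZIP-based file such as Office Open XML:

```python
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_SIGNATURE = b"PK\x03\x04"


def sniff_container(head: bytes) -> str:
    """Classify a file from its first bytes (at least 8 are needed for the CFB check)."""
    if head.startswith(CFB_SIGNATURE):
        return "compound file"
    if head.startswith(ZIP_SIGNATURE):
        return "zip"  # e.g. Office Open XML (.docx, .xlsx) or a plain ZIP archive
    return "unknown"
```

Usage: with open(path, "rb") as f: kind = sniff_container(f.read(8)). Running this before handing a file to the viewer turns a vague “invalid header” error into a specific hint about what the file really is.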
Conclusion
A Structured Storage Viewer is an essential tool when working with legacy compound-file formats: it accelerates debugging, aids recovery, and enhances security inspection. Developers should use established libraries, follow safe handling practices for untrusted content, and design UI/UX to present hierarchical structures and raw data efficiently.