By Michal Necasek ©October 2000

References on LVM and JFS:


Basic concepts

LVM and JFS are not exactly new in the OS/2 world; they were first introduced in OS/2 Warp Server for e-business, aka WSeB, about two years ago. But only a few lucky OS/2 users (including me) use WSeB on their home/office machines. Thus the upcoming Serenity Systems' eComStation (eCS) will be the first exposure to LVM/JFS for many OS/2 users. Many of the prospective eCS owners have understandable concerns about these new concepts; some perhaps even fear them. In this article I will try to explain the basic concepts introduced by LVM and JFS and some of the logic behind them.

First off, I'll finally explain what those acronyms are: LVM stands for Logical Volume Manager and JFS is a Journaled File System. Not much clearer, is it? It will be later - I hope.

LVM and JFS didn't originate on OS/2. They were created for AIX, IBM's high-end Unix variant running on IBM's RS/6000 hardware. For the users that means that all the really nasty bugs were ironed out long ago.

The role of LVM is to present a simple logical view of the underlying physical storage space, i.e. the hard drive(s). LVM manages individual physical disks - or to be more precise, the individual partitions present on them (for a short glossary of terms, look at the end of the article). LVM hides the number, size and location of physical partitions from users. Instead it presents the concept of a logical volume. A logical volume may correspond to a single physical partition (although that obviously almost defeats the purpose of LVM) but it doesn't have to. One volume may be composed of several partitions located on multiple physical disks. Not only that, volumes can even be extended (not shrunk - people usually want more space, not less). They can even be extended while the OS is running and the file system is being accessed! Of course, most home and SOHO users don't have the hardware required for this.
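The idea of one volume built from several partitions can be illustrated with a minimal Python sketch. It is not how LVM is implemented; the `Extent` and `LogicalVolume` names, the sector counts and the simple concatenation scheme are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One physical partition contributing space to a volume (hypothetical)."""
    disk: int        # physical disk number
    partition: str   # partition name, purely illustrative
    sectors: int     # size of the partition in sectors


class LogicalVolume:
    """Concatenates partitions and presents them as one linear sector range."""

    def __init__(self, extents):
        self.extents = []
        self.starts = []   # starts[i] = first logical sector served by extents[i]
        self.size = 0
        for extent in extents:
            self.extend(extent)

    def extend(self, extent):
        """Grow the volume by appending another partition (it never shrinks)."""
        self.starts.append(self.size)
        self.extents.append(extent)
        self.size += extent.sectors

    def locate(self, logical_sector):
        """Translate a logical sector into (extent, sector within that extent)."""
        if not 0 <= logical_sector < self.size:
            raise IndexError("sector outside volume")
        i = bisect_right(self.starts, logical_sector) - 1
        return self.extents[i], logical_sector - self.starts[i]


vol = LogicalVolume([Extent(1, "A", 1000), Extent(2, "B", 500)])
ext, offset = vol.locate(1200)
print(ext.disk, ext.partition, offset)   # 2 B 200
```

The user of `vol` only ever sees sectors 0 to `vol.size - 1`; which disk actually holds a given sector is decided inside `locate`, and `extend` adds space without disturbing existing addresses.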

The more experienced readers are now probably wondering how 'traditional' file systems like FAT or HPFS could be extended at runtime. The answer is, they can't. To take full advantage of LVM, it is necessary to use a file system designed for it. This file system is of course JFS. JFS is not really tied to LVM; the two can exist separately, but only when working in concert can both reach their full potential.

JFS volume structure

JFS is organized like a traditional Unix-ish file system: it presents a logical view of files and directories linked together to form a tree-like structure. This is the concept that spread from the Unix world pretty much everywhere else and that we all know. I can only speculate about IBM's motives for incorporating JFS into WSeB, but it has some obvious advantages when compared to HPFS and HPFS386 (and some shortcomings too). I see two significant advantages:

  • capacity - JFS allows much larger file and volume sizes than HPFS. Basically JFS is a 64-bit file system while HPFS structures are at most 32 bits large.
  • recovery - thanks to the journaling techniques employed by JFS (described in more detail later), CHKDSK times for JFS are significantly shorter than for equivalent HPFS volumes. Roughly speaking, where an HPFS checkdisk after a crash takes minutes, JFS takes seconds.

JFS is created on top of a logical volume. To maintain information about files and directories, it uses the following important internal structures:

  • the superblock
  • the i-nodes
  • the data blocks
  • the allocation groups

The superblock lies at the heart of JFS (and many other file systems). It contains essential information such as the size of the file system, the number of blocks it contains and the state of the file system (clean, dirty etc.).

The entire file system space is divided into logical blocks that contain file or directory data. For JFS, the logical blocks are always 4096 bytes (4K) in size, but can be optionally subdivided into smaller fragments (512, 1024 or 2048 bytes).

An i-node is a logical entity that contains information about a file or directory. There is a 1:1 relationship between i-nodes and files/directories. An i-node contains file type, access permissions, user/group ID (UID/GID - unused on OS/2), access times and points to actual logical blocks where file contents are stored. The maximum file size allowed in JFS is 2TB (HPFS and FAT allow 2GB max). It should be noted that the number of i-nodes is fixed. It is determined at file system creation (FORMAT) time and depends on fragment size (which is user selectable). In theory users could run out of i-nodes, meaning that they would be unable to create more files even if there was enough free space. In practice this is extremely rare.
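The consequence of a fixed i-node count can be shown with a toy model. This is only a sketch: the `TinyFileSystem` class and its counters are hypothetical and bear no relation to the real JFS on-disk layout.

```python
class TinyFileSystem:
    """Toy model: a fixed i-node count plus a pool of data blocks."""

    def __init__(self, inode_count, block_count):
        self.free_inodes = inode_count   # fixed when the file system is created
        self.free_blocks = block_count
        self.files = {}                  # file name -> blocks used

    def create(self, name, blocks_needed):
        """Every new file consumes exactly one i-node plus its data blocks."""
        if self.free_inodes == 0:
            raise OSError("no free i-nodes")
        if blocks_needed > self.free_blocks:
            raise OSError("no free space")
        self.free_inodes -= 1
        self.free_blocks -= blocks_needed
        self.files[name] = blocks_needed


fs = TinyFileSystem(inode_count=2, block_count=1000)
fs.create("a.txt", 1)
fs.create("b.txt", 1)
try:
    fs.create("c.txt", 1)
except OSError as err:
    print(err, "- free blocks left:", fs.free_blocks)
```

The third `create` call fails even though 998 blocks are still free, which is exactly the situation described above.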

Fragments were already briefly mentioned in the discussion of logical blocks. The JFS logical block size is fixed at 4K. This is a reasonable default but it means that the file system cannot allocate less than 4K for file storage. If a file system stores large numbers of small files (< 2K), the wasted disk space becomes significant. We have all come to know and hate this problem from FAT (a cluster size of 32K leads to massive waste of space, in some cases over 50%). JFS attacks this by allowing logical blocks to be divided into smaller fragments, as small as 512 bytes (this is the sector size on hard drives and it is not possible to read or write less than 512 bytes from/to disk). However, users should be careful because using fragments incurs additional overhead and hence slows down disk access. I would recommend using fragments smaller than 4K only when the users know for sure that they will store very large numbers of small files on the file system.
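The arithmetic behind the wasted space is easy to check. The sketch below assumes a simplified model in which every file is rounded up to a whole number of allocation units; the function names and the sample file sizes are made up for illustration.

```python
def allocated_bytes(file_size, unit):
    """Space consumed when storage is handed out in multiples of `unit`."""
    if file_size == 0:
        return 0
    return -(-file_size // unit) * unit   # ceiling division, then scale back up


def wasted_fraction(file_sizes, unit):
    """Share of the allocated space that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, unit) for size in file_sizes)
    return (allocated - used) / allocated


small_files = [300, 700, 1500, 1900]   # hypothetical file sizes in bytes
for unit in (4096, 2048, 1024, 512):
    print(f"{unit:5d}-byte units: {wasted_fraction(small_files, unit):.1%} wasted")
```

For files this small, most of each 4K block stays empty, while 512-byte fragments waste far less. The model ignores the extra bookkeeping overhead of fragments mentioned above, which is the price paid for the savings.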

The entire JFS volume space is subdivided into allocation groups. Each allocation group contains i-nodes and data blocks. This enables the file system to store i-nodes and their associated data in physical proximity (HPFS uses a very similar technique). The allocation group size varies from 8MB to 64MB and depends on fragment size and number of fragments it contains.

Journaling

As the name of JFS implies, journaling is a very important feature of this file system. It should be noted that journaling is actually independent of JFS's structure described above. The journaling technique has its roots in database systems and it is employed to ensure maximum consistency of the file system, hence minimizing the risk of data loss - a very important feature for servers, but even home/SOHO users hate to lose data.

JFS uses a special log device to implement a circular journal. On AIX, several JFS volumes can share a single log device. I'm not sure whether this is possible on OS/2; I believe each JFS volume (corresponding to a drive letter) has its own 'inline' log located inside the JFS volume - its size is selectable at FORMAT time.

It is important to note that JFS does not log (or journal) everything. It only logs changes to file system meta-data. Simply speaking, the log contains a record of changes to everything in the file system except actual file data, i.e. changes to the superblock, i-nodes, directories and allocation structures. It is clear that there must be some overhead here and indeed, performance may suffer when applications are doing lots of synchronous (uncached) I/O or creating and/or deleting many files in a short amount of time. The performance loss is however not noticeable in most cases and is well worth the increased security.

The log (or journal) occupies a dedicated area on disk and is written to immediately when any meta-data change occurs. When the disk becomes idle, the actual file system structure is updated according to the log. After a crash, all it usually takes to restore the file system to full consistency is replaying the log, i.e. performing the recorded transactions. Of course, if a process was in the middle of writing a file when the system crashed or power died, the file could be inconsistent (the app might not be able to read it again), but you will not lose this file or other files, as is often the case with other file systems.
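The log-then-apply cycle can be sketched in a few lines of Python. This is a generic illustration of meta-data journaling, not the JFS log format: the `JournaledMetadata` class, the record layout and the use of explicit commit records (a common database technique) are all assumptions.

```python
class JournaledMetadata:
    """Toy meta-data journal: log first, update the real structures later."""

    def __init__(self):
        self.log = []        # written immediately on every meta-data change
        self.metadata = {}   # the "real" structures, updated lazily

    def record(self, txn_id, key, value):
        """Log one meta-data change belonging to transaction `txn_id`."""
        self.log.append(("update", txn_id, key, value))

    def commit(self, txn_id):
        """Mark a transaction as complete (assumed commit record)."""
        self.log.append(("commit", txn_id))

    def replay(self):
        """Apply all changes of committed transactions, then empty the log.

        Used both when the disk is idle and after a crash; changes from
        transactions that never committed are simply discarded.
        """
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, value = rec
                self.metadata[key] = value
        self.log.clear()


fs = JournaledMetadata()
fs.record(1, "inode 17 size", 8192)
fs.record(1, "dir /docs entry", "report.txt -> inode 17")
fs.commit(1)
fs.record(2, "inode 18 size", 4096)   # crash happens before commit(2)
fs.replay()
print(fs.metadata)
```

After `replay`, only the two changes from transaction 1 are present, so the structures are consistent even though the second operation was interrupted. Note that no file contents ever pass through the log, which matches the meta-data-only behaviour described above.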

OS/2 considerations

The above was mostly a generic description of LVM and JFS and applies to both AIX and OS/2 and perhaps even to Linux (at least the JFS part). Now I will discuss how exactly LVM/JFS differ from the solutions previously available on OS/2.

LVM

From the user's point of view LVM replaces FDISK. On WSeB, FDISK is no longer available. In fact, if you try to run FDISK, you get the following message:

FDISK.COM has been replaced by LVM.EXE and FDISKPM.EXE has been
replaced by LVMGUI.CMD.  Please use one of these utilities.

It should be noted here that LVMGUI is a GUI app (as the name implies) and requires Java, while LVM is a VIO app and can be run from a command line boot. It looks and feels similar to FDISK, but it presents two views: logical and physical. FDISK didn't differentiate between the two. These views correspond to the concepts described at the beginning of this article. Basically the physical view shows physical disks and lets users manage partitions while the logical view presents volumes. One important concept must be introduced here, and that is the compatibility volume. A compatibility volume corresponds to an old FDISK partition. During WSeB installation, the installer automatically converts all existing partitions to compatibility volumes. This conversion technically means that the installer writes a special block of LVM data to the sector following the partition table. OSes other than WSeB won't see any difference at all. It is however necessary to manage all partitions/volumes exclusively with LVM after this conversion.

I've captured several screenshots of LVM and LVMGUI to give users unacquainted with LVM some idea of what they can expect. First, there's the logical view of LVM: See screenshot

Now there's the physical view of the same system. See screenshot

And finally a glance at LVMGUI. It looks pretty cool but takes ages to start. Personally I prefer the VIO version. Disk 3 is a ZIP-100 by the way and G: is a FAT32 partition. See screenshot

All FAT, HPFS, FAT32 etc. file systems can reside on either compatibility or LVM volumes; however, other OSes will only be able to access them on compatibility volumes. JFS on the other hand must be created on LVM volumes. LVM volumes were already described above and enjoy all the flexibility of LVM, such as spanning multiple physical disks or online expansion.

Each volume, compatibility or LVM, represents a single drive letter on an OS/2 system. LVM however is significantly more flexible than FDISK because the drive letters are not assigned by a fixed algorithm. Instead, users can assign arbitrary drive letters to volumes. The drive letters can even be changed at runtime, but users have to understand the dangers before doing that. If you reassign the drive letter of the boot volume, it doesn't require a genius to understand that a system crash will be the most likely result.

JFS

OS/2 users often ask what exactly the difference is between the various file systems available on OS/2. The following table, taken almost verbatim from WSeB's Quick Beginnings book, summarizes the most important differences between the file systems available for WSeB from IBM.
 
| Characteristic | Journaled File System (JFS) | 386 High Performance File System (386HPFS) | High Performance File System (HPFS) | FAT File System |
| --- | --- | --- | --- | --- |
| Max volume size | 2TB (terabytes) | 64GB (gigabytes) | 64GB (gigabytes) | 2GB (gigabytes) |
| Max file size | 2TB (terabytes) | 2GB (gigabytes) | 2GB (gigabytes) | 2GB (gigabytes) |
| Allows spaces and periods in file names | Yes | Yes | Yes | No (8.3 format) |
| Standard directory and file attributes | Within file system | Within file system | Within file system | Within file system |
| Extended Attributes (64KB text or binary data with keywords) | Within file system | Within file system | Within file system | In separate file |
| Max path length | 260 characters 1) | 260 characters | 260 characters | 64 characters |
| Bootable | No 2) | Yes | Yes | Yes |
| Allows dynamic volume expansion | Yes | No | No | No |
| Scales with SMP | Yes | No | No | No |
| Local security support | No | Yes | No | No |
| Average wasted space per file | 256 to 2048 bytes | 256 bytes | 256 bytes | 1/2 cluster (1KB to 16KB) |
| Allocation information for files | Near each file in its i-node | Near each file in its FNODE | Near each file in its FNODE | Centralized near volume beginning |
| Directory structure | Sorted B+tree | Sorted B-tree | Sorted B-tree | Unsorted linear, must be searched exhaustively |
| Directory location | Close to files it contains | Near seek center of volume | Near seek center of volume | Root directory at beginning of volume; others scattered |
| Write-behind (lazy write) | Optional | Optional | Optional | Optional |
| Maximum cache size | Physical memory available | Physical memory available | 2MB | 14MB |
| Caching program | None (parameters set in CONFIG.SYS) | CACHE386.EXE | CACHE.EXE | None (parameters set in CONFIG.SYS) |
| LAN Server access control lists | Within file system | Within file system | In separate file (NET.ACC) | In separate file |

1) JFS stores file and directory names in Unicode. This allows JFS to always maintain proper sort order, regardless of active codepage.

2) This is not a permanent limitation. It is only that no one has written a JFS micro- and mini-IFS yet.

It might perhaps interest some users that JFS also seems to have built-in support for DASD limits. I have however never tried to use this feature. DASD limits, aka the Directory Limits feature of LAN Server, allow administrators to control how much space a directory can take, effectively enabling them to limit the disk space usage of users. Previously this feature only worked on HPFS386 volumes. Obviously this is of no use to home users who have all their disk space to themselves, but it can be very useful for system administrators.
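The general idea of a directory limit can be sketched as follows. This is a conceptual illustration only and says nothing about how LAN Server or JFS actually enforce limits; the `DirectoryLimits` class, the paths and the assumption that a limit covers the whole subtree below a directory are all hypothetical.

```python
from pathlib import PureWindowsPath


class DirectoryLimits:
    """Toy quota check: each limited directory caps the bytes stored below it."""

    def __init__(self):
        self.limits = {}   # directory -> maximum bytes allowed
        self.usage = {}    # directory -> bytes currently charged to it

    def set_limit(self, directory, max_bytes):
        self.limits[PureWindowsPath(directory)] = max_bytes

    def write(self, path, nbytes):
        """Charge `nbytes` to every limited ancestor directory, or refuse."""
        ancestors = [d for d in PureWindowsPath(path).parents if d in self.limits]
        for d in ancestors:
            if self.usage.get(d, 0) + nbytes > self.limits[d]:
                raise OSError(f"directory limit exceeded for {d}")
        for d in ancestors:
            self.usage[d] = self.usage.get(d, 0) + nbytes


quota = DirectoryLimits()
quota.set_limit(r"D:\users\alice", 1000)
quota.write(r"D:\users\alice\notes.txt", 600)       # fits within the limit
try:
    quota.write(r"D:\users\alice\docs\big.txt", 500)
except OSError as err:
    print(err)
```

The second write is refused because it would push the directory past its 1000-byte limit, even though the volume itself may have plenty of free space.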

JFS Utilities

WSeB comes with several new JFS-specific utilities, in addition to the usual ones like CHKDSK and FORMAT. I'll only give a quick overview of them here; the important ones are documented in the Command Reference.

  • DEFRAGFS - can be used to defragment and reorganize a JFS volume. It is similar in spirit to equivalent FAT or HPFS utilities. It should be noted that just like HPFS, JFS tries not to fragment files. However especially on nearly full volumes, this is not always possible. In addition to defragmenting files, DEFRAGFS will try to rearrange internal JFS structures by placing certain pieces of data physically close to each other to speed up disk access. DEFRAGFS is designed to be run in the background.
  • EXTENDFS - after enlarging an LVM volume, this utility must be used to tell the JFS file system that it should take up all the extra space now available.
  • CACHEJFS - not documented in Command Reference, this utility can be used to query the settings of the JFS cache and set its lazy writer parameters.
  • CHKLGJFS - again undocumented. This is a diagnostic tool and will show a formatted log of the last (or one before last) checkdisk process. Not very useful to normal users.

In addition to the above utilities that are supplied with WSeB, I also managed to build several extra utilities from the OpenJFS sources thanks to invaluable help from several friends. Those are not available publicly in binary form to my knowledge, though I could probably e-mail them to interested readers - but beware, these are for experts only and not guaranteed to work!

  • LOGDUMP - as the name suggests, this tool dumps formatted contents of the current JFS log (journal) to a file.
  • CSTATS - lists current statistics of the JFS cache.
  • XPEEK - perhaps the most useful of the bunch, this one is the closest thing to a JFS disk editor I've seen. This utility lets users dump and optionally modify various internal JFS structures. It has a very crude interface but it worked for me. Needless to say, this utility is extremely dangerous and you can easily destroy your data if you don't know exactly what you're doing.

Conclusion

I have deliberately skipped some of the more advanced and less widely used LVM/JFS concepts. Interested readers will find more in the books and files I listed in the reference section. I hope I managed to present the features and benefits of LVM and JFS in a clear and concise manner. I believe these two pieces of software have brought new levels of flexibility, manageability and reliability to WSeB users and will shortly bring them to all eCS users. Don't be afraid of them!

Parting note: Everything said here about WSeB will equally apply to eCS.





Glossary of Terms:

  • Partition - a portion of physical hard disk space. A hard disk may contain one or more partitions. Partitions are defined by PC BIOS and described by partition tables stored on a harddrive. Every PC OS understands partitions.
  • Volume - a logical concept which hides the physical organization of storage space. A compatibility volume directly corresponds to a partition while LVM volume may span more than one partition on one or more physical disks. A volume is seen by users as a single drive letter. Only WSeB and eCS understand LVM volumes.
  • DASD - Direct Access Storage Device. A term often used by IBM instead of 'hard disk' to confuse mere mortals.


Unless otherwise noted, all content on this site is Copyright © 2004, VOICE