<?xml version='1.0'?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
]>

<chapter id="Allocation_Groups">
	<title>Allocation Groups</title>
	<para>
		XFS filesystems are divided into a number of equally sized chunks called Allocation Groups (AGs). Each AG can almost be thought of as an individual filesystem that maintains its own space usage. Each AG can be up to one terabyte in size (512 bytes * 2<superscript>31</superscript>), regardless of the underlying device's sector size.
	</para>
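	<para>
		The size limit above can be checked with a few lines of C. This is only an arithmetic sketch: the function names are hypothetical, and the block count is a plain division of the byte limit by the block size, so the limit enforced by a real implementation may differ slightly.
	</para>
	<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the size of one AG in bytes: 512 bytes * 2^31 = 2^40. */
uint64_t ag_max_bytes(void)
{
	return UINT64_C(512) << 31;
}

/*
 * Number of filesystem blocks of size 2^blocklog that fit into that bound.
 * blocklog is the log2 of the block size (12 for 4KB blocks).
 */
uint64_t ag_max_blocks(unsigned int blocklog)
{
	assert(blocklog >= 9 && blocklog <= 16);	/* 512 bytes to 64KB */
	return ag_max_bytes() >> blocklog;
}
```
]]></programlisting>
	<para>
		With 4KB blocks (<command>blocklog</command> of 12) the bound works out to 2<superscript>28</superscript> blocks per AG.
	</para>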
	<para>
		Each AG has the following characteristics:
	</para>
   <itemizedlist>
      <listitem>
         <para>A super block describing overall filesystem info</para>
      </listitem>
      <listitem>
         <para>Free space management</para>
      </listitem>
      <listitem>
         <para>Inode allocation and tracking</para>
      </listitem>
   </itemizedlist>
   <para>
		Having multiple AGs allows XFS to handle most operations in parallel without degrading performance as the number of concurrent accesses increases.
	</para>
	<para>
		The only global information maintained by the first AG (primary) is free space across the filesystem and total inode counts. If the <command>XFS_SB_VERSION2_LAZYSBCOUNTBIT</command> flag is set in the superblock, these are only updated on-disk when the filesystem is cleanly unmounted (umount or shutdown).
	</para>
	<para>
		Immediately after a mkfs.xfs, the primary AG has the following disk layout; the subsequent AGs do not have any inodes allocated:
	</para>
	<para>
		<mediaobject>
			<imageobject><imagedata fileref="images/6.png" format="PNG" width="100%" scalefit="0"/></imageobject>
			<textobject><phrase>6</phrase></textobject>
		</mediaobject>
		
	</para>
	<para>
		Each of these structures is expanded upon in the following sections.
	</para>
	<section id="Superblocks">
		<title>Superblocks</title>
		<para>
			Each AG starts with a superblock. The first one is the primary superblock that stores aggregate AG information. Secondary superblocks are only used by xfs_repair when the primary superblock has been corrupted.
		</para>
		<para>
			The superblock is defined by the following structure. The description of each field follows.
		</para>
		<programlisting>
typedef struct xfs_sb
{
	__uint32_t        sb_magicnum;
	__uint32_t        sb_blocksize;
	xfs_drfsbno_t     sb_dblocks;
	xfs_drfsbno_t     sb_rblocks;
	xfs_drtbno_t      sb_rextents;
	uuid_t            sb_uuid;
	xfs_dfsbno_t      sb_logstart;
	xfs_ino_t         sb_rootino;
	xfs_ino_t         sb_rbmino;
	xfs_ino_t         sb_rsumino;
	xfs_agblock_t     sb_rextsize;
	xfs_agblock_t     sb_agblocks;
	xfs_agnumber_t    sb_agcount;
	xfs_extlen_t      sb_rbmblocks;
	xfs_extlen_t      sb_logblocks;
	__uint16_t        sb_versionnum;
	__uint16_t        sb_sectsize;
	__uint16_t        sb_inodesize;
	__uint16_t        sb_inopblock;
	char              sb_fname[12];
	__uint8_t         sb_blocklog;
	__uint8_t         sb_sectlog;
	__uint8_t         sb_inodelog;
	__uint8_t         sb_inopblog;
	__uint8_t         sb_agblklog;
	__uint8_t         sb_rextslog;
	__uint8_t         sb_inprogress;
	__uint8_t         sb_imax_pct;
	__uint64_t        sb_icount;
	__uint64_t        sb_ifree;
	__uint64_t        sb_fdblocks;
	__uint64_t        sb_frextents;
	xfs_ino_t         sb_uquotino;
	xfs_ino_t         sb_gquotino;
	__uint16_t        sb_qflags;
	__uint8_t         sb_flags;
	__uint8_t         sb_shared_vn;
	xfs_extlen_t      sb_inoalignmt;
	__uint32_t        sb_unit;
	__uint32_t        sb_width;
	__uint8_t         sb_dirblklog;
	__uint8_t         sb_logsectlog;
	__uint16_t        sb_logsectsize;
	__uint32_t        sb_logsunit;
	__uint32_t        sb_features2;
} xfs_sb_t;
		</programlisting>
		<variablelist>
			<varlistentry>
				<term>sb_magicnum</term>
				<listitem><para>Identifies the filesystem. Its value is <command>XFS_SB_MAGIC = 0x58465342 "XFSB"</command>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_blocksize</term>
				<listitem><para>The size of a basic unit of space allocation in bytes. Typically, this is 4096 (4KB) but can range from 512 to 65536 bytes.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_dblocks</term>
				<listitem><para>Total number of blocks available for data and metadata on the filesystem.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rblocks</term>
				<listitem><para>Number of blocks on the real-time disk device. Refer to <xref linkend="Real-time_Devices"/> for more information.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rextents</term>
				<listitem><para>Number of extents on the real-time device.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_uuid</term>
				<listitem><para>UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by the UUID instead of device name.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_logstart</term>
				<listitem><para>First block number for the journaling log if the log is internal (i.e. not on a separate disk device). For an external log device, this will be zero (the log will also start on the first block on the log device).</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rootino</term>
				<listitem><para>Root inode number for the filesystem. Typically, this is 128 when using a 4KB block size.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rbmino</term>
				<listitem><para>Bitmap inode for real-time extents.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rsumino</term>
				<listitem><para>Summary inode for real-time bitmap.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rextsize</term>
				<listitem><para>Realtime extent size in blocks.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_agblocks</term>
				<listitem><para>Size of each AG in blocks. For the actual size of the last AG, refer to the <xref linkend="AG_Free_Space_Management"/> <command>agf_length</command> value.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_agcount</term>
				<listitem><para>Number of AGs in the filesystem.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rbmblocks</term>
				<listitem><para>Number of real-time bitmap blocks.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_logblocks</term>
				<listitem><para>Number of blocks for the journaling log.</para></listitem>
			</varlistentry>
			<varlistentry>
	<term>sb_versionnum</term>
	<listitem>
	   <para>Filesystem version number. This is a bitmask specifying the features enabled when creating the filesystem. Any disk checking tools or drivers that do not recognize any set bits must not operate upon the filesystem. Most of the flags indicate features introduced over time. The version number itself must be 4, and it is combined with the following flags:
   <informaltable frame="all">
      <tgroup cols="2"><thead><row>
            <entry>
               <para>Flag</para>
            </entry>
            <entry>
               <para>Description</para>
            </entry>
         </row>
		</thead>
		<tbody>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_ATTRBIT</command></para>
            </entry>
            <entry>
               <para>Set if any inodes have extended attributes.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_NLINKBIT</command></para>
            </entry>
            <entry>
               <para>Set if any inodes use 32-bit di_nlink values.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_QUOTABIT</command></para>
            </entry>
            <entry>
               <para>Quotas are enabled on the filesystem. This also brings in the various quota fields in the superblock.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_ALIGNBIT</command></para>
            </entry>
            <entry>
               <para>Set if sb_inoalignmt is used.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_DALIGNBIT</command></para>
            </entry>
            <entry>
               <para>Set if sb_unit and sb_width are used.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_SHAREDBIT</command></para>
            </entry>
            <entry>
               <para>Set if sb_shared_vn is used.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_LOGV2BIT</command></para>
            </entry>
            <entry>
               <para>Version 2 journaling logs are used.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_SECTORBIT</command></para>
            </entry>
            <entry>
               <para>Set if sb_sectsize is not 512.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_EXTFLGBIT</command></para>
            </entry>
            <entry>
               <para>Unwritten extents are used. This is always set.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_DIRV2BIT</command></para>
            </entry>
            <entry>
               <para>Version 2 directories are used. This is always set.</para>
            </entry>
         </row>
         <row>
            <entry>
               <para><command>XFS_SB_VERSION_MOREBITSBIT</command></para>
            </entry>
            <entry>
               <para>Set if the sb_features2 field in the superblock contains more flags.</para>
            </entry>
         </row></tbody></tgroup>
   </informaltable>
   </para>
	</listitem>
</varlistentry>

			<varlistentry>
				<term>sb_sectsize</term>
				<listitem><para>Specifies the underlying disk sector size in bytes. Most of the time, this is 512 bytes. This determines the minimum I/O alignment, including for Direct I/O.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_inodesize</term>
				<listitem><para>Size of the inode in bytes. The default is 256 (2 inodes per standard sector) but can be made as large as 2048 bytes when creating the filesystem.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_inopblock</term>
				<listitem><para>Number of inodes per block. This is equivalent to <command>sb_blocksize / sb_inodesize</command>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_fname[12]</term>
				<listitem><para>Name for the filesystem. This value can be used in the mount command.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_blocklog</term>
				<listitem><para>log<subscript>2</subscript> value of <command>sb_blocksize</command>. In other terms, <command>sb_blocksize</command> = 2<superscript>sb_blocklog</superscript>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_sectlog</term>
				<listitem><para>log<subscript>2</subscript> value of <command>sb_sectsize</command>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_inodelog</term>
				<listitem><para>log<subscript>2</subscript> value of <command>sb_inodesize</command>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_inopblog</term>
				<listitem><para>log<subscript>2</subscript> value of <command>sb_inopblock</command>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_agblklog</term>
				<listitem><para>log<subscript>2</subscript> value of <command>sb_agblocks</command> (rounded up). This value is used to generate inode numbers and absolute block numbers defined in extent maps.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_rextslog</term>
				<listitem><para>log<subscript>2</subscript> value of <command>sb_rextents</command>.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_inprogress</term>
				<listitem><para>Flag specifying that the filesystem is being created.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_imax_pct</term>
				<listitem><para>Maximum percentage of filesystem space that can be used for inodes. The default value is 25%.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_icount</term>
				<listitem><para>Global count of inodes allocated on the filesystem. This is only maintained in the first superblock.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_ifree</term>
				<listitem><para>Global count of free inodes on the filesystem. This is only maintained in the first superblock.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_fdblocks</term>
				<listitem><para>Global count of free data blocks on the filesystem. This is only maintained in the first superblock.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_frextents</term>
				<listitem><para>Global count of free real-time extents on the filesystem. This is only maintained in the first superblock.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_uquotino</term>
				<listitem><para>Inode for user quotas. This and the following two quota fields only apply if the <command>XFS_SB_VERSION_QUOTABIT</command> flag is set in <command>sb_versionnum</command>. Refer to <xref linkend="Quota_Inodes"/> for more information.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_gquotino</term>
				<listitem><para>Inode for group or project quotas. Group and Project quotas cannot be used at the same time.</para></listitem>
			</varlistentry>
<varlistentry>
	<term>sb_qflags</term>
	<listitem><para>
	Quota flags. It can be a combination of the following flags:
	   <informaltable frame="all">
	      <tgroup cols="2"><thead><row>
	            <entry>
	               <para>Flag</para>
	            </entry>
	            <entry>
	               <para>Description</para>
	            </entry>
	         </row>
			</thead>
			<tbody>
	         <row>
	            <entry>
	               <para><command>XFS_UQUOTA_ACCT</command></para>
	            </entry>
	            <entry>
	               <para>User quota accounting is enabled.</para>
	            </entry>
	         </row>
	         <row>
	            <entry>
	               <para><command>XFS_UQUOTA_ENFD</command></para>
	            </entry>
	            <entry>
	               <para>User quotas are enforced.</para>
	            </entry>
	         </row>
	         <row>
	            <entry>
	               <para><command>XFS_UQUOTA_CHKD</command></para>
	            </entry>
	            <entry>
	               <para>User quotas have been checked and updated on disk.</para>
	            </entry>
	         </row>
	         <row>
	            <entry>
	               <para><command>XFS_PQUOTA_ACCT</command></para>
	            </entry>
	            <entry>
	               <para>Project quota accounting is enabled.</para>
	            </entry>
	         </row>
	         <row>
	            <entry>
	               <para><command>XFS_OQUOTA_ENFD</command></para>
	            </entry>
	            <entry>
	               <para>Other (group/project) quotas are enforced.</para>
	            </entry>
	         </row>
	         <row>
	            <entry>
	               <para><command>XFS_OQUOTA_CHKD</command></para>
	            </entry>
	            <entry>
	               <para>Other (group/project) quotas have been checked.</para>
	            </entry>
	         </row>
	         <row>
	            <entry>
	               <para><command>XFS_GQUOTA_ACCT</command></para>
	            </entry>
	            <entry>
	               <para>Group quota accounting is enabled.</para>
	            </entry>
	         </row></tbody></tgroup>
	   </informaltable>
	</para></listitem>
</varlistentry>

			<varlistentry>
				<term>sb_flags</term>
				<listitem><para>Miscellaneous flags.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_shared_vn</term>
				<listitem><para>Reserved and must be zero ("vn" stands for version number).</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_inoalignmt</term>
				<listitem><para>Inode chunk alignment in fsblocks. </para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_unit</term>
				<listitem><para>Underlying stripe or RAID unit in blocks.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_width</term>
				<listitem><para>Underlying stripe or RAID width in blocks.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_dirblklog</term>
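				<listitem><para>log<subscript>2</subscript> multiplier that determines the granularity of directory block allocations in fsblocks: a directory block spans 2<superscript>sb_dirblklog</superscript> filesystem blocks.</para></listitem>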
			</varlistentry>
			<varlistentry>
				<term>sb_logsectlog</term>
				<listitem><para>log<subscript>2</subscript> value of the log subvolume's sector size. This is only used if the journaling log is on a separate disk device (i.e. not internal).</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_logsectsize</term>
				<listitem><para>The log's sector size in bytes if the filesystem uses an external log device.</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_logsunit</term>
				<listitem><para>The log device's stripe or RAID unit size. This only applies to version 2 logs (<command>XFS_SB_VERSION_LOGV2BIT</command> is set in <command>sb_versionnum</command>).</para></listitem>
			</varlistentry>
			<varlistentry>
				<term>sb_features2</term>
				<listitem><para>
				Additional version flags if <command>XFS_SB_VERSION_MOREBITSBIT</command> is set in <command>sb_versionnum</command>. The currently defined additional features include:
				   <orderedlist>
					  <listitem>
						 <para><command>XFS_SB_VERSION2_LAZYSBCOUNTBIT</command>  (0x02): Lazy global counters. Making a filesystem with this bit set can improve performance. The global free space and inode counts are only updated in the primary superblock when the filesystem is cleanly unmounted.</para>
					  </listitem>
					  <listitem>
						 <para><command>XFS_SB_VERSION2_ATTR2BIT</command>  (0x08): Extended attributes version 2. Making a filesystem with this optimises the inode layout of extended attributes. </para>
					  </listitem>
					  <listitem>
						 <para><command>XFS_SB_VERSION2_PARENTBIT</command>  (0x10): Parent pointers. Every inode must have an extended attribute that points back to its parent inode. This information is primarily intended for backup systems.</para>
					  </listitem>
				   </orderedlist>
				</para></listitem>
			</varlistentry>

		</variablelist>
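		<para>
			The way <command>sb_versionnum</command> and <command>sb_features2</command> combine can be sketched in C as follows. The <command>sb_features2</command> bit values are the ones listed above. The numeric values of the <command>sb_versionnum</command> bits are not given in this chapter; they are assumed here to match the kernel's XFS headers and should be checked against them. The helper function names are hypothetical.
		</para>
		<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against the XFS headers in your source tree. */
#define XFS_SB_VERSION_NUMBITS		0x000f
#define XFS_SB_VERSION_ATTRBIT		0x0010
#define XFS_SB_VERSION_NLINKBIT		0x0020
#define XFS_SB_VERSION_QUOTABIT		0x0040
#define XFS_SB_VERSION_ALIGNBIT		0x0080
#define XFS_SB_VERSION_DALIGNBIT	0x0100
#define XFS_SB_VERSION_SHAREDBIT	0x0200
#define XFS_SB_VERSION_LOGV2BIT		0x0400
#define XFS_SB_VERSION_SECTORBIT	0x0800
#define XFS_SB_VERSION_EXTFLGBIT	0x1000
#define XFS_SB_VERSION_DIRV2BIT		0x2000
#define XFS_SB_VERSION_MOREBITSBIT	0x8000

/* Values as listed for sb_features2 in this chapter. */
#define XFS_SB_VERSION2_LAZYSBCOUNTBIT	0x02
#define XFS_SB_VERSION2_ATTR2BIT	0x08
#define XFS_SB_VERSION2_PARENTBIT	0x10

/* Version number stored in the low bits of sb_versionnum. */
unsigned int sb_version_num(uint16_t versionnum)
{
	return versionnum & XFS_SB_VERSION_NUMBITS;
}

/* True if the given feature flag is set in sb_versionnum. */
bool sb_version_has(uint16_t versionnum, uint16_t flag)
{
	assert((flag & XFS_SB_VERSION_NUMBITS) == 0);	/* must be a feature bit */
	return (versionnum & flag) != 0;
}

/* sb_features2 is only meaningful when MOREBITSBIT is set. */
bool sb_has_features2(uint16_t versionnum, uint32_t features2, uint32_t flag)
{
	return sb_version_has(versionnum, XFS_SB_VERSION_MOREBITSBIT) &&
	       (features2 & flag) != 0;
}
```
]]></programlisting>
		<para>
			Under these assumptions, a <command>versionnum</command> of <command>0xb084</command> decodes as version 4 with <command>XFS_SB_VERSION_ALIGNBIT</command>, <command>XFS_SB_VERSION_EXTFLGBIT</command>, <command>XFS_SB_VERSION_DIRV2BIT</command> and <command>XFS_SB_VERSION_MOREBITSBIT</command> set, and a <command>features2</command> value of 8 then indicates <command>XFS_SB_VERSION2_ATTR2BIT</command>.
		</para>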












<bridgehead>xfs_db Example:</bridgehead>
   <para>A filesystem is made on a single SATA disk with the following command:</para>
	<programlisting>
# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
meta-data=/dev/sda7              isize=256    agcount=16, agsize=3923122 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=62769952, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=16384
log      =internal log           bsize=4096   blocks=30649, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

</programlisting>

   <para>And in xfs_db, inspecting the superblock:</para>
<programlisting>
xfs_db> sb
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 62769952
rblocks = 0
rextents = 0
uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
logstart = 33554436
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 16
agblocks = 3923122
agcount = 16
rbmblocks = 0
logblocks = 30649
versionnum = 0xb084
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 22
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 64
ifree = 61
fdblocks = 62739235
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 2
logsectlog = 0
logsectsize = 0
logsunit = 0
features2 = 8
</programlisting>
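<para>
	Several values in this output are derived from one another, which a short C sketch can make concrete. The function names are hypothetical. The split of an absolute block number into an AG number (upper bits) and an AG relative block number (lower <command>agblklog</command> bits) is an assumption based on the description of <command>sb_agblklog</command> above.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Smallest n such that 2^n >= v, i.e. log2 of v rounded up. */
unsigned int log2_roundup(uint64_t v)
{
	unsigned int n = 0;

	assert(v >= 1 && v <= (UINT64_C(1) << 63));
	while ((UINT64_C(1) << n) < v)
		n++;
	return n;
}

/* AG number held in the upper bits of an absolute block number. */
uint32_t fsb_to_agno(uint64_t fsbno, unsigned int agblklog)
{
	assert(agblklog < 64);
	return (uint32_t)(fsbno >> agblklog);
}

/* AG relative block number held in the lower agblklog bits. */
uint32_t fsb_to_agbno(uint64_t fsbno, unsigned int agblklog)
{
	assert(agblklog < 64);
	return (uint32_t)(fsbno & ((UINT64_C(1) << agblklog) - 1));
}
```
]]></programlisting>
<para>
	For the filesystem above, <command>log2_roundup(3923122)</command> gives the <command>agblklog</command> of 22, and <command>logstart = 33554436</command> splits into AG 8, block 4. Likewise <command>agcount * agblocks</command> (16 * 3923122) equals <command>dblocks</command> (62769952), and <command>blocksize / inodesize</command> (4096 / 256) equals <command>inopblock</command> (16).
</para>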


</section>




<section id="AG_Free_Space_Management">
	<title>AG Free Space Management</title>
	<para>The XFS filesystem tracks free space in an allocation group using two B+trees. One B+tree tracks space by block number, the second by the size of the free space block. This scheme allows XFS to quickly find free space near a given block or of a given size.</para>
	<para>All block numbers, indexes and counts are AG relative.</para>
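	<para>The benefit of keeping two orderings can be illustrated with a simplified sketch that uses two sorted in-memory arrays in place of the on-disk B+trees. The structure and function names are hypothetical, and binary search stands in for the B+tree lookup; this is not the kernel's allocation code.</para>
	<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One free extent: AG relative start block and length in blocks. */
struct free_extent {
	uint32_t startblock;
	uint32_t blockcount;
};

/*
 * Index of the first extent with blockcount >= wanted in an array sorted
 * by ascending blockcount, or n if no extent is large enough.
 */
size_t find_by_size(const struct free_extent *by_cnt, size_t n, uint32_t wanted)
{
	size_t lo = 0, hi = n;

	assert(by_cnt != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (by_cnt[mid].blockcount < wanted)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Index of the last extent with startblock <= target in an array sorted
 * by ascending startblock, or n if every extent starts after target.
 */
size_t find_near_block(const struct free_extent *by_bno, size_t n, uint32_t target)
{
	size_t lo = 0, hi = n;

	assert(by_bno != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (by_bno[mid].startblock <= target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo == 0 ? n : lo - 1;
}
```
]]></programlisting>
	<para>Each query is answered by a single search in the array sorted on the relevant field, which is the property the two on-disk B+trees provide.</para>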
	<section id="AG_Free_Space_Block">
		<title>AG Free Space Block</title>
		<para>The second sector in an AG contains the information about the two free space B+trees and associated free space information for the AG. The "AG Free Space Block", also known as the AGF, uses the following structure:</para>
		<programlisting>
typedef struct xfs_agf {
     __be32              agf_magicnum;
     __be32              agf_versionnum;
     __be32              agf_seqno;
     __be32              agf_length;
     __be32              agf_roots[XFS_BTNUM_AGF];
     __be32              agf_spare0;
     __be32              agf_levels[XFS_BTNUM_AGF];
     __be32              agf_spare1;
     __be32              agf_flfirst;
     __be32              agf_fllast;
     __be32              agf_flcount;
     __be32              agf_freeblks;
     __be32              agf_longest;
     __be32              agf_btreeblks;
} xfs_agf_t;
		</programlisting>




<para>
	The rest of the bytes in the sector are zeroed. <command>XFS_BTNUM_AGF</command> is set to 2: index 0 is for the B+tree sorted by block number and index 1 is for the B+tree sorted by block count (size).
</para>
<variablelist>
	<varlistentry>
		<term>agf_magicnum</term>
		<listitem><para>Specifies the magic number for the AGF sector: "XAGF" (0x58414746).</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_versionnum</term>
		<listitem><para>Set to <command>XFS_AGF_VERSION</command> which is currently 1.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_seqno</term>
		<listitem><para>Specifies the AG number for the sector.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_length</term>
		<listitem><para>Specifies the size of the AG in filesystem blocks. For all AGs except the last, this must be equal to the superblock's <command>sb_agblocks</command> value. For the last AG, this could be less than the <command>sb_agblocks</command> value. It is this value that should be used to determine the size of the AG.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_roots</term>
		<listitem><para>Specifies the block number for the root of the two free space B+trees. </para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_levels</term>
		<listitem><para>Specifies the level or depth of the two free space B+trees. For a fresh AG, this will be one, and the "roots" will point to a single leaf of level 0.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_flfirst</term>
		<listitem><para>Specifies the index of the first "free list" block. Free lists are covered in more detail later on.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_fllast</term>
		<listitem><para>Specifies the index of the last "free list" block.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_flcount</term>
		<listitem><para>Specifies the number of blocks in the "free list".</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_freeblks</term>
		<listitem><para>Specifies the current number of free blocks in the AG.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_longest</term>
		<listitem><para>Specifies the number of blocks of longest contiguous free space in the AG.</para></listitem>
	</varlistentry>
	<varlistentry>
		<term>agf_btreeblks</term>
		<listitem><para>Specifies the number of blocks used for the free space B+trees. This is only used if the <command>XFS_SB_VERSION2_LAZYSBCOUNTBIT</command> bit is set in <command>sb_features2</command>.</para></listitem>
	</varlistentry>
</variablelist>
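<para>
	Because the AGF fields are stored big-endian, a reader has to convert them before use. The following is a minimal sketch of decoding an AGF sector that has already been read into memory. The <command>agf_summary</command> structure and the function names are hypothetical; the byte offsets follow from the field order of <command>xfs_agf_t</command> above with <command>XFS_BTNUM_AGF</command> equal to 2.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_AGF_MAGIC	0x58414746u	/* "XAGF" */
#define XFS_AGF_VERSION	1u

/* Host-endian copy of the AGF fields used in this sketch. */
struct agf_summary {
	uint32_t seqno, length;
	uint32_t bnoroot, cntroot;
	uint32_t flfirst, fllast, flcount;
	uint32_t freeblks, longest;
};

/* Read a big-endian 32-bit value from a byte buffer. */
uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or not an AGF. */
int agf_decode(const uint8_t *sector, size_t len, struct agf_summary *out)
{
	assert(sector != NULL && out != NULL);
	if (len < 64)
		return -1;
	if (get_be32(sector) != XFS_AGF_MAGIC ||
	    get_be32(sector + 4) != XFS_AGF_VERSION)
		return -1;
	out->seqno    = get_be32(sector + 8);
	out->length   = get_be32(sector + 12);
	out->bnoroot  = get_be32(sector + 16);	/* agf_roots[0] */
	out->cntroot  = get_be32(sector + 20);	/* agf_roots[1] */
	out->flfirst  = get_be32(sector + 40);
	out->fllast   = get_be32(sector + 44);
	out->flcount  = get_be32(sector + 48);
	out->freeblks = get_be32(sector + 52);
	out->longest  = get_be32(sector + 56);
	return 0;
}
```
]]></programlisting>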
</section>

<section id="AG_Free_Space_Btrees">
	<title>AG Free Space B+trees</title>
	<para>The two Free Space B+trees store a sorted array of block offsets and block counts in the leaves of the B+tree. The first B+tree is sorted by the offset, the second by the count or size.</para>
	<para>The trees use the following header:</para>
	<programlisting>
typedef struct xfs_btree_sblock {
     __be32                    bb_magic;
     __be16                    bb_level;
     __be16                    bb_numrecs;
     __be32                    bb_leftsib;
     __be32                    bb_rightsib;
} xfs_btree_sblock_t;
	</programlisting>
	<para>Leaves contain a sorted array of offset/count pairs which are also used for node keys:</para>
	<programlisting>
typedef struct xfs_alloc_rec {
     __be32                    ar_startblock;
     __be32                    ar_blockcount;
} xfs_alloc_rec_t, xfs_alloc_key_t;
	</programlisting>
   
   <para>Node pointers are an AG relative block pointer:</para>
   <programlisting>typedef __be32 xfs_alloc_ptr_t;</programlisting>
   
   <itemizedlist>
      <listitem>
         <para>As the free space tracking is AG relative, all the block numbers are only 32 bits.</para>
      </listitem>
      <listitem>
         <para>The <command>bb_magic</command> value depends on the B+tree: "ABTB" (0x41425442) for the block offset B+tree, "ABTC" (0x41425443) for the block count B+tree.</para>
      </listitem>
      <listitem>
         <para>The <command>xfs_btree_sblock_t</command> header is used for intermediate B+tree nodes as well as the leaves.</para>
      </listitem>
      <listitem>
         <para>For a typical 4KB filesystem block size, the offset for the <command>xfs_alloc_ptr_t</command> array would be <command>0xab0</command> (2736 decimal).</para>
      </listitem>
      <listitem>
         <para>There is a series of macros in <command>xfs_btree.h</command> for deriving the offsets, counts, maximums, etc. for the B+trees used in XFS.</para>
      </listitem>
   </itemizedlist>
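   <para>The <command>0xab0</command> offset quoted above can be reproduced with simple arithmetic. The sketch below assumes that a node block holds the 16-byte header, then room for the maximum number of 8-byte keys, then the 4-byte pointers. The macro and function names are hypothetical and are not the macros from <command>xfs_btree.h</command>.</para>
   <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define SBLOCK_HDR_LEN	16u	/* xfs_btree_sblock_t: 4 + 2 + 2 + 4 + 4 bytes */
#define ALLOC_KEY_LEN	8u	/* xfs_alloc_key_t: two 32-bit fields */
#define ALLOC_PTR_LEN	4u	/* xfs_alloc_ptr_t: one 32-bit block number */

/* Maximum number of key/pointer pairs that fit in one node block. */
size_t alloc_node_maxrecs(size_t blocksize)
{
	assert(blocksize > SBLOCK_HDR_LEN);
	return (blocksize - SBLOCK_HDR_LEN) / (ALLOC_KEY_LEN + ALLOC_PTR_LEN);
}

/* Byte offset of the pointer array: header, then space for maxrecs keys. */
size_t alloc_ptr_offset(size_t blocksize)
{
	return SBLOCK_HDR_LEN + alloc_node_maxrecs(blocksize) * ALLOC_KEY_LEN;
}
```
]]></programlisting>
   <para>For a 4096-byte block this gives 340 key/pointer pairs and a pointer array offset of 16 + 340 * 8 = 2736 bytes, which is the <command>0xab0</command> value.</para>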
   <para>The following diagram shows a single level B+tree which consists of one leaf:</para>
	<para>
		<inlinemediaobject>
			<imageobject><imagedata fileref="images/15a.png" format="PNG" width="100%" scalefit="0"/></imageobject>
			<textobject><phrase>15a</phrase></textobject>
		</inlinemediaobject>
		
	</para>
	
   <para>With the intermediate nodes, the associated leaf pointers are stored in a separate array about two thirds into the block. The following diagram illustrates a 2-level B+tree for a free space B+tree:</para>
	<para>
		<mediaobject>
			<imageobject><imagedata fileref="images/15b.png" format="PNG" width="100%" scalefit="0"/></imageobject>
			<textobject><phrase>15b</phrase></textobject>
		</mediaobject>
		
	</para>
</section>






<section id="AG_Free_List"><title>AG Free List</title>
   <para>The AG Free List is located in the 4<superscript>th</superscript> sector of each AG and is known as the AGFL. It is an array of AG relative block pointers for reserved space for growing the free space B+trees. This space cannot be used for general user data including inodes, data, directories and extended attributes.</para>
   <para>With a freshly made filesystem, 4 blocks are reserved immediately after the free space B+tree root blocks (blocks 4 to 7). As these blocks are used up while the free space fragments, additional blocks will be reserved from the AG and added to the free list array.</para>
   <para>As the free list array is located within a single sector, a typical device will have space for 128 elements in the array (512 bytes per sector, 4 bytes per AG relative block pointer). The actual size can be determined by using the <command>XFS_AGFL_SIZE</command> macro.</para>
   <para>Active elements in the array are specified by the AGF's (<xref linkend="AG_Free_Space_Block"/>) <command>agf_flfirst</command>, <command>agf_fllast</command> and <command>agf_flcount</command> values. The array is managed as a circular list.</para>
	<para>
		<mediaobject>
			<imageobject><imagedata fileref="images/16.png" format="PNG" /></imageobject>
			<textobject><phrase>16</phrase></textobject>
		</mediaobject>
		
	</para>

   <para>The presence of these reserved blocks guarantees that the free space B+trees can be updated if any blocks are freed by extent changes in a full AG.</para>
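   <para>The circular indexing described by <command>agf_flfirst</command>, <command>agf_fllast</command> and <command>agf_flcount</command> can be sketched as follows. This is a minimal illustration, not the kernel implementation: the function names are hypothetical, and the <command>bno</command> array is assumed to have already been converted to host byte order.</para>
   <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Number of 4-byte block pointers that fit in one AGFL sector. */
static uint32_t agfl_size(uint32_t sector_size)
{
    return sector_size / (uint32_t)sizeof(uint32_t);
}

/* Index of the last active slot; only meaningful when flcount > 0. */
static uint32_t agfl_last(uint32_t size, uint32_t flfirst, uint32_t flcount)
{
    return (flfirst + flcount - 1) % size;
}

/*
 * Copy the active AGFL entries, in list order, into out[].
 * Returns the number of entries copied (at most flcount and out_len).
 */
static uint32_t agfl_active(const uint32_t *bno, uint32_t size,
                            uint32_t flfirst, uint32_t flcount,
                            uint32_t *out, uint32_t out_len)
{
    uint32_t n = 0;
    uint32_t i = flfirst;

    while (n < flcount && n < out_len) {
        out[n++] = bno[i];
        i = (i + 1) % size;  /* wrap from the last slot back to slot 0 */
    }
    return n;
}
]]></programlisting>
   <para>With a 512-byte sector the array has 128 slots, so a list with <command>flfirst</command> = 126 and <command>flcount</command> = 4 occupies slots 126, 127, 0 and 1.</para>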

			<bridgehead>xfs_db Examples:</bridgehead>
   			<para>These examples are derived from an AG that has been deliberately fragmented.</para>
   			<para>The AGF:</para>
			<programlisting>
xfs_db&gt; agf &lt;ag#&gt;
xfs_db> p
magicnum = 0x58414746
versionnum = 1
seqno = 0
length = 3923122
bnoroot = 7
cntroot = 83343
bnolevel = 2
cntlevel = 2
flfirst = 22
fllast = 27
flcount = 6
freeblks = 3654234
longest = 3384327
btreeblks = 0
			</programlisting>
			<para>In the AGFL, the active elements are from 22 to 27 inclusive which are obtained from the <command>flfirst</command> and <command>fllast</command> values from the <command>agf</command> in the previous example:</para>
			<programlisting>
xfs_db> agfl 0
xfs_db> p
bno[0-127] = 0:4 1:5 2:6 3:7 4:83342 5:83343 6:83344 7:83345 8:83346 9:83347
             10:4 11:5 12:80205 13:80780 14:81496 15:81766 16:83346 17:4 18:5
             19:80205 20:82449 21:81496 22:81766 23:82455 24:80780 25:5
             26:80205 27:83344
			</programlisting>

			<para>The root block of the free space B+tree sorted by block offset is given by the AGF's <command>bnoroot</command> value:</para>
			<programlisting>
xfs_db> fsblock 7
xfs_db> type bnobt
xfs_db> p
magic = 0x41425442
level = 1
numrecs = 4
leftsib = null
rightsib = null
keys[1-4] = [startblock,blockcount]
           1:[12,16] 2:[184586,3] 3:[225579,1] 4:[511629,1]
ptrs[1-4] = 1:2 2:83347 3:6 4:4
			</programlisting>

		   <para>Blocks 2, 83347, 6 and 4 contain the leaves for the free space B+tree by starting block. Block 2 would contain offsets 12 up to but not including 184586 while block 4 would have all offsets from 511629 to the end of the AG.</para>
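		   <para>Choosing which leaf to descend into for a given starting block can be sketched as a binary search over the node's keys. The structure and function names below are hypothetical, the keys are assumed to be in host byte order, and the returned index is 0-based whereas <command>xfs_db</command> numbers keys from 1.</para>
			<programlisting><![CDATA[
#include <stdint.h>

/* Hypothetical host-order copy of a by-block-offset node key. */
struct bno_key {
    uint32_t startblock;
    uint32_t blockcount;
};

/*
 * Return the 0-based index of the child covering target: the last key
 * whose startblock is <= target. Returns -1 if target lies before the
 * first key in the node.
 */
static int bno_node_child(const struct bno_key *keys, int numrecs,
                          uint32_t target)
{
    int lo = 0;
    int hi = numrecs - 1;
    int found = -1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (keys[mid].startblock <= target) {
            found = mid;     /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return found;
}
]]></programlisting>
		   <para>Applied to the keys above, a search for block 100000 selects the first key and therefore the leaf in block 2, while a search for block 600000 selects the fourth key and the leaf in block 4.</para>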
		   <para>The root block of the free space B+tree sorted by block count is given by the AGF's <command>cntroot</command> value:</para>
			<programlisting>
xfs_db> fsblock 83343
xfs_db> type cntbt
xfs_db> p
magic = 0x41425443
level = 1
numrecs = 4
leftsib = null
rightsib = null
keys[1-4] = [blockcount,startblock]
           1:[1,81496] 2:[1,511729] 3:[3,191875] 4:[6,184595]
ptrs[1-4] = 1:3 2:83345 3:83342 4:83346
			</programlisting>

   <para>The leaf in block 3, in this example, would contain only extents with a block count of one. Where the block count is the same, records are sorted by offset in ascending order.</para>
   <para>Inspecting the leaf in block 83346, we can see the largest extent at the end:</para>
			<programlisting>
xfs_db> fsblock 83346
xfs_db> type cntbt
xfs_db> p
magic = 0x41425443
level = 0
numrecs = 344
leftsib = 83342
rightsib = null
recs[1-344] = [startblock,blockcount]
           1:[184595,6] 2:[187573,6] 3:[187776,6]
           ...
           342:[513712,755] 343:[230317,258229] 344:[538795,3384327]

			</programlisting>
   <para>The largest block count in the tree must be the same as the AGF's <command>longest</command> value.</para>
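   <para>The ordering of the by-count tree and the check against the AGF can be expressed as a short sketch. The structure and function names are hypothetical, and records are assumed to be in host byte order.</para>
			<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-order copy of a free space record. */
struct free_extent {
    uint32_t startblock;
    uint32_t blockcount;
};

/* Order used by the by-count tree: block count first, then starting block. */
static int cnt_cmp(const struct free_extent *a, const struct free_extent *b)
{
    if (a->blockcount != b->blockcount)
        return a->blockcount < b->blockcount ? -1 : 1;
    if (a->startblock != b->startblock)
        return a->startblock < b->startblock ? -1 : 1;
    return 0;
}

/* Returns 1 if recs[0..n-1] are in strictly ascending by-count order. */
static int cnt_leaf_sorted(const struct free_extent *recs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (cnt_cmp(&recs[i - 1], &recs[i]) >= 0)
            return 0;
    }
    return 1;
}

/* Returns 1 if the last record of the rightmost leaf matches agf_longest. */
static int cnt_longest_matches(const struct free_extent *recs, size_t n,
                               uint32_t agf_longest)
{
    return n > 0 && recs[n - 1].blockcount == agf_longest;
}
]]></programlisting>
   <para>For the leaf shown above, the final record has a block count of 3384327, which agrees with the <command>longest</command> value printed for the AGF.</para>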

</section>
</section>


<section id="AG_Inode_Management">
	<title>AG Inode Management</title>
	<section id="Inode_Numbers">
		<title>Inode Numbers</title>
		<para>Inode numbers in XFS come in two forms: AG relative and absolute.</para>
		<para>AG relative inode numbers always fit within 32 bits. The number of bits actually used is determined by the sum of the superblock's (<xref linkend="Superblocks"/>) <command>sb_inoplog</command> and <command>sb_agblklog</command> values. Relative inode numbers are found within the AG's inode structures.</para>
		<para>Absolute inode numbers include the AG number in the high bits, above the bits used for the AG relative inode number. Absolute inode numbers are found in directory (<xref linkend="Directories"/>) entries.</para>
   		<para>
 		      <mediaobject>
 		      	<imageobject><imagedata fileref="images/18.png" format="PNG" width="100%" scalefit="0"/></imageobject>
 		      	<textobject><phrase>18</phrase></textobject>
 		      </mediaobject>
 		      
		</para>
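		<para>The relationship between the two forms can be sketched with a few shifts and masks. The function names below are hypothetical and the superblock values are passed in as plain integers. Splitting the AG relative number into a block and an in-block index assumes that the low <command>sb_inoplog</command> bits select the inode within its block.</para>
		<programlisting><![CDATA[
#include <stdint.h>

/* Number of bits occupied by an AG relative inode number. */
static unsigned agino_bits(unsigned sb_inoplog, unsigned sb_agblklog)
{
    return sb_inoplog + sb_agblklog;
}

/* Build an absolute inode number from an AG number and a relative number. */
static uint64_t make_ino(uint32_t agno, uint32_t agino,
                         unsigned sb_inoplog, unsigned sb_agblklog)
{
    return ((uint64_t)agno << agino_bits(sb_inoplog, sb_agblklog)) | agino;
}

/* Extract the AG number from the high bits of an absolute inode number. */
static uint32_t ino_to_agno(uint64_t ino,
                            unsigned sb_inoplog, unsigned sb_agblklog)
{
    return (uint32_t)(ino >> agino_bits(sb_inoplog, sb_agblklog));
}

/* Extract the AG relative inode number from the low bits. */
static uint32_t ino_to_agino(uint64_t ino,
                             unsigned sb_inoplog, unsigned sb_agblklog)
{
    uint64_t mask = (UINT64_C(1) << agino_bits(sb_inoplog, sb_agblklog)) - 1;

    return (uint32_t)(ino & mask);
}

/* AG relative block that holds the inode. */
static uint32_t agino_to_agbno(uint32_t agino, unsigned sb_inoplog)
{
    return agino >> sb_inoplog;
}

/* Index of the inode within that block. */
static uint32_t agino_to_offset(uint32_t agino, unsigned sb_inoplog)
{
    return agino & ((UINT32_C(1) << sb_inoplog) - 1);
}
]]></programlisting>
		<para>For example, with <command>sb_inoplog</command> = 4 and <command>sb_agblklog</command> = 22, a relative inode number uses 26 bits, so inode 131 in AG 2 has the absolute number (2 &lt;&lt; 26) | 131 and lives at index 3 of AG block 8.</para>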
	</section>
	<section id="Inode_Information">
		<title>Inode Information</title>
			<para>Each AG manages its own inodes. The third sector in the AG contains information about the AG's inodes and is known as the AGI.</para>
			<para>The AGI uses the following structure:</para>
			<programlisting>
typedef struct xfs_agi {
     __be32              agi_magicnum;
     __be32              agi_versionnum;
     __be32              agi_seqno;
     __be32              agi_length;
     __be32              agi_count;
     __be32              agi_root;
     __be32              agi_level;
     __be32              agi_freecount;
     __be32              agi_newino;
     __be32              agi_dirino;
     __be32              agi_unlinked[64];
} xfs_agi_t;
			</programlisting>
			<variablelist>
				<varlistentry>
					<term>agi_magicnum</term>
					<listitem><para>Specifies the magic number for the AGI sector: "XAGI" (0x58414749).</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_versionnum</term>
					<listitem><para>Set to <command>XFS_AGI_VERSION</command> which is currently 1.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_seqno</term>
					<listitem><para>Specifies the AG number for the sector.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_length</term>
					<listitem><para>Specifies the size of the AG in filesystem blocks. </para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_count</term>
					<listitem><para>Specifies the number of inodes allocated for the AG.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_root</term>
					<listitem><para>Specifies the block number in the AG containing the root of the inode B+tree.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_level</term>
					<listitem><para>Specifies the number of levels in the inode B+tree.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_freecount</term>
					<listitem><para>Specifies the number of free inodes in the AG.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_newino</term>
					<listitem><para>Specifies the AG relative inode number most recently allocated.</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_dirino</term>
					<listitem><para>Deprecated and not used; it is always set to NULL (-1).</para></listitem>
				</varlistentry>
				<varlistentry>
					<term>agi_unlinked[64]</term>
					<listitem><para>Hash table of unlinked (deleted) inodes that are still being referenced. Refer to <xref linkend="Unlinked_Pointer"/> for more information.</para></listitem>
				</varlistentry>
			</variablelist>
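			<para>A reader of the raw AGI sector might validate the header along the following lines. This is only a sketch: the byte offsets are derived from the field order in the structure above (every field is 4 bytes and big-endian), the function names are hypothetical, and the free-count comparison is an assumed sanity check rather than a documented rule.</para>
			<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define AGI_MAGIC   0x58414749u  /* "XAGI" */
#define AGI_VERSION 1u

/* Byte offsets of selected fields, following the order in xfs_agi. */
enum {
    AGI_OFF_MAGICNUM   = 0,
    AGI_OFF_VERSIONNUM = 4,
    AGI_OFF_SEQNO      = 8,
    AGI_OFF_COUNT      = 16,
    AGI_OFF_FREECOUNT  = 28
};

/* Decode a big-endian 32-bit value from a byte buffer. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if the sector looks like a valid AGI for the expected AG. */
static int agi_header_ok(const unsigned char *sector, uint32_t expected_agno)
{
    return get_be32(sector + AGI_OFF_MAGICNUM) == AGI_MAGIC &&
           get_be32(sector + AGI_OFF_VERSIONNUM) == AGI_VERSION &&
           get_be32(sector + AGI_OFF_SEQNO) == expected_agno &&
           /* assumed check: free inodes cannot exceed allocated inodes */
           get_be32(sector + AGI_OFF_FREECOUNT) <=
               get_be32(sector + AGI_OFF_COUNT);
}
]]></programlisting>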
</section>


<section id="Inode_Btrees"><title>Inode B+trees</title>
   <para>Inodes are allocated in chunks of 64, and a B+tree is used to track these chunks of inodes as they are allocated and freed. The block containing the root of the B+tree is defined by the AGI's <command>agi_root</command> value.</para>
   <para>The B+tree header for the nodes and leaves uses the <command>xfs_btree_sblock</command> structure, which is the same as the header used in the AGF B+trees (<xref linkend="AG_Free_Space_Btrees"/>):</para>
   <programlisting>typedef struct xfs_btree_sblock xfs_inobt_block_t;</programlisting>
   
   <para>Leaves contain an array of the following structure:</para>
<programlisting>
typedef struct xfs_inobt_rec {
     __be32                    ir_startino;
     __be32                    ir_freecount;
     __be64                    ir_free;
} xfs_inobt_rec_t;

</programlisting>
   
   <para>Nodes contain key/pointer pairs using the following types:</para>
<programlisting>
typedef struct xfs_inobt_key {
     __be32                     ir_startino;
} xfs_inobt_key_t;
typedef __be32 xfs_inobt_ptr_t;

</programlisting>
   
   <para>For the leaf entries, <command>ir_startino</command> specifies the starting inode number for the chunk, <command>ir_freecount</command> specifies the number of free entries in the chunk, and <command>ir_free</command> is a 64 element bit array specifying which entries are free in the chunk.</para>
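   <para>Interpreting a leaf record can be sketched as follows. The structure is a hypothetical host-order copy of <command>xfs_inobt_rec</command>, the function names are illustrative, and the sketch assumes that bit <emphasis>i</emphasis> of <command>ir_free</command> (counting from the least significant bit) describes inode <command>ir_startino</command> + <emphasis>i</emphasis>, with a set bit meaning the inode is free.</para>
<programlisting><![CDATA[
#include <stdint.h>

#define INODES_PER_CHUNK 64u

/* Hypothetical host-order copy of an inode B+tree leaf record. */
struct inobt_rec {
    uint32_t ir_startino;
    uint32_t ir_freecount;
    uint64_t ir_free;
};

/* Count the set bits in a 64-bit mask. */
static unsigned popcount64(uint64_t v)
{
    unsigned n = 0;

    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Returns 1 if agino is free, 0 if it is in use, or -1 if agino does
 * not belong to the chunk described by rec.
 */
static int chunk_inode_is_free(const struct inobt_rec *rec, uint32_t agino)
{
    uint32_t idx;

    if (agino < rec->ir_startino)
        return -1;
    idx = agino - rec->ir_startino;
    if (idx >= INODES_PER_CHUNK)
        return -1;
    return (int)((rec->ir_free >> idx) & 1);
}

/* Returns 1 if ir_freecount agrees with the number of bits set in ir_free. */
static int chunk_freecount_consistent(const struct inobt_rec *rec)
{
    return popcount64(rec->ir_free) == rec->ir_freecount;
}
]]></programlisting>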
   <para>The following diagram illustrates a single level inode B+tree:</para>
   <para>
<mediaobject>
	<imageobject><imagedata fileref="images/20a.png" format="PNG" width="100%" scalefit="0"/></imageobject>
	<textobject><phrase>20a</phrase></textobject>
</mediaobject>

   </para>
   <para>And a 2-level inode B+tree:</para>
   <para>
<mediaobject>
	<imageobject><imagedata fileref="images/20b.png" format="PNG" width="100%" scalefit="0"/></imageobject>
	<textobject><phrase>20b</phrase></textobject>
</mediaobject>

   </para>

<bridgehead>xfs_db Examples:</bridgehead>
   <para>TODO:</para>
</section>
</section>
<section id="Real-time_Devices"><title>Real-time Devices</title>
   <para>TODO:</para>
</section>
</chapter>