Table A.1 lists the MPEG-4 Visual levels for the Version 1 and Version 2 profiles that include only natural visual (video) data, i.e. the so-called MPEG-4 video profiles. Note that Level 0 of the Simple profile was defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard.
| Visual profile | Level | Typical visual session size | Max. number of objects¹ | Max. number of objects per type | Max. unique quant. tables | Max. VMV buffer size (MB)² | Max. VCV buffer size (MB)⁸ | VCV decoder rate (MB/s)⁴ | VCV boundary MB decoder rate (MB/s)⁹ | Max. total VBV buffer size (units of 16384 bits)⁵ | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits)⁶ | Max. sprite size (MB units) | Wavelet restrictions | Max. bitrate (kbit/s) | Max. enhancement layers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple¹⁰ | L0 | QCIF | 1 | 1 x Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L1 | QCIF | 4 | 4 x Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L2 | CIF | 4 | 4 x Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 128 | N.A. |
| Simple | L3 | CIF | 4 | 4 x Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 8192 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L1 | QCIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 8192 | N.A. | N.A. | 64 | N.A. |
| Advanced Real Time Simple | L2 | CIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 128 | N.A. |
| Advanced Real Time Simple | L3 | CIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L4 | CIF | 16 | 16 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 80 | 80 | 16384 | N.A. | N.A. | 2000 | N.A. |
| Simple Scalable | L1 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 1782 | 495 | 7425 | N.A. | 40 | 40 | 2048 | N.A. | N.A. | 128 | 1 spatial or temporal enhancement layer |
| Simple Scalable³ | L2 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 3168 | 792 | 23760 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 256 | 1 spatial or temporal enhancement layer |
| Core | L1 | QCIF | 4 | 4 x Core or Simple | 4 | 594 | 198 | 5940 | 2970 | 16 | 16 | 4096 | N.A. | N.A. | 384 | 1 |
| Core | L2 | CIF | 16 | 16 x Core or Simple | 4 | 2376 | 792 | 23760 | 11880 | 80 | 80 | 8192 | N.A. | N.A. | 2000 | 1 |
| Advanced Core | L1 | QCIF | 4 | 4 x Core or Simple or Adv. Scalable Texture | 4 | 594 | 198 | 5940 | 2970 | 16 | 8 | 4096 | N.A. | See Table A.5 | 384 | 1 |
| Advanced Core | L2 | CIF | 16 | 16 x Core or Simple or Adv. Scalable Texture | 4 | 2376 | 792 | 23760 | 11880 | 80 | 40 | 8192 | N.A. | See Table A.5 | 2000 | 1 |
| Core Scalable | L1 | CIF | 4 | 4 x Core or Simple or Core Scalable or Simple Scalable | 4 | 2376 | 792 | 14850 | 7425 | 64 | 64 | 4096 | N.A. | N.A. | 768 | 1 |
| Core Scalable | L2 | CIF | 8 | 8 x Core or Simple or Core Scalable or Simple Scalable | 4 | 2970 | 990 | 29700 | 14850 | 80 | 80 | 4096 | N.A. | N.A. | 1500 | 1 |
| Core Scalable | L3 | CCIR 601 | 16 | 16 x Core or Simple or Core Scalable or Simple Scalable | 4 | 12906 | 4032 | 120960 | 60480 | 80 | 80 | 16384 | N.A. | N.A. | 4000 | 2 |
| Main | L2 | CIF | 16 | 16 x Main or Core or Simple | 4 | 3960 | 1188 | 23760 | 11880 | 80 | 80 | 8192 | 1584 | Scalable Texture Profile@L1 | 2000 | 1 |
| Main | L3 | CCIR 601 | 32 | 32 x Main or Core or Simple | 4 | 11304 | 3240 | 97200 | 48600 | 320 | 320 | 16384 | 6480 | Scalable Texture Profile@L1 | 15000 | 1 |
| Main | L4 | 1920 x 1088 | 32 | 32 x Main or Core or Simple | 4 | 65344 | 16320 | 489600 | 244800 | 760 | 760 | 16384 | 65280 | Scalable Texture Profile@L2 | 38400 | 1 |
| Advanced Coding Efficiency | L1 | CIF | 4 | 4 x Adv. Coding Efficiency or Core or Simple | 4 | 1188 | 792 | 11880 | 5940 | 40 | 40 | 8192 | N.A. | N.A. | 384 | 1 |
| Advanced Coding Efficiency | L2 | CIF | 16 | 16 x Adv. Coding Efficiency or Core or Simple | 4 | 2376 | 1188 | 23760 | 11880 | 80 | 80 | 8192 | N.A. | N.A. | 2000 | 1 |
| Advanced Coding Efficiency | L3 | CCIR 601 | 32 | 32 x Adv. Coding Efficiency or Core or Simple | 4 | 9720 | 3240 | 97200 | 48600 | 320 | 320 | 16384 | N.A. | N.A. | 15000 | 1 |
| Advanced Coding Efficiency | L4 | 1920 x 1088 | 32 | 32 x Adv. Coding Efficiency or Core or Simple | 4 | 48960 | 16320 | 489600 | 244800 | 760 | 760 | 16384 | N.A. | N.A. | 38400 | 1 |
| N-Bit | L2 | CIF | 16 | 16 x Core or Simple or N-Bit | 4 | 2376 | 792 | 23760 | 11880 | 80 | 80 | 8192 | N.A.⁷ | N.A. | 2000 | 1 |
Notes:
1. Enhancement layers are not counted as separate objects.
2. The maximum VMV (Video Memory Verifier) buffer size bounds the memory, in macroblock units, that can be used by the VMV algorithm. This algorithm (see [MPEG4-2; subclause D.5]) models the pixel memory needed by the entire visual decoding process: the memory for reference VOPs used in the prediction of P-, B- and S(GMC)-VOPs, the storage of reconstructed VOPs until the decoder releases them, and the memory required to queue B-VOPs until composition. For profiles with more than one layer, the memory requirements include all base and enhancement layers. Macroblocks belonging to different, overlapping objects may overlap on the display, but each still requires separate memory in the VMV prior to composition.
3. The conformance point for the base layer of the Simple Scalable profile is Simple Profile@L1 when Simple Scalable Profile@L1 is used, and Simple Profile@L2 when Simple Scalable Profile@L2 is used.
4. The VCV (Video Complexity Verifier) decoder rate is the vcv_decoder_rate (H) referred to in [MPEG4-2; subclause D.4]; it is the number of macroblocks per second implied by typical spatial and temporal resolutions:
   - 1485 MB/s corresponds to QCIF at 15 Hz;
   - 5940 MB/s corresponds to CIF at 15 Hz, and also to twice QCIF at 30 Hz;
   - 11880 MB/s corresponds to CIF at 30 Hz;
   - 7425 MB/s corresponds to 1.25 times CIF at 15 Hz;
   - 23760 MB/s corresponds to twice CIF at 30 Hz;
   - 97200 MB/s corresponds to twice ITU-R 601 at 30 Hz;
   - 489600 MB/s corresponds to twice 1920×1088 at 30 Hz.
5. The total (aggregated) VBV buffer size is the sum of the individual VBV buffer occupancies at any given time (in units of 16384 bits) for all VOLs of all VOs. This total is limited according to the profile and level.
6. The maximum video packet length is the maximum number of bits of data_partitioned_motion_shape_texture() in one video packet. The constraint applies only when the data-partitioning tool is enabled in the bitstream; when data partitioning is disabled, the video packet length is not limited.
7. N.A. means Not Applicable.
8. The maximum VCV buffer size (cumulative over all layers of all VOs) is twice the maximum number of macroblocks per VOP for the profile and level, except for the Simple, Simple Scalable (Level 1) and Advanced Real Time Simple profiles. For the Simple and Advanced Real Time Simple profiles, it equals the maximum number of macroblocks per VOP; for Simple Scalable Level 1, it is 1.25 times that number. The limit applies to both the VCV buffer and the boundary MB VCV buffer.
9. The VCV boundary MB decoder rate bounds the number of macroblocks containing non-trivial shape information (boundary macroblocks, i.e. neither transparent nor opaque). It constrains the total number of boundary MBs in all VOLs concurrently. Boundary macroblocks are added to both the VCV and the boundary MB VCV buffers.
10. For Simple Profile@Level 0, the following restrictions apply:
   - The maximum frame rate shall be 15 frames per second;
   - The maximum f_code shall be 1;
   - The intra_dc_vlc_threshold shall be 0;
   - The maximum horizontal luminance resolution shall be 176 pels/line;
   - The maximum vertical luminance resolution shall be 144 pels/VOP;
   - If AC prediction is used, the QP value shall not be changed within a VOP (or within a video packet, if video packets are used in a VOP). If AC prediction is not used, there are no restrictions on changing the QP value.
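The VCV decoder rates listed in the notes follow directly from frame size and frame rate: a VOP of W×H luminance pixels holds ⌈W/16⌉·⌈H/16⌉ macroblocks. The Python sketch below reproduces that arithmetic and combines it with the object-count and bitrate columns of Table A.1 to find the lowest Simple profile level a stream could fit. It is a rough screening aid under stated assumptions, not a conformance checker: `StreamParams`, `fits_simple_level` and `lowest_simple_level` are hypothetical names, the VCV model is reduced to a steady-state macroblock-rate comparison, and the VBV, VMV, video packet length and QP/AC prediction constraints are not checked.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Three columns of Table A.1 for the Simple profile:
# (max. number of objects, VCV decoder rate in MB/s, max. bitrate in kbit/s).
SIMPLE_LEVELS = {
    "L0": (1, 1485, 64),
    "L1": (4, 1485, 64),
    "L2": (4, 5940, 128),
    "L3": (4, 11880, 384),
}


def macroblocks_per_second(width: int, height: int, fps: float) -> float:
    """Number of 16x16 macroblocks per VOP multiplied by the VOP rate."""
    mbs_per_vop = math.ceil(width / 16) * math.ceil(height / 16)
    return mbs_per_vop * fps


@dataclass
class StreamParams:
    """Hypothetical summary of a rectangular Simple profile stream."""
    width: int
    height: int
    fps: float
    bitrate_kbps: float
    num_objects: int = 1
    f_code: int = 1
    intra_dc_vlc_threshold: int = 0


def fits_simple_level(p: StreamParams, level: str) -> bool:
    max_objects, max_mb_rate, max_kbps = SIMPLE_LEVELS[level]
    if p.num_objects > max_objects:
        return False
    # Steady-state approximation of the VCV model: sustained MB rate <= H.
    if macroblocks_per_second(p.width, p.height, p.fps) > max_mb_rate:
        return False
    if p.bitrate_kbps > max_kbps:
        return False
    if level == "L0":
        # Extra Simple@L0 restrictions from the notes (QP rule not checked).
        if p.fps > 15 or p.f_code > 1 or p.intra_dc_vlc_threshold != 0:
            return False
        if p.width > 176 or p.height > 144:
            return False
    return True


def lowest_simple_level(p: StreamParams) -> Optional[str]:
    """Return the first Simple level whose checked limits are all met."""
    for level in ("L0", "L1", "L2", "L3"):
        if fits_simple_level(p, level):
            return level
    return None
```

For example, QCIF at 30 Hz produces 2970 MB/s, which exceeds the 1485 MB/s of Levels 0 and 1, so `lowest_simple_level(StreamParams(176, 144, 30, 64))` returns `"L2"` even though the bitrate alone would fit Level 0.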
Table A.2 describes the MPEG-4 Visual levels for the Studio profiles defined in the 1st Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01a].
Table A.2 Levels for the Studio profiles
| Visual profile | Level | Typical visual session formats¹ | Max. pixel depth | Max. number of objects | Max. number of objects per type | Max. VMV buffer size (samples)² | Max. VCV buffer size (samples)³ | VCV decoder rate (samples/s) | VCV boundary MB decoder rate (samples/s) | Max. total VBV buffer size | Max. VOL VBV buffer size | Max. video packet length (bits) | Max. sprite size (samples)⁴ | Wavelet restrictions | Max. bitrate (Mbit/s) | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Studio | L1 | ITU-R601:4224, ITU-R601:444 | 10 | 1 | 1 x Simple Studio | 1313280 | 1313280 | 33177600 | 33177600 | 576 | 576 | N.A. | N.A. | N.A. | 180 | N.A. |
| Simple Studio | L2 | ITU-R709.60I:422, ITU-R601:444444 | 10 | 1 | 1 x Simple Studio | 4194304 | 4194304 | 125829120 | 125829120 | 1920 | 1920 | N.A. | N.A. | N.A. | 600 | N.A. |
| Simple Studio | L3 | ITU-R709.60I:444, ITU-R709.60I:4224 | 12 | 1 | 1 x Simple Studio | 6291456 | 6291456 | 188743680 | 188743680 | 2880 | 2880 | N.A. | N.A. | N.A. | 900 | N.A. |
| Simple Studio | L4 | ITU-R709.60P:444, ITU-R709.60I:444444, 2Kx2Kx30P:444 | 12 | 1 | 1 x Simple Studio | 12582912 | 12582912 | 377487360 | 377487360 | 4320 | 4320 | N.A. | N.A. | N.A. | 1800 | N.A. |
| Core Studio | L1 | ITU-R601:4224, ITU-R601:444 | 10 | 4 | 4 x Core Studio or Simple Studio | 5253120 | 2626560 | 66355200 | 66355200 | 576 | 576 | N.A. | 8294400 | N.A. | 90 | N.A. |
| Core Studio | L2 | ITU-R709.60I:422, ITU-R601:444444 | 10 | 4 | 4 x Core Studio or Simple Studio | 16777216 | 8388608 | 251658240 | 251658240 | 1920 | 1920 | N.A. | 50135040 | N.A. | 300 | N.A. |
| Core Studio | L3 | ITU-R709.60I:444, ITU-R709.60I:4224 | 10 | 8 | 8 x Core Studio or Simple Studio | 25165824 | 12582912 | 377487360 | 377487360 | 2880 | 2880 | N.A. | 75202560 | N.A. | 450 | N.A. |
| Core Studio | L4 | ITU-R709.60P:444, ITU-R709.60I:444444, 2Kx2Kx30P:444 | 10 | 16 | 16 x Core Studio or Simple Studio | 50331648 | 25165824 | 754974720 | 754974720 | 4320 | 4320 | N.A. | 150994944 | N.A. | 900 | N.A. |
Notes:
1. ITU-R 709 denotes ITU-R BT.709 and ITU-R 601 denotes ITU-R BT.601; 444444 means 4:4:4 (RGB) plus 3 auxiliary channels, and 4224 means 4:2:2 (YUV) plus 1 auxiliary channel.
2. The VMV buffer size is counted as the number of samples in the bounding box of the texture, regardless of shape information; it also includes auxiliary channel samples.
3. The VCV buffer size is counted the same way: the number of samples in the bounding box of the texture, regardless of shape information, including auxiliary channel samples.
4. The maximum sprite size is the number of samples of sprite memory.
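Because the Studio VMV and VCV bounds count samples rather than macroblocks, the chroma format and the auxiliary channels directly affect how large a VOP can be. The sketch below counts the samples in a rectangular texture bounding box for the formats named in Table A.2 and compares one VOP against the Simple Studio VMV bound. It assumes that auxiliary channels are sampled at full luminance resolution (the trailing "4" in 4224); `SAMPLES_PER_PIXEL`, `bounding_box_samples` and `fits_vmv` are hypothetical helpers. Fitting a single VOP is only a necessary condition, since the VMV must also hold reference and queued VOPs.

```python
# Samples per luminance pixel for the formats named in Table A.2.
# Assumption: auxiliary channels are full resolution; this is an
# interpretation of the format codes, not text copied from the standard.
SAMPLES_PER_PIXEL = {
    "422": 2,     # Y at full rate, two chroma components at half horizontal rate
    "444": 3,     # three full-resolution components (e.g. RGB)
    "4224": 3,    # 4:2:2 YUV plus 1 auxiliary channel
    "444444": 6,  # 4:4:4 RGB plus 3 auxiliary channels
}

# Max. VMV buffer size (samples) of the Simple Studio profile, from Table A.2.
SIMPLE_STUDIO_VMV = {"L1": 1313280, "L2": 4194304, "L3": 6291456, "L4": 12582912}


def bounding_box_samples(width: int, height: int, fmt: str) -> int:
    """Samples in a texture bounding box, counting auxiliary channels and
    ignoring shape, as the VMV/VCV notes prescribe."""
    return width * height * SAMPLES_PER_PIXEL[fmt]


def fits_vmv(width: int, height: int, fmt: str, level: str) -> bool:
    """Necessary (not sufficient) check: one VOP fits in the VMV bound."""
    return bounding_box_samples(width, height, fmt) <= SIMPLE_STUDIO_VMV[level]
```

Taking ITU-R 601 frames as 720×576 (an assumption), a 4224 frame needs 1244160 samples, within the 1313280-sample L1 bound, while a 2048×2048 4:4:4 frame (assuming that is what 2Kx2K denotes) needs exactly the 12582912 samples allowed at L4.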
Table A.3 describes the MPEG-4 Visual levels for the Advanced Simple and Fine Granularity Scalable profiles defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b].
Table A.3 Levels for the Advanced Simple and Fine Granularity Scalable (FGS) profiles
| Visual profile | Level | Typical visual session size | Max. number of objects | Max. number of objects per type | Max. unique quant. tables | Max. VMV buffer size (MB units) | Max. VCV buffer size (MB) | VCV decoder rate (MB/s) | Max. percentage of intra MBs with AC prediction in VCV buffer | Max. total VBV buffer size | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits) | Maximum bitrate (kbit/s)² | Maximum number of coded VOP-bps³ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advanced Simple | L0 | 176×144 | 1 | 1x AS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | N.A. |
| Advanced Simple | L1 | 176×144 | 4 | 4x AS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | N.A. |
| Advanced Simple | L2 | 352×288 | 4 | 4x AS or Simple | 1 | 1188 | 396 | 5940 | 100 | 40 | 40 | 4096 | 384 | N.A. |
| Advanced Simple | L3 | 352×288 | 4 | 4x AS or Simple | 1 | 1188 | 396 | 11880 | 100 | 40 | 40 | 4096 | 768 | N.A. |
| Advanced Simple | L4 | 352×576 | 4 | 4x AS or Simple | 1 | 2376 | 792 | 23760 | 50 | 80 | 80 | 8192 | 3000 | N.A. |
| Advanced Simple | L5 | 720×576 | 4 | 4x AS or Simple | 1 | 4860 | 1620 | 48600 | 25 | 112 | 112 | 16384 | 8000 | N.A. |
| FGS | L0 | 176×144 | 1 | 1x AS or FGS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | 4 |
| FGS | L1 | 176×144 | 4 | 4x AS or FGS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | 4 |
| FGS | L2 | 352×288 | 4 | 4x AS or FGS or Simple | 1 | 1188 | 396 | 5940 | 100 | 40 | 40 | 4096 | 384 | 4 |
| FGS | L3 | 352×288 | 4 | 4x AS or FGS or Simple | 1 | 1188 | 396 | 11880 | 100 | 40 | 40 | 4096 | 768 | 4 |
| FGS | L4 | 352×576 | 4 | 4x AS or FGS or Simple | 1 | 2376 | 792 | 23760 | 50 | 80 | 80 | 8192 | 3000 | 4 |
| FGS | L5 | 720×576 | 4 | 4x AS or FGS or Simple | 1 | 4860 | 1620 | 48600 | 25 | 112 | 112 | 16384 | 8000 | 4 |
Notes:
1. The following restriction applies to Level 0 of the Advanced Simple and FGS profiles: if AC prediction is used, the QP value shall not be changed within a VOP (or within a video packet, if video packets are used in a VOP). If AC prediction is not used, there are no restrictions on changing the QP value.
2. For the FGS profile, this column gives the maximum base-layer bitrate.
3. The maximum number of coded VOP-bps takes into account the bits shifted by applying frequency weighting and/or selective enhancement.
4. The number of FGS, FGST, or combined FGS-FGST layers is always one. If the FGS layer and the FGST layer are separated, the total number of enhancement layers is two.
5. The interlace tools are not used at Levels L0, L1, L2 and L3 of the Advanced Simple and FGS profiles.
6. In the FGS profile, the base and enhancement layers are inherently tightly coupled. To avoid unnecessary memory storage, the following constraints apply to the decoding-time relationship between the enhancement layer and the base layer:
   - Decoding and composition (or presentation, in a decoder without a compositor) of each FGS or FGST VOP shall be performed in the same time unit.
   - Each FGS or FGST VOP shall be decoded immediately after its reference base-layer VOP(s) have been decoded, without violating the above constraint.
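The Level 0 QP restriction (stated for Simple@L0 as well as for Level 0 of the Advanced Simple and FGS profiles) can be checked mechanically once the per-macroblock QP values are known. The sketch below uses a hypothetical in-memory representation rather than bitstream syntax: each macroblock is a `(qp, uses_ac_prediction)` pair, and macroblocks are grouped into video packets. It conservatively treats AC prediction as "used" if any macroblock in the VOP sets it; that reading is an assumption.

```python
from typing import List, Tuple

# Hypothetical view of one decoded VOP (not MPEG-4 syntax): each macroblock
# is (qp, uses_ac_prediction), grouped into video packets. A VOP coded
# without video packets is represented as a single-packet list.
Macroblock = Tuple[int, bool]
Vop = List[List[Macroblock]]


def qp_rule_ok(vop: Vop) -> bool:
    """If AC prediction is used, QP must stay constant within each video
    packet (or within the whole VOP when it has only one packet)."""
    uses_ac_pred = any(ac for packet in vop for _, ac in packet)
    if not uses_ac_pred:
        return True  # no restriction on changing QP
    return all(len({qp for qp, _ in packet}) <= 1 for packet in vop)
```

Because each video packet header can carry a new quantizer value, QP may still differ between packets; the check only rejects a change inside a single packet.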