Commit 41061adf authored 19 years ago by Diego Biurrun
spelling/wording/grammar
Originally committed as revision 4367 to svn://svn.ffmpeg.org/ffmpeg/trunk
parent 9ba42958
Showing 1 changed file with 34 additions and 34 deletions:
doc/ffmpeg_powerpc_performance_evaluation_howto.txt
@@ -8,7 +8,7 @@ I - Introduction
 The PowerPC architecture and its SIMD extension AltiVec offer some
 interesting tools to evaluate performance and improve the code.
-This document try to explain how to use those tools with FFmpeg.
+This document tries to explain how to use those tools with FFmpeg.
 The architecture itself offers two ways to evaluate the performance of
 a given piece of code:
@@ -16,31 +16,31 @@ a given piece of code:
 1) The Time Base Registers (TBL)
 2) The Performance Monitor Counter Registers (PMC)
-The firsts are always available, always active, but they're not very
-accurate : the registers increment by one every four *bus* cycle. On
-my 667 Mhz tibook (ppc7450) , this means once every twenty *processor*
-cycle. So we won't use that.
+The first ones are always available, always active, but they're not very
+accurate: the registers increment by one every four *bus* cycles. On
+my 667 Mhz tiBook (ppc7450), this means once every twenty *processor*
+cycles. So we won't use that.
-The PMC are much more useful : not only they can report cycle-accurate
+The PMC are much more useful: not only can they report cycle-accurate
 timing, but they can also be used to monitor many other parameters,
-such as the number of AltiVec stalls for every kind of instructions,
+such as the number of AltiVec stalls for every kind of instruction,
 or instruction cache misses. The downside is that not all processors
 support the PMC (all G3, all G4 and the 970 do support them), and
 they're inactive by default - you need to activate them with a
-dedicated tool. Also, the number of available PMC depend on the
-procesor : the various 604 have 2, the various 75x (aka. G3) have 4,
-anbd the various 74xx (aka G4) have 6.
+dedicated tool. Also, the number of available PMC depends on the
+procesor: the various 604 have 2, the various 75x (aka. G3) have 4,
+and the various 74xx (aka G4) have 6.
-*WARNING*: The powerpc 970 is not very well documented, and its PMC
-registers are 64bits wide. To properly notify the code, you *must*
-tune for the 970 (using --tune=970), or the code will assume 32bits
+*WARNING*: The PowerPC 970 is not very well documented, and its PMC
+registers are 64 bits wide. To properly notify the code, you *must*
+tune for the 970 (using --tune=970), or the code will assume 32 bit
 registers.
 II - Enabling FFmpeg PowerPC performance support
-This need to be done by hand. First, you need to configure FFmpeg as
-usual, plus using the "--powerpc-perf-enable". for instance :
+This needs to be done by hand. First, you need to configure FFmpeg as
+usual, but add the "--powerpc-perf-enable" option. For instance:
 #####
 ./configure --prefix=/usr/local/ffmpeg-cvs --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
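To make the TBL arithmetic in the hunk above concrete, here is a small C sketch that converts one time base tick into processor cycles. The howto only states that the time base advances once every four bus cycles; the 133 MHz bus clock assumed for the 667 MHz TiBook and the helper name `tb_tick_in_cpu_cycles` are assumptions for illustration, not FFmpeg code.

```c
#include <assert.h>
#include <stdint.h>

/* From the howto: the time base advances once every four bus cycles. */
#define BUS_CYCLES_PER_TB_TICK 4u

/* Hypothetical helper: processor cycles elapsed per time base tick, given
 * the processor and bus clocks in MHz. Integer division rounds the exact
 * ratio down when the clocks do not divide evenly. */
static uint32_t tb_tick_in_cpu_cycles(uint32_t cpu_mhz, uint32_t bus_mhz)
{
    assert(bus_mhz > 0);
    return BUS_CYCLES_PER_TB_TICK * cpu_mhz / bus_mhz;
}
```

Under the assumed 133 MHz bus, `tb_tick_in_cpu_cycles(667, 133)` evaluates to 20 (the exact ratio is about 20.06), which matches the "once every twenty *processor* cycles" figure and shows why the time base is too coarse for timing short functions.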
@@ -48,8 +48,8 @@ usual, plus using the "--powerpc-perf-enable". for instance :
 This will configure FFmpeg to install inside /usr/local/ffmpeg-cvs,
 compiling with gcc-3.3 (you should try to use this one or a newer
-gcc), and tuning for the PowerPC7450 (i.e. the newer G4 ; as a rule of
-thumb, those at 550Mhz and more). It will also enables the PMCs.
+gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
+thumb, those at 550Mhz and more). It will also enable the PMC.
 You may also edit the file "config.h" to enable the following line:
@@ -59,24 +59,24 @@ You may also edit the file "config.h" to enable the following line:
 If you enable this line, then the code will not make use of AltiVec,
 but will use the reference C code instead. This is useful to compare
-performance between the two versions of the code.
+performance between two versions of the code.
-Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h" :
+Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":
 #####
 #define POWERPC_NUM_PMC_ENABLED 4
 #####
-If you have a G4 cpus, you can enable all 6 PMCs. DO NOT enable more
-PMCs than available on your cpu!
+If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
+PMC than available on your CPU!
-Then, simply compile ffmpeg as usual (make && make install).
+Then, simply compile FFmpeg as usual (make && make install).
 III - Using FFmpeg PowerPC performance support
-This FFmeg can be used exactly as usual. But before exiting, Ffmpeg
+This FFmeg can be used exactly as usual. But before exiting, FFmpeg
 will dump a per-function report that looks like this:
 #####
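The hunk above warns against enabling more PMC than the CPU provides. One way a build could catch that mistake is sketched below in C11: a `_Static_assert` on `POWERPC_NUM_PMC_ENABLED`, plus a runtime lookup built from the per-family counts the howto gives (604: 2, 75x/G3: 4, 74xx/G4: 6). `POWERPC_MAX_PMC`, `max_pmc_for_family` and `pmc_config_fits` are hypothetical names and are not taken from FFmpeg.

```c
#include <assert.h>
#include <string.h>

/* Value from libavcodec/ppc/dsputil_ppc.h, as quoted in the howto. */
#define POWERPC_NUM_PMC_ENABLED 4

/* Hypothetical: the largest PMC count listed in the howto (74xx/G4). */
#define POWERPC_MAX_PMC 6

/* Fails the build if more counters are enabled than any listed CPU has. */
_Static_assert(POWERPC_NUM_PMC_ENABLED <= POWERPC_MAX_PMC,
               "more PMC enabled than any supported CPU provides");

/* Hypothetical lookup of the PMC count per processor family, using the
 * numbers from the howto. Families it does not list (e.g. the 970)
 * return 0. */
static int max_pmc_for_family(const char *family)
{
    if (strcmp(family, "604") == 0)
        return 2;
    if (strcmp(family, "75x") == 0)  /* G3 */
        return 4;
    if (strcmp(family, "74xx") == 0) /* G4 */
        return 6;
    return 0;
}

/* Nonzero if `enabled` counters fit on a CPU of the given family. */
static int pmc_config_fits(const char *family, int enabled)
{
    int available = max_pmc_for_family(family);

    assert(enabled >= 0);
    return available > 0 && enabled <= available;
}
```

With the default value of 4, `pmc_config_fits("75x", POWERPC_NUM_PMC_ENABLED)` is true while `pmc_config_fits("604", POWERPC_NUM_PMC_ENABLED)` is false, which is exactly the case the "DO NOT enable more PMC" warning is about.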
@@ -99,16 +99,16 @@ PowerPC performance report
 #####
 In this example, PMC1 was set to record CPU cycles, PMC2 was set to
-record AltiVec Permute Stall Cycle, and PMC3 was set to record AltiVec
+record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
 Issue Stalls.
 The function "gmc1_altivec" was monitored 255302 times, and the
 minimum execution time was 231 processor cycles. The max and average
 aren't much use, as it's very likely the OS interrupted execution for
-reasons of it's own :-(
+reasons of its own :-(
-With the exact same setting and source file, but using the reference C
-code we get :
+With the exact same settings and source file, but using the reference C
+code we get:
 #####
 PowerPC performance report
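The report in the hunk above keeps a count, a minimum, a maximum and an average per function, and the text explains why only the minimum is trustworthy. A minimal C sketch of such an accumulator shows the effect: one sample inflated by an OS interruption drags the maximum and the average up but leaves the minimum alone. The `perf_stat` structure and its functions are hypothetical, not FFmpeg's reporting code, and the sample values in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-function statistics, in processor cycles. */
struct perf_stat {
    uint64_t count;
    uint64_t min;
    uint64_t max;
    uint64_t total;
};

static void perf_stat_init(struct perf_stat *s)
{
    s->count = 0;
    s->min = UINT64_MAX; /* any real sample will be smaller */
    s->max = 0;
    s->total = 0;
}

/* Record one measured duration. */
static void perf_stat_add(struct perf_stat *s, uint64_t cycles)
{
    assert(s);
    s->count++;
    s->total += cycles;
    if (cycles < s->min)
        s->min = cycles;
    if (cycles > s->max)
        s->max = cycles;
}

/* Integer average; returns 0 when nothing has been recorded. */
static uint64_t perf_stat_avg(const struct perf_stat *s)
{
    return s->count ? s->total / s->count : 0;
}
```

Feeding it 300, 310 and 305 cycles gives a min of 300, a max of 310 and an average of 305; adding a single interrupted sample of 50000 cycles leaves the min at 300 but raises the max to 50000 and the average to 12728.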
@@ -134,27 +134,27 @@ the fastest C execution in this example. It's not perfect but it's not
 bad (well I wrote this function so I can't say otherwise :-).
 Once you have that kind of report, you can try to improve things by
-finding what goes wrong and fixing it ; in the example above, one
-shoud try to diminish the number of AltiVec stalls, as this *may*
-improve performances.
+finding what goes wrong and fixing it; in the example above, one
+should try to diminish the number of AltiVec stalls, as this *may*
+improve performance.
-IV) Enabling the PMC in MacOS X
+IV) Enabling the PMC in Mac OS X
 This is easy. Use "Monster" and "monster". Those tools come from
 Apple's CHUD package, and can be found hidden in the developer web
-site & ftp site. "MONster" is the graphical application, use it to
+site & FTP site. "MONster" is the graphical application, use it to
 generate a config file specifying what each register should
 monitor. Then use the command-line application "monster" to use that
 config file, and enjoy the results.
-Note that "MONster" can be used for many other stuff, but it's
+Note that "MONster" can be used for many other things, but it's
 documented by Apple, it's not my subject.
-V) Enabling the PMC in Linux
+V) Enabling the PMC on Linux
 I don't know how to do it, sorry :-) Any idea very much welcome.