
JVM Performance Tuning
Training Description
The training aims to make JVM and Java performance tuning as clear and simple as possible for participants at the design, architecture, and implementation levels.
This is an end-to-end training. It illustrates almost every concept with pictures and model code, because concepts are much easier to understand pictorially. Almost every topic comes with worked examples, and a detailed case study strings together all the concepts and technologies. Further case studies cover debugging JVM crashes, memory leaks, operating system stalls, and hardware bottlenecks, and show how to associate these events with code.
Intended Audience
Programmers who want to learn the foundations of concurrent programming and of existing concurrent programming environments in order to develop, now or in the future, multithreaded applications for multi-core processors and shared-memory multiprocessors.
Java consultants, developers, or anyone with Java experience who is interested in performance testing.
Instructional Method
Some topics (e.g. memory analysis) are listed under more than one heading (Hardware, OS, JVM).
This is an instructor-led course that provides lecture topics and the practical application of JVM tuning techniques and the underlying technologies.
It presents most concepts pictorially, and a detailed case study strings together the technologies, patterns, and design.
The cause of a problem may lie in the hardware, OS, JVM, or application, and the techniques and tools for identifying it may differ accordingly.
Key skills
Identify and debug memory leaks
Case studies of various real problems
Identify and debug operating system stalls
Identify and debug hardware stalls
Debug a JVM crash
Using various hardware, OS, and JVM tools
Pre-requisites
Good knowledge of Java
Workstation with JDK 1.8.0 installed
Working knowledge of the JVM
Workstation with JDK 1.7.0 installed
- 01
Cache and Memory
Measure the effects of cache on a Java program
Identifying bottlenecks with the help of these measurements
Measuring TLB performance and its effects
Keeping latency low and throughput high by engaging the cache
Associating bottlenecks with Java code
How hardware affects the performance of a Java application
Measure the effects of other processor and hardware counters on a Java program
Hardware counters that can be measured
Hardware and software prefetchers
TLB effects on Java code
TLB architecture
How cache affects performance
Crash Course in modern hardware
Cache Levels and their architecture
Disk
How to measure a tardy disk
Identifying the correct reason
Is your disk IO slow?
Reasons for tardy performance
Associating the tardy performance with Java code
Locking and Concurrency
Cache Coherency
MESI
False Sharing
Associating False Sharing with Java Code
Detecting False Sharing
Processor Affinity
Effects of processor affinity
Measuring effects of affinity
CPU
Measuring CPI and IPC
CPU Performance Counters
Associating CPI and IPC with performance and Java code
- 02
Virtual Memory
Page Replacement
Caveats of using TLB
Introduction
How physical memory acts
Swap Space
Pages and page frames
Multilevel page tables
How the operating system sees memory
Optimizing Page table access
How virtual memory acts
Virtual memory and shared memory
Demand Paged Virtual Memory and Working Sets
Influencing TLB performance
JVM Tuning for Virtual Memory
-XX:+UseLargePages
/sys/kernel/mm/transparent_hugepage/enabled
/proc/sys/vm/nr_hugepages
/proc/meminfo Hugepagesize
Linux huge Pages
Linux Transparent huge pages
LargePages
Locking and Concurrency
is_lock_owned
try_spin
Object layout with JOL
spin_pause
Understanding Padding
False Sharing
Designing classes to avoid false sharing
@Contended and related annotation
complete_monitor_locking
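The padding and false-sharing topics above can be made concrete with a rough sketch. It assumes 64-byte cache lines and that the JVM keeps the padding fields between the two hot fields, which HotSpot does not guarantee. That uncertainty is the reason `@Contended` exists (`sun.misc.Contended` on JDK 8, `jdk.internal.vm.annotation.Contended` on JDK 9 and later, honoured for application classes only with `-XX:-RestrictContended`). The class and method names below are hypothetical, and the timings are indicative only.

```java
final class FalseSharingSketch {
    // Two hot fields that will most likely sit on the same cache line.
    static final class Unpadded {
        volatile long a;
        volatile long b;
    }

    // Manual padding: seven longs (56 bytes) between a and b, ASSUMING 64-byte lines.
    // The JVM may lay fields out differently, so this is a best-effort layout.
    static final class Padded {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long b;
    }

    // Runs both tasks on separate threads and returns the elapsed wall time.
    static long timeNanos(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    // Each thread is the only writer of its own field, so the counts are exact.
    static long[] runUnpadded(int iterations) {
        Unpadded u = new Unpadded();
        long nanos = timeNanos(
                () -> { for (int i = 0; i < iterations; i++) u.a++; },
                () -> { for (int i = 0; i < iterations; i++) u.b++; });
        return new long[] { u.a, u.b, nanos };
    }

    static long[] runPadded(int iterations) {
        Padded p = new Padded();
        long nanos = timeNanos(
                () -> { for (int i = 0; i < iterations; i++) p.a++; },
                () -> { for (int i = 0; i < iterations; i++) p.b++; });
        return new long[] { p.a, p.b, nanos };
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        runUnpadded(n); // warm-up so the JIT compiles both loops
        runPadded(n);
        System.out.println("unpadded ns: " + runUnpadded(n)[2]);
        System.out.println("padded   ns: " + runPadded(n)[2]);
    }
}
```

On a multi-core machine the padded variant is often noticeably faster, but the difference depends on the hardware, the JVM version, and the actual object layout, which a tool such as JOL can confirm.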
Setting Processor Affinity at OS
what is taskset
isolcpus
Operating-System-Specific Tools
gdb
conky
vmstat
mpstat
iostat
SystemTap
top
- 03
Garbage Collection-Advanced Tuning Scenarios
Advanced Tuning Scenarios-Part2
JDK 5,6,7 defaults
Default Flags
Garbage Collection Data of Interest
Tuning GC For Throughput and Latency
Latency
Old (Parallel)
Perm
Young (Parallel)
Pset Configuration
Old (CMS)
Tenuring Distribution
Initiating Occupancy
Common Scenarios
Survivor Ratio
Tenuring threshold
Throughput
(Parallel GC)
CondCardmark
Adaptive Sizing
TLABs
Large Pages
NUMA
Pset Configuration
CMS
Concurrent Mode Failure
Monitoring GC
Par New
Parallel GC
Safe Pointing
Time Stamps
Date Stamps
System.gc()
Advanced Tuning Scenarios-Part1
Monitoring the GC
Conclusions
GC Tuning
Tuning Parallel GC
Tuning CMS
Tuning the young generation
GC Tuning Methodology
Deployment Model
Choosing Runtime
General Guidelines
Data Model
Heap Sizing
Factors Controlling Heap Sizing
Monitoring
Ctrl-Break Handler
Deadlock Detection
Thread Dump
Heap Summary
jmap Utility
Heap Histogram of Running Process
Getting Information on the Permanent Generation
Heap Histogram of Core File
Heap Configuration and Usage
jps Utility
jhat Utility
Instances Query
Histogram Queries
Standard Queries
Heap Analysis Hints
All Classes Query
Object Query
Where was this object allocated?
Roots Query
Instance Counts for All Classes Query
New Instances Query
What is keeping an object alive?
All Roots Query
Class Query
Reachable Objects Query
Custom Queries
jstack Utility
Printing Stack Trace From Core Dump
Printing a Mixed Stack
Forcing a Stack Dump
jrunscript Utility
jsadebugd Daemon
jstatd Daemon
JMX
Introduction
Dynamic MBeans
Open MBean
Standard MBeans
JMX Remoting
Advanced Features
J2EE Management (optional)
Model MBean
jstat Utility
Example of -gcoldcapacity Option
Example of -gcnew Option
Example of -gcutil Option
jinfo Utility
visualgc Tool
CPU Usage Profilers
Solaris Studio Analyzer (Linux and Solaris)
stepping through assembly with source
er_print utility
stepping through call-stack (native and java)
stepping through byte codes with source
Associating hardware events with java code analyzer
Collecting processor specific hardware events
Collect command
Java Mission Control
Enabling JFR
Selecting JFR Events
Java Flight recorder
Diagnostics and Analysis
Native Memory Best Practices
Measuring Footprint
NIO Buffers
Minimizing Footprint
Native Memory Tracking
FootPrint
Integrating Signal and Exception Handling
Reducing Signal Usage
Console Handlers
Signal Chaining
Signal Handling on Solaris OS and Linux
Alternative Signals
Signal Handling in the HotSpot Virtual Machine
Reasons for Not Getting a Core File
Diagnosing Leaks in Native Code
Crash in Compiled Code
Tracking Memory Allocation With OS Support
Using libumem to Find Leaks
Tracking Memory Allocation in a JNI Library
Tracking All Memory Allocation and Free Calls
Sample Crashes
Crash in Native Code
Crash in VMThread
Determining Where the Crash Occurred
Crash due to Stack Overflow
Using dbx to Find Leaks
Crash in the HotSpot Compiler Thread
Troubleshooting System Crashes
Developing Diagnostic Tools
Java Platform Debugger Architecture
java.lang.management Package
Java Virtual Machine Tools Interface
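As a small illustration of the `java.lang.management` package listed above, the following sketch reads heap usage and checks for deadlocked threads from inside the running JVM. It uses only the platform MXBeans. The class name `ManagementSketch` and its helper methods are hypothetical, and a real diagnostic tool would more likely attach remotely over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

final class ManagementSketch {
    // Bytes of heap currently in use, as reported by the memory MXBean.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Number of threads currently stuck in a deadlock (monitors or
    // ownable synchronizers); 0 when none is detected.
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("used heap bytes:    " + usedHeapBytes());
        System.out.println("live threads:       " + ManagementFactory.getThreadMXBean().getThreadCount());
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

The same MXBeans are what JConsole and similar tools display, so a sketch like this is mainly useful for embedding lightweight checks in an application.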
Diagnosing Leaks in Java Language Code
Obtaining a Heap Histogram on a Running Process
-XX:+HeapDumpOnOutOfMemoryError Command-line Option
jmap Utility
Using the jhat Utility
JConsole Utility
Monitoring the Number of Objects Pending Finalization
Obtaining a Heap Histogram at OutOfMemoryError
HPROF Profiler
NetBeans Profiler
Creating a Heap Dump
Troubleshooting Hanging or Looping Processes
Diagnosing a Looping Process
Deadlock Detected
No Thread Dump
Deadlock Not Detected
Diagnosing a Hung Process
Forcing a Crash Dump
Troubleshooting Memory Leaks
Crash Instead of OutOfMemoryError
Meaning of OutOfMemoryError
Detail Message: <reason> <stack trace> (Native method)
Detail Message: Java heap space
Detail Message: request <size> bytes for <reason>. Out of swap space?
Detail Message: PermGen space
Detail Message: Requested array size exceeds VM limit
Finding a Workaround
Crash During Garbage Collection
Class Data Sharing
Crash in HotSpot Compiler Thread or Compiled Code
Direct Memory
Avoiding GC for low latencies
Why not C/C++
Going Off-heap
Advantages of Off-heap structures
Disadvantages of Off-heap structures
Garbage Collection and Memory Architecture
Heap Fragmentation
GC Pros and Cons
Object Size
Algorithms
Overview
Performance
GC Tasks
Reachability
Managing OutOfMemoryError
Generational Spaces
Measuring GC Activity
History
Summary
Old Space
Young Space
JVM 1.4, 5, 6
Advanced JVM Architecture
Tuning inlining
MaxInlineSize
InlineSmallCode
MaxInline
MaxRecursiveInline
FreqInlineSize
Monitoring JIT
Deoptimizations
Backing Off
PrintCompilation
OSR
Log Compilations
Optimizations
PrintInlining
Intrinsics
Common intrinsics
Advanced JVM Architecture Part 1
NUMA
Inline caching
Virtual method calls Details
Virtual Machine Design
Dynamic Compilation
Large Pages
Biased Locking
Lock Coarsening
Standard Compiler Optimizations
Speculative Optimizations
Escape Analysis
Scalar Replacements
Inlining Details
VM Philosophy
Understanding and Controlling JVM Options
DoEscapeAnalysis
AggressiveOpts
CallSites
Polymorphic
BiMorphic
MegaMorphic
MonoMorphic
HotSpot
Client
Server
Tiered
Advanced JVM Architecture-Part 2
JIT
Mixed mode
Golden Rule
Profiling
Optimizations
Memory Analysis
Core/Heap dumps Analyzer
jdb Utility
Attaching to a Process
Attaching to a Core File on the Same Machine
Attaching to a Core File or a Hung Process from a Different Machine
JConsole Utility
Serviceability Agent (SA)
Cache Dump
Stepping Through heap
Class Browser
Compute reverse pointers
Stepping through NON Heap
Deadlock detection
Value in code cache
Code Viewer
Java VisualVM
HPROF - Heap Profiler
CPU Usage Sampling Profiles (cpu=samples)
CPU Usage Times Profile (cpu=times)
Heap Dump (heap=dump)
Heap Allocation Profiles (heap=sites)
- 04
NIO 2.0
File System Change Notification
Multicasting
File and Directories
Symbolic Links
Watch Service API
Metadata File Attributes
Two Security models
FileVisitor Class
Working with path
SPI Package
Asynchronous IO with Socket and File
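To give the FileVisitor topic above a concrete shape, here is a minimal sketch that walks a directory tree with `Files.walkFileTree` and counts regular files. The temporary directory layout and the class name `WalkSketch` are made up for the example. A real traversal would also decide how to treat symbolic links and unreadable entries.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicInteger;

final class WalkSketch {
    // Counts regular files below root using a FileVisitor callback.
    static int countRegularFiles(Path root) throws IOException {
        AtomicInteger count = new AtomicInteger();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    count.incrementAndGet();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return count.get();
    }

    // Builds a tiny tree (two files, one sub-directory), counts, then cleans up.
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("walk-sketch");
            Path a = Files.createFile(dir.resolve("a.txt"));
            Path sub = Files.createDirectory(dir.resolve("sub"));
            Path b = Files.createFile(sub.resolve("b.txt"));
            int found = countRegularFiles(dir);
            Files.delete(b);
            Files.delete(sub);
            Files.delete(a);
            Files.delete(dir);
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("regular files found: " + demo());
    }
}
```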
Concurrent Data Structures
Java Memory Model(JMM)
Real Meaning and effect of synchronization
Volatile
Sequential Consistency would disallow common optimizations
The changes in JMM
Final
Shortcomings of the original JMM
Finals not really final
Prevents effective compiler optimizations
Processor executes operations out of order
Compiler is free to reorder certain instructions
Cache reorders writes
Old JMM surprising and confusing
Instruction Reordering
What is the limit of reordering
Programmatic Control
super-scalar processors
heavily pipelined processors
As-if-serial-semantics
Why is reordering done
Cache Coherency
Write-back Caching explained
What is cache coherence?
How does it affect Java programs?
Software based Cache Coherency
NUMA (Non-uniform memory access)
Caching explained
Cache incoherency
New JMM and goals of JSR-133
Simple, intuitive, and feasible
Out-of-thin-air safety
High performance JVM implementations across architectures
Minimal impact on existing code
Initialization safety
Preserve existing safety guarantees and type-safety
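The visibility guarantee that JSR-133 attaches to `volatile` can be sketched as follows: a write to a volatile flag happens-before any later read that observes it, so the plain field written before the flag is also visible to the reader. The class `VolatilePublication` is a hypothetical illustration assuming Java 8 or later (for the lambda), not production code. The spin loop is there only to keep the example short.

```java
final class VolatilePublication {
    private int payload;            // plain field, published via the flag below
    private volatile boolean ready; // volatile write/read creates the happens-before edge

    void publish(int value) {
        payload = value; // 1. ordinary write
        ready = true;    // 2. volatile write: everything before it becomes visible
    }

    int awaitPayload() {
        while (!ready) {   // volatile read; without volatile this loop might never exit
            Thread.yield();
        }
        return payload;    // guaranteed to see the value written in publish()
    }

    // Starts a reader thread, publishes from the calling thread, returns what the reader saw.
    static int demo(int value) {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(value);
        try {
            reader.join(); // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw: " + demo(42));
    }
}
```

If `ready` were a plain field, both the compiler and the processor would be free to reorder or cache the accesses, which is the kind of surprise the original memory model left open.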
Highly Concurrent Data Structures-Part1
Weakly Consistent Iterators vs Fail Fast Iterators
ConcurrentHashMap
Structure
remove/put/resize lock
Almost immutability
Using volatile to detect interference
Read does not block in common code path
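The difference between fail-fast and weakly consistent iterators can be seen in a few lines. This sketch adds a key to a map while iterating over it. `HashMap` is expected to throw `ConcurrentModificationException` on the next step, while `ConcurrentHashMap` simply carries on. The class and method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IteratorSketch {
    // Returns true if iteration finished despite a structural modification,
    // false if the iterator failed fast.
    static boolean survivesMutation(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // The first call adds a new key (a structural change);
                // later calls only overwrite the same key.
                map.put("extra", key.length());
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    static boolean hashMapSurvives() {
        return survivesMutation(new HashMap<String, Integer>());
    }

    static boolean concurrentHashMapSurvives() {
        return survivesMutation(new ConcurrentHashMap<String, Integer>());
    }

    public static void main(String[] args) {
        System.out.println("HashMap iteration survived:           " + hashMapSurvives());
        System.out.println("ConcurrentHashMap iteration survived: " + concurrentHashMapSurvives());
    }
}
```

A weakly consistent iterator may or may not show the newly added key. It only promises not to throw and not to return an element twice.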
Applying Thread Pools
Configuring ThreadPoolExecutor
Thread factories
corePoolSize
Customizing thread pool executor after construction
Using default Executors.new* methods
Managing queued tasks
maximumPoolSize
keepAliveTime
PriorityBlockingQueue
Saturation policies
Discard
Caller runs
Abort
Discard oldest
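The configuration knobs listed above (core and maximum pool size, keep-alive time, a bounded work queue, a thread factory, and a saturation policy) come together in a single constructor call. The sketch below is one possible combination, with all sizes chosen arbitrarily for illustration. It uses the caller-runs policy so that a full queue slows the submitter down instead of rejecting work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPoolSketch {
    static ThreadPoolExecutor newBoundedPool() {
        AtomicInteger seq = new AtomicInteger();
        // Thread factory: gives workers readable names and makes them daemons.
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "worker-" + seq.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                2,                                    // corePoolSize (illustrative)
                4,                                    // maximumPoolSize (illustrative)
                30L, TimeUnit.SECONDS,                // keepAliveTime for threads above core
                new ArrayBlockingQueue<Runnable>(8),  // bounded queue of waiting tasks
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy()); // saturation policy
    }

    // Submits taskCount trivial tasks and returns how many actually ran.
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = newBoundedPool();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet); // runs in the caller when pool and queue are full
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runTasks(100));
    }
}
```

Swapping in `AbortPolicy`, `DiscardPolicy`, or `DiscardOldestPolicy` changes only the last constructor argument, which makes it easy to compare how each policy behaves under load.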
Sizing thread pools
Examples of various pool sizes
Determining the maximum allowed threads on your operating system
CPU-intensive vs IO-intensive task sizing
Danger of hardcoding worker number
Problems when pool is too large or small
Formula for calculating how many threads to use
Mixing different types of tasks
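The outline does not say which sizing formula the training uses. One widely cited heuristic, from the book Java Concurrency in Practice, is threads = CPUs × target utilization × (1 + wait time / compute time). The sketch below encodes that heuristic as an assumption. The class `PoolSizing` and its parameter names are hypothetical, and the result should be treated as a starting point to validate by measurement.

```java
final class PoolSizing {
    /**
     * @param cpus              number of available processors
     * @param targetUtilization desired CPU utilization, in (0, 1]
     * @param waitTime          average time a task spends blocked (any unit)
     * @param computeTime       average time a task spends computing (same unit)
     */
    static int threadsFor(int cpus, double targetUtilization, double waitTime, double computeTime) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double ideal = cpus * targetUtilization * (1 + waitTime / computeTime);
        return Math.max(1, (int) Math.round(ideal));
    }

    public static void main(String[] args) {
        // Querying the processor count avoids hard-coding the worker number.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound tasks:       " + threadsFor(cpus, 1.0, 0.0, 1.0));
        System.out.println("IO-heavy tasks (9:1):  " + threadsFor(cpus, 1.0, 9.0, 1.0));
    }
}
```

For purely CPU-bound work the formula collapses to the processor count, while tasks that wait nine times longer than they compute get ten threads per core.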
Tasks and Execution Policies
Long-running tasks
Homogeneous, independent and thread-agnostic tasks
Thread starvation deadlock
Extending ThreadPoolExecutor
terminated
Using hooks for extension
afterExecute
beforeExecute
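The extension hooks above (`beforeExecute`, `afterExecute`, and `terminated`) can be sketched as a small timing pool. The class `TimingPool` and its fields are hypothetical. The sketch assumes a fixed-size pool and records only the total task time and a completion count.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class TimingPool extends ThreadPoolExecutor {
    private final ThreadLocal<Long> startNanos = new ThreadLocal<>();
    final AtomicInteger completed = new AtomicInteger();
    final AtomicLong totalNanos = new AtomicLong();
    volatile boolean terminatedCalled;

    TimingPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        startNanos.set(System.nanoTime()); // runs in the worker thread, just before the task
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        try {
            totalNanos.addAndGet(System.nanoTime() - startNanos.get());
            completed.incrementAndGet();
        } finally {
            super.afterExecute(r, t);
        }
    }

    @Override
    protected void terminated() {
        try {
            terminatedCalled = true; // called once, when the pool has fully shut down
        } finally {
            super.terminated();
        }
    }

    // Runs the given number of trivial tasks, shuts the pool down and returns it for inspection.
    static TimingPool runAndShutdown(int tasks) {
        TimingPool pool = new TimingPool(2);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> Math.sqrt(42.0));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool;
    }

    public static void main(String[] args) {
        TimingPool pool = runAndShutdown(10);
        System.out.println("completed=" + pool.completed.get() + " totalNanos=" + pool.totalNanos.get());
    }
}
```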
Parallelizing recursive algorithms
Using Fork/Join to execute tasks
Converting sequential tasks to parallel
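As a sketch of converting a sequential computation into a Fork/Join task, the example below sums an array by splitting it recursively until a chunk is small enough to add up directly. The threshold of 1,000 elements is an arbitrary assumption, and `ForkJoinPool.commonPool()` requires Java 8 or later. On Java 7 an explicitly created `ForkJoinPool` would be used instead.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, sum sequentially (assumed value)

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                       // schedule the left half asynchronously
        long rightSum = right.compute();   // compute the right half in this thread
        return left.join() + rightSum;     // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println("sum = " + parallelSum(data));
    }
}
```

Forking one half and computing the other in the current thread keeps the worker busy instead of leaving it idle while both children run elsewhere.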
Common Issues with thread
Uncaught Exception Handler
problem with stop
Dealing with InterruptedStatus
Canned Synchronizers
Semaphore
Latches
SynchronousQueue
Future
Exchanger
Synchronous Queue Framework
Mutex
Barrier
Producer Consumer (Basic Hand-Off)
Why wait-notify require Synchronization
notifyAll used as work around
Structural modification to hidden queue by wait-notify
Lock handling done by OS
use cases for notify-notifyAll
Hidden queue
design issues with synchronization
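A basic hand-off built on wait-notify might look like the sketch below. Both methods are `synchronized` because `wait` and `notifyAll` may only be called while holding the object's monitor, and because the check of the condition and the wait must be atomic with respect to the other side. The single-slot class `HandOff` is a hypothetical teaching example. In production code a `SynchronousQueue` or `BlockingQueue` would normally be preferred.

```java
final class HandOff<T> {
    private T item; // guarded by this; null means the slot is empty

    synchronized void put(T value) throws InterruptedException {
        while (item != null) { // loop guards against spurious wake-ups
            wait();
        }
        item = value;
        notifyAll(); // wake consumers; notifyAll because producers and consumers share one wait set
    }

    synchronized T take() throws InterruptedException {
        while (item == null) {
            wait();
        }
        T value = item;
        item = null;
        notifyAll(); // wake producers waiting for an empty slot
        return value;
    }

    // A producer thread hands 1..count to the calling thread, which sums them.
    static int transferSum(int count) {
        HandOff<Integer> slot = new HandOff<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    slot.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += slot.take();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum of handed-off values: " + transferSum(100));
    }
}
```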
Advanced Class-loading
Understanding visibility rules
Advantages of Peer Class-loading
Hot Loading
IllegalAccessException
ClassCastException
Peer Class-loading
Understanding delegation rules
LinkageError
Problems with these rules.
Class-loading
Common Class loading Issues
Changing the rules of default class visibility
Class Loading Basics
Introduction
Diagnosing and resolving class loading problems
Custom Class Loaders
Class visibility
NIO
Character Streams Encoding
Other Charsets - ISO 8859
Big / Little Endian
Forms of Unicode
32-bit Characters
Code Points
Charset Class
Other Encodings
The Unicode Standard
Encoders and Decoders
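The encoding topics above (code points, 32-bit characters, byte order, encoders and decoders) can be illustrated with a short sketch. The sample string contains one ASCII letter, one Latin-1 letter, and one character outside the Basic Multilingual Plane, which needs a surrogate pair in UTF-16. The class `EncodingSketch` and its helpers are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSketch {
    // 'A' (U+0041), 'é' (U+00E9) and U+1F600, stored as a surrogate pair.
    static final String SAMPLE = "A\u00E9\uD83D\uDE00";

    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    static int utf8Length(String text) {
        return encodedLength(text, StandardCharsets.UTF_8);
    }

    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    // Encodes with the charset's encoder, decodes again, and compares.
    static boolean roundTripsThroughUtf8(String text) {
        Charset utf8 = StandardCharsets.UTF_8;
        return utf8.decode(utf8.encode(text)).toString().equals(text);
    }

    // Hex dump of the UTF-16 bytes in the requested byte order (no byte order mark).
    static String utf16Hex(String text, boolean bigEndian) {
        Charset charset = bigEndian ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE;
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("char count:       " + SAMPLE.length());
        System.out.println("code point count: " + codePoints(SAMPLE));
        System.out.println("UTF-8 bytes:      " + utf8Length(SAMPLE));
        System.out.println("'A' as UTF-16BE:  " + utf16Hex("A", true));
        System.out.println("'A' as UTF-16LE:  " + utf16Hex("A", false));
    }
}
```

The gap between `length()` and the code point count is what makes character-by-character processing of non-BMP text error-prone.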
Java New IO Package
Non-Blocking Operations
Buffers Advantages
Selectors
Channels
Allocating Buffers
Motivation for Using
Memory Mapped Files
Working with Buffers
NIO Uses
Event Driven Architecture
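Working with buffers mostly comes down to the position/limit discipline: write, `flip`, then read. The sketch below shows that cycle for both a heap buffer and a direct (off-heap) buffer. The 16-byte capacity, the tag value, and the class name `BufferSketch` are arbitrary choices for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferSketch {
    private static final int TAG = 7; // arbitrary marker written before the payload

    // Writes a tag and a long into a buffer, flips it, and reads the long back.
    static long roundTrip(long value, boolean direct) {
        ByteBuffer buf = direct
                ? ByteBuffer.allocateDirect(16) // memory outside the Java heap
                : ByteBuffer.allocate(16);      // backed by a byte[] on the heap
        buf.order(ByteOrder.LITTLE_ENDIAN);     // the default order is big-endian

        buf.putInt(TAG).putLong(value);         // position advances to 12
        buf.flip();                             // limit = 12, position = 0: ready for reading

        int tag = buf.getInt();
        if (tag != TAG) {
            throw new IllegalStateException("unexpected tag " + tag);
        }
        return buf.getLong();
    }

    // Number of readable bytes after writing one int and one long and flipping.
    static int remainingAfterFlip() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(TAG).putLong(1L);
        buf.flip();
        return buf.remaining();
    }

    public static void main(String[] args) {
        System.out.println("heap buffer:   " + roundTrip(123_456_789L, false));
        System.out.println("direct buffer: " + roundTrip(123_456_789L, true));
        System.out.println("readable bytes after flip: " + remainingAfterFlip());
    }
}
```

Forgetting `flip()` is the classic mistake: the read would start at position 12 and see only the unwritten tail of the buffer.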
Topics
