
JVM Performance Tuning
Training Description
Focus of the training is to make JVM and Java performance tuning clear and simple as possible for the participants at the design, architecture and, implementation levels.
This is an end-to-end training. The Training illustrates almost every concept with the help of pictures because it is much easier to understand the concept pictorially and model code. There are a lot of illustrations in the course of the training. There are worked out examples to illustrate the concepts for almost every topic. There is a detailed case study that strings together all concepts and technology. There are case studies for debugging JVM Crashes, Memory Leaks, Operating System stalls , and Hardware Bottlenecks. We learn how to associate these events with code.
- The target group is programmers who want to know foundations of concurrent programming and existing concurrent programming environments, in order, now or in future, to develop multithreaded applications for multi-core processors and shared memory multiprocessors. 
- Java Consultants, developers or anyone with Java experience interested in performance testing. 
Intended Audience
- Some topics(e.g. memory analysis) are listed is more than one heading (Hardware, OS, JVM). 
- This is an instructor led course provides lecture topics and the practical application of 
- JVM tuning techniques and the underlying technologies. 
- It pictorially presents most concepts and there is a detailed case study that strings together the technologies, patterns and design. 
- The reason for a problem may be Hardware, OS , JVM or Application. The technique and tools for problem identification may be different. 
Instructional Method
- Identify and debug memory leak 
- Case studies of various real problems 
- Identify and debug Operating System Stalls 
- Identify and debug Hardware stalls 
- Debug JVM Crash 
- Using various hardware, OS,JVM tools and their usage 
Key skills
- Good Knowledge of Java. 
- Workstation with JDK 1.8.0 installed 
- Working knowledge of JVM 
- Workstation with JDK 1.7.0 installed 
Pre-requisites
- 01Cache and Memory - Measure the effects of cache on a java program 
- Identifying bottlenecks with the help of these measurements 
- Measuring TLBs performance and its effects 
- Keeping latency low and throughput high by engaging the cache 
- Associating bottlenecks with java code. 
- How Hardware effects Performance of JAVA application? 
- Measure the effects of other processor and hardware Counters on a java program 
- Hardware counters that can be measured 
- Hardware and software Prefetchers 
- TLBs effect on java code. 
- TLBs architecture 
- How cache effects performance? 
- Crash Course in modern hardware 
- Cache Levels and their architecture 
 Disk - How to measure tardy disk? 
- Identifying with correct reason 
- Is your disk IO slow? 
- Reasons for tardy performance. 
- Associating the tardy performance with java code 
 Locking and Concurrency​​​​​​​ - Cache Coherency 
- MESI 
- False Sharing 
- Associating False Sharing with Java Code 
- Detecting False Sharing 
 Processor Affinity - Effects of processor affinity 
- Measuring effects of affinity 
 CPU​​​​​​​ - Measuring CPI and IPC 
- CPU Performance Counters 
- Associating and CPI and IPC with performance and java code 
 
- 02Virtual Memory - Page Replacement 
- Caveats of using TLB 
- Introduction 
- How physical memory acts 
- Swap Space 
- Pages and page frames 
- Multilevel page tables 
- How the operating system sees memory 
- Optimizing Page table access 
- How virtual memory acts 
- Virtual memory and shared memory 
- Demand Paged Virtual Memory and Working Sets 
- Influencing TLB performance 
 JVMTuning for VirtualMemory​​​​​​​ - -XX:_UseLargePages 
- /sys/kernel/mm/transparent_hugepages/enabled 
- /proc/sys/vm/nr_hugepages 
- /proc/meminfo Hugepagesize 
- Linux huge Pages 
- Linux Transparent huge pages 
- LargePages 
 Locking and Concurrency​​​​​​​ - is_lock_owned 
- try_spin 
- Object layout with JOL 
- spin_pause 
- Undestanding Padding 
- False Sharing 
- Designing Classes to avoid false share 
- @Contended and related annotation 
- complete_monitor_locking 
 Setting Processor Affinity at OS​​​​​​​ - what is taskset 
- isolcpus 
 Operating-System-Specific Tools​​​​​​​ - gdb 
- conky 
- vmstat 
- mpstat 
- iostat 
- system tap 
- top 
 
- 03Garbage Collection-Advanced Tuning ScenariosAdvance Tuning Scenarios-Part2 - JDK 5,6,7 defaults 
- Default Flags 
- Garbage Collection Data of Interest 
 Tuning GC For Throughput and Latency Latency - Old(Parallel) 
- Perm 
- Young (Parallel) 
- Pset Configuration 
 Old (CMS) - Tenuring Distribution 
- Initiating Occupancy 
- Common Scenarios 
- Survivor Ratio 
- Tenuring threshold 
 Througput - (Parallel GC) 
- CondCardmark 
- Adaptive Sizing 
- Tlabs 
- Large Pages 
- Numa 
- Pset Configuration 
 CMS - Concurrent Mode Failure 
 Monitoring GC - Par New 
- Parallel GC 
- Safe Pointing 
- Time Stamps 
- Date Stamps 
- System.GC 
 Advance Tuning Scenarios-Part1 - Monitoring the GC 
- Conclusions 
 GC Tuning - Tuning Parallel GC 
- Tuning CMS 
- Tuning the young generation 
 GC Tuning Methodology - Deployment Model 
- Choosing Runtime 
- General GuideLines 
- Data Model 
 Heap Sizing - Factor Controlling Heap Sizing 
 MonitoringCtrl-Break Handler - Deadlock Detection 
- Thread Dump 
- Heap Summary 
 jmap Utility - Heap Histogram of Running Process 
- Getting Information on the Permanent Generation 
- Heap Histogram of Core File 
- Heap Configuration and Usage 
 jps Utilityjhat Utility - Instances Query 
- Histogram Queries 
- Standard Queries 
- Heap Analysis Hints 
- All Classes Query 
- Object Query 
- Where was this object allocated? 
- Roots Query 
- Instance Counts for All Classes Query 
- New Instances Query 
- What is keeping an object alive? 
- All Roots Query 
- Class Query 
- Reachable Objects Query 
- Custom Queries 
 jstack Utility - Printing Stack Trace From Core Dump 
- Printing a Mixed Stack 
- Forcing a Stack Dump 
 jrunscript Utilityjsadebugd Daemonjstatd DaemonJMX - Introduction 
- Dynamic Mbeans 
- Open Mbean 
- Standard Mbeans 
- JMX Remoting 
- Advanced Features 
- J2EE Management(optional) 
- Model Mbean 
 jstat Utility - Example of -gcoldcapacity Option 
- Example of -gcnew Option 
- Example of -gcutil Option 
 jinfo Utilityvisualgc ToolCPU Usage Profilers Solaris Studio Analyzer (Linux and Solaris) - stepping through assembly with source 
- er_print utility 
- stepping through call-stack (native and java) 
- stepping through byte codes with source 
- Associating hardware events with java code analyzer 
- Collecting processor specific hardware events 
- Collect command 
 Java Mission Control - Enabling JFR 
- Selecting JFR Events 
- Java Flight recorder 
 Diagnostics and Analysis Native Memory Best Practices - Measuring Footprint 
- NIO Buffers 
- Minimizing Footprint 
- Native Memory Tracking 
- FootPrint 
 Integrating Signal and Exception Handling - Reducing Signal Usage 
- Console Handlers 
- Signal Chaining 
- Signal Handling on Solaris OS and Linux 
- Alternative Signals 
- Signal Handling in the HotSpot Virtual Machine 
 Reasons for Not Getting a Core FileDiagnosing Leaks in Native Code - Crash in Compiled Code 
- Tracking Memory Allocation With OS Support 
- Using libumem to Find Leaks 
- Tracking Memory Allocation in a JNI Library 
- Tracking All Memory Allocation and Free Calls 
- Sample Crashes 
- Crash in Native Code 
- Crash in VMThread 
- Determining Where the Crash Occurred 
- Crash due to Stack Overflow 
- Using dbx to Find Leaks 
- Crash in the HotSpot Compiler Thread 
- Troubleshooting System Crashes 
 Developing Diagnostic Tools - Java Platform Debugger Architecture 
- java.lang.management Package 
- Java Virtual Machine Tools Interface 
 Diagnosing Leaks in Java Language Code - Obtaining a Heap Histogram on a Running Process 
- -XX:+HeapDumpOnOutOfMemoryError Command-line 
- Option 
- jmap Utility 
- Using the jhat Utility 
- JConsole Utility 
- Monitoring the Number of Objects Pending Finalization 
- Obtaining a Heap Histogram at OutOfMemoryError 
- HPROF Profiler 
- NetBeans Profiler 
- Creating a Heap Dump 
 Troubleshooting Hanging or Looping Processes - Diagnosing a Looping Process 
- Deadlock Detected 
- No Thread Dump 
- Deadlock Not Detected 
- Diagnosing a Hung Process 
 Forcing a Crash DumpTroubleshooting Memory Leaks - Crash Instead of OutOfMemoryError 
- Meaning of OutOfMemoryError 
- Detail Message: <reason> <stack> (Native method)</stack></reason> 
- Detail Message: Java heap space 
- Detail Message: request <size> bytes for <reason> Out of swap</reason></size> 
- space? 
- Detail Message: PermGen space 
- Detail Message: Requested array size exceeds VM limit 
 Finding a Workaround - Crash During Garbage Collection 
- Class Data Sharing 
- Crash in HotSpot Compiler Thread or Compiled Code 
 Direct Memory - Avoiding GC for low latencies 
- Why not C/C++ 
- Going Off-heap 
- Advantages of Off-heap structures 
- Disadvantages of Off-heap structures 
 Garbage Collection and Memory Architecture - Heap Fragmentation 
- GC Pros and Cons 
- Object Size 
- Algorithms 
- Overview 
- Performance 
- GC Tasks 
- Reachability 
- Managing OutOfMemoryError 
- Generational Spaces 
- Measuring GC Activity 
 History - Summary 
- Old Space 
- Young Space 
- JVM 1.4, 5, 6 
 Advanced JVM ArchitectureTuning inlining - MaxInlineSize 
- InlineSmallCode 
- MaxInline 
- MaxRecursiveInline 
- FreqInlineSize 
 Monitoring JIT - Deoptimizations 
- Backing Off 
- PrintCompilation 
- OSR 
- Log Compilations 
- Optimizations 
- PrintInlining 
 Intrinsics - Common intrinsics 
 Advanced JVM Architecture Part 1 - NUMA 
- Inline caching 
- Virtual method calls Details 
- Virtual Machine Design 
- Dynamic Compilation 
- Large Pages 
- Biased Locking 
- Lock Coarsening 
- Standard Compiler Optimizations 
- Speculative Optimizations 
- Escape Analysis 
- Scalar Replacements 
- Inlining DetailsVM Philosophy 
 Understanding and Controlling JVM Options - DoEscapeAnalysis 
- AggressiveOpts 
 CallSites - Polymorphic 
- BiMorphic 
- MegaMorphic 
- MonoMorphic 
 HotSpot - Client 
- Server 
- Tiered 
 Advanced JVM Architecture-Part 2 - JIT 
- Mixed mode 
- Golden Rule 
- Profiling 
- Optimizations 
 Memory AnalysisCore/Heap dumps Analyzerjdb Utility - Attaching to a Process 
- Attaching to a Core File on the Same Machine 
- Attaching to a Core File or a Hung Process from a Different Machine 
 JConsole Utility Serviceability Agent(SA) - Cache Dump 
- Stepping Through heap 
- Class Browser 
- Compute reverse pointers 
- Stepping through NON Heap 
- Deadlock detection 
- Value in code cache 
- Code Viewer 
 Java VisualVMHPROF - Heap Profiler - CPU Usage Sampling Profiles ( cpu=samples) 
- CPU Usage Times Profile ( cpu=times) 
- Heap Dump ( heap=dump) 
- Heap Allocation Profiles ( heap=sites) 
 
- 04NIO 2.0 - File System Change Notification 
- Multicasting 
- File and Directories 
- Symbolic Links 
- Watch Service API 
- Metadata File Attributes 
- Two Security models 
- FileVisitor Class 
- Working with path 
- SPI Package 
- Asynchronous IO with Socket and File 
 Concurrent Data Structures Java Memory Model(JMM) - Real Meaning and effect of synchronization 
- Volatile 
- Sequential Consistency would disallow common optimizations 
- The changes in JMM 
- Final 
 Shortcomings of the original JMM - Finals not really final 
- Prevents effective compiler optimizations 
- Processor executes operations out of order 
- Compiler is free to reorder certain instructions 
- Cache reorders writes 
- Old JMM surprising and confusing 
 Instruction Reordering - What is the limit of reordering 
- Programmatic Control 
- super-scalar processors 
- heavily pipelines processors 
- As-if-serial-semantics 
- Why is reordering done 
 Cache Coherency - Write-back Caching explained 
- What is cache Coherence. 
- How does it effect java programs. 
- Software based Cache Coherency 
- NUMA(Non uniform memory access) 
- Caching explained 
- Cache incoherency 
 New JMM and goals of JSR-133 - Simple,intuitive and, feasible 
- Out-of-thin-air safety 
- High performance JVM implementations across 
- architectures 
- Minimal impact on existing code 
- Initialization safety 
- Preserve existing safety guarantees and type-safety 
 Highly Concurrent Data Structures-Part1Weakly Consistent Iterators vs Fail Fast IteratorsConcurrentHashMap - Structure 
- remove/put/resize lock 
- Almost immutability 
- Using volatile to detect interference 
- Read does not block in common code path 
 Applying Thread PoolsConfiguring ThreadPoolExecutor - Thread factories 
- corePoolSize 
- Customizing thread pool executor after construction 
- Using default Executors.new* methods 
- Managing queued tasks 
- maximumPoolSize 
- keepAliveTime 
- PriorityBlockingQueue 
 Saturation policies - Discard 
- Caller runs 
- Abort 
- Discard oldest 
 Sizing thread pools - Examples of various pool sizes 
- Determining the maximum allowed threads on your operating system 
- CPU-intensiv vs IO-intensive task sizing 
- Danger of hardcoding worker number 
- Problems when pool is too large or small 
- Formula for calculating how many threads to use 
- Mixing different types of tasks 
 Tasks and Execution Policies - Long-running tasks 
- Homogenous, independent and thread-agnostic tasks 
- Thread starvation deadlock 
 Extending ThreadPoolExecutor - terminate 
- Using hooks for extension 
- afterExecute 
- beforeExecute 
 Parallelizing recursive algorithms - Using Fork/Join to execute tasks 
- Converting sequential tasks to parallel 
 Common Issues with thread - Uncaught Exception Handler 
- problem with stop 
- Dealing with InterruptedStatus 
 Canned Synchronizers - Semaphore 
- Latches 
- SynchronousQueue 
- Future 
- Exchanger 
- Synchronous Queue Framework 
- Mutex 
- Barrier 
 Producer Consumer(Basic Hand-Off)Why wait-notify require Synchronization - notifyAll used as work around 
- Structural modification to hidden queue by wait-notify 
- locking handling done by OS 
- use cases for notify-notifyAll 
- Hidden queue 
- design issues with synchronization 
 Advanced Class-loading - Understanding visibility rules 
- Advantages of Peer Class-loading 
- Hot Loading 
- IllegalAccessException 
- ClassCastException 
- Peer Class-loading 
- Understanding delegation rules 
- LinkageError 
- Problems with these rules. 
 Class-loading - Common Class loading Issues 
- Changing the rules of default class visibility 
- Class Loading Basics 
- Introduction 
- Diagnosing and resolving class loading problems 
- Custom Class Loaders 
- Class visibility 
 NIOCharacter Streams Encoding - Other Charsets - ISO 8859 
- Big / Little Endian 
- Forms of Unicode 
- 32-bit Characters 
- Code Points 
- Charset Class 
- Other Encodings 
- The Unicode Standard 
- Encoders and Decoders 
 Java New IO Package - Non-Blocking Operations 
- Buffers Advantages 
- Selectors 
- Channels 
- Allocating Buffers 
- Motivation for Using 
- Memory Mapped Files 
- Working with Buffers 
 NIO Uses - Event Driven Architecture 
 
Topics
